Senior Applications Developer: Big Data - Python, PySpark
Citigroup
- Pune, Maharashtra
- Permanent
- Full-time
- Design, develop, and implement robust, scalable, and high-performance data pipelines and applications using Python, PySpark, and Big Data technologies.
- Work autonomously to analyze requirements, propose technical solutions, and deliver high-quality code and data products, ensuring alignment with architectural standards and business objectives.
- Utilize expertise in various Big Data platforms (e.g., Hadoop, Hive, Kafka, Spark) to process, transform, and manage large datasets efficiently.
- Write complex SQL queries and stored procedures, and optimize database performance for large-scale data warehousing and analytics solutions.
- Develop and enhance ETL (Extract, Transform, Load) processes, ensuring data quality, integrity, and timely delivery. Experience with various ETL tools and methodologies is a plus.
- Proactively research, evaluate, and integrate new and emerging technologies, frameworks, and tools to improve development processes and solution capabilities.
- Ensure adherence to coding standards, conduct thorough code reviews, and implement best practices for software development, data governance, and security.
- Diagnose and resolve complex technical issues related to data pipelines, performance bottlenecks, and system integrations in a fast-paced environment.
- Collaborate effectively with cross-functional teams including architects, data scientists, business analysts, and QA engineers. Provide technical guidance and mentorship to junior team members.
- Leverage KYC domain knowledge to develop data solutions that support client due diligence (CDD) and other financial crime compliance initiatives, with an understanding of the nuances of regulatory data requirements.
- Identify opportunities to integrate AI-driven functionalities to enhance KYC processes, such as intelligent data extraction, anomaly detection, or predictive analytics.
- 8+ years of experience in Applications Development, Systems Analysis, or equivalent senior engineering roles.
- Extensive hands-on experience delivering enterprise-scale, database-driven platforms in a regulated environment.
- Expert-level proficiency in Python programming, including object-oriented design, data structures, algorithms, and extensive experience with various Python libraries (e.g., Pandas, NumPy, Flask/Django for web development, asyncio for async programming).
- Deep expertise in developing, optimizing, and deploying PySpark applications for large-scale data processing, ETL, and real-time analytics on distributed systems (e.g., Spark SQL, Spark Streaming, DataFrames).
- Strong understanding of Apache Spark architecture, Hadoop ecosystem, and experience with distributed computing concepts. Familiarity with big data storage formats (e.g., Parquet, ORC).
- Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP), specifically with their compute, storage, and data services (e.g., S3, ADLS, EMR, Databricks, Azure Synapse).
- Solid experience with both relational databases (e.g., PostgreSQL, Oracle, SQL Server, MySQL) and NoSQL databases (e.g., MongoDB). Strong SQL writing and optimization skills.
- Experience in designing, developing, and consuming RESTful APIs using Python frameworks (e.g., Flask, FastAPI, Django REST Framework).
- Strong understanding and practical experience with CI/CD tools (e.g., Jenkins, GitLab CI, Azure DevOps) and containerization technologies (Docker, Kubernetes).
- Expert-level proficiency with Git.
- Experience with unit testing (e.g., Pytest), integration testing, and performance testing frameworks for Python and PySpark applications.
- Experience with workflow orchestration tools such as Apache Airflow, Azure Data Factory, or AWS Step Functions.
- Exposure to or direct experience with Artificial Intelligence (AI) and Machine Learning (ML) concepts, frameworks (e.g., TensorFlow, PyTorch), or relevant projects is a significant advantage.
- Exceptional analytical and problem-solving abilities, with a strong capacity to understand complex business needs and translate them into effective technical solutions.
- Excellent leadership, team management, and mentoring capabilities.
- Superior verbal and written communication skills, with the ability to articulate complex technical concepts clearly to both technical and non-technical audiences.
- Strong collaboration and interpersonal skills, with a proven ability to work effectively with cross-functional teams.
- Highly proactive and results-oriented, with a strong commitment to delivering high-quality, innovative solutions.
- Ability to thrive and lead in an agile, dynamic, and fast-paced work environment.
- Bachelor’s degree/University degree or equivalent experience
- Master’s degree preferred