EY - GDS Consulting - AI and Data - AWS Data Engineer - Senior
- Pune, Maharashtra
- Permanent
- Full-time
- Develop, optimize, and deploy scalable ETL/ELT pipelines using PySpark, SQL, and AWS services.
- Build and manage data lakehouse solutions leveraging AWS S3, Glue, Iceberg, and other AWS-native components.
- Migrate on-premises ETL workloads to modern AWS-based architectures with a focus on performance, reliability, and cost efficiency.
- Implement metadata-driven ingestion frameworks and medallion/layered architecture (Bronze/Silver/Gold); a minimal sketch follows this list.
- Work on orchestration frameworks such as Astronomer (Airflow), AWS Step Functions, or AWS-managed workflows; an example DAG appears after this list.
- Design and optimize Spark-based data processing jobs for high throughput.
- Apply data warehouse (DW) concepts and best practices for data modeling and integration.
- Collaborate with data analysts, data scientists, and BI engineers for seamless data delivery.
- Perform code reviews, troubleshoot issues, and ensure end-to-end data quality.
- (Optional) Leverage Databricks for PySpark, Delta Lake, Lakehouse, or workflow orchestration where applicable.
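To give a flavor of the medallion work described above, here is a minimal, hypothetical PySpark sketch of a metadata-driven Bronze-to-Silver hop. The bucket paths, source names, and the `sources` dict are illustrative assumptions, not part of the role description.

```python
# A minimal sketch of a metadata-driven Bronze -> Silver hop in PySpark.
# All paths and source names below are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("medallion-ingest").getOrCreate()

# Metadata-driven: each entry describes one source to land and refine.
sources = {
    "orders": {"path": "s3://example-bucket/raw/orders/", "format": "json"},
    "customers": {"path": "s3://example-bucket/raw/customers/", "format": "csv"},
}

for name, meta in sources.items():
    # Bronze: land raw data as-is, adding lineage columns.
    bronze = (
        spark.read.format(meta["format"])
        .option("header", "true")
        .load(meta["path"])
        .withColumn("_ingested_at", F.current_timestamp())
        .withColumn("_source_file", F.input_file_name())
    )
    bronze.write.mode("append").format("parquet").save(
        f"s3://example-bucket/bronze/{name}/"
    )

    # Silver: deduplicate and drop empty rows before downstream modeling.
    silver = bronze.dropDuplicates().na.drop(how="all")
    silver.write.mode("overwrite").format("parquet").save(
        f"s3://example-bucket/silver/{name}/"
    )
```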
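For the orchestration responsibility, a minimal Airflow DAG sketch (the framework Astronomer runs) chaining two such steps. The DAG id, schedule, and placeholder callables are assumptions for illustration, not a prescribed design.

```python
# A minimal Airflow DAG sketch chaining bronze and silver steps.
# Task bodies are placeholders; a real pipeline might submit Glue or Spark jobs.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def run_bronze_ingest():
    print("land raw files into the bronze layer")  # placeholder

def run_silver_refine():
    print("clean and conform bronze data into silver")  # placeholder

with DAG(
    dag_id="medallion_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    bronze = PythonOperator(task_id="bronze_ingest", python_callable=run_bronze_ingest)
    silver = PythonOperator(task_id="silver_refine", python_callable=run_silver_refine)
    bronze >> silver  # silver runs only after bronze succeeds
```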
- 4+ years of total IT experience, including at least 2 years in AWS-based data engineering.
- Strong hands-on experience in:
- PySpark, SQL, Python
- ETL pipeline development
- AWS services: S3, Glue, Lambda, Step Functions, CloudWatch
- Unix/Linux environments
- Iceberg table format (an example sketch follows this list)
- Astronomer (Airflow) or other orchestration tools
- Experience with structured and semi-structured data formats such as Parquet, JSON, CSV, XML.
- Understanding of data warehousing concepts, star/snowflake schemas, and dimensional modeling.
- Good knowledge of CI/CD, version control (GitHub, Azure DevOps, Jenkins).
- Strong analytical, troubleshooting, and problem-solving skills.
- Ability to work independently and collaborate with stakeholders to deliver high-quality solutions.
- (Optional but good to have) Experience working with Databricks, Delta Lake, or Unity Catalog.
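For the Iceberg requirement above, a minimal sketch of creating and querying an Iceberg table from PySpark. The catalog name, warehouse path, and table are hypothetical; a real AWS setup would also add the Iceberg Spark runtime JAR and would typically back the catalog with AWS Glue rather than the simple Hadoop catalog shown here.

```python
# A minimal sketch of an Iceberg table in PySpark; names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-demo")
    # Register a Spark catalog backed by Iceberg (Hadoop catalog for simplicity).
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3://example-bucket/warehouse/")
    .getOrCreate()
)

# Iceberg tables support hidden partitioning (the days() transform below),
# schema evolution, and time travel.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.sales.orders (
        order_id BIGINT,
        amount DOUBLE,
        order_ts TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(order_ts))
""")

spark.sql("INSERT INTO demo.sales.orders VALUES (1, 99.5, TIMESTAMP '2024-01-15 10:00:00')")
spark.sql("SELECT * FROM demo.sales.orders").show()
```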
- Bachelor's or Master's degree in Computer Science, IT, or related field.
- 4-7 years of industry experience in data engineering.
- Production-grade experience building and managing AWS data pipelines.
- Hands-on experience with Agile/Scrum delivery models.
- Strong communication and stakeholder management skills.
- Proactive, self-driven approach with ownership of deliverables.
- Experience in client-facing roles.
- Experience working in large-scale, multi-environment data platforms.
- Technically strong, curious, and adaptable professionals who enjoy solving challenging data problems and continuously learning new technologies in a fast-moving environment.
- Opportunities to work on diverse, meaningful, and industry-leading projects.
- Coaching, learning programs, and a personalized growth and development plan.
- Exposure to a collaborative, interdisciplinary work culture.
- Flexibility to manage your work in the way that suits you best.
- Supportive colleagues and a global environment for continuous knowledge exchange.