
PySpark Hive Data Engineer - Assistant Vice President
- Pune, Maharashtra
- Permanent
- Full-time
- Develop and support scalable, extensible, and highly available data solutions
- Deliver on critical business priorities while ensuring alignment with the wider architectural vision
- Identify and help address potential risks in the data supply chain
- Follow and contribute to technical standards
- Design and develop analytical data models
- First Class Degree in Engineering/Technology (4-year graduate course)
- 8 to 12 years’ experience implementing data-intensive solutions using agile methodologies; hands-on with PySpark, Hive, HDFS, and Hadoop
- Strong understanding of AWS Glue serverless data integration, Terraform, and deploying Apache Spark on AWS using Elastic Kubernetes Service (EKS); experience with deployment tools such as LightSpeed and Tekton
- Experience of relational databases and using SQL for data querying, transformation and manipulation
- Experience of modelling data for analytical consumers
- Ability to automate and streamline the build, test and deployment of data pipelines
- Experience in cloud native technologies and patterns
- A passion for learning new technologies, and a desire for personal growth, through self-study, formal classes, or on-the-job training
- Excellent communication and problem-solving skills
- An inclination to mentor; an ability to lead and deliver medium-sized components independently
- ETL: Hands-on experience building data pipelines. Proficiency in two or more data integration platforms such as Ab Initio, Apache Spark, Talend and Informatica
- Big Data: Experience of ‘big data’ platforms such as Hadoop, Hive or Snowflake for data storage and processing
- Data Warehousing & Database Management: Expertise around Data Warehousing concepts, Relational (Oracle, MSSQL, MySQL) and NoSQL (MongoDB, DynamoDB) database design
- Data Modeling & Design: Good exposure to data modeling techniques; design, optimization and maintenance of data models and data structures
- Languages: Proficient in one or more programming languages commonly used in data engineering such as Python, Java or Scala
- DevOps: Exposure to concepts and enablers - CI/CD platforms, version control, automated quality control management
- Data Governance: A strong grasp of principles and practice including data quality, security, privacy and compliance
- Ab Initio: Experience developing Co>It, Data Profiler and Conduct>It, Control>Center, Continuous>Flows
- Cloud: Good exposure to public cloud data platforms such as S3, Snowflake, Redshift, Databricks, BigQuery, etc. Demonstrable understanding of underlying architectures and trade-offs
- Data Quality & Controls: Exposure to data validation, cleansing, enrichment and data controls
- Containerization: Fair understanding of containerization platforms like Docker, Kubernetes
- File Formats: Exposure to event/file/table formats such as Avro, Parquet, Protobuf, Iceberg, Delta
- Others: Experience using a job scheduler, e.g., Autosys. Exposure to Business Intelligence tools, e.g., Tableau, Power BI