
Data Engineer
- Hyderabad, Telangana
- Permanent
- Full-time
- Design, develop, and maintain complex ETL/ELT data pipelines in Databricks using PySpark, Scala, and SQL to process large-scale datasets
- Understand the biotech/pharma or related domains and build highly efficient data pipelines to migrate and deploy complex data across systems
- Design and implement solutions to enable unified data access, governance, and interoperability across hybrid cloud environments
- Ingest and transform structured and unstructured data from databases (PostgreSQL, MySQL, SQL Server, MongoDB, etc.), APIs, logs, event streams, images, PDFs, and third-party platforms
- Ensure data integrity, accuracy, and consistency through rigorous quality checks and monitoring
- Apply expertise in data quality and in data validation and verification frameworks
- Explore and adopt new tools and technologies to improve data-processing efficiency
- Proactively identify and implement opportunities to automate tasks and develop reusable frameworks
- Work in an Agile and Scaled Agile (SAFe) environment, collaborating with cross-functional teams, product owners, and Scrum Masters to deliver incremental value
- Use JIRA, Confluence, and Agile DevOps tools to manage sprints, backlogs, and user stories
- Support continuous improvement, test automation, and DevOps practices in the data engineering lifecycle
- Collaborate and communicate effectively with product and cross-functional teams to understand business requirements and translate them into technical solutions
- Hands-on experience with data engineering technologies such as Databricks, PySpark, Spark SQL, Apache Spark, AWS, Python, SQL, and Scaled Agile methodologies
- Proficiency in workflow orchestration and performance tuning for big data processing
- Strong understanding of AWS services
- Ability to quickly learn, adapt and apply new technologies
- Strong problem-solving and analytical skills
- Excellent communication and teamwork skills
- Experience with Scaled Agile Framework (SAFe), Agile delivery practices, and DevOps practices
- Data engineering experience in the biotechnology or pharmaceutical industry
- Experience writing APIs to make data available to consumers
- Experience with SQL/NoSQL databases and vector databases for large language models
- Experience with data modeling and performance tuning for both OLAP and OLTP databases
- Experience with software engineering best practices, including but not limited to version control (Git, Subversion, etc.), CI/CD (Jenkins, Maven, etc.), automated unit testing, and DevOps
- Master’s degree and 3 to 4+ years of experience in Computer Science, IT, or a related field
- Bachelor’s degree and 5 to 8+ years of experience in Computer Science, IT, or a related field
- AWS Certified Data Engineer preferred
- Databricks certification preferred
- Scaled Agile SAFe certification preferred
- Excellent analytical and troubleshooting skills
- Strong verbal and written communication skills
- Ability to work effectively with global, virtual teams
- High degree of initiative and self-motivation
- Ability to manage multiple priorities successfully
- Team-oriented, with a focus on achieving team goals
- Ability to learn quickly; organized and detail-oriented
- Strong presentation and public speaking skills