
Data Engineer
- Bangalore, Karnataka
- Permanent
- Full-time
- Design, build, test, and deploy new data pipelines within on-premises or cloud data ecosystems.
- Improve existing data pipelines by simplifying them and increasing their performance.
- Follow best practices and apply architected techniques and solutions for data collection, management, and usage in support of the company-wide data governance and management framework.
- Work closely with data analysts, data scientists, and database and systems administrators to create data solutions.
- Evaluate new data sources for quality and attribution to support product requirements.
- Document new and existing pipelines, datasets, and lineage.
- Process complex, disparate data sets with the appropriate technologies, and identify the correlations and patterns that exist between them.
- Experience in designing and building production data pipelines from ingestion to consumption within a hybrid data architecture, using languages such as Java, Python, or C#.
- Experience in designing and implementing scalable, secure data processing pipelines using Databricks (on AWS), AWS Glue, AWS Lambda and related AWS services, Azure Data Factory, and Azure Databricks.
- Experience in developing ETL/ELT workflows leveraging Apache Spark on Databricks, orchestrated with AWS Glue Workflows and Step Functions.
- Experience in managing and optimizing data storage using Amazon S3, Amazon Redshift, and ADLS Gen2.
- Knowledge of, and hands-on experience with, data lake, lakehouse, and Delta Lake architectures.
- Proficient in building scalable data pipelines using PySpark, Databricks notebooks, Workflows, and Delta Live Tables.
- Hands-on experience with real-time data ingestion and real-time data analytics.
- Strong grasp of data governance and access control, and of tools such as Unity Catalog and Alation.
- Strong experience with common data warehouse modelling principles, including Kimball and Inmon.
- Ensuring data quality and consistency through data cleaning, transformation, and integration processes.
- Knowledge of DevOps processes (including CI/CD) and Infrastructure as Code is essential.
- Experience in managing, monitoring, and troubleshooting data-related issues within Azure/AWS/Databricks environments to maintain high availability and performance.
- Collaborating with data scientists, business analysts, and other stakeholders to understand data requirements and implement appropriate data solutions.
- Implementing data security measures, including encryption, access controls, and auditing, to protect sensitive information.
- Automating data pipelines and workflows to streamline data ingestion, processing, and distribution tasks.
- Knowledge of the Microsoft BI stack (SSRS, SSIS, and SSAS, both Tabular with DAX and OLAP with MDX) is desirable.
- Knowledge of Microsoft D365, Dataverse, Salesforce, SAP Data Services, or KNIME is desirable.
- Keeping abreast of the latest Databricks features and technologies to enhance data engineering processes and capabilities.
- Documenting data procedures, systems, and architectures to maintain clarity and ensure compliance with regulatory standards.
- Providing guidance and support for data governance, including metadata management, data lineage, and data cataloging.
- BA/BS/BTech/BE in Computer Science or a related field, or equivalent experience.
- 5+ years of experience in the area of data management and/or data curation.
- 5+ years of development experience with Oracle and/or SQL Server.
- 5+ years of experience with Python/PySpark frameworks and libraries.
- Expert-level understanding of Databricks Workspaces, Delta Lake, Databricks SQL, Unity Catalog, and integration with cloud-native services (e.g., AWS S3, Azure Data Lake Storage).
- Expert-level, hands-on experience with at least one major public cloud, AWS or Azure.
- Deep knowledge of Delta Lake architecture and Databricks SQL for managing structured and semi-structured data.
- Ability to develop, implement, and optimize code using procedural languages such as PL/SQL and T-SQL.
- Experienced with SSIS and C# development.
- Experienced with data normalization and denormalization techniques.
- Experienced in implementing large-scale, event-based streaming architectures.
- Experienced in data transformation and data processing techniques.
- Knowledge of API and microservice development.
- Experienced in Agile methodology and/or pair programming.
- Preferred knowledge of AI/ML concepts and technologies.
- Preferred experience with stream-processing systems.
- Strong communication skills.
- Strong writing and documentation skills.
- Experienced in working with cross-functional teams, building alignment and collaboration.
- Preferred certification: Databricks Certified Data Engineer Associate/Professional.