Data Engineer
NucleusTeq
- Raipur, Chhattisgarh / Indore, Madhya Pradesh
- Permanent
- Full-time
- Design, develop, and manage scalable data pipelines and ETL workflows using Databricks, PySpark, and SQL for large-scale data processing.
- Build and maintain data ingestion frameworks to extract data from enterprise sources such as SAP APIs, REST services, and relational databases.
- Develop and optimize a Delta Lake-based data architecture to ensure reliable, high-performance data storage and processing.
- Design and implement data transformation pipelines to convert raw data into curated datasets for analytics and reporting (a minimal sketch follows this list).
- Optimize Spark jobs and SQL queries to improve performance and reduce compute costs.
- Implement data quality validation, monitoring, and error handling frameworks for reliable pipeline execution.
- Build automated workflow orchestration and scheduling mechanisms for end-to-end data processing pipelines.
- Collaborate with data analysts, business stakeholders, and platform teams to design efficient data solutions.
- Develop and maintain data models and schema design for data lake and downstream analytical systems.
- Support data platform engineering activities, including cluster configuration, performance tuning, and reusable utility development.
- Troubleshoot production pipeline failures, data inconsistencies, and performance issues.
- Develop Python utilities and frameworks to support data ingestion, transformation, and automation tasks.
- Implement data governance, security, and access control standards across enterprise data pipelines.
- Participate in code reviews, documentation, and best practices to improve overall data engineering standards.
- Support large-scale data integrations and migrations from legacy systems to modern cloud data platforms.
- Take ownership of the entire data pipeline lifecycle, from development to deployment.
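For context, the raw-to-curated flow described above, using the Databricks/PySpark/Delta Lake stack named in this posting, typically looks something like the minimal sketch below. The paths, table name, and columns are illustrative assumptions, not details of NucleusTeq's pipelines.

```python
# Minimal PySpark/Delta sketch of a raw-to-curated transformation.
# Paths and columns are illustrative placeholders, not project specifics.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curated-orders-sketch").getOrCreate()

# Ingest raw data landed from an upstream source (e.g. a REST extract).
raw = spark.read.json("/landing/orders/")  # hypothetical landing path

# Basic quality gate: drop records missing the key, deduplicate, stamp ingestion time.
curated = (
    raw.dropna(subset=["order_id"])
       .dropDuplicates(["order_id"])
       .withColumn("ingested_at", F.current_timestamp())
)

# Write a curated Delta table that downstream analytics and reporting can query.
(curated.write
        .format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .save("/curated/orders"))  # hypothetical curated path
```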
- 2+ years of experience in Data Engineering, Data Pipeline Development, and Data Processing.
- Strong experience with Python, PySpark, and SQL for large-scale data transformations.
- Hands-on experience with Databricks, Delta Lake, and distributed data processing frameworks.
- Experience integrating data from REST APIs, SAP systems, and enterprise data sources.
- Strong knowledge of data modeling, schema design, and ETL best practices.
- Experience working with cloud data platforms (GCP / AWS / Azure) and cloud storage systems.
- Experience with workflow orchestration, job scheduling, and automated data pipelines.
- Ability to optimize Spark workloads and troubleshoot performance issues on large datasets (a brief tuning example follows this list).
- Strong problem-solving skills and ability to work in fast-paced data platform environments.
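As one example of the kind of Spark tuning this role calls for, the snippet below broadcasts a small dimension table so a join avoids shuffling the large side. The table paths and join key are hypothetical, used only to illustrate the pattern.

```python
# Illustrative Spark tuning pattern: broadcast the small side of a join
# to avoid a full shuffle against a large fact table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("join-tuning-sketch").getOrCreate()

facts = spark.read.format("delta").load("/curated/orders")     # large table (hypothetical path)
dims = spark.read.format("delta").load("/curated/customers")   # small table (hypothetical path)

# Broadcasting lets each executor join locally instead of shuffling both sides.
joined = facts.join(F.broadcast(dims), on="customer_id", how="left")

joined.explain()  # inspect the physical plan to confirm a BroadcastHashJoin is used
```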