Data Engineer
NucleusTeq
- Raipur, Chhattisgarh / Indore, Madhya Pradesh
- Permanent
- Full-time
- Design, develop, and manage scalable data pipelines and ETL workflows using Databricks, PySpark, and SQL for large-scale data processing.
- Build and maintain data ingestion frameworks to extract data from enterprise sources such as SAP APIs, REST services, and relational databases.
- Develop and optimize a Delta Lake-based data architecture to ensure reliable, high-performance data storage and processing.
- Design and implement data transformation pipelines to convert raw data into curated datasets for analytics and reporting (a minimal sketch follows this list).
- Optimize Spark jobs and SQL queries to improve performance and reduce compute costs.
- Implement data quality validation, monitoring, and error handling frameworks for reliable pipeline execution.
- Build automated workflow orchestration and scheduling mechanisms for end-to-end data processing pipelines.
- Collaborate with data analysts, business stakeholders, and platform teams to design efficient data solutions.
- Develop and maintain data models and schema design for data lake and downstream analytical systems.
- Support data platform engineering activities, including cluster configuration, performance tuning, and reusable utility development.
- Troubleshoot production pipeline failures, data inconsistencies, and performance issues.
- Develop Python utilities and frameworks to support data ingestion, transformation, and automation tasks.
- Implement data governance, security, and access control standards across enterprise data pipelines.
- Participate in code reviews, documentation, and best practices to improve overall data engineering standards.
- Support large-scale data integrations and migrations from legacy systems to modern cloud data platforms.
- Take ownership of the entire data pipeline lifecycle, from development to deployment.
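For context, the raw-to-curated flow described above, using the Databricks/PySpark/Delta Lake stack named in this posting, typically looks something like the minimal sketch below. The paths, table name, and columns are illustrative assumptions, not details of NucleusTeq's pipelines.

```python
# Minimal PySpark/Delta sketch of a raw-to-curated transformation.
# Paths and columns are illustrative placeholders, not project specifics.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curated-orders-sketch").getOrCreate()

# Ingest raw data landed from an upstream source (e.g. a REST extract).
raw = spark.read.json("/landing/orders/")  # hypothetical landing path

# Basic quality gate: drop records missing the key, deduplicate, stamp ingestion time.
curated = (
    raw.dropna(subset=["order_id"])
       .dropDuplicates(["order_id"])
       .withColumn("ingested_at", F.current_timestamp())
)

# Write a curated Delta table that downstream analytics and reporting can query.
(curated.write
        .format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .save("/curated/orders"))  # hypothetical curated path
```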
- 2+ years of experience in Data Engineering, Data Pipeline Development, and Data Processing.
- Strong experience with Python, PySpark, and SQL for large-scale data transformations.
- Hands-on experience with Databricks, Delta Lake, and distributed data processing frameworks.
- Experience integrating data from REST APIs, SAP systems, and enterprise data sources.
- Strong knowledge of data modeling, schema design, and ETL best practices.
- Experience working with cloud data platforms (GCP / AWS / Azure) and cloud storage systems.
- Experience with workflow orchestration, job scheduling, and automated data pipelines.
- Ability to optimize Spark workloads and troubleshoot performance issues on large datasets (a brief tuning example follows this list).
- Strong problem-solving skills and ability to work in fast-paced data platform environments.
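As one example of the kind of Spark tuning this role calls for, the snippet below broadcasts a small dimension table so a join avoids shuffling the large side. The table paths and join key are hypothetical, used only to illustrate the pattern.

```python
# Illustrative Spark tuning pattern: broadcast the small side of a join
# to avoid a full shuffle against a large fact table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("join-tuning-sketch").getOrCreate()

facts = spark.read.format("delta").load("/curated/orders")     # large table (hypothetical path)
dims = spark.read.format("delta").load("/curated/customers")   # small table (hypothetical path)

# Broadcasting lets each executor join locally instead of shuffling both sides.
joined = facts.join(F.broadcast(dims), on="customer_id", how="left")

joined.explain()  # inspect the physical plan to confirm a BroadcastHashJoin is used
```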