
Databricks Developer
- Mumbai, Maharashtra
- Permanent
- Full-time

Responsibilities

- Design, build, and maintain scalable data pipelines and workflows using Databricks (SQL, PySpark, Delta Lake).
- Develop efficient ETL/ELT pipelines for structured and semi-structured data using Azure Data Factory (ADF) and Databricks notebooks/jobs.
- Integrate and transform large-scale datasets from multiple sources into unified, analytics-ready outputs.
- Optimize Spark jobs and manage Delta Lake performance using techniques such as partitioning, Z-ordering, broadcast joins, and caching.
- Design and implement data ingestion pipelines for RESTful APIs, transforming JSON responses into Spark tables.
- Apply data modeling and data warehousing best practices.
- Perform data validation and quality checks.
- Work with various data formats, including JSON, Parquet, and Avro.
- Build and manage data orchestration pipelines, including linked services and datasets for ADLS, Databricks, and SQL Server.
- Create parameterized and dynamic ADF pipelines, and trigger Databricks notebooks from ADF.
- Collaborate closely with Data Scientists, Data Analysts, Business Analysts, and Data Architects to deliver trusted, high-quality datasets.
- Contribute to data governance and metadata documentation, and ensure adherence to data quality standards.
- Use version control tools (e.g., Git) and CI/CD pipelines to manage code deployment and workflow changes.
- Develop real-time and batch processing pipelines for streaming data sources such as MQTT, Kafka, and Azure Event Hubs.

Qualifications

- 5+ years of experience in data engineering or big data development.
- Bachelor's degree in computer science or a relevant field, or equivalent training and work experience.
- Strong hands-on experience with Databricks and Apache Spark (PySpark/SQL).
- Proven experience with Azure Data Factory, Azure Data Lake, and related Azure services.
- Experience integrating with APIs using Python libraries such as requests and http.client.
- Deep understanding of Delta Lake architecture, including performance tuning and advanced features.
- Proficiency in SQL and Python for data processing, transformation, and validation.
- Familiarity with data lakehouse architecture and both real-time and batch processing design patterns.
- Comfortable working with Git, DevOps pipelines, and Agile delivery methodologies.
- Experience with dbt, Azure Synapse, or Microsoft Fabric.
- Familiarity with Unity Catalog features in Databricks.
- Relevant certifications, such as Azure Data Engineer or Databricks certifications.
- Understanding of predictive modeling, anomaly detection, or machine learning, particularly with IoT datasets.
- As Seaspan is a global company, occasional work outside of regular office hours may be required.