Senior Data & AI Engineer
Antal International
- Pune, Maharashtra
- Permanent
- Full-time
If you love working with big data pipelines and are excited by the possibilities of AI-assisted development and LLM application design, this role is built for you.

Key Responsibilities:

Data Platform Engineering
- Design, build, and maintain scalable data pipelines and lakehouse architectures on Databricks and Apache Spark
- Optimize Spark jobs for performance, reliability, and cost efficiency at enterprise scale
- Own data quality, lineage, and governance across the platform, partnering with domain teams
- Drive adoption of Delta Lake, Unity Catalog, and modern lakehouse patterns across the organization
- Collaborate with data consumers (analysts, scientists, and product teams) to model and serve reliable data products

AI & LLM Application Development
- Build LLM-powered applications and AI agents using frameworks such as LangChain and LangGraph, grounded in our enterprise data
- Design and implement Retrieval-Augmented Generation (RAG) pipelines that leverage our data platform as the knowledge backbone
- Develop and deploy AI agents for data discovery, automated insights, and intelligent data operations workflows
- Leverage AI-assisted development tools (GitHub Copilot, Claude Code) to accelerate development velocity and improve code quality
- Evaluate, prototype, and integrate emerging LLM tooling with a pragmatic, production-first mindset

Platform Leadership
- Contribute to the platform roadmap by identifying opportunities where AI can improve data workflows
- Champion engineering best practices: CI/CD for pipelines, testing, observability, and documentation

Experience & Education:
- 4-8 years of hands-on data engineering experience, with deep expertise in Databricks and Apache Spark
- Strong Python skills; comfort with PySpark, Delta Lake, and medallion/lakehouse architecture patterns
- Practical experience building with LLM frameworks such as LangChain, LangGraph, or equivalent
- Demonstrated experience designing and deploying LLM applications or agents in a production or near-production environment
- Proficiency using AI-assisted coding tools (Claude Code, GitHub Copilot) as a daily part of your development workflow
- Solid understanding of REST APIs, cloud data services, and modern software engineering practices
- Excellent communication skills, with the ability to translate complex technical work for non-technical stakeholders

Nice to Have:
- Experience with Databricks Model Serving, MLflow, or feature stores
- Familiarity with vector databases for RAG implementations
- Exposure to dbt for data transformation and Databricks SQL for serving
- Knowledge of prompt engineering techniques and LLM evaluation frameworks
- Prior experience in an enterprise or regulated data environment

Key Working Relationships:
Interfaces regularly with various internal and external groups.