
Senior Software Engineer II
- Mumbai, Maharashtra
- Permanent
- Full-time
Responsibilities:
- Design and build scalable, high-performance ETL/ELT pipelines using Azure Data Factory and Azure Databricks, handling ingestion and transformation of data from diverse internal and external sources.
- Advanced Data Preparation & Transformation: Implement complex transformation logic using PySpark and SQL to standardize, cleanse, and normalize large-scale datasets, including nested JSON.
- Data Modeling: Create and maintain dimensional models (star/snowflake schemas) and structured outputs to support analytics, reporting, and regulatory use cases.
- Performance Optimization: Profile and tune PySpark jobs using techniques such as caching, partitioning, and broadcast joins. Ensure pipelines scale efficiently with increasing data volumes.
- Governance & Quality Assurance: Apply robust data validation and quality checks, and maintain metadata lineage with Unity Catalog. Adhere to governance standards across the pipeline lifecycle.
- Cross-Functional Collaboration: Partner with business analysts, product managers, data scientists, and QA teams to understand data requirements and deliver fit-for-purpose solutions.
- Documentation & Knowledge Sharing: Maintain comprehensive documentation of data flows, architecture, and models. Support upskilling and mentoring of junior team members.
- AI-Enhanced Development: Use Databricks Genie and related AI assistants to accelerate development, ensure code quality, and assist with troubleshooting.
Requirements:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or equivalent practical experience.
- 6 to 8 years of total experience, including 5+ years in data engineering roles with hands-on experience in PySpark, SQL, and Azure-based data ecosystems.
- Strong background in building robust, maintainable pipelines using Azure Data Factory and Databricks.
- Proven expertise in data modeling, schema design, and large-scale data preparation for analytics.
- Proficiency with Git, version-control workflows, and CI/CD practices.
- Solid understanding of data governance, metadata management, and data quality frameworks.
- Exposure to Unity Catalog, Databricks Genie, or other AI-enhanced data engineering tools is preferred.
- Excellent analytical, debugging, and stakeholder communication skills.
- Familiarity with Agile or Scrum-based delivery.
- Experience with machine learning workflows and integrating model outputs into data pipelines.
- Exposure to Salesforce, Alfresco, or Schedule A contract data formats is a bonus.

We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know, or contact 1-855-833-5120.

Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams.

We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.