
Lead Data Scientist
- Mumbai, Maharashtra
- Permanent
- Full-time
- Own the end-to-end data science lifecycle, from problem definition and experimentation to deployment, monitoring, and continuous improvement
- Design and deploy robust, explainable, and scalable ML models for clinical document understanding, named entity recognition, context disambiguation, and semantic search across prospective and retrospective use cases
- Lead model development with a focus on production-readiness, incorporating solid MLOps, reproducibility, and experimentation practices
- Diagnose and optimize model performance, mitigate bias, and ensure analytical integrity, accuracy, and operational efficiency
- Work hands-on with multi-modal transformer models for tasks like NER, handwriting and form understanding, and document classification
- Leverage LLMs and SLMs for clinical reasoning, automated annotation, data generation, and downstream distillation
- Collaborate with cross-functional teams (including ML engineers, annotators, and clinical domain experts) to translate business challenges into deployable AI solutions
- Implement automated data labeling pipelines using techniques like active learning, weak supervision, and human-in-the-loop systems
- Ensure reproducibility and operational excellence through Git, DVC, CI/CD pipelines, and orchestration and streaming tools (e.g., Airflow, Kafka)
- Mentor and guide junior scientists and engineers, lead technical design reviews, and set best practices for model architecture and evaluation
- Continuously identify and close gaps in the ML platform, proposing and implementing innovative solutions to improve performance, scalability, and reliability
- Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies regarding flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary, or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so
- Bachelor's degree in Computer Science or adjacent field
- Advanced degree in a field that emphasizes the use of data science/statistics techniques (e.g., Computer Science, Applied Mathematics, or a field with direct NLP application)
- 5+ years of experience in Data Science with a focus on Machine Learning and Natural Language Processing
- Solid understanding of machine learning algorithms, NLP, and data modeling principles
- Proficiency in Python, R, and SQL. Experience with NLP libraries and models such as NLTK, spaCy, and BERT
- Proven excellent communication skills
- Proven flexibility to provide support during critical business periods
- Proven ability to interpret and present complex data in various formats
- Proven solid leadership skills and the ability to meet deadlines and work independently, with an analytical mindset for addressing complex business needs
- Proven positive team player with a drive to learn and contribute to achieving results
- Willingness to work in varying shifts