
Senior Data Scientist
- Mumbai, Maharashtra
- Permanent
- Full-time
- Collaborate with cross-functional teams, including ML engineers, annotators, and clinical domain experts, to translate business challenges into deployable AI solutions
- Implement automated data labeling pipelines using techniques like active learning, weak supervision, and human-in-the-loop systems
- Support the design, development, and maintenance of scalable data pipelines for AI/ML workflows
- Perform exploratory data analysis (EDA), profiling, and validation on healthcare data to ensure readiness for downstream ML tasks
- Partner with data scientists to prepare datasets for model training, evaluation, and monitoring
- Ensure data quality, consistency, and documentation across structured (e.g., EHRs) and unstructured (e.g., scanned PDFs) sources
- Integrate and monitor data workflows using orchestration tools (e.g., Airflow, Step Functions)
- Build dashboards or reports to communicate insights, trends, or pipeline health as needed
- Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies in regard to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so
- Bachelor's degree in computer science or an adjacent field
- Solid experience in MS Excel and version control using Git
- Proficiency in Python (Advanced) and SQL (Advanced); experience with tools like Airflow and Jupyter Notebook
- Cloud Exposure: Basic familiarity with the AWS ecosystem
- Visualization Tools: Power BI, Tableau, or Plotly for dashboarding and reporting
- Data Quality Monitoring: Experience with tools or techniques for detecting data drift or label inconsistencies
- Healthcare/NLP Domain Knowledge: Prior work with clinical documents, EMR data, or coding workflows
- Proven excellent communication skills
- Proven flexibility to provide support during critical business periods
- Proven ability to interpret and present complex data in various formats
- Proven positive team player with a drive to learn and contribute to achieving results
- Willingness to work in varying shifts