
Data Scientist 4
- Bangalore, Karnataka
- Permanent
- Full-time
- Lead the design, development, and scaling of large, high-quality datasets to advance generative AI models in multimodal domains (e.g., text, vision, speech).
- Define data standards and best practices for acquisition, cleaning, augmentation, annotation, and evaluation to ensure fairness, diversity, and representativeness.
- Guide the integration of cutting-edge techniques (e.g., fine-tuning, RLHF, domain adaptation) into data generation and model alignment pipelines.
- Provide technical leadership in building scalable, reliable data pipelines and synthetic data platforms for production environments.
- Evaluate and operationalize research innovations, shaping how data preparation and generative AI methods transition into production-ready solutions.
- Partner with research, engineering, and product leaders to define long-term data strategy and accelerate the adoption of generative AI solutions at scale.
- Mentor and provide thought leadership to scientists and engineers, fostering a culture of data excellence and innovation.
- Bachelor's or Master's degree in Computer Science, Data Science, AI/ML, or a related field, with 6+ years of industry experience.
- Proficiency in Python and a solid foundation in applied ML methods.
- Proficiency with PyTorch, Torchvision, OpenCV, and similar libraries, as well as experience building and deploying DNN models in production.
- Experience building large-scale data pipelines for acquisition, cleaning, augmentation, and validation.
- Ability to evaluate datasets for distribution, diversity, anomalies, and fairness to assess overall quality and suitability for generative AI.
- Experience with Computer Vision, NLP, Transformers, Large Language Models, Generative AI, and optimizations around LLM training and serving. Experience with multimodal models is a bonus.
- Familiarity with advanced techniques (e.g., RLHF, domain adaptation, data augmentation) and their application in generative AI workflows.
- Proven track record of delivering scalable, data-centric ML solutions.
- Excellent communication and leadership skills, with experience mentoring junior scientists/engineers and presenting technical strategies to senior stakeholders.