Machine Learning Engineer – Computer Vision & VLM

Sarvam AI

  • Bangalore, Karnataka
  • Permanent
  • Full-time
  • 23 days ago
Machine Learning Engineer - Computer Vision & Vision-Language Models (VLMs)

About Sarvam AI

Sarvam.ai is a pioneering generative-AI startup headquartered in Bengaluru, India. We are dedicated to transformative R&D in language technologies, building scalable and efficient Large Language Models (LLMs) that serve a wide spectrum of languages, especially Indic languages. Our mission is to re-imagine human-computer interaction and craft novel AI-driven solutions that make language technology inclusive for diverse communities worldwide.

Role Overview

As a Machine Learning Engineer (MLE) in the Vision-Language team, you will build and refine vision, OCR, and language models for varied use-cases. Your work will span research, scalable training, and rigorous evaluation of cutting-edge computer-vision and VLM systems.

Key Responsibilities

Model R&D
  • Prototype and fine-tune state-of-the-art vision architectures and vision-language models.
  • Design and evaluate multimodal fusion strategies for robust image-text understanding.

Data & Training Pipelines
  • Build distributed pipelines (PySpark / Ray) to curate and preprocess large-scale multimodal datasets (images, geospatial rasters, PDFs, video frames, captions).
  • Implement efficient training loops in PyTorch/Lightning with mixed precision, gradient accumulation, and multi-GPU (≥ 4) parallelism.

Domain-Focused Applications
  • Develop models for geospatial analysis, Indic document intelligence (OCR + layout), visual question answering (VQA), and broader computer-vision use-cases.

Evaluation & Benchmarking
  • Define and automate task-specific metrics for OCR accuracy, retrieval, dense captioning, and VQA; maintain regression dashboards and ablation suites.

Required Qualifications

  • Experience: 2-3 years in ML engineering with emphasis on classical computer vision and modern vision-language models.
  • Education: Bachelor's or Master's in Computer Science, AI/ML, or related fields.

Technical Skills
  • Strong Python & PyTorch; comfortable with CUDA profiling and tensor debugging.
  • Hands-on experience training CV models (CNNs, ViTs) and/or VLMs on ≥ 4-GPU nodes.
  • Proven ability to build, deploy, and monitor pipelines for OCR, object detection, and segmentation.
  • Solid grasp of computer-vision fundamentals (detection, segmentation, representation learning) and transformer mechanics.

Software-Engineering Fundamentals
  • Proficiency with Git, unit tests, structured logging, Docker, and CI/CD.
  • Ability to select and integrate appropriate databases (SQL, NoSQL, vector stores) for large-scale multimodal data.
  • Experience designing scalable backend APIs/micro-services (gRPC/REST), including monitoring and observability best practices.

Preferred Qualifications

  • Publications or submissions in CVPR/ICCV/ECCV, EMNLP, ACL.
  • Prior work on multilingual or low-resource vision-language tasks.
  • Experience with data-centric AI (active learning, synthetic augmentation).
  • Contributions to open-source vision/NLP libraries (Hugging Face, OpenCV, Detectron2).
  • Familiarity with distributed schedulers (KubeFlow, Slurm).
