
Senior Expert – Vision-Language Models and Generative AI (GenAI)
- Bangalore, Karnataka
- Permanent
- Full-time
Conduct deep research in:
- Vision-Language and Multimodal AI for perception and semantic grounding
- Cross-modal representation learning for real-world sensor fusion (camera, lidar, radar, text)
- Multimodal generative models for scene prediction, intent inference, or simulation
- Efficient model architectures for edge deployment in automotive and factory systems
- Evaluation methods for explainability, alignment, and safety of VLMs in mission-critical applications

Additional responsibilities:
- Open up new research directions and drive AI research programs for autonomous driving, ADAS, and Industry 4.0 applications.
- Create new collaborations within and outside of Bosch in relevant domains.
- Contribute to Bosch's internal knowledge base, open research assets, and patent portfolio.
- Lead internal research clusters or thematic initiatives across autonomous systems or industrial AI.
- Mentor and guide research associates, interns, and young scientists.

Qualifications

Educational qualification:
- Ph.D. in Computer Science / Machine Learning / AI / Computer Vision or equivalent

Experience:
- 8+ years (post-Ph.D.) in AI related to vision and language modalities, with excellent exposure to and hands-on research in GenAI, VLMs, Multimodal AI, or Applied AI Research

Mandatory skills:
- Deep expertise in:
  - Vision-Language Models (CLIP, Flamingo, Kosmos, BLIP, GIT) and multimodal transformers
  - Open- and closed-source LLMs (e.g., LLaMA, GPT, Claude, Gemini) with visual grounding extensions
  - Contrastive learning, cross-modal fusion, and structured generative outputs (e.g., scene graphs)
  - PyTorch, HuggingFace, OpenCLIP, and the deep learning stack for computer vision
  - Evaluation on ADAS/mobility benchmarks (e.g., nuScenes, BDD100K) and industrial datasets
- Strong track record of publications in relevant AI/ML/vision venues
- Demonstrated capability to lead independent research programs
- Familiarity with multi-agent architectures, RLHF, and goal-conditioned VLMs for autonomous agents

Preferred skills:
- Hands-on experience with:
  - Perception stacks for ADAS, SLAM, or autonomous robots
  - Vision pipeline tools (MMDetection, Detectron2, YOLOv8) and video understanding models
  - Semantic segmentation, depth estimation, 3D vision, and temporal models
  - Industrial datasets and tasks: defect detection, visual inspection, operator assistance
  - Lightweight or compressed VLMs for embedded hardware (e.g., in-vehicle ECUs or factory edge devices)
- Knowledge of reinforcement learning or planning in embodied AI contexts
- Strong academic or industry research collaborations
- Understanding of Bosch domains and workflows in mobility and manufacturing