Applied Scientist
Accrete
- Mumbai, Maharashtra
- Permanent
- Full-time
- Research and build state-of-the-art computer vision systems with a focus on real-time video analytics, video summarization, object tracking, and activity recognition.
- Develop and apply Vision-Language Models (VLMs) and multimodal transformer architectures for deep semantic understanding of visual content.
- Apply self-supervised, zero-shot, and few-shot learning techniques to enhance model generalization across varied video domains.
- Explore and optimize LLM prompting strategies and cross-modal alignment methods to improve reasoning over visual data.
- Contribute to research publications, patents, and internal IP assets in the area of vision and multimodal AI.
- Master's degree in Computer Science, Computer Vision, Machine Learning, or a related discipline, with 2+ years of experience leading applied research or product-focused CV/ML projects.
- Expertise in modern computer vision architectures (e.g., ViT, SAM, CLIP, BLIP, DETR, or similar).
- Experience with Vision-Language Models (VLMs) and multimodal AI systems.
- Strong background in real-time video analysis, including event detection, motion analysis, and temporal reasoning.
- Experience with transformer-based architectures, multimodal embeddings, and LLM-vision integrations.
- Proficiency in Python, deep learning libraries such as PyTorch or TensorFlow, and OpenCV.
- Experience with cloud platforms (AWS, Azure) and deployment frameworks (ONNX, TensorRT) is a plus.
- Strong problem-solving skills, with a track record of end-to-end ownership of applied ML/CV projects.
- Excellent communication and collaboration skills, with the ability to work in cross-functional teams.