
Lead Assistant Manager
- Noida, Uttar Pradesh
- Permanent
- Full-time
- Experience: 3+ years

Position Summary

We seek an exceptional Senior ASR/TTS Specialist to lead speech AI initiatives and integrate advanced speech technologies with AI agent frameworks. This role focuses on fine-tuning ASR/TTS models, implementing MLOps best practices, and building production-ready speech AI systems powering next-generation conversational AI agents.

Key Responsibilities

Speech AI Model Development & Integration
- Model Fine-tuning: Customize state-of-the-art ASR/TTS models for domain-specific applications
- Speech-to-Speech Systems: Build end-to-end S2S pipelines using Amazon Nova Sonic v1.0, Azure OpenAI Realtime (GPT-4o), and Gemini 2.5 Flash Native Audio
- Multi-modal Integration: Develop speech models integrating with vision and text modalities in AI agents
- Agent Framework Integration: Implement speech capabilities with LangChain/LangGraph, CrewAI, AutoGen, LlamaIndex, and OpenAI Assistants API
- Model Lifecycle: Implement comprehensive MLOps pipelines using MLflow, Weights & Biases, and automated CI/CD
- Multi-cloud Deployment: Deploy speech models across AWS Bedrock, Google Cloud AI, and Azure Cognitive Services
- Real-time Processing: Build WebSocket-based streaming audio systems handling 1000+ concurrent connections
- Production Monitoring: Implement WER tracking, latency monitoring, and multi-provider failover mechanisms
- Cutting-edge Research: Stay current with the latest speech AI breakthroughs and implement novel architectures
- Performance Optimization: Optimize models for real-time inference using TensorRT, ONNX, and edge deployment
- Data Pipeline Engineering: Build scalable audio ingestion, preprocessing, and augmentation systems

Preferred Qualifications

- Multi-lingual Processing: Cross-lingual transfer learning, zero-shot adaptation
- Domain Expertise: Healthcare, legal, technical domain speech AI
- Edge AI: TensorRT, Core ML, ONNX optimization for mobile/edge deployment
- Research Background: Publications in ICASSP, INTERSPEECH, ICML, NeurIPS
- Team Leadership: Experience leading speech AI teams and technical initiatives
- Education: MS/PhD in Computer Science, Electrical Engineering, or related field
- Open Source: Contributions to speech AI libraries and frameworks

Technical Environment

- GPU Clusters: NVIDIA A100/H100 for model training
- Edge Deployment: NVIDIA Jetson, ARM-based targets
- Real-time Requirements:
- Enterprise Integration: Genesys AudioConnector, SIP protocol, telephony systems
- Multi-cloud Integration: Unified APIs across AWS, Google Cloud, Azure

Key Use Cases

- Conversational AI Agents: Low-latency speech-enabled customer service bots
- Telecom Integration: Enterprise telephony and AudioConnector systems
- Domain-specific Models: Medical, legal, technical vocabulary fine-tuning

Performance

- Latency:
- Reliability: 99.9% uptime for production services
- Scale: 1000+ concurrent speech streams