Lead Assistant Manager

Noida, Uttar Pradesh
Permanent
Full-time

5 days ago

Job Description

Senior ASR/TTS Specialist - AI Agent Integration Expert

Company: EXL Service
Type: Full-time
Experience: 3+ years

Position Summary

We seek an exceptional Senior ASR/TTS Specialist to lead speech AI initiatives and integrate advanced speech technologies with AI agent frameworks. This role focuses on fine-tuning ASR/TTS models, implementing MLOps best practices, and building production-ready speech AI systems that power next-generation conversational AI agents.

Key Responsibilities

Speech AI Model Development & Integration

Model Fine-tuning: Customize state-of-the-art ASR/TTS models for domain-specific applications with
Speech-to-Speech Systems: Build end-to-end S2S pipelines using Amazon Nova Sonic v1.0, Azure OpenAI Realtime (GPT-4o), and Gemini 2.5 Flash Native Audio
Multi-modal Integration: Develop speech models integrating with vision and text modalities in AI agents
Agent Framework Integration: Implement speech capabilities with LangChain/LangGraph, CrewAI, AutoGen, LlamaIndex, and OpenAI Assistants API

MLOps & Production Engineering

Model Lifecycle: Implement comprehensive MLOps pipelines using MLflow, Weights & Biases, and automated CI/CD
Multi-cloud Deployment: Deploy speech models across AWS Bedrock, Google Cloud AI, and Azure Cognitive Services
Real-time Processing: Build WebSocket-based streaming audio systems handling 1000+ concurrent connections
Production Monitoring: Implement WER tracking, latency monitoring, and multi-provider failover mechanisms
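
As an illustration of the WER tracking named under Production Monitoring, word error rate is commonly computed as the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the number of reference words. The sketch below is a minimal example under stated assumptions: whitespace tokenization, no text normalization, and a hypothetical function name (word_error_rate) that does not come from this posting or from any particular library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace and compared exactly;
    a production pipeline would normally normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In a monitoring setup, a value like this would typically be aggregated over a sample of human-transcribed calls and tracked over time; that aggregation step is not shown here.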

Research & Development

Cutting-edge Research: Stay current with the latest speech AI breakthroughs and implement novel architectures
Performance Optimization: Optimize models for real-time inference using TensorRT, ONNX, and edge deployment
Data Pipeline Engineering: Build scalable audio ingestion, preprocessing, and augmentation systems

Required Qualifications

Core Technical Skills (Must-Have)

Speech AI Models (3+ years experience):
- ASR Systems: Amazon Nova Sonic v1.0, Google Speech-to-Text, Azure Speech Services, Whisper, Wav2Vec2, Riva
- TTS Systems: Google TTS, Azure Cognitive Services TTS, ElevenLabs (REST/WebSocket), Tortoise, VITS, FastSpeech2
- Speech-to-Speech: Direct S2S without intermediate text, multimodal audio processing
- Cloud Services: AWS Bedrock Runtime, Google Cloud AI (Gemini API), Azure OpenAI Services

Programming & Frameworks:
- Languages: Expert Python, proficient C++/Rust for optimization
- ML Frameworks: Advanced PyTorch, TensorFlow 2.x, JAX/Flax
- Audio Processing: librosa, torchaudio, soundfile, WebRTC, µ-law/PCM conversion
- Agent Frameworks: Hands-on experience with 3+ of: LangChain, CrewAI, AutoGen, LlamaIndex, OpenAI Assistants

MLOps & Infrastructure (Essential)

MLOps Tools (2+ years):
- Experiment Management: MLflow, Weights & Biases
- Model Serving: TorchServe, TensorFlow Serving, NVIDIA Triton
- Workflow Orchestration: Apache Airflow, Kubeflow, Prefect
- Containerization: Docker, Kubernetes for speech model deployment

Cloud & Production:
- Multi-cloud Experience: AWS (Bedrock, Nova Sonic), Google Cloud (Gemini, Speech APIs), Azure (OpenAI Services)
- Real-time Systems: Sub-300ms latency, WebSocket architecture, telecom integration (Genesys AudioConnector)
- Monitoring: Audio quality metrics, model drift detection, production reliability (99.9% uptime)

Preferred Qualifications

Advanced Specializations

Multi-lingual Processing: Cross-lingual transfer learning, zero-shot adaptation
Domain Expertise: Healthcare, legal, technical domain speech AI
Edge AI: TensorRT, Core ML, ONNX optimization for mobile/edge deployment
Research Background: Publications in ICASSP, INTERSPEECH, ICML, NeurIPS

Leadership & Education

Team Leadership: Experience leading speech AI teams and technical initiatives
Education: MS/PhD in Computer Science, Electrical Engineering, or related field
Open Source: Contributions to speech AI libraries and frameworks

Technical Environment

Production Technology Stack

Core Technologies:
- Languages: Python, C++, Rust, TypeScript
- Frameworks: PyTorch, TensorFlow, JAX, LangChain, CrewAI, AutoGen
- Cloud Services: AWS Bedrock, Google Cloud AI, Azure OpenAI Services
- Audio Tools: librosa, torchaudio, WebRTC, FFmpeg
- MLOps: MLflow, Kubeflow, Docker, Kubernetes, NVIDIA Triton
- Databases: Vector DBs (Pinecone, Weaviate), PostgreSQL, Redis

Production Models:
- Amazon Nova Sonic v1.0 (Speech-to-Speech streaming)
- Gemini 2.5 Flash Native Audio Dialog (Multimodal processing)
- Azure OpenAI GPT-4o (Realtime voice conversations)
- ElevenLabs (Voice cloning and synthesis)

Infrastructure

GPU Clusters: NVIDIA A100/H100 for model training
Edge Deployment: NVIDIA Jetson, ARM-based targets
Real-time Requirements:
Enterprise Integration: Genesys AudioConnector, SIP protocol, telephony systems

Key Projects & Success Metrics

Primary Focus Areas

Next-gen S2S Systems: Amazon Nova Sonic, Azure OpenAI Realtime, Gemini Native Audio
Multi-cloud Integration: Unified APIs across AWS, Google Cloud, Azure
Conversational AI Agents: Low-latency speech-enabled customer service bots
Telecom Integration: Enterprise telephony and AudioConnector systems
Domain-specific Models: Medical, legal, technical vocabulary fine-tuning

Success Metrics

Performance:
Latency:
Reliability: 99.9% uptime for production services
Scale: 1000+ concurrent speech streams
