Lead Assistant Manager

EXL Service

  • Noida, Uttar Pradesh
  • Permanent
  • Full-time
  • 5 days ago
Job Description:
Senior ASR/TTS Specialist - AI Agent Integration Expert
Company: EXL Service
Type: Full-time
Experience: 3+ years

Position Summary
We seek an exceptional Senior ASR/TTS Specialist to lead speech AI initiatives and integrate advanced speech technologies with AI agent frameworks. This role focuses on fine-tuning ASR/TTS models, implementing MLOps best practices, and building production-ready speech AI systems that power next-generation conversational AI agents.

Key Responsibilities
Speech AI Model Development & Integration
  • Model Fine-tuning: Customize state-of-the-art ASR/TTS models for domain-specific applications with
  • Speech-to-Speech Systems: Build end-to-end S2S pipelines using Amazon Nova Sonic v1.0, Azure OpenAI Realtime (GPT-4o), and Gemini 2.5 Flash Native Audio
  • Multi-modal Integration: Develop speech models integrating with vision and text modalities in AI agents
  • Agent Framework Integration: Implement speech capabilities with LangChain/LangGraph, CrewAI, AutoGen, LlamaIndex, and OpenAI Assistants API
MLOps & Production Engineering
  • Model Lifecycle: Implement comprehensive MLOps pipelines using MLflow, Weights & Biases, and automated CI/CD
  • Multi-cloud Deployment: Deploy speech models across AWS Bedrock, Google Cloud AI, and Azure Cognitive Services
  • Real-time Processing: Build WebSocket-based streaming audio systems handling 1000+ concurrent connections
  • Production Monitoring: Implement WER tracking, latency monitoring, and multi-provider failover mechanisms
Research & Development
  • Cutting-edge Research: Stay current with the latest speech AI breakthroughs and implement novel architectures
  • Performance Optimization: Optimize models for real-time inference using TensorRT, ONNX, and edge deployment
  • Data Pipeline Engineering: Build scalable audio ingestion, preprocessing, and augmentation systems

Required Qualifications
Core Technical Skills (Must-Have)
Speech AI Models (3+ years experience):
  • ASR Systems: Amazon Nova Sonic v1.0, Google Speech-to-Text, Azure Speech Services, Whisper, Wav2Vec2, Riva
  • TTS Systems: Google TTS, Azure Cognitive Services TTS, ElevenLabs (REST/WebSocket), Tortoise, VITS, FastSpeech2
  • Speech-to-Speech: Direct S2S without intermediate text, multimodal audio processing
  • Cloud Services: AWS Bedrock Runtime, Google Cloud AI (Gemini API), Azure OpenAI Services
Programming & Frameworks:
  • Languages: Expert Python, proficient C++/Rust for optimization
  • ML Frameworks: Advanced PyTorch, TensorFlow 2.x, JAX/Flax
  • Audio Processing: librosa, torchaudio, soundfile, WebRTC, µ-law/PCM conversion
  • Agent Frameworks: Hands-on experience with 3+ of: LangChain, CrewAI, AutoGen, LlamaIndex, OpenAI Assistants
MLOps & Infrastructure (Essential)
MLOps Tools (2+ years):
  • Experiment Management: MLflow, Weights & Biases
  • Model Serving: TorchServe, TensorFlow Serving, NVIDIA Triton
  • Workflow Orchestration: Apache Airflow, Kubeflow, Prefect
  • Containerization: Docker, Kubernetes for speech model deployment
Cloud & Production:
  • Multi-cloud Experience: AWS (Bedrock, Nova Sonic), Google Cloud (Gemini, Speech APIs), Azure (OpenAI Services)
  • Real-time Systems: Sub-300ms latency, WebSocket architecture, telecom integration (Genesys AudioConnector)
  • Monitoring: Audio quality metrics, model drift detection, production reliability (99.9% uptime)

Preferred Qualifications
Advanced Specializations
  • Multi-lingual Processing: Cross-lingual transfer learning, zero-shot adaptation
  • Domain Expertise: Healthcare, legal, technical domain speech AI
  • Edge AI: TensorRT, Core ML, ONNX optimization for mobile/edge deployment
  • Research Background: Publications in ICASSP, INTERSPEECH, ICML, NeurIPS
Leadership & Education
  • Team Leadership: Experience leading speech AI teams and technical initiatives
  • Education: MS/PhD in Computer Science, Electrical Engineering, or related field
  • Open Source: Contributions to speech AI libraries and frameworks

Technical Environment
Production Technology Stack
Core Technologies:
  • Languages: Python, C++, Rust, TypeScript
  • Frameworks: PyTorch, TensorFlow, JAX, LangChain, CrewAI, AutoGen
  • Cloud Services: AWS Bedrock, Google Cloud AI, Azure OpenAI Services
  • Audio Tools: librosa, torchaudio, WebRTC, FFmpeg
  • MLOps: MLflow, Kubeflow, Docker, Kubernetes, NVIDIA Triton
  • Databases: Vector DBs (Pinecone, Weaviate), PostgreSQL, Redis
Production Models:
  • Amazon Nova Sonic v1.0 (Speech-to-Speech streaming)
  • Gemini 2.5 Flash Native Audio Dialog (Multimodal processing)
  • Azure OpenAI GPT-4o (Realtime voice conversations)
  • ElevenLabs (Voice cloning and synthesis)
Infrastructure
  • GPU Clusters: NVIDIA A100/H100 for model training
  • Edge Deployment: NVIDIA Jetson, ARM-based targets
  • Real-time Requirements:
  • Enterprise Integration: Genesys AudioConnector, SIP protocol, telephony systems

Key Projects & Success Metrics
Primary Focus Areas
  • Next-gen S2S Systems: Amazon Nova Sonic, Azure OpenAI Realtime, Gemini Native Audio
  • Multi-cloud Integration: Unified APIs across AWS, Google Cloud, Azure
  • Conversational AI Agents: Low-latency speech-enabled customer service bots
  • Telecom Integration: Enterprise telephony and AudioConnector systems
  • Domain-specific Models: Medical, legal, technical vocabulary fine-tuning
Success Metrics
  • Performance:
  • Latency:
  • Reliability: 99.9% uptime for production services
  • Scale: 1000+ concurrent speech streams
