Job Category: Digital

Job Description:
We are looking for a hands-on Full-Stack AI Engineer specializing in Voice to join our CX AI Engineering Team. You will own end-to-end delivery - from model research and GPU deployment to backend APIs and demo-ready frontends - with a builder's mindset: if it doesn't exist, build it; if it's slow, fix it; if it's not deployed, ship it. You will design and scale a low-latency voice agent platform targeting sub-200ms end-to-end latency over WebRTC and SIP, and conduct applied research on emerging speech AI - including speech-to-speech (S2S) models such as NVIDIA PersonaPlex, Gemini Live, and AWS Nova Sonic.

Responsibilities:
● End-to-end voice pipelines: mic/telephony → ASR → LLM/NLP → TTS → audio, optimized for sub-200ms latency at scale
● Applied S2S research: evaluate NVIDIA PersonaPlex, Gemini Live, AWS Nova Sonic for production voice agent readiness
● Model integration: ASR (Whisper, NVIDIA Riva, Deepgram, Google Chirp) + TTS (Kokoro, Chatterbox, Cartesia, ElevenLabs, Murf)
● Telephony & streaming: Asterisk, FreeSWITCH, LiveKit, Pipecat integrations; WebRTC audio streaming
● Latency optimization: chunked ASR, first-chunk TTS, WebSocket streaming, TTFB profiling
● Cloud deployment: GPU workloads on GCP (Cloud Run, Vertex AI, GKE) and AWS (EC2 GPU, SageMaker, ECS)
● Full ownership: research → backend service → OpenAPI spec → frontend demo UI → production deployment
● Benchmarking: accuracy, WER, latency, and cost-per-minute comparisons across providers

Qualifications:
Bachelor's/Master's in Engineering, 2-5 years of experience