
Principal Software Engineer - AI/ML Platform
- Gurgaon, Haryana
- Permanent
- Full-time
GoDaddy's AI Platform team is looking for a Principal Software Engineer (Team Lead) to help guide and advance our company-wide AI transformation. This is an outstanding chance to build the platform that drives AI/ML and LLM experiences throughout GoDaddy! Develop crucial features such as retrieval, safety measures, and observability for scalable AI solutions. If you're passionate about modern AI platforms, enabling innovation responsibly, and leading a small but mighty team, we want to meet you.
What you'll get to do...
- Architect and optimize AI/ML platform capabilities, including inference, pipelines, retrieval (RAG), and evaluation, with a focus on reliability, latency, and cost-efficiency.
- Build and manage core components such as model serving, feature/embedding pipelines, vector search, orchestration, safety/guardrails, and experimentation tools.
- Develop LLMOps best practices for version control, evaluations, datasets, monitoring, and incident management for AI systems.
- Drive adoption and collaboration by integrating cloud/open-source AI tools with strong governance, partnering across functions, and evangelizing through documentation, templates, and talks.
- Lead and mentor a small engineering team, encouraging growth through reviews, pairing, and clear goal-setting.
- Engineering leadership: 10+ years in large-scale distributed systems, with 3+ years as a tech/people lead.
- Cloud expertise: 5+ years building on AWS/GCP/Azure, including IAM, networking, and managed services.
- 4+ years working with machine learning and AI or high-throughput data systems, including hands-on involvement in applications powered by large language models.
- Strong technical depth: Proficient in Python, Go, Java, or Node.js, with proven experience in systems development, algorithms, performance engineering, containers (Docker/Kubernetes/ECS), IaC (Terraform/CloudFormation), and CI/CD.
- Specialized skills: Hands-on with vector search (OpenSearch, Pinecone, pgvector), RAG patterns, and deep expertise in security, privacy, governance, and observability (OpenTelemetry, Prometheus, Grafana).
- Experience building internal developer platforms for AI (golden paths, standardized templates, platform APIs/SDKs, self-service tooling).
- Knowledge of LLM evaluation frameworks, offline/online testing, and safety/guardrail techniques (policy engines, red-team tooling, jailbreak defenses).
- Familiarity with retrieval & data tech: Lakehouse (e.g., Iceberg/Delta), feature stores (e.g., Feast), caching layers (Redis), and embeddings pipelines.
- Experience with responsible AI practices, model risk management, and compliance in production environments.
- Track record of developer advocacy: writing RFCs, running build reviews, and driving cross-org adoption.