
Associate Director, Reliability Platforms
- Bangalore, Karnataka
- Permanent
- Full-time
- Define and execute the charter for the Reliability Platforms Pod, consisting of 2-4 atomic teams, aligned with the broader Superpod mission and Strategic Technology Objectives (STOs).
- Take full ownership of the systems within the pod, ensuring technical decisions and outcomes drive measurable impact and business reliability.
- Identify and manage cross-superpod and cross-functional dependencies, driving cohesive delivery and reducing system fragility.
- Build and scale Platform-as-a-Product offerings with strong developer empathy, clear reliability KPIs, and intuitive user experience.
- Architect systems and services that integrate observability, resiliency, chaos testing, and platform insights into core infrastructure workflows.
- Lead performance engineering efforts, including load testing, chaos simulations, and failure modeling, to build a proactive reliability culture.
- Co-create and review technical artifacts such as design documents, architecture reviews, and postmortem analyses.
- Champion engineering best practices across observability, SLO definition, automation, and incident response.
- Support the translation of non-technical business requirements into robust engineering solutions, especially around risk mitigation and system uptime.
- Attract, mentor, and grow diverse engineering talent across ICs and managers.
- Foster a high-trust, inclusive, and collaborative team culture focused on continuous improvement.
- Coach teams through architectural tradeoffs, prioritization under ambiguity, and delivery of platform features with wide organizational impact.
- Represent the team in cross-functional and leadership forums to advocate for reliability priorities.
- Work closely with Product, SRE, Infrastructure, and Developer Experience stakeholders to align platform initiatives with company-wide goals.
- 16+ years of software engineering experience, including at least 10 years in technical leadership roles and 5+ years leading managers.
- Proven ownership of Pod-level systems and delivery of complex platform initiatives aligned with STOs and cross-org reliability goals.
- Deep experience with cloud-native platform engineering including microservices, event-driven systems, and observability tools (Prometheus, Datadog, OpenTelemetry).
- Hands-on expertise in applying SRE principles (SLOs, error budgets, automated remediation) to platform and infrastructure layers.
- Proficiency with Infrastructure-as-Code and automation (e.g., Terraform, Kubernetes), with a track record of building repeatable, self-service reliability tooling.
- Demonstrated success in delivering developer-centric tooling (e.g., APIs, CLIs, dashboards) that improves adoption and system insights.
- Strong communication and stakeholder engagement skills; comfortable influencing with and without authority.
- A pragmatic, results-oriented mindset with the ability to balance technical excellence with business impact.