
Site Reliability Engineer III
- Hyderabad, Telangana
- Permanent
- Full-time
- Design, develop, and operate solutions for application reliability, monitoring, and automation.
- Execute incident response, troubleshooting, and root cause analysis to resolve production issues and improve system stability.
- Build and maintain CI/CD pipelines using Jenkins (including global libraries), and implement infrastructure as code with Terraform.
- Develop and support containerized applications using Docker and Kubernetes, ensuring robust deployments and scalability.
- Implement and maintain observability solutions using tools such as Grafana, Prometheus, Splunk, and OpenTelemetry.
- Collaborate with engineering and support teams to drive continuous improvement and operational excellence.
- Participate in on-call rotation, responding to production incidents and ensuring timely resolution.
- Formal training or certification in Site Reliability Engineering concepts and 3+ years of applied experience.
- Experience in SRE, DevOps, or application support roles, with knowledge of SLIs/SLOs, incident response, and troubleshooting.
- Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Splunk, OpenTelemetry).
- Hands-on experience with CI/CD pipelines (Jenkins, including global libraries), infrastructure as code (Terraform), version control (Git), containerization (Docker), and orchestration (Kubernetes).
- Exposure to cloud platforms (AWS, GCP, or Azure) and experience automating infrastructure and deployments.
- Willingness to participate in on-call rotation and respond to production incidents.
- Ability to break down issues, document solutions, and communicate effectively with team members and customers.
- Familiarity with banking, fintech, or regulated environments.
- Participation in game days or chaos engineering.
- Interest in sharing knowledge and best practices with peers.