
Senior Site Reliability Engineer I
- Mumbai, Maharashtra
- Permanent
- Full-time
- Lead efforts to automate manual and repetitive tasks, contributing to resilient and reliable systems.
- Develop and implement self-healing infrastructure solutions to drive operational efficiency and reduce incidents.
- Create and maintain automation and tools to promote system performance and uptime.
- Support post-release validation and operational readiness for new deployments.
- Provide occasional support outside of standard hours as needed for major releases or critical changes, with consideration for work-life balance.
- Design infrastructure following best practices for scalability, fault tolerance, and security.
- Define and manage Service Level Indicators (SLIs) and Service Level Objectives (SLOs), partnering with teams to keep services reliable.
- Collaborate with engineering teams to enhance deployment pipelines and make recommendations for improved architecture, release speed, and productivity.
- Professional experience in a Site Reliability Engineering, DevOps, or related technical role (all relevant pathways and learning experiences welcomed).
- Cloud Platform Familiarity: Experience with AWS services such as EC2, Lambda, DynamoDB, Aurora PostgreSQL, and Amazon OpenSearch Service is especially relevant. Experience with similar platforms is also valued.
- Infrastructure as Code (IaC): Hands-on experience (preferably 2 or more years) with tools such as Terraform to automate and manage cloud resources.
- Experience with containerization using Docker; Kubernetes skills are a plus.
- Familiarity with configuration management tools such as Puppet, Ansible, or comparable systems.
- Experience with monitoring, alerting, and observability tools (e.g., Elasticsearch, Grafana, OpenTelemetry) and with CI/CD tools (e.g., GitHub Actions, Azure DevOps, TeamCity, Jenkins).
- Relevant certifications in AWS, Kubernetes, or related areas are appreciated but not required.