
Associate Senior Site Reliability Engineer
- Pune, Maharashtra
- Permanent
- Full-time
- Chaos engineering - you’re expected to think laterally about how our systems might fail in theory, design tests to demonstrate how they behave in practice, and then formulate and implement remediation plans, as appropriate.
- Pushing our systems to their limits, then designing ways to take them to the next performance tier.
- Using DevOps and GitOps practices to improve automation and processes and to enable self-service.
- Safeguarding reliability: ensuring that our services are highly available, resilient against disasters, self-monitoring, and self-healing.
- Running “game days” to test assumptions about reliability and learn what will break before it matters to customers.
- Reviewing designs with an eye toward increasing the holistic stability of our platform and identifying potential risks.
- Building systems to proactively monitor the health, performance, and security of our production and non-production virtualized infrastructure.
- Improving our monitoring and alerting systems to make sure engineers get paged when it matters (and don’t get paged when it doesn’t).
- Troubleshooting system and network issues alongside our Technical Operations Team.
- Evolving our SDLC, practices, and tooling to account for Site Reliability considerations and best practices.
- Developing runbooks and improving documentation.
- BS in Computer Science, Information Technology, Business / Management Information Systems, or a related field
- Typically a minimum of 2 years of relevant experience
- Skills / Knowledge - Developing professional expertise; applies company policies and procedures to resolve a variety of issues.
- Job Complexity - Works on problems of moderate scope where analysis of situations or data requires a review of a variety of factors. Exercises judgment within defined procedures and practices to determine appropriate action. Builds productive internal/external working relationships.
- Supervision - Normally receives general instructions on routine work and detailed instructions on new projects or assignments.
- Experience with public and private clouds, Jenkins, Terraform, Ansible, OpenShift, Kubernetes, or AWS EKS