
Site Reliability Engineer
- India
- Permanent
- Full-time
- You are experienced with infrastructure as code practices
- You consistently use your programming skills to automate tasks
- You are comfortable working in a CLI environment
- You think of software and infrastructure as coming together to form a larger system
- You dig deep into incidents/problems and come up with unique solutions
- You are enthusiastic about learning new technologies and spreading your knowledge
- You battle ruthlessly to fix what's broken and protect the customer experience
- You are compelled to leave a situation better than you found it
- Increasing the observability of our applications, services, and infrastructure using:
  - OpenTelemetry
  - Grafana ecosystem (Grafana, Loki, Mimir, Tempo)
  - Fluentd
- Automating our applications and infrastructure using:
  - Terraform
  - Kubernetes
  - Puppet
- Creating CI/CD pipelines for these services using:
  - GitLab
  - Argo CD
  - Kustomize
- Working with our Product teams and helping them capture the user experience in SLOs
- Reducing the impact of service disruptions through our incident, problem, and change management programs