
Senior Site Reliability Engineer
- Bangalore, Karnataka
- Permanent
- Full-time
- Take end-to-end responsibility, from development to production, for designing, deploying, operating, and continuously improving the performance and fault tolerance of large-scale multi-cloud solutions.
- Ensure system security, data integrity, and high availability of the platform.
- Establish and improve monitoring, logging, and alerting frameworks to detect and resolve issues promptly.
- Keep up with technology trends and identify promising new solutions that meet our requirements.
- Create technical support documentation and provide hands-on troubleshooting and consulting to our customers.
- Hands-on expertise in container orchestration systems such as Kubernetes, running in a hybrid cloud environment such as Azure.
- Experience in continuous integration/deployment and in systems engineering for large-scale, distributed cloud solutions (including but not limited to Kafka, Elasticsearch, OpenTelemetry, and observability).
- Programming experience in one or more languages such as Go, Java, or Python, and in scripting languages (Shell or PowerShell).
- Hands-on expertise in open-source application and infrastructure monitoring tools, e.g., the ELK and/or TICK stack, Prometheus (must-have), and Grafana.
- Passion for sharing knowledge, through interactive sessions as well as documentation.
- Strong analytical and problem-solving skills, as well as the ability to focus on details without losing track of the bigger picture.
- Excellent oral and written English skills; additional language skills are a plus.
Reference Code: 135085