
Site Reliability Engineer (DCI)
- Bangalore, Karnataka
- Permanent
- Full-time
- Expertise in designing and implementing reliable, scalable, and fault-tolerant systems.
- Proficiency in setting up and managing monitoring, alerting, and logging systems for container orchestrators such as Kubernetes, using tools like Prometheus, Grafana, the OpenTelemetry Collector, or similar, to detect and resolve issues early.
- Hands-on experience in incident management, including incident response, troubleshooting, and post-mortem analysis.
- Proficiency in coding/scripting languages and tools commonly used in infrastructure automation and monitoring (such as Terraform).
- Familiarity with deployment processes and strategies.
- Knowledge of best practices in disaster recovery planning and execution for cloud-based systems.
- Capability to advocate for SRE best practices and principles within the organization and drive cultural changes as needed.
- Willingness to stay updated with the latest trends, tools, and technologies in the field of site reliability engineering.
- Strong communication skills to effectively collaborate with cross-functional teams, including Software Developers, Product Owners, and Cloud Platform Engineers.