
Senior Site Reliability Engineer
- Bangalore, Karnataka
- Permanent
- Full-time
- Working on Internet technologies to improve the performance, availability, and scalability of large distributed content delivery systems.
- Designing, implementing, and tuning monitoring and observability systems to meet defined SLIs and SLOs.
- Acting as an escalation point for support, platform, and product teams to ensure system issues are resolved.
- Leading incident response, utilizing coding, data analysis, network diagnostics, and debugging tools for distributed systems.
- Collaborating with support, operations, and engineering teams, investigating issues, and implementing solutions to prevent recurrence.
- Collaborating with engineering and product teams to enhance reliability, scalability, performance, and usability of offerings.
- Identifying potential problems and creating scalable solutions to ensure continuous improvement in quality of service (QoS).
- Staying updated on advancements in cloud computing, DevOps, and SRE best practices.
- Requires expertise in Computer Science, Engineering, or a related field, with 5+ years of experience in Site Reliability Engineering.
- Demonstrate expertise in coding and scripting languages such as C/C++, Python, Bash, and JavaScript.
- Demonstrate expert troubleshooting skills in UNIX/Linux environments, with an emphasis on scalability and reliability in distributed systems.
- Demonstrate expertise in internet protocols and networking: DNS, HTTP/HTTPS, UDP, TCP/IP, TLS/SSL.
- Gain expertise in using observability tools such as Prometheus, Grafana, ADBMS, and Datadog for SLI/SLO management.
- Working knowledge of cloud platforms (Azure, Databricks).