
Site Reliability Engineer
- Bangalore, Karnataka
- Permanent
- Full-time
Location: Bangalore (Hybrid)
Job Type: Full-Time

Job Summary
We are seeking a proactive and skilled Site Reliability Engineer (SRE) with 3 to 5 years of experience to join our dynamic engineering team. The ideal candidate will be responsible for building and maintaining robust infrastructure, ensuring high availability and performance of systems, and enhancing operational efficiency through automation and DevOps practices. You will collaborate with cross-functional teams to identify reliability risks and drive stability across the platform.

Key Responsibilities
Software Development & Automation
- Design, develop, test, and maintain high-quality software frameworks and automation tools to reduce manual intervention.
- Collaborate with development and QA teams to integrate reliability into application lifecycles.
Incident & Problem Management
- Lead incident response and troubleshooting efforts to resolve production issues.
- Participate in on-call rotations, and create and maintain runbooks for effective incident response.
- Proactively identify and resolve performance and stability issues.
- Design, manage, and optimize cloud or on-premises infrastructure to ensure scalability and reliability.
- Hands-on experience with Microsoft Azure or Google Cloud Platform (GCP) is required.
- Build and maintain CI/CD pipelines using GitHub Actions.
- Drive automation initiatives and implement DevOps best practices across the engineering lifecycle.
- Set up and maintain observability solutions for applications and infrastructure using tools such as Splunk, Grafana, and Prometheus.
- Define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
- Work with security teams to ensure compliance with internal and industry-wide standards.
- Conduct risk assessments and implement security controls.
- Identify areas for improvement in infrastructure and operations.
- Mentor junior engineers, participate in code reviews, and encourage a culture of knowledge sharing.
- Document systems, processes, and troubleshooting steps comprehensively.
- 3 to 5 years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
- Proficiency in at least one scripting/programming language, preferably Python.
- Solid understanding of Agile methodologies and SDLC practices.
- Strong experience with CI/CD using GitHub Actions or similar tools.
- Hands-on experience with observability tools such as Splunk, Grafana, and Prometheus.
- Familiarity with version control systems, especially Git.
- Strong troubleshooting and problem-solving skills.
- Excellent communication and collaboration skills.