
Site Reliability Engineer - Systems (7 to 10 Years)
- Bangalore, Karnataka
- Permanent
- Full-time
- Lead incident response and resolution: Proactively troubleshoot, debug, and resolve complex system-level incidents and outages, encompassing Linux operating systems, applications, and database technologies.
- Conduct deep-dive root cause analysis: Perform thorough post-incident analysis to identify underlying issues in production environments, implementing sustainable solutions.
- Design and implement robust monitoring: Develop, maintain, and enhance comprehensive system and database monitoring, alerting, and observability solutions (e.g., Grafana, Prometheus, PMM).
- Drive automation and efficiency: Automate Linux system administration tasks, operational runbooks, and database maintenance to improve system reliability, consistency, and operational efficiency.
- Collaborate on resilient deployments: Partner with development and engineering teams to ensure seamless, reliable, and secure software deployments and infrastructure changes.
- Architect scalable infrastructure: Contribute to the architectural design and implementation of highly scalable, resilient, and performant infrastructure solutions.
- Enhance on-call effectiveness: Participate in and continuously improve on-call rotations, developing tools and processes to reduce alert fatigue and minimize human error.
- Foster technical growth: Mentor and guide junior Site Reliability Engineers (SREs), promoting knowledge sharing and skill development within the team.
- Extensive Linux Expertise: Proven experience in advanced Linux systems administration, including a deep understanding of file systems, kernel tuning (sysctl), and performance optimization.
- Advanced Troubleshooting & Debugging: Exceptional ability to debug and rapidly resolve complex, distributed system-level issues in high-pressure production environments.
- Configuration Management: Hands-on experience with industry-standard configuration management tools (e.g., SaltStack, Ansible, Puppet).
- Load Balancing & Proxying: Practical experience with load balancing technologies (e.g., Nginx, HAProxy, LVS) and their configuration for high availability.
- Containerization & Orchestration: Strong understanding and practical experience with containerization (e.g., Docker) and container orchestration platforms (e.g., Kubernetes, Mesosphere).
- Monitoring & Alerting Tooling: Proficiency in implementing, maintaining, and leveraging system and database monitoring platforms (e.g., Grafana, Prometheus, PMM) and custom scripting for alerts.
- Automation & Scripting Mastery: Highly proficient in developing automation solutions using scripting languages (e.g., Python, Shell scripting, Go) for operational tasks.
- Networking Fundamentals: Solid understanding of core networking concepts and protocols (e.g., TCP/IP, DNS, DHCP, BGP, iptables, and IP routing protocols).
- Database Administration Fundamentals: Strong grasp of relational database concepts and practical experience with database administration principles.
- Cloud Infrastructure Experience: Experience managing and troubleshooting private/on-premises cloud environments, with a focus on identifying hardware-related issues and mitigating their impact.
- Relational Database Specialization: Deep practical experience with MariaDB, Percona Server, and/or MySQL, encompassing advanced database administration, performance tuning, and complex replication topologies.
- Backup & Recovery Expertise: Hands-on experience with robust backup and restore technologies, including ZFS.
- Message Queuing Systems: Familiarity with message queuing systems like RabbitMQ (RMQ).
- Insurance Benefits: Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
- Wellness Program: Employee Assistance Program, Onsite Medical Center, Emergency Support System
- Parental Support: Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
- Mobility Benefits: Relocation Benefits, Transfer Support Policy, Travel Policy
- Retirement Benefits: Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
- Other Benefits: Higher Education Assistance, Car Lease, Salary Advance Policy