
Site Reliability Engineer I
- Noida, Uttar Pradesh
- Permanent
- Full-time
We at Innovaccer are looking for a Site Reliability Engineer-I to build the most amazing product experience. You’ll work with other engineers to build delightful features that understand and solve our customers’ pain points.
A Day in the Life
- Take ownership of SRE pillars: Deployment, Reliability, Scalability, Service Availability (SLA/SLO/SLI), Performance, and Cost.
- Lead production rollouts of new releases and emergency patches using CI/CD pipelines while continuously improving deployment processes.
- Establish robust production promotion and change management processes with quality gates across Dev/QA teams.
- Roll out a complete observability stack across systems to proactively detect and resolve outages or degradations.
- Analyze production system metrics, optimize system utilization, and drive cost efficiency.
- Manage autoscaling of the platform during peak usage scenarios.
- Perform triage and RCA by leveraging observability toolchains across the platform architecture.
- Reduce escalations to higher-level teams through proactive reliability improvements.
- Participate in the 24x7 on-call production support rotation.
- Lead monthly operational reviews with executives covering KPIs such as uptime, RCA, CAP (Corrective Action Plan), PAP (Preventive Action Plan), and security/audit reports.
- Operate and manage production and staging cloud platforms, ensuring uptime and SLA adherence.
- Collaborate with Dev, QA, DevOps, and Customer Success teams to drive RCA and product improvements.
- Implement security guidelines (e.g., DDoS protection, vulnerability management, patch management, security agents).
- Manage least-privilege RBAC for production services and toolchains.
- Build and execute Disaster Recovery plans and actively participate in Incident Response.
- Work with a cool head under pressure and avoid shortcuts during production issues.
- Collaborate effectively across teams with excellent verbal and written communication skills.
- Build strong relationships and drive results without direct reporting lines.
- Take ownership, be highly organized, self-motivated, and accountable for high-quality delivery.
Requirements
- Experience: 1–3 years in production engineering, site reliability, or related roles.
- Solid hands-on experience with at least one cloud provider (AWS, Azure, GCP), with a focus on automation (certifications preferred).
- Strong expertise in Kubernetes and Linux.
- Proficiency in scripting/programming (Python required).
- Strong understanding of observability toolchains (Logs, Metrics, Tracing).
- Knowledge of CI/CD pipelines and toolchains (Jenkins, ArgoCD, GitOps).
- Familiarity with persistence stores (Postgres, MongoDB), data warehousing (Snowflake, Databricks), and messaging (Kafka).
- Exposure to monitoring/observability tools such as Elasticsearch, Prometheus, Jaeger, and New Relic.
- Proven experience improving the reliability, scalability, and performance of production systems.
- Experience in 24x7 production environments with process focus.
- Familiarity with ticketing and incident management systems.
- Security-first mindset with knowledge of vulnerability management and compliance.
- Advantageous: hands-on experience with Kafka, Postgres, and Snowflake.
- Excellent judgment, analytical thinking, and problem-solving skills.
- Ability to quickly identify and drive optimal solutions within constraints.
Benefits
- Generous Leaves: Enjoy generous leave benefits of up to 40 days.
- Parental Leave: Leverage one of the industry's best parental leave policies to spend time with your new addition.
- Sabbatical: Want to focus on skill development, pursue an academic career, or just take a break? We've got you covered.
- Health Insurance: We offer comprehensive health insurance to support you and your family, covering medical expenses related to illness, disease, or injury, with coverage extending to the family members who matter most.
- Care Program: Whether it’s a celebration or a time of need, we’ve got you covered. Through our Care Vouchers program, employees receive thoughtful gestures for major life events and significant personal milestones.
- Financial Assistance: Life happens, and when it does, we’re here to help. Our financial assistance policy offers support through salary advances and personal loans for genuine personal needs, ensuring help is there when you need it most.