Site Reliability Engineer

Bangalore, Karnataka
Permanent
Full-time

2 months ago

About Zamp:

At Zamp, we're building AI agents that empower people to move at the speed of thought. Our vision is a world where AI handles the routine, so humans can focus on strategy and innovation. We are building a platform where all operational work runs autonomously. We partner with Fortune 500s, leading global banks and companies to streamline complex Finance and Operations processes.

Founded in 2022 by Amit Jain (an IIT Delhi and Stanford graduate with over 20 years of industry leadership, including roles as Managing Director at Sequoia Capital and Head of Asia Pacific at Uber), Zamp is backed by a stellar $22M seed round. Our investors include Sequoia Capital, Dara Khosrowshahi (CEO, Uber), Tony Xu (CEO, DoorDash), and other global visionaries.

About the team:

At Zamp, our engineering team is the force behind our technological innovations. Transforming the most ambitious ideas into reality, we breathe life into dreams through code and hardware. With the right mix of expertise and creativity, the members of this team are the unseen magicians who add a dash of tech wizardry to make Zamp's products shine.

From coding late into the night to brainstorming over a cup of coffee, we are always on a mission to make our technology stand out. We are not just the engineering team but the tech superheroes that keep Zamp at the forefront of innovation.

You are likely to succeed in this role if you bring experience in:

Self-Hosted Infrastructure Ownership: Design, deploy, and maintain self-hosted systems and services, ensuring reliability, scalability, and security.
Kubernetes & Container Orchestration: Architect, manage, and scale Kubernetes clusters in production and self-hosted environments.
Infrastructure as Code (IaC): Write and manage Terraform modules to provision and manage infrastructure across AWS/GCP and on-prem setups.
CI/CD Automation: Build and maintain reliable CI/CD pipelines using tools like Jenkins, GitLab CI, or ArgoCD to ensure fast and safe deployments.
Monitoring & Observability: Set up and fine-tune observability tools like Grafana, Prometheus, and Graylog to monitor infrastructure, detect anomalies, and ensure uptime SLAs.
Scripting & Engineering: Write clean, modular automation scripts in Python, Bash, or Go to support operational needs and improve team productivity.
System Reliability & Incident Response: Own on-call responsibilities, drive root cause analysis, and continuously improve incident handling and system resilience.
Security & Compliance: Implement security and access controls within infrastructure, focusing on hardened self-hosted environments.

What we are actively looking for:

4-6 years of experience
Proven experience managing self-hosted systems and internal tooling at scale
Deep hands-on knowledge of Kubernetes, including Helm, Ingress, scaling, and custom operators
Solid experience with Terraform for IaC; optionally Ansible for configuration
Expertise in monitoring/logging stacks: Grafana, Prometheus, Graylog, ELK
Hands-on experience with AWS, GCP, or Azure; strong understanding of hybrid cloud-native and on-prem setups
Strong scripting experience in Python, Bash, or Go
Proficiency in version control systems like Git for managing code repositories and facilitating collaboration among development teams

Our Culture and Benefits:

At Zamp, we promote a culture of open communication, collaboration, and empowerment. We value transparency, meritocracy, and a strong work ethic. Join our early team and help us build something exceptional.

Perks:

- Competitive salaries and stock options with substantial potential upside.
- Collaborate with top talent.
- Diverse and inclusive workspace.
- Comprehensive medical insurance for employees, spouses, and children.
- A culture celebrating every victory.
- Continuous learning and skill development opportunities.
- Enjoy good food, games, and a comfortable office environment.

Zamp Finance
