Senior Engineer SRE / DevOps
Apna
- Bangalore, Karnataka
- Permanent
- Full-time
Requirement: 1
Team: Platform Engineering
Requirements
Key Responsibilities
- Manage platform modernization initiatives including containerization, service mesh adoption and migration to microservices and serverless infrastructure.
- Design and implement robust CI/CD pipelines and self-service DevOps platforms to streamline software delivery across environments.
- Develop and manage Infrastructure as Code (IaC) using tools like Terraform or CloudFormation for scalable and repeatable deployments.
- Automate infrastructure provisioning, configuration management and operations using tools like Ansible, Chef or Puppet.
- Leverage AI/ML-driven automation for predictive alerting, anomaly detection, auto-scaling and intelligent incident response.
- Embed security and compliance into DevOps workflows by adopting DevSecOps practices throughout the software development lifecycle.
- Evaluate emerging technologies and methodologies to improve system reliability, developer experience and platform scalability.
- Participate in SRE on-call rotations, production support and post-incident reviews to continuously improve system resilience.
- Build internal tools and automation solutions to enhance platform observability and operational efficiency.
- Identify and resolve performance bottlenecks and lead root cause analysis efforts for critical incidents.
- Collaborate cross-functionally with engineering, architecture and security teams to drive best practices and architectural alignment.
- Support disaster recovery planning, backup strategy implementation and compliance initiatives (e.g., SOC2, ISO).
- Mentor junior engineers, promote knowledge sharing and foster a culture of engineering excellence.
Qualifications
- 4–6 years of experience in DevOps, SRE or platform engineering roles with a software engineering mindset.
- Hands-on expertise in Kubernetes, Docker and service mesh architectures (Istio, Linkerd).
- Expertise in CI/CD tools such as Jenkins, ArgoCD, Spinnaker or similar for automating and managing deployment workflows.
- Experience with observability stacks (Prometheus, Grafana, ELK, Loki or Datadog) for monitoring, logging and alerting.
- Good understanding of AIOps and ML-driven automation, including anomaly detection, intelligent alerting and predictive incident response.
- Strong problem-solving and debugging skills, particularly in complex, production-grade distributed systems.
- Expertise in Infrastructure as Code (IaC) using tools like Terraform or Pulumi and proficiency in configuration management with Ansible, Puppet or similar tools.
- Familiarity with event-driven architectures using tools like Kafka or cloud-native pub/sub messaging systems.
- Good understanding of cloud cost optimization and efficiency practices through automation and resource management.
- Experience integrating security scanning and compliance checks into CI/CD pipelines using tools like Trivy, Snyk or Arnica.
Why Join Us
- Work on impactful infrastructure and DevOps challenges at scale.
- Build infrastructure that enables fast, reliable and responsible deployment of AI solutions.
- Be part of a culture that champions engineering excellence, ownership and continuous learning.
- Help shape the future of DevOps and AI integration in a fast-moving, innovation-focused environment.
- Collaborate with architects and DevOps leaders on strategic initiatives.
- Be part of a team building intelligent, resilient platforms using cutting-edge DevOps and AI technologies.