
Senior DevOps Engineer - Kubernetes & AWS
- Chennai, Tamil Nadu
- Permanent
- Full-time
- A passionate and experienced engineer with a proven track record of identifying and resolving reliability and scalability challenges in large-scale, containerized applications.
- A curious and collaborative team player who thrives in a fast-paced environment, eager to explore, learn, and improve processes—particularly around Kubernetes deployments and management.
- An efficiency enthusiast, skilled at automating solutions and continuously innovating container orchestration and management.
- A nimble learner who grasps complex Kubernetes concepts quickly, and an excellent communicator who can advocate for best practices in Kubernetes operations.
- Design, deploy, and manage highly available and scalable Kubernetes clusters on AWS EKS using Terraform and/or Crossplane.
- Implement Infrastructure-as-Code (IaC) best practices for managing EKS clusters and related infrastructure.
- Configure and maintain Kubernetes deployments, services, ingresses, and other resources using YAML manifests or GitOps workflows.
- Implement GitOps practices with FluxCD for automated deployments and configuration management of containerized applications.
- Proactively ensure the reliability, security, and scalability of AWS production systems, with a particular focus on Kubernetes clusters and containerized applications.
- Resolve complex problems across multiple platforms and application domains, using advanced system troubleshooting techniques.
- Provide primary operational support and engineering expertise for all cloud and enterprise deployments, with a focus on Kubernetes.
- Monitor system performance, identify downtime incidents, and diagnose underlying causes, particularly related to Kubernetes cluster and container health.
- Design and develop cost-effective Kubernetes solutions within allocated budgets, ensuring efficient resource utilization.
- Work closely with developers, testers, and system administrators to ensure smooth deployments and operations of containerized applications.
- Champion the implementation of new processes, tools, and methodologies to enhance efficiency throughout the software development lifecycle (SDLC) and pipeline management.
- Integrate robust security measures into the development lifecycle, considering the specific security requirements of containerized applications.
- 5 to 9 years of experience building, scaling, and supporting highly available systems and services.
- At least 3 years of experience managing and operating Kubernetes clusters in production.
- Proven experience in building and managing AWS platforms, with a strong focus on Amazon EKS (Elastic Kubernetes Service).
- Deep knowledge of Kubernetes architecture, core concepts, best practices, and security considerations.
- Expertise in Infrastructure-as-Code (IaC) tools like Terraform and Crossplane.
- Familiarity with GitOps principles; experience with FluxCD is a plus.
- Proficiency in at least one scripting/programming language (Python, Go, Ruby, Shell).
- Experience with Site Reliability Engineering (SRE) and DevOps principles, including CI/CD and version control (Bitbucket, GitHub, etc.).
- Familiarity with telemetry, observability, and modern monitoring tools (Prometheus, Alertmanager, Grafana, etc.), particularly for Kubernetes monitoring.
- Strong expertise in system visibility to facilitate rapid detection and resolution of issues within Kubernetes clusters.
- A strong ability to learn and adapt in a fast-paced environment, especially as Kubernetes and container orchestration technologies evolve.
- Excellent teamwork skills, collaborating effectively across cross-functional teams including developers, testers, and system administrators.
- Strong prioritization and problem-solving skills, adept at troubleshooting complex Kubernetes-related issues.
- Ability to manage multiple projects simultaneously, ensuring projects stay on track with clear progress updates.
- Ability to handle unexpected challenges while effectively context-switching between tasks.
- Willingness to participate in rotational on-call duties to ensure continuous monitoring and support of Kubernetes clusters.
- A strong work ethic and commitment to continuous learning and improvement in Kubernetes and container orchestration technologies.