
Senior CloudOps Engineer
- Pune, Maharashtra
- Permanent
- Full-time
Responsible for optimizing performance, ensuring security, and driving innovation in our cloud environment while responding to infrastructure and security alerts in a 24x7x365 operation.
- Create automation, runbooks, and playbooks to help others support the infrastructure
- Troubleshoot infrastructure and application-level issues and collaborate with support specialists and Cloud Operations / SRE
- Write and present a weekly report highlighting the previous week’s alerts, with detailed analysis, resolution, and any impact on the SLA.
- Monitor performance and capacity of Onit systems.
- Monitor for hardware, software and environmental alerts or malfunctions.
- Monitor security alerts from multiple sources.
- Triage and troubleshoot problems as they arise, following runbooks and standard operating procedures.
- Track all issues from start to finish and document all resolutions in detail in the trouble ticketing system and engineering runbooks.
- Escalate issues to InfraOps/DevOps engineers and Onit management.
- Ready to work in shifts.
- Bachelor’s degree in Computer Science or equivalent experience is required.
- 4+ years’ experience with Red Hat Enterprise or Amazon Linux 2023 is required.
- 3+ years’ hands-on experience with AWS (EC2, S3, RDS, VPC, CloudWatch, CloudTrail, IAM, EKS, ECS, Security, etc.)
- A solid understanding of the components that make up production systems (memory, CPU, disk space, disk I/O, network I/O, etc.) is required.
- Strong experience with monitoring, alerting, and log aggregation tools: Datadog, AWS CloudWatch, PagerDuty, Statuspage.
- Experience with SIEM/event correlation systems such as Elastic, Splunk, or ELK is required.
- Strong understanding of AWS security and monitoring and experience implementing best practices.
- Ability to read and interpret application server logs, outputs, CloudTrail and other critical logging output
- Experience working with relational databases such as Postgres or AWS RDS is a plus
- Hands-on experience working in Kubernetes is a plus
- Experience with Enterprise Web applications in production
- Experience with a programming language such as Python is a plus
- Excellent troubleshooting skills required.
- Ensure resource availability and allocation
- Excellent written and verbal communication skills required.
- Experience using Git (GitLab a plus) and CI/CD pipelines (e.g., Jenkins)