
Cloud Site Reliability Engineer
- Pune, Maharashtra
- Permanent
- Full-time
- Analyzing the current state, designing appropriate solutions and working with the team to implement them.
- Coordinate emergency responses, perform root cause analysis, and identify and implement solutions to prevent recurrences
- Work with the team to identify ways to increase MTBF and lower MTTR for the environment
- Review each application stack end to end and execute initiatives to reduce failures, defects, and overall performance issues
- Identifying and working with the team to implement more efficient system procedures
- Maintaining environment monitoring systems to provide the best visibility into the state of the deployed products/solutions
- Perform root cause analysis on incoming infrastructure alerts and work with teams to resolve them
- Maintaining performance analysis tools, identifying any adverse changes to performance and working with the teams to resolve them
- Researching industry trends and technologies, and promoting adoption of best-in-class tools and technologies
- Taking the initiative to advance the quality, performance, or scalability of our Cloud Solutions by influencing the architecture or design of our products
- Design, develop and execute automated tests to validate solutions and environments
- Troubleshoot issues across the entire stack: infrastructure, software, application, and network
- 1+ years’ experience working as a Site Reliability Engineer or an equivalent position
- 1+ years’ experience with AWS cloud technologies; at least one AWS certification (Solutions Architect or DevOps Engineer) is required
- 1+ years’ experience functioning as a senior member in an infrastructure/software team
- Hands-on experience with AWS services such as CodeBuild, Config, Systems Manager, Service Catalog, and Lambda
- Full-stack IT experience with *nix, Windows, network/firewall concepts, source control (Bitbucket), build/dependency management, and continuous integration systems (TeamCity, Jenkins)
- Expertise in at least one scripting language, Python preferred
- Firm understanding of networking concepts and technologies
- Exposure to the big data technology stack (Spark, Hadoop, Scala, etc.) is preferred
- Good understanding of relational and cloud database engines such as PostgreSQL and MySQL
- Basic understanding of clusters, load balancers, and CDNs
- Experience in fault-tolerant system design
- Familiarity with Splunk data analysis, Dynatrace APM, or similar tools is a plus
- A Bachelor’s degree (Master’s preferred) in a related technical field
- Excellent analytical, troubleshooting and communication skills
- Strong verbal, written, and presentation skills; ZS is a global firm, so fluency in English is required
- This role requires healthy doses of initiative and the ability to remain flexible and responsive in a very dynamic environment
- Ability to quickly learn new platforms, languages, tools, and techniques as needed to meet project requirements
- Client-first mentality
- Intense work ethic
- Collaborative spirit and problem-solving approach