
Lead Site Reliability Engineering - Cloud, Observability, Docker, IaC (14+ Yrs)
- Bangalore, Karnataka
- Permanent
- Full-time
- Security and Safety: Ensure the security and safety of application services and platforms.
- Zero Downtime: Maintain zero downtime by swiftly addressing any issues to ensure the environment is always operational. Conduct rapid root cause analysis and implement remediation in production environments after thorough testing.
- Environment Management: Oversee all activities within the environment, including deploying new code.
- Team Leadership: Inspire and lead the team to deliver strategic and innovative approaches that drive Visa's growth. Provide mentorship and foster a culture of collaboration and continuous improvement.
- Stakeholder Partnerships: Build strong partnerships with key stakeholders, including product management, engineering, design, and operations.
- Strategic Impact: Impact strategic decisions at all levels by interacting with other leaders on complex issues and applying strong judgment and analysis.
- Effective Communication: Communicate effectively with both technical and business partners to create frameworks for discussing complex topics.
- Automation and AI: Regularly analyze the environment and promote the adoption of automation and Generative AI to stay competitive.
- Cloud Infrastructure: Lead cloud infrastructure adoption and migration, ensuring a seamless transition with minimal downtime.
- Problem Resolution: Run problem bridges by collaborating with different functional and technical teams, escalating issues as needed for timely resolution.
- Information Sharing: Proactively share important context and information with relevant stakeholders.
- Operational Excellence: Spearhead the enhancement of operational practices focusing on efficiency, security, and excellence.
- 14 or more years of work experience with a Bachelor's degree, or at least 12 years of work experience with an advanced degree (e.g. Master's/MBA/JD/MD), or at least 10 years of work experience with a PhD.
- 14+ years of work experience in Site Reliability Engineering.
- 10+ years of experience with Java and J2EE applications, and a deep understanding of Web Services technologies (REST and SOAP).
- 5+ years of experience managing applications on Containers (Docker) and Cloud (AWS, GCP, Azure).
- Strong understanding of relational databases and middleware stacks (IIS, .NET, Java, TcServer, JBoss, Containers).
- Knowledge of Generative AI capabilities and use cases.
- Advanced-level programming and/or scripting in three or more of the following: Python, Java, Go, PowerShell, JavaScript, Terraform, Ansible, Helm, Chef, CloudFormation.
- Proficiency in CI/CD tooling such as Jenkins, GitHub, Bitbucket, ArgoCD, Artifactory, and Azure DevOps in a large-scale environment.
- Experience in OO design and design patterns.
- Proficiency in observability tooling such as Grafana, Prometheus, Splunk, Datadog, New Relic, Dynatrace, and Sentry in a large-scale environment.
- Experience with Docker and Kubernetes.
- Experience with integrating third-party Web Services.
- 5+ years of leading and building Site Reliability teams.
- Strong work ethic and a self-starter attitude, with the ability to work in a fast-paced, team-oriented environment and comfort working with a global team.
- Exceptional analytical and problem-solving skills, along with strong oral and written communication abilities.
- Proven proficiency in troubleshooting, root-cause analysis, application design, and implementing major components for large projects.
- Experience in creating tools to automate production support activities.
- Knowledge of monitoring tools and observability practices.