Lead Site Reliability Engineering - Cloud, Observability, Docker, IaC (14+ Yrs)

Bangalore, Karnataka
Permanent
Full-time

17 days ago

Company Description

Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure payments network, enabling individuals, businesses, and economies to thrive while driven by a common purpose - to uplift everyone, everywhere by being the best way to pay and be paid.

Make an impact with a purpose-driven industry leader. Join us today and experience Life at Visa.

Job Description

Overview:

Join Visa's Technology Organization, a dynamic community of problem solvers and innovators dedicated to redefining the future of commerce. We manage one of the world's most advanced processing networks, handling over 65,000 secure transactions per second across 80 million merchants, 15,000 financial institutions, and billions of individuals. As a Lead Site Reliability Engineer (SRE), you will lead efforts to ensure stability, security, and efficiency of our applications and systems, driving continuous improvement and innovation.

Key Responsibilities:

Security and Safety: Ensure the security and safety of application services and platforms. Lead efforts to enhance operational practices focusing on efficiency, security, and excellence.
Zero Downtime: Maintain zero downtime by swiftly addressing any issues to ensure the environment is always operational. Conduct rapid root cause analysis and implement remediation in production environments after thorough testing.
Environment Management: Oversee all activities within the environment, including deploying new code.
Team Leadership: Inspire and lead the team to deliver strategic and innovative approaches that drive Visa's growth. Provide mentorship and foster a culture of collaboration and continuous improvement.
Stakeholder Partnerships: Build strong partnerships with key stakeholders, including product management, engineering, design, and operations.
Strategic Impact: Impact strategic decisions at all levels by interacting with other leaders on complex issues and applying strong judgment and analysis.
Effective Communication: Communicate effectively with both technical and business partners to create frameworks for discussing complex topics.
Automation and AI: Regularly analyze the environment and promote the adoption of automation and Generative AI to stay competitive.
Cloud Infrastructure: Lead cloud infrastructure adoption and migration, ensuring a seamless transition with minimal downtime.
Problem Resolution: Run problem bridges by collaborating with different functional and technical teams, escalating issues as needed for timely resolution.
Information Sharing: Proactively share important context and information with relevant stakeholders.
Operational Excellence: Spearhead the enhancement of operational practices focusing on efficiency, security, and excellence.

This is a hybrid position. Expectations of days in office will be confirmed by your Hiring Manager.

Qualifications

Basic Qualifications:

14 or more years of work experience with a Bachelor's Degree, or at least 12 years of work experience with an Advanced Degree (e.g. Masters/MBA/JD/MD), or at least 10 years of work experience with a PhD.

Education and Experience:

14+ years of work experience in Site Reliability Engineering.
10+ years of experience with Java and J2EE applications, and a deep understanding of Web Services technologies: REST & SOAP.
5+ years of experience managing applications on Containers (Docker) and Cloud (AWS, GCP, Azure).

Technical Skills:

Strong understanding of relational databases and middleware stacks (IIS, .NET, Java, TcServer, JBoss, Containers).
Knowledge of Generative AI capabilities and use cases.
Advanced-level programming and/or scripting in 3 or more of the following: Python, Java, Go, PowerShell, JavaScript, Terraform, Ansible, Helm, Chef, CloudFormation.
Proficiency in CI/CD tooling such as Jenkins, GitHub, Bitbucket, ArgoCD, Artifactory, and Azure DevOps in a large-scale environment.
Experience in OO design and design patterns.
Proficiency in observability tooling such as Grafana, Prometheus, Splunk, Datadog, New Relic, Dynatrace, Sentry, etc. in a large-scale environment.
Experience with Docker and Kubernetes.
Experience with integrating third-party Web Services.

Leadership and Communication:

5+ years of leading and building Site Reliability teams.
Strong work ethic, self-starter, ability to work in a fast-paced, team-oriented environment, and comfortable working with a global team.
Exceptional analytical and problem-solving skills, along with strong oral and written communication abilities.
Proven proficiency in troubleshooting, root-cause analysis, application design, and implementing major components for large projects.
Experience in creating tools to automate production support activities.
Knowledge of monitoring tools and observability practices.

Additional Information

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.

Visa
