Site Reliability Engineer

Pune, Maharashtra
Permanent
Full-time

16 days ago

Business Divisions: Group Functions

Your role

We are seeking a highly experienced Site Reliability Engineer (SRE) to join our technology team in a mission-critical financial environment. This role is ideal for someone who has a proven track record of building and operating reliable, scalable systems in regulated industries such as banking or financial services.

As a Senior SRE, you will be responsible for ensuring the availability, performance, and resilience of our platforms. You'll collaborate with engineering, infrastructure, and security teams to build systems that are secure, observable, and automated, while championing a culture of operational excellence.

Key Responsibilities

Design, implement, and maintain highly available and fault-tolerant systems in a financial environment.
Define and monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to ensure system reliability and customer satisfaction.
Identify, measure, and reduce toil, proactively eliminating repetitive manual tasks through automation.
Lead incident response, post-mortems, and root cause analysis for production issues.
Collaborate with development teams to embed reliability into the software development lifecycle.
Integrate with observability platforms (e.g., Prometheus, Grafana, ELK, Datadog) to ensure end-to-end visibility of systems and services.

What We're Looking For

We're seeking a seasoned SRE with a strong foundation in reliability engineering and a passion for building robust, scalable systems. The ideal candidate will bring a mix of technical depth, operational maturity, and a proactive mindset.

Function Category: Information Technology (IT)

Join us

At UBS, we know that it's our people, with their diverse skills, experiences and backgrounds, who drive our ongoing success. We're dedicated to our craft and passionate about putting our people first, with new challenges, a supportive team, opportunities to grow and flexible working options when possible. Our inclusive culture brings out the best in our employees, wherever they are on their career journey. And we use artificial intelligence (AI) to work smarter and more efficiently. We also recognize that great work is never done alone. That's why collaboration is at the heart of everything we do. Because together, we're more than ourselves.

We're committed to disability inclusion and if you need reasonable accommodation/adjustments throughout our recruitment process, you can always

Your team

You'll be joining the Operating Systems and Middleware (OSM) crew, a globally distributed team that supports critical infrastructure across time zones using a follow-the-sun support model. Based in Pune / Hyderabad, this role is embedded in a collaborative Agile environment where engineers are empowered to take ownership, innovate, and continuously improve.

We value transparency, shared responsibility, and continuous learning. You'll work alongside talented engineers who are passionate about building reliable systems and solving complex problems.

Your expertise

Proven expertise in Site Reliability Engineering, with a background in software engineering, infrastructure, or operations.
Hands-on experience with cloud platforms (e.g., Azure), operating systems (e.g., Linux RHEL 7+), and networking fundamentals.
Solid understanding of networking and storage technologies (e.g., NFS, SAN, NAS).
Strong working knowledge of authentication and naming services (e.g., DNS, LDAP, Kerberos, Centrify).
Proficiency in scripting and automation (e.g., Python, Go, Bash).
Practical experience with infrastructure as code tools (e.g., Terraform, Ansible).
Demonstrated ability to define and manage SLIs, SLOs, and SLAs, and to systematically reduce toil.
Ability to integrate with observability platforms to ensure system visibility.
A metrics- and automation-driven mindset, with a strong focus on measurable reliability.
Calm under pressure, especially during incidents and outages, with a structured approach to incident response and post-mortems.
Strong collaboration and communication skills, with the ability to work across engineering and business teams.
A proactive, ownership-driven attitude, always seeking opportunities to improve systems and processes.

✨ Desirable Additions

Experience with chaos engineering, resilience testing, or disaster recovery planning.
Familiarity with financial transaction systems, real-time data pipelines, or core banking platforms.
An understanding of CI/CD pipelines, containerization, and orchestration (e.g., Kubernetes on AKS).

About us

UBS is the world's largest and the only truly global wealth manager. We operate through four business divisions: Global Wealth Management, Personal & Corporate Banking, Asset Management and the Investment Bank. Our global reach and the breadth of our expertise set us apart from our competitors.

We have a presence in all major financial centers in more than 50 countries.

UBS
