
Engineer, Reliability Engineering - CoE
- Chennai, Tamil Nadu
- Permanent
- Full-time
- Working with Platform, Production Engineering and application SRE teams to manage and resolve complex production issues.
- Improving Platform performance, availability, and reliability.
- Implementing observability solutions for proactive issue identification and optimization.
- Managing processes for incidents, changes, releases, and deployments.
- Developing automation tooling (IaC, alerts as code, dashboards as code) to improve efficiency.
- Conducting proofs of concept (POCs) for tools that improve performance, scalability, reliability and availability.
- Analysing trends in incidents, problems, and alerts to drive operational improvements.
- Documenting SOPs, critical systems information, and best practices for current and future use.
- Providing technical guidance to relevant stakeholders.
- Staying up to date with advancements in software engineering, with a particular focus on reliability engineering.
- Programming Languages
- Linux, VM, Containers and Kubernetes
- AWS and Azure
- Databases
- Observability
- Proficiency in one or more of Java, Python and Go, with full SDLC experience.
- Expertise in reliability engineering principles: anomaly detection, root cause analysis and predictive maintenance.
- Experience defining SLIs, SLOs and error budgets.
- Hands-on experience with Kubernetes, containers, cloud platforms and databases.
- Strong knowledge of observability tools and OpenTelemetry.
- Familiarity with DevOps methodologies, tools and automation (e.g. Azure Pipelines, Terraform, Helm).
- Experience with public/private cloud platforms including AWS and Azure.
- Experience leading an operations team supporting applications in production environments.
- Experience with messaging platforms (e.g. MQ, Solace, Kafka), API gateways and service meshes.
- Knowledge of Generative AI and Responsible AI.
- Do the right thing and are assertive, challenge one another, and live with integrity, while putting the client at the heart of what we do
- Never settle, continuously striving to improve and innovate, keeping things simple and learning from doing well, and not so well
- Are better together, we can be ourselves, be inclusive, see more good in others, and work collectively to build for the long term
- Core bank funding for retirement savings, medical and life insurance, with flexible and voluntary benefits available in some locations.
- Time off, including annual leave, parental/maternity leave (20 weeks), sabbatical (up to 12 months) and volunteering leave (3 days), along with minimum global standards for annual and public holidays, which together total at least 30 days.
- Flexible working options across home and office locations, with flexible working patterns.
- Proactive wellbeing support through Unmind, a market-leading digital wellbeing platform, plus development courses for resilience and other human skills, a global Employee Assistance Programme, sick leave, mental health first-aiders and a range of self-help toolkits.
- A continuous learning culture to support your growth, with opportunities to reskill and upskill and access to physical, virtual and digital learning.
- Being part of an inclusive, values-driven organisation that embraces and celebrates our unique diversity across teams, business functions and geographies, where everyone feels respected and can realise their full potential.