
AML Software - Site Reliability Engineer
- Bangalore, Karnataka
- Permanent
- Full-time
- Experience: 3+ years in site reliability engineering or software development roles.
- Programming: Proficient in at least one of Python, Golang, or Java.
- Data Structures & Algorithms: Strong foundation and application experience.
- Distributed Systems: Solid understanding and hands-on experience managing at least one distributed system (e.g., Kafka, Cassandra, Hadoop, Redis, or similar).
- Kubernetes: Expertise in the Kubernetes ecosystem (deployment, configuration, monitoring, and operation).
- Cloud Platforms: Hands-on experience with at least one major cloud platform (AWS, Azure, or Google Cloud Platform).
KEY RESPONSIBILITIES
- Design, develop, and automate: Build tools, frameworks, and solutions to improve reliability, scalability, and efficiency across systems.
- Monitor and maintain: Implement advanced monitoring and alerting for cloud and containerized workloads.
- Troubleshoot and solve: Respond to and resolve complex production incidents, and perform root cause analysis.
- Collaborate: Work closely with development and operations teams to integrate reliability best practices throughout the software lifecycle.
- Optimize: Proactively recommend improvements in architecture, deployment, and operations for distributed systems.
- Problem Solving: Demonstrated ability to independently troubleshoot and resolve complex technical issues.
- Creative Thinking: A track record of proposing and implementing innovative solutions to technical challenges.