Site Reliability Engineer

Bangalore, Karnataka
Permanent
Full-time


We are a consulting company full of happy people with a passion for technology! We love technology, we love design and we love quality. Our diversity makes us unique and creates an inclusive and welcoming workplace where each individual is highly valued.

With us, everyone can be themselves and respects others for who they are. We believe that when a fantastic mix of people gather and share their knowledge, experiences and ideas, we can help our customers on a completely different level.

We are looking for people who want to grow with us! With us, you have great opportunities to take real steps in your career and to take on significant responsibility.

We are seeking a skilled and forward-thinking Site Reliability Engineer to join our Emerging Tech team.

Company: Aqilea India (Client: H&M India)
Employment Type: Full-Time
Job Title: Site Reliability Engineer
Experience Range: 3 to 5 Years
Location: Bangalore (Hybrid)
Job Type: Full-Time

Job Summary

We are seeking a proactive and skilled Site Reliability Engineer (SRE) with 3 to 5 years of experience to join our dynamic engineering team. The ideal candidate will be responsible for building and maintaining robust infrastructure, ensuring high availability and performance of systems, and enhancing operational efficiency through automation and DevOps practices. You will collaborate with cross-functional teams to identify reliability risks and drive stability across the platform.

Key Responsibilities

Software Development & Automation

Design, develop, test, and maintain high-quality software frameworks and automation tools to reduce manual intervention.
Collaborate with development and QA teams to integrate reliability into application lifecycles.

Incident & Problem Management

Lead incident response and troubleshooting efforts to resolve production issues.
Participate in on-call rotations, and create and maintain runbooks for effective incident response.
Proactively identify and resolve performance and stability issues.

Infrastructure & Cloud Operations

Design, manage, and optimize cloud or on-premises infrastructure to ensure scalability and reliability.
Hands-on experience with Microsoft Azure or Google Cloud Platform (GCP) is required.

CI/CD & DevOps

Build and maintain CI/CD pipelines using GitHub Actions.
Drive automation initiatives and implement DevOps best practices across the engineering lifecycle.

Observability & Monitoring

Set up and maintain observability solutions for applications and infrastructure using tools such as Splunk, Grafana, Prometheus, etc.
Define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs).

Security & Compliance

Work with security teams to ensure compliance with internal and industry-wide standards.
Conduct risk assessments and implement security controls.

Continuous Improvement & Collaboration

Identify areas for improvement in infrastructure and operations.
Mentor junior engineers, participate in code reviews, and encourage a culture of knowledge sharing.
Document systems, processes, and troubleshooting steps comprehensively.

Required Skills

3 to 5 years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
Proficiency in at least one scripting/programming language, preferably Python.
Solid understanding of Agile methodologies and SDLC practices.
Strong experience with CI/CD using GitHub Actions or similar tools.
Good hands-on experience with observability tools such as Splunk, Grafana, Prometheus, etc.
Familiarity with version control systems, especially Git.
Strong troubleshooting and problem-solving skills.
Excellent communication and collaboration skills.

Notice Period: Immediate to 15 days only
Form of employment: Full-time until further notice, with a 6-month probationary period.

We interview candidates on an ongoing basis, so do not wait to submit your application.

Aqilea