Senior Software Engineer - Cloud Infrastructure Reliability
Under Armour India
- Bangalore, Karnataka
- Permanent
- Full-time
- Engineer and improve reliable, scalable, and high-performing systems supporting critical business services.
- Build automation across deployments, monitoring, alerting, and operational workflows to reduce toil and improve resiliency.
- Partner with engineering and platform teams to apply SRE principles, including SLIs, SLOs, error budgets, and automated remediation.
- Enhance CI/CD pipelines and software delivery processes to improve reliability and efficiency.
- Develop observability solutions across metrics, logs, and distributed tracing to improve system visibility.
- Participate in incident response, root cause analysis, and corrective actions to prevent recurrence.
- Support capacity planning, performance tuning, and scaling strategies for cloud-native and distributed systems.
- Maintain Infrastructure as Code, cloud configurations, and operational documentation, including runbooks and standards.
- Collaborate with teams to identify reliability risks and drive continuous improvement.
- Bachelor's degree in Computer Science, Engineering, or a related field with typically 3-5 years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or a related discipline; or a Master's degree with typically 3 years of relevant experience; or typically 9 years of relevant work experience without a degree.
- Proficiency in one or more programming or scripting languages such as Python, Go, JavaScript, or Bash.
- Solid working knowledge of Linux/Unix-based systems.
- Experience building or supporting CI/CD pipelines using tools such as GitHub Actions, GitLab CI, or Jenkins.
- Familiarity with Infrastructure as Code practices and tools (e.g., Terraform, CloudFormation).
- Experience with containerization and orchestration technologies, including Docker and Kubernetes.
- Understanding of networking fundamentals, distributed systems, and system design principles.
- Hands-on experience with modern observability stacks such as Prometheus, Grafana, ELK/EFK, or Datadog.
- Experience contributing to SLI/SLO frameworks and applying error budgets to guide reliability decisions.
- Exposure to GitOps workflows and tooling such as Argo CD or Flux.
- Working knowledge of service mesh architectures (e.g., Istio, Linkerd).
- Familiarity with performance and load testing tools and techniques.
- Experience with asynchronous and distributed systems, including message queues, event-driven architectures, or distributed data platforms.
- Cloud or DevOps certifications (e.g., AWS Associate or Specialty, GCP Professional, Kubernetes CKA/CKS) are a plus.
- Experience operating in large-scale enterprise environments and collaborating with globally distributed teams.
- Experience using AI-assisted development tools (such as Copilot, Cursor, or similar) to improve code quality, accelerate development, and enhance documentation.
- Understanding of foundational AI/ML concepts, with exposure to cloud-native AI services and/or the ability to leverage AI tools to automate cloud and operational tasks.
- Location: This individual must reside within commuting distance from our office.
- Work Schedule: This role follows a hybrid work schedule, requiring 4 days in-office per week.