
Senior Site Reliability Engineer
- Bangalore, Karnataka
- Permanent
- Full-time
- Design, build, advocate for and support the common tools and delivery platform used by Flexera developers.
- Improve developer experience and operational excellence.
- Foster collaboration and knowledge sharing across Flexera.
- Select and roll out supported defaults and standards for CI/CD tooling, observability, security and the runtime environment.
- Work with teams across several continents and build relationships with our engineers by listening to and understanding their needs, balancing these with the needs of our business.
- Research new tools and patterns and continuously measure and evolve our ways of doing things.
- Cloud Cost Optimization uses a combination of strategies, techniques, best practices and tools to help manage/reduce cloud costs.
- Developer/DevOps/SRE/Platform experience and a strong interest in software delivery and ongoing operation.
- Owned and led the architecting and rolling out of automation, tools, technologies, patterns and guardrails across an organization.
- Experience working in a globally distributed team.
- Deep & extensive public cloud (preferably Azure) knowledge & experience.
- Deep knowledge of containers (Docker) and orchestration (Kubernetes).
- Knowledge of tools and patterns around CI/CD (familiar with GitHub Actions, Travis CI, CircleCI or similar).
- Observability knowledge: logs, tracing and metrics, with experience in a few of Elastic Stack, X-Ray, Jaeger, Zipkin, Prometheus, Honeycomb or LightStep, and in enterprise observability tools such as Sumo Logic, New Relic, Datadog etc.
- Cloud cost optimization: using automation to keep cloud costs under control and within budget, and enabling individual engineering teams to optimize their own cloud costs.
- Knowledge of operations, including incident management, immutable infrastructure as code (esp. Terraform or CloudFormation), and problem-solving.
- Produced robust well-tested code preferably in Golang; however, we will also consider Python, JavaScript, Ruby, Java or C# if you are happy to learn Go.
- Excellent communication skills, including experience in writing good documentation and running workshops.
- Vendor selection and management experience.
- Agile software delivery methodologies
- Experience managing cloud-based services e.g. AWS, Azure at scale
- Experience with DevOps
- Experience with Docker containers, Kubernetes, EKS, ECS
- Infrastructure as code e.g. Terraform, CloudFormation
- CI/CD pipelines using Jenkins, Travis CI, TeamCity, pipeline as code
- Automation / Configuration Management at scale e.g. Puppet, Chef, Ansible, Salt, Packer etc.
- Service mesh such as Istio, Consul or similar
- Expertise in one or more of the following languages: Python / Go / Java / C# / C / C++
- Experience with IaaS and Serverless services from a cloud provider
- A strong understanding of TCP/IP and DNS, and experience designing networks
- Linux & Windows system administration experience
- Experience implementing fault detection and automating fixes
- Experience designing scalable services
- Experience designing distributed, fault-tolerant systems
- A good understanding of SQL and NoSQL databases
- A solid understanding of data structures and algorithms
- A positive attitude and willingness to learn
- Strong conflict resolution competence
- Excellent written and verbal communication skills
- Detail-oriented: the ideal candidate naturally digs as deep as needed to understand the why
- Bachelor's or higher degree in Computer Science, Information Technology, or a related field.
- At least 4 years of hands-on job experience managing services in a public cloud
- At least ` years of experience working as a senior member of a centralized Cloud enablement / Platform or a similar team
- Python / Golang / Java / C# / C / C++ / Bash experience
- Big Data, Machine Learning and AI platforms (Databricks, Snowflake etc.)
- Experience with monitoring systems such as New Relic, ELK, Prometheus, Datadog, X-Ray etc.
- Security background
- SQL, NoSQL and graph databases
- Relevant certification e.g. AWS, GCP, Azure
- Experience of Disciplined Agile Delivery (DAD)