
DET-TT-Resilience - Reliability Consultant -GDSN04
- Bangalore, Karnataka
- Permanent
- Full-time
- Resilience & Reliability are fundamental to ensuring modern architectures are available, performant and fault-aware.
- A Resilience & Reliability Consultant will help in designing the roadmap to achieve Resilience for Enterprise IT
- A Reliability Consultant will be a technical advisor who shapes the transformation roadmap to modernize IT delivery with SRE principles, frameworks and levers - in a nutshell, setting up SRE in Enterprise IT
- They will also implement the Reliability / SRE Roadmap and govern SRE solutions across the enterprise / line-of-businesses
- They will be able to assess Resilience & Reliability Maturity of an IT Organization and provide strategy and roadmap to achieve higher maturity levels
- Defining SLA/SLO/SLI for a product / service
- Engineering in resilient design and implementation practices into solutions as they go through the product life cycle
- Designing & implementing Observability Solutions to track, report, and measure SLA adherence
- Engineering out manual effort (Toil) through the development of automated processes and services (e.g., Automated Management of Systems, CI/CD improvements)
- Optimizing the cost of IT infrastructure & operations (FinOps)
- Review, analysis and improvement of deployed products with respect to product architecture and inter-service dependencies (simplification)
- 15+ years of experience in software product engineering principles, processes and systems
- Hands-on experience in Java / J2EE, one web server (Apache Tomcat or IBM HTTP Server), one application server (Tomcat/WebSphere), and a major RDBMS such as Oracle
- Hands-on experience in at least one CI/CD tool (Azure DevOps, GitLab CI/CD, Jenkins) and at least one IaC tool (Terraform, AWS CloudFormation, Ansible etc.)
- Experience in at least one cloud technology (AWS/Azure/GCP etc. and Docker, Pivotal, Kubernetes, OpenShift etc.) and its reliability tools (Azure Application Insights, CloudWatch, Azure Monitor etc.)
- Experience in Observability - APM tools (Dynatrace, AppDynamics etc.), metrics / log consolidation (Splunk) and ELK Stack
- Defining NFRs and SLA/SLO/SLI agreement for a product / platform / services
- Knowledge of queuing models, thread pools, request servicing processes etc.
- Experience with Linux (RHEL) operating system performance monitoring: the key parameters, their interpretation, and the commands used for monitoring
- Experience in Web Services, SOA, ESB (DataPower), RESTful
- Knowledge of application design patterns, J2EE application architectures, Microservices, Spring Boot & cloud-native architectures
- Proficiency in Java runtimes, Core Java, garbage collection, JVM parameter tuning
- Experience in performance tuning on Application Servers (Tomcat/WAS)
- Experience in troubleshooting Performance / Scalability / Availability issues
- Thread dump, heap dump generation & analysis
- Knowledge of query tuning and database architecture
- Knowledge of at least one automation scripting language such as Python
- Mastery of collaborative software development using Git, Jira, Confluence etc.
- AI/ML & Data Analytics knowledge and experience are desirable