Intelligent Ops and Observability Engineer
McCormick
- Gurgaon, Haryana
- Permanent
- Full-time
- Competitive compensation
- Career growth opportunities
- Flexibility and Support for Diverse Life Stages and Choices
- Wellbeing programs, including physical and mental wellness
- Operational Oversight and Service Restoration
- Observability and Operational Tooling
- Continuous Improvement and Operational Maturity
- Incident Response Process Contribution
- Bachelor’s degree in Information Technology, Computer Science, Engineering, Information Systems, or a related technical discipline is required.
- Advanced degree in a related field preferred.
- Relevant certifications in IT service management, cloud, reliability engineering, or operational disciplines are preferred, such as ITIL, SRE, or major platform certifications.
- 10+ years of experience in technology operations, production support, service availability, observability, monitoring, or service management roles in a complex, global enterprise environment.
- Experience supporting incident response, major incident coordination, service restoration, or operational command center activities.
- Experience implementing, administering, or supporting tools related to APM, infrastructure monitoring, event management, observability, or operational reporting.
- Experience working in environments with managed service providers or third-party support partners.
- Experience working within ITIL, service management, or SIAM-aligned operating models is preferred.
- Strong understanding of incident management, event management, service monitoring, escalation practices, and operational governance.
- Strong technical knowledge of operational tooling, including monitoring, event correlation, dashboarding, alerting, and reporting.
- Ability to interpret operational signals, distinguish meaningful issues from noise, and improve monitoring effectiveness.
- Strong analytical and problem-solving skills, with the ability to identify trends, risks, and practical improvements.
- Strong communication skills, especially during incidents or time-sensitive operational situations.
- Ability to work across organizational boundaries and influence adherence without direct ownership authority.
- Strong organizational skills, attention to detail, and follow-through in a fast-moving operational environment.
- Brings strong operational judgment and remains calm under pressure.
- Combines technical depth in operational tooling with practical understanding of incident response execution.
- Drives accountability and follow-through across teams and providers.
- Uses data and operational insight to improve resilience and service quality over time.
- Acts as a credible partner in shaping better operational practices while reinforcing consistent execution.