
Lead Cloud Operations Specialist
- Noida, Uttar Pradesh
- Permanent
- Full-time
The Incident Management Team, part of IT Service Management (ITSM), works cross-functionally with Global Services, Engineering, Cloud Hosting and Management on the effective delivery of UKG's Cloud SaaS offerings.
About The Role:
The Lead Cloud Operations Specialist provides day-to-day support for all ongoing incidents and aligns with ITSM's strategic direction. Collaborating directly with the ITSM leadership team, this position demands a high level of adaptability and quick thinking to achieve success.
Responsibilities Include:
- Defining war room procedures, establishing communication channels, and ensuring all necessary resources (tools, data dashboards) are readily available for incident response
- Leading discussions during war room meetings, keeping the team focused, and ensuring everyone is aligned on priorities
- Capturing key decisions, actions taken, and lessons learned during the incident for future reference
- Take charge of the war room, leading the response team (engineers, support specialists) to diagnose, troubleshoot, and resolve issues impacting the SaaS product(s)
- Gathering and analyzing real-time data to understand the scope and impact of the incident
- Prioritizing actions, delegating tasks, and making critical decisions to resolve the incident efficiently
- Keeping stakeholders (internal and external) informed about the situation, progress, and estimated resolution time
- Enable the swift resolution of incidents, minimize downtime, and implement preventive measures to mitigate future issues
- Drive and facilitate resolution via Teams as an incident commander with excellent executive presence, communication, and collaboration skills
- Collaborate and align with Leaders across Engineering, Sales, Corporate Comms, and Legal to accelerate incident resolution, remove blockers, and provide a high level of service to our customers
- Actively engage with cross-functional teams to ensure Root Cause Analyses (RCAs) and Post-Incident Reviews (PIRs) are complete, review remediation plans to identify areas for improvement, and socialize findings/insights
- Thrive under pressure with the ability to stay calm, handle conflict, and partner with other UKG teams to drive resolution
- Be able to coach other individual contributors in their professional development and serve as a role model
- Develop and monitor key metrics to understand incident trends, as well as operational resilience and readiness
- Develop and present business reviews on required cadences to executive leadership
Qualifications:
- 5+ years of experience supporting a global 24x7x365 incident management team in an enterprise SaaS environment
- 5+ years of technical experience (Support, Services, IT, Engineering) at a tech company, with exposure to working with a complex customer base
- 3+ years of working in a Cloud (AWS or GCP or Azure; GCP preferred) environment
- 3+ years of working in a scrum/agile/SRE environment (hands-on experience is a plus)
- 3+ years of working in an on-call support rotation model, with PagerDuty experience
- 3+ years of experience working with Teams (integrations with PagerDuty and ServiceNow), Slack, Confluence, and SharePoint
- Subject matter expertise in incident management frameworks; awareness of industry standards and best practices
- Excellent problem-solving and decision-making skills to identify root causes and implement corrective actions
- Clear and concise communication skills at all levels (written and verbal)
- Demonstrated ability to collaborate, build credibility, and establish good working relationships with leaders across UKG to ensure solid partnership and alignment
- Willingness and ability to work in a shift-based rotation model within a larger enterprise incident management team
- Hands-on experience working with the following tools and their integrations: JIRA, ServiceNow, Salesforce, and Aha (e.g. JIRA to PagerDuty integration, JIRA to Slack integration)
- Experience working in an Agile technical environment
- Experience working in a Cloud environment