IT Reliability & Incident Lead
Costco
- Hyderabad, Telangana
- Permanent
- Full-time
- Defines, captures, and validates IT requirements and other artifacts, ensuring appropriate stakeholders are involved.
- Develops key team deliverables and dashboards.
- Documents and manages risks, issues, assumptions, and constraints impacting operational support efforts.
- Develops and coordinates legal/compliance, operational controls, and associated metrics to measure success.
- Develops and implements standards, processes, and procedures for new technology solutions; ensures newer solutions will not negatively impact current service commitments.
- Manages the incident and problem management process and team members involved in resolving the incident and problem.
- Responds to a reported incident and initiates the incident management process.
- Remediates deviations from the current incident management process.
- Acts as the point of contact for all major incidents.
- Analyzes internal IT customer needs and priorities while initiating operational support and delivery efforts.
- Participates in periodic audits for solutions, planning, and delivery functions.
- Ensures incidents that are not immediately resolved are appropriately escalated according to defined service level agreements (SLAs).
- Drives toward key performance indicators (KPIs), improving metrics and services for our members and stakeholders.
- Identifies and reports incident and problem trends and progress.
- Ensures timely, clear communication regarding high priority issues with the appropriate stakeholders.
- Works closely with the incident owner to ensure incident escalation processes are in line with the overall incident management processes.
- Manages and tracks supplier performance; leverages approved contractual terms for accountability.
- Develops and conducts presentations as needed.
- Represents the first stage of escalation for incidents.
- Monitors and analyzes the incidents reported to ensure that SLAs are respected, RCAs are prepared, and preventive actions are in place.
- Identifies, initiates, schedules, and conducts incident reviews.
- Ensures users and leadership are informed about incident status at regular intervals.
- Ensures the closure of all incident records that are resolved and confirmed by end users.
- Ensures that process performance, activities, roles and responsibilities, and procedures are continuously reviewed and enhanced wherever applicable.
- Facilitates collaboration with problem management to ensure successful transition of incidents into problem investigations.
- Ensures an RCA is prepared and schedules RCA reviews with the teams that worked on the incident.
- Records all details and timeline of key elements during incident management bridge calls.
- Undertakes continual service improvement activities.
- Creates, maintains, and reports SLAs and KPIs.
- Identifies and reports incident trends and progress.
- Ensures the team and other stakeholders in the call understand the business impact.
- Collaborates with appropriate business and IT stakeholders to determine root cause and problem identification, and as appropriate, enhancement identification for future development work.
- Supports eCommerce releases for both pre- and post-release activities.
- Regular and reliable workplace attendance at your assigned location.
- Excellent verbal and written communication skills. Ability to create accurate, concise correspondence. Ability to develop and conduct presentations.
- Strong, proven interpersonal skills and the ability to work well with people at all levels.
- Ability to conduct monthly meetings with stakeholders to drive increased availability based on identified trends.
- Detail-oriented, with strong problem-solving skills and the ability to analyze a situation for potential future problems.
- Organized and thorough, with a dedication to follow-through.
- Intellectually inquisitive nature with the ability to be open-minded to varying opinions.
- Responsible and conscientious, with a passion for excellence and a positive, can-do attitude.
- Innovative, creative, and extremely responsive with respect to service quality and ways in which it can be improved.
- Highly responsive and available to support business needs, flexing as needed.
- Good understanding of corporate IT policies, procedures, and standards.
- Incident, Problem, Change, and Knowledge Management practices.
- IT strategies, customers, and services provided.
- Costco's core business environment related to eCommerce, Merchandising, Warehouse Operations, and company philosophies.
- Service analysis and other tools: CARTS+, Google Apps, Smartsheets.
- Available for 24x7 on-call coverage to support off-hours work as required, including weekends and holidays; coverage fluctuates with staffing.
- Develops and presents a business case document and/or presentation to management. Must be familiar with a broad set of technologies and solutions currently in use at Costco.
- Demonstrates ability to work independently and with limited supervision.
- Strong abstraction skills - ability to derive general rules and concepts from the usage and classification of specific examples, literal signifiers, and first principles.
- Strong communication skills; able to speak to large audiences and to leaders at all levels of the organization. Able to adapt vocabulary and style for each situation, to represent complex ideas with effective documents and visuals, and to adapt presentations to the expectations and background of the audience.
- Extremely responsive, with a strong sense of urgency.
- Familiarity with ServiceNow.
- Experience with statistical analysis and reporting.
- Familiarity with multiple Costco business areas from an IT perspective.
- Knowledge of the Service Desk or Call Center business processes.
- IT Infrastructure Library (ITIL) V3 Foundation certification.
- Prior experience with IT Service Management software.
- Proficient in Google Workspace applications, including Sheets, Docs, Slides, and Gmail.
- Successful internal candidates will have spent one year or more on their current team.