Senior Systems Engineer II - SRE
Marriott Tech Accelerator
- Hyderabad, Telangana
- Permanent
- Full-time
- Ensure the reliability, availability, and performance of mission-critical cloud services, implementing best practices for monitoring, alerting, and incident management.
- Oversee the management of high-severity incidents, driving quick resolution and post-incident analysis to identify root causes and prevent recurrence.
- Drive the automation of operational processes and ensure systems can scale effectively to support growing user demand, optimizing cloud and on-prem infrastructure and resource usage.
- Develop and execute the SRE strategy aligned with business goals, and communicate service health, reliability, and performance metrics to senior leadership and stakeholders.
- Assess application architectures to identify key monitoring points.
- Identify Key Performance Indicators, apply monitoring, and report on compliance.
- Gather information to develop reporting metrics and KPIs.
- Ensure that all applications adhere to appropriate monitoring standards based on their technology/business process.
- Determine forums and cadence to provide regular monitoring updates.
- Collaborates with Enterprise Application, Architecture, and Infrastructure teams to continuously improve processes and procedures.
- Liaises with vendors and service providers to select services and tools that best meet company goals.
- Functions as a strategic senior technical expert within the department.
- Develops specific goals and plans to prioritize, organize, and accomplish work.
- Champions leaders' vision for product and service delivery.
- Makes and executes the necessary decisions to keep moving forward toward achievement of goals.
- Determines priorities, schedules, plans, and necessary resources to ensure projects are completed on schedule.
- Generates and provides accurate and timely results in the form of reports, presentations, etc.
- Plans, develops, implements, and evaluates the quality of operations.
- Understands and meets the needs of key stakeholders.
- Communicates concepts in a clear and persuasive manner that is easy to understand.
- Demonstrates an understanding of business priorities.
- Supports achievement of performance goals, budget goals, team goals, etc.
- Provides technical expertise and technical leadership within own and other teams.
- Provides recommendations to improve the effectiveness of processes and programs.
- Demonstrates advanced knowledge of job-relevant issues, products, systems, and processes.
- Demonstrates advanced knowledge of function-specific procedures.
- Applies knowledge/judgment to achieve business goals.
- Foresees, identifies and resolves problems.
- Keeps up-to-date technically and applies new knowledge to job.
- Performs other reasonable duties as required for this position.
- 6-8 years of experience in information technology process and/or technical project management, including:
- 4+ years of experience as a Site Reliability Engineer (SRE), building and managing highly available and mission-critical systems, with 2+ years of experience on public cloud, preferably AWS.
- 4+ years of project lead or management experience, preferably in SRE areas
- Proven automation and programming experience in one or more of the following languages: Java, Python, Go, Perl, Bash
- Deep understanding of SRE practices such as Service Level Objectives, Error Budgets, Toil Management, Observability & Monitoring, Blameless Postmortems, Incident Response Process, Capacity Planning
- Strong working knowledge of modern, continuous development techniques and pipelines (Agile, Kanban, Jira, CI/CD, Jenkins, Git, Artifactory)
- Production-level expertise with container orchestration engines such as Kubernetes
- Experience with deploying, monitoring, and troubleshooting large-scale, distributed applications in cloud environments such as AWS
- Familiarity with security frameworks such as ISO 27001, SOC 2, PCI DSS, and/or HIPAA
- Experience working with SaaS, IaaS, and PaaS offerings
- Ability to work with global teams located in the US and India
- 6+ years of experience in a technical discipline role, with experience in planning, implementing, and evaluating processes, systems, and/or initiatives
- Broad technical acumen across multiple disciplines and applications, with a solid understanding of current technologies
- Experience applying measurement processes/methods for assessing program outputs and outcomes or progress toward goals and objectives.
- Exceptionally strong analytical ability when working through complex problems
- Ability to work across organizational boundaries, to help lead and influence change
- Ability to drive the process across all levels to ensure customer focus, including being assertive and self-starting
- Demonstrated leadership experience in influencing and garnering alignment from external organizations
- Ability to align change management strategies with projects
- Skilled in conceptualizing creative solutions, documenting them, and presenting/selling them to senior management
- Very high level of interpersonal skills to work effectively with others, motivate employees, and elicit work output in a team environment
- Undergraduate degree in Computer Science or related technical field or equivalent experience/certification