Senior Systems Engineer II, SRE
Marriott Tech Accelerator
- Hyderabad, Telangana
- Permanent
- Full-time
- Ensure the reliability, availability, and performance of mission-critical cloud services, implementing best practices for monitoring, alerting, and incident management.
- Oversee the management of high-severity incidents, driving quick resolution and post-incident analysis to identify root causes and prevent recurrence.
- Drive the automation of operational processes and ensure systems can scale effectively to support growing user demand, optimizing cloud and on-prem infrastructure and resource usage.
- Develop and execute an SRE strategy aligned with business goals, and communicate service health, reliability, and performance metrics to senior leadership and stakeholders.
- Assess application architectures to identify key monitoring points.
- Identify key performance indicators (KPIs), apply monitoring, and report on compliance.
- Gather information to develop reporting metrics and KPIs.
- Ensure that all applications adhere to appropriate monitoring standards based on their technology or business process.
- Determine forums and cadence for regular monitoring updates.
- Collaborates with the Enterprise Application, Architecture, and Infrastructure teams to continuously improve processes and procedures.
- Liaises with vendors and service providers to select services and tools that best meet company goals.
- Functions as a strategic senior technical expert within the department.
- Develops specific goals and plans to prioritize, organize, and accomplish work.
- Champions leadership's vision for product and service delivery.
- Makes and executes the necessary decisions to keep moving forward toward achievement of goals.
- Determines priorities, schedules, plans, and necessary resources to ensure projects are completed on schedule.
- Generates and provides accurate and timely results in the form of reports, presentations, etc.
- Plans, develops, implements, and evaluates the quality of operations.
- Understands and meets the needs of key stakeholders.
- Communicates concepts in a clear and persuasive manner that is easy to understand.
- Demonstrates an understanding of business priorities.
- Supports achievement of performance goals, budget goals, team goals, etc.
- Provides technical expertise and technical leadership within own and other teams.
- Provides recommendations to improve the effectiveness of processes and programs.
- Demonstrates advanced knowledge of job-relevant issues, products, systems, and processes.
- Demonstrates advanced knowledge of function-specific procedures.
- Applies knowledge/judgment to achieve business goals.
- Foresees, identifies and resolves problems.
- Keeps up-to-date technically and applies new knowledge to job.
- Performs other reasonable duties as required for this position.
- 6+ years of experience in information technology process and/or technical project management, including:
- 2+ years of experience as a Site Reliability Engineer (SRE) building and managing highly available, mission-critical systems, with 3+ years of experience on public cloud (preferably AWS) and with EKS, Linux, Ansible, Harness, and Dynatrace
- Expertise in web and application servers (Apache, Tomcat, IIS) and Kafka
- Strong scripting skills (Python, Shell, PowerShell, Ansible)
- Familiarity with Infrastructure as Code (IaC) tools such as Terraform and CloudFormation
- Monitoring and observability experience using Dynatrace
- Proven automation and programming experience in one or more of the following languages: Java, Python, Go, Perl, Bash
- Deep understanding of SRE practices such as Service Level Objectives, Error Budgets, Toil Management, Observability & Monitoring, Blameless Postmortems, Incident Response Process, Capacity Planning
- Experience with deploying, monitoring, and troubleshooting large-scale, distributed applications in cloud environments such as AWS
- Familiarity with security frameworks such as ISO 27001, SOC 2, PCI DSS, and/or HIPAA
- Experience working with SaaS, IaaS, and PaaS offerings
- Ability to work with global teams located in the US and India
- 6+ years of experience in a technical discipline role, with experience planning, implementing, and evaluating processes, systems, and/or initiatives
- Broad technical acumen across multiple disciplines and applications, with a solid understanding of current technologies
- Experience applying measurement processes or methods to assess program outputs and outcomes, or progress toward goals and objectives
- Exceptional analytical ability when solving complex problems
- Ability to work across organizational boundaries to help lead and influence change
- Ability to drive processes across all levels to ensure customer focus, including being assertive and self-starting
- Demonstrated leadership experience in influencing and gaining alignment from external organizations
- Ability to align change management strategies with projects
- Skilled in conceptualizing creative solutions, documenting them, and presenting and selling them to senior management
- Strong interpersonal skills to work effectively with others, motivate employees, and drive results in a team environment
- Undergraduate/Bachelor's degree in Computer Science, Information Systems, or a related field