TechOps-DE-AMS-Platform Eng-Manager
- Kochi, Kerala
- Permanent
- Full-time
- Identify opportunities for, and drive the development and management of, tools and platform capabilities that serve operations engineers, reducing toil and increasing productivity.
- Create self-service platforms and golden paths that give operations engineers autonomy.
- Design, implement, and optimize dashboards, logging, and alerting systems using tools such as Splunk, AppDynamics, and Datadog.
- Manage observability for cloud-native workloads (AWS/Azure) and hybrid environments.
- Drive automation using cloud, monitoring, and SDLC tools to integrate observability into CI/CD pipelines and ITSM tools (e.g., ServiceNow).
- Bring together cloud, AI/ML platforms, automation technologies, DevSecOps, ITSM, and observability into a coherent, supportable managed services ecosystem. Architect and integrate end-to-end toolchains across layers such as:
  - Cloud & DevSecOps: Azure (preferred), AWS, GCP; Kubernetes; Terraform; GitHub/Azure DevOps; GitOps
  - Data & AI: Azure OpenAI, Databricks, Snowflake, vector databases, MLOps pipelines, prompt/version management
  - Automation: UiPath, Power Automate, Azure Automation, event-driven workflows
  - ITSM & AIOps: ServiceNow, Jira Service Management, Dynatrace, Azure Monitor, Prometheus/Grafana
  - Security: Entra ID, Key Vault, Sentinel/Splunk, DLP and data governance tools
- Monitor production performance (SLAs/KPIs) and drive incident resolution, including on-call management.
- Maintain scalable, secure, and resilient engineering and cloud application architectures while managing day-to-day operations.
- Define SLAs, SLOs, OLAs, service catalogs, and automation backlogs.
- Guide service onboarding, knowledge transfer using Knowledge-Centered Service (KCS), runbook creation, and CMDB population.
- Increase automation coverage and reduce MTTR through self-healing and AIOps patterns.
- Mentor and develop junior engineers on site reliability engineering (SRE) principles.
- Collaborate with cross-functional teams (data engineers, engineering teams, developers, the AI/Automation team, architects, and business stakeholders) to identify and drive transformation opportunities.
- Ensure compliance with EY's risk management and security standards.
- Strong experience with public cloud services (AWS, Azure, GCP), DevOps, and automation tools.
- Strong skills in programming/scripting (Python, Bash, or PowerShell) for custom automation.
- Deep understanding of SLIs, SLOs, distributed tracing, and log management.
- Experience building and implementing tools, automation and AI solutions in digital engineering.
- Strong track record in managing delivery quality, project execution, and cross-functional collaboration.
- Strong understanding of, and hands-on experience managing, ITSM, ITIL, product engineering, and agile methodologies.
- Excellent communication, leadership, and stakeholder management skills.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 10-12+ years of experience in engineering delivery or managed services, with at least 2-5 years developing, implementing, and managing engineering platforms and tools.
- Expert knowledge of monitoring and observability tools (e.g., Datadog, Splunk, Dynatrace, AppDynamics, New Relic).
- Enthusiastic learners with a passion for cloud services (AWS, Azure, GCP), DevOps, and automation tools, and strong programming/scripting skills (Python, Bash, or PowerShell).
- Problem solvers with a proactive approach to troubleshooting and optimization.
- Team players who can collaborate effectively in a remote or hybrid work environment.
- Detail-oriented professionals with strong documentation skills.