
Monitoring & Observability Tech Lead
- Chennai, Tamil Nadu
- Permanent
- Full-time
- Maintain, support, and optimize the existing SolarWinds Orion platform.
- Take operational ownership of the new Grafana and Prometheus stack once the implementation project delivers it.
- Ensure observability platforms are scalable, secure, and well-integrated with cloud-native services (Azure PaaS, OCI PaaS).
- Define and evolve the monitoring strategy, standards, and roadmaps, covering areas such as APM, AI Ops, and self-healing…
- Provide technical leadership for the projects that deliver the roadmap features.
- Create and maintain dashboards, alerts, custom exporters, synthetic checks, and health probes.
- Drive collaboration between cloud, DevOps, infrastructure, and application teams to onboard services and ensure end-to-end visibility.
- Lead troubleshooting and root cause analysis on the observability platform.
- Mentor team members and promote knowledge sharing across operations and engineering teams.
- Must have:
- Strong technical experience with SolarWinds Orion for infrastructure, database, and application monitoring.
- Strong technical experience with Grafana and Prometheus.
- Hands-on knowledge of monitoring best practices in hybrid and multi-cloud environments.
- Proven experience leading technical initiatives or monitoring/observability projects.
- Strong scripting or automation experience (PowerShell, Python, Bash, Terraform).
- Deep understanding of metrics, logs, telemetry pipelines, synthetic monitoring, and alerting strategies.
- Nice to have:
- Familiarity with Azure Monitor, Log Analytics, Application Insights, or OCI Monitoring and Logging.
- Experience with Prometheus exporters and integrating custom metrics.
- Knowledge of ITSM practices and integration with platforms like ServiceNow.
- Exposure to AI Ops, anomaly detection, or predictive alerting tools.
- Toolbox:
- Monitoring: SolarWinds Orion, Grafana, Prometheus
- Cloud: Azure, OCI
- Automation: PowerShell, Bash, Python, Terraform
- Telemetry: Azure Monitor, Application Insights, OCI Logging
- ITSM: ServiceNow
- CI/CD: Azure DevOps (nice-to-have for future integrations)