
Principal DevOps Engineer
- Pune, Maharashtra
- Permanent
- Full-time

Key responsibilities:
- Design and develop the DevOps infrastructure for business-critical systems.
- Maintain and improve container-based Kubernetes environments.
- Develop and integrate observability solutions across the stack (infrastructure, application, network, and user experience) to monitor system health and provide actionable insights.
- Work with developers and engineers to ensure that all relevant services, applications, and infrastructure components are instrumented according to current observability best practices (e.g., logging, tracing, and metrics collection).
- Set up automated alerting systems for real-time detection of performance bottlenecks, failures, or anomalies, and integrate them with incident management workflows.
- Build pipelines for data collection, storage, and visualisation that help teams gain insights from monitoring data.
- Use observability data to improve system reliability, availability, and performance by driving root cause analysis and continuous improvement initiatives.
- Implement automated solutions for monitoring and alerting that scale with platform growth and reduce manual intervention.
- Develop and maintain comprehensive documentation on monitoring, alerting, and incident response processes, and provide training and support so engineering teams can use observability tools effectively.