
Senior Site Resiliency Engineer (Application Monitoring & Troubleshooting + Azure + Angular + C#/.NET + APM Tools (Dynatrace, Datadog, App Insights))
- Hyderabad, Telangana
- Permanent
- Full-time
- Build and maintain application monitors to ensure our applications perform with high reliability
- Monitor applications to ensure we quickly identify and remediate issues impacting our customers
- Maintain knowledge of overall distributed system environments, utilities and procedures
- Participate in on-call rotations
- Provide timely, concise communication of incident status to appropriate personnel
- Document incident occurrence, root cause analysis and resolution(s) applied using designated repositories
- Evaluate conditions and suggest possible strategies to minimize risk(s) of incident recurrence
- Consult with and direct other staff personnel as required for effective incident resolution
- Resolve development and support issues of high complexity or risk
- Bachelor’s Degree in Computer Science, Engineering, Information Technology or equivalent experience
- Technology Certification (Optional) – DR (e.g., CDRE), Business Continuity (e.g., MBCP), Network (e.g., CCNA), Microsoft (e.g., MCP) or equivalent experience
- 8-10 years of experience in Information Technology, systems and application development/support, infrastructure support, and disaster recovery
- Minimum 3 years of experience in leading teams or project management
- Minimum 3 years of experience in advanced technology analysis and diagramming
- On-call duty in the potential event of business continuity incidents and disaster response (24/7/365)
- Potential for long hours in the event of actual business continuity incidents and disaster response
- No travel required
- Technology Management Experience
- Use of other business continuity software tools
- Reporting aptitude and capabilities (e.g., Excel, Power BI, Tableau)
- Good communication skills (English)
- 5 years’ experience in web application development, including troubleshooting issues by reading logs and resolving them
- 5 years’ experience analyzing technical problems and delivering solutions of high risk
- 4 years’ experience in Azure, .NET, Angular, SQL, REST
- 4 years’ experience in APM Tools (Datadog, App Insights, Dynatrace)
- 4 years’ experience in performance analysis
- Advanced knowledge of Datadog and Dynatrace dashboard creation (APM reporting, infrastructure health reporting)
- Familiarity with business operations, critical business processes, and interdependencies on systems and applications
- Familiarity with legal, regulatory and industry security requirements and frameworks, including but not limited to the following:
  - International Organization for Standardization (ISO/IEC 27001)
  - Payment Card Industry Data Security Standard (PCI DSS)
  - Sarbanes-Oxley Act (SOX)
  - Information Technology Infrastructure Library (ITIL)