
System Administrator, NOC
- Pune, Maharashtra
- Permanent
- Full-time
- Monitor cloud infrastructure and perform level 1 and level 2 troubleshooting.
- Configure and manage various monitoring tools.
- Perform Windows patching and vulnerability remediation.
- Troubleshoot infrastructure issues and Active Directory access.
- Develop and implement automation solutions.
- Provide analysis and insights to various teams.
- Create, operationalize, and manage NOC Standard Operating Procedures (SOPs), including quarterly reviews and adjustments.
- Perform system health checks and maintain related reports and communications.
- Participate in stand-up meetings, conference calls, bridge/incident calls, documentation, and Root Cause Analysis (RCA) preparation.
- Implement and maintain security measures.
- Collaborate with other IT teams to ensure seamless operations.
- Lead troubleshooting efforts for complex issues.
- Perform advanced system maintenance and upgrades.
- Mentor and train junior administrators.
- Investigate issues outside of defined SOPs.
- Work closely with technical teams (e.g., developers, sys admins, DBAs, infra teams).
- Execute corrective actions identified during post-incident reviews and RCA.
- Identify and resolve performance bottlenecks as part of NOC operations.
- Develop and implement automation solutions to streamline NOC operations using Terraform, GitHub, and Ansible.
- Collaborate with cross-functional teams to identify automation opportunities and improve operational efficiency.
- Maintain and update automation scripts and configurations to ensure optimal performance and reliability.
- Monitor and troubleshoot automation processes to quickly resolve any issues.
- Document automation workflows and provide training to team members as needed.
- Apply SRE principles to enhance system reliability and performance.
- Utilize PowerShell scripting to automate routine tasks and improve system management.
- Bachelor’s degree in computer science or a related field.
- 5+ years of experience in monitoring tool configuration and system administration.
- 2+ years of experience in Windows/Linux troubleshooting and patching.
- Proven experience with Terraform, GitHub, and Ansible.
- Knowledge of Site Reliability Engineering (SRE) principles.
- Proficiency in PowerShell scripting.
- Strong problem-solving skills and attention to detail.
- Strong teamwork abilities.
- Ability to work in a fast-paced environment and manage multiple tasks simultaneously.
- Excellent communication skills, both verbal and written.
- Experience with diagnostic and monitoring tools such as SolarWinds, Dotcom Monitor, Datadog, Nagios, and Prometheus.
- Flexibility to provide on-call support during weekends if required and adapt to a constantly changing environment.
- Experience with Windows server patching tools such as Ivanti and Endpoint Central.
- Experience using agile methodologies (e.g., Scrumban) to manage project work.
- Familiarity with the healthcare/health insurance industry.
- Familiarity with service desk platforms (e.g., JIRA, Remedy, ServiceNow).
- Experience working closely with technical teams (e.g., developers, sys admins, DBAs, infra teams).
- Ability to review vulnerability reports and prioritize vulnerabilities for remediation based on severity/impact.
- Highly collaborative attitude.
- Low ego and a team player.
- Relevant certifications (e.g., AZ-900, ITIL, CCNA, Datadog, SolarWinds).