
Cloud Ops Engineer
- Bangalore, Karnataka
- Permanent
- Full-time
- Be a champion for department initiatives and values by ensuring all actions promote the department’s mission statement
- Participate in product release cycles by working closely with Engineering Managers, Architects, and Developers.
- Work towards automating the product deployment to various environments by integrating with continuous integration (CI) and continuous delivery (CD) tools, monitoring, and change management practices.
- Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in the environment.
- Implement monitoring, alerting, notification, and metrics collection for:
- Infrastructure and application performance
- System uptime
- Error rate
- Monitor and continually improve the capacity and reliability of our production infrastructure.
- Investigate and fix performance and scalability bottlenecks, proactively identify issues and create work items to improve stability and performance.
- Respond to alerts from production systems, and identify and resolve root causes in a timely fashion.
- Identify single points of failure and other high-risk architecture issues, and propose resilient solutions that mitigate the risk and improve system reliability.
- Identify opportunities for automation to reduce operational workload; build scripts and introduce new tools and practices as needed.
- Work with other Cloud Infrastructure Engineers and developers to ensure maximum performance, reliability, and automation of our deployments and infrastructure.
- Work with, consult and influence developers on new features and software architecture to ensure scalability.
- Communicate with stakeholders and handle deployment, maintenance, and support efficiently.
- Ticket Handling and Support
- Tickets should be handled with clear communication and with the correct stakeholders involved.
- Tickets should be completed within the SLA; any delay or improperly raised ticket should be clearly communicated and documented.
- Tickets should be closed with proper comments, including resolution steps and screenshots.
- Repetitive tickets should be discussed in the standup call and, where appropriate, resolved permanently through automation.
- 4+ years of experience with a public cloud provider such as Amazon Web Services (AWS) or Microsoft Azure, and with on-prem servers
- Solid understanding of standard TCP/IP networking, Windows IIS, load balancing, and common protocols such as DNS and HTTPS
- Good knowledge of CI/CD tools such as Octopus Deploy, Azure DevOps (ADO), GitHub Actions, and Jenkins
- Monitoring and Logging: Experience with any application monitoring and logging tools (e.g. Datadog, New Relic, AppDynamics, Application Insights, ELK, Prometheus).
- Good understanding of web servers and databases
- [Optional] Good understanding of Docker and Kubernetes.
- Good scripting knowledge and understanding of software life cycle models.
- Good understanding of DevOps practices.
- Should have worked on high-traffic, highly scalable systems in the past
- Knowledge of the fundamental aspects of release automation (packaging, dependencies, promotion, deployment, compliance)
- A passion for collecting, evaluating, and improving performance metrics.
- Excellent time management, resource organization and priority establishment skills, and ability to multi-task in a fast-paced environment
- Ability to work quickly and efficiently with minimal supervision
- Excellent written and verbal communication skills
- Should be able to handle 12-hour on-call shifts on a weekly rotation for symplr products.
- Able to work the US daytime shift and coordinate with team members in the US and India to complete day-to-day tasks.
- Have HEART. To work here, you must be:
- Humble – self-aware and respectful
- Effective – measurably move the needle & immeasurably add value
- Adaptable – innately curious and constantly changing
- Remarkable – stand out in some way
- Transparent – openly and honestly sharing knowledge
- 3+ years of Systems Engineering experience in the following areas:
- Cloud platforms (Azure, AWS) and On-Prem Servers
- Windows and Linux Servers
- Application Monitoring Tools (Datadog, New Relic, AppDynamics, Application Insights)
- Log Aggregation Tools (Datadog, ELK, etc.)
- PowerShell, Bash, or Python scripting
- CI/CD tools (Azure Pipelines, GitHub Actions, Jenkins, Octopus, etc.)
- Infrastructure management tools (Terraform, Ansible, etc.)
- Application Hosting (IIS, Apache, Tomcat)
- Alerting (PagerDuty)
- Ticketing (ADO Boards and Ivanti)
- Documentation (Confluence)
- Bachelor’s degree or equivalent experience