
Staff Production Infrastructure Engineer - System Engineering
- Hyderabad, Telangana
- Permanent
- Full-time
- Design, build, and manage large-scale hybrid and multi-cloud environments (AWS, Azure, GCP, or on-prem).
- Develop and maintain Infrastructure as Code (IaC) using Terraform and configuration management tools such as Puppet, Ansible, or Chef.
- Implement and maintain CI/CD pipelines to ensure reliable, automated cloud deployments.
- Write and maintain automation scripts and tools in Python and Go to improve efficiency, scalability, and resilience of infrastructure.
- Manage and troubleshoot core networking services (DNS, DHCP, routing, load balancing, firewalls) across on-prem and cloud environments.
- Build and optimize orchestration workflows leveraging the ServiceNow platform.
- Write test plans and automation for declarative infrastructure across multiple cloud environments.
- Research, evaluate, and adopt new cloud-native and open-source technologies to enhance infrastructure reliability and performance.
- Leverage AI-driven tools and frameworks to enhance cloud operations, automate repetitive tasks, and improve productivity.
- Explore integrating AI frameworks (e.g., LangChain, OpenAI API, or similar) into cloud and DevOps workflows.
- Stay ahead of advancements in AI agents and Retrieval-Augmented Generation (RAG) pipelines to build smarter, context-aware automation in cloud environments.
- 9+ years of experience managing and operating cloud and hybrid infrastructure at scale.
- Strong expertise in AWS, Azure, or GCP, including compute, storage, IAM, networking, monitoring, and security services.
- Hands-on experience with Terraform for Infrastructure as Code.
- Strong knowledge of CI/CD pipelines (Jenkins, GitLab CI/CD, GitHub Actions, or similar).
- Proficiency in Python and Go for automation and infrastructure tooling.
- Solid understanding of networking concepts and protocols, including DNS, DHCP, TCP/IP, routing, switching, firewalls, and load balancing.
- Strong understanding of Linux servers and their components (networking, storage, software packaging).
- Familiarity with Kubernetes and container orchestration (bonus).
- Experience with open-source tools and best practices for cloud operations and DevOps.
- AI-related skills (basic to intermediate):
  - Understanding of prompt engineering and how to interact effectively with AI systems.
  - Awareness of AI frameworks (LangChain, OpenAI, or similar) and their application in automation and cloud workflows.
  - Basic understanding of RAG pipelines and how AI agents can enhance observability, troubleshooting, and knowledge retrieval.
  - Ability to identify practical use cases of AI in day-to-day cloud operations (e.g., log analysis, monitoring, documentation automation).
- Strong communication skills with the ability to document and present both technical and AI-enhanced solutions to diverse stakeholders.