MLOps Engineer

Agivant

  • Pune, Maharashtra
  • Permanent
  • Full-time
  • 9 days ago
Job Description
About the Role:
We are seeking a highly skilled DevOps/MLOps Engineer to design, implement, and manage the infrastructure and deployment pipelines for cutting-edge ML systems. The ideal candidate will have strong expertise in CI/CD, container orchestration, cloud platforms, and observability tools to ensure the performance, scalability, and reliability of AI-driven workflows.
Key Responsibilities:
  • CI/CD Pipeline Setup: Design, implement, and maintain CI/CD pipelines for deploying ML systems using tools like Jenkins, GitHub Actions, or GitLab CI.
  • Performance & Reliability Monitoring: Monitor and optimize the performance, scalability, and reliability of ML systems.
  • Infrastructure Scaling: Scale infrastructure to support AI workflows efficiently across multiple environments.
  • Observability & Monitoring: Implement and manage observability tools (Prometheus, Grafana, ELK Stack) for real-time monitoring and alerting.
  • Vector Database Infrastructure: Set up and manage infrastructure for vector databases to support AI-driven applications.
  • ML Infrastructure: Focus on machine learning infrastructure, model deployment, and MLOps practices for AI systems.
  • End-to-End ML Pipelines: Design and implement end-to-end machine learning pipelines, automating model deployment, monitoring, and versioning across development, staging, and production environments using tools such as Kubernetes, Docker, MLflow, the Azure cloud platform, and n8n.
Requirements
Required Skills & Qualifications:
  • CI/CD Expertise: Strong experience with Jenkins, GitHub Actions, GitLab CI.
  • Scripting: Proficiency in Python and Bash for automation of deployment, scaling, and maintenance tasks.
  • Containerization & Orchestration: Hands-on experience with Docker, Kubernetes, and Helm charts.
  • Infrastructure as Code (IaC): Experience with Terraform, Ansible, or CloudFormation for automated infrastructure provisioning and version control.
  • Monitoring & Observability: Familiarity with Prometheus, Grafana, and the ELK Stack for system health and performance tracking.
  • Cloud Platforms: Proficient in AWS, GCP, or Azure for provisioning and scaling compute, storage, and networking resources.
Preferred Qualifications:
  • Experience with ML systems or AI/ML infrastructure.
  • Knowledge of vector databases (e.g., Pinecone, Weaviate, Milvus).
  • Strong problem-solving and troubleshooting skills in distributed systems.
  • Programming: Python, Shell scripting, YAML
  • Containerization: Docker, Kubernetes, Helm
  • Cloud Platforms: Azure ML
  • CI/CD: GitHub Actions, Azure DevOps
  • ML Tools: MLflow, Kubeflow
  • Monitoring: Grafana, Azure Monitor
  • Orchestration: Apache Airflow, n8n, Azure Data Factory
  • Experience: 3+ years in DevOps/Infrastructure, 2+ years in ML systems