Supervisor, Data Operations & Management
McDonald's
- Hyderabad, Telangana
- Permanent
- Full-time
- AIOps, MLOps, Python, Docker, Kubernetes, Machine Learning, Artificial Intelligence, GCP, AWS
- 4 - 8 years
- Deploy, monitor, and manage ML/AI models on Vertex AI, including training pipelines, model endpoints, and batch prediction workflows.
- Provide MLOps support, including troubleshooting model performance, debugging ML pipelines, and resolving production incidents.
- Implement AIOps practices, including monitoring, alerting, and automated remediation for AI/ML systems using GCP tools.
- Collaborate with Data Scientists to operationalize ML models, optimize performance, and ensure reliability.
- Manage CI/CD pipelines for ML workflows using tools like Cloud Build, GitHub Actions, or Jenkins.
- Monitor and optimize GCP resources such as BigQuery, Cloud Storage, Dataflow, Dataproc, Cloud Functions, Cloud Run, and Pub/Sub for AI/ML workloads.
- Ensure compliance with security, governance, and data privacy best practices in ML workflows.
- Build dashboards and observability solutions using Google Cloud Monitoring/Logging, Dataplex, and third-party tools.
- Troubleshoot and optimize ML pipelines, including data preprocessing, feature engineering, and model retraining.
- 2-4 years of experience in MLOps, AI operations, or ML engineering on cloud platforms, preferably GCP.
- Hands-on experience with Vertex AI for model deployment, training pipelines, and endpoint management.
- 2+ years of experience with code management and reliability using GitHub, pytest, and SonarQube.
- 1+ years of experience with Tableau.
- Strong knowledge of MLOps practices, including CI/CD, versioning, automated testing, and model monitoring.
- Proficiency in Python, SQL, and ML frameworks like TensorFlow or PyTorch.
- Experience with GCP services: BigQuery, Cloud Storage, Dataflow, Dataproc, Cloud Functions, Cloud Run, Pub/Sub, Dataplex.
- Familiarity with AI monitoring, alerting, and observability tools (Cloud Monitoring, Logging, Dataplex).
- Strong problem-solving skills and ability to troubleshoot ML pipelines and AI system issues.
- Understanding of AIOps concepts, including automated anomaly detection, model drift detection, and incident response for AI systems.
- GCP certifications such as Professional Machine Learning Engineer or Professional Data Engineer.
- Experience with feature stores, model registries, or MLflow.
- Familiarity with Kubeflow, Airflow, or orchestration tools for ML pipelines.
- Experience in data privacy, governance, and compliance for ML systems.