Lead Consultant – Enterprise AI Platforms
AstraZeneca
- Chennai, Tamil Nadu
- Permanent
- Full-time
- Collaborate closely with data science teams to design, deploy and manage the Kubernetes platform for Machine Learning.
- Provide the necessary infrastructure and platform to support the deployment and monitoring of ML solutions in production, and optimise those solutions for performance and scalability.
- Deploy systems, applications, and tooling for data science in AWS cloud environments.
- Liaise with BTG data scientists to understand their challenges and work with them to help productionise ML pipelines, models and algorithms for innovative science.
- Take responsibility for all aspects of software engineering, from design to implementation, QA and maintenance, with support from ML experts.
- Liaise with other teams to enhance our technology stack and enable the adoption of the latest advances in data processing and AI.
- 12+ years' experience, or equivalent, architecting and managing large Kubernetes clusters
- Experience managing a service mesh such as Istio
- Experience of Kubernetes ML platforms and toolkits (Kubeflow)
- Knowledge of Linux and shell scripting
- Certified Kubernetes Administrator/Developer
- Experience of scheduling strategies on clusters with different node types
- Modern DevOps mindset, using best-of-breed DevOps toolchains such as Docker, Git and Jenkins
- Experience with infrastructure-as-code tools such as Ansible, Terraform and CloudFormation
- Experience managing and automating real-world platforms/applications on AWS
- Strong coding skills, with proficiency in Python; exceptional ability in any language will be recognized.
- Experience with system monitoring tools such as Grafana, Prometheus and Thanos
- Experience with continuous integration and with building continuous delivery pipelines using tools such as Helm and ArgoCD
- Experience with open-source and cloud-native Machine Learning Platforms and Toolkits
- Demonstrable knowledge of building MLOps environments to a production standard
- Understanding of Kubernetes internal networking and its effect on the performance of multi-node GPU ML training
- Experience in declarative management of Kubernetes objects using tools such as Kustomize
- Multi-cloud experience (AWS/Azure/GCP)
- Data storage experience with RDBMS and NoSQL technologies
- Experience in mentoring, coaching and supporting less experienced colleagues and clients.
- Experience with SAFe agile principles and practices