AI Models MAD - Model Automation and Dashboarding Engineer
Advanced Micro Devices
- Hyderabad, Telangana
- Permanent
- Full-time
- Model Testing & Validation: Build and maintain automated functional and performance testing pipelines for AI models across ROCm-supported hardware using scalable tools.
- Software Engineering Excellence: Proficiency in Python and a good understanding of C++, with deep experience in automation, integration testing, debugging, and robust test design to ensure reliable, maintainable, high-performance codebases.
- Benchmarking Infrastructure: Develop tools and automation for continuous benchmarking and regression tracking across hardware generations and ROCm releases.
- Dashboard & Metrics Development: Build and maintain real-time dashboards that report relevant performance, accuracy, and reliability metrics for both internal and public users.
- Ecosystem Integration: Collaborate with teams to support a wide range of models, including public and private/NDA workloads.
- Scalable Tooling: Contribute to the design of portable, easy-to-use Python interfaces that support multi-node profiling, distributed workloads, and containerized deployments.
- Open-Source Contributions: Support public-facing MAD GitHub repositories and Docker releases, enabling the community to run and validate models on ROCm.
- Programming & Tooling: Strong Python development skills, with experience in test automation, CI/CD, and Linux scripting.
- Knowledge of Docker and of frameworks for provisioning and managing infrastructure (such as Kubernetes or Ansible), especially for testing and deploying AI models and services at scale.
- Machine Learning Workflow Understanding: Familiarity with AI frameworks (e.g., PyTorch, TensorFlow), model benchmarking, and ML model lifecycles.
- Performance Analysis: Strong experience with profiling tools, system monitoring, or regression tracking systems for deep learning models.
- DevOps & Dashboards: Solid experience in performance dashboards, visualization tools (e.g., Grafana, Plotly), and metrics collection pipelines.
- Software Engineering Practices: Proficiency with version control (GitHub), testing strategies, code reviews, and collaborative software development.
- Communication & Ownership: Strong written and verbal communication skills with a proactive approach to defining and driving development efforts.
- Undergraduate and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.