MTS Systems Design Eng. (Machine Learning)
Advanced Micro Devices
- Hyderabad, Telangana
- Permanent
- Full-time
- Kernel Development:
- Design and implement a highly optimized C++ kernel library for Generative AI models.
- Collaborate with the research and software teams to integrate these kernels into our existing software stack.
- Vector Processor Optimization:
- Work closely with hardware engineers to understand the architecture of VLIW vector core units such as MAC, GEMM, and non-linear function units.
- Develop vectorized code that leverages SIMD (Single Instruction, Multiple Data) and ILP (Instruction-Level Parallelism) for maximum performance.
- Performance Profiling and Tuning:
- Profile and analyze the performance of existing kernels.
- Identify bottlenecks and optimize critical sections for better throughput.
- Graph Acceleration:
- Accelerate machine learning graphs using operator fusion and linear approximations.
- Explore quantization approaches (INT4, FP4, FP16) to improve performance while maintaining numerical stability and accuracy.
- Testing and Validation:
- Develop CPU models for the ML operators in C++/Python to validate accuracy.
- Write unit tests and integration tests to ensure correctness and reliability.
- Validate kernel performance across different hardware platforms.
- Documentation and Collaboration:
- Document design specifications for new kernels and the resulting performance improvements.
- Follow coding guidelines, and use tools such as Git to maintain code, create pull requests, and manage documentation.
- Collaborate with cross-functional teams, including machine learning researchers and software engineers.
- Excellent C/C++ and Python coding skills.
- Good understanding of SIMD/Tensor/VLIW processor architecture to exploit parallelism.
- Experience with vectorized programming (SIMD) and parallel computing.
- Experience with CNNs, LSTMs, LLMs, and diffusion models is required.
- Familiarity with machine learning frameworks (e.g., TensorFlow, PyTorch) is a plus.
- Knowledge of low-level hardware details (cache hierarchy, memory access patterns) is desirable.
- Excellent problem-solving skills and a passion for performance optimization.