Vacancy Description
Responsibilities
- System Development & Maintenance
Contribute to the development, optimization, and maintenance of core components of the machine learning platform, including feature stores, experiment tracking systems, model registries, workflow orchestration, and serving frameworks.
- Training Efficiency Optimization
Assist in optimizing the performance of distributed training frameworks (e.g., PyTorch DDP, DeepSpeed, FSDP) on large-scale clusters, addressing challenges such as resource scheduling and communication bottlenecks.
- Inference Performance Optimization
Participate in model deployment and serving, including performance profiling and acceleration through model compilation (e.g., TVM, TensorRT), operator optimization, computation graph optimization, and batching strategies.
- Infrastructure Support
Leverage technologies such as containerization (Docker), orchestration (Kubernete...
Submit your application for Machine Learning Systems Engineer (MLSys) at HPC AI TECHNOLOGY PTE. LTD.