ML Research Platform Engineer (Distributed Training & HPC)

QNT Partners

Singapore | Full-time | June 18, 2026

Vacancy Description

About the role

We are looking for a platform engineer to build the infrastructure that powers our next-generation machine learning research. Think: large-scale experimentation, distributed training, and reproducibility.

This is not an applied ML role. You will not be fine-tuning LLMs or building agents. Instead, you will build the systems that enable researchers to train models at scale.

What you will own
  • Distributed training pipelines for GPU-accelerated workloads (PyTorch, JAX)
  • Experiment management and model versioning
  • Resource scheduling on on-premise HPC clusters and cloud (Slurm, Kubernetes)
  • Observability and debugging for complex training jobs
  • Data lineage and artifact tracking
Must haves (non-negotiable)
  • 2+ years building large-scale distributed systems for research or data-intensive...
