Vacancy Description
GPU Infrastructure / Performance Engineer
London (Onsite) | Visa Sponsorship + Relocation
Join a frontier AI company backed by NVIDIA, building large-scale open-weight foundation models alongside researchers and engineers from DeepMind, OpenAI, Meta, Anthropic, and Google Brain.
⚡ What You’ll Do
Optimise GPU performance and training efficiency across 1,000+ GPU clusters
Improve utilisation, throughput, and reliability across distributed training infrastructure
Build tooling for orchestration, monitoring, scheduling, and observability
Work closely with research teams to accelerate large-scale model training
What They’re Looking For
Deep GPU infrastructure / distributed systems experience
Strong knowledge of CUDA, NCCL, PyTorch, DeepSpeed, JAX, Megatron-LM, vLLM, etc.
Experience operating large-scale GPU clusters (1,000+ GPUs)
Kubernetes, Slurm, or similar orchestration expertise
BONUS: Experience working on NVIDIA Blackwell chips (B200,...