GPU Infrastructure Support Engineer

CloudEngine Digital

Kuala Lumpur, Kuala Lumpur, Malaysia | Full-time | May 31, 2026

Vacancy Description

We are seeking an Infrastructure Support Engineer to join the Global Infrastructure team. The role covers GPU system delivery, incident detection, triage, basic remediation, runbook execution, monitoring, and clear escalation to the Site Reliability Engineering (SRE) team, while helping to improve operational runbooks and observability.

Responsibilities

  1. Provide first- and second-line technical support to customers of the AI Infrastructure (GPU/CPU nodes, networking, storage, orchestration, platform services) via ticketing systems, email, Slack, or other messaging systems.
  2. Monitor system health and service-level indicators (alerts, dashboards); respond to alerts 24x7 as scheduled.
  3. Triage incidents: gather context, verify scope and impact, and follow standard operating procedures and runbooks to perform immediate mitigations.
  4. Escalate to the global SRE team with clear, concise incident notes and relevant logs/traces.
  5. Maint...
