Vacancy Description
Job Responsibilities
Own production reliability (SLOs, capacity, incident response, postmortems) and turn every incident into a durable fix in code or automation.
Build the platform and tooling that make services easy to deploy, observe, and operate: CI/CD, infrastructure-as-code, observability stacks, runbooks-as-code.
Apply AI agentically across operations (triage, root-cause analysis, remediation, change review) and contribute to our internal agentic ecosystem.
Design and integrate the systems underneath our services: messaging (e.g. Kafka), orchestration (e.g. Kubernetes), and performance-sensitive infrastructure.
Partner with product engineers on release readiness, rollout strategy, and production hardening before things ship.
Continuously reduce toil: measure it, attack it with code, and raise the floor on what "easy to maintain" looks like.
Job Requirements
<...
Ready to Apply?
Apply Now
Submit your application for Site Reliability Engineer (SRE) at SGX