Vacancy Description
Role Overview
We are looking for a Lead Site Reliability Engineer with 6-7 years of experience to drive reliability, observability, and incident management practices. The ideal candidate will have strong expertise in the Grafana stack, production monitoring, and handling critical incidents in high-availability systems.
Key Responsibilities
- Act as the Incident Commander during production outages, ensuring timely resolution and stakeholder communication.
- Lead incident response, triage, RCA (Root Cause Analysis), and postmortems.
- Build and enhance observability systems using the Grafana stack (Prometheus, Loki, Tempo).
- Define and manage SLIs, SLOs, and SLAs for critical services.
- Develop and maintain monitoring, alerting, and dashboards for proactive issue detection.
- Collaborate with Dev, Infra, an...
Ready to Apply?
Submit your application for Lead - Site Reliability Engineer at FundsIndia