Vacancy Description
This role owns the eval harness and quality gate from the beginning. It replaces the old late‑stage “Evals Specialist” model with a standing owner for measurable agent quality.
Key Responsibilities
- Build and maintain the MVP eval harness: golden tasks, exception tasks, scorecard metrics, and regression packs.
- Wire evals into CI so quality regressions fail builds and releases.
- Define and maintain release‑gate thresholds with Product and the Tech Lead.
- Lay the path for later adversarial and drift‑testing expansion without overbuilding MVP scope.
Requirements
- Experience evaluating ML, LLM, or non‑deterministic systems.
- Strong test and benchmark design capability.
- Comfort working with noisy metrics, thresholds, and probabilistic behavior.
- Good scripting and automation skills.
- Uses AI to generate candidate eval cases and failure hypotheses, but never confuses gene...
Ready to Apply?
Apply now
Submit your application for Agent Quality / Evals Engineer 1754 at Softgic S.