Softgic

Agent Quality / Evals Engineer 1754

Key Facts

Remote From: 
Full time
English

Other Skills

  • Quality Assurance
  • Problem Solving
  • Teamwork

Job description

This is a remote position.

This role owns the eval harness and quality gate from the beginning. It replaces the old late-stage “Evals Specialist” model with a standing owner for measurable agent quality.

Key Responsibilities

• Build and maintain the MVP eval harness: golden tasks, exception tasks, scorecard metrics, and regression packs.
• Wire evals into CI so quality regressions fail builds and releases.
• Define and maintain release-gate thresholds with Product and the Tech Lead.
• Lay the path for later adversarial and drift-testing expansion without overbuilding MVP scope.


Requirements

Must-Have Qualifications

• Experience evaluating ML, LLM, or non-deterministic systems.
• Strong test and benchmark design capability.
• Comfort working with noisy metrics, thresholds, and probabilistic behavior.
• Good scripting and automation skills.

AI-First Expectations

• Uses AI to generate candidate eval cases and failure hypotheses, but never confuses generated tests with validated quality.
• Approaches AI quality as an operating system, not a QA afterthought.

What Success Looks Like in the First 90 Days

• The first reference agent has a published scorecard and gated eval path.
• Golden and exception tests run automatically.
• The team can explain what “good enough to ship” means in measurable terms.
