Role overview

Qualifications

Experience evaluating ML, LLM, or non-deterministic systems.
Strong test and benchmark design capability.
Comfort working with noisy metrics, thresholds, and probabilistic behavior.
Good scripting and automation skills.

Responsibilities

Build and maintain the MVP eval harness: golden tasks, exception tasks, scorecard metrics, and regression packs.
Wire evals into CI so quality regressions fail builds and releases.
Define and maintain release-gate thresholds with Product and the Tech Lead.
Lay the path for later adversarial and drift-testing expansion without overbuilding MVP scope.

Key facts

Remote from: Anywhere
Full time
Quality Improvement Engineer
English

Hard skills

Benchmarking
Scripting
Test Design
Business Metrics
AI Testing

Other skills

Quality Assurance
Problem Solving
Teamwork

About the company

Softgic

We are a young, growing company with operations in Medellin and Bogota. We build technology solutions in close collaboration with our customers and our team, so that those solutions add value to our customers' organizations and business processes.

Company details

Company size: 51 - 200

Job description

This is a remote position.

This role owns the eval harness and quality gate from the beginning. It replaces the old late-stage “Evals Specialist” model with a standing owner for measurable agent quality.

Requirements

AI-First Expectations

• Uses AI to generate candidate eval cases and failure hypotheses, but never confuses generated tests with validated quality.

• Approaches AI quality as an operating system, not a QA afterthought.

What Success Looks Like in the First 90 Days

• The first reference agent has a published scorecard and gated eval path.

• Golden and exception tests run automatically.

• The team can explain what “good enough to ship” means in measurable terms.
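
For candidates who want a concrete picture of “good enough to ship” in measurable terms, here is a minimal sketch of a release gate in Python. It is an illustration only, not Softgic's harness: the metric names, threshold values, and the `Threshold`, `pass_rate`, `gate`, and `run_gate` helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Minimum acceptable value for one scorecard metric (hypothetical)."""
    metric: str
    minimum: float


def pass_rate(results: list[bool]) -> float:
    """Fraction of eval tasks that passed; 0.0 for an empty pack."""
    return sum(results) / len(results) if results else 0.0


def gate(scorecard: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return readable failures; an empty list means the gate passes."""
    failures = []
    for t in thresholds:
        value = scorecard.get(t.metric)
        if value is None:
            # A metric that was never measured should block the release too.
            failures.append(f"{t.metric}: missing from scorecard")
        elif value < t.minimum:
            failures.append(f"{t.metric}: {value:.2f} < {t.minimum:.2f}")
    return failures


def run_gate(scorecard: dict[str, float], thresholds: list[Threshold]) -> int:
    """Print each failure and return a process exit code (0 = ship, 1 = block)."""
    failures = gate(scorecard, thresholds)
    for line in failures:
        print("GATE FAIL", line)
    return 1 if failures else 0


# Assumed example values; a real harness would compute these by running the agent.
example_scorecard = {
    "golden_pass_rate": pass_rate([True, True, True, False]),
    "exception_pass_rate": pass_rate([True, False]),
}
example_thresholds = [
    Threshold("golden_pass_rate", 0.90),
    Threshold("exception_pass_rate", 0.50),
]
```

A CI step could call `raise SystemExit(run_gate(example_scorecard, example_thresholds))`, so a non-zero exit code fails the build. Because agent behavior is probabilistic, a real gate would likely average each metric over repeated runs rather than trust a single pass.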

Agent Quality / Evals Engineer 1754
