This is a remote position.
Review and evaluate AI/LLM model outputs for accuracy, completeness, relevance, and logical consistency.
Design structured test scenarios and evaluation frameworks for AI responses.
Write, optimize, and refine prompts to improve model performance.
Identify hallucinations, inconsistencies, bias, and reasoning gaps in AI outputs.
Perform functional validation of AI-generated test cases, summaries, or responses.
Create and maintain golden datasets to support model benchmarking and training.
Ensure edge case and negative scenario coverage during testing.
Document evaluation findings with clear reasoning and supporting evidence.
Collaborate with AI/ML and product teams to improve overall model quality.
Strong knowledge of SDLC, STLC, and the defect lifecycle.
Experience in Manual Testing and/or API Testing.
Strong analytical and logical thinking skills.
Practical understanding of AI tools, LLM behavior, and their limitations.
Ability to write clear, structured, and outcome-driven prompts.
Strong documentation and stakeholder communication skills.
Experience with quality metrics and evaluation frameworks is preferred.
