Role Overview
The AI Agent Evaluation Engineer is responsible for ensuring the quality, accuracy, explainability, and reliability of AI agent systems across the Proof-of-Concept, Pilot, and Production stages. The role focuses on establishing enterprise-grade evaluation frameworks for agentic AI, LLMs, and AI-driven workflows so that outputs are trustworthy, measurable, and continuously improving.
Key Responsibilities
• Design and implement evaluation frameworks for AI agents, LLMs, and RAG-based systems.
• Measure accuracy, relevance, consistency, hallucination rate, and task success across AI outputs.
• Establish baseline and comparative evaluations across models, prompts, and agent strategies.
• Validate agent decision logic, reasoning paths, and tool usage for explainability and traceability.
• Support human-in-the-loop (HITL) evaluation for high-impact or high-risk use cases.
• Partner with engineering teams to improve prompts, retrieval strategies, and agent orchestration.
• Validate AI observability, monitoring, drift detection, and regression controls.
• Support vendor PoCs, pilots, and RFP evaluations with fact-based assessments.
Required Qualifications
• Experience evaluating Generative AI, LLMs, and agentic AI systems.
• Strong understanding of AI/ML evaluation metrics and error analysis.
• Hands-on experience with Python and AI evaluation workflows.
• Familiarity with RAG architectures, prompt evaluation, and agent orchestration.
• Experience with cloud AI platforms (Azure or GCP preferred).
Preferred Qualifications
• Experience in Education, Healthcare, or other regulated domains.
• Exposure to synthetic data generation and test scenario design.
• Familiarity with AI governance, risk, and compliance practices.
Success Measures
• Measurable improvement in AI accuracy, reliability, and trustworthiness.
• Clear visibility into why AI agents made specific decisions.
• Standardized evaluation frameworks adopted across AI initiatives.
• Increased leadership confidence in AI-driven outcomes.
