Key Facts

Remote From:

Kenya, Turkey, Egypt, Ukraine, India, Vietnam, Thailand, Indonesia, Taiwan, China, Japan, Russia, Germany, France, Brazil ...

Fixed term

English

Hard Skills

AI Testing, Sales Management, Educational Evaluation, Red Teaming, Cross-Functional Collaboration, Hallucinations, Escalation Management, Evaluation Strategy, Qualitative Analysis, Regression Testing, +13 more

Other Skills

• Empathy
• Decision Making
• Professionalism
• Teamwork
• Decisiveness
• Detail Oriented
• Reliability
• Quality Control

Requirements:

Deep domain expertise in customer service, support operations, or CX
Experience handling customer inquiries, support workflows, or service escalations
Ability to apply criteria consistently and work with structured evaluation workflows
Native or professional fluency in one or more supported languages; English fluency for guidelines, feedback, and collaboration

Roles & Responsibilities

Evaluate AI outputs for customer service and support interactions using clearly defined rubrics
Conduct structured scoring, classification, comparison, and judgment tasks; assess accuracy, tone, empathy, clarity, and alignment with support standards
Identify hallucinations, policy violations, unsafe guidance, or escalation issues; apply domain-specific guidelines consistently
For senior-level evaluators, validate and refine evaluation rubrics, adjudicate disagreements, conduct error analysis, and collaborate with research/product/customer teams on evaluation design and model readiness

Lilt

About Lilt

LILT is the leading AI solution for enterprise translation. Our stack of Contextual AI, Connector APIs, and Human Adaptive feedback enables global organizations to adopt a true AI translation strategy, focusing on business outcomes instead of outputs. We bring human-powered, technology-assisted translations to global enterprises. We give organizations everything they need to scale their translation programs, go to market faster than ever, and improve the global customer experience. With Lilt, innovative, category-defining organizations like Intel, ASICS, WalkMe, and Canva are using AI-powered technology to deliver multilingual, digital customer experiences at scale. Lilt is based in San Francisco with global offices in Berlin, Dublin, Indianapolis, Washington, D.C., London, and Argentina, and is backed by Intel Capital, Sequoia, Redpoint, Zetta, and XSeed.

Company type: Scaleup

Founded: 2018

Company size: 51 - 200


Job description

Overview

LILT is building a global network of domain experts to support high-quality AI evaluation across training, benchmarking, red-teaming, and ongoing model monitoring. We are seeking customer service and support professionals to contribute expert judgment to human-in-the-loop AI evaluation workflows used by leading enterprises and hyperscalers.

This role is designed for professionals who understand how customer support interactions work in real operational environments and who can apply that expertise to evaluate, assess, and improve multilingual AI systems used in customer-facing contexts.

Your expertise will directly influence multilingual AI model quality, safety, and deployment readiness.

This role includes two distinct expert tracks, based on experience level and scope of responsibility.

Track A: Customer Service & Support AI Rater

Raters execute structured evaluation tasks using clearly defined rubrics and instructions.

Responsibilities

Evaluate AI outputs related to customer service and support interactions
Perform structured scoring, comparison, classification, and judgment tasks
Assess accuracy, clarity, tone, helpfulness, and alignment with support best practices
Identify hallucinations, misleading responses, policy violations, or unsafe guidance
Apply domain-specific customer support guidelines consistently across tasks

Ideal Background

Customer support professionals, service operations specialists, or CX practitioners
Experience handling customer inquiries, support workflows, or service escalation
Strong attention to detail and comfort working with structured evaluation criteria

Track B: Customer Service & Support AI Evaluator (Senior Track)

Evaluators provide higher-level domain oversight and help shape how evaluation is performed.

Responsibilities

Validate and refine evaluation rubrics and edge-case handling
Perform adjudication where raters disagree
Conduct error analysis and qualitative reviews of model behavior
Partner with LILT research, product, and customer teams on evaluation design
Support red-teaming, policy alignment, and model readiness assessments

Ideal Background

Senior support leaders, CX managers, or service quality specialists
Experience defining support standards, reviewing complex edge cases, or managing escalations
Ability to clearly explain nuanced service decisions and tradeoffs

Evaluation Focus & Requirements

Types of AI Evaluation Work

Depending on project demands, work may include:

Customer service and support content evaluation
Tone, empathy, and response quality assessment
Policy adherence and escalation handling evaluation
Red-teaming for harmful, unsafe, or misleading responses
Ongoing model monitoring and regression testing

What We Look For

Deep domain expertise in customer service, support operations, or CX
Strong judgment and ability to apply criteria consistently
Comfort working with structured evaluation workflows
Ability to explain reasoning clearly, especially in sensitive customer scenarios
Reliability, professionalism, and respect for quality standards

Engagement Model

Contract-based, flexible participation
Project-based work with clear expectations and timelines
Opportunities for recurring work based on performance and demand
Compensation communicated upfront per project or task type

Why This Work Matters

Your expertise helps ensure that AI systems:

Deliver accurate, helpful, and empathetic customer support
Align with enterprise service standards and policies
Are safe, reliable, and trustworthy across languages

Language Requirements

Native or professional fluency in one or more supported languages is required
Supported languages span 30+ global languages
Language-specific nuance is assessed through screening and task-based evaluation, not separate job descriptions
English fluency is required for guidelines, feedback, and collaboration

AI is changing how the world communicates — and LILT is leading that transformation.

LILT's mission is to make the world's information available to everyone, no matter the language they speak. Join our global community of people who thrive on innovation and excellence. Our collective knowledge, uniqueness, and skills deliver multilingual AI and human-verified services to enterprises, governments, and AI developers around the world.

Earn money. Have fun. Advance human knowledge. Work on diverse projects from anywhere, any time you want. Get paid quickly and fairly, and build your professional network in a supportive community—all through a streamlined application process tailored to your expertise.

Information collected and processed as part of your application process, including any job applications you choose to submit, is subject to LILT's Privacy Policy at https://lilt.com/legal/privacy.

At LILT, we are committed to a fair, inclusive, and transparent hiring process. As part of our recruitment efforts, we may use artificial intelligence (AI) and automated tools to assist in the evaluation of applications, including résumé screening, assessment scoring, and interview analysis. These tools are designed to support human decision-making and help us identify qualified candidates efficiently and objectively. All final hiring decisions are made by people. If you have any concerns, require accommodations, or would like to opt out of the use of AI in our hiring process, please let us know at recruiting@lilt.com.

LILT is an equal opportunity employer. We extend equal opportunity to all individuals without regard to an individual’s race, religion, color, national origin, ancestry, sex, sexual orientation, gender identity, age, physical or mental disability, medical condition, genetic characteristics, veteran or marital status, pregnancy, or any other classification protected by applicable local, state or federal laws. We are committed to the principles of fair employment and the elimination of all discriminatory practices.
