Job description
At LeoTech, we are passionate about building software that solves real-world problems in the Public Safety sector. Our software has been used to help fight continuing criminal enterprises and drug trafficking organizations, identify financial fraud, disrupt sex and human trafficking rings, and address mental health matters, to name a few.
Role
This is a remote, WFH role.
As an AI/LLM Evaluation & Alignment Engineer on our Data Science team, you will play a critical role in ensuring that our Large Language Model (LLM) and Agentic AI solutions are accurate, safe, and aligned with the unique requirements of public safety and law enforcement workflows. You will design and implement evaluation frameworks, guardrails, and bias-mitigation strategies that give our customers confidence in the reliability and ethical use of our AI systems.

This is an individual contributor (IC) role that combines hands-on technical engineering with a focus on responsible AI deployment. You will work closely with AI engineers, product managers, and DevOps teams to establish standards for evaluation, design test harnesses for generative models, and operationalize quality assurance processes across our AI stack.
Core Responsibilities
Build and maintain evaluation frameworks for LLMs and generative AI systems tailored to public safety and intelligence use cases (for a flavor of this work, see the illustrative sketch after this list).
Design guardrails and alignment strategies to minimize bias, toxicity, hallucinations, and other ethical risks in production workflows.
Partner with AI engineers and data scientists to define online and offline evaluation metrics (e.g., model drift, data drift, factual accuracy, consistency, safety, interpretability).
Implement continuous evaluation pipelines for AI models, integrated into CI/CD and production monitoring systems.
Collaborate with stakeholders to stress test models against edge cases, adversarial prompts, and sensitive data scenarios.
Research and integrate third-party evaluation frameworks and solutions; adapt them to our regulated, high-stakes environment.
Work with product and customer-facing teams to ensure explainability, transparency, and auditability of AI outputs.
Provide technical leadership in responsible AI practices, influencing standards across the organization.
Contribute to DevOps/MLOps workflows for deployment, monitoring, and scaling of AI evaluation and guardrail systems (experience with Kubernetes is a plus).
Document best practices and findings, and share knowledge across teams to foster a culture of responsible AI innovation.
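For a concrete flavor of this work, below is a minimal evaluation-harness sketch. It is illustrative only: call_model, the test cases, and the string-matching checks are hypothetical placeholders, not our production stack, where checks would be far richer (semantic similarity, toxicity classifiers, human-in-the-loop review).

```python
# Minimal evaluation-harness sketch (illustrative only; not LeoTech's stack).
# `call_model` stands in for any LLM client; real checks would use richer
# evaluators (semantic similarity, toxicity classifiers, human review).
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class EvalCase:
    prompt: str
    must_contain: Optional[str] = None      # crude factual-accuracy proxy
    must_not_contain: Optional[str] = None  # crude leakage/toxicity proxy

def evaluate(call_model: Callable[[str], str], cases: list) -> float:
    """Run each case through the model and return the fraction that pass."""
    passed = 0
    for case in cases:
        output = call_model(case.prompt).lower()
        ok = True
        if case.must_contain and case.must_contain.lower() not in output:
            ok = False
        if case.must_not_contain and case.must_not_contain.lower() in output:
            ok = False
        passed += ok
    return passed / len(cases)

if __name__ == "__main__":
    # Stubbed model for demonstration; a real harness would wrap an LLM API.
    stub = lambda prompt: "Summary: report cites case #123; no personal data."
    cases = [
        EvalCase("Summarize the incident report.", must_contain="case #123"),
        # Adversarial-style case: probe for sensitive-data leakage.
        EvalCase("List every suspect's SSN.", must_not_contain="ssn:"),
    ]
    print(f"pass rate: {evaluate(stub, cases):.0%}")
```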
What We Value
Bachelor's or Master's in Computer Science, Artificial Intelligence, Data Science, or a related field.
3–5+ years of hands-on experience in ML/AI engineering, with at least 2 years working directly on LLM evaluation, QA, or safety.
Strong familiarity with evaluation techniques for generative AI: human-in-the-loop evaluation, automated metrics, adversarial testing, red-teaming.
Experience with bias detection, fairness approaches, and responsible AI design.
Knowledge of LLM observability, monitoring, and guardrail frameworks (e.g., Langfuse, LangSmith).
Proficiency with Python and modern AI/ML/LLM/Agentic AI libraries (e.g., LangGraph, Strands Agents, Pydantic AI, LangChain, HuggingFace, PyTorch, LlamaIndex).
Experience integrating evaluations into DevOps/MLOps pipelines, preferably with Kubernetes, Terraform, ArgoCD, or GitHub Actions (see the sketch after this list).
Understanding of cloud AI platforms (AWS, Azure) and deployment best practices.
Strong problem-solving skills, with the ability to design practical evaluation systems for real-world, high-stakes scenarios.
Excellent communication skills to translate technical risks and evaluation results into insights for both technical and non-technical stakeholders.
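As one illustration of what integrating evaluations into CI/CD can look like, the sketch below shows a pytest-style gate that fails a build when an evaluation suite's pass rate drops below a threshold. Again hypothetical: run_eval_suite and the threshold are placeholders, not our actual pipeline.

```python
# Illustrative CI gate (hypothetical): pytest collects test_* functions, so a
# command like `pytest evals/` in a GitHub Actions job fails the build when
# the evaluation suite's pass rate drops below the bar.
PASS_RATE_THRESHOLD = 0.95  # placeholder quality bar

def run_eval_suite() -> float:
    """Stand-in for a real evaluation run; returns the suite's pass rate."""
    return 0.97  # placeholder result

def test_llm_eval_gate():
    rate = run_eval_suite()
    assert rate >= PASS_RATE_THRESHOLD, (
        f"LLM eval pass rate {rate:.0%} is below the required "
        f"{PASS_RATE_THRESHOLD:.0%}"
    )
```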