
AI Evaluation Engineer

unlimited holidays - fully flexible
Remote: Full Remote

Offer summary

Qualifications:

  • MS/PhD in Computer Science, Machine Learning, Artificial Intelligence or a related field
  • 2+ years of experience evaluating AI and/or ML systems
  • Proficiency in Python and experience with machine learning frameworks like scikit-learn, TensorFlow, PyTorch
  • Knowledge of retrieval-augmented generation (RAG) and agent-based workflows

Key responsibilities:

  • Design and implement evaluation frameworks and performance metrics for AI systems
  • Develop tools and dashboards for observability in the AI development lifecycle
  • Collaborate cross-functionally to integrate monitoring and testing methodologies into workflows
  • Identify bottlenecks and propose solutions for high accuracy and reliability in AI components.

Trunk Tools Startup https://trunktools.com/
11 - 50 Employees

Job description

At Trunk Tools, we are tackling the massive $13 trillion+ construction industry. We’re an exceptional team of serial entrepreneurs, brought together by our shared mission: automating construction. Our founding team (SpaceX, Stanford, MIT, Carta, etc.) has successfully built and deployed software in construction for 140k+ users, built software for millions of users beyond the construction space, and worked on $2 billion+ of built-environment projects. We aren’t another out-of-touch tech startup; most of our team comes from construction.

We spent the last few years building the brain behind construction. Now we are deploying workflows and agents, starting with a Q&A document chatbot, that become ingrained in construction teams’ workflows, with the ultimate goal of automating construction. Given our immense traction with several Fortune 500 construction companies, we are doubling our team (currently 45 FTE) in order to deploy several more agents this year. You will have an opportunity to drive the transformation of a multi-trillion-dollar industry full of waste, risks and inefficiencies.

What you will do and achieve:

  • Design and implement rigorous evaluation frameworks and performance metrics for AI systems (including RAG and agent-based architectures)

  • Develop tools, dashboards, and processes that bring observability to every step of the AI development lifecycle

  • Collaborate cross-functionally to embed best-in-class monitoring and testing methodologies into production workflows

  • Identify bottlenecks and propose solutions to ensure high accuracy and reliability across all AI components

  • Stay at the forefront of industry trends in LLMs, measurement techniques, and agent architectures to enhance system evaluation capabilities

Who you are:

  • MS/PhD in Computer Science, Machine Learning, Artificial Intelligence or a related field

  • 2+ years of experience evaluating AI and/or ML systems, with a focus on performance metrics and validation

  • Hands-on experience with observability, analytics platforms, or data engineering to create robust monitoring pipelines

  • Proficiency in Python and strong experience with machine learning frameworks such as scikit-learn, TensorFlow, PyTorch

  • Knowledge of retrieval-augmented generation (RAG) and agent-based workflows, including best practices for measuring their performance

  • Experience with synthetic data generation or test automation to validate model robustness

  • Strong problem-solving skills and a collaborative mindset, eager to work in a fast-paced environment

Preferred but not required: 

  • Experience with reinforcement learning, reward function design and policy optimization

  • Construction industry knowledge or an interest in automating complex, large-scale processes

What we offer 😎

  • 🎖️ A close-knit and collaborative early-stage startup environment where every voice is heard and every opinion matters.

  • 💰 Competitive salary and stock option equity packages.

  • 🏥 3 medical plans to choose from, including a 100% covered option, plus dental and vision insurance!

  • 💰 401K

  • 🤓 Learning & Growth stipend.

  • 🥨 Free lunch provided in our NYC and Austin offices - you’ll never go hungry with us!

  • 🛫 Unlimited PTO: we truly believe in work-life balance and that hard work should be balanced with time for rest and rejuvenation.

  • 🏝 IRL / In-Person retreats throughout the year.

We realize applying for jobs can feel daunting at times. We don’t expect you to check all the qualification boxes and encourage you to apply if you have experience in some of the areas.

At Trunk Tools, we’re working hard to build a more productive and safer environment within the construction industry, and we strive to live by these same values here at Trunk Tools HQ. As an equal-opportunity employer, we are committed to building an inclusive environment where you can be you. We work hard to evaluate all employees and job applicants consistently, without regard to race, color, religion, gender, national origin, age, disability, pregnancy, gender expression or identity, sexual orientation, or any other legally protected class. 

 

Required profile

Experience

Spoken language(s):
English

Other Skills

  • Collaboration
  • Problem Solving
