At Trunk Tools, we are tackling the massive $13 trillion+ construction industry. We’re an exceptional team of serial entrepreneurs, brought together by our shared mission: automating construction. Our founding team (SpaceX, Stanford, MIT, Carta, etc.) has built and deployed software used by 140k+ construction users and millions more outside construction, and has worked on $2+ billion of built-environment projects. We aren’t another out-of-touch tech startup; most of our team comes from construction.
We spent the last few years building the brain behind construction. Now we are deploying workflows and agents, starting with a Q&A document chatbot, that become ingrained in construction teams’ day-to-day work, with the ultimate goal of automating construction. Given our immense traction with several Fortune 500 construction companies, we are doubling our team (currently 45 FTEs) in order to deploy several more agents this year. You will have the opportunity to drive the transformation of a multi-trillion-dollar industry full of waste, risk, and inefficiency.
What you will do and achieve:
Design and implement rigorous evaluation frameworks and performance metrics for AI systems (including RAG and agent-based architectures)
Develop tools, dashboards, and processes that bring observability to every step of the AI development lifecycle
Collaborate cross-functionally to embed best-in-class monitoring and testing methodologies into production workflows
Identify bottlenecks and propose solutions to ensure high accuracy and reliability across all AI components
Stay at the forefront of industry trends in LLMs, measurement techniques, and agent architectures to enhance system evaluation capabilities
Who you are:
MS/PhD in Computer Science, Machine Learning, Artificial Intelligence or a related field
2+ years of experience evaluating AI and/or ML systems, with a focus on performance metrics and validation
Hands-on experience with observability, analytics platforms, or data engineering to create robust monitoring pipelines
Proficiency in Python and strong experience with machine learning frameworks such as scikit-learn, TensorFlow, or PyTorch
Knowledge of retrieval-augmented generation (RAG) and agent-based workflows, including best practices for measuring their performance
Experience with synthetic data generation or test automation to validate model robustness
Strong problem-solving skills, a collaborative mindset, and eagerness to work in a fast-paced environment
Preferred but not required:
Experience with reinforcement learning, reward function design, and policy optimization
Construction industry knowledge or an interest in automating complex, large-scale processes
What we offer 😎
🎖️ A close-knit and collaborative early-stage startup environment where every voice is heard and every opinion matters.
💰 Competitive salary and stock option equity packages.
🏥 Three medical plans to choose from, including a 100% covered option. Plus dental and vision insurance!
💰 401(k)
🤓 Learning & Growth stipend.
🥨 Free lunch provided in our NYC and Austin offices - you’ll never go hungry with us!
🛫 Unlimited PTO. We truly believe in work-life balance: hard work should be balanced with time for rest and rejuvenation.
🏝 IRL / In-Person retreats throughout the year.
We realize applying for jobs can feel daunting at times. We don’t expect you to check all the qualification boxes and encourage you to apply if you have experience in some of the areas.
At Trunk Tools, we’re working hard to build a more productive and safer environment within the construction industry, and we strive to live by these same values here at Trunk Tools HQ. As an equal-opportunity employer, we are committed to building an inclusive environment where you can be you. We work hard to evaluate all employees and job applicants consistently, without regard to race, color, religion, gender, national origin, age, disability, pregnancy, gender expression or identity, sexual orientation, or any other legally protected class.