Research Engineer - Evaluations

extra parental leave - fully flexible
Work set-up: Full Remote

Offer summary

Qualifications:

  • Strong understanding of generative AI models like Diffusion Models, GANs, and Transformers.
  • Practical experience in designing data-driven evaluation methodologies.
  • Experience managing large-scale distributed model training across multiple GPUs.
  • Proficiency with machine learning frameworks such as PyTorch and cloud environments like AWS.

Key responsibilities:

  • Design and optimize infrastructure for automated evaluation systems using multimodal large language models.
  • Implement inference-time alignment techniques to improve model outputs.
  • Develop benchmarking frameworks to assess model quality and human alignment.
  • Collaborate with research teams to integrate evaluation systems into model development.

Canva | Computer Software / SaaS | Large (1001 - 5000 employees) | http://www.canva.com

Job description

Company Description

Join the team redefining how the world experiences design.

Servus, hey, g'day, mabuhay, kia ora, 你好, hallo, vítejte!

Thanks for stopping by. We know job hunting can be a little time-consuming and you're probably keen to find out what's on offer, so we'll get straight to the point.

Where and how you can work

Our flagship campus is in Sydney, Australia, but Austria is home to part of our European operations. And you have a choice in where and how you work: we trust our Canvanauts to choose the balance that empowers them and their team to achieve their goals.

Fun fact: a big part of our Austrian operations is developing the AI product within Canva, helping to reimagine how artificial intelligence can be used in design. Pretty cool, huh?

Job Description

At Canva, our mission is to empower the world to design. To ensure our generative AI models are truly helpful, we are seeking a talented Research/Machine Learning Engineer to build our next-generation evaluation system by leveraging automatic evaluations.

About the role:

You will engineer sophisticated AI agents that can automatically assess the quality and human alignment of our generative design models. This high-impact role focuses on building the practical systems that make cutting-edge research effective, providing a rapid feedback loop that guides the future of design generation at Canva and ultimately empowering millions of users to create.

At the moment, this role is focused on:

  • Agentic Evaluation Systems: Engineering autonomous AI agents that use Multimodal Large Language Models (MLLMs) to evaluate the quality, relevance, and human alignment of generated designs (a rough sketch of the idea follows this list).

  • Inference-Time Alignment: Mastering techniques that improve model outputs without full retraining, using inference-based methods such as prompt engineering, in-context learning, and Retrieval-Augmented Generation (RAG).

  • Model Benchmarking & Analysis: Building a rigorous framework to systematically benchmark internal and external quality understanding models, delivering clear, data-driven insights on human alignment.

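Purely to illustrate what an "MLLM-as-a-judge" loop looks like in practice (this is not a description of Canva's actual system), here is a minimal Python sketch. The `call_mllm` function and the rubric prompt are hypothetical stand-ins for a real multimodal model endpoint.

```python
# Minimal, illustrative MLLM-as-a-judge loop. All names are hypothetical;
# `call_mllm` stands in for whatever multimodal model endpoint is actually used.
import json
import statistics
from dataclasses import dataclass

RUBRIC_PROMPT = """You are judging a generated design against a user brief.
Brief: {brief}
Rate the design from 1 (poor) to 5 (excellent) on relevance, layout quality,
and adherence to the brief. Reply with JSON: {{"score": <int>, "reason": "<text>"}}."""

@dataclass
class Judgement:
    score: int
    reason: str

def call_mllm(prompt: str, image_path: str) -> str:
    """Placeholder for a real multimodal LLM call; returns JSON shaped as the rubric asks."""
    return json.dumps({"score": 4, "reason": "stub response"})

def judge_design(brief: str, image_path: str, n_samples: int = 3) -> Judgement:
    """Query the judge several times and aggregate to reduce single-sample noise."""
    scores, reasons = [], []
    for _ in range(n_samples):
        raw = call_mllm(RUBRIC_PROMPT.format(brief=brief), image_path)
        parsed = json.loads(raw)
        scores.append(int(parsed["score"]))
        reasons.append(parsed["reason"])
    return Judgement(score=round(statistics.mean(scores)), reason=reasons[0])

if __name__ == "__main__":
    print(judge_design("A minimalist poster for a jazz concert", "design_001.png"))
```

Sampling the judge several times and averaging, as in the sketch, is one common way to reduce single-call variance before the scores feed back into model development.
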
Primary responsibilities:

  • Design, build, and optimize the infrastructure for an MLLM-as-a-Judge evaluation system for scalable, automated feedback.

  • Implement and experiment with inference-time alignment techniques (prompt engineering, RAG, ICL) to directly improve model output quality (see the retrieval sketch after this list).

  • Establish and manage a comprehensive benchmarking process to compare various foundation models on design-centric tasks.

  • Analyze evaluation data to identify model failure modes and provide actionable recommendations to the research team.

  • Collaborate with research scientists and ML engineers to integrate the agentic judge system into the model development lifecycle.

  • Translate the latest research in LLM evaluation and agentic AI into practical, production-ready engineering solutions.

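As an equally rough illustration of the inference-time alignment idea (prompt engineering, in-context learning, RAG), the sketch below retrieves the most similar past briefs and splices them into the prompt as in-context examples. The `embed` function and the exemplar data are stand-ins, not real components of any production system.

```python
# Illustrative-only sketch of retrieval-augmented prompting: retrieve the most
# similar past briefs and prepend them to the prompt as in-context examples.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stub embedding: hash characters into a fixed-size vector (not meaningful,
    only here so the example runs end to end; a real system would use a learned encoder)."""
    vec = np.zeros(64)
    for i, ch in enumerate(text.encode()):
        vec[i % 64] += ch
    return vec / (np.linalg.norm(vec) + 1e-8)

EXEMPLARS = [
    ("Poster for a charity run", "Bold headline, single accent colour, clear date block"),
    ("Birthday card for a child", "Playful typography, bright palette, large illustration"),
]

def build_prompt(brief: str, k: int = 1) -> str:
    """Pick the k nearest exemplars by cosine similarity and prepend them to the prompt."""
    query = embed(brief)
    ranked = sorted(EXEMPLARS, key=lambda ex: -float(query @ embed(ex[0])))
    examples = "\n".join(f"Brief: {b}\nGood design notes: {n}" for b, n in ranked[:k])
    return f"{examples}\n\nBrief: {brief}\nGood design notes:"

if __name__ == "__main__":
    print(build_prompt("Poster for a community fun run"))
```
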
You're probably a match if you:

  • Have a strong understanding of generative AI models (e.g., Diffusion Models, GANs, Transformers) and their architectures, with practical experience that informs robust evaluation strategies.

  • Excel at creating data-driven evaluation methodologies, turning user analytics into clear, actionable insights.

  • Have successfully managed or optimized large-scale distributed model training across hundreds of GPUs.

  • Have a solid understanding of machine learning, have worked with PyTorch, and know how to optimize such code for speed.

  • Follow disciplined coding practices and are experienced with code reviews and pull requests.

  • Have experience working in cloud environments, ideally AWS.

Required profile

Experience

Industry: Computer Software / SaaS
Spoken language(s): English

Other Skills

  • Analytical Thinking
  • Collaboration
  • Problem Solving
