ML Compiler Architect, Senior Principal

Requirements

  • A degree in Computer Science or Electrical Engineering (BS, MS, or PhD) with extensive experience in compiler development.
  • Deep expertise in MLIR, LLVM, Torch-MLIR, or similar frameworks for compiler design.
  • Strong understanding of model optimization techniques for inference, including quantization and operator fusion.
  • Experience deploying machine learning models on heterogeneous cloud hardware environments.

Roles & Responsibilities:

  • Design and implement a scalable MLIR-based compiler framework for cloud AI inference.
  • Lead development of compiler passes for model partitioning, operator fusion, and memory optimization.
  • Collaborate with cross-functional teams to ensure seamless integration of the compiler with system and cloud infrastructure.
  • Mentor and guide a team of compiler engineers to deliver high-performance inference software.

Job description

At d-Matrix, we are focused on unleashing the potential of generative AI to power the transformation of technology. We are at the forefront of software and hardware innovation, pushing the boundaries of what is possible. Our culture is one of respect and collaboration.

We value humility and believe in direct communication. Our team is inclusive, and our differing perspectives allow for better solutions. We are seeking individuals who are passionate about tackling challenges and driven by execution. Ready to come find your playground? Together, we can help shape the endless possibilities of AI.

Location:

Hybrid, working onsite at our Toronto, Ontario, Canada headquarters 3-5 days per week.

Role: Software Compiler Architect – MLIR/LLVM for Cloud Inference

What You Will Do:

As a hands-on Front-End Software Compiler Architect focused on cloud-based AI inference, you will drive the design and implementation of a scalable MLIR-based compiler framework optimized for deploying large-scale NLP and transformer models in cloud environments. You will architect the end-to-end software pipeline that translates high-level AI models into efficient, low-latency executables on a distributed, multi-chiplet hardware platform featuring heterogeneous compute elements such as in-memory tensor processors, vector engines, and hierarchical memory.
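
To make that pipeline concrete, here is a minimal sketch of importing a small PyTorch module into MLIR via Torch-MLIR's FX importer. This is illustrative only and not d-Matrix's actual flow; the `fx.export_and_import` entry point and its exact signature vary across torch-mlir versions.

```python
import torch
from torch_mlir import fx  # import path assumed; check your torch-mlir build

class FeedForward(torch.nn.Module):
    """A tiny transformer-style MLP block standing in for a real model."""
    def __init__(self, d=64):
        super().__init__()
        self.up = torch.nn.Linear(d, 4 * d)
        self.down = torch.nn.Linear(4 * d, d)

    def forward(self, x):
        return self.down(torch.nn.functional.gelu(self.up(x)))

model = FeedForward().eval()
example = torch.randn(1, 16, 64)

# Import into the MLIR torch dialect; downstream pipelines can then lower
# to linalg, run fusion/tiling passes, and target a hardware backend.
module = fx.export_and_import(model, example)
print(module)
```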

Your compiler designs will enable dynamic partitioning, scheduling, and deployment of inference workloads across a cloud-scale infrastructure, supporting both statically compiled and runtime-optimized execution paths. Beyond core compiler development, you will focus on strategies that minimize inference latency, maximize throughput, and efficiently utilize compute and memory resources in data center environments.
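
As a toy illustration of the load-balancing idea (not the actual scheduler), the sketch below greedily assigns layers to the least-loaded chiplet; every name in it is hypothetical, and a real scheduler must also model layer dependencies, memory capacity, and inter-chiplet bandwidth.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str
    est_latency_us: float  # profiled or analytically modeled cost

@dataclass
class Chiplet:
    name: str
    load_us: float = 0.0
    assigned: list = field(default_factory=list)

def partition(layers, chiplets):
    """Greedily place each layer on the currently least-loaded chiplet."""
    for layer in layers:
        target = min(chiplets, key=lambda c: c.load_us)
        target.assigned.append(layer.name)
        target.load_us += layer.est_latency_us
    return chiplets

layers = [Layer(f"block{i}", 120.0 + 10.0 * (i % 3)) for i in range(8)]
for chiplet in partition(layers, [Chiplet("c0"), Chiplet("c1"), Chiplet("c2")]):
    print(chiplet.name, round(chiplet.load_us, 1), chiplet.assigned)
```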

You will collaborate cross-functionally with systems architects, ML framework teams, runtime developers, performance engineers, and cloud orchestration groups to ensure seamless integration and optimized inference delivery at scale.

Key Responsibilities:

• Architect the MLIR-based compiler for cloud inference workloads, focusing on efficient mapping of large-scale AI models (e.g., LLMs, Transformers, Torch-MLIR) onto distributed compute and memory hierarchies.

• Lead the development of compiler passes for model partitioning, operator fusion, tensor layout optimization, memory tiling, and latency-aware scheduling (a toy fusion sketch follows this list).

• Design support for hybrid offline/online compilation and deployment flows with runtime-aware mapping, allowing for adaptive resource utilization and load balancing in cloud scenarios.

• Define compiler abstractions that interoperate efficiently with runtime systems, orchestration layers, and cloud deployment frameworks.

• Drive scalability, reproducibility, and performance through well-designed IR transformations and distributed execution strategies.

• Mentor and guide a team of compiler engineers to deliver high-performance, inference-optimized software stacks.
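
For a flavor of what an operator-fusion pass does, here is a toy rewrite over a made-up, list-based IR. Production passes operate on MLIR operations with proper dataflow analysis; this only shows the match-and-rewrite shape.

```python
def fuse_matmul_bias_relu(ops):
    """Rewrite adjacent matmul -> add -> relu chains into one fused op.

    A real pass would also verify the def-use chain (t0 feeds the add,
    t1 feeds the relu) before fusing; this sketch matches by position.
    """
    fused, i = [], 0
    while i < len(ops):
        window = ops[i:i + 3]
        if [op["kind"] for op in window] == ["matmul", "add", "relu"]:
            fused.append({
                "kind": "fused_matmul_bias_relu",
                # matmul operands plus the bias operand of the add
                "inputs": window[0]["inputs"] + window[1]["inputs"][1:],
                "output": window[2]["output"],
            })
            i += 3
        else:
            fused.append(ops[i])
            i += 1
    return fused

program = [
    {"kind": "matmul", "inputs": ["x", "w"], "output": "t0"},
    {"kind": "add", "inputs": ["t0", "b"], "output": "t1"},
    {"kind": "relu", "inputs": ["t1"], "output": "y"},
]
print(fuse_matmul_bias_relu(program))
```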

What You Will Bring:

• BS with 15+ years, MS with 12+ years, or PhD with 10+ years in Computer Science or Electrical Engineering, with experience in front-end compiler and systems software development and a focus on ML inference.

• Deep experience in designing or leading compiler efforts using MLIR, LLVM, Torch-MLIR, or similar frameworks.

• Strong understanding of model optimization for inference: quantization, fusion, tensor layout transformation, memory hierarchy utilization, and scheduling (a minimal quantization sketch follows this list).

• Expertise in deploying ML models to heterogeneous compute environments, with specific attention to latency, throughput, and resource scaling in cloud systems.

• Proven track record working with AI frameworks (e.g., PyTorch, TensorFlow), ONNX, and hardware backends.

• Experience with cloud infrastructure, including resource provisioning, distributed execution, and profiling tools.
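
As one concrete example of the optimizations listed above, a minimal sketch of symmetric per-tensor int8 post-training quantization follows. It is illustrative only: production flows calibrate per-channel scales, handle activations with calibration data, and fold scales into fused kernels.

```python
import numpy as np

def quantize_int8(x):
    """Map float values onto int8 with a single symmetric scale."""
    scale = max(float(np.abs(x).max()) / 127.0, 1e-8)  # guard all-zero tensors
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", float(np.abs(w - dequantize(q, s)).max()))
```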

Preferred Qualifications:

• Experience targeting inference accelerators (AI ASICs, FPGAs, GPUs) in cloud-scale deployments.

• Knowledge of cloud deployment orchestration (e.g., Kubernetes, containerized AI workloads).

• Strong leadership skills with experience mentoring teams and collaborating with large-scale software and hardware organizations.

• Excellent written and verbal communication; capable of presenting complex compiler architectures and trade-offs to both technical and executive stakeholders.

This role is a cornerstone of our cloud AI software strategy. You'll shape the way inference workloads are deployed, optimized, and scaled across data center infrastructure.

Equal Opportunity Employment Policy

d-Matrix is proud to be an equal opportunity workplace and affirmative action employer. We’re committed to fostering an inclusive environment where everyone feels welcomed and empowered to do their best work. We hire the best talent for our teams, regardless of race, religion, color, age, disability, sex, gender identity, sexual orientation, ancestry, genetic information, marital status, national origin, political affiliation, or veteran status. Our focus is on hiring teammates with humble expertise, kindness, dedication and a willingness to embrace challenges and learn together every day.

d-Matrix does not accept resumes or candidate submissions from external agencies. We appreciate the interest and effort of recruitment firms, but we kindly request that individuals interested in opportunities with d-Matrix apply directly through our official channels. This approach allows us to streamline our hiring processes and maintain a consistent and fair evaluation of all applicants. Thank you for your understanding and cooperation.
