AI Research Engineer – Model Compression (all seniority levels)

Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • Proven experience in model compression techniques.
  • Expertise in deep learning frameworks such as TensorFlow or PyTorch.
  • Strong understanding of AI/ML research developments.
  • Preferred: PhD or advanced degree in a related field.

Key responsibilities:

  • Design and implement advanced model compression techniques.
  • Optimize compressed models for high-throughput inference.
  • Collaborate with researchers and engineers on model integrations.
  • Stay updated on advancements in AI model compression.

Axelera AI Scaleup https://axelera.ai/
51 - 200 Employees

Job description

Company Overview
Axelera is a European, high-growth Series B startup revolutionizing the AI landscape with our in-memory computing platform. We specialize in creating AI hardware and software optimized for high-performance inference, catering to cutting-edge use cases across high-end edge computing, embodied AI, and server-side AI deployments. We are looking for passionate, innovative research engineers to join our team and help drive the future of AI.

Role Overview
We are looking for an AI Research Engineer with a strong focus on model compression to join our dynamic team. You will be responsible for developing cutting-edge compression techniques that make Generative AI models more efficient for real-time inference across a variety of environments, from high-end edge systems to large-scale server-side deployments. You will play a key role in ensuring that our models are optimized for memory usage, computational efficiency, and performance, while maintaining or improving model accuracy.

This is an exciting opportunity to work at the intersection of advanced machine learning, in-memory computing, and high-performance AI inference on cutting-edge hardware architectures.

Responsibilities:

  • Model Compression: Design and implement advanced model compression techniques such as pruning, quantization, weight sharing, and knowledge distillation to make models more memory-efficient and computationally optimized.

  • Performance Tuning: Optimize compressed models to achieve high-throughput and low-latency inference, specifically tailored to our in-memory computing platform.

  • Collaboration: Work closely with AI researchers, software engineers, and hardware engineers to integrate your model optimizations into our AI platform, ensuring that models work effectively across edge and server-side deployments.

  • Innovation: Stay on top of the latest developments in the AI and model compression research space, pushing the envelope on novel techniques for reducing model size without sacrificing performance.

  • Deployment & Testing: Implement best practices for model testing, deployment, and continuous improvement to ensure models scale effectively in production environments.
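To make the first two responsibilities concrete, here is a minimal, hypothetical sketch (not Axelera's actual pipeline, and deliberately framework-free) of two techniques the posting names: magnitude pruning and symmetric per-tensor int8 quantization, applied to a plain list of weights.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (magnitude pruning)."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0          # map the largest magnitude to int8 range
    q = [round(w / scale) for w in weights]
    return q, scale

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
pruned = prune_by_magnitude(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
dequant = [qi * scale for qi in q]   # reconstruction error is bounded by scale/2
```

Production systems would of course operate on framework tensors (e.g. `torch.nn.utils.prune` or `torch.ao.quantization` in PyTorch) and combine these with distillation and fine-tuning to recover accuracy; this sketch only shows the core arithmetic.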

Requirements:

  • Experience: Proven experience, commensurate with seniority level, working on model compression, including techniques such as pruning, quantization, low-rank factorization, and knowledge distillation.

  • Technical Skills:

    • Expertise in deep learning frameworks such as TensorFlow, PyTorch, or JAX.

    • Experience optimizing models for resource-constrained environments, such as edge devices or embedded systems.

    • Familiarity with distributed systems, in-memory computing, or high-performance computing environments.

    • A strong understanding of deep learning algorithms, neural networks, and the trade-offs involved in model compression.

  • Knowledge: A strong understanding of the latest advancements in AI/ML research, particularly in compression and distillation of generative models (e.g. transformers and diffusion models).

  • Collaboration & Communication: Ability to work in a highly collaborative, fast-paced startup environment and communicate complex technical concepts clearly.

Preferred Qualifications:

  • PhD or advanced degree in Computer Science, Machine Learning, AI, or related fields.

  • Research experience in model compression, efficient inference, or deploying AI models to resource-constrained devices.

  • Familiarity with model deployment frameworks like TensorRT, ONNX, or similar.

  • A passion for solving real-world challenges with AI in dynamic, high-performance environments.

Location

This position is based in Italy, and we support relocation to Bologna, Florence, or Milan for candidates based abroad who are interested in this role.

Why Join Us?

  • Impact: Work on groundbreaking technology that will power the next wave of AI applications, from edge computing to embodied AI systems.

  • Culture: Join a diverse, driven team that values innovation, collaboration, and continuous learning.

  • Growth: As a Series B startup, you’ll have significant growth opportunities, including the chance to shape the direction of the product and AI strategy.

  • Compensation: Competitive salary, equity options, and benefits package.

How to Apply?
Please submit your resume and a brief cover letter explaining why you're excited about this opportunity and how your experience aligns with our model compression goals.

At Axelera AI, we wholeheartedly embrace equal opportunity and hold diversity in the highest regard. Our steadfast commitment is to cultivate a warm and inclusive environment that empowers and celebrates every member of our team. We welcome applicants from all backgrounds to join us in shaping the future of AI.

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s): English

Other Skills

  • Collaboration
  • Communication
