Offer summary

Qualifications:

Bachelor’s degree in a relevant field with 2-4 years of experience, Master’s degree in a relevant field with 1-2 years of experience, or PhD with internship experience

In-depth knowledge of CPU/GPU computer architecture and microarchitecture

Excellent coding skills in C/C++

Strong understanding of machine learning workloads and benchmarks

Knowledge of performance modeling concepts and improvement strategies

Key responsibilities:

Develop functional and timing simulators in C++

Analyze and optimize the architectural and microarchitectural design space

Influence design choices based on experiments

Develop tests to evaluate the model and RTL design

Identify and fix performance bottlenecks in tests/workloads/simulator

Job description

Join a well-funded, cutting-edge hardware startup in Silicon Valley as an Accelerator Microarchitecture Performance Modeling Engineer.

Responsibilities and opportunities in this role include functional and cycle-accurate simulator development, architectural and microarchitectural design-space exploration for programmable accelerators, and analysis and optimization of modern, highly parallel applications.

Our mission is to reimagine silicon and create accelerated computing platforms that will transform the industry. You will have the opportunity to work with some of the most talented and passionate engineers in the world to create designs that push the envelope on performance, energy efficiency, programmability and scalability.

You will also have the opportunity to explore many adjacent areas of research and engineering, cutting across the many levels of abstraction that must be scaled when building computing machinery: ISA design, application software, compiler optimization, RTL design, RTL correlation, design verification, test writing, and power/area analysis.

We offer a fun, creative, collaborative and flexible work environment, where you can contribute to our vision of building server-class compute machines that fulfill the promise and potential of hardware-software co-design, while also learning every day.

Requirements

In-depth knowledge of CPU/GPU Computer Architecture and Microarchitecture.

Excellent coding skills in C/C++

Strong understanding of workloads and benchmarks in the Machine Learning space

Solid appreciation for the basics of SIMT processing, cache and memory hierarchies

Knowledge of performance modeling concepts - analytical, functional and cycle-accurate modeling

Knowledge of performance improvement concepts - bottleneck analysis, latency hiding, speculative execution, shared resource arbitration, scheduling, buffer sizing, replacement policies

Ability to work well in a team, take ownership of tasks, embrace aggressive schedules, be self-motivated to learn, seek help, think clearly and communicate effectively

Responsibilities

Performance modeling - develop functional and timing simulators in C++ modeling the programmable processing cores in a Data Parallel Accelerator.

Performance analysis - configure and use the simulator to explore the architectural and microarchitectural design space.

Design space exploration - influence design choices based on experiments and studies

Performance testing - develop tests to evaluate the quality of the model and the RTL design

Performance debug - identify and fix performance bottlenecks in tests/workloads/simulator

Performance correlation - identify correct performance targets for tests/workloads and ensure that the RTL design meets those targets

Workload analysis - develop a deep understanding of the characteristics of workloads in the target market - machine learning, data analytics, graph analytics

Education and Experience

Bachelor’s degree with 2-4 years of experience in a relevant field

Master’s degree with 1-2 years of experience in a relevant field

PhD with internship experience in a relevant field

Required profile