
HPC Technical Support Engineer

Remote: Full Remote

Offer summary

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Strong expertise in Linux system administration, particularly Red Hat.
  • Experience with large-scale storage systems like GPFS, Lustre, or NFS.
  • Proficiency in scripting languages such as Bash, Python, or Perl for automation.

Key responsibilities:

  • Administer and maintain HPC clusters, including installation and configuration of hardware and software.
  • Manage and optimize parallel file systems and storage solutions, ensuring data redundancy and security.
  • Investigate and resolve hardware faults, disk failures, and network issues while analyzing system logs.
  • Produce and maintain documentation detailing system setups and inventory for HPC environments.

Eviden XLarge https://eviden.com/
10001 Employees

Job description

HPC Support Engineer:


A High-Performance Computing (HPC) support engineer plays a vital role in maintaining and optimizing the computing environments that research institutions, industries, and other organizations use for tasks that require significant computational power, such as scientific simulations, large-scale data analysis, machine learning, and engineering computations.


Role Expectations:


  • HPC systems are often clusters of interconnected servers. The engineer is responsible for the administration of these clusters, which includes installation, configuration, and maintenance of hardware and software.
  • Linux is the dominant OS in HPC environments. The engineer ensures that the OS is updated, secure, and optimized for high-performance workloads.
  • Deploy, configure, and maintain parallel file systems (e.g., Lustre, GPFS/Spectrum Scale).
  • Manage NAS, SAN, and object storage solutions (e.g., Ceph, ZFS, NetApp, Dell EMC Isilon).
  • Handle RAID configurations and LVM (Logical Volume Manager).
  • Investigate and resolve disk failures, network congestion, and hardware faults.
  • Analyze logs from storage controllers, RAID arrays, and filesystems.
  • Set up snapshots, replication, and erasure coding for data redundancy.
  • Interact with SMC (Smart Management Center), the foundation for hosting the infrastructure and application microservices dedicated to managing an HPC supercomputer.
  • Support and maintain technology standards, processes, and policies for the on-prem and cloud infrastructure in scope.
  • Produce and maintain appropriate documentation and diagrams describing system setups and overall inventory.


Capabilities and Expertise:


  • System administration: Red Hat expertise.
  • Storage management: familiarity with large-scale storage systems such as GPFS, Lustre, or NFS, and the ability to troubleshoot file system issues.
  • Proficiency in hardware diagnostics.
  • Familiarity with high-availability (HA) storage solutions.
  • Experience with backup and restore solutions for petabyte-scale data.
  • Familiarity with firewall rules for securing storage nodes.
  • Scripting proficiency: ability to use scripting languages such as Bash, Python, or Perl to automate routine tasks such as storage monitoring and job submission.


Nice to have:

  • Knowledge of supercomputers and an understanding of advanced supercomputing platforms (e.g., Cray, IBM Blue Gene).
  • IBM Certified Administrator - Spectrum Scale (GPFS) – GPFS expertise.
  • Understanding of how HPC storage is evolving in exascale computing.


What we offer:

  • Training and Certifications: Access to continuous learning and career development opportunities.
  • Flexible working environment.
  • Competitive salary and benefits package.
  • Reimbursement: Get a yearly fixed amount for reimbursement.
  • Performance Bonus: Earn an annual performance bonus based on your achievements.
  • Career Advancement: Explore numerous opportunities for professional growth and career advancement.
  • Extra Vacation Days: Take advantage of additional vacation days to relax and recharge.

Let’s grow together.

Required profile

Spoken language(s): English

Other Skills

  • Problem Solving
