KMC Solutions

XTN-EF5F239 | SENIOR DEVOPS ENGINEER - CLOUD & HPC INFRASTRUCTURE

Requirements

  • 7+ years of experience in DevOps, Infrastructure Engineering, or Systems Engineering
  • 5+ years of hands-on AWS architecture experience
  • Deep expertise in Linux systems administration
  • Strong experience with containerization and Kubernetes

Roles & Responsibilities

  • Design, deploy, and manage scalable, secure AWS infrastructure and cloud networking with IaC (Terraform/CloudFormation)
  • Architect and automate server provisioning across cloud and hybrid environments; build hardened Linux images; implement configuration management and patching
  • Manage containerization and virtualization (Docker, Kubernetes/EKS, VMware/KVM) and implement CI/CD pipelines for containerized apps
  • Build and maintain HPC infrastructure, including clusters and job schedulers (Slurm/PBS), integrate with AWS ParallelCluster, and optimize high-speed networking and GPU workloads

Job description

Position Overview

We are seeking a highly experienced Senior DevOps Engineer to lead the design, deployment, automation, and operational excellence of our AWS-based cloud infrastructure and high-performance computing (HPC) environments. This role requires deep expertise in AWS architecture, Linux systems administration, server deployment, containerization, virtualization, license server management, and cloud networking.

The ideal candidate is hands-on, automation-driven, security-focused, and comfortable operating in complex hybrid environments supporting research, engineering, and compute-intensive workloads.

Benefits

  • Health Insurance/HMO
  • Enjoy unlimited MadMax Coffee
  • Diverse learning & growth opportunities
  • Accessible Cloud HR platform (Sprout)
  • Above-standard leave

Key Responsibilities

Cloud Infrastructure & AWS Architecture

  • Design, deploy, and manage scalable, secure AWS infrastructure.
  • Architect and maintain VPCs, subnets, route tables, NAT gateways, transit gateways, and peering.
  • Manage AWS networking components, including Route 53, load balancers (ALB/NLB), CloudFront, and PrivateLink.
  • Implement infrastructure-as-code (IaC) using Terraform, CloudFormation, or similar.
  • Optimize cloud cost, performance, and resource utilization.
  • Implement AWS best practices for security, resilience, and high availability.

Server Deployment & Systems Engineering

  • Architect and automate server provisioning across cloud and hybrid environments.
  • Deploy and manage EC2, Auto Scaling Groups, Launch Templates, and AMIs.
  • Build hardened Linux server images (CIS benchmarks preferred).
  • Implement configuration management using tools such as Ansible, Puppet, or Chef.
  • Manage patching, lifecycle management, and OS hardening strategies.

Expert Linux Administration

  • Advanced administration of RHEL, Rocky, Ubuntu, or similar distributions.
  • Kernel tuning and performance optimization for compute-intensive workloads.
  • Troubleshooting system-level performance (CPU, memory, I/O, networking).
  • Manage system services, storage, RAID, LVM, NFS, and distributed filesystems.
  • Shell scripting and automation (Bash, Python).

Containerization & Virtualization

  • Design and manage containerized workloads using Docker.
  • Deploy and maintain Kubernetes (EKS preferred).
  • Implement CI/CD pipelines for container-based applications.
  • Manage virtualization platforms (VMware, KVM, or similar).
  • Optimize container orchestration for HPC and compute workloads.

HPC Infrastructure Management

  • Deploy and maintain high-performance computing (HPC) clusters.
  • Manage job schedulers (Slurm, PBS, or similar).
  • Optimize cluster performance, storage throughput, and node scaling.
  • Integrate HPC workloads with AWS services (e.g., ParallelCluster).
  • Manage high-speed networking (InfiniBand or equivalent, if applicable).
  • Support GPU-based workloads where applicable.

License Server Administration

  • Deploy and manage FlexLM or similar license servers.
  • Ensure high availability and redundancy for engineering license services.
  • Monitor license usage and optimize allocation.
  • Troubleshoot license connectivity and performance issues.

Cloud Networking & Security

  • Deep understanding of TCP/IP, DNS, routing protocols, and firewall design.
  • Implement secure connectivity (site-to-site VPN, AWS Direct Connect).
  • Manage security groups, NACLs, and IAM roles, and apply zero-trust principles.
  • Implement logging, monitoring, and alerting (CloudWatch, Prometheus, Grafana).
  • Support compliance frameworks and infrastructure security controls.

Automation & CI/CD

  • Build and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, etc.).
  • Automate infrastructure deployments and configuration management.
  • Implement DevSecOps best practices.
  • Develop reusable infrastructure modules and standards.

Monitoring & Observability

  • Implement centralized logging solutions.
  • Configure performance monitoring and alerting systems.
  • Perform root cause analysis and incident response.
  • Develop dashboards and operational metrics.

Required Qualifications

  • 7+ years of experience in DevOps, Infrastructure Engineering, or Systems Engineering.
  • 5+ years of hands-on AWS architecture experience.
  • Deep expertise in Linux systems administration.
  • Strong experience with containerization and Kubernetes.
  • Proven experience managing HPC environments.
  • Experience managing enterprise license servers.
  • Strong scripting skills (Bash, Python).
  • Experience with Infrastructure as Code (Terraform preferred).
  • Strong understanding of networking fundamentals and cloud networking.

Preferred Qualifications

  • AWS Certified Solutions Architect - Professional or AWS Certified DevOps Engineer - Professional certification.
  • Experience with AWS ParallelCluster.
  • Experience with GPU workloads and AI/ML infrastructure.
  • Experience with enterprise storage solutions (NetApp, Isilon, etc.).
  • Experience supporting research or engineering compute environments.

Soft Skills

  • Strong troubleshooting and analytical skills.
  • Ability to work independently in high-complexity environments.
  • Clear documentation and communication skills.
  • Experience collaborating across engineering, security, and research teams.
  • Strategic mindset with hands-on execution capability.

What Success Looks Like

  • Highly available, secure, and automated AWS & HPC infrastructure.
  • Optimized cloud costs and compute performance.
  • Reliable license server infrastructure with minimal downtime.
  • Fully automated server deployments.
  • Secure, scalable cloud networking architecture.
  • Improved deployment velocity through CI/CD automation.
