Job description

U.S. citizenship required / No clearance needed / 100% remote within the US / Eastern Time Zone

Staff Site Reliability Engineer / Cloud SME

Location: 100% remote in the continental US

Type: Long-term contract (3+ years)

Role Summary

As the Staff SRE/Cloud SME, you will be a critical technical leader driving the rearchitecting of our existing monolithic system into a resilient, cloud-native architecture. This role requires deep expertise across multiple cloud platforms (Azure and AWS) and container orchestration (Kubernetes) to ensure the next-generation platform meets the highest standards of scalability, reliability, and security.

Key Responsibilities

Architecture & Transformation Leadership

Lead the technical rearchitecting efforts, transforming a large-scale monolithic system into a modern microservices-based, cloud-native application.
Collaborate with cross-functional teams (Engineering, Architecture, Product) to define and implement the new system architecture using domain-driven design (DDD) principles.
Conduct technology evaluations and provide recommendations for new tools, frameworks, and cloud services to enhance our infrastructure.

Reliability Engineering & Cloud Operations

Use Kubernetes (K8S) for container orchestration and management, ensuring the system remains scalable, reliable, and highly available.
Implement robust, highly resilient, and highly available components for the system.
Develop and implement comprehensive monitoring, logging, and alerting mechanisms to ensure optimal system performance and availability.
Drive the adoption of DevOps principles and practices throughout the software development lifecycle, ensuring seamless integration and continuous deployment processes.

Technical Expertise & Mentorship

Stay up to date with emerging technologies, frameworks, and industry trends related to systems and cloud computing.
Mentor and provide technical guidance to junior team members, fostering a culture of continuous learning and professional growth.

Required Qualifications

Cloud Platforms: 7+ years of experience with cloud computing platforms. Strong multi-cloud expertise required with AWS and Azure.
Cloud-Native Transformation: 7+ years of experience in rearchitecting large-scale monolithic applications to cloud-native architectures.
Container Orchestration: Strong expertise in Kubernetes (K8S) is required, including hands-on experience with both AKS (Azure Kubernetes Service) and EKS (Amazon Elastic Kubernetes Service).
Networking: Strong experience with cloud networking, including the ability to design cloud network architectures and resolve complex networking problems.
IaC: Expert knowledge of Terraform for infrastructure-as-code deployment and management.
Security: Strong knowledge of security best practices for containers and Kubernetes clusters.
Education: Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
Bonus Knowledge: Knowledge of load balancing algorithms.
