ASCENDING

Staff Site Reliability Engineer

Requirements

  • 7+ years of experience with cloud platforms (AWS and Azure) with strong multi-cloud expertise
  • 7+ years of experience in rearchitecting large-scale monolithic applications into cloud-native architectures
  • Strong Kubernetes (K8S) expertise, including hands-on experience with AKS and EKS
  • Terraform (IaC) experience and knowledge of security best practices for containers and Kubernetes

Roles & Responsibilities

  • Lead the technical rearchitecting efforts to transform a large-scale monolithic system into a microservices-based, cloud-native application, collaborating with Engineering, Architecture, and Product to define and implement the new architecture using domain-driven design (DDD) principles
  • Conduct technology evaluations and provide recommendations for new tools, frameworks, and cloud services to enhance our infrastructure
  • Utilize Kubernetes for container orchestration to ensure extreme scalability, reliability, and high availability; design and implement robust, highly resilient components with comprehensive monitoring, logging, and alerting
  • Drive the adoption of DevOps practices throughout the software development lifecycle and mentor junior team members to foster continuous learning and growth

Job description

U.S. citizenship required / No clearance needed / 100% remote within the US / Eastern Time Zone

Staff Site Reliability Engineer / Cloud SME

Location: 100% remote in the continental US

Type: Long-term contract (3+ years)

Role Summary

As the Staff SRE/Cloud SME, you will be a critical technical leader driving the rearchitecting of our existing monolithic system into a resilient, cloud-native architecture. This role requires deep expertise across multiple cloud platforms (Azure and AWS) and container orchestration (Kubernetes) to ensure the next-generation platform meets the highest standards of scalability, reliability, and security.

Key Responsibilities

Architecture & Transformation Leadership

  • Lead the technical rearchitecting efforts, transforming a large-scale monolithic system into a modern microservices-based, cloud-native application.
  • Collaborate with cross-functional teams (Engineering, Architecture, Product) to define and implement the new system architecture using domain-driven design (DDD) principles.
  • Conduct technology evaluations and provide recommendations for new tools, frameworks, and cloud services to enhance our infrastructure.

Reliability Engineering & Cloud Operations

  • Utilize Kubernetes (K8S) for container orchestration and management, ensuring extreme scalability, reliability, and high availability of the system.
  • Implement robust, highly resilient, and highly available components for the system.
  • Develop and implement comprehensive monitoring, logging, and alerting mechanisms to ensure optimal system performance and availability.
  • Drive the adoption of DevOps principles and practices throughout the software development lifecycle, ensuring seamless integration and continuous deployment processes.

Technical Expertise & Mentorship

  • Stay up to date with emerging technologies, frameworks, and industry trends related to systems and cloud computing.
  • Mentor and provide technical guidance to junior team members, fostering a culture of continuous learning and professional growth.

Required Qualifications

  • Cloud Platforms: 7+ years of experience with cloud computing platforms. Strong multi-cloud expertise with AWS and Azure is required.
  • Cloud-Native Transformation: 7+ years of experience in rearchitecting large-scale monolithic applications to cloud-native architectures.
  • Container Orchestration: Strong expertise in Kubernetes (K8S) is required, including hands-on experience with both AKS (Azure Kubernetes Service) and EKS (Amazon Elastic Kubernetes Service).
  • Networking: Strong experience with Cloud Networking, with the ability to design and resolve complex cloud networking architecture problems.
  • IaC: Expert knowledge of Terraform for infrastructure-as-code deployment and management.
  • Security: Must possess strong knowledge of security best practices for containers and Kubernetes clusters.
  • Education: Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
  • Bonus: Knowledge of load-balancing algorithms.
