Staff Site Reliability Engineer - Kubernetes/AWS/EKS

EXTRA HOLIDAYS - FULLY FLEXIBLE
Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

7+ years of experience in Site Reliability Engineering; experience with Kubernetes and cloud-based systems; proficiency in scripting languages such as Python, Golang, Java, or Ruby; AWS expertise and familiarity with other cloud platforms.

Key responsibilities:

  • Ensure stability, reliability, and scalability of distributed systems
  • Design and implement monitoring, alerting, and system analysis
  • Automate tasks & optimize SRE support for critical systems
SentinelOne | Cybersecurity | Large company | https://www.sentinelone.com/
1,001 to 5,000 employees

Job description


Your missions

About Us:

SentinelOne is defining the future of cybersecurity through our XDR platform that automatically prevents, detects, and responds to threats in real time. Singularity XDR ingests data and leverages our patented AI models to deliver autonomous protection. With SentinelOne, organizations gain full visibility into everything happening across the network at machine speed, so they can defeat every attack at every stage of the threat lifecycle.

We are a values-driven team where names are known, results are rewarded, and friendships are formed. Trust, accountability, relentlessness, ingenuity, and OneSentinel define the pillars of our collaborative and unified global culture. We're looking for people who will drive team success and collaboration across SentinelOne. If you’re enthusiastic about innovative approaches to problem-solving, we would love to speak with you about joining our team!

Hiring for this role is limited to US Citizens only under various Federal laws and regulations.

 

What Are We Looking For?

We are looking for an experienced SRE, well-versed in large-scale SaaS or cloud engineering environments. As a Site Reliability Engineer, your primary responsibility will be the stability, reliability, and scalability of SentinelOne’s products and services. In this job, you will have an opportunity to help design, implement, and maintain robust infrastructure, complex distributed systems and related areas such as monitoring and automation. Someone who has driven continuous deployment, has provided engineering leadership and expertise for complex incidents and corresponding post-incident reviews, has provided feedback to development teams on architecture decisions, and has automated repetitive operational tasks would be a great fit.

What Will You Do?

  • Support the stability, reliability, and scalability of SentinelOne’s distributed systems through various tasks performed by the Site Reliability Engineering organization including managing Kubernetes, creating IaC, and leading troubleshooting during incident response
  • Identify areas for improvement, such as performance issues and availability concerns, and perform technical and architectural reviews in partnership with fellow engineering teams to improve the overall reliability of SentinelOne systems
  • Design and implement comprehensive monitoring and alerting, and apply concepts such as SLIs/SLOs and critical user journeys, to provide deeper insight into the performance and availability of SentinelOne’s systems
  • Analyze systems, identify toil, and develop and implement strategies such as automation to streamline and optimize SRE’s support of critical systems

What Skills and Experience Will You Need?

  • 7+ years of experience in Site Reliability Engineering, preferably with a large-scale SaaS product or large cloud-based distributed system
  • 5+ years of production experience with orchestration systems like Kubernetes, Nomad or Mesos
  • Experience with a scripting language, such as Python, Golang, Java, or Ruby
  • Familiarity with running Java and JavaScript applications, including build and deploy
  • AWS experience, and familiarity with other platforms like GCP
  • Experience using Infrastructure as Code (IaC) to set up cloud-native services
  • Familiarity with CI and continuous delivery using Jenkins, GitHub Actions (GHA), ArgoCD, or similar tools; familiarity with deployment strategies such as blue-green, rolling, and canary deploys, and with best practices for deployment automation
  • Curiosity, fast-learning, and great communication skills
  • Preferred: 2+ years of experience in a FedRAMP environment
  • Ability to work in a diverse and distributed team
  • Self-starter attitude, with passion for new technologies and empathy for legacy systems
  • Ability to learn quickly and navigate unfamiliar programming languages, systems, and processes

Why Us?

  • You will be joining a cutting-edge company, where you will tackle extraordinary challenges and work with the very best in the industry
  • Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
  • Unlimited PTO
  • Industry-leading gender-neutral parental leave
  • Paid company holidays
  • Paid sick time
  • Employee stock purchase program
  • Disability and life insurance
  • Employee assistance program
  • Gym membership reimbursement
  • Cell phone reimbursement
  • Numerous company-sponsored events including regular happy hours and team-building events
This U.S. role has a base pay range that will vary based on the location of the candidate. For some locations, a different pay range may apply. If so, this range will be provided to you during the recruiting process. You can also reach out to the recruiter with any questions.

Base Salary Range
$148,000 to $204,000 USD

SentinelOne is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

SentinelOne participates in the E-Verify Program for all U.S. based roles. 

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: Cybersecurity
Spoken language(s): English

Soft Skills

  • Excellent Communication
  • Fast Learner
  • Proactive Mindset
