Site Reliability Engineer (Remote)

Remote: Full Remote
Salary: $120K - $175K yearly
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • Bachelor’s Degree in Computer Science or related field
  • 5+ years in site reliability and software engineering
  • Proficiency with Kubernetes and JavaScript (JS)
  • Strong foundation in computer science concepts

Key responsibilities:

  • Support the entire software lifecycle
  • Build processes for system reliability and performance

Job description

Epsilon3 is a multi-product operations management platform revolutionizing the way teams build, launch, and operate spacecraft and other advanced hardware systems.

Launched in 2021, our company is led by engineers from SpaceX, Google, and NASA, who have experience supporting over 100 space missions. Innovative teams at Blue Origin, Rocket Lab, Axiom Space, Firefly Aerospace, and many others depend on our web-based (SaaS) solutions to plan and track high-stakes procedures. We raised a $15M Series A funding round led by Lux Capital, Y Combinator (YC S21), and other world-class investors.

This role is remote and can be based anywhere in the United States.

We are looking for a Site Reliability Engineer (SRE) who is interested in space exploration and passionate about building scalable, reliable, and secure software. You will be responsible for building and supporting complex infrastructure and deployment scenarios. We are currently using technologies such as React.JS, Node, Postgres, AWS GovCloud, Docker, and K8s, and our stack will evolve over time as we scale our solutions and approach.

The ideal candidate has years of experience using Kubernetes (K8s) and is proficient in JavaScript.

Some of the technical challenges we’re undertaking:
  • Real-time synchronization of data and user interfaces across Earth and space
  • Visualization of many complex data fields
  • Integration of multiple high-bandwidth data streams for real-time processing and display
  • Multiple deployment environments including cloud and on-premises
  • Mission-critical security and reliability requirements
  • Supporting complex workflows and detailed tracking while keeping the user experience simple and delightful

Responsibilities:
  • Support and contribute to the entire lifecycle of our software, from inception and design, through to deployment, operation and refinement
  • Support our services in production and before they go live through system design, security considerations, capacity planning, and launch preparedness
  • Build processes and systems to continuously improve system reliability and performance
  • Build processes and systems to continuously improve the productivity of the rest of the development team
  • Scale systems sustainably through automation and continuous improvement in reliability and velocity
  • Practice sustainable incident response and postmortems
  • Contribute to the design, build, test, and release of our web-based operational dashboards, electronic procedure tools, and suite of specialized software solutions to support various missions
  • Join and actively participate in customer discovery calls and technical demonstrations
  • Analyze and enhance the security, efficiency, stability, and scalability of our software systems
  • Support software QA and user testing
  • Support and facilitate security reviews and audits of our systems by customers and third parties
  • Facilitate compliance with cybersecurity certifications and contribute to improvements in our security policies and processes
  • Assess third-party and open source software and develop integrations
  • Contribute to the growth and refinement of our engineering culture, processes, and tools

Qualifications:
  • Bachelor’s Degree in Computer Science or related field
  • 5+ years of combined experience in site reliability and production software engineering
  • Proficiency with Kubernetes (K8s) and JavaScript (JS) is required for this role
  • Strong foundation in computer science concepts (algorithms, data structures, object-oriented programming, design, testing, etc.)
  • Self-starter and able to navigate ambiguity and assess rapidly evolving priorities
  • Strong team player with great communication skills and collaborative work ethic
  • Love of learning (technical and otherwise)
  • Experience in fast-growing tech startups is a plus
  • Experience with Lean Startup methodologies (agile software development) is a plus
  • US Citizenship (future security clearance may be required)
  • Must be located in the United States
  • Salary range: $120,000 - $175,000

    This full-time role includes stock options, generous PTO, health insurance, and a 4% 401k match.

    We meet in-person four times per year for hackathons and fun team bonding activities.

    Epsilon3 is an equal opportunity employer committed to diversity and inclusion in the workplace. We prohibit discrimination and harassment of any kind based on race, color, sex, religion, sexual orientation, national origin, disability, genetic information, pregnancy, or any other protected characteristic as outlined by federal, state, or local laws. This policy applies to all employment practices within our organization, including hiring, recruiting, promotion, termination, layoff, recall, leave of absence, compensation, benefits, training, and apprenticeship. Epsilon3 makes hiring decisions based solely on qualifications, merit, and business needs at the time.

    Required profile

    Experience

    Level of experience: Senior (5-10 years)
    Spoken language(s):
    English

    Other Skills

    • Teamwork
    • Communication
    • Problem Solving
