Staff Site Reliability Engineer

75% Flex
EXTRA PARENTAL LEAVE
Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • 5+ years managing AWS and Kubernetes infrastructure
  • Knowledge of security best practices in cloud environments
  • Proficiency in Kubernetes, Docker, and Terraform
  • Strong troubleshooting and networking skills
  • Experience with CI/CD pipelines

Key responsibilities:

  • Design, build, and maintain AWS and Kubernetes infrastructure
  • Monitor system performance and troubleshoot issues
  • Implement preventive measures for future incidents
  • Ensure security best practices throughout infrastructure stack
  • Participate in post-incident reviews
Everbridge Large https://www.everbridge.com/
1001 - 5000 Employees

Job description

Your missions

About the company
 
Everbridge (NASDAQ: EVBG) empowers enterprises and government organizations to anticipate, mitigate, respond to, and recover stronger from critical events. In today’s unpredictable world, resilient organizations minimize impact to people and operations, absorb stress, and return to productivity faster when deploying critical event management (CEM) technology. Everbridge digitizes organizational resilience by combining intelligent automation with the industry’s most comprehensive risk data to Keep People Safe and Organizations Running™. For more information, visit www.everbridge.com, read the company blog, and follow on Twitter. Everbridge… Empowering Resilience

What you'll do
  • Design, build, and maintain scalable, reliable, and secure AWS and Kubernetes infrastructure to support our applications and services.
  • Manage infrastructure configuration using modern IaC tools such as Terraform.
  • Monitor system performance and reliability metrics, troubleshoot issues, and implement solutions to minimize downtime and performance degradation.
  • Collaborate with cross-functional teams to design and develop reliable, fault-tolerant systems.
  • Participate in the on-call rotation and respond to production incidents in a timely manner.
  • Participate in post-incident reviews and implement preventive measures to mitigate future incidents.
  • Implement and maintain security best practices throughout the infrastructure stack, ensuring compliance with industry standards and regulations.
  • Monitor for and identify security vulnerabilities in the infrastructure and container runtime, and mitigate them.

What you'll bring
  • 5+ years of experience building and managing production-grade infrastructure in AWS, Kubernetes, and/or EKS.
  • In-depth knowledge of AWS services, including but not limited to EC2, S3, VPC, IAM, ECR, Route53, and API Gateway.
  • Familiarity with security best practices in cloud environments, including identity and access management (IAM), encryption, and compliance standards (e.g., GDPR).
  • Deep understanding of Kubernetes architecture, components, and ecosystem, including Docker, etcd, kube-proxy, and kube-controller-manager.
  • Proficiency in container orchestration concepts and IaC, with hands-on experience in tools such as Helm and Terraform.
  • Ability to mentor and guide the engineering team as it adapts to infrastructure-related changes within its services.
  • Familiarity with monitoring and logging tools such as Datadog, Sumo Logic, Prometheus, and Grafana, and experience setting up monitoring/alerting systems for large-scale production environments.
  • Solid understanding of Linux/Unix system administration, including shell scripting and system troubleshooting.
  • Strong understanding of networking concepts, including TCP/IP, DNS, DHCP, VPN, and CDN tools like Cloudflare and/or AWS CloudFront.
  • Strong troubleshooting and problem-solving skills, with the ability to quickly diagnose and resolve complex technical issues in production environments.
  • Experience with CI/CD pipelines and tools such as Jenkins, GitLab CI, Argo CD, or Spinnaker.

    Everbridge is an Equal Opportunity/Affirmative Action Employer. All qualified Applicants will receive consideration for employment without regard to race, creed, color, religion, or sex including sexual orientation and gender identity, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.

    Required profile

    Experience

    Level of experience: Senior (5-10 years)
    Spoken language(s):
    English

    Soft Skills

    • Networking
    • Team Collaboration
    • Problem Solving
    • Leadership
    • Strong Communication
