Site Reliability Engineer

Remote: 
Full Remote

Offer summary

Qualifications:

  • Bachelor's degree in computer science or equivalent work experience.
  • 3-5 years of hands-on experience in the cybersecurity field.
  • Strong understanding of networking principles in cloud and containerized environments.
  • Expertise with Infrastructure as Code (IaC) and deployment automation tools.

Key responsibilities:

  • Lead the migration of EC2 workloads to ECS and develop DevOps tooling for containerized applications.
  • Implement a service mesh architecture to advance zero trust security initiatives.
  • Design and implement proactive monitoring and alerting solutions to optimize uptime.
  • Uphold SLAs and SLOs by applying SRE best practices and creating operational playbooks.

DefenseStorm (Computer Hardware & Networking, Scaleup): https://defensestorm.com/
51-200 employees

Job description

Site Reliability Engineer

As a Site Reliability Engineer at DefenseStorm, you will play a crucial role in ensuring the reliability, scalability, and performance of our cloud-based services. GRID is a high-throughput, data-intensive application that currently handles 250k events/sec. You will drive best practices and contribute to both the design and implementation of robust cloud infrastructure that can scale rapidly to support DefenseStorm's growing customer base.

Location
Atlanta, GA
Remote

Job Duties and Responsibilities 

  • Lead the migration of EC2 workloads to ECS and develop DevOps tooling to empower development teams to build and manage containerized applications. 
  • Advance zero trust security initiatives by implementing a service mesh architecture with technologies such as Istio. 
  • Enhance the security, scalability, and reliability of AWS cloud-native infrastructure through continuous improvement and innovation. 
  • Design and implement proactive monitoring and alerting solutions using tools like Prometheus, Grafana, and OpsGenie, leveraging data-driven insights to optimize uptime and mitigate operational risks. 
  • Uphold SLAs and SLOs by applying SRE best practices, including incident response, post-mortem analysis, and the creation of operational playbooks. 
  • Build, manage, and scale cloud infrastructure using Infrastructure as Code (IaC) tools such as Terraform. 
  • Support SOC 2 and ISO compliance efforts by championing security best practices, streamlining evidence collection, and introducing automation to improve audit processes. 
  • Other duties as assigned by management.

Required Education and Experience 

  • Hands-on experience building and maintaining CI/CD pipelines using tools such as GitHub Actions. 
  • Strong understanding of networking principles and their application in cloud and containerized environments.
  • Proven experience designing, building, and managing cloud infrastructure in AWS. 
  • Expertise with Infrastructure as Code (IaC) and deployment automation tools to streamline environment provisioning and management. 
  • Experience running and supporting containerized workloads in production environments. 
  • Familiarity with observability, monitoring, logging, and tracing tools to ensure system performance, reliability, and visibility. 
  • Experience with AWS, ECS, Elasticsearch, PostgreSQL, Prometheus, Grafana, GitHub Actions, and Terraform.

Preferred Education and Experience 

  • Bachelor's degree in computer science or equivalent work experience.
  • 3-5 years of hands-on experience in the cybersecurity field.

DefenseStorm provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

Required profile


Industry:
Computer Hardware & Networking
Spoken language(s):
English

Other Skills

  • Teamwork
  • Communication
  • Problem Solving
