AWS Cloud Site Reliability Engineer (SRE)

Remote: Full Remote
Salary: 80 - 128K yearly
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • Bachelor's degree and 5 years of experience
  • Proven experience as a Site Reliability Engineer or similar role
  • In-depth knowledge of AWS services and cloud infrastructure
  • Familiarity with CI/CD tools and IaC technologies
  • Must be a US Citizen with eligibility for agency clearance

Key responsibilities:

  • Design and manage infrastructure as code solutions
  • Implement monitoring systems for performance optimization
  • Conduct incident response and post-incident reviews
  • Collaborate on software releases and automation processes
  • Promote continuous improvement in release management
Peraton Management Consulting Large https://www.peraton.com/
10001 Employees

Job description

Your missions

About Peraton

Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world's leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees do the can't be done by solving the most daunting challenges facing our customers. Visit peraton.com to learn how we're keeping people around the world safe and secure.

Responsibilities

We are seeking an experienced and motivated AWS Cloud Site Reliability Engineer (SRE) to join our dynamic team. As an AWS Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud infrastructure on Amazon Web Services (AWS). The ideal candidate will have a strong background in AWS services, a deep understanding of infrastructure as code, and a passion for implementing best practices in site reliability engineering. The AWS Site Reliability Engineer (SRE) will collaborate closely with cross-functional teams, including development, quality assurance, and operations, to ensure seamless software releases and continuous improvement of our release processes.

What you will do:

  • Infrastructure Automation: Design, implement, and manage infrastructure as code (IaC) solutions using tools like AWS CloudFormation, Terraform or Helm Charts to automate deployment and scaling processes. Collaborate with development teams to integrate continuous deployment practices and ensure the reliability of applications.
  • Monitoring and Alerting: Implement robust monitoring and alerting systems to proactively identify and address potential issues before they impact system performance. Analyze system metrics, logs, and alerts to troubleshoot and resolve issues promptly.
  • Performance Optimization: Conduct performance analysis and optimization of AWS infrastructure components to enhance system efficiency and reduce latency. Identify and implement improvements to enhance system reliability and resilience.
  • Incident Response: Participate in on-call rotations to respond to and resolve incidents promptly. Conduct post-incident reviews to identify root causes and implement preventive measures.
  • Security and Compliance: Work closely with security teams to implement and enforce best practices for securing AWS environments. Ensure compliance with industry standards and regulations related to cloud infrastructure.
  • Communication: Facilitate clear communication across teams, providing updates on release status, known issues, and any potential impact on stakeholders. Coordinate communication of release schedules and changes to all relevant parties.
  • Release Planning and Coordination: Collaborate with development, QA, and operations teams to plan and coordinate software releases. Define release scope, schedule, and dependencies to ensure timely and smooth deployments. Create and submit change records as required for process and audit compliance. Participate in Technical Change Advisory and Review boards as required.
  • Release Automation: Develop and maintain automated deployment pipelines using industry-standard tools such as AWS CI/CD, GitLab CI/CD, Jenkins, or similar. Automate and streamline release processes to improve efficiency and reduce manual errors.
  • Continuous Improvement: Proactively identify areas for process improvement within the release management lifecycle. Implement feedback loops to capture lessons learned from each release and apply improvements iteratively. Stay up to date with industry best practices, emerging technologies, and trends related to release management and reliability engineering.
  • Quality Assurance: Collaborate with QA teams to establish and execute release validation procedures. Ensure releases are thoroughly tested and meet quality standards before deployment. Drive continuous improvement by analyzing release management trends, identifying recurring issues, and working with teams to implement solutions.

Qualifications

Required Qualifications:

  • Bachelor's degree and 5 years of experience. An additional 4 years of experience may be accepted in lieu of the degree.
  • Proven experience as a Site Reliability Engineer or similar role.
  • In-depth knowledge of AWS services and expertise in managing cloud infrastructure.
  • Proficiency in scripting languages (e.g., Python, Bash) for automation tasks.
  • Strong understanding of DevOps principles and continuous integration/continuous deployment (CI/CD) pipelines.
  • Proficiency in CI/CD tools such as AWS CI/CD, GitLab CI/CD, or others.
  • Familiarity with infrastructure as code (IaC) tools like CloudFormation, Terraform, Helm Charts, Morpheus, or similar technologies.
  • Hands-on experience with version control systems (AWS CodeCommit, Git, SVN) and branching strategies.
  • Experience with containerization and orchestration tools (e.g., Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), Docker, Kubernetes).
  • Familiarity with monitoring tools (e.g., CloudWatch, Prometheus) and log analysis.
  • Attention to detail, with a focus on maintaining high-quality software releases.
  • Solid understanding of Agile methodologies and their application in release management.
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills.
  • Must be a US Citizen.
  • Must be able to obtain and maintain the required agency clearance (6C Public Trust).

Preferred Qualifications:

  • Relevant certifications in DevOps or related fields are a plus.
  • High Risk Public Trust or Secret Clearance preferred.

Benefits:

At Peraton, our benefits are designed to help keep you at your best beyond the work you do with us daily. We're fully committed to the growth of our employees. From fully comprehensive medical plans to tuition reimbursement, tuition assistance, and fertility treatment, we are there to support you all the way.

Target Salary Range

$80,000 - $128,000. This represents the typical salary range for this position based on experience and other factors.

EEO

An Equal Opportunity Employer including Disability/Veteran.

Benefits

  • Paid Time-Off and Holidays
  • Retirement
  • Life & Disability Insurance
  • Career Development
  • Tuition Assistance and Student Loan Financing
  • Paid Parental Leave
  • Additional Benefits
  • Medical, Dental, & Vision Care

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: Management Consulting

Soft Skills

  • Detail Oriented
  • Problem Solving
  • Collaboration
  • Communication
  • Troubleshooting (Problem Solving)
