Pacifica Continental

Site Reliability Engineer

Requirements:

  • 5+ years of hands-on engineering experience with Windows and Linux servers, VMware, Active Directory, and cloud/on-premises infrastructure (Azure preferred) in multi-tenant environments
  • Experience with configuration management and automation tools (Salt, Ansible, Terraform) and Infrastructure as Code
  • Proficiency in one or more programming languages (C#, JavaScript, Python, Go) and scripting languages (PowerShell, Bash) with strong CLI skills
  • Experience with monitoring, incident response, security practices, and CI/CD tooling (APM tools such as New Relic, log aggregation tools such as Splunk or SumoLogic, security testing tools, and CI/CD pipelines)

Roles & Responsibilities

  • Improve internal processes to reduce lead time and increase deployment frequency
  • Identify and drive improvements to the quality, security, and performance of the infrastructure, and advocate for best practices (IaC, monitoring, high availability, disaster recovery, security, and DevOps)
  • Create and maintain SLIs, SLOs, and SLAs; contribute to capacity planning and readiness for load/stress testing
  • Mentor peers, collaborate across teams on best practices, document lessons learned from production incidents, and maintain architectural and policy documentation

Job description

About the team:

Our engineering team has built the largest private Medicare marketplace in the country. We passionately focus on the continuous improvement of the systems we build.

We have spent many years growing and fostering a DevOps culture by bridging the divide between our Software and Infrastructure Engineering departments. We want the cross-functional teams that we are building to include Site Reliability Engineers. We operate in a complex, multi-tenant, hybrid cloud and on-premises infrastructure that spans both Windows and Linux. We strive for security, reliability, and automation in line with DevOps and Site Reliability Engineering principles. If you are passionate about learning and improvement through metrics and automation, and about fostering that mindset in others, we want to hear from you.

 

About the role:

The Site Reliability Engineer maintains shared cloud resources used by numerous software engineering teams within our business unit. We aim to enable software engineering teams to build cloud-native applications that adhere to security and regulatory requirements with limited handholding by our cloud engineers. We still have a fair number of applications hosted in on-premises data centers, and we aim to support their migration to the cloud.

 

Requirements:

Hands-on Engineering

5+ years of hands-on experience with a majority of the following technologies, along with a willingness to become proficient in the remaining areas:

  • Windows and Linux Servers
  • VMware
  • Cloud platforms, preferably Azure
  • Active Directory
  • Secrets management with Consul and Vault or similar systems
  • Configuration management tools like Salt, Ansible and Terraform
  • Firewalls and load balancers such as F5
  • Web servers, including IIS and NGINX
  • Database server infrastructure such as Microsoft SQL Server and PostgreSQL
  • Application Performance Monitoring with tools like New Relic
  • Infrastructure monitoring with tools like Sensu, SolarWinds, Nagios, or Azure App Insights
  • CI/CD tools like TeamCity, Octopus Deploy, Concourse, Azure DevOps, or GitHub Actions
  • Log aggregation tools like SumoLogic or Splunk
  • Network theory and protocols such as DNS, DHCP, proxy servers, and firewalls
  • Security operations with tools for SAST, DAST, RASP, and WAF
  • Infrastructure as Code or automation experience

Proficiency and a high level of comfort with:

  • One or more programming languages, such as C#, JavaScript, Python, or Go
  • One or more scripting languages, such as PowerShell or Bash
  • Command-line tools such as git, netcat, npm, and terraform

Responsibilities

  • Make improvements to internal processes to reduce lead time and increase deployment frequency
  • Identify improvements to the quality, security, and performance of our infrastructure
  • Increase the velocity with which teams deliver, leveraging expertise from various functional disciplines
  • Identify how to remediate production incidents more quickly and safely while reducing the frequency of outages
  • Actively engage with other teams and departments to collaborate on best practices and implementation strategy
  • Adhere to and advocate for best practices, including Infrastructure as Code, monitoring, high availability, disaster recovery, security, and DevOps methodologies
  • Create SLIs, SLOs, and SLAs
  • Contribute to capacity planning, and advise teams that will be load/stress testing
  • Keep up with industry innovations, recommending new tools or practices when appropriate
  • Actively mentor peers, developing their expertise and inspiring others to innovate
  • Provide timely assistance and remediation solutions during critical situations and production incidents
  • Document and share “lessons learned” from production, including root cause analysis
  • Explore new ways of improving communication among Site Reliability Engineers and with other teams
  • Write and maintain architectural, stakeholder, and policy documentation
