Key Facts

Remote From:

Full time

Senior (5-10 years)

English

Hard Skills

Site Reliability Engineering, Microsoft Azure, Terraform, Linux Servers, Nginx, Netcat, Active Directory, SolarWinds, TeamCity, VMware, Virtualization, High Availability Design, Root Cause Analysis, IIS Manager, Concourse CI, Data Architecture, Capacity Planning, Octopus Deploy, Application Performance Management, Splunk, Data Vaults, JavaScript (Programming Language), Microsoft SQL Server, Data Engine (MSDE), Log Analysis, Windows PowerShell, Ansible, SASS, Firewall, Security Operations (SecOps), Load Balancing, Hybrid Cloud Computing, Python (Programming Language), Cloud Computing, npm (Node Package Manager), Bash (Scripting Language), CI/CD, Domain Name System, F5 iRules, Disaster Recovery, Automated Information Systems, PostgreSQL, Sensu, Proxy Servers, Salts, C# (Programming Language), Git (Version Control System), Incident Management, Nagios, Load Testing, Azure DevOps, Infrastructure as Code (IaC), Continuous Monitoring, New Relic (SaaS), Key Management, Go (Programming Language), Microsoft Networking, Incident Response, Cross-Functional Collaboration, Engineering Documentation, Stakeholder Communications, Learning Strategies, User Advocacy, Continuous Improvement Process

Other Skills

• Collaboration
• Communication
• Mentorship
• Problem Solving

Pacifica Continental

About Pacifica Continental

Pacifica Continental is a global recruitment firm specializing in strategic positions such as board members, C-suite executives, and senior and middle management. Our customized solutions and consultative approach ensure the selection of the best talent in the market.

Our operations have reached more than 50 countries, including more than 250 cities worldwide. We pride ourselves on working with precision and knowledge in a wide range of industries. Always committed to the continuous acquisition of industry knowledge, we partner with a large number of multinational and local companies. By developing a strong interface between our operations, we offer global solutions with a local focus.

We adapt our strategy to meet the needs of each client, driving business growth and supporting clients in strengthening their organizations and overcoming adversity. Working with the utmost precision and expertise, we offer specialized recruitment services, market mapping, assessment projects, and temporary recruitment.

Company type: SME

Founded: 2018

Company size: 51 - 200


Job description

About the team:

Our engineering team has built the largest private Medicare marketplace in the country. We passionately focus on the continuous improvement of the systems we build.

We have spent many years growing and fostering a DevOps culture by bridging the divide between our Software and Infrastructure Engineering departments. We want the cross-functional teams that we are building to include Site Reliability Engineers. We operate in a complex, multi-tenant, hybrid infrastructure that spans cloud and on-premises environments and both Windows and Linux. We strive for security, reliability, and automation in line with DevOps and Site Reliability Engineering principles. If you are passionate about learning and improvement through metrics and automation, and about fostering that mindset in others, we want to hear from you.

About the role:

In this role, you will maintain shared cloud resources used by numerous software engineering teams within our business unit. We aim to enable software engineering teams to build cloud-native applications that adhere to security and regulatory requirements with limited handholding by our cloud engineers. We still have a fair number of applications hosted in on-premises data centers, and we aim to support their migration to the cloud.

Requirements:

Hands-on Engineering

5+ years of hands-on experience with a majority of the following technologies, along with a willingness to become proficient in the remaining areas:

Windows and Linux Servers
VMware
Cloud platforms, preferably with Azure
Active Directory
Secrets management with Consul and Vault or similar systems
Configuration management tools like Salt, Ansible and Terraform
Firewalls and load balancers such as F5
Web servers, including IIS and NGINX
Database Server Infrastructure like Microsoft SQL Server and PostgreSQL
Application Performance Monitoring with tools like New Relic
Infrastructure monitoring with tools like Sensu, SolarWinds, Nagios, or Azure App Insights
CI/CD tools like TeamCity, Octopus Deploy, Concourse, Azure DevOps, or GitHub Actions
Log Aggregation tools like SumoLogic or Splunk
Network theory and protocols such as DNS, DHCP, proxy servers, and firewalls
Security operations with tools for SAST, DAST, RASP, and WAF
Infrastructure as Code or automation experience

Proficiency and a high level of comfort with:

One or more programming languages, such as C#, JavaScript, Python or Go
One or more scripting languages, such as PowerShell and Bash
Command-line tools such as git, netcat, npm, and terraform

Responsibilities:

Make improvements to internal processes to reduce lead time and increase deployment frequency
Identify improvements to the quality, security, and performance of our infrastructure
Increase the velocity with which teams deliver, leveraging expertise from various functional disciplines
Identify how to remediate production incidents more quickly and safely while reducing the frequency of outages
Actively engage with other teams and departments to collaborate on best practices and implementation strategy
Adhere to and advocate for best practices, including Infrastructure as Code, monitoring, high availability, disaster recovery, security, and DevOps methodologies
Create SLIs, SLOs, and SLAs
Contribute to capacity planning, and advise and consult with teams conducting load and stress testing
Keep up with industry innovations, recommending new tools or practices when appropriate
Actively mentor peers, developing their expertise and inspiring others to innovate
Provide timely assistance and remediation solutions during critical situations and production incidents
Document and share “lessons learned” from production, including root cause analysis
Explore new ways of improving communication among Site Reliability Engineers and with other teams
Write and maintain architectural, stakeholder, and policy documentation