Site Reliability Engineer III - Remote

EXTRA HOLIDAYS - EXTRA PARENTAL LEAVE
Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

Bachelor’s degree in computer science or equivalent 6+ years’ progressive experience; 6+ years of direct technical experience in IT Operations, DevOps, scripting, and automation; experience with best-in-class APM tooling, Ansible, Terraform, Python, Agile, and ITIL ITSM.

Key responsibilities:

  • Lead emergency response efforts and design solutions for NH Platforms stability
  • Champion automation, capacity management, disaster recovery testing, and building monitoring solutions
  • Conduct Postmortems after incidents and collaborate with development teams for performance optimization
Net Health SME https://www.nethealth.com/
501 - 1000 Employees

Job description

Your missions

About Net Health

Belong. Thrive. Make a Difference.

Are you looking for a meaningful and satisfying career where you have endless opportunities to grow and be financially rewarded? Net Health may be the perfect place for you.
A high-growth and profitable company, we help caregivers harness data for human health. We also honor and respect the needs of our Net Health family and staff, which is why we offer a work-from-anywhere environment and unlimited PTO. Our welcoming and collaborative culture paired with progressive benefits makes Net Health the ultimate career home!
As a leading-edge SaaS company in healthcare, we deliver solutions that help patients get better, faster, and live more fulfilling lives. Our software and predictive analytics cover the continuum of care, from hospital-to-home, across various medical specialties. Come join us and start the next chapter of your exciting career while helping others to live better lives.

World-Class Benefits That Reflect Our World-Class Culture.

#WorkFromAnywhere #UnlimitedPTO #ComprehensiveBenefitsPackage #EmployeeResourceGroups #CasualDressCode #PrioritizedEmployeeWellness #DiversityAndInclusion #AVoice #NewHireSupport #CareerDevelopment #EducationalAssistance #EmployeeReferralBonus #ProgressiveParentalLeave

JOB OVERVIEW

As a Site Reliability Engineer III, you will collaboratively manage the performance, stability, and redundancy of all Platform systems and infrastructure. You will be part of a team responsible for remediating system instability and slowness through monitoring, fault tolerance, tooling, capacity management, and automation. At the core of this role is the proactive, relentless pursuit of infrastructure solutions that ensure high degrees of observability, availability, and reliability. Partnering with development teams to ensure NH Platforms are performant, scalable, fault tolerant, and HIPAA compliant is critical.

RESPONSIBILITIES AND DUTIES

  • Lead emergency response efforts in conjunction with Engineering, Infrastructure, and Database teams to establish root cause
  • Lead the efforts to build robust monitoring solutions while expanding our current monitoring and alerting footprint
  • Participate in designing solutions that increase the holistic stability of NH Platforms, and identify potential risks
  • Conduct Blameless Postmortems and Anomaly Investigations after incidents to further analyze root cause and create permanent solutions to improve serviceability and prevent future outages
  • Establish a Don’t Repeat Incidents (DRI) culture by learning from past issues and always looking to improve monitoring and dashboarding capabilities
  • Ensure applications are performing efficiently, collaborating with development teams and architecture to resolve application performance issues
  • Consult with management in the analysis of short- and long-range business requirements and recommend innovations
  • Champion automation efforts to reduce or eliminate repetitive, manual processes
  • Partner with project management to define Service Level Objectives (SLO) and identify and implement Service Level Indicators (SLI) to track compliance
  • Champion capacity management and disaster recovery testing efforts

QUALIFICATIONS

  • Bachelor’s degree in computer science OR equivalent 6+ years’ progressive experience in IT Operations and/or systems management
  • 6+ years’ direct experience in a technical role dealing with complex enterprise software landscapes (DevOps focused development)
  • 6+ years’ experience with scripting and automating technical activities
  • Experience with best-in-class application monitoring (APM) tooling (New Relic, Dynatrace, AppDynamics)
  • Direct, hands-on experience with automated software and system management.
  • Strong knowledge of change control best practices and methodologies
  • Experience with Ansible, Terraform, Python, or Docker (or similar) is a plus
  • Experience with Agile development methodology and/or ITIL ITSM is a plus

REQUIRED HARDWARE EXPERIENCE

  • Servers, Workstations, Load Balancers, Switches, Routers, Firewalls, SAN, NAS and other storage hardware

REQUIRED SOFTWARE EXPERIENCE

  • PowerShell scripting and coding standards
  • Best-in-class application monitoring (APM) tooling (New Relic, Dynatrace, AppDynamics)
  • Azure and/or AWS PaaS/IaaS
  • Linux OS and Apache (e.g. SALT, etc.)
  • Direct, hands-on experience with automated software delivery and system management.
  • Agile development methodology
  • Working understanding of Platform Engineering work model in a software development environment
  • Proven project management skills and/or substantial exposure to project-based work structures, project lifecycle models, etc.
  • Proven experience in architecting and overseeing the direction, development, and implementation of technology solutions
  • O/S - Windows and Linux, VMware, PowerShell, Azure Administration, PRTG and other systems monitoring software, DNS Management, IIS, Tomcat, Docker, APM Monitoring, ITSM tools, SSL/TLS certificates, JavaScript, JSON, Python, Ansible, Terraform, vSphere, Kubernetes, Service Fabric, Azure Management, Elastic, Citrix, JIRA, New Relic, Project Management Tools, ADO, DUO, Secret Server, Qualys, PagerDuty, Couchbase, Redis, API gateways, DNS, Security, IP Routing, SSH, FTP, LDAP, HTTP/HTTPS, Email Routing, Jenkins, GitHub, AWS, Cloud development pipelines using CI/CD tooling, Bash scripting

 

Note: This job description is not intended to be all-inclusive. Employee may perform other related duties as requested to meet the ongoing needs of the organization.

Colorado Pay Law: If you are a Colorado resident and this role is available in Colorado or remote, you may be eligible to receive additional information about the compensation and benefits for this role, which we will provide upon request. Please send an email to Recruiting@NetHealth.com

If you are a CA, CT, CO, IL, MD, NV, RI, WA or NY City resident and this role is available in one of those locales or remote, you may be eligible to receive additional information about the compensation and benefits for this role, which we will provide upon request. Please send an email to Recruiting@NetHealth.com

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s): English
