Senior Site Reliability Engineer at HHAeXchange

Remote: Full Remote
Contract:
Salary: 125 - 135K yearly
Experience: Mid-level (2-5 years)
Work from: District of Columbia (USA), New York (USA), United States

Offer summary

Qualifications:

  • Bachelor's degree in Computer Science or a related field
  • 3-5 years of experience in an SRE role
  • Deep understanding of SQL Server and AWS databases
  • Expertise in observability platforms such as DataDog
  • Technical certifications (AWS Solutions Architect preferred)

Key responsibilities:

  • Maintain and implement observability systems
  • Act as point of contact for production issues
  • Conduct root cause analysis and post incident reviews
  • Coordinate with onshore and offshore teams
  • Analyze system performance trends proactively
HHAeXchange
501 - 1000 Employees

Job description

HHAeXchange is the leading technology platform for home and community-based care. Founded in 2008, HHAeXchange was born out of an idea to create a fully comprehensive end-to-end homecare solution to help people who are aging or have disabilities thrive in their homes and communities. Our employees are passionate about transforming the healthcare space by building the only homecare ecosystem that fully connects patients, personal care providers, managed care organizations, and states.  

The Sr. SRE Engineer will work with the SRE team on shared full-stack ownership of a collection of services and technologies in the cloud and in our two data centers. The individual in this role should be able to work independently and understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.

To perform this job successfully, an individual must be able to perform each essential job duty satisfactorily with or without reasonable accommodation.  Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

Essential Job Duties
  • Maintain PostgreSQL, MySQL, SQL Server, and MongoDB databases.
  • Maintain and implement observability systems, especially DataDog, CloudWatch, and SolarWinds.
  • Act as point of contact and first responder for production issues during business hours.
  • Engage in incident calls and help resolve issues as soon as possible.
  • Engage in capacity planning of resources to make sure there is enough infrastructure available.
  • Conduct root cause analysis and post incident review after each incident.
  • Review SRE Jira tickets daily and ensure they are on track against the defined goals.
  • Maintain and update System Operation Procedures (SOP) for production systems.
  • Act as liaison between onshore and offshore SRE teams.
  • Conduct daily morning production inspections before the start of business hours.
  • Actively coordinate with director of cloud and onshore team during US business hours.
  • Participate in DR drills to make sure a proper disaster recovery plan is in place.
  • Identify and implement changes to optimize system performance.
  • Develop and maintain SRE documentation in internal wiki.
  • Validate and maintain certificates and licences for different applications.
  • Design and develop cloud operations solutions (AWS VPN, WorkSpaces, AWS Backup).
  • Analyze trends, dive into system dashboards and review key performance metrics to identify anomalies, and proactively address any potential issues.

Other Job Duties
  • Other duties as assigned by supervisor or HHAeXchange leader.

Travel Requirements
  • Travel 10-25%, including overnight travel

Required Education, Experience, Certifications and Skills
  • Bachelor’s degree in Computer Science or a related field
  • 3-5 years of experience in an SRE Engineer role.
  • Deep understanding of SQL Server is highly desired. You will be maintaining massive on-prem and cloud SQL databases.
  • Expert in AWS purpose-built databases: RDS, DynamoDB, DocumentDB, ElastiCache, etc.
  • Deep understanding of observability platforms and ways to improve them.
  • Deep understanding of DataDog, AWS CloudWatch, and Azure Application Insights.
  • Expertise in AWS, Linux, Windows, Ansible, IaC using Terraform, CloudFormation/CDK, .NET, IIS, Redis, Kafka, Kubernetes, Nginx, shell scripting, and Git.
  • Familiarity with at least one programming language, preferably Python, Go, or .NET.
  • High-level understanding of networking concepts is desired.
  • Solid understanding of hybrid cloud environments.
  • Comfortable collaborating with teams working from around the globe
  • Proficient in SDLC, Waterfall, and Agile methodologies
  • Eager to learn and implement new solutions on AWS.
  • Technical certifications (AWS Solutions Architect Associate or Professional)
  • Ability to effectively convey technical information to both technical and non-technical stakeholders, including clear and concise documentation.
  • Work well within a team environment, sharing knowledge, and collaborating with cross-functional teams such as developers, operations, and engineering.
  • Strong analytical skills and the ability to troubleshoot complex issues, identify root causes, and implement effective solutions.
  • Ability to quickly adapt to changing priorities, technologies, and environments in dynamic and fast-paced work settings.
  • Think creatively to design innovative solutions and approaches to improve system reliability, efficiency, and automation.
  • Skilled in prioritizing tasks effectively, managing workloads efficiently, and meeting deadlines in a high-pressure environment.
  • Pay close attention to detail when designing, implementing, and troubleshooting systems to ensure accuracy and reliability.
  • Ability to handle setbacks and failures gracefully, learn from them, and persist in finding solutions to complex problems.

The base salary range for this US-based, full-time, exempt position is $125,000-$135,000, not including variable compensation. An employee’s exact starting salary will be based on various factors including but not limited to experience, education, training, merit, location, and the ability to exemplify the HHAeXchange core values.

This is a benefits-eligible position. HHAeXchange offers competitive health plans, paid time off, company-paid holidays, and a 401(k) retirement program with a Company-elected match, along with other company-sponsored programs.

HHAeXchange is an equal-opportunity employer. The Company offers employment opportunities to all applicants and employees without regard to race, color, religion, national origin, sex, sexual orientation, gender identity or expression, age, disability, medical condition, marital status, veteran status, citizenship, genetic information, hairstyles, or any other status protected by local or federal law.

Required profile

Experience

Level of experience: Mid-level (2-5 years)

Other Skills

  • Microsoft Windows
  • Analytical Skills
  • Verbal Communication Skills
  • Prioritization
  • Resilience
  • Adaptability
  • Detail Oriented
  • Troubleshooting (Problem Solving)
  • Creative Thinking
  • Collaboration
