Site Reliability Engineer

Key Facts

Remote
Full time
Expert & Leadership (>10 years)
English

Other Skills

  • Accountability
  • Active Learning
  • Active Listening
  • Dealing With Ambiguity
  • Critical Thinking
  • Growth Mindedness
  • Verbal Communication Skills
  • Creativity
  • Empathy
  • Coaching
  • Problem Solving

Requirements

  • 10+ years of engineering or systems experience
  • Hands-on experience building and running reliable, fault-tolerant production cloud systems at scale on AWS, with Terraform, Terragrunt, Packer, CI/CD, and Ansible
  • Strong programming skills in Shell, Python, Golang and/or Ruby
  • Experience with monitoring/observability platforms and incident response; ability to debug and optimize code

Roles & Responsibilities

  • Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation, and refinement
  • Provide technical leadership on managing availability and performance, building automation to prevent problem recurrence, and automated responses for non-exceptional service conditions
  • Maintain services by measuring and monitoring availability, latency, and overall system health
  • Be on an on-call rotation to respond to incidents that impact platform availability

Job description

This role has been designated as ‘Remote/Teleworker’, which means you will primarily work from home.

Who We Are:

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

Job Description:

We are looking for a highly motivated, self-driven, and dedicated Site Reliability Engineer with the following experience and skills:

• Experience building and running reliable and fault-tolerant production cloud systems at scale on AWS.
• Experience coding infrastructure automation with Terraform, Terragrunt, Packer, and CI/CD, and using configuration management systems like Ansible.
• Hands-on experience with Linux/Unix operating system internals, file systems, system tuning, administration, and networking.
• Deep experience in microservice technologies, container orchestration, and continuous deployment (Kubernetes, Docker, Helm, GitOps with Flux).
• Experience designing, building, and maintaining production services, and troubleshooting large-scale distributed systems.
• Experience with technologies like Apache Kafka, Apache Storm, Apache Flink, Apache Airflow, Spark, Postgres, Redis, Elasticsearch, Arango, and Cassandra.
• Experience with observability tools and methodology (monitoring, logging, tracing, SLOs/SLIs) to detect and diagnose issues before they cause service impact or performance degradation.
• Strong programming skills in Shell, Python, Golang and/or Ruby.
• Ability to deliver efficiently and effectively.
• Strong problem-solving and debugging skills with a high sense of ownership.

Responsibilities:

• Engage in and improve the whole lifecycle of services - from inception and design, through to deployment, operation, and refinement.
• Support the development of services before they go live, starting in the planning phase, through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
• Provide technical leadership and guidance to other team members on managing availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
• Plan capacity for the growth of cloud infrastructure.
• Improve operational processes such as deployments and upgrades.
• Manage execution of project priorities, deadlines, and deliverables.
• Be on an on-call rotation to respond to incidents that impact platform availability.
• Use your on-call shift to prevent incidents from happening.
• Respond to incidents, conduct post-mortems, and implement lessons learned to enhance system reliability.

Preferred Qualifications:

• 10+ years of engineering or systems experience.
• Experience leveraging cloud architecture, applying site reliability principles, and/or demonstrating sensitivity to operational concerns.
• Strong understanding of network design and architecture.
• Experience scaling and managing distributed systems.
• Significant experience with monitoring and observability platforms.
• Demonstrated ability to debug, fix, and optimize code.
• Troubleshooting skills across network, application, and distributed services layers.
• Ability to learn quickly and adapt to new technologies.
• Excellent communication skills, both verbal and written.

Additional Skills:

Accountability, Action Planning, Active Learning, Active Listening, Agile Methodology, Bias, Business, Coaching, Creativity, Critical Thinking, Cybersecurity, Data Analysis Management, Data Collection Management, Data Controls, Design Thinking, Development Methodologies, Empathy, Follow-Through, Growth Mindset, Implementation Methodologies, Infrastructure Design, Intellectual Curiosity, Long Term Planning, Managing Ambiguity, and 4 more

What We Can Offer You:

Health & Wellbeing

We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.

Personal & Professional Development

We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have — whether you want to become a knowledge expert in your field or apply your skills to another division.

Unconditional Inclusion

We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.

Let's Stay Connected:

Follow @HPECareers on Instagram to see the latest on people, culture and tech at HPE.

#poland, #romania

#networking

Job:

Information Technology

Job Level:

TCP_04

The expected salary/wage range for this position is provided below. The actual offer may vary from this range based upon geographic location, work experience, education/training, and/or skill level.
Poland: Annual Salary PLN 154,500 - 305,500
The listed salary range reflects base salary. Variable incentives may also be offered.

HPE is an Equal Employment Opportunity/Veterans/Disabled/LGBT employer. We do not discriminate on the basis of race, gender, or any other protected category, and all decisions we make are made on the basis of qualifications, merit, and business need. Our goal is to be one global team that is representative of our customers, in an inclusive environment where we can continue to innovate and grow together.

Hewlett Packard Enterprise is EEO Protected Veteran/Individual with Disabilities.

HPE will comply with all applicable laws related to employer use of arrest and conviction records, including laws requiring employers to consider for employment qualified applicants with criminal histories.

No Fees Notice & Recruitment Fraud Disclaimer

It has come to HPE’s attention that there has been an increase in recruitment fraud whereby scammers impersonate HPE or HPE-authorized recruiting agencies and offer fake employment opportunities to candidates. These scammers often seek to obtain personal information or money from candidates.

Please note that Hewlett Packard Enterprise (HPE), its direct and indirect subsidiaries and affiliated companies, and its authorized recruitment agencies/vendors will never charge any candidate a registration fee, hiring fee, or any other fee in connection with its recruitment and hiring process. Candidates should verify the credentials of any hiring agency that claims to be working with HPE for recruitment of talent, and candidates are solely responsible for conducting such verification. Any candidate/individual who relies on the erroneous representations made by fraudulent employment agencies does so at their own risk, and HPE disclaims liability for any damages or claims that may result from any such communication.
