
Senior Site Reliability Engineer

Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • Extensive experience in Kubernetes management.
  • Proficiency in Ansible, Helm, and Kustomize.
  • Experience with AWS services: EC2, S3, IAM.
  • Hands-on experience with Terraform for IaC.
  • Strong background in MySQL database management.

Key responsibilities:

  • Analyze metrics to improve platform performance.
  • Collaborate with engineering teams for efficient services.
  • Develop sustainable systems through automation.
  • Manage the system landscape for reliability and availability.
  • Mentor engineers and improve company processes.
Oomnitza Information Technology & Services SME https://www.oomnitza.com/
51 - 200 Employees

Job description

Oomnitza offers the industry’s most versatile Enterprise Technology Management platform that orchestrates and automates key business processes for IT. Our SaaS solution, with agentless integrations, best practices and low-code workflows, enables enterprises to leverage their existing infrastructure systems and automate processes such as offboarding, onboarding, audit readiness, refresh forecasting and more, thereby reducing reliance on error-prone manual tasks and tickets. We help some of the most well-known and innovative companies to improve efficiency, expedite audits, mitigate cyber risk and eliminate redundant IT spend. 

At Oomnitza, we’re passionate about building software that solves problems. We count on our Site Reliability Engineers (SREs), using DevSecOps methodologies, to empower our users with the rich feature set, high availability, and stellar performance they need to pursue their missions. Our dynamic and innovative team is growing, and we are looking to add a highly motivated and experienced Site Reliability Engineer. As an experienced DevSecOps practitioner, you will operate and deliver working systems based on insights gathered in real time from massive-scale data, ensuring Oomnitza’s internal and external services are reliable while keeping an ever-watchful eye on our systems, capacity, and performance. You’ll have the opportunity to experience the complex challenges of building and running large-scale, fault-tolerant, and secure distributed microservice-based systems worldwide. Specifically, we are searching for someone who:
- Brings fresh ideas to the table, and demonstrates a unique and informed viewpoint
- Enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction


Responsibilities:
  • Gather and analyze metrics from our platform and applications to continually improve our performance tuning and fault finding.
  • Partner with our world-class engineering teams to improve services through rigorous testing and release procedures.
  • Create sustainable systems and services through automation and uplifts while working closely with engineering professionals within the company to enable projects to be completed efficiently.
  • Develop, monitor, and manage the entire system landscape by balancing feature development speed and reliability with well-defined service level objectives, ensuring minimal downtime and maximum availability.
  • Participate in the development and implementation of practices, procedures, and technology to ensure our system landscapes are operating within our Security, Compliance, and Availability commitments.
  • Plan, prepare, and execute system upgrades.
  • Mentor and train other engineers throughout the company and seek to continually improve processes company-wide.
  • This position will be part of an on-call rotation.

Qualifications:
  • Kubernetes: Extensive experience with container orchestration and managing production clusters, focusing on deployment, scaling, and troubleshooting within Kubernetes environments. Proven ability to set up and manage Kubernetes clusters effectively for enterprise applications. Experience with Amazon EKS is a plus. 
  • Configuration Management: Proficiency in tools like Ansible, Helm, and Kustomize for automating infrastructure provisioning, configuration, and deployment. Skilled in managing Kubernetes manifests and application releases to streamline processes and ensure consistency across various deployment environments.
  • Monitoring: Experience with Prometheus, Grafana, or similar tools to proactively track system health, detect anomalies, and optimize performance across the platform.
  • AWS Cloud Services: Deep knowledge of the AWS ecosystem, including EC2, S3, IAM, VPC, and other essential services for building and managing scalable infrastructure.
  • Infrastructure as Code (IaC): Hands-on experience with Terraform to provision and manage cloud resources, ensuring version control, repeatability, and efficiency in infrastructure deployment.
  • Queuing Systems: Familiarity with message queuing systems like RabbitMQ and Kafka, as well as managed queuing services such as Amazon MQ. Skilled in setting up, managing, and optimizing message brokers for high-throughput, reliable communication between distributed systems.
  • Database Management: Strong background in managing MySQL databases and leveraging Amazon RDS for high availability, performance tuning, and secure database management in cloud environments.
  • Networking and Security Best Practices: Understanding of network design and security protocols to protect systems, enforce compliance, and meet industry-standard audit requirements.
  • High-Uptime / Low-Downtime Environments: Experience ensuring high uptime agreements for critical systems, implementing strategies for fault tolerance, disaster recovery, and proactive monitoring to maintain service availability and minimize downtime.
  • Cross-functional Collaboration: Proven ability to work effectively with cross-functional teams from multiple departments to achieve project goals and execute project plans in an orderly and efficient manner.
  • Programming Skills: Ability to develop and maintain code in one or more high-level programming languages such as Python, Go, or JavaScript. Familiarity with modern development tools and CI/CD pipelines to automate testing, deployment, and monitoring.
  • Problem Solving and Performance Optimization: A proactive mindset towards identifying system issues, areas for process improvement, and resolving performance bottlenecks.

What We Can Offer You:
  • Healthcare for dependents and spouse 
  • A progressive, healthy work culture with excellent opportunities for professional and personal development.  
  • Top performers will have an opportunity to help shape the team, working directly with the founders to drive initiatives and create a structure that scales.
  • A once-in-a-lifetime career opportunity to get on board a fast-growing business that is venture-backed by C5 Capital, Shasta Ventures, Riverside Acceleration Capital, and Hummer Winblad.

Our Benefits Package:
  • Dental & Vision Insurance 
  • Employee equity plan
  • Health Insurance for your spouse and dependents 
  • Pension, Life insurance and Income protection
  • Remote working & flexible work schedules
  • Working from home equipment allowance
  • Choice of preferred equipment, Mac or PC.
  • Regular, fun social events and workshops.

Please note: this role requires you to be located in Ireland.

    Oomnitza recruits, employs, trains, compensates and promotes regardless of race, religion, color, national origin, sex, disability, age, veteran status, and other protected status as required by applicable law.

    Required profile

    Experience

    Level of experience: Senior (5-10 years)
    Industry: Information Technology & Services
    Spoken language(s): English

    Other Skills

    • Problem Solving
    • Analytical Thinking
    • Verbal Communication Skills
    • Collaboration
