SRE & Observability Engineering

fully flexible
Work set-up: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • At least 5 years of experience in cloud infrastructure or application development.
  • Minimum of 2 years in Site Reliability Engineering roles.
  • Strong knowledge of distributed systems, cloud-native architectures, and operational support.
  • Proficiency with observability tools like Prometheus, Grafana, ELK stack, and OpenTelemetry.

Key responsibilities:

  • Design, implement, and maintain observability systems including metrics, logs, and traces.
  • Develop automation and tooling to improve monitoring and incident management.
  • Collaborate with teams to define and track service level objectives and improve system reliability.
  • Participate in on-call rotations, troubleshoot distributed systems, and support CI/CD pipelines.

Amach Software Information Technology & Services SME https://amach.software.com/
201 - 500 Employees

Job description

About us:

Amach is an industry-leading, technology-driven company headquartered in Dublin, with remote teams in the UK and Europe.

Our blended teams of local and nearshore talent are optimised to deliver high-quality, collaborative solutions.

Established in 2013, we specialise in cloud migration and development, digital transformation including agile software development, DevOps, automation, data and machine learning…

This role is focused on the development and maintenance of the Observability Platform, with secondary responsibilities in Site Reliability Engineering (SRE). It blends software engineering and systems administration to drive performance, scalability and reliability across a growing cloud estate. The successful candidate will support the design, deployment and optimisation of observability tooling, enable operational excellence through automation, and contribute to the broader reliability strategy.

You will report to the SRE Lead, who will support your success through appropriate task allocation and opportunities for technical growth. The role sits within a team responsible for both observability and reliability, providing operational support, driving continuous improvement, and ensuring system integrity across the production environment.

Required skills:

  • 5+ years’ experience in cloud infrastructure or application development, with at least 2 years in Site Reliability Engineering roles
  • Strong background in distributed systems, cloud-native architectures, and production-grade operational support
  • Hands-on expertise with observability and monitoring tools such as Prometheus, Grafana, Loki, Tempo, ELK stack, OpenTelemetry, Datadog, or Splunk
  • Proficient in Infrastructure-as-Code and automation using Terraform, Ansible, and Helm
  • Skilled in containerisation and orchestration technologies, especially Docker and Kubernetes
  • Solid experience with CI/CD pipelines (e.g., Jenkins, GitLab, GitHub Actions) and scripting/programming in Python, Go, or Bash
  • Working knowledge of AWS services (EKS, EC2, S3, ASG, Load Balancers, IAM, etc.) and core networking principles
  • Familiarity with incident management and alerting tools such as Alertmanager, OpsGenie, and CloudWatch
  • Comfortable with modern data storage systems (MySQL, PostgreSQL, MongoDB) and performance and resilience testing tools (k6, Chaos Monkey)
  • Strong problem-solving mindset with the ability to automate workflows, reduce manual effort, and proactively troubleshoot issues
  • Demonstrated ability to collaborate across teams, communicate technical concepts clearly, and balance technical priorities with business needs
  • High degree of ownership, accountability, and a customer-centric approach to system reliability and usability

Key responsibilities & duties include:

  • Design, implement and maintain observability systems (metrics, logs, traces, alerting)
  • Develop tooling and automation to enhance monitoring and incident management
  • Partner with developers and infrastructure teams to define and track SLOs/SLAs/SLIs
  • Improve system reliability and performance through data-driven insights
  • Build self-service tooling and dashboards for engineering teams
  • Participate in on-call rotation and incident response activities
  • Optimise on-call workflows through automation and tooling
  • Provide operational support and troubleshoot distributed software systems
  • Collaborate on reliable and secure code deployment processes
  • Analyse infrastructure data to fine-tune performance and availability
  • Support CI/CD initiatives and delivery pipelines
  • Champion continuous improvement and DevOps best practices
  • Ensure capacity, compliance, continuity and service governance standards are met
  • Document processes, systems and observability best practices
  • Use trend analysis to identify potential system or process bottlenecks
  • Provide regular reports on production system health and performance

Desirable skills & competencies:

  • Fast learner with the ability to upskill independently
  • Knowledge of SaaS, IaaS, and PaaS models
  • In-depth experience with enterprise-grade monitoring tools
  • Familiarity with major CI/CD strategies at scale
  • Proven track record in implementing automation technologies
  • Exposure to large-scale incident management strategies and tooling

What’s in it for you: 

  • An opportunity to join a fast-growing company  
  • Options for career advancement 
  • Learning and development opportunities 
  • Flexible working environment
  • Competitive salaries based on experience 

Equal Opportunity Employer:

Amach is an equal opportunity employer and makes employment decisions on the basis of merit. We celebrate diversity and are committed to creating an inclusive environment for all employees. This job description is intended to convey essential responsibilities and qualifications for this role, but it is not an exhaustive list of tasks that an employee may be required to perform.

If you are passionate about driving customer success, advising on strategic solutions, and contributing to product innovation, we would love to hear from you!

Not for you?

Check out all of our open positions on our careers page and follow us on LinkedIn for future opportunities.

P.S. Share this with friends and co-workers! Don't be afraid they'll steal it from you: if you're amazing and smart, we'll find a role for you. We are growing fast and are always looking for talented people.

At Amach, we strive to be an inclusive community of open-minded individuals with different backgrounds and we are committed to fostering, cultivating and preserving a culture of diversity, equity and inclusion. We strongly believe that a diversity of experience and background is essential to create a fulfilling environment and better solutions for our people and our customers. All Amach employees and contractors are expected to honour this policy and act to ensure that every individual is respected in the workplace. 

Your personal data

Amach will process your personal information in accordance with the EU's General Data Protection Regulation (GDPR). We will comply with data protection law and principles, which means that your data will be:

  • Used lawfully, fairly and in a transparent way
  • Collected only for valid purposes and not used in any way that is incompatible with those purposes
  • Relevant to the purposes we have told you about and limited only to those purposes
  • Accurate and kept up to date
  • Kept only as long as necessary for the purposes we have told you about
  • Kept securely

If you would like to contact us about your data, please use the following address: info@amach.com

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: Information Technology & Services
Spoken language(s): English

Other Skills

  • Accountability
  • Collaboration
  • Communication
  • Problem Solving
  • Quick Learning
  • Teamwork
