At least 4 years' experience in development, engineering, QA or related roles focusing on programming and automating solutions to manage cloud infrastructure reliability.
Hands-on experience with Amazon Web Services (IAM, EC2, ECR, EKS, Route53), Docker (Linux/Windows) and Kubernetes for container orchestration.
Experience with configuration management tools (e.g., Ansible) and Infrastructure as Code (Terraform, CloudFormation).
Familiarity with monitoring, logging and dashboard tools (e.g., Sumo Logic, Datadog, New Relic, Grafana) to support site reliability.
Responsibilities:
Ensure services run smoothly and have capacity for continued growth and improvement.
Support and deploy cloud solutions to maintain AWS-hosted services.
Contribute to the development of infrastructure health monitoring and reporting.
Troubleshoot production infrastructure and load issues, and implement solutions.
Job description
This is a remote position.
We are working with a fast-growing, multinational software company that helps some of the world's largest companies decide which infrastructure to invest in, so they can make the most of their resources and money, using its data analytics and optimisation tools.
As the European development team continues to expand, they're looking for a proactive CloudOps Engineer. CloudOps is modelled on a Site Reliability Engineering team: it provides reliability and uptime, and works on the company's services to automate tasks and reduce toil.
As a CloudOps / DevOps Engineer you will be involved in the following:
Ensuring services run smoothly and have the capacity for continued growth and improvement.
Supporting and deploying cloud solutions to maintain AWS-hosted services.
Contributing to the development of infrastructure health monitoring and reporting.
Troubleshooting production infrastructure and load issues, and implementing solutions.
Learning and understanding best-practice procedures for infrastructure deployment.
Identifying areas for development and improvement of Windows and Linux operating systems.
Documenting and identifying failure modes related to hosted solutions.
Being open to learning new strategies to manage site reliability.
Participating in on-call duties together with the team.
Your background:
We are looking for applicants with knowledge of cloud technologies and a focus on maintaining and improving cloud-hosted solutions. You'll have at least 4 years' experience working in development, engineering or QA teams, with a focus on programming and automating solutions to manage cloud infrastructure reliability. You'll also have experience building and operating highly available, complex, customer-facing systems at scale in a 24x7 environment.
Requirements
Configuration Management tools (e.g. Ansible)
Managing Docker for Linux and for Windows
Managing containers with a container orchestration platform (e.g. Kubernetes)
Amazon Web Services (e.g. IAM, EC2, ECR, EKS, Route53)