Lead DevOps Engineer

Fully flexible
Work set-up: Full Remote
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

  • Minimum of 5 years in operations management within a platform or core infrastructure team.
  • Proficiency in managing hybrid cloud environments such as AWS, Azure, or Google Cloud.
  • Experience with infrastructure provisioning, systems administration, and monitoring tools like Terraform, Ansible, k8s, and Datadog.
  • Strong understanding of security best practices, compliance frameworks like ISO 27001 and SOC 2, and incident response processes.

Key responsibilities:

  • Oversee daily operations and management of the core infrastructure team.
  • Design and implement improvements to existing infrastructure and new services.
  • Manage capacity planning, cost optimization, and ensure compliance with security standards.
  • Contribute technically to provisioning, monitoring, and troubleshooting core services.

Ripjar SME http://www.ripjar.com/
51 - 200 Employees

Job description

Ripjar specialises in the development of software and data products that help governments and organisations combat serious financial crime. Our technology is used to identify criminal activity such as money laundering and terrorist financing and enables organisations to enforce sanctions at scale to help combat rogue entities and state actors.

Team Mission:

The core infrastructure team at Ripjar is responsible for commissioning and maintaining the underlying IT infrastructure that supports the company's data analytics and intelligence solutions. These systems are provisioned in a hybrid public/private cloud environment and include the underlying clusters used for large-scale analytics as well as internal tooling and customer-facing SaaS service.

Position Overview:

The Lead DevOps Engineer will oversee the day-to-day management of the core infrastructure team (currently 5 headcount), ensuring the efficient provisioning, monitoring, maintenance, and troubleshooting of our mixed public and private cloud environment. This role requires a strategic mindset to design and implement infrastructure improvements while managing performance, capacity, and cost. The role holder will collaborate closely with Product, Delivery, Engineering, and Security to align infrastructure capabilities with business needs alongside regulatory requirements.

Key Responsibilities:

Team Leadership

  • Coordination: Oversee the day-to-day activities of the operations team, ensuring that processes run smoothly and efficiently. This includes assigning tasks, monitoring progress, and addressing any issues that arise.
  • Technical Oversight: Design and implement improvements to existing infrastructure as well as new services. Evaluate the benefits of third-party managed solutions vs internal provision.
  • Performance Management: Assess and improve the performance of core infrastructure team members, fostering a culture of continuous development.

Operations Management

  • Process Management: Establish and optimise processes that enable the team to independently handle routine tasks.
  • Jira Service Desk: Operate an internal-facing service desk ensuring triage and timely ticket management as well as evolving ticket types to streamline support requests.
  • Out-of-Hours Support: Coordinate out-of-hours support activities, ensuring a collective knowledge base for non-trivial SaaS support issues.
  • Incident Response: Manage and contribute to incident response efforts for infrastructure-related issues, ensuring timely resolutions.

Capacity & Cost Management

  • Capacity Planning: Conduct infrastructure capacity planning, utilising metrics to inform decisions and ensure readiness for business scaling.
  • Cost Tracking & Optimization: Monitor and optimise costs associated with infrastructure and services, ensuring alignment with budgetary goals.

Compliance & Audits

  • Compliance: Manage and contribute to recurring annual compliance activities, including ISO 27001 and SOC 2 audits, in collaboration with the respective audit teams and third-party advisors.
  • Security: Ensure security best practice including identifying potential threats and vulnerabilities, designing secure software systems, and implementing robust security measures.
  • Disaster Recovery Testing: Participate in disaster recovery testing, ensuring robust recovery processes are in place.

In addition to the above, the role holder should remain technically proficient such that they can contribute to the daily activities of the team including provisioning, monitoring, maintenance, and troubleshooting of our core services.

Requirements:

  • Minimum of 5 years in operations management, particularly within a platform/core infrastructure team (or equivalent).
  • Proven ability to lead, mentor, and develop team members, fostering a culture of continuous improvement.
  • Proficiency in managing hybrid cloud environments (both public and private) and familiarity with relevant technologies and platforms (e.g., AWS, Azure, Google Cloud). Our production workloads are currently hosted in AWS.
  • Proficiency in infrastructure provisioning, systems administration and monitoring tools. We use Terraform, Ansible, k8s and Datadog to manage a range of RHEL/Rocky 9 hosts. Our analytics clusters make use of Spark, HBase and HDFS.
  • Experience in designing and implementing scalable infrastructure solutions, ideally with some exposure to parallel processing environments used for large-scale analytics.
  • An appreciation of security best practice in areas such as network security, threat modelling, vulnerability assessment, IAM, SIEM and incident response.
  • Skills in system monitoring, performance tuning, and troubleshooting infrastructure and microservice-based architectures.
  • Understanding of compliance frameworks like ISO 27001 and SOC 2, and experience in managing audits and compliance activities.
  • Familiarity with incident response processes and tools, ensuring timely resolution of issues.

Benefits:

  • Competitive salary DOE
  • 25 days annual leave + your birthday off, rising to 30 days after 5 years of service.
  • Fully remote working with occasional office travel required
  • 35-hour working week.
  • Private Family Healthcare.
  • Employee Assistance Programme.
  • Company contributions to your pension (Salary exchange scheme)
  • Enhanced maternity/paternity pay.
  • The latest tech including a top-of-the-range MacBook Pro.
  • Free food and drink
Required profile

Experience

Level of experience: Mid-level (2-5 years)
Spoken language(s): English

Other Skills

  • Troubleshooting (Problem Solving)
  • Team Leadership
  • Collaboration
  • Communication
