Site Reliability Engineer / DevOps Engineer

Remote: Full Remote

Ministry of Programming Scaleup https://www.ministryofprogramming.com
51 - 200 Employees

Job description

Who we are:

Ministry of Programming is a startup studio and change maker focused on supporting startups worldwide on their way to success. Having worked with more than 95 startups over the last 7 years and built a team of 200 professionals, the company leverages its international networks to create partnerships with top-notch startups from all over the world.

Ministry of Programming has a strong focus on software design and development consulting services for early-stage startups and new products. The company also invests in startups and has made more than a dozen investments so far. It is recognized by the Financial Times and listed in the FT1000 list of the fastest-growing European companies. In addition, the company earned a place in Deloitte's annual list of the 50 fastest-growing companies in Central Europe, taking 21st place in the ranking, and received the Deloitte Impact Star Award.

Where you come in:


We are seeking a qualified Site Reliability Engineer to join our team and provide technical leadership in the management and scaling of our services. In this role, you will collaborate with product teams to build, manage, and deploy infrastructure as code within a virtual computing and storage environment for digital media delivery and supply chain management. Your responsibilities will include empowering and aligning with Software Engineering Teams, coordinating efforts to architect systems, establishing shared standards, and documenting designs and prototypes. Additionally, you will contribute to the development and maintenance of the techniques required for observability, instrumentation, metrics, and monitoring, and help educate teams on the use of these systems.

Key Responsibilities:

  • Ensure that our Kubernetes clusters are reliable, scalable, performant, and can be extended to support new requirements
  • Prescribe and enforce service-level objectives (SLOs) and error budgets for production systems
  • Automate the provisioning and management of infrastructure hosted in AWS and GCP
  • Create automated systems for repetitive tasks, including self-healing/auto-scaling capabilities
  • Design networks
  • Enforce access controls
  • Automate and tune static and runtime analysis to improve service security
  • Design software system architecture
  • Participate in an on-call rotation
  • Implement change controls
  • Craft plans and procedures for disaster recovery

Skills:

  • Familiarity with Linux and the UNIX philosophy
  • Proficiency in a scripting language such as Python or Bash
  • Proficiency in observability tools such as Prometheus, Grafana and Sentry
  • Experience in a DevOps or Software Engineering role
  • Familiarity with software engineering, including the application of data structures and algorithms
  • Experience operating Kubernetes or orchestrated containers (OCI) in a production environment
  • Familiarity with building and maintaining continuous delivery systems
  • Experience working with at least one of the major cloud providers (AWS, GCP preferred)
  • A background in building and managing highly available distributed systems
  • Ability to write infrastructure as code (e.g. Terraform, Ansible, Puppet, or Chef)
  • Comfortable with networking concepts such as TCP/IP, DNS and HTTP
  • A basic understanding of relational and non-relational database technologies and how to administer these systems in a production environment (e.g. MariaDB, MySQL, Elasticsearch)

Job type: Full-time

Location: Sarajevo or remote (Bosnia and Herzegovina)

Required profile

Spoken language(s): English
