Job description

Who we are:

Ministry of Programming is a startup studio and change maker focused on supporting startups worldwide on their way to success. Having worked with more than 95 startups over the last 7 years and built a team of 200 professionals, the company leverages international networks to create partnerships with top-notch startups from all over the world.

Ministry of Programming has a strong focus on software design and development consulting services for early-stage startups and new products. The company also invests in startups and has made more than a dozen investments so far. The company is recognized by the Financial Times and listed in the FT1000 list of fastest-growing European companies. In addition, the company earned a place in Deloitte's annual list of the 50 fastest-growing companies in Central Europe, taking 21st place in the ranking, and received the Deloitte Impact Star Award.

Where you come in:

We are seeking a qualified Site Reliability Engineer to join our team, providing technical leadership in the management and scaling of our services. In this role, you will collaborate with product teams to build, manage, and deploy infrastructure as code within a virtual computing and storage environment for digital media delivery and supply chain management. Your responsibilities will include empowering and aligning with Software Engineering Teams, coordinating efforts to architect systems, establishing shared standards, and documenting designs and prototypes. Additionally, you will contribute to developing and maintaining the techniques required for observability, instrumentation, metrics, and monitoring, and help educate others on the use of these systems.

Key Responsibilities:

Ensure that our Kubernetes clusters are reliable, scalable, performant, and can be extended to support new requirements
Prescribe and enforce service-level objectives (SLOs) and error budgets for production systems
Automate the provisioning and management of infrastructure hosted in AWS and GCP
Create automated systems for repetitive tasks, including self-healing and auto-scaling capabilities
Design networks
Enforce access controls
Automate and tune static and runtime analysis to improve service security
Design software system architecture
Participate in an on-call rotation
Implement change controls
Craft plans and procedures for disaster recovery

Skills:

Familiarity with Linux and the UNIX methodology
Proficiency in a scripting language such as Python or Bash
Proficiency in observability tools such as Prometheus, Grafana and Sentry
Experience in a DevOps or Software Engineering role
Familiarity with software development, including the application of data structures and algorithms
Experience operating Kubernetes or orchestrated containers (OCI) in a production environment
Familiarity with building and maintaining continuous delivery systems
Experience working with at least one of the major cloud providers (AWS, GCP preferred)
A background in building and managing highly available distributed systems
Ability to write infrastructure as code (for example, with Terraform, Ansible, Puppet, or Chef)
Comfortable with networking concepts such as TCP/IP, DNS and HTTP
A basic understanding of relational and non-relational database technologies and how to administer these systems in a production environment (e.g. MariaDB, MySQL, Elasticsearch)

Job type: Full-time

Location: Sarajevo or remote (Bosnia and Herzegovina)
