
Site Reliability Analyst

fully flexible
Remote: Full Remote
Contract: 
Experience: Mid-level (2-5 years)
Work from: 

Offer summary

Qualifications:

2+ years in site reliability engineering or DevOps; proficiency in Python, GCP, and Terraform.

Key responsibilities:

  • Lead deployment processes & feature flag management
  • Optimize platform performance and scalability
  • Provide on-call support and incident resolution
  • Develop and maintain CI/CD pipelines
  • Create detailed system documentation

Job description

Almanak Blockchain Labs is a data science and research-oriented company dedicated to understanding and designing the next generation of decentralized networks. The company is backed by top VCs.

We use AI and simulation to optimize and improve top-tier decentralized finance and gaming protocols. Our ultimate objective is to use cutting-edge data modeling to maximize their profitability while ensuring economic security.

Our culture is centered on the disciplined pursuit of knowledge, meritocracy, impact on our partners’ businesses, and data-backed performance. We are a collective of execs and technologists from companies such as Google, McKinsey, Uber, EY and DBS Bank.

We are seeking an experienced Site Reliability Analyst to join our team. As a Site Reliability Analyst at Almanak, you will be responsible for ensuring the reliability, scalability, and performance of our systems, as well as actively monitoring and troubleshooting any issues that may arise. You will work closely with our engineering and operations teams to optimize our infrastructure, automate processes, and implement best practices for system availability and performance.

Responsibilities
  • Lead release management efforts, ensuring smooth and reliable software releases through well-managed deployment processes.
  • Implement and manage feature flags using tools like LaunchDarkly, facilitating safe and controlled feature releases and A/B testing.
  • Oversee the scaling of the platform to handle increased load, optimizing for performance and reliability.
  • Conduct deployment management, including the coordination of canary deployments and blue-green deployment strategies, to minimize disruption and ensure high availability.
  • Contribute to the setup and maintenance of CI/CD pipelines, leveraging automation to improve development workflows and deployment efficiency.
  • Provide L1-L2 on-call support, quickly addressing and resolving incidents to maintain service quality and platform stability.
  • Perform small bug fixes and respond to feature requests, contributing to the continuous improvement of the platform.
  • Develop and maintain comprehensive documentation covering deployment processes, incident management, and system configurations.

Requirements

  • 2+ years of experience in site reliability engineering, DevOps, or a similar role with a focus on release management and platform scalability.
  • Proficiency with Python, GCP, and infrastructure-as-code (IaC) tools, specifically Terraform, for managing and provisioning infrastructure.
  • Solid understanding of CI/CD principles and experience with CI/CD tools to support efficient development and deployment processes.
  • Experience with feature flag management tools (e.g., LaunchDarkly) and strategies for safe, incremental feature rollouts.
  • Knowledge of deployment strategies such as canary releases and blue-green deployments, with the ability to implement these processes effectively.
  • Ability to provide on-call support, troubleshoot and resolve issues promptly, ensuring platform reliability and service continuity.
  • Strong documentation skills, with the ability to create clear and detailed guides for system configurations, deployment procedures, and incident response.
  • Excellent problem-solving abilities, with a proactive approach to identifying and mitigating potential issues before they affect users.

Desirable Qualifications
  • Experience with trunk-based development and its implementation in a CI/CD pipeline to support rapid and safe code integrations.
  • Familiarity with scaling strategies for high-traffic applications, including load balancing and resource optimization techniques.
  • A commitment to continuous learning and improvement, staying abreast of the latest industry trends and best practices in site reliability and deployment management.

Benefits

  • Compensation: You’ll receive competitive compensation, paid in either fiat or crypto.
  • Flexible schedule & remote work: You’ll be able to work remotely and manage your own time. We want you to work from the place that makes you happiest and contributes to your overall well-being.
  • Co-working space, gear & education budgets: The company invests in your comfort at work as well as in your personal growth.
  • Impact: You’ll work with some of the smartest people in the space and play a pivotal role in influencing the way some of the most popular crypto applications are built.

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Industry: Research
Spoken language(s): English

Other Skills

  • Analytical Skills
