Key Facts

Remote From:

Category: Site Reliability Engineer (SRE)

Full time

English

Hard Skills

AWS, Cloud Services, Terraform, Kubernetes, Site Reliability Engineering, Performance Profiling, Datadog, PostgreSQL, Observability, Incident Response, Infrastructure Automation

Other Skills

• Collaboration
• Communication
• Teamwork
• Troubleshooting (Problem Solving)
• Problem Solving

Roles & Responsibilities

Maintain and improve platform uptime and availability
Configure and maintain monitoring, alerting, and observability systems across the stack
Assist with incident response, including investigation, mitigation, and postmortems; develop and maintain incident runbooks
Support and scale infrastructure for AI/ML systems, including model-serving workloads, data pipelines, and batch/async processing

Requirements:

Experience in a Site Reliability Engineering or similar role
Experience operating production Kubernetes clusters
Proficiency building infrastructure with Terraform
Strong experience with AWS (especially EKS, RDS, and VPC) and Vercel

Job description

Hiive is redefining how private companies and their shareholders access liquidity. Through its institutional-grade platform, Hiive brings together buyers, sellers, and issuers to facilitate secondary transactions in venture-backed, pre-IPO companies, introducing efficiency, transparency, and standardization to an otherwise opaque asset class.

Recognized as one of Canada’s fastest-growing companies and backed by leading U.S. investors, Hiive is profitable, well-capitalized, and building a high-performance team to meet growing demand and pursue new market opportunities.

Interested in learning more about life at Hiive? Check out our careers page to see how you can grow with us!

As a Site Reliability Engineer at Hiive, you will be responsible for ensuring the reliability, availability, and performance of our platform. You’ll join our small but growing infrastructure team, working closely with the DevOps team and engineering leadership. As a hands-on contributor, you will build scalable and resilient infrastructure, automate processes, and respond to incidents efficiently and effectively.

You will help implement security and compliance measures, act as a trusted resource to your colleagues, and collaborate across teams to continuously improve our platform’s performance and reliability. You’ll also contribute to fostering an excellent, supportive engineering culture.

As Hiive continues to expand its use of AI across the platform, you will play a key role in building and operating the infrastructure that powers these systems. This includes supporting AI/ML workloads, improving observability into model performance and system behavior, and ensuring these services are reliable, scalable, and cost-efficient in production.

In this role, your responsibilities will include:

Maintain and improve our platform's uptime and availability
Optimize and maintain our infrastructure to improve reliability, performance, and security
Proactively identify and resolve scaling and reliability issues before they impact users or business metrics
Partner with product engineers to troubleshoot performance issues and implement effective solutions
Configure and maintain monitoring, alerting, and observability systems across our stack
Assist with incident response, including investigation, mitigation, and postmortems; develop and maintain incident runbooks
Participate in an on-call rotation shared across the engineering organization
Support and scale infrastructure for AI/ML systems, including model-serving workloads, data pipelines, and batch/async processing
Improve observability for AI systems (latency, cost, drift, failures) and help define reliability standards for these workloads

Required Skills:

Experience in a Site Reliability Engineering or similar role
Experience working with (writing or deploying) Elixir, or a strong desire to learn
Experience operating production Kubernetes clusters
Proficiency building infrastructure with Terraform
Strong experience with AWS (especially EKS, RDS, and VPC) and Vercel
Experience working with and optimizing PostgreSQL
Experience with Datadog or similar observability tools

Preferred Skills:

Experience working in regulated or high-compliance environments
Experience with CI/CD systems such as GitHub Actions
Experience supporting SOC 2 or similar certifications
Experience working with Cloudflare
Hands-on development experience in one or more programming languages
Experience supporting AI/ML systems in production (e.g., model serving, vector databases, or data pipelines)

Compensation, benefits & perks:

Opportunity to participate in ownership of a rapidly growing early-stage startup through our employee stock option plan.
Comprehensive 100% employer-paid health and dental premiums, and a health spending account.
A dedicated desk in our Vancouver, BC HQ, in the heart of downtown, with a fridge stocked with healthy snacks and drinks, an onsite gym and a gorgeous rooftop amenity.
Preference is given to candidates willing to work from our Vancouver, BC HQ, with a first-class view of the mountains. We are also open to remote candidates based in Canada or the US.
Enjoy a $20 per day commuter benefit for every day you work in our Vancouver HQ.
An engaging social calendar, including bi-weekly catered lunches, bi-weekly “Friday bar”, team workouts, annual summer party and holiday party, two “onsite” all-team retreats each year, semi-annual team-building events, and Hiive Womens’ Network events.
Significant opportunities for growth into team leadership and management roles.
Entrepreneurial culture, and a small and dynamic team.
Sponsorship, immigration, and relocation support for exceptional candidates.

Hiive is committed to fostering an inclusive workplace where all individuals have an opportunity to succeed.
