Site Reliability Engineer Observability

Work set-up: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • 5–8 years of experience in SRE, Observability, or Reliability roles.
  • Proficiency with observability tools such as Grafana, Prometheus, OpenTelemetry, and ELK.
  • Hands-on experience with tracing, profiling tools, and distributed systems.
  • Strong programming skills, preferably in C# or Go.

Key responsibilities:

  • Own the end-to-end observability and reliability strategy across all product lines.
  • Define and maintain an observability framework and ensure coverage for all services.
  • Build and manage alerting rules, incident management, and RCA processes.
  • Collaborate with teams to automate anomaly detection and promote observability best practices.

Flinks Financial Services Scaleup https://flinks.com
51 - 200 Employees

Job description

About Flinks 🚀

Flinks is where financial data moves—with purpose, trust, and impact.

We’re on a mission to simplify access to financial data and help businesses build better, faster, and more secure financial products and experiences. Since 2016, we’ve been bridging the gap between fintechs, financial institutions, and consumers by enabling seamless, secure data connectivity.

From instant account funding to smarter lending, our solutions help power some of the most innovative financial products in North America. We partner with lenders, banks, and fintechs to streamline onboarding, prevent fraud, and fuel real-time decision-making with enriched, reliable data.

As pioneers in Canada’s open banking movement, we’re not waiting for the future; we’re building it. If you’re bold, curious, and ready to help shape the future of finance, we’d love to meet you.

What You’ll Be Doing 🔥

As the Observability SRE, you will own the end-to-end observability, monitoring, and reliability strategy across all Flinks product lines. Your mission is to ensure that every product (Data Connectivity, Payments, Enrichment, and Document Services) has the right telemetry, actionable alerts, and reliability insights.

  • Company-wide Observability & Monitoring: Define and maintain an observability framework across products; ensure coverage for APIs, scraping systems, payments, enrichment, and document services; establish SLIs/SLOs aligned to client expectations.
  • Alerting & Incident Management: Build consistent, low-noise alerting rules; integrate observability into Incident.io workflows; lead cross-product RCA; maintain a “single source of truth” for reliability metrics.
  • Reliability Analysis & Insights: Deliver monthly/quarterly scorecards linking reliability to client outcomes (e.g., churn risk, adoption blockers); analyze trends and recurring failures; translate data into executive insights.
  • Automation & AI-Enabled Observability: Automate anomaly detection, escalation, and self-healing; partner with the AI team; optimize logging and monitoring spend.
  • Collaboration & Enablement: Champion observability practices across teams; train PMs, QA, and Engineers; ensure insights influence roadmaps; collaborate with Tech Leadership to build observability in from the start.

Who You Are 💪

  • Experience: 5–8 years in SRE, Observability, or Reliability roles, ideally across multiple product environments (fintech, SaaS, or data platforms).
  • Technical Skills: Strong in observability tooling (Grafana, Prometheus, OpenTelemetry, ELK); hands-on experience with tracing and profiling tools (APM, OTEL, Pyroscope); experience with distributed systems, APIs, and data pipelines; strong automation skills (Kubernetes).
  • Strong programming skills with working knowledge of at least one programming language; C# and Go are preferred, but experience in other languages will also be considered valuable.
  • Mindset:
    • Systems thinker who sees the big picture.
    • Business-aware, connecting reliability to retention and profitability.
    • Proactive, anticipating failures before they occur.
    • Collaborative, working across product, QA, engineering, and reliability.

Great to Haves

  • Experience in fintech or high-availability SaaS environments.
  • Familiarity with payments infrastructure and fraud detection systems.
  • Contributions to open-source observability tools or frameworks.

Why This Role Matters at Flinks 💡

  • Ensures all products have consistent reliability and observability standards.
  • Provides a single source of truth for performance and reliability across the org.
  • Directly improves client trust, profitability, and operational efficiency.
  • Enables proactive stability management across Flinks’ core product lines.
  • Supports our shift to a cohesive, reliable, platform-first mindset at scale.

The Interview Process 🏗

  • Head of People
  • Director of IT Ops
  • Technical Challenge
  • Panel Interview

    Required profile

    Experience

    Level of experience: Senior (5-10 years)
    Industry: Financial Services
    Spoken language(s): English

    Other Skills

    • Reliability
    • Systems Thinking
    • Collaboration
    • Proactivity
