Site Reliability Engineer

Remote:

Full Remote

Contract:

Full time

Experience:

Mid-level (2-5 years)

Work from:

Malta

Swish Analytics Information Technology & Services Startup https://swishanalytics.com/

11 - 50 Employees

Job description

Swish Analytics is a sports analytics, betting and fantasy startup building the next generation of predictive sports analytics data products. We believe that oddsmaking is a challenge rooted in engineering, mathematics, and sports betting expertise; not intuition. We're looking for team-oriented individuals with an authentic passion for accurate and predictive real-time data who can execute in a fast-paced, creative, and continually-evolving environment without sacrificing technical excellence. Our challenges are unique, so we hope you are comfortable in uncharted territory and passionate about building systems to support products across a variety of industries and enterprise clients.

About the team

The Swish Analytics DevSecOps and Infrastructure team is looking for an experienced Site Reliability Engineer based in Europe who will support our enterprise infrastructure during non-US hours. In addition to providing support, you will help optimize incident response and observability, and work with technical teams to improve overall workload resiliency.

Responsibilities

Support production systems and help triage issues during live sporting events
Monitor systems and respond to incidents to maintain SLOs/SLAs; review and follow up on production incidents
Write and review code, develop documentation, and debug problems, live, on complex distributed systems
Optimize and facilitate incident response, conduct root cause analysis and blameless retrospectives
Work closely with technical teams to implement, optimize, maintain, scale, and debug workloads on Kubernetes, using CI/CD, automation tools, and scripting languages to deliver tools and software that improve the reliability and scalability of services

Qualifications

3+ years of experience working in an SRE-leaning DevOps role or a full SRE role
3+ years building CI/CD pipelines with GitHub Actions, GitLab CI/CD, or similar
Extensive experience with Kubernetes
Experience managing customer-facing systems in a 24/7 environment, including escalations
Experience with triage and escalation policies/protocols
Strong communication and documentation skills
Comfortable with scripting languages like Bash, Python, or similar

Preferred