Key facts

Remote from: Philippines
Full time
Mid-level (2-5 years)
Site Reliability Engineer (SRE)
Language: English

Other skills

Collaboration
Communication
Proactivity
Customer Service
Problem Solving

About the company

Manila Recruitment

Human Resources, Staffing & Recruiting

Manila Recruitment is a full-service recruitment consultancy providing executive, expert, and technical recruitment support for the Filipino market. We have been the leader in recruitment innovation in the Philippines since 2010. Born from entrepreneurial roots, we have been carefully crafted into a full-service consultancy that delivers a suite of innovative headhunting and talent-sourcing solutions. Our expertise is defined by an unparalleled understanding of our clients' “big picture” business needs, and by the recognition that recruitment solutions can only be tailored for optimum results when a holistic view is taken.

The Manila Recruitment difference is rooted in our passion for scouring the globe for cutting-edge developments in recruitment science. We get genuinely excited by developments in social sourcing strategies, Web 3.0 tools for headhunters, and other areas of innovation that can help us deliver the best client and candidate experiences. We identify and deliver the latest international recruitment strategies, specifically adapted for headhunting talent within the Filipino market, producing unrivalled access to perfectly matched, previously undiscoverable talent for our clients.

The recipe is simple: innovation and international best practice, combined with local market knowledge, a candidate database of over 40,000 and growing, and, of course, our greatest asset, simply the best team of passive talent-sourcing and end-to-end recruitment consultants in the Philippines!

Company details

Company type: SME

Industry: Human Resources, Staffing & Recruiting

Company size: 11-50

Job description

Company Profile:

Our client is a U.S.-based group of affiliated companies operating at the intersection of legal technology and mass tort litigation. The organization includes a legal technology platform that automates medical record retrieval and case qualification for law firms, a Washington, D.C.–based mass tort litigation firm, and related holding entities. It is a lean, high-growth environment where each team member plays a significant and impactful role.

Overall purpose and responsibilities of the role:

As a Site Reliability Engineer, you will help build and support a technology platform while working closely with support staff and developers. You will be responsible for monitoring and troubleshooting the live platform to ensure optimal performance and stability. The role also involves participating in new customer onboarding, provisioning customer environments, and resolving production issues to maintain system reliability.

Duties and Responsibilities:

● Monitor and troubleshoot the running platform across multiple services and components

● Analyze Cloud Run logs, Temporal workflow UI, GKE pod status, and Pub/Sub queues to identify and resolve issues

● Perform end-to-end triage to determine whether issues originate from the agent layer (Python), workflow layer (Temporal), API layer (Go), or frontend (Vue)

● Support resolution of paralegal-facing operational issues such as stuck cases, failed faxes, and pending qualifications

● Write and execute SQL queries against AlloyDB for investigation, validation, and troubleshooting

● Participate in platform development and improvement initiatives, including identifying recurring issues and contributing to fixes

● Support new customer onboarding, including provisioning and validating customer environments

● Contribute to the build and enhancement of internal tools, services, and platform components

● Act as a Level 2 support engineer, going beyond surface-level platform monitoring to identify and resolve deeper system and integration errors

● Develop and maintain runbooks, escalation procedures, and operational documentation to improve incident response and system reliability

Requirements

Must-have Skills / Qualifications:

● Strong experience with Linux and Kubernetes (kubectl: logs, exec, describe)

● Ability to read and interpret Python or Go stack traces to diagnose issues across distributed services

● Solid proficiency in PostgreSQL / SQL (psql)

● Experience with GCP, AWS, or Azure (GCP preferred), including hands-on infrastructure provisioning and management

● Practical experience with Kustomize or Helm

● Exposure to workflow orchestration tools (preferably Temporal; also Airflow, Argo, Dagster, or AWS Step Functions)

● Experience with CI/CD pipelines (e.g., GitHub Actions or equivalent)

● Hands-on Terraform (or equivalent IaC) experience for provisioning cloud resources

● Experience with observability tooling: Cloud Logging, Grafana / Prometheus, OpenTelemetry, or equivalent

● Comfort working with HIPAA-adjacent / PHI data, and an understanding of secure-logging hygiene (no raw PHI in logs or traces)

● Must have own equipment

Advantageous or Nice-to-Have Skills/Experience:

● Experience with Google Cloud Platform (GCP) services such as Cloud Run, GKE, Pub/Sub, Cloud SQL / AlloyDB, IAM, and Secret Manager

● Terraform at scale (multi-environment modules, remote state)

● A background in legal operations or litigation support

Location: Work from home

Working hours / Job Type:

Monday to Friday, 6:00 AM – 3:00 PM Pacific Time (9:00 PM – 6:00 AM Philippine Time while U.S. daylight saving time is in effect), with a 2-hour overlap for collaboration between teams. This schedule includes 8 core working hours, exclusive of a 1-hour break.

**You will be a full-time contractor for our client’s U.S.-based company**