Site Reliability Engineer

Remote: Full Remote
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

  • 4-year degree or equivalent experience
  • 3+ years in cloud computing and IaC
  • Experience with cloud-native tooling preferred
  • Proficient in Linux, containerization, and CI/CD
  • Strong troubleshooting and problem-solving skills

Key responsibilities:

  • Create, deploy, and maintain AWS infrastructure
  • Implement automated application releases
  • Maintain application performance monitoring
  • Ensure application security and oversight of incidents
  • Manage application and infrastructure availability monitoring
DOXA Talent
501 - 1000 Employees

Job description

Role Summary

Our client, a rapidly growing company, is looking for a Site Reliability Engineer to support multiple SaaS applications. You will be responsible for the cloud infrastructure and for the availability, reliability, performance, and security of production applications and systems.

SCHEDULE: 9:00 AM – 6:00 PM Pacific Daylight Time (12:00 AM – 9:00 AM Philippine Standard Time), follows Philippine holidays

POSITION TYPE: Full Time

WORK ARRANGEMENT: Remote

Essential Functions

  • Create, deploy, and maintain production infrastructure within AWS accounts using IaC/Terraform
  • Utilize various AWS services, including EC2, EKS, RDS, Redshift, S3, and IAM
  • Create, implement, and maintain automated application releases using Bitbucket Pipelines
  • Create, implement, and maintain application and infrastructure performance monitoring using Datadog or Prometheus/Loki/Grafana
  • Create, implement, and maintain application and infrastructure availability monitoring using Datadog or Prometheus/Loki/Grafana
  • Apply security practices and policies to identify and remediate security vulnerabilities
  • Oversee incident response procedures, including analysis and documentation of incidents to prevent future occurrences

Qualifications

  • A 4-year college degree (technical or quantitative science) is preferred, or equivalent work experience with evidence of proficiency and achievement in virtual infrastructure management
  • 3+ years of experience in cloud computing and Infrastructure as Code (IaC), e.g., Terraform, or a related field
  • Experience with cloud-native tooling (Helm Charts, ArgoCD, HashiCorp Vault, Harbor, Reloader, Grafana, Prometheus, and Loki) is a plus
  • Experience with cloud-native analytics tools (Elasticsearch, MongoDB, Redshift/Snowflake, and Looker)
  • Any AWS certification is a big plus
  • Proficient in Linux system administration and security
  • Proficient with containerization technologies, especially Kubernetes
  • Proficient with code versioning tools (e.g., Git, Bitbucket)
  • Proficient with CI/CD tools (e.g., Bitbucket Pipelines)
  • Proficient in scripting languages such as Bash and Python
  • Exposure to OpenTelemetry and distributed tracing
  • Awareness of recent industry trends related to observability and monitoring
  • Strong troubleshooting and problem-solving skills, with the ability to quickly diagnose and resolve complex issues
  • Excellent oral and written communication skills

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Spoken language(s):
English

Other Skills

  • Troubleshooting (Problem Solving)
  • Verbal Communication Skills
