TWO95 International, Inc

DevOps / Site Reliability Engineer

Requirements

  • 8+ years in SRE, DevOps, or Platform Engineering roles; 2+ years in a senior or lead capacity.
  • Deep knowledge of Kubernetes, containers, and cloud-native infrastructure.
  • Proficiency in automation and scripting using Bash, Python, or Go; hands-on experience with CI/CD pipelines and release engineering.
  • Expert-level familiarity with IaC tools (Terraform preferred) and GitOps workflows (ArgoCD or similar).

Roles & Responsibilities

  • Own uptime, SLAs, and overall reliability of cloud infrastructure and kiosks platform.
  • Lead incident response, root-cause analysis, and drive actionable postmortems.
  • Automate infrastructure, deployments, and operational tasks using modern IaC and scripting in collaboration with the Platform Engineering team.
  • Execute and continuously improve disaster recovery and business continuity plans.

Job description

Job Title: Lead SRE (Site Reliability Engineer)

Location: Remote Work

Type: 6+ month contract-to-hire

Rate: Open ($/hr)

Please forward your updated resume to deivy.malli@two95intl.com and include your rate requirement, your contact details, and a suitable time when we can reach you.

 

Responsibilities

  • Own uptime, SLAs, and overall reliability of cloud infrastructure and kiosks platform.
  • Lead incident response, root-cause analysis, and drive actionable postmortems.
  • Automate infrastructure, deployments, and operational tasks using modern IaC and scripting in collaboration with the Platform Engineering team.
  • Maintain and improve monitoring, alerting, and observability (Grafana, Prometheus, New Relic, etc).
  • Manage, operate and recommend improvement of mo
  • Execute and continuously improve disaster recovery and business continuity plans.
  • Partner with platform engineering, QA, and development teams to ensure operational readiness.
  • Establish and maintain runbooks, operational standards, and reliability best practices.
  • Provide leadership, mentorship, and clear communication during both normal operations and incidents.
  • Optimize cloud and Kubernetes environments for reliability, performance, and scalability.

 

Requirements

Qualifications

  • 8+ years in SRE, DevOps, or Platform Engineering roles; 2+ years in a senior or lead capacity.
  • Strong experience supporting production environments with strict SLAs and high uptime requirements.
  • Deep knowledge of Kubernetes, containers, and cloud-native infrastructure.
  • Proficiency in automation and scripting using Bash, Python, or Go.
  • Hands-on experience with CI/CD pipelines and release engineering in modern environments.
  • Expert-level familiarity with IaC tools (Terraform preferred).
  • Strong understanding of monitoring, alerting, logging, and observability tooling.
  • Experience implementing and managing GitOps workflows (ArgoCD or similar).
  • Demonstrated ability to lead incidents and communicate effectively with technical and non-technical stakeholders.
  • Solid understanding of disaster recovery planning, resilience practices, and system hardening.

 
