Job description

Work Location - SYDNEY

Equifax is seeking a talented and experienced Site Reliability Engineer (SRE) to join our team. At Equifax, SRE is a vital discipline that merges software and systems engineering to build and maintain large-scale, distributed, and fault-tolerant systems. We're dedicated to ensuring our internal and external services consistently meet or exceed reliability and performance expectations, all while upholding Equifax's core engineering principles.

We believe in an engineering-driven approach to operations, constantly developing solutions to operational challenges. Our SREs are integral to overall system operations, leveraging a diverse set of tools and methodologies to tackle a wide range of problems. This includes practices like optimizing time spent on operational work, conducting blameless postmortems, and proactively identifying and preventing potential outages.

What you'll do

As a Site Reliability Engineer, you will:

Manage system uptime across cloud-native (AWS, GCP) and hybrid architectures.
Develop Infrastructure as Code (IaC) patterns that adhere to security and engineering standards, utilizing technologies like Terraform, cloud CLI scripting, and cloud SDK programming.
Build CI/CD pipelines for application and cloud architecture patterns, leveraging platforms such as Jenkins and cloud-native toolchains.
Create automated tooling to deploy service requests for production changes.
Develop comprehensive and detailed runbooks to facilitate service detection, remediation, and restoration.
Troubleshoot and triage issues across complex distributed service architectures.
Participate in on-call rotations for high-severity application incidents and contribute to improving runbooks to reduce Mean Time To Recovery (MTTR).
Lead blameless postmortems for availability incidents and own the action items that prevent recurrence.

What experience you need

BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent job experience.
7-10 years of experience in software engineering, systems administration, database administration, and networking.
5+ years of experience developing and/or administering software in a public cloud environment.
Experience monitoring infrastructure and application uptime and availability to ensure functional and performance objectives.
Proficiency in languages such as Python, Bash, Java, Go, JavaScript, and/or Node.js.
Demonstrable cross-functional knowledge across systems, storage, networking, security, and databases.
System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible, and/or containers (Docker, Kubernetes, etc.).
Proficiency with continuous integration and continuous delivery tooling and practices.
Cloud certification strongly preferred.
MUST HAVE VALID WORKING RIGHTS IN AUSTRALIA

What could set you apart

Coding experience with Python and Ansible.
Previous DevSecOps experience is a big plus.
Experience with GCP.
Experience with large-scale enterprise or critical infrastructure transformation.

Primary Location:

AUS-Sydney-Blue-Street

Function:

Function - Tech Engineering and Service Ops

Schedule:

Full time
