
Senior Site Reliability Engineer (SRE) - Guidewire Cloud Platform (Application)

Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

Proven experience as a Senior SRE; experience with automation and monitoring tools; strong problem-solving skills; proficiency in Linux and scripting languages; experience with AWS and Kubernetes ecosystems.

Key responsibilities:

  • Collaborate to troubleshoot and solve issues
  • Develop automated runbooks for proactive issue handling
  • Monitor and enhance application reliability and performance
  • Document incidents and improve processes continuously
  • Participate in on-call rotations for service availability
Guidewire Software: https://www.guidewire.com
Company size: Large (1001 - 5000 employees)

Job description

About Guidewire
At Guidewire, we deliver the software that Property and Casualty (P&C) insurance companies rely on to protect their customers during crises, natural disasters, accidents, and cyber risks. Our core applications enable insurers to sell and underwrite policies, settle claims, and bill their customers. We also offer a suite of innovative products for data management, digital portals, and predictive analytics.
Hundreds of insurers worldwide use Guidewire's products, running on our cutting-edge Guidewire Cloud Platform, to handle billions of dollars in business. We are dedicated to providing the tools and technology that help insurers protect and support their customers when they need it most.

The Opportunity
We are searching for a Senior Site Reliability Engineer hungry for a rare chance to transform insurance with the industry's leading cloud platform. As a member of the SRE-Application team, you'll be responsible for building and evolving our SRE practice for the applications running on our Guidewire Cloud Platform. This is an opportunity to apply your expertise in automation, software engineering, and operational discipline to ensure the reliability, performance, and scalability of our cloud-based solutions.

What you'll do
  • Collaborate with development teams to troubleshoot and solve problems, reducing customer impact.
  • Develop automated runbooks and implement measures to handle issues proactively.
  • Apply sound engineering principles and mature automation to our operating environments.
  • Monitor, maintain, and enhance the reliability and performance of applications on our Guidewire Cloud Platform.
  • Leverage your automation and software engineering expertise to optimize systems and eliminate toil.
  • Document and examine incidents to continuously improve processes and prevent recurrence.
  • Stay up-to-date with the latest industry trends, tools, and best practices in site reliability engineering.
  • Contribute to a culture of innovation, learning, and continuous improvement.
  • Participate in on-call rotations to ensure the availability and reliability of our services.

What you'll bring
  • Proven experience as a Senior SRE or similar role, with a track record of improving system reliability
  • Strong problem-solving skills and the ability to analyze complex systems and devise effective solutions
  • Excellent collaboration and communication abilities to work cross-functionally and clearly document processes
  • Experience with automation, monitoring, and performance optimization tools and techniques
  • Dedication to maximizing uptime, scalability, and delivering an exceptional end-user experience
  • A passion for technology and a strong desire to continuously learn and grow your skills
  • Alignment with Guidewire's mission to leverage technology to help protect and support others

Required skills
  • Proven experience designing and deploying SLIs, SLOs, and error budgets.
  • Proven experience leveraging application performance monitoring (APM) and telemetry tools to maintain expected service levels for our applications.
  • Proven experience triaging and debugging distributed systems on cloud infrastructure.
  • Proven experience in designing and engineering CI/CD pipelines within Kubernetes and legacy ecosystems.
  • Proven experience in designing and engineering monitors, dashboards, and synthetic transactions in Datadog.
  • Proven experience in building, deploying, and running scalable infrastructure within AWS and Kubernetes ecosystems using Terraform and other cloud-native approaches.
  • Proven experience in managing infrastructure configuration at scale using multiple approaches and/or tools such as GitOps, Puppet, or Ansible.
  • Good understanding of AWS cloud networking and security, with hands-on experience remediating infrastructure vulnerabilities at scale.
  • Proficiency with Linux system administration and the ability to program/script using Python, Go, Java, shell, or equivalent.

Preferred skills
  • SRE certifications in multiple categories
  • AWS certifications in multiple categories
  • Proficiency with SQL, database administration, data pipelines, performance tuning, and schema design
  • Proficiency with multiple pipelining tools such as TeamCity, Bitbucket Pipelines, Jenkins, and GitHub Actions
  • Familiarity with distributed data processing frameworks and services such as Hadoop, Apache Spark, and Amazon Redshift

Why Guidewire
This is an opportunity to join a mission-driven company and make a real impact in the lives of people facing challenges. You'll work with cutting-edge technology, collaborate with talented peers, and grow your skills in a culture that values innovation, teamwork, and work-life balance. We offer competitive compensation, comprehensive benefits, and opportunities for career development.
If you're a Senior SRE who combines deep technical expertise with a passion for problem-solving and a commitment to reliability, we'd love to hear from you. Join us in building the software that helps insurers care for their customers when they need it most.
This position requires participation in mandatory on-call rotations to ensure the availability and reliability of our services. This includes responding to incidents and alerts outside of regular business hours, on weekends, and during holidays, as per the established on-call schedule. Candidates must be willing and able to fulfill this critical responsibility.

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s): English

Other Skills

  • Collaboration
  • Problem Solving
  • Lifelong Learning
  • Reliability
  • Communication
