Site Reliability Engineer - Manager

72% Flex
Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

Bachelor's Degree, 7+ years SRE experience with GCP, AWS, and/or Azure, 2+ years IaC automation support, coding/scripting experience.

Key responsibilities:

  • Manage GCP’s SRE team, service levels, Stack Overflow channel
  • Support platform RBAC, Firewall, SRE strategies
  • Develop monitoring/alerting, conduct incident retrospectives
  • Coordinate on-call rotations, 24x7 support, conduct training sessions
  • Collaborate with other teams, troubleshoot issues, root cause analysis
Huntington National Bank Financial Services Large https://www.huntington.com/
10001 Employees
HQ: Columbus

Job description

Your missions

Description

Summary:

The Google Cloud Platform (GCP) Site Reliability Engineer (SRE) Manager is responsible for supporting the GCP framework and consumers of the platform. The position reports into the Chief Development Office (CDO) and will manage a team of SREs that support GCP.

Job Description

The Google Cloud Platform (GCP) Site Reliability Engineer (SRE) Manager is responsible for supporting the GCP framework and consumers of the platform. The SRE Manager will lead a team of SREs to develop Infrastructure as Code (IaC) that provides platform, infrastructure, observability, and security capabilities via Terraform and pipeline automation. The qualified candidate will collaborate with the CDO, Application, Incident, Security, and Change Management teams to manage the ITIL process, reduce toil, enhance reliability, and drive innovation for GCP. The candidate will join a team of developers whose goal is to enable platform consumers through automation and a culture of support, continuous improvement, and learning.

Responsibilities:

  • Manage the GCP SRE team and discipline, maintain service levels, manage cost, and enhance operations
  • Manage Stack Overflow channel, GCP releases and Disaster Recovery exercises
  • Manage Platform RBAC, Firewall and User Access certifications
  • Support GCP’s ServiceNow platform and application configurations
  • Develop SRE strategies, best practices, and knowledge base
  • Build monitoring/alerting/availability/uptime into product and reduce toil
  • Participate in the DevSecOps model to build, test, and implement SRE cloud solutions via IaC
  • Collaborate with Incident/CSOC/SRE teams to troubleshoot issues and perform root cause analysis
  • Provide 24x7 support for the GCP and coordinate on-call rotations
  • Conduct periodic blameless incident retrospectives and focus on continuous improvement
  • Conduct training sessions and simulated game days
  • Experience with scripting and programming languages and concepts
  • Demonstrate knowledge of GCP, CLI, services and integrations
  • Demonstrate knowledge of DevSecOps tool chains and processes
  • Demonstrate knowledge of IaC software: Terraform, CLI, CDM, CFT, ARM, etc.
  • Demonstrate knowledge of Security as Code principles, policy, best practices, and tools
  • Demonstrate knowledge of Credential, Certificate and Encryption best practices, rotation, and policies
  • Experience using monitoring tools like Cloud Logging, Splunk, Dynatrace to evaluate system health, research issues, identify root causes and provide solution options
  • Additional duties as required

Basic Qualifications:

  • Bachelor's Degree
  • 7+ years of SRE experience with GCP, AWS, and/or Azure

Preferred Qualifications:

  • Minimum of 2 years of supporting IaC automation, preferably Terraform
  • Minimum of 2 years of coding/scripting experience
  • Self-motivated problem solver
  • Experience troubleshooting cloud-based technologies
  • Cloud (GCP, AWS, Azure) and/or IaC (Terraform) certifications and/or work experience
  • Experience in Agile delivery, Azure DevOps Services, CI/CD Pipelines, Git, Snyk, CyberArk, Splunk, etc.
  • Experience with cloud security, IAM, Security Scans and custom policies
  • Full stack engineering knowledge – application, network, infrastructure, and security
  • Understanding of containers and serverless computing concepts
  • Background in application, database, and infrastructure monitoring tools
  • Willingness to guide others and outstanding communication skills
  • Familiarity with financial industry


Exempt Status: Yes (not eligible for overtime pay)

Workplace Type:

Huntington is an equal opportunity and affirmative action employer and is committed to providing equal employment opportunities for all regardless of race, color, religion, sex, national origin, age, disability, sexual orientation, veteran status, gender identity and expression, genetic information, or any other basis protected by local, state, or federal law.

Tobacco-Free Hiring Practice: Visit Huntington's Career Web Site for more details.

Agency Statement: Huntington does not accept solicitation from Third Party Recruiters for any position.

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: Financial Services
Spoken language(s): English

Soft Skills

  • Remote Team Management
  • Problem Solving
  • Interpersonal Skills
  • Leadership
  • Team Collaboration
