Site Reliability Engineer - Manager

Remote:

Full Remote

Experience:

Senior (5-10 years)

Work from:

United States

Offer summary

Qualifications:

Bachelor's Degree, 7+ years SRE experience with GCP, AWS, and/or Azure, 2+ years IaC automation support, coding/scripting experience.

Key responsibilities:

Manage GCP’s SRE team, service levels, Stack Overflow channel
Support platform RBAC, Firewall, SRE strategies
Develop monitoring/alerting, conduct incident retrospectives
Coordinate on-call rotations, 24x7 support, conduct training sessions
Collaborate with other teams, troubleshoot issues, root cause analysis

Huntington National Bank, Financial Services, Large, https://www.huntington.com/

10001 Employees

HQ: Columbus

Job description

Your missions

Description

Summary:

The Google Cloud Platform (GCP) Site Reliability Engineer (SRE) Manager is responsible for supporting the GCP framework and consumers of the platform. The position reports into the Chief Development Office (CDO) and will manage a team of SREs that support GCP.

Job Description

The Google Cloud Platform (GCP) Site Reliability Engineer (SRE) Manager is responsible for supporting the GCP framework and consumers of the platform. The SRE Manager will lead a team of SREs to develop Infrastructure as Code (IaC) to provide platform, infrastructure, observability, and security capabilities via Terraform and pipeline automation. The qualified candidate will collaborate with the CDO, Application, Incident, Security, and Change Management teams to manage the ITIL process, reduce toil, enhance reliability, and drive innovation for the GCP. The candidate will join a team of developers whose goal is to enable via automation and a culture of support, continuous improvement, and learning.

Responsibilities:

Manage GCP’s SRE team, discipline, maintain service levels, manage cost, and enhance operations
Manage Stack Overflow channel, GCP releases and Disaster Recovery exercises
Manage Platform RBAC, Firewall and User Access certifications
Support GCP’s ServiceNow platform and application configurations
Develop SRE strategies, best practices, and knowledge base
Build monitoring/alerting/availability/uptime into product and reduce toil
Participate in the DevSecOps model to build, test, and implement SRE cloud solutions via IaC
Collaborate with Incident/CSOC/SRE teams to troubleshoot issues and perform root cause analysis
Provide 24x7 support for the GCP and coordinate on-call rotations
Conduct periodic blameless incident retrospectives and focus on continuous improvement
Conduct training sessions and simulated game days
Experience with scripting and programming languages and concepts
Demonstrate knowledge of GCP, CLI, services and integrations
Demonstrate knowledge of DevSecOps tool chains and processes
Demonstrate knowledge of IaC software: Terraform, CLI, CDM, CFT, ARM, etc.
Demonstrate knowledge of Security as Code principles, policy, best practices, and tools
Demonstrate knowledge of Credential, Certificate and Encryption best practices, rotation, and policies
Experience using monitoring tools like Cloud Logging, Splunk, Dynatrace to evaluate system health, research issues, identify root causes and provide solution options
Additional duties as required

Basic Qualifications:

Bachelor's Degree
7+ years of SRE experience with GCP, AWS, and/or Azure

Preferred Qualifications:

Minimum of 2 years of supporting IaC automation, preferably Terraform
Minimum of 2 years of coding/scripting experience
Self-motivated problem solver
Experience troubleshooting cloud-based technologies
Cloud (GCP, AWS, Azure) and/or IaC (Terraform) certifications and/or work experience
Experience in Agile delivery, Azure DevOps Services, CI/CD Pipelines, Git, Snyk, CyberArk, Splunk, etc.
Experience with cloud security, IAM, Security Scans and custom policies
Full stack engineering knowledge – application, network, infrastructure, and security
Understanding of containers and serverless computing concepts
Background in application, database, and infrastructure monitoring tools
Willingness to guide others and outstanding communication skills
Familiarity with financial industry

Exempt Status: (Yes = not eligible for overtime pay) (No = eligible for overtime pay)

Yes

Workplace Type:

Huntington is an equal opportunity and affirmative action employer and is committed to providing equal employment opportunities for all regardless of race, color, religion, sex, national origin, age, disability, sexual orientation, veteran status, gender identity and expression, genetic information, or any other basis protected by local, state, or federal law.

Tobacco-Free Hiring Practice: Visit Huntington's Career Web Site for more details.

Agency Statement: Huntington does not accept solicitation from Third Party Recruiters for any position

Required profile

Experience

Level of experience: Senior (5-10 years)

Industry :

Financial Services

Spoken language(s):

English

Hard Skills

Dynacad
Infrastructure as Code (IaC)
Terraform
Pipeline Management
ITIL
DevSecOps
Scripting
High-Level Programming Languages
Cloud Computing
Monitoring
Clari
CDI
Splunk

Soft Skills

Remote Team Management
Problem Solving
Interpersonal Skills
Leadership
Team Collaboration
