Bachelor’s degree with 5+ years of experience OR Associate degree with 7+ years of experience for senior roles. Minimum 4 years of AWS Cloud Platform experience in Production with various technologies.
Key responsibilities:
Maintain high availability, reliability, and performance of application services through monitoring and automation
Provide technical guidance and suggestions for improvement in deployment and operational processes
“Treat your clients, consultants and internal employees well, and never stop innovating.” - Jeremy Langevin, Cofounder & CEO, Talent
Since Horizontal's formation in 2003, these words have been the guiding perspective that has delivered continual growth and landed us in the top 2% of all staffing companies worldwide.
At Horizontal, we pride ourselves on matching exceptional talent with outstanding companies.
Whether it’s hiring great candidates or landing the perfect IT, marketing & creative or business strategy job, Horizontal will help enable your business.
If you’re unsure what type of role you want to hire or apply for, don’t worry. Our talent, team and project solutions will ensure that your needs are met.
Horizontal Talent: We help the world work better.
Ensures that application services are highly available, reliable, and performant through monitoring and alerting.
Serves as the primary subject matter expert for the application services, both preventing (proactive) and troubleshooting and mitigating (reactive) service availability and performance issues.
Develops tools or automation to improve our ability to effectively monitor application services in a large-scale and complex environment. Evaluates and implements improvement of existing tools and monitoring thresholds.
Provides technical assistance and operational guidelines for business operations and application development to ensure applications are running optimally in production, test, and development environments.
Designs, implements, and maintains the SRE dashboard, bots, and other automation based on current operational needs and release changes. Evaluates and suggests improvements to the dashboard, bots, and other automation.
Identifies repetitive, manual, and scalable tasks and automates them using scripting/programming languages or tools.
Identifies key operational metrics and follows through by defining and designing methods to programmatically capture the data necessary to create them.
Functions as the subject matter expert for coordinating and managing the deployment process and support of the full lifecycle of applications in Amazon Web Services.
Understands and evaluates current application release changes to identify any additions or modifications needed to the current SRE program.
Serves as a technical resource to internal and external IT groups. May provide subject matter expertise for third party products and utilities used to support enterprise-wide applications.
Consults with developers on issues related to the impact of development on the infrastructure, works with system engineers and developers to define server configuration settings, leads the migration of code through staging environments to production, and provides assistance to software quality assurance technicians during system acceptance testing.
Influences new application and infrastructure designs and architectures, and creates standards and guidelines for large-scale distributed systems with a focus on operability.
Creates and maintains cloud operations processes and technical documentation.
Provides technical mentorship and training to team members.
Performs other duties as assigned or requested.
Adheres to the Bank's attendance policies through regular and prompt attendance.
Problem Solving Skills
Logical analysis: Requires thinking through and solving problems step by step, completing root cause analysis, often looking beyond the obvious solution to problems and digging deeper for the best solutions.
Requires following vaguely defined procedures. Decisions are consistently made within reason and affect the work group or department.
Working in a group environment: Requires working as part of a group to solve issues and problems.
Qualifications
Bachelor’s degree and 5+ years of experience OR Associate degree and/or Technical Bootcamp Certificate with 7+ years of experience for Sr. or Lead Site Reliability Engineer, DevOps Engineer, or Platform Engineer
4+ years of working experience with AWS Cloud Platform technologies, infrastructure, and practices in a Production environment, including CloudWatch, ECS, Lambda, Canaries, DynamoDB, RDS, PostgreSQL, S3, API Gateway, Elastic Load Balancer, Athena, AWS X-Ray, and SQS
2+ years of working experience with creating automation or solution development
2+ years of working experience with GitLab, CDK, Terraform, and CI/CD pipelines
2+ years of working experience with cloud technology including Grafana, OpenSearch, and Docker
2+ years of working experience with Infrastructure as Code, Configuration as Code, and Alerts and Monitoring as Code
Ability to read, comprehend, and create complex technical documentation.
Ability to comprehend business operational requirements.
Demonstrated ability to perform complex technical analysis and communicate it to technical and non-technical audiences.
Strong verbal and written communication skills. Ability to articulate clear and concise instructions and resolutions.
Excellent problem solving, organizational and analytical skills
Knowledge Areas Preferred
Traditional and Cloud infrastructure components and techniques in Production and Lower environments, including virtualization, elasticity, networking, and load balancing
Development, QA, and Production Deployment patterns and version control (e.g., zero downtime, blue/green deployments, canary releases, etc.)
Cloud Operating Console commands, administration, and configuration
Experience in coding languages such as Python, TypeScript, Node.js, .NET, and Java
Understanding of Agile and DevOps practices
Familiar with ITIL framework
Familiar with Chaos Engineering
Required profile
Level of experience: Senior (5-10 years)
Industry: Human Resources, Staffing & Recruiting
Spoken language(s): English