Senior Site Reliability Engineer (SRE)

Remote:

Full Remote

Contract:

Full time

Experience:

Senior (5-10 years)

Work from:

Brazil

Offer summary

Qualifications:

Bachelor's degree in computer science or a related field
5+ years of experience in DevOps engineering
Strong understanding of SRE and DevOps principles
Experience with cloud computing platforms such as AWS or Azure
Proficiency in scripting languages such as Python or Bash

Key responsibilities:

Troubleshoot and resolve infrastructure issues
Design and maintain scalable infrastructure solutions
Monitor system performance and prevent outages
Collaborate with cross-functional teams for integration
Develop automation scripts to improve efficiency

Sky Systems, Inc. (SkySys) Information Technology & Services Startup https://myskysys.com/

11 - 50 Employees

Job description

Role: Senior Site Reliability Engineer
Position Type: Full-Time Contract (40hrs/week)
Contract Duration: 6-8 Months+
Work Hours: Eastern Standard Time (EST)
Work Schedule: 8 hours/day (Mon-Fri)
Location: 100% Remote in Brazil

Overview:

The Site Reliability Engineer (SRE) plays a critical role in ensuring the reliability, scalability, and performance of Client's digital platforms and infrastructure. As part of a global team of highly skilled engineers, the SRE will work on challenging and impactful projects that directly contribute to the company's core business activities. Client is committed to fostering a culture of innovation, collaboration, and continuous learning, providing the SRE with an opportunity to grow and develop their skills while making a positive impact on the world.

Main Accountabilities:

Troubleshoot and resolve infrastructure issues and incidents in a timely manner.
Design, implement, and maintain reliable and scalable infrastructure solutions to support Client's digital platforms and applications.
Monitor and analyze system performance, identify potential issues, and take proactive measures to prevent outages and disruptions.
Collaborate with cross-functional teams, including software engineers, product managers, and operations personnel, to ensure seamless integration of infrastructure and application components.
Develop and implement automation scripts and tools to streamline infrastructure management tasks and improve operational efficiency.
Stay up to date with industry best practices and emerging technologies in the field of site reliability engineering.
Cooperate closely with DevOps and Cloud engineers.

Impact/Dimensions:

Contributes to the reliability and uptime of Client's digital platforms, which are critical for the company's global operations and customer satisfaction.
Works on projects that have a direct impact on Client's revenue and profitability.
Has a significant impact on the efficiency and effectiveness of Client's technology operations and drives continuous improvement initiatives that save the company time and money.

Key Performance Indicators (KPIs):

Mean Time to Repair (MTTR) for critical systems
System uptime and availability
Number of incidents and outages prevented
Customer satisfaction with infrastructure performance

Major Opportunities and Decisions:

Identifying and mitigating potential risks to infrastructure stability and performance.
Making decisions on infrastructure investments and resource allocation to optimize cost-effectiveness and scalability.
Balancing the need for innovation with the requirement for stability and reliability in infrastructure operations.

Management/Leadership:

Leads and mentors a team of junior SREs and infrastructure engineers.
Provides technical guidance to cross-functional teams on infrastructure-related matters.
Actively participates in shaping the company's infrastructure strategy and roadmap.

Key Relationships, Stakeholders & Interfaces (External & Internal):

Works closely with software engineering teams to ensure seamless integration of infrastructure and application components.
Development teams
Infrastructure teams
Business stakeholders
Vendors and partners

Knowledge and Technical Competencies:

Strong understanding of SRE & DevOps principles and practices.
Experience with CI/CD on the Azure DevOps platform.
Knowledge of infrastructure management tools such as Ansible, Puppet, or Chef.
Solid experience with containerization tools such as Docker and orchestration tools such as Kubernetes.
Solid knowledge of security in cloud and on-premises environments.
Proficient in scripting languages such as Python or Bash.
Experience with cloud computing platforms such as AWS and Azure; GCP experience is preferred.
Experience with monitoring software such as Datadog, Zabbix, or Kibana.
Hands-on experience coding, deploying, and supporting large-scale serverless architectures.
Infrastructure as Code (IaC) provisioning with Terraform or CloudFormation.
Experience with Linux and Windows operating systems.
Strong problem-solving and analytical skills.
Excellent communication and interpersonal skills.