Site Reliability Engineer

Remote: Full Remote

Contract: Full time

Experience: Senior (5-10 years)

Work from: Brazil

TRACTIAN 𝗕𝗥 Scaleup https://tractian.com/

51 - 200 Employees

Job description

Why join us

TRACTIAN is reimagining industrial systems so that every frontline maintenance worker can realize their full potential. We're building software and hardware in one place—disrupting long-standing institutions with products and experiences that better serve the ambitions of our clients.

Working at TRACTIAN allows you to push your limits, challenge the status quo and collaborate with some of the brightest minds in the industry. Our team members have the autonomy needed to accomplish challenging goals. We are a growth-stage startup and you will work directly with the founders, helping to define the vision, product and user experience.

**Engineering at TRACTIAN**

The Engineering team develops infrastructure, statistical models, and products using IoT data. Our Scientists and Engineers work together to make data, and the insights derived from it, a core asset across the company. Our work is ingrained in TRACTIAN's decision-making process, in the efficiency of our operations and insights, and in the industry-leading experience we provide our consumers.

**What You'll Do**

As a Site Reliability Engineer (SRE), you will play a crucial role in bridging the gap between complex business problems and cloud solutions. You will be responsible for monitoring and alerting using tools such as Datadog, Sentry, and Opsgenie. You will design, build, and maintain efficient, reusable, and reliable systems that support high availability and disaster recovery. Your expertise will drive the development of automation and orchestration solutions, ensuring smooth CI/CD pipelines and scalable, secure infrastructure. You will also configure and manage proactive alerts to ensure early detection of potential issues and appropriate corrective action.

**Responsibilities**

Designing and implementing robust monitoring solutions using tools like Datadog, Sentry, and Opsgenie to ensure the availability, performance, and reliability of our systems.

Developing and maintaining monitoring dashboards and alerting mechanisms to proactively identify and address issues before they impact users.

Collaborating with cross-functional teams to understand system requirements and implement monitoring solutions that align with business objectives.

Analyzing system performance data to identify trends, optimize resources, and improve overall system efficiency.

Configuring and managing alerting policies and escalation procedures to ensure timely response to incidents.

Conducting root cause analysis of critical incidents and implementing preventive measures to mitigate future occurrences.

**Requirements**

Bachelor’s degree in Computer Science, Engineering, or a related field.

5+ years of experience in monitoring, operations, or a related field.

Proficiency in monitoring tools such as Datadog and Sentry.

Strong scripting skills in Bash, Python, Go, or similar languages for automating monitoring tasks.

Experience with logging and metrics collection systems.

Knowledge of the AWS cloud platform and containerization technologies.

Ability to work collaboratively in a cross-functional team environment and communicate effectively with stakeholders.

**Bonus Points**

Experience with software development.

Fluency in English.

**Compensation**

Competitive salary and stock options

R$800/mo to spend on food at supermarkets, restaurants, and delivery services

GymPass so you don't sit/work all day

Optional fully funded English / Spanish courses

30 days of paid annual leave

Education and courses stipend

Earn a trip anywhere in the world every 4 years

Day off during the week of your birthday

R$200 a month for remote work allowance

Mental health support: we cover 40% of the cost of your therapy

Health plan with national coverage and without coparticipation

Dental Insurance: we help you with dental treatment for a better quality of life.

Sports Incentive: R$300/mo extra if you practice sports or other physical activities

Up to R$5.000 bonus for referring new Blue Caps

Required profile

Experience

Level of experience: Senior (5-10 years)

Spoken language(s):

English

Hard Skills

Datadog Scripting Continuous Monitoring Cloud Platform System Containerization Amazon Web Services

Other Skills

Problem Solving
Operations
Teamwork
Collaboration
Communication
