Why join us
TRACTIAN is reimagining industrial systems so that every frontline maintenance worker can realize their full potential. We're building software and hardware in one place—disrupting long-standing institutions with products and experiences that better serve the ambitions of our clients.
Working at TRACTIAN allows you to push your limits, challenge the status quo and collaborate with some of the brightest minds in the industry. Our team members have the autonomy needed to accomplish challenging goals. We are a growth-stage startup and you will work directly with the founders, helping to define the vision, product and user experience.
**Engineering at TRACTIAN**
The Engineering team develops infrastructure, statistical models, and products using IoT data. Our Scientists and Engineers work together to make data, and insights derived from data, a core asset across the company. Our work is ingrained in TRACTIAN’s decision-making process, in the efficiency of our operations and insights, and in the industry-leading experience we provide our clients.
**What You'll Do**
As a Site Reliability Engineer (SRE), you will play a crucial role in bridging the gap between complex business problems and solutions in the cloud. You will design, build, and maintain efficient, reusable, and reliable systems that support high availability and disaster recovery. Your expertise will drive the development of automation and orchestration solutions, ensuring smooth CI/CD pipelines and scalable, secure infrastructure. You will also be responsible for monitoring and alerting with tools such as Datadog, Sentry, and Opsgenie, configuring and managing proactive alerts so that potential issues are detected early and followed by appropriate corrective action.
**Responsibilities**
Designing and implementing robust monitoring solutions using tools like Datadog, Sentry, and Opsgenie to ensure the availability, performance, and reliability of our systems.
Developing and maintaining monitoring dashboards and alerting mechanisms to proactively identify and address issues before they impact users.
Collaborating with cross-functional teams to understand system requirements and implement monitoring solutions that align with business objectives.
Analyzing system performance data to identify trends, optimize resources, and improve overall system efficiency.
Configuring and managing alerting policies and escalation procedures to ensure timely response to incidents.
Conducting root cause analysis of critical incidents and implementing preventive measures to mitigate future occurrences.
**Requirements**
Bachelor’s degree in Computer Science, Engineering, or a related field.
5+ years of experience in monitoring, operations, or a related field.
Proficiency in monitoring tools such as Datadog and Sentry.
Strong scripting skills in Bash, Python, Go, or similar languages for automating monitoring tasks.
Experience with logging and metrics collection systems.
Knowledge of AWS and containerization technologies.
Ability to work collaboratively in a cross-functional team environment and communicate effectively with stakeholders.
**Bonus Points**
Experience with software development.
Fluent in English.
**Compensation**
Competitive salary and stock options
R$800/mo for you to use with food in supermarkets, restaurants and delivery
GymPass so you don't sit/work all day
Optional fully funded English / Spanish courses
30 days of paid annual leave
Education and courses stipend
Earn a trip anywhere in the world every 4 years
Day off during the week of your birthday
R$200 a month for remote work allowance
Mental health support: we cover 40% of the cost of your therapy
Health plan with national coverage and without coparticipation
Dental Insurance: we help you with dental treatment for a better quality of life.
Sports Incentive: R$300/mo extra if you practice activities
Up to R$5,000 bonus for referring new Blue Caps