Offer summary

Qualifications:

Bachelor's degree in Computer Science, Computer Engineering, or related field, with 5+ years of experience; or a Master's degree with relevant experience.

Strong troubleshooting skills for systems and infrastructure.

Experience with microservice architecture and monitoring tools like Prometheus or Nagios.

Knowledge of Linux, Windows containers, orchestration tools, and Infrastructure as Code software.

Key responsibilities:

Collaborate with development teams to improve system reliability and resilience.

Analyze development decisions to assess their impact on reliability metrics.

Assist teams in debugging, troubleshooting, and resolving high-impact issues.

Maintain documentation related to systems, incident management, and security.

Job description

To apply for this position, you must be based in the Americas, preferably Latin America (candidates based in the United States of America are not eligible). Applications from other locations will be disqualified from this selection process.

We are...

a cutting-edge e-commerce company developing products for our own technological platform. Our creative, smart and dedicated teams pool their knowledge and experience to find the best solutions to meet project needs, while maintaining sustainable and long-lasting results. How? By making sure that our teams thrive and develop professionally. Strong advocates of hiring top talent and letting them do what they do best, we strive to create a workplace that allows for an open, collaborative and respectful culture.

What you will be doing

Your primary role will be that of a software engineer who improves reliability by building systems and software. You won't be writing loads of code, but you will be able to see the bigger picture and you'll really understand how development decisions impact wider systems. As an integral part of the company, you will collaborate closely with our various development teams to ensure that they are developing for reliability and resilience. You will analyze development decisions to understand how they will impact key reliability metrics, measured by Service Level Objectives and error budgets. These metrics will be the foundation for all coding and architecture configurations.

Some of your responsibilities will include:

Interacting with other engineering teams to help them improve the availability, reliability, and resilience of our infrastructure and systems.
Using your analytical skills to help engineering teams debug and fix issues.
Helping teams identify, troubleshoot, and resolve high-impact issues.
Practicing sustainable incident response, facilitating incident resolution, and performing blameless postmortems.
Creating and keeping up to date the required documentation for all systems and solutions in your area of responsibility.
Building knowledge in incident & problem management, change management, and security.
Being available for on-call duty.

Knowledge and skills you need to have

B.S. in Computer Science, Computer Engineering, or a related field with 5 years of relevant experience; or M.S. in Computer Science, Computer Engineering, or a related field (if you don't meet this requirement, an equivalent combination of experience and/or education will be taken into consideration).
5+ years troubleshooting systems and infrastructure.
Software development background with the ability to analyze and understand existing code.
Familiarity with microservice-based architecture.
Proven experience with a monitoring system (Prometheus, Nagios, Zabbix, New Relic, or any other).
Understanding of the fundamental principles of continuous integration, testing, and deployment.
Experience with Linux and Windows-based containers and container orchestration (Docker, Kubernetes, Docker Swarm, etc.).
Knowledge of Infrastructure as Code software (Ansible, Terraform).
Experience with Log Management tools like Graylog, ELK, or similar technologies.
Basic understanding of TCP/IP (routing, subnets, ports, etc.).
Working knowledge of HTTP layer infrastructure including load balancers and Web servers.
Business Analysis experience.
Flexibility to work with departments in different time zones.
English & Spanish fluency is a must.

Why choose us?

We provide the opportunity to be the best version of yourself, develop professionally, and create strong working relationships, whether working remotely or on-site. While offering a competitive salary, we also invest in our people's professional development and want to see you grow and love what you do. We are dedicated to listening to our team's needs and are constantly working on creating an environment in which you can feel at home.

We offer a range of benefits to support your personal and professional development: