Senior Site Reliability Engineer

Fully flexible
Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

8+ years of IT experience, including 3+ years in SRE/DevOps; strong AWS, Kubernetes, and logging skills.

Key responsibilities:

  • Maintain scalable AWS infrastructure with Kubernetes.
  • Design automated deployment pipelines and perform capacity planning.
Plytix Scaleup https://www.plytix.com/
51 - 200 Employees

Job description

About the role

We’re searching for a highly experienced Senior Site Reliability Engineer (SRE) to join our SRE team in monitoring, automating, and alerting on our systems. Our Development team is based in Malaga, but whether you code your best from the comfort of your own home or from a different city, the location doesn’t matter: you do!

What is Plytix?
This is where you’d typically see a long paragraph of boring text that no one reads, peppered with corporate buzzwords that don’t mean anything. But Plytix is no typical company, so we’ll spare you from that. Instead, watch this video, and if you like what you see, keep scrolling.


What’s the opportunity?

We are looking for a skilled and experienced Site Reliability Engineer to join our team and help us scale our applications while keeping them reliable against clear quality targets (low latency, low error rates, and so on).


Our current technical stack is simple. The ideal candidate should have a strong background in:

- AWS: Where our infrastructure runs.

- K8s: We run our applications in Kubernetes.

- Kong: Our API gateway.

- Python: Our entire backend is written in Python.

- MongoDB / Postgres: We save our data in these databases.

- Redis: Our cache and temporary result backend.

- RabbitMQ: Our message broker.

- Elastic: Our logging backend.

- Grafana/Prometheus: Our monitoring stack.

What will you be doing?

- Maintain reliable and scalable infrastructure on AWS using Kubernetes and other tools.

- Monitor and troubleshoot production systems and respond to incidents in a timely manner.

- Design and implement automated deployment and testing pipelines to ensure quality and reliability.

- Perform capacity planning and scaling of infrastructure to support growing demand.

- Collaborate with development teams to ensure that applications are designed for reliability and scalability.

- Develop and maintain monitoring and alerting systems to detect issues before they become critical.

- Continuously improve the reliability and performance of our infrastructure and applications.

- Be part of releases. You’ll have the last word when a team wants to deploy a new release.

After 1 month:
You’ll have a good grasp of our favorite tools and processes, and your code will start flowing. You’ll start learning our current architecture and services and performing small tasks so you can start showing off your engineering skills. You’ll also take part in engineering discussions and production procedures.
After 3 months:

You should know almost everything about our infrastructure and have an idea of what should be improved. At this point, you’ll be ready to propose ideas to improve our systems:

  • Observability: Improve our logging system and make it easier for other teams to see the status of the platform.

  • Automation: Propose automations for existing processes.

  • High availability: Prepare and test our systems to make them more efficient and reliable.

  • Alerting: Improve our alerting system.

  • Disaster recovery: Define and maintain disaster recovery procedures.

After 6 months:

You’re fully integrated into the team at this point. You know all our strengths and weaknesses and should be able to prepare a roadmap to improve all our systems. You’re also collaborating with other departments to find the best possible solutions so our customers get the highest-quality system. You’re delivering high-quality, fast-running solutions that are tested to iron out potential errors. By now, you’re confident in your role, performing tasks in both development and production.

Who will you be working with?

You’ll be working with your fellow SRE to make sure that everything is running smoothly, but they aren’t the only shining face you’ll see on a daily basis. You’ll also work with the QA, Development, and DevOps teams to keep customers smiling about how amazing their Plytix PIM is.


If you’re curious about what it’s like working at our office, take a peek at what Plytix has to offer (you know you want to).

We expect you to have:

- 8+ years of experience in IT and 3+ years of experience in a Site Reliability Engineering or DevOps role.

- Strong experience with AWS, Kubernetes, and logging/monitoring systems.

- Experience with automation and configuration management tools (e.g. Ansible, Terraform).

- Strong understanding of networking and security principles.

- Excellent troubleshooting and problem-solving skills.

- Ability to work independently and as part of a team in a fast-paced environment.

- Strong communication and collaboration skills.

Nice to have:

- Experience with other cloud providers (e.g. GCP, Azure).

- Experience with other container orchestration systems (e.g. Docker Swarm, Mesos).

- Experience with other messaging systems (e.g. Kafka).

- Experience with other programming languages (e.g. Go, Java).

Why work at Plytix?
  • Voted Malaga’s #1 employer by Great Place to Work®
  • Be a part of a welcoming culture with bright, friendly people from around the world
  • Have a purpose and plenty of opportunities to grow from day one
  • Work from home or from our offices in the center of Malaga
  • Flexible working hours and unlimited remote days
  • Competitive full-time salary (*the amount will be determined based on experience)
  • Get an extra day off on your birthday, Christmas Eve, and New Year’s Eve
  • Earn bonuses for being a good colleague or in our Growth Program
  • Enjoy free catered lunches and unlimited ice cream when working from the office
  • Private health insurance covered by Plytix
  • Customizable perks package* including entertainment, a gym membership, language lessons and nursery and restaurant vouchers.

About our culture

At Plytix, we don't have boring mission statements and corporate values that no one reads. 

We operate from a simple guiding principle that's easy to remember:


"Don't be a jerk, don't hold back, and don't forget to have fun."


At Plytix, we offer equal opportunities and welcome applications from all sectors of society. We do not discriminate on the basis of race, religion or belief, ethnic or national origin, disability, age, citizenship, marital status, sexual orientation, or gender identity.

About us
We’re a next-generation SaaS scale-up that builds PIM software for the retail industry. Our name, Plytix, is short for Product Analytics—we started as an ecommerce analytics tool in 2015.

Since then, we’ve grown to be one of the leading Product Information Management (PIM) tools. It’s the only PIM system specially made (and priced!) for small to medium-sized ecommerce. It’s a single source of truth that helps teams manage and syndicate product information at scale, allowing you to get your products to market faster and smarter—regardless of the channel.

As far as the brains behind the software go, we’re a tight-knit team of passionate, data-driven individuals based here, there, and everywhere in the world! We’re also recognized for our outstanding customer care and employee culture, which earned us recognition as a Great Place to Work in 2021 and 2022.


Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s): English

Other Skills

  • Verbal Communication Skills
  • Team Effectiveness
  • Organizational Skills
