Senior Site Reliability Engineer

Work set-up: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • At least 5 years of experience in Site Reliability Engineering, DevOps, or related fields.
  • Strong expertise with APIs, messaging systems like Kafka and RabbitMQ, and integration frameworks.
  • Proficiency in cloud platforms such as AWS, Azure, or GCP, and container orchestration with Kubernetes.
  • Hands-on experience with observability tools like Datadog, New Relic, Prometheus, or Grafana.

Key responsibilities:

  • Ensure the reliability, availability, and performance of APIs, data pipelines, and services.
  • Collaborate with engineering and product teams to establish best practices and automate processes.
  • Design and implement monitoring, alerting, and incident response procedures.
  • Participate in capacity planning, performance testing, and on-call rotations.

Nexaminds

Job description

Unlock Your Future with Nexaminds!

At Nexaminds, we're on a mission to redefine industries with AI. We're passionate about the limitless potential of artificial intelligence to transform businesses, streamline processes, and drive growth.

Join us on our visionary journey. We're leading the way in AI solutions, and we're committed to innovation, collaboration, and ethical practices. Become a part of our team and shape the future powered by intelligent machines. If you're driven by ambition, success, fun, and learning, Nexaminds is where you belong.

Nexaminds is actively seeking a Senior Site Reliability Engineer. As a Site Reliability Engineer focused on our integration layer, you’ll be responsible for the reliability, availability, and performance of the APIs, data pipelines, and services that power communication between key systems. You’ll work closely with engineering, architecture, and product teams to establish best practices, automate processes, and proactively manage stability at scale.

Qualifications we are looking for:

  • 5+ years of experience in Site Reliability Engineering, DevOps, or related fields
  • Strong experience with APIs, messaging systems (e.g., Kafka, RabbitMQ), and integration frameworks
  • Proficiency in cloud platforms (AWS, Azure, or GCP), container orchestration (Kubernetes), and infrastructure-as-code (Terraform, CloudFormation)
  • Hands-on experience with observability tools (e.g., Datadog, New Relic, Prometheus, Grafana)
  • Strong scripting or programming skills (Python, Go, or similar)
  • Understanding of CI/CD pipelines and automated deployment best practices
  • Excellent problem-solving skills and a proactive approach to system stability
  • Strong collaboration and communication skills; ability to partner across multiple teams

Nice to have:

  • Experience managing large-scale data pipelines and streaming platforms
  • Familiarity with API gateways and security best practices for integrations
  • Prior experience in fintech, logistics, or other industries with complex integration ecosystems

Job Duties:

  • Own reliability and performance for our integration platform, including APIs, event streams, and third-party connections
  • Design and implement robust monitoring, alerting, and incident response processes for the integration layer
  • Collaborate with developers and architects to influence design decisions for reliability, scalability, and observability
  • Build automation for deployment, scaling, failover, and remediation
  • Lead root cause analysis and post-mortem processes for integration-related incidents
  • Identify and mitigate risks across connectivity points with third-party and internal systems
  • Establish SLOs and SLAs for platform integration services and drive continuous improvement
  • Assist in capacity planning and performance testing for integration components
  • Participate in an on-call rotation, contributing to a healthy reliability culture
  • Communicate clearly in written and spoken English with engineering, architecture, and product teams

What you can expect from us

Here at Nexaminds, we're not your typical workplace. We're all about creating a friendly and trusting environment where you can thrive. Why does this matter? Well, trust and openness lead to better quality, innovation, commitment to getting the job done, efficiency, and cost-effectiveness.

  • Stock options 📈
  • Remote work options 🏠
  • Flexible working hours 🕜
  • Benefits beyond the legal minimum
  • It's not just about the work; it's about the people too. You'll collaborate with highly skilled IT professionals.
  • You'll have access to mentorship and tons of opportunities to learn and level up.

Ready to embark on this journey with us? 🚀🎉 If you're feeling the excitement, go ahead and apply!

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s): English

Other Skills

  • Collaboration
  • Communication
  • Problem Solving
