
Site Reliability Engineer (SRE)

extra holidays - extra parental leave
Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • Experience in cloud environments and automation
  • Strong programming skills in .NET, Python, Node.js
  • Hands-on experience with Kubernetes and AWS
  • Familiarity with monitoring tools like Datadog

Key responsibilities:

  • Monitor and manage infrastructure for high availability
  • Automate deployment processes and environment setup
  • Collaborate on CI/CD pipeline improvement
  • Manage incidents and troubleshoot issues
  • Optimize cloud infrastructure within AWS
  • Plan for capacity and scalability
Laivly Scaleup https://laivly.com/
51 - 200 Employees

Job description

About Laivly


Seeking curious and creative types! We are an ambitious company of innovators building and shaping the future of customer service technology. Our solutions help the world’s biggest brands leverage artificial intelligence, machine learning, and digital automation in their contact centers to deliver better customer experiences. Led by a team of established contact center experts, Laivly addresses the unique needs and challenges of customer service programs, with an emphasis on ethics in AI and the customer service agent experience.


About the role

We are looking for a Site Reliability Engineer (SRE) who will ensure the reliability, scalability, and performance of our applications and infrastructure. In this role, you will work closely with our development, operations, and product teams to deploy, monitor, and maintain our systems, driving a seamless user experience. This position requires a proactive and solutions-oriented engineer with experience in cloud environments and automation, and a deep understanding of application development and deployment.


As an SRE, you will:


  • Monitor and Manage Infrastructure: Ensure high availability and resilience of infrastructure and applications, performing proactive health monitoring and incident resolution using tools like Datadog
  • Automate Processes: Develop and maintain automation scripts for environment setup, deployment, monitoring, and scaling using Python, Node.js, or similar scripting languages
  • Collaborate on DevOps Practices: Partner with DevOps and development teams to improve CI/CD pipelines, identify bottlenecks, and enhance the efficiency of deployments
  • Manage Incidents and Troubleshooting: Act as a first responder for incidents, troubleshoot issues across the application stack, and perform root cause analysis
  • Manage Cloud Infrastructure: Design, manage, and optimize infrastructure within AWS, leveraging services such as EC2, RDS, and S3 to ensure scalability and cost-efficiency
  • Plan Capacity and Scalability: Monitor application capacity and performance, forecast usage patterns, and develop strategies for scaling to meet demand


As an SRE, you have:


  • Experience with Kubernetes: Hands-on experience deploying and managing applications on Kubernetes, including troubleshooting and optimizing resource allocation
  • Programming Proficiency: Strong skills in at least two of the following: .NET, Python, Node.js, with an ability to write and maintain high-quality code
  • Experience with AWS: Knowledge of AWS services, infrastructure management, and deployment best practices
  • Experience with Monitoring and Observability: Hands-on use of Datadog or similar monitoring/observability tools to measure, monitor, and alert on infrastructure and application performance
  • Automation and Scripting: Strong scripting skills (Python, Bash, Node.js) for automation, deployment, and system configuration
  • CI/CD Pipeline Management: Knowledge of CI/CD principles and experience with tools like Jenkins, GitLab CI, or similar
  • Ability to Problem Solve and Troubleshoot: Proven experience in troubleshooting complex system issues, with a solutions-oriented and proactive approach
  • Demonstrated Ability in Database Management: Familiarity with databases (e.g., PostgreSQL, MySQL, MongoDB) for maintenance and query optimization
  • Strong Networking and Security Knowledge: Understanding of network protocols, firewall configurations, and security best practices
  • Knowledge of IaC: Experience with Infrastructure-as-Code tools such as Terraform or CloudFormation


This role is remote and open to applicants within Canada. 


Life at Laivly:

Laivly gives you the opportunity to collaborate and grow your career with a creative, diverse, and passionate team. We work hard and play often, with a flexible environment that works with you. A career at Laivly means being part of a fun-loving, dedicated team of creatives, risk takers and game changers. It’s about sharing your talent and imagination to develop innovative tech that’s revolutionizing the way top brands interact with the world. 


We’ve got a shared mission—and a Laivly future. Join us today!


Laivly provides Equal Employment Opportunities in accordance with all provincial and federal laws. Laivly is committed to ensuring equality of opportunity in all aspects of employment and does not discriminate based on protected characteristics.

Laivly is committed to accommodating persons with disabilities. If you need accommodation at any stage of the application process or want more information on our accommodation policies, please let us know.

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s): English

Other Skills

  • Problem Solving
  • Troubleshooting (Problem Solving)
