Site Reliability Engineer (SRE)

extra holidays - extra parental leave

Remote:

Full Remote

Contract:

Full time

Experience:

Senior (5-10 years)

Work from:

Canada

Offer summary

Qualifications:

Experience in cloud environments and automation
Strong programming skills in .NET, Python, Node.js
Hands-on experience with Kubernetes and AWS
Familiarity with monitoring tools like Datadog

Key responsibilities:

Monitor and manage infrastructure for high availability
Automate deployment processes and environment setup
Collaborate on CI/CD pipeline improvement
Manage incidents and troubleshoot issues
Optimize cloud infrastructure within AWS
Plan for capacity and scalability

Laivly Scaleup https://laivly.com/

51 - 200 Employees

Job description

About Laivly

Seeking curious and creative types! We are an ambitious company of innovators building and shaping the future of customer service technology. Our solutions help the world’s biggest brands leverage artificial intelligence, machine learning, and digital automation in their contact centers to deliver better customer experiences. Led by a team of established contact center experts, Laivly addresses the unique needs and challenges of customer service programs, with an emphasis on ethics in AI and the customer service agent experience.

About the role

We are looking for a Site Reliability Engineer (SRE) who will ensure the reliability, scalability, and performance of our applications and infrastructure. In this role, you will work closely with our development, operations, and product teams to deploy, monitor, and maintain our systems, driving a seamless user experience. This position requires a proactive and solutions-oriented engineer with experience in cloud environments, automation, and a deep understanding of application development and deployment.

As an SRE, you will:

Monitor and Manage Infrastructure: Ensure high availability and resilience of infrastructure and applications, performing proactive health monitoring and incident resolution using tools like Datadog
Automate Processes: Develop and maintain automation scripts for environment setup, deployment, monitoring, and scaling using Python, Node.js, or similar scripting languages
Collaborate on DevOps Practices: Partner with DevOps and development teams to improve CI/CD pipelines, identify bottlenecks, and enhance the efficiency of deployments
Manage Incidents and Troubleshooting: Act as a first responder for incidents, troubleshoot issues across the application stack, and perform root cause analysis
Manage Cloud Infrastructure: Design, manage, and optimize infrastructure within AWS, leveraging services such as EC2, RDS, and S3 to ensure scalability and cost-efficiency
Plan Capacity and Scalability: Monitor application capacity and performance, forecast usage patterns, and develop strategies for scaling to meet demand

As an SRE, you have:

Experience with Kubernetes: Hands-on experience deploying and managing applications on Kubernetes, including troubleshooting and optimizing resource allocation
Programming Proficiency: Strong skills in at least two of the following: .NET, Python, Node.js, with an ability to write and maintain high-quality code
Experience with AWS: Knowledge of AWS services, infrastructure management, and deployment best practices
Experience with Monitoring and Observability: Experience with Datadog or similar monitoring/observability tools to measure, monitor, and alert on infrastructure and application performance
Automation and Scripting: Strong scripting skills (Python, Bash, Node.js) for automation, deployment, and system configuration
CI/CD Pipeline Management: Knowledge of CI/CD principles and experience with tools like Jenkins, GitLab CI, or similar
Ability to Problem Solve and Troubleshoot: Proven experience in troubleshooting complex system issues, with a solutions-oriented and proactive approach

Demonstrated Ability in Database Management: Familiarity with databases (e.g., PostgreSQL, MySQL, MongoDB) for maintenance and query optimization
Strong Networking and Security: Understanding of network protocols, firewall configurations, and security best practices
Knowledge of IaC: Experience with Infrastructure-as-Code tools such as Terraform or CloudFormation

This role is remote and open to applicants within Canada.

Life at Laivly:

Laivly gives you the opportunity to collaborate and grow your career with a creative, diverse, and passionate team. We work hard and play often, with a flexible environment that works with you. A career at Laivly means being part of a fun-loving, dedicated team of creatives, risk takers and game changers. It’s about sharing your talent and imagination to develop innovative tech that’s revolutionizing the way top brands interact with the world.

We’ve got a shared mission—and a Laivly future. Join us today!

Laivly provides Equal Employment Opportunities in accordance with all provincial and federal laws. Laivly is committed to ensuring equality of opportunity in all aspects of employment and does not discriminate based on protected characteristics.

Laivly is committed to accommodating persons with disabilities. If you need accommodation at any stage of the application process or want more information on our accommodation policies, please let us know.