Site Reliability Engineer

Remote: Full Remote

Offer summary

Qualifications:

  • Bachelor's degree in Computer Science or a related field preferred.
  • At least 3 years of hands-on experience in Site Reliability Engineering and cloud technologies, particularly on Azure.
  • Extensive knowledge of Linux and experience with container technologies like Docker and Kubernetes.
  • Proficiency in programming languages such as Bash, Go, Python, Ruby, TypeScript, and Rust.

Key responsibilities:

  • Ensure services can handle complex, high-throughput demands and maintain service availability.
  • Create and sustain comprehensive monitoring systems for infrastructure and implement alerting strategies.
  • Troubleshoot production issues and keep stakeholders informed of resolutions.
  • Collaborate with the delivery team to define Error Budgets, Service Level Objectives (SLOs), and participate in the development lifecycle.

AudienceView
201 - 500 Employees

Job description

The Company:

AudienceView is an organization of people who are passionate about the business of Live Events. We create industry-leading software solutions that fuel attendee engagement, ticket sales and advertising solutions for thousands of sports, music and theatre venues in 16 countries around the world.

AudienceView employees share a vision to help entertainment organizations deliver exceptional experiences for people who love live events. We achieve this through innovative technology, popular media brands, effective distribution strategies and a dedicated team of experts that help create customer success every single day.


AudienceView is a legal employer in Chile and hires employees directly in Chile. AudienceView is compliant with the Chilean payroll, benefit and vacation laws and policies. This position is open to candidates who are currently residing in Chile with a legal work status. 


Join Our Team as a Site Reliability Engineer

Imagine being at the helm of a production environment, where your mission is to ensure that everything runs smoothly, efficiently, and reliably. As a Site Reliability Engineer, your role is crucial in maintaining the health of our entire system. You will monitor service availability with a holistic perspective, ensuring that our infrastructure is always performing at its best.

In this role, you will devise innovative procedures and automated actions to manage both applications and the underlying infrastructure. Your goal is to improve the availability of our software solutions and shorten their time to market. By measuring and optimizing system performance, you will help us meet and exceed our Service Level Agreements (SLAs) while promoting continuous improvement.


What you'll do:

  • Maintain and Scale Services: You'll be responsible for ensuring our services can handle complex, high-throughput demands.
  • Build and Monitor Infrastructure: You'll create and sustain comprehensive monitoring systems for our infrastructure, implementing alerting and mitigation strategies using cloud-native solutions.
  • Implement CI/CD Pipelines: Your expertise will be key in establishing and maintaining continuous integration and continuous deployment pipelines.
  • Troubleshoot Production Issues: As the go-to person for production issues, you'll troubleshoot, escalate when necessary, and monitor resolutions, keeping the team and stakeholders informed.
  • Collaborate on Error Budgets and SLOs: Work closely with the delivery team to define Error Budgets, Service Level Objectives (SLOs), and Service Level Indicators (SLIs).
  • Participate in Development Lifecycle: From ideation to operation, you'll be involved in all aspects of the development lifecycle.
  • Stay Updated on Technology Trends: Keep abreast of emerging trends in cloud technology, DevOps, service reliability, and security.
  • Enhance Scalability and Performance: Improve our systems' scalability, reliability, capacity, and performance.
  • Automate Infrastructure Operations: Write code to automate provisioning and operating infrastructure at scale.
  • Support Engineering Best Practices: Build internal tools to promote software engineering best practices.
  • Reduce Toil: Develop tools and processes to minimize manual, repetitive work.
  • Engineer for Operations: You are more than an operator; you'll apply your engineering skills to achieve operational excellence.
  • Ensure Infrastructure Compatibility: Work with development teams to ensure applications are scalable, reliable, and secure from the start.
  • Participate in On-Call Rotation: Share responsibility for uptime and support, troubleshooting and resolving incidents and identifying their root causes.


What you’ll need: 

  • Education: A Bachelor's degree in Computer Science or a related field is preferred.
  • Experience: At least 3 years of hands-on experience in Site Reliability Engineering and cloud technologies, particularly on Azure.
  • Linux Proficiency: Extensive knowledge and skills in Linux.
  • Cloud Native Infrastructure: Experience in building and operating cloud-native infrastructure, applications, and services.
  • Container Technologies: 2+ years of experience with Docker, Kubernetes, and other container technologies.
  • Observability Tools: Proficiency in creating full observability stacks using tools like ELK, Grafana, Prometheus, SumoLogic, Jaeger, and Zipkin.
  • Web Architecture: Understanding of modern web architecture and technology stacks.
  • Programming Languages: 3+ years of experience with languages such as Bash, Go, Python, Ruby, TypeScript, and Rust.
  • Deployment Automation: Experience with tools like Ansible, Packer, and Terraform for deployment automation.
  • Version Control Systems: Proficient with GitHub, Bitbucket, and similar systems.
  • Artifact Repositories: Experience with repositories like Nexus and Artifactory.
  • Azure DevOps: Familiarity with working in the Azure Cloud and Azure DevOps.
  • Database Support: Experience supporting a range of databases, including MySQL, SQL, Redis, and NoSQL engines.
  • System Design: Strong experience in system design.
  • Problem Solving: A keen eye for detail and a problem-solving mindset.
  • Communication Skills: Excellent written and verbal communication skills in English.


It'd be nice if you have: 

  • Advanced Experience: 5+ years of experience in Site Reliability Engineering and 2+ years in DevOps.
  • Full Stack Development: 1+ year of experience as a Full Stack Web Developer.
  • Payment Systems: Experience supporting payment systems.



Join us and play a pivotal role in shaping the reliability and performance of our software solutions. Your expertise will be at the core of our mission to deliver exceptional technology with unparalleled reliability.

 

Why work at AudienceView: 

  • We’re a global leader in live events technology. As the essential partner to get live events discovered, attended, and remembered, we help our clients sell more tickets every single day.
  • We’re passionate about live entertainment. AudienceView believes in the power of live events and its purpose is to ignite that passion in people around the world.
  • We have amazing clients. Our exciting roster of clients includes sports, live music, and performing arts organizations.
  • Our employees love us. We offer excellent benefits, competitive salaries, flexible hours, remote work opportunities, and more!
  • We're a remote-first company. Our remote culture gives our employees the flexibility to work from anywhere within their country of residence (Canada, the USA, the UK, or Chile).
  • Diversity and inclusion are paramount to building our culture. The data is abundantly clear that diverse teams are more successful because they offer different perspectives, increased innovation, faster problem-solving, and higher employee engagement among other benefits.
  • Flexible work schedule. AudienceView empowers permanent employees to take off alternating Fridays by condensing their two-week schedule into 9 days. We also offer a flexible, uncapped vacation and sick policy, because employees need time away from work to recharge.

 

How we hire (a brief overview of our recruitment process):

  • Screening & Resume Review: Our Talent Acquisition team will evaluate all resumes and select candidates for a 30-minute video screening call.
  • Hiring Manager Interview (60 minutes): Meet with the hiring manager to speak in detail about your relevant experience and learn more about the role.
  • Technical Interview (60 minutes): Show off your technical and problem-solving skills by working through production challenges that the team faces, together with 1-2 Engineering team members.

 

Diversity and inclusion have always been at the core of our values at AudienceView. A diverse workforce with wide perspectives and creative ideas benefits our clients, the communities where we operate, and all of us as colleagues. We welcome applications from qualified individuals from all backgrounds. We also welcome and encourage applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the recruitment process.

 

Disclaimer: 

AudienceView does not offer employment to prospects without first ensuring that qualified candidates speak directly with the hiring manager and a member of our HR team. All candidate qualification is conducted face-to-face over Microsoft Teams. AudienceView does not send out offers of employment without meeting candidates and does not offer employment via text. If you are asked for any personal information via text and/or without having met a member of our hiring team in person, please disregard the request.





Required profile

Spoken language(s):
English

Other Skills

  • Communication
  • Problem Solving
