Offer summary

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).

5+ years of experience in a Site Reliability Engineering role.

Proficiency in scripting and automation using languages such as Python, Bash, or Go.

Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.

Key responsibilities:

Collaborate with software development teams to design and implement scalable, reliable, and efficient systems.

Drive automation initiatives to streamline deployment, configuration, and monitoring processes.

Develop and implement incident management processes to minimize downtime and ensure rapid resolution of issues.

Provide technical guidance and mentorship to peers and less experienced team members.

Job description

About Us

We're leaders in technology, managing over 120K production databases and delivering 5+ SaaS products. Handling an average of 10K requests per minute, we're committed to reliability and scalability. Join us in driving technological advancement and making an impact worldwide.

Job Summary

As a Senior Site Reliability Engineer, you will oversee the design, development, and operation of our accounting SaaS solution, ensuring its scalability, reliability, and performance. You will foster a DevOps culture within the organization through strong technical contributions, mentorship, and collaboration with key stakeholders.

Responsibilities

System Design and Architecture: Collaborate with software development teams to design and implement scalable, reliable, and efficient systems.
Infrastructure Automation: Drive automation initiatives to streamline deployment, configuration, and monitoring processes.
Performance Optimization: Identify and resolve performance bottlenecks to ensure optimal system performance and reliability.
Incident Management: Develop and implement incident management processes to minimize downtime and ensure rapid resolution of issues.
Monitoring and Alerting: Define and implement monitoring and alerting strategies to proactively detect and respond to system issues.
Capacity Planning: Perform capacity planning to ensure that systems can handle current and future load requirements.
Security: Collaborate with security teams to ensure that systems are designed and maintained in line with security best practices.
Documentation: Create and maintain documentation for systems, processes, and procedures.
Collaboration: Work closely with cross-functional teams, including software development, operations, and QA, to ensure the successful delivery of projects.
Mentorship: Provide technical guidance and mentorship to peers and less experienced team members to drive continuous improvement and operational excellence.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
5+ years of experience in a Site Reliability Engineering role.
Proven experience mentoring engineers and guiding technical initiatives.
Strong understanding of software engineering principles and practices.
Proficiency in scripting and automation using languages such as Python, Bash, or Go.
Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
Deep understanding of system monitoring, logging, and alerting tools.
Strong troubleshooting and problem-solving skills.
Excellent communication and collaboration skills.
Experience with containerization technologies such as Docker and orchestration tools like Kubernetes.