Senior SRE (Site Reliability Engineer)

Remote:

Full Remote

Contract:

Full time

Experience:

Senior (5-10 years)

Work from:

Vietnam

Offer summary

Qualifications:

5 years of experience in software engineering
Bachelor's or Master's degree in a related field
Expertise in AWS and Kubernetes
Familiarity with CI/CD tools and GitOps
Proficiency in performance monitoring

Key responsibilities:

Optimize API reliability and performance
Automate deployment and management with Kubernetes and AWS
Implement comprehensive monitoring solutions
Collaborate across teams for design and operational integration
Integrate security best practices in backend architectures

Findicia Startup https://www.careers-page.com/

11 - 50 Employees

Job description

Your missions

Overview

Our client is a trailblazing tech startup specializing in advanced, privacy-centric tools designed to protect and empower kids and teens online. We have secured pivotal partnerships with key leaders in the gaming industry and are backed by leading global VCs. We are poised for unprecedented growth and expansion in the coming year.

We are seeking a Senior SRE. In this role, you will be instrumental in enhancing the reliability and scalability of our Global Compliance Engine. The role combines software engineering with systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.

Key Responsibilities:

API Reliability and Performance Optimization: Be a key contributor to the design, implementation, testing, and documentation of our public APIs. Develop, scale, and maintain the infrastructure necessary to deliver seamless service to tens of millions of worldwide players.
Systems Automation and Orchestration: Use Kubernetes and AWS to automate the deployment, scaling, and management of containerized applications. Enhance our CI/CD pipeline by integrating GitOps practices to streamline operations across development, testing, and production environments.
Monitoring and Telemetry: Implement comprehensive monitoring solutions using Prometheus and Alertmanager.
Cross-Functional Collaboration: Work closely with development teams to ensure architectural and operational requirements are incorporated during design and development. Promote a culture of excellence in code health and quality.
Security: Champion the integration of security best practices within backend architectures to protect sensitive user data against emerging threats.

Minimum Requirements:

5 years of experience in software engineering with a focus on reliability, performance optimization, and infrastructure management.
Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Expertise in Cloud and Systems Engineering: Extensive experience with AWS, Kubernetes, and modern observability stacks (e.g., Prometheus). Familiarity with CI/CD tools, GitOps practices, and infrastructure as code (e.g., Terraform).
Performance Monitoring: Proficiency in setting up and managing telemetry and alerting systems, with a strong understanding of best practices in monitoring distributed systems.

Preferred Requirements:

Willingness to adapt to changing project demands. Experience working in a startup environment is a plus.
Communicate effectively with remote team members, both in writing and verbally, providing progress updates, flagging potential roadblocks, and fostering positive and productive working relationships.
Keen interest in automating repetitive tasks and finding innovative solutions to complex technical challenges.