Site Reliability Engineer

Remote: Full Remote

Offer summary

Qualifications:

  • Bachelor's degree in computer science, information technology, or a related field, or equivalent work experience.
  • Proven experience in software development and/or system administration.
  • Strong scripting and coding skills in languages like Python, Go, or Shell.
  • Familiarity with cloud platforms and containerization technologies.

Key responsibilities:

  • Ensure the reliability and availability of production systems by monitoring and responding to incidents.
  • Develop and maintain automation tools for system monitoring and incident response.
  • Collaborate with development teams for capacity planning and performance improvements.
  • Maintain documentation for operational processes and best practices.

ItsaCheckmate Scaleup https://itsacheckmate.com/
201 - 500 Employees

Job description

● Ensure the reliability and availability of production systems and services by monitoring, troubleshooting, and responding to incidents.

● Develop and maintain tools and automation for system monitoring, alerting, and incident response to minimize manual intervention.

● Collaborate with development teams to plan for capacity scaling and performance improvements based on usage patterns and growth forecasts.

● Collaborate with development and product teams to ensure that new features and services are designed with reliability in mind.

● Maintain documentation for operational processes, system configurations, and best practices.

Requirements

● Bachelor's degree in computer science, information technology, or a related field (or equivalent work experience).

● Proven experience in software development and/or system administration.

● Strong scripting and coding skills (e.g., Python, Go, Shell) for automation and tool development.

● Familiarity with containerization and orchestration technologies like Docker and Kubernetes.

● Experience with cloud platforms (e.g., AWS, Azure, GCP) and infrastructure as code tools (e.g., Terraform).

● Proficiency in monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).

● Knowledge of network, security, and database concepts.

● Strong problem-solving skills and the ability to work well under pressure.

● Understanding of agile and DevOps methodologies.

● Excellent communication and collaboration skills.

● Availability during US working hours, until 3 pm ET, is essential for this role.

● Candidates must have their own system/work setup for remote work.

Required profile

Spoken language(s):
English

Other Skills

  • Collaboration
  • Communication
  • Problem Solving
