
Network and Systems Operations Engineer (Family Networking)

Remote: Full Remote

Offer summary

Qualifications:

  • 5+ years of coding experience in Java, Python, Shell, or Ruby.
  • 5+ years managing large-scale distributed systems in AWS.
  • Deep expertise with observability tools like Prometheus and Datadog.
  • Strong analytical and troubleshooting skills with attention to detail.

Key responsibilities:

  • Monitor environments using Prometheus, Grafana, and Datadog, and manage incident alerts.
  • Execute runbooks to resolve service issues and contribute to post-mortem analysis.
  • Collaborate with cross-functional teams to enhance operational processes and documentation.
  • Support Infrastructure as Code and manage Docker and Kubernetes systems.

Coherent Solutions
1001 - 5000 Employees

Job description

Company Background

Our client is a publicly traded technology company focused on family safety and connectivity, serving millions of users across 140 countries. Their platform provides real-time location sharing, crash detection, roadside assistance, and other safety features. The company operates in a Remote First environment, fostering inclusivity, innovation, and collaboration.

Project Description

The Network and Systems Operations (NSO) Team is part of Cloud Operations, supporting over 325 engineers. The team's mission is twofold:

  • Providing world-class observability infrastructure and tooling for system monitoring and reporting;
  • Delivering L1 service support and incident management to ensure high availability and reliability of services.

The role involves monitoring, responding to alerts, and executing runbooks to resolve service issues. The system comprises dozens of microservices, all requiring tracking, reporting, and optimization. The position requires strong troubleshooting skills, familiarity with observability tools, and a proactive approach to automation.

Technologies
  • Prometheus
  • Grafana
  • Datadog
  • Java
  • Python
  • Shell
  • Ruby
  • Docker
  • Kubernetes
  • AWS
  • Terraform
  • CloudFormation
  • Chef
  • Ansible

What You'll Do
  • Use Prometheus, Grafana, and Datadog to monitor environments, triage alerts, and track resolution progress;
  • Respond to alerts in PagerDuty, manage incidents, and contribute to post-mortem analysis for system improvement;
  • Serve as part of a "follow the sun" L1 support rotation, handling system alerts, executing runbooks, and escalating issues as needed;
  • Work with cross-functional teams to improve processes, documentation, and tooling for operational excellence;
  • Troubleshoot large-scale distributed systems running on Linux-based environments in AWS;
  • Assist in managing Docker, Kubernetes, and cloud monitoring/logging systems;
  • Support Infrastructure as Code (IaC) and configuration management tools such as Terraform, CloudFormation, Chef, and Ansible.

Job Requirements
  • 5+ years of experience writing, reading, and debugging code in languages such as Java, Python, Shell, or Ruby;
  • 5+ years of experience managing large-scale distributed systems and Linux-based systems in cloud environments such as AWS;
  • Deep expertise with large-scale observability systems like Prometheus, Datadog, or similar;
  • 3+ years of experience with solutions like Docker, Kubernetes, and system virtualization;
  • Proficiency in Infrastructure as Code (IaC) and configuration management tools like Terraform, CloudFormation, Chef, or Ansible;
  • Strong analytical, troubleshooting, and problem-solving skills;
  • Ability to quickly learn new technologies and adapt to industry trends;
  • Attention to detail and ability to optimize high-traffic applications;
  • English proficiency at B1+ level or higher for effective communication.

What We Offer

The global benefits package includes:

  • Technical and non-technical training for professional and personal growth;
  • Internal conferences and meetups to learn from industry experts;
  • Support and mentorship from an experienced employee to help you grow and develop professionally;
  • Internal startup incubator;
  • Health insurance;
  • English courses;
  • Sports activities to promote a healthy lifestyle;
  • Flexible work options, including remote and hybrid opportunities;
  • Referral program for bringing in new talent;
  • Work anniversary program and additional vacation days.

Required profile

Spoken language(s):
English

Other Skills

  • Troubleshooting (Problem Solving)
  • Adaptability
  • Detail Oriented
  • Problem Solving
  • Analytical Skills
