4+ years of hands-on Datadog platform experience in production with deep expertise across Infrastructure Monitoring, APM, Log Management, Synthetics, Network Monitoring, and Real User Monitoring (RUM).
Proven experience in Datadog cost optimization, including data ingestion reduction, license right-sizing, and metric cardinality management, with the ability to forecast costs based on usage patterns.
Expert-level knowledge of Datadog Agent deployment, configuration, and troubleshooting across bare-metal, VM, and container environments (Docker, Kubernetes); strong experience with tagging strategy, the service catalogue, and custom metrics (DogStatsD, custom checks).
Strong AWS experience (minimum 3 years) including EC2, ECS/EKS, Lambda, RDS, S3, CloudWatch, and VPC networking; familiarity with Infrastructure-as-Code (Terraform, CloudFormation) for managing Datadog resources programmatically; understanding of Kubernetes monitoring patterns.
Responsibilities:
Audit and optimize the Datadog platform across core product suites to improve observability efficiency.
Re-engineer data flows within the Datadog ecosystem to reduce ingestion, refine metrics, and optimize tagging and dashboards.
Design, implement, and manage Datadog resources programmatically via the API, including monitors, dashboards, and SLOs, and develop automated remediation workflows triggered by alerts.
Collaborate with cloud, SRE, and FinOps teams to forecast costs, communicate data-driven insights, and deliver actionable dashboards for engineering leadership.
Job description
This is a remote position.
Datadog Platform Expert
We are seeking a senior Datadog expert to audit and optimise our leading client’s primary observability platform. This is not a "user" role; we need an expert capable of re-engineering data flows for maximum efficiency.
Datadog Platform Expertise (Must Have)
Minimum 4 years of hands-on experience with the Datadog platform in production environments.
Deep expertise across Datadog’s core product suite: Infrastructure Monitoring, APM (Application Performance Monitoring), Log Management, Synthetics, Network Monitoring, and Real User Monitoring (RUM).
Proven experience in Datadog cost optimisation, including data ingestion reduction, licence right-sizing, and metric cardinality management.
Expert-level knowledge of Datadog Agent deployment, configuration, and troubleshooting across bare-metal, VM, and containerised environments (Docker, Kubernetes).
Strong experience with Datadog’s tagging strategy, service catalogue, and custom metrics (DogStatsD, custom checks).
Experience with Datadog API and programmatic management of monitors, dashboards, and SLOs.
Familiarity with Datadog’s pricing model and ability to forecast and optimise costs based on usage patterns.
Cloud Infrastructure (Must Have)
Strong AWS experience (minimum 3 years), including EC2, ECS/EKS, Lambda, RDS, S3, CloudWatch, and VPC networking.
Experience monitoring AWS cost drivers and correlating infrastructure changes with observability cost impact.
Familiarity with Infrastructure-as-Code (Terraform, CloudFormation) for managing Datadog resources programmatically.
Understanding of Kubernetes monitoring patterns: DaemonSets, sidecar injection, cluster-level metrics, and container log collection.
Service Management and Automation (Must Have)
Experience integrating Datadog with Jira Service Management, including webhook-based alert forwarding and bidirectional status sync.
Knowledge of incident management workflows: escalation policies, runbook automation, and post-incident review processes.
Experience with PagerDuty, Opsgenie, or similar on-call management tools and their integration with Datadog.
Ability to design and implement automated remediation workflows triggered by Datadog alerts.
Data Quality and Analytics (Must Have)
Experience auditing and improving data quality in observability pipelines (metrics, logs, traces).
Strong analytical skills with the ability to identify patterns, anomalies, and data integrity issues in large-scale telemetry data.
Experience designing custom dashboards and reports for engineering leadership, focusing on actionable insights.