BilgeAdam Technologies

Datadog Platform Expert

Requirements

  • 4+ years of hands-on Datadog platform experience in production with deep expertise across Infrastructure Monitoring, APM, Log Management, Synthetics, Network Monitoring, and Real User Monitoring (RUM).
  • Proven experience in Datadog cost optimization, including data ingestion reduction, license right-sizing, and metric cardinality management, with the ability to forecast costs based on usage patterns.
  • Expert-level knowledge of Datadog Agent deployment, configuration, and troubleshooting across bare-metal, VM, and container environments (Docker, Kubernetes); strong experience with tagging strategy, service catalogue, and custom metrics (DogStatsD, custom checks).
  • Strong AWS experience (minimum 3 years) including EC2, ECS/EKS, Lambda, RDS, S3, CloudWatch, and VPC networking; familiarity with Infrastructure-as-Code (Terraform, CloudFormation) for managing Datadog resources programmatically; understanding of Kubernetes monitoring patterns.

Roles & Responsibilities

  • Audit and optimize the Datadog platform across core product suites to improve observability efficiency.
  • Re-engineer data flows within the Datadog ecosystem to reduce ingestion, refine metrics, and optimize tagging and dashboards.
  • Design, implement, and manage automated Datadog resources via API, including monitors, dashboards, and SLOs, and develop automated remediation workflows triggered by alerts.
  • Collaborate with cloud, SRE, and FinOps teams to forecast costs, communicate data-driven insights, and deliver actionable dashboards for engineering leadership.

Job description

This is a remote position.

We are seeking a senior Datadog expert to audit and optimize our leading client’s primary observability platform. This is not a "user" role; we need an expert capable of re-engineering data flows for maximum efficiency.


Datadog Platform Expertise (Must Have)

  • Minimum 4 years of hands-on experience with the Datadog platform in production environments.
  • Deep expertise across Datadog’s core product suite: Infrastructure Monitoring, APM (Application Performance Monitoring), Log Management, Synthetics, Network Monitoring, and Real User Monitoring (RUM).
  • Proven experience in Datadog cost optimisation, including data ingestion reduction, licence right-sizing, and metric cardinality management.
  • Expert-level knowledge of Datadog Agent deployment, configuration, and troubleshooting across bare-metal, VM, and containerised environments (Docker, Kubernetes).
  • Strong experience with Datadog’s tagging strategy, service catalogue, and custom metrics (DogStatsD, custom checks).
  • Experience with Datadog API and programmatic management of monitors, dashboards, and SLOs.
  • Familiarity with Datadog’s pricing model and ability to forecast and optimise costs based on usage patterns.

Cloud Infrastructure (Must Have)

  • Strong AWS experience (minimum 3 years), including EC2, ECS/EKS, Lambda, RDS, S3, CloudWatch, and VPC networking.
  • Experience monitoring AWS cost drivers and correlating infrastructure changes with observability cost impact.
  • Familiarity with Infrastructure-as-Code (Terraform, CloudFormation) for managing Datadog resources programmatically.
  • Understanding of Kubernetes monitoring patterns: DaemonSets, sidecar injection, cluster-level metrics, and container log collection.

Service Management and Automation (Must Have)

  • Experience integrating Datadog with Jira Service Management, including webhook-based alert forwarding and bidirectional status sync.
  • Knowledge of incident management workflows: escalation policies, runbook automation, and post-incident review processes.
  • Experience with PagerDuty, OpsGenie, or similar on-call management tools and their integration with Datadog.
  • Ability to design and implement automated remediation workflows triggered by Datadog alerts.

Data Quality and Analytics (Must Have)

  • Experience auditing and improving data quality in observability pipelines (metrics, logs, traces).
  • Strong analytical skills with the ability to identify patterns, anomalies, and data integrity issues in large-scale telemetry data.
  • Experience designing custom dashboards and reports for engineering leadership, focusing on actionable insights.

Preferred and Bonus Skills

  • Datadog Fundamentals Certification, Log Management Certification, or APM Certification (highly preferred).
  • Datadog Cloud SIEM for AWS Fundamentals certification.
  • Experience with FinOps frameworks and cloud cost management tools (AWS Cost Explorer, Trusted Advisor, CloudHealth, Kubecost).
  • Experience in financial services or banking environments, particularly with regulatory compliance for data handling and retention.
  • Familiarity with Thought Machine (core banking platform) or similar modern banking technology stacks.
  • Experience with AI/ML-driven observability features: anomaly detection, forecasting, Watchdog, and intelligent alerting.
  • Contributions to or experience with Datadog’s open-source ecosystem (datadog-agent, dd-trace libraries, integrations).
  • Experience with log parsing, pipeline processing, and log-to-metric conversion strategies in Datadog.

