
Overview

Datadog is one of the leading platforms for infrastructure monitoring, log management, Application Performance Monitoring (APM), and more. This guide walks you through deploying the Datadog Agent on a Qovery-managed Kubernetes cluster using Helm.

Architecture

The Datadog Helm chart deploys three main components on your cluster:
| Component | Type | Role |
|---|---|---|
| Node Agent | DaemonSet | Runs on every node. Collects metrics, logs, and traces from the node and its pods |
| Cluster Agent | Deployment | Centralized collection of cluster-level metadata (events, leader election, external metrics) |
| Cluster Check Runners | Deployment (optional) | Run cluster checks (e.g., database monitoring) without tying them to a specific node |

Prerequisites

Before you begin, you will need the following information from your Datadog account:
| Variable | Where to find it | Required |
|---|---|---|
| DD_API_KEY | Organization Settings → API Keys (https://app.datadoghq.<region>/organization-settings/api-keys) | Yes |
| DD_APP_KEY | Organization Settings → Application Keys | Only for Database Monitoring |
| DD_SITE | Your Datadog region (e.g., datadoghq.eu, datadoghq.com) | Yes |
| CLUSTER_NAME | A friendly name you choose for your cluster | Yes |
An API key is required, not an Application key. Please ensure you are using the correct key to authenticate.

Installation

In this tutorial, we will install the Datadog agent on a Qovery cluster to gather metrics about infrastructure and applications.
This tutorial is based on a specific version of Datadog. We have created it to assist our users, but Qovery is not responsible for any configuration issues — please contact Datadog support for chart-specific questions.

Step 1: Add the Datadog Helm Repository


In Qovery Console:
  1. Go to SettingsHelm Repositories
  2. Click Add Repository
  3. Configure:
    • Repository name: Datadog
    • Kind: HTTPS
    • Repository URL: https://helm.datadoghq.com
See Helm Repository Management for more details.

Step 2: Create the Datadog Helm Service


In your dedicated environment:
  1. Click CreateHelm Chart
  2. Configure:
    • Application name: Datadog
    • Helm source: Helm repository
    • Repository: Datadog
    • Chart name: datadog
    • Version: 3.87.1 (or latest from Datadog)
    • Allow cluster-wide resources: ✔️
If you prefer not to enable cluster-wide resources, see Step 3 for an alternative approach using values override.
See Helm Charts for more details on creating a Helm service.

Step 3: Configure Helm Chart Settings

The Datadog chart installs Custom Resource Definitions (CRDs) by default (e.g., DatadogMetric, DatadogMonitor). You have two options:
  • Keep Allow cluster-wide resources enabled (as in Step 2) so the chart can install its CRDs.
  • Leave cluster-wide resources disabled and turn off CRD installation in the values override; features that depend on those CRDs (such as DatadogMetric-based external metrics) will then be unavailable.
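If you take the second route, CRD installation can usually be switched off through the chart's datadog-crds subchart values. The sketch below is a hypothetical example — the exact key names vary between chart versions, so verify them against the chart's values.yaml before using:

```yaml
# Sketch: disable CRD installation via the datadog-crds subchart.
# Key names are version-dependent — confirm in the chart's values.yaml.
datadog-crds:
  crds:
    datadogMetrics: false   # skip the DatadogMetric CRD
    datadogMonitors: false  # skip the DatadogMonitor CRD
```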

Step 4: Store Secrets and Variables

Add the API Key as Secret

  1. Open the Datadog service overview
  2. Go to the Variables section
  3. Add a new variable:
    • Variable: DD_API_KEY
    • Value: <your_API_KEY>
    • Scope: Service
    • Secret variable: ✔️
Add optional variables

Add these additional variables as needed:
| Variable | Value | Scope | Secret | Notes |
|---|---|---|---|---|
| DD_APP_KEY | <your_APP_KEY> | Service | ✔️ | Only needed for Database Monitoring |
| DD_SITE | datadoghq.eu or datadoghq.com | Service | | Your Datadog region |
| CLUSTER_NAME | my-cluster-name or alias of QOVERY_KUBERNETES_CLUSTER_NAME | Service | | Friendly name for your cluster |
Built-in variables (e.g., QOVERY_KUBERNETES_CLUSTER_NAME) cannot be used directly in Helm value overrides via qovery.env.*. To use a built-in variable value, create an alias variable that references it and use that alias in your override instead.
See Environment Variables for more details on managing variables in Qovery.

Step 5: Configure Values Override

In the Override as file section of your Helm service, add the following minimal configuration:
# Minimal working configuration for Datadog on Qovery
datadog:
  apiKey: qovery.env.DD_API_KEY
  site: qovery.env.DD_SITE
  clusterName: qovery.env.CLUSTER_NAME

  kubelet:
    tlsVerify: false  # Required on most Qovery clusters
How qovery.env.* works: At deploy time, Qovery replaces qovery.env.DD_API_KEY with the actual value of the DD_API_KEY variable. The real value never appears in the Qovery UI — it is only injected at Helm install time. See Environment Variables in Helm Values for more details.
Variable names must match exactly (case-sensitive). If you define DD_API_KEY in Variables, you must use qovery.env.DD_API_KEY in the override — not qovery.env.dd_api_key.
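To make the substitution concrete, here is a purely local illustration of what happens conceptually. Qovery performs this replacement server-side at deploy time — the sed command below only mimics the idea and is not the real mechanism:

```shell
# Illustration only: mimic Qovery's qovery.env.* replacement locally with sed.
# The real substitution happens inside Qovery at Helm install time.
export DD_API_KEY="dummy-api-key"
rendered=$(sed "s/qovery\.env\.DD_API_KEY/${DD_API_KEY}/" <<'EOF'
datadog:
  apiKey: qovery.env.DD_API_KEY
EOF
)
echo "$rendered"
```

The placeholder `qovery.env.DD_API_KEY` is replaced by the variable's value, so the chart receives the real key without it ever appearing in the override file.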

Step 6: Deploy the Chart


  1. Click the Deploy button
  2. Follow deployment logs
  3. Verify Datadog agent pods are running
If the deployment times out, you can add --timeout 15m0s in the Helm Arguments field. Datadog CRDs can take time to apply on the first install.

Step 7: Verify Setup on Datadog


  1. Access the Datadog interface
  2. Navigate to InfrastructureContainersKubernetes
  3. Confirm data is coming from your Qovery cluster

Advanced Configuration

Log Collection

Enable log collection to stream container logs to Datadog:
datadog:
  apiKey: qovery.env.DD_API_KEY
  site: qovery.env.DD_SITE
  clusterName: qovery.env.CLUSTER_NAME

  kubelet:
    tlsVerify: false

  logs:
    enabled: true
    containerCollectAll: true
    autoMultiLineDetection:
      enabled: true  # Automatically detects multi-line logs (Java stack traces, etc.)
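To keep log volume (and cost) under control, you can also exclude specific containers from collection using the chart's containerExcludeLogs value — a sketch; the container names here are examples:

```yaml
datadog:
  # Space-separated filters; containers matching any filter are excluded from log collection
  containerExcludeLogs: "name:datadog-agent name:istio-proxy"
```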

Unified Service Tagging

Unified Service Tagging ties Datadog telemetry (metrics, logs, traces) together using three standard labels. You can add them to your Qovery services via the Labels & Annotations feature:
| Label | Value | Purpose |
|---|---|---|
| tags.datadoghq.com/env | production | Environment name |
| tags.datadoghq.com/service | my-api | Service name |
| tags.datadoghq.com/version | 1.2.3 | Version |
Once these labels are applied, Datadog automatically correlates logs, traces, and metrics for the same service.
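Applied through Labels & Annotations, the three labels end up on the pod metadata roughly like this (the service name and version are examples):

```yaml
metadata:
  labels:
    tags.datadoghq.com/env: "production"
    tags.datadoghq.com/service: "my-api"
    tags.datadoghq.com/version: "1.2.3"
```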

Datadog Autodiscovery Annotations

For more granular control over log collection (e.g., setting a source, custom pipeline), add Datadog Autodiscovery annotations to your Qovery services via the Labels & Annotations feature:
ad.datadoghq.com/<container-name>.logs: '[{"source":"java","service":"my-api"}]'
Replace <container-name> with the actual container name of the pod.
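Beyond source and service, the same annotation can carry log processing rules — for example, dropping health-check noise before it reaches Datadog. A sketch; the container name and pattern are examples:

```yaml
ad.datadoghq.com/my-api.logs: >-
  [{"source": "java",
    "service": "my-api",
    "log_processing_rules": [{
      "type": "exclude_at_match",
      "name": "exclude_healthchecks",
      "pattern": "GET /health"}]}]
```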

Database Monitoring (PostgreSQL)

Datadog can monitor your PostgreSQL databases using Cluster Checks. This requires the DD_APP_KEY variable.
Enable Cluster Checks in values

datadog:
  apiKey: qovery.env.DD_API_KEY
  appKey: qovery.env.DD_APP_KEY
  site: qovery.env.DD_SITE
  clusterName: qovery.env.CLUSTER_NAME

  kubelet:
    tlsVerify: false

  clusterChecks:
    enabled: true

clusterAgent:
  enabled: true
  confd:
    postgres.yaml: |-
      cluster_check: true
      init_config:
      instances:
        - dbm: true
          host: <YOUR_PG_HOST>
          port: 5432
          username: datadog
          password: qovery.env.PG_DATADOG_PASSWORD
          tags:
            - "env:production"
            - "service:my-database"

clusterChecksRunner:
  enabled: true
  replicas: 2
Create the monitoring user on PostgreSQL

On your PostgreSQL instance, create a dedicated datadog user with the required permissions. See Datadog’s Database Monitoring docs for the full setup instructions.
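At minimum, that setup looks roughly like the sketch below (PostgreSQL 10+). This is only the starting point — Datadog's DBM docs also call for a dedicated datadog schema and an explain-plan helper function, so follow them for the complete setup:

```sql
-- Minimal sketch for PostgreSQL 10+ (see Datadog's Database Monitoring docs
-- for the full required setup). The password placeholder is illustrative.
CREATE USER datadog WITH PASSWORD '<PG_DATADOG_PASSWORD>';
GRANT pg_monitor TO datadog;
```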

APM with Admission Controller

The Datadog Admission Controller can automatically inject the tracing library into your application pods — no code changes required:
datadog:
  apiKey: qovery.env.DD_API_KEY
  site: qovery.env.DD_SITE
  clusterName: qovery.env.CLUSTER_NAME

  kubelet:
    tlsVerify: false

  apm:
    portEnabled: true
    port: 8126

clusterAgent:
  enabled: true
  admissionController:
    enabled: true
    mutateUnlabelled: false  # Set to true to inject into all pods
When mutateUnlabelled is false, you must add the following label to your services (via Labels & Annotations) to opt in:
admission.datadoghq.com/enabled: "true"
The Admission Controller will automatically set DD_AGENT_HOST, DD_ENTITY_ID, and inject the appropriate tracing library based on the language annotation:
admission.datadoghq.com/java-lib.version: "latest"
Supported language annotations: java-lib, python-lib, js-lib, dotnet-lib, ruby-lib.
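Putting the two together, an opted-in pod's metadata would look roughly like this (Java as an example; the split between label and annotation follows Datadog's Admission Controller documentation):

```yaml
metadata:
  labels:
    admission.datadoghq.com/enabled: "true"            # opt in to injection
  annotations:
    admission.datadoghq.com/java-lib.version: "latest" # which tracer to inject
```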

Manual APM Instrumentation

If you prefer not to use the Admission Controller, you can manually instrument your applications by adding these environment variables to your Qovery services:
DD_AGENT_HOST=datadog-agent.qovery.svc.cluster.local
DD_TRACE_AGENT_PORT=8126
DD_SERVICE=my-app
DD_ENV=production
DD_VERSION=$QOVERY_COMMIT_ID
Then instrument your application using the Datadog tracer for your language. See Datadog’s instrumentation docs for language-specific guides.

Complete Production Configuration

Here is a comprehensive production-ready configuration that enables the most commonly used features:
datadog:
  apiKey: qovery.env.DD_API_KEY
  site: qovery.env.DD_SITE
  clusterName: qovery.env.CLUSTER_NAME

  kubelet:
    tlsVerify: false

  # Log collection
  logs:
    enabled: true
    containerCollectAll: true
    autoMultiLineDetection:
      enabled: true

  # APM
  apm:
    portEnabled: true
    port: 8126

  # Process monitoring
  processAgent:
    enabled: true
    processCollection: true

  # Network monitoring
  networkMonitoring:
    enabled: true

  # Cluster checks (for Database Monitoring, etc.)
  clusterChecks:
    enabled: true

  # Avoid auto-detecting the Qovery cluster agent
  ignoreAutoConfig:
    - datadog_cluster_agent

# Node Agent
agents:
  enabled: true
  priorityClassName: qovery-high-priority

  # Tolerate all taints (important for Karpenter clusters)
  tolerations:
    - operator: Exists

  # Rolling update to avoid scheduling all pods at once
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: "33%"

  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

# Cluster Agent
clusterAgent:
  enabled: true
  replicas: 2
  priorityClassName: qovery-high-priority

  admissionController:
    enabled: true
    mutateUnlabelled: false

  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

# Cluster Check Runners (optional, for database monitoring)
clusterChecksRunner:
  enabled: true
  replicas: 2

  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

Troubleshooting

Deployment Issues

Problem: Helm deployment times out before completing.
Cause: Datadog CRDs can take time to apply, especially on the first install.
Solution: Add --timeout 15m0s in the Helm Arguments field of your Datadog service in Qovery Console.
Problem: Datadog agent pods are in CrashLoopBackOff state.
Possible causes and solutions:
  • Invalid API key: Check agent logs with kubectl logs -n qovery datadog-agent-xxx. If you see authentication errors, verify your DD_API_KEY is correct.
  • Resource limits too low: The agent may be OOMKilled. Increase memory limits (start with 512Mi, go up to 1Gi if needed).
  • Verify secret exists: Run kubectl get secret -n qovery to confirm the secret was created.
  • YAML syntax errors: Review your values override for indentation or syntax issues.
Problem: Datadog agent pods fail to start or stay in Pending state, causing the deployment to time out. Node events show scheduling errors or IP address timeout messages.
Cause: Since Datadog deploys a DaemonSet (at least one agent pod per node), if even a single node has reached its maximum pod capacity, the deployment will fail. This is common on clusters with many services or smaller instance types.
Solutions:
  1. Set a high priority class so Datadog pods get scheduled before lower-priority workloads:
    agents:
      priorityClassName: qovery-high-priority
    
    clusterAgent:
      priorityClassName: qovery-high-priority
    
  2. Ignore auto-detection of the Qovery cluster agent to avoid unnecessary pod conflicts:
    datadog:
      ignoreAutoConfig:
        - datadog_cluster_agent
    
  3. Use a rolling update strategy to avoid scheduling all pods at once:
    agents:
      updateStrategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: "33%"
    
If you recently changed your node instance type (e.g., to xlarge), the max pods per node limit increases. However, existing nodes may still be at capacity until workloads are redistributed. Redeploying may succeed on a subsequent attempt once Karpenter provisions new nodes.

Configuration Issues

Problem: The Helm chart receives literal strings like qovery.env.DD_API_KEY instead of the actual value.
Cause: The variable name in the override doesn’t match the variable defined in Qovery.
Solutions:
  • Variable names are case-sensitive: qovery.env.DD_API_KEYqovery.env.dd_api_key
  • The variable must exist on the Helm service (correct scope)
  • Check for typos in both the variable name and the override value
  • See Environment Variables in Helm Values for details

Logs & Monitoring Issues

Problem: On a cluster with Karpenter enabled, some applications have their logs visible in Datadog but logs are missing for others.
Reason: The Datadog agent DaemonSet likely has node selectors or taints/tolerations that prevent it from running on the stable nodes where some of your services have been scheduled. Services with only one pod typically run on the stable node pool, which explains why logs from these services are missing when Karpenter is enabled.
Solutions:
  • Option 1 — Tolerate all taints (recommended):
    agents:
      tolerations:
        - operator: Exists  # This will tolerate all taints
    
  • Option 2 — Specific node pool tolerations:
    agents:
      tolerations:
        - key: "qovery.com/node-pool"
          operator: "Equal"
          value: "stable"
          effect: "NoSchedule"
        - key: "qovery.com/node-pool"
          operator: "Equal"
          value: "default"
          effect: "NoSchedule"
    
After updating your Datadog Helm chart with these tolerations, the agent should be able to collect logs from services running on both the default and stable node pools.
See Deploy a DaemonSet in a Karpenter Context for more details.
Problem: Cluster appears in Datadog but no metrics are shown.
Solutions:
  • Wait 5-10 minutes for initial data to appear
  • Verify the agent is scraping: check agent logs
  • Ensure the correct site is set (datadoghq.com for US, datadoghq.eu for EU)
  • Check that firewall/network policies allow outbound traffic to Datadog
Problem: Node-level metrics (CPU, memory per node) are missing.
Cause: TLS verification failure when the agent contacts the kubelet.
Solution: Set kubelet.tlsVerify: false in your values override:
datadog:
  kubelet:
    tlsVerify: false

APM Issues

Problem: APM traces are not visible in the Datadog APM dashboard.
Solutions:
  • Verify DD_AGENT_HOST points to the correct service: datadog-agent.qovery.svc.cluster.local
  • Confirm APM is enabled in the Helm values (apm.portEnabled: true)
  • Check that port 8126 is open and not blocked by network policies
  • Verify the application is correctly instrumented with the Datadog tracing library
Problem: The tracing library is not automatically injected into application pods.
Solutions:
  • Ensure the Admission Controller is enabled in the Helm values
  • If mutateUnlabelled: false, verify the label admission.datadoghq.com/enabled: "true" is set on the pod
  • Check the language annotation is set (e.g., admission.datadoghq.com/java-lib.version: "latest")
  • Check Cluster Agent logs: kubectl logs -n qovery deployment/datadog-cluster-agent

Deployment Checklist

Use this checklist to verify your Datadog deployment:
  • Helm repository Datadog added in Organization Settings
  • Helm service created with chart datadog and Allow cluster-wide resources enabled
  • DD_API_KEY stored as a secret variable on the service
  • DD_SITE and CLUSTER_NAME configured as variables
  • Values override configured with kubelet.tlsVerify: false
  • Log collection enabled if needed (logs.enabled: true)
  • APM configured if needed (apm.portEnabled: true)
  • Tolerations added if Karpenter is enabled
  • Deployment successful — agent pods running
  • Agent visible in Datadog → Infrastructure → Kubernetes