Last updated: June 8, 2026

About Kubernetes Monitoring

Monitor Kubernetes clusters with automatic instrumentation via the Dash0 operator for Kubernetes.

There is no faster or easier way to monitor your Kubernetes cluster and workloads than using the Dash0 operator for Kubernetes. It is built on open standards and tailored for the optimal user experience. Simply install the operator into your cluster to get OpenTelemetry data flowing from your Kubernetes workloads to Dash0.

The Dash0 operator for Kubernetes installs an OpenTelemetry collector into your cluster that sends data to your Dash0 ingress endpoint, with authentication already configured out of the box. It also gathers OpenTelemetry data from applications deployed to the cluster, including traces, logs and metrics.

Supported Runtimes

Supported runtimes for automatic workload instrumentation:

  • Java 8+
  • Node.js 16+
  • .NET
  • Python (opt-in)

Metrics and log collection are independent of the runtime of workloads.

Prerequisites

To use the operator, you will need to provide two configuration values:

  • endpoint: The URL of the Dash0 ingress endpoint to which telemetry data will be sent. This property is mandatory when installing the operator. It is the OTLP/gRPC endpoint of your Dash0 organization, which can be copied from https://app.dash0.com → organization settings → "Endpoints" → "OTLP/gRPC". Note that the correct endpoint value will always start with ingress. and end in dash0.com:4317. Including a protocol prefix (e.g. https://) is optional.
  • Either token or secretRef: Exactly one of these two properties needs to be provided when installing the operator.
    • token: This is the Dash0 authorization token of your organization. The authorization token for your Dash0 organization can be copied from https://app.dash0.com → organization settings → "Auth Tokens". The prefix Bearer must not be included in the value. Note that when you provide a token, it will be rendered verbatim into a Kubernetes ConfigMap object. Anyone with API access to the Kubernetes cluster will be able to read the value. Use a secret reference and a Kubernetes secret if you want to avoid that.
    • secretRef: A reference to an existing Kubernetes secret in the Dash0 operator's namespace. The secret needs to contain the Dash0 authorization token. See below for details on how exactly the secret should be created and configured.
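As a rough sketch of the secretRef option, the token could be stored in a Kubernetes secret like the one below. The secret name dash0-authorization-secret and the key token are illustrative assumptions, and the namespace dash0-system is simply the one used by the helm install examples in this guide. Whatever names you choose need to match the secretRef name and key that you pass to the operator.

```yaml
# Hypothetical Secret holding the Dash0 authorization token.
apiVersion: v1
kind: Secret
metadata:
  name: dash0-authorization-secret   # assumed name, referenced via secretRef.name
  namespace: dash0-system            # needs to be the Dash0 operator's namespace
type: Opaque
stringData:
  # Assumed key "token", referenced via secretRef.key.
  # Paste the token without the "Bearer" prefix.
  token: auth_...
```

Note that the secret has to exist in the operator's namespace, so that namespace may need to be created before running helm install.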

Installation

Before installing the operator, add the Dash0 operator's Helm repository as follows:

console
helm repo add dash0-operator https://dash0hq.github.io/dash0-operator
helm repo update dash0-operator

Now you can install the operator into your cluster via Helm with the following command:

console
helm install \
--wait \
--namespace dash0-system \
--create-namespace \
--set operator.dash0Export.enabled=true \
--set operator.dash0Export.endpoint=REPLACE THIS WITH YOUR DASH0 INGRESS ENDPOINT \
--set operator.dash0Export.apiEndpoint=REPLACE THIS WITH YOUR DASH0 API ENDPOINT \
--set operator.dash0Export.token=REPLACE THIS WITH YOUR DASH0 AUTH TOKEN \
--set operator.clusterName=REPLACE THIS WITH THE NAME OF THE CLUSTER (OPTIONAL) \
dash0-operator \
dash0-operator/dash0-operator

Instead of providing the auth token directly, you can also use a secret reference:

console
helm install \
--wait \
--namespace dash0-system \
--create-namespace \
--set operator.dash0Export.enabled=true \
--set operator.dash0Export.endpoint=REPLACE THIS WITH YOUR DASH0 INGRESS ENDPOINT \
--set operator.dash0Export.apiEndpoint=REPLACE THIS WITH YOUR DASH0 API ENDPOINT \
--set operator.dash0Export.secretRef.name=REPLACE THIS WITH THE NAME OF AN EXISTING KUBERNETES SECRET \
--set operator.dash0Export.secretRef.key=REPLACE THIS WITH THE PROPERTY KEY IN THAT SECRET \
--set operator.clusterName=REPLACE THIS WITH THE NAME OF THE CLUSTER (OPTIONAL) \
dash0-operator \
dash0-operator/dash0-operator

See the section Using a Kubernetes Secret for the Dash0 Authorization Token for more information on using Kubernetes secrets with the Dash0 operator.

You can consult the chart's values.yaml file for a complete list of available configuration settings.

See the section Notes on Creating the Operator Configuration Resource Via Helm for more information on providing Dash0 export settings via Helm and how it affects manual changes to the operator configuration resource.
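If you prefer a values file over a long list of --set flags, the same settings can be written as nested YAML. The sketch below mirrors the flags from the secret reference example above; the file name dash0-values.yaml is a hypothetical choice, and the placeholders need to be replaced just like in the command-line variant.

```yaml
# dash0-values.yaml (hypothetical file name)
# Each key path corresponds to one --set flag from the helm install example.
operator:
  dash0Export:
    enabled: true
    endpoint: REPLACE THIS WITH YOUR DASH0 INGRESS ENDPOINT
    apiEndpoint: REPLACE THIS WITH YOUR DASH0 API ENDPOINT
    secretRef:
      name: REPLACE THIS WITH THE NAME OF AN EXISTING KUBERNETES SECRET
      key: REPLACE THIS WITH THE PROPERTY KEY IN THAT SECRET
  clusterName: REPLACE THIS WITH THE NAME OF THE CLUSTER (OPTIONAL)
```

The file can then be passed to Helm with the --values flag, for example: helm install --wait --namespace dash0-system --create-namespace --values dash0-values.yaml dash0-operator dash0-operator/dash0-operator.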

Last but not least, you can also install the operator without providing a Dash0 backend configuration:

console
helm install \
--wait \
--namespace dash0-system \
--create-namespace \
dash0-operator \
dash0-operator/dash0-operator

However, you will need to create a Dash0 operator configuration resource later that provides the backend connection settings. That is, providing --set operator.dash0Export.enabled=true and the other backend-related settings when running helm install is simply a shortcut to deploy the Dash0 operator configuration resource automatically at startup.

On its own, the operator will only collect Kubernetes metrics. To actually have the operator properly monitor your workloads, two more things need to be set up:

  1. A Dash0 backend connection has to be configured (unless you did that already with the Helm values operator.dash0Export.*), and
  2. Monitoring has to be enabled for each namespace whose workloads should provide logs, traces and metrics, or namespace auto-monitoring has to be configured.

Both steps are described in the following sections.

Support for Prometheus CRDs

If you would like to enable support for Prometheus CRDs:

  1. Ensure the CRDs (ServiceMonitor, PodMonitor, ScrapeConfig) are installed in the cluster.
  2. Include --set operator.prometheusCrdSupportEnabled=true when running helm install.

Alternatively, if you are creating the operator configuration resource manually, set spec.prometheusCrdSupport.enabled: true in the operator configuration resource. Refer to the Configuration section for details.
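For the manual route, a minimal sketch of an operator configuration resource with Prometheus CRD support switched on might look as follows. It reuses the resource layout from the Configuration section; the elided endpoint and token values are placeholders that still need to be filled in.

```yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - dash0:
        endpoint: ingress... # replace with the OTLP/gRPC endpoint of your organization
        authorization:
          token: auth_... # replace with your authorization token
  prometheusCrdSupport:
    enabled: true # optional, defaults to false
```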

The operator supports the following CRDs:

  • ServiceMonitor
  • PodMonitor
  • ScrapeConfig with kubernetesSDConfigs

The Dash0 Operator uses the OpenTelemetry Target Allocator to watch Prometheus CRDs and assign targets to the collector running on the same node as the monitored workload.
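To illustrate what the Target Allocator would pick up, here is a minimal ServiceMonitor sketch using the Prometheus Operator API group monitoring.coreos.com/v1. All names, labels and the port name are hypothetical placeholders; adapt them to the Service that exposes your metrics.

```yaml
# Hypothetical ServiceMonitor; every name and label below is a placeholder.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: my-nodejs-applications
spec:
  selector:
    matchLabels:
      app: my-app        # selects Services carrying this label
  endpoints:
    - port: metrics      # name of the Service port that exposes the metrics
      path: /metrics
      interval: 30s
```

As described in the per-namespace settings, such resources are only considered in namespaces where Prometheus scraping is enabled.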

Authorization

If the scraped endpoints require authorization, it is mandatory to configure mTLS for the communication between the OpenTelemetry Target Allocator and the collectors, so the credentials can be transferred in a secure manner.

We recommend using cert-manager to create the certificates/secrets, but any secrets following the kubernetes.io/tls secret type and providing ca.crt, tls.crt, and tls.key should be compatible. Note that the server and client certificates need to be signed by the same CA to be trusted (i.e. two random self-signed certificates won't work).

You can find an example of minimal issuers and certificates for cert-manager in /test-resources/cert-manager/ta-issuers-and-cert.yaml.template.

The secrets must be created in the Dash0 operator namespace.

Once you have created the required secrets holding the certificates, you can enable mTLS and set the secret names via the Helm chart:

yaml
operator:
  targetAllocator:
    mTls:
      enabled: true
      serverCertSecretName: "ta-mtls-server-cert-secret"
      clientCertSecretName: "ta-mtls-client-cert-secret"

Configuring the resource requests/limits for the target-allocator

Depending on individual requirements (like the number of watched resources), it might be necessary to increase the resource requests/limits of the target-allocator. This can be achieved by setting the respective fields via Helm:

yaml
operator:
  targetAllocator:
    containerResources:
      limits:
        cpu: 200m
        memory: 500Mi
      gomemlimit: 400MiB
      requests:
        cpu: 200m
        memory: 128Mi

Configuration

Configuring the Dash0 Backend Connection

You can skip this step if you provided --set operator.dash0Export.enabled=true together with the endpoint and either a token or a secret reference when running helm install. In that case, proceed to the next section, Enable Dash0 Monitoring For a Namespace.

Otherwise, configure the backend connection now by creating a file dash0-operator-configuration.yaml with the following content:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - dash0:
        # Replace this value with the actual OTLP/gRPC endpoint of your Dash0 organization.
        endpoint: ingress... # TODO needs to be replaced with the actual value, see below
        authorization:
          # Provide the Dash0 authorization token as a string via the token property:
          token: auth_... # TODO needs to be replaced with the actual value, see below
        apiEndpoint: https://api.....dash0.com # TODO needs to be replaced with the actual value, see below
  clusterName: my-kubernetes-cluster # optional, see below

Here is a list of configuration options for this resource:

  • spec.exports[]: One or more export configs defining endpoints and authorization (see below for details). If multiple exports are defined, the telemetry will be exported to all defined exports and CRs (views, synthetic checks, dashboards, check rules, notification channels, spam filters, signal-to-metrics rules) will be synced to all defined Dash0 exports.

  • spec.exports[].dash0.endpoint: The URL of the Dash0 ingress endpoint to which telemetry data will be sent. This property is mandatory. Replace the value in the example above with the OTLP/gRPC endpoint of your Dash0 organization. The correct OTLP/gRPC endpoint can be copied from https://app.dash0.com → organization settings → "Endpoints" → "OTLP/gRPC". Note that the correct endpoint value will always start with ingress. and end in dash0.com:4317. Including a protocol prefix (e.g. https://) is optional.

  • spec.exports[].dash0.authorization.token or spec.exports[].dash0.authorization.secretRef: Exactly one of these two properties needs to be provided. Providing both will cause a validation error when applying the operator configuration resource.

    • spec.exports[].dash0.authorization.token: Replace the value in the example above with the Dash0 authorization token of your organization. The authorization token for your Dash0 organization can be copied from https://app.dash0.com → organization settings → "Auth Tokens". The prefix Bearer must not be included in the value. Note that the value will be rendered verbatim into a Kubernetes ConfigMap object. Anyone with API access to the Kubernetes cluster will be able to read the value. Use the secretRef property and a Kubernetes secret if you want to avoid that.

    • spec.exports[].dash0.authorization.secretRef: A reference to an existing Kubernetes secret in the Dash0 operator's namespace. The secret needs to contain the Dash0 authorization token. See the section Using a Kubernetes Secret for the Dash0 Authorization Token for an example file that uses a secretRef and for details on how exactly the secret should be created and configured. Note that by default, Kubernetes secrets are stored unencrypted, and anyone with API access to the Kubernetes cluster will be able to read the value. Additional steps are required to make sure secret values are encrypted. See https://kubernetes.io/docs/concepts/configuration/secret/ for more information on Kubernetes secrets.

  • spec.exports[].dash0.apiEndpoint: The base URL of the Dash0 API to talk to. This is not where telemetry will be sent, but it is used for managing dashboards, check rules, synthetic checks, views, notification channels, spam filters and signal-to-metrics rules via the operator. This property is optional. The value needs to be the API endpoint of your Dash0 organization. The correct API endpoint can be copied from https://app.dash0.com → organization settings → "Endpoints" → "API". The correct endpoint value will always start with "https://api." and end in ".dash0.com". If this property is omitted, managing dashboards, check rules, synthetic checks, views, notification channels, spam filters and signal-to-metrics rules via the operator will not work.

  • spec.selfMonitoring.enabled: An opt-out for self-monitoring for the operator. If enabled, the operator will collect self-monitoring telemetry and send it to the configured Dash0 backend. This setting is optional, it defaults to true.

  • spec.kubernetesInfrastructureMetricsCollection.enabled: If enabled, the operator will collect Kubernetes infrastructure metrics. This setting is optional, it defaults to true; unless telemetryCollection.enabled is set to false, then kubernetesInfrastructureMetricsCollection.enabled defaults to false as well. It is a validation error to set telemetryCollection.enabled=false and kubernetesInfrastructureMetricsCollection.enabled=true at the same time.

  • spec.collectPodLabelsAndAnnotations.enabled: If enabled, the operator will collect all Kubernetes pod labels and annotations and convert them to resource attributes for all spans, log records and metrics. The resulting resource attributes are prefixed with k8s.pod.label. or k8s.pod.annotation. respectively. This setting is optional, it defaults to true; unless telemetryCollection.enabled is set to false, then collectPodLabelsAndAnnotations.enabled defaults to false as well. It is a validation error to set telemetryCollection.enabled=false and collectPodLabelsAndAnnotations.enabled=true at the same time.

  • spec.clusterName: If set, the value will be added as the resource attribute k8s.cluster.name to all telemetry. This setting is optional. By default, k8s.cluster.name will not be added to telemetry.

  • spec.telemetryCollection.enabled: An opt-out switch for all telemetry collection, which also prevents the operator from deploying OpenTelemetry collectors in the cluster. This setting is optional and defaults to true (that is, by default, OpenTelemetry collectors will be deployed and telemetry will be collected). If telemetry collection is disabled via this switch, the operator will not collect any telemetry; in particular, it will not deploy any OpenTelemetry collectors in the cluster. This is useful if you want to use the operator for infrastructure-as-code (e.g. to synchronize dashboards and check rules), but do not want it to deploy the OpenTelemetry collector. Note that setting this to false does not disable the operator's self-monitoring telemetry; use spec.selfMonitoring.enabled to disable self-monitoring if required (self-monitoring does not require an OpenTelemetry collector). Also note that this setting is not exposed via Helm. To set it to false, you need to deploy the operator configuration resource manually: omit the Helm value operator.dash0Export.enabled (or set it to false), then deploy an operator configuration resource via kubectl apply -f or similar.

  • spec.prometheusCrdSupport.enabled: A flag controlling whether support for Prometheus CRDs will be enabled. This setting is optional and the default is false. If it is set to true and at least one namespace has prometheusScraping enabled, the operator will deploy the OpenTelemetry target-allocator and update the prometheusreceiver in the OpenTelemetry collectors, so that they query the allocator for targets to be scraped.

  • spec.instrumentWorkloads.instrumentationDelivery: Whether to use an image volume or an init container plus an emptyDir volume to provide instrumentation files to workloads when applying auto-instrumentation. See Using Image Volumes for Auto-Instrumentation Files. Allowed values:

    • auto: use image volumes if the Kubernetes version is 1.36 or later, otherwise use the init container approach.
    • image-volume: always use image volumes, also on Kubernetes versions older than 1.36. If the Kubernetes version is older than 1.31, the operator manager will log a warning and fall back to the init container approach, since image volumes are not supported in that version. Note that if you are using Kubernetes 1.34 or earlier, and you want to use this setting, you need to enable image volumes when configuring your cluster, since image volumes are disabled by default in versions older than 1.35.
    • init-container: always use the init container approach, regardless of the Kubernetes version. This is the default.
  • spec.autoMonitorNamespaces.enabled: Controls whether monitoring is set up for namespaces automatically. By default, a Dash0Monitoring resource has to be added to each namespace that you want to monitor. With automatic namespace monitoring, you can let the Dash0 operator automate this. This is useful if you want to monitor all or almost all namespaces in your cluster. It is also useful if you create new namespaces frequently and want to have them monitored right away, without additional setup. It is best suited if almost all namespaces should be monitored in the same fashion. If enabled, the operator will:

    • automatically add monitoring to all existing namespaces at startup, and
    • automatically add monitoring to new namespaces, as they are created.

    Even when enabled, individual namespaces can opt out of automatic monitoring via label selectors. Namespaces which are subject to automatic namespace monitoring will be monitored according to the settings of the monitoringTemplate.

  • spec.autoMonitorNamespaces.labelSelector: An optional configurable label selector for controlling which namespaces are automatically monitored. Namespaces which match this label selector will be monitored automatically (if autoMonitorNamespaces.enabled is set to true). Namespaces which do not match this label selector will not be monitored, regardless of the value of autoMonitorNamespaces.enabled. By default, this label selector has the value "dash0.com/enable!=false"; that is, the following namespaces will be monitored:

    • namespaces which do not have the label dash0.com/enable at all, and
    • namespaces which have the label dash0.com/enable with a value other than "false".

    Namespaces which are subject to automatic namespace monitoring will be monitored according to the settings of the monitoringTemplate.

  • spec.monitoringTemplate: Specification of the desired settings for automatically monitoring namespaces.

After providing the required values (at least endpoint and authorization), save the file and apply the resource to the Kubernetes cluster you want to monitor:

console
kubectl apply -f dash0-operator-configuration.yaml

The Dash0 operator configuration resource is cluster-scoped, so a specific namespace should not be provided when applying it.
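As a variation on the file above, the following sketch uses a secret reference instead of an inline token and sets a few of the optional switches explicitly. The secretRef fields name and key are assumed to mirror the Helm values operator.dash0Export.secretRef.name and operator.dash0Export.secretRef.key, and the secret name and key shown are illustrative only.

```yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - dash0:
        endpoint: ingress... # replace with the OTLP/gRPC endpoint of your organization
        authorization:
          secretRef:
            name: dash0-authorization-secret # assumed secret name in the operator's namespace
            key: token # assumed key within that secret
        apiEndpoint: https://api.....dash0.com # replace with the API endpoint of your organization
  clusterName: my-kubernetes-cluster # optional
  selfMonitoring:
    enabled: true # optional, defaults to true
  kubernetesInfrastructureMetricsCollection:
    enabled: true # optional, defaults to true
  collectPodLabelsAndAnnotations:
    enabled: false # optional, defaults to true; switched off here as an example
```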

Note: All configuration options available in the operator configuration resource can also be configured when letting the Helm chart auto-create this resource, as explained in the section Installation. You can consult the chart's values.yaml file for a complete list of available configuration settings.
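The automatic namespace monitoring options described above could be combined as in the following sketch. The labelSelector shown is the documented default and is spelled out only for illustration; the monitoringTemplate is left out because its structure is not covered in this section. The second document shows a hypothetical namespace (the name is a placeholder) that opts out through the dash0.com/enable label.

```yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - dash0:
        endpoint: ingress... # replace with the OTLP/gRPC endpoint of your organization
        authorization:
          token: auth_... # replace with your authorization token
  autoMonitorNamespaces:
    enabled: true
    labelSelector: "dash0.com/enable!=false" # documented default value
---
# Hypothetical namespace that is excluded from automatic monitoring.
apiVersion: v1
kind: Namespace
metadata:
  name: my-unmonitored-namespace
  labels:
    dash0.com/enable: "false" # label values are strings, hence the quotes
```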

Notes on Creating the Operator Configuration Resource Via Helm

Providing the backend connection settings to the operator via Helm parameters is a convenience mechanism to get monitoring started right away when installing the operator. Setting operator.dash0Export.enabled to true and providing other necessary operator.dash0Export.* values like operator.dash0Export.endpoint will instruct the operator manager to create an operator configuration resource with the provided values at startup. This automatically created operator configuration resource will have the name dash0-operator-configuration-auto-resource.

If an operator configuration resource with any other name already exists in the cluster (e.g. a manually created operator configuration resource), the operator will treat this as an error and refuse to overwrite the existing operator configuration resource with the values provided via Helm.

If an operator configuration resource with the name dash0-operator-configuration-auto-resource already exists in the cluster (e.g. a previous startup of the operator manager has created the resource), the operator manager will update/overwrite this resource with the values provided via Helm.

Manual changes to the dash0-operator-configuration-auto-resource are permissible for quickly experimenting with configuration changes, without an operator restart, but you need to be aware that they will be overwritten with the settings provided via Helm the next time the operator manager pod is restarted. Possible reasons for a restart of the operator manager pod include upgrading to a new operator version, running helm upgrade ... dash0-operator dash0-operator/dash0-operator, or Kubernetes moving the operator manager pod to a different node.

For this reason, when using this feature, it is recommended to treat the Helm values as the source of truth for the operator configuration. Any changes you want to be permanent should be applied via Helm and the operator.dash0Export.* settings.

If you would rather retain manual control over the operator configuration resource, you should omit any operator.dash0Export.* Helm values and create and manage the operator configuration resource manually (that is, via kubectl, ArgoCD etc.).

Enable Dash0 Monitoring For a Namespace

Note: As an alternative to enabling monitoring per namespace, you can also monitor all namespaces automatically.

Note: By default, when enabling Dash0 monitoring for a namespace, all workloads in this namespace will be restarted to apply the Dash0 instrumentation. If you want to avoid this, set spec.instrumentWorkloads.mode in the monitoring resource to created-and-updated. See below for more information on the instrumentWorkloads modes.

For each namespace that you want to monitor with Dash0, enable monitoring by installing a Dash0 monitoring resource into that namespace:

Create a file dash0-monitoring.yaml with the following content:

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0Monitoring
metadata:
  name: dash0-monitoring-resource

Save the file and apply the resource to the namespace you want to monitor. For example, if you want to monitor workloads in the namespace my-nodejs-applications, use the following command:

console
kubectl apply --namespace my-nodejs-applications -f dash0-monitoring.yaml

If you want to monitor the default namespace with Dash0, use the following command:

console
kubectl apply -f dash0-monitoring.yaml

Note: Even when no monitoring resource has been installed and no namespace is being monitored by Dash0, the Dash0 operator's collector will collect Kubernetes infrastructure metrics that are not namespace-scoped, like node-related metrics. The only prerequisite for this is an operator configuration with export settings.
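If the pod restarts mentioned in the note at the start of this section are a concern, a monitoring resource along the following lines might be a reasonable starting point. It is only a sketch: it sets the instrumentation mode to created-and-updated and pins the target namespace in the manifest itself, using the example namespace my-nodejs-applications from above.

```yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0Monitoring
metadata:
  name: dash0-monitoring-resource
  namespace: my-nodejs-applications # example namespace; replace with your own
spec:
  instrumentWorkloads:
    # Only instrument workloads that are newly deployed or changed,
    # so existing pods are not restarted when this resource is applied.
    mode: created-and-updated
```

With metadata.namespace set in the file, kubectl apply -f dash0-monitoring.yaml targets that namespace without the --namespace flag.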

Additional Configuration Per Namespace

The Dash0 monitoring resource supports additional configuration settings:

  • spec.instrumentWorkloads.mode: A namespace-wide configuration for the workload instrumentation strategy for the target namespace. There are three possible settings: all, created-and-updated and none. By default, the setting all is assumed; unless there is an operator configuration resource with telemetryCollection.enabled=false, then the setting none is assumed by default. Note that spec.instrumentWorkloads.mode is the path for this setting starting with version v1beta1 of the Dash0 monitoring resource; when using v1alpha1, the path is spec.instrumentWorkloads.

    • all: If set to all, the operator will:

      • instrument existing workloads in the target namespace (i.e. workloads already running in the namespace) when the Dash0 monitoring resource is deployed,
      • instrument existing workloads or update the instrumentation of already instrumented workloads in the target namespace when the Dash0 operator is first started or updated to a newer version,
      • instrument new workloads in the target namespace when they are deployed, and
      • instrument changed workloads in the target namespace when changes are applied to them.
      • Note that the first two actions (instrumenting existing workloads) will result in restarting the pods of the affected workloads. Use created-and-updated if you want to avoid pod restarts.
    • created-and-updated: If set to created-and-updated, the operator will not instrument existing workloads in the target namespace. Instead, it will only:

      • instrument new workloads in the target namespace when they are deployed, and
      • instrument changed workloads in the target namespace when changes are applied to them.

      This setting is useful if you want to avoid pod restarts as a side effect of deploying the Dash0 monitoring resource or restarting the Dash0 operator.
    • none: You can opt out of instrumenting workloads entirely by setting this option to none. With spec.instrumentWorkloads.mode: none, workloads in the target namespace will never be instrumented to emit telemetry.

    If this setting is omitted, the value all is assumed and new/updated as well as existing Kubernetes workloads will be instrumented by the operator to emit telemetry, as described above. There is one exception to this rule: If there is an operator configuration resource with telemetryCollection.enabled=false, then the default setting is none instead of all, and no workloads will be instrumented by the Dash0 operator.

    More fine-grained per-workload control over instrumentation is available by setting the label dash0.com/enable=false on individual workloads, see Disabling Auto-Instrumentation for Specific Workloads.

    The behavior when changing this setting for an existing Dash0 monitoring resource is as follows:

    • When this setting is updated to spec.instrumentWorkloads.mode=all (and it had a different value before): All existing uninstrumented workloads will be instrumented. Their pods will be restarted to apply the instrumentation.
    • When this setting is updated to spec.instrumentWorkloads.mode=none (and it had a different value before): The instrumentation will be removed from all instrumented workloads. Their pods will be restarted to remove the instrumentation. (After this change, the operator will no longer instrument any workloads nor will it restart any pods.)
    • Updating this value to spec.instrumentWorkloads.mode=created-and-updated has no immediate effect; existing uninstrumented workloads will not be instrumented, and existing instrumented workloads will not be uninstrumented. Newly deployed or updated workloads will be instrumented from the point of the configuration change onwards as described above.

    Automatic workload instrumentation will automatically add tracing to your workloads. You can read more about what exactly this feature entails in the section Automatic Workload Instrumentation.

  • spec.instrumentWorkloads.labelSelector: A custom Kubernetes label selector for controlling the workload instrumentation on the level of individual workloads, see Using a Custom Label Selector to Control Auto-Instrumentation.

  • spec.instrumentWorkloads.traceContext.propagators: When set, the operator will add the environment variable OTEL_PROPAGATORS to all instrumented workloads in the target namespace. This environment variable determines which trace context propagation headers an OTel SDK uses. Setting this can be useful if the workloads in this namespace interact with services that do not use the W3C trace context standard header traceparent for trace context propagation, but for example AWS X-Ray (X-Amzn-Trace-Id). The value of the setting will be validated, it needs to be a comma-separated list of valid propagators. See https://opentelemetry.io/docs/languages/sdk-configuration/general/#otel_propagators for more information.

    When the option is not set, the operator will not set the environment variable OTEL_PROPAGATORS. If the option is set in the monitoring resource at some point and then later removed again, the operator will remove the environment variable from instrumented workloads if and only if the value of the environment variable matches the previously used setting in the monitoring resource. This is done to prevent accidentally removing an OTEL_PROPAGATORS environment variable that has been set manually and not by the operator. (For that purpose, the previous setting is stored in the monitoring resource's status.)

  • spec.instrumentWorkloads.captureSqlQueryParameters: When set to true, the operator enables SQL query parameter capture in the language agents that support it. Currently this covers the OpenTelemetry Java agent's JDBC instrumentation and the OpenTelemetry .NET auto-instrumentation for Microsoft.Data.SqlClient and Entity Framework Core, by adding the following environment variables (all set to true) to instrumented containers in the target namespace:

    • OTEL_INSTRUMENTATION_JDBC_EXPERIMENTAL_CAPTURE_QUERY_PARAMETERS
    • OTEL_DOTNET_EXPERIMENTAL_SQLCLIENT_ENABLE_TRACE_DB_QUERY_PARAMETERS
    • OTEL_DOTNET_EXPERIMENTAL_EFCORE_ENABLE_TRACE_DB_QUERY_PARAMETERS

    Recorded query parameter values are added to the resulting database span attributes. Other language agents ignore these variables.

    Note that these are experimental upstream OpenTelemetry agent flags and may be renamed or removed in future agent releases. Recorded query parameter values may include sensitive data such as personally identifiable information (PII), so enable this only for namespaces where capturing query parameter values is acceptable.

    When the option is not set or set to false, the operator will not set the environment variables. If the option is set to true at some point and then later set to false or removed, the operator will remove each environment variable from instrumented workloads if and only if its current value still matches what the operator would have set (the literal true); a user-set value that differs is preserved. (For that purpose, the previous setting is stored in the monitoring resource's status.)

  • spec.logCollection.enabled: A namespace-wide opt-out for collecting pod logs via the filelog receiver. If enabled, the operator will configure its OpenTelemetry collector to watch the log output of all pods in the namespace and send the resulting log records to Dash0. This setting is optional, it defaults to true; unless there is an operator configuration resource with telemetryCollection.enabled=false, then log collection is off by default. It is a validation error to set telemetryCollection.enabled=false in the operator configuration resource and logCollection.enabled=true in any monitoring resource at the same time.

  • spec.prometheusScraping.enabled: A namespace-wide opt-out for Prometheus scraping for the target namespace. If enabled, the operator will configure its OpenTelemetry collector to scrape metrics from pods in the namespace of this Dash0Monitoring resource according to their prometheus.io/scrape annotations via the OpenTelemetry Prometheus receiver. In addition, if the operator configuration resource has prometheusCrdSupport.enabled=true, the collectors will scrape metrics from endpoints defined in Prometheus CRs (ServiceMonitor, PodMonitor, ScrapeConfig) present in this namespace. This setting is optional, it defaults to true; unless there is an operator configuration resource with telemetryCollection.enabled=false, then Prometheus scraping is off by default. It is a validation error to set telemetryCollection.enabled=false in the operator configuration resource and prometheusScraping.enabled=true in any monitoring resource at the same time. Note that the collection of OpenTelemetry-native metrics is not affected by setting prometheusScraping.enabled=false for a namespace.

  • spec.filter: An optional custom filter configuration to drop some of the collected telemetry before sending it to the configured telemetry backend. Filters for a specific telemetry object type (e.g. spans) are lists of OTTL expressions. If at least one of the conditions of a list evaluates to true, the object will be dropped. (That is, conditions are implicitly connected by a logical OR.) The configuration structure is identical to the configuration of the OpenTelemetry collector's filter processor. One difference to the filter processor is that the filter rules configured in a Dash0 monitoring resource will only be applied to the telemetry collected in the namespace the monitoring resource is installed in. Telemetry from other namespaces is not affected. Existing configurations for the filter processor can be copied and pasted without syntactical changes.

    • spec.filter.traces.span: A list of OTTL conditions for filtering spans. All spans where at least one condition evaluates to true will be dropped. (That is, conditions are implicitly connected by a logical OR.)
    • spec.filter.traces.spanevent: A list of OTTL conditions for filtering span events. All span events where at least one condition evaluates to true will be dropped. If all span events for a span are dropped, the span will be left intact.
    • spec.filter.metrics.metric: A list of OTTL conditions for filtering metrics. All metrics where at least one condition evaluates to true will be dropped.
    • spec.filter.metrics.datapoint: A list of OTTL conditions for filtering individual data points of metrics. All data points where at least one condition evaluates to true will be dropped. If all datapoints for a metric are dropped, the metric will also be dropped.
    • spec.filter.logs.log_records: A list of OTTL conditions for filtering log records. All log records where at least one condition evaluates to true will be dropped.
    • spec.filter.profiles.profile: A list of OTTL conditions for filtering profiles. All profiles where at least one condition evaluates to true will be dropped.

    The spec.filter setting is optional; by default, no filters are applied. It is a validation error to set telemetryCollection.enabled=false in the operator configuration resource and set filters in any monitoring resource at the same time.

    Note that although error_mode can be specified per namespace, the filter conditions will be aggregated into one single filter processor in the resulting OpenTelemetry collector configuration; if different error modes are specified in different namespaces, the "most severe" error mode will be used (propagate > ignore > silent).

  • spec.transform: An optional custom transformation configuration that will be applied to the collected telemetry before sending it to the configured telemetry backend. Transformations for a specific telemetry signal (e.g. traces, metrics, logs, profiles) are lists of OTTL statements. All telemetry for the respective signal will be routed through all transformation statements. The statements are executed in the order they are listed. The configuration structure is identical to the configuration of the OpenTelemetry collector's transform processor. Both the basic configuration style, and the advanced configuration style of the transform processor are supported. One difference to the transform processor is that the transform rules configured in a Dash0 monitoring resource will only be applied to the telemetry collected in the namespace the monitoring resource is installed in. Telemetry from other namespaces is not affected. If both spec.filter and spec.transform are configured, the filtering for a given signal (traces, metrics, logs, profiles) will be executed before the transform processor. (That is, you cannot assume that transformations have already been applied when writing filter rules.) Existing configurations for the transform processor can be copied and pasted without syntactical changes.

    • spec.transform.trace_statements: A list of OTTL statements (or a list of groups in the advanced config style) for transforming trace telemetry.
    • spec.transform.metric_statements: A list of OTTL statements (or a list of groups in the advanced config style) for transforming metric telemetry.
    • spec.transform.log_statements: A list of OTTL statements (or a list of groups in the advanced config style) for transforming log telemetry.
    • spec.transform.profile_statements: A list of OTTL statements (or a list of groups in the advanced config style) for transforming profile telemetry.

    The spec.transform setting is optional; by default, no transformations are applied. It is a validation error to set telemetryCollection.enabled=false in the operator configuration resource and set transforms in any monitoring resource at the same time.

    Note that although error_mode can be specified per namespace, the transform statements will be aggregated into one single transform processor in the resulting OpenTelemetry collector configuration; if different error modes are specified in different namespaces, the "most severe" error mode will be used (propagate > ignore > silent).

  • spec.synchronizePersesDashboards: A namespace-wide opt-out for synchronizing Perses dashboard resources found in the target namespace. If enabled, the operator will watch Perses dashboard resources in this namespace and create corresponding dashboards in Dash0 via the Dash0 API. More fine-grained per-resource control over synchronization is available by setting the label dash0.com/enable=false on individual Perses dashboard resources. See Managing Dash0 Dashboards for details. This setting is optional, it defaults to true.

  • spec.synchronizePrometheusRules: A namespace-wide opt-out for synchronizing Prometheus rule resources found in the target namespace. If enabled, the operator will watch Prometheus rule resources in this namespace and create corresponding check rules in Dash0 via the Dash0 API. More fine-grained per-resource control over synchronization is available by setting the label dash0.com/enable=false on individual Prometheus rule resources. See Managing Dash0 Check Rules for details. This setting is optional, it defaults to true.

Example

Here is a comprehensive example for a monitoring resource which:

  • sets the instrumentation mode to created-and-updated,
  • disables Prometheus scraping,
  • sets a couple of filters for all six telemetry object types,
  • applies transformations to limit the length of span attributes, datapoint attributes, log attributes, and profile attributes (with the metric transform using the advanced transform config style),
  • disables Perses dashboard synchronization, and
  • disables Prometheus rule synchronization.
yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0Monitoring
metadata:
  name: dash0-monitoring-resource
spec:
  instrumentWorkloads:
    mode: created-and-updated
  prometheusScraping:
    enabled: false
  filter:
    traces:
      span:
        - 'attributes["http.route"] == "/ready"'
        - 'attributes["http.route"] == "/metrics"'
      spanevent:
        - 'attributes["grpc"] == true'
        - 'IsMatch(name, ".*grpc.*")'
    metrics:
      metric:
        - 'name == "k8s.replicaset.available"'
        - 'name == "k8s.replicaset.desired"'
      datapoint:
        - 'metric.type == METRIC_DATA_TYPE_SUMMARY'
        - 'resource.attributes["service.name"] == "my_service_name"'
    logs:
      log_records:
        - 'IsMatch(body, ".*password.*")'
        - 'severity_number < SEVERITY_NUMBER_WARN'
    profiles:
      profile:
        - 'resource.attributes["k8s.pod.name"] == "debug-pod"'
  transform:
    trace_statements:
      - 'truncate_all(span.attributes, 1024)'
    metric_statements:
      - conditions:
          - 'metric.type == METRIC_DATA_TYPE_SUM'
        statements:
          - 'truncate_all(datapoint.attributes, 1024)'
    log_statements:
      - 'truncate_all(log.attributes, 1024)'
    profile_statements:
      - 'truncate_all(profile.attributes, 1024)'
  synchronizePersesDashboards: false
  synchronizePrometheusRules: false
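
The aggregation of error modes across namespaces can be illustrated with two monitoring resources. This is a minimal sketch, assuming that error_mode sits directly under spec.filter (mirroring its top-level position in the collector's filter processor configuration); the namespace names team-a and team-b are hypothetical.

```yaml
# Hypothetical namespace "team-a": tolerates OTTL evaluation errors.
apiVersion: operator.dash0.com/v1beta1
kind: Dash0Monitoring
metadata:
  name: dash0-monitoring-resource
  namespace: team-a
spec:
  filter:
    error_mode: ignore # assumption: same position as in the filter processor config
    traces:
      span:
        - 'attributes["http.route"] == "/ready"'
---
# Hypothetical namespace "team-b": requests the stricter error mode.
apiVersion: operator.dash0.com/v1beta1
kind: Dash0Monitoring
metadata:
  name: dash0-monitoring-resource
  namespace: team-b
spec:
  filter:
    error_mode: propagate
    logs:
      log_records:
        - 'severity_number < SEVERITY_NUMBER_WARN'
```

Because the conditions of both resources end up in one shared filter processor, the more severe mode (propagate) would apply to the filters of both namespaces, following the ordering propagate > ignore > silent.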

Namespace-specific Exports and API-sync Overrides

It is possible to override the global/default exports config provided via the Dash0 operator configuration on a per-namespace basis by providing the corresponding exports config via the Dash0 monitoring resource. Please note that the namespace-specific overrides replace the default list of exports from the Dash0 operator configuration.

The override supports both the export of telemetry and the sync of resources, like dashboards or views, via the Dash0 API.

Note: The operator always expects default export and API-sync settings via the Dash0 operator configuration, even when namespace-specific overrides are used.
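
A namespace-specific override might look like the following. This is only a sketch: it assumes that the monitoring resource accepts an exports list with the same structure as spec.exports in the operator configuration resource, and the namespace, dataset, and secret names are hypothetical.

```yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0Monitoring
metadata:
  name: dash0-monitoring-resource
  namespace: team-a # hypothetical namespace
spec:
  # Assumption: same structure as spec.exports in the operator configuration resource.
  # This list replaces the default exports for telemetry from this namespace.
  exports:
    - dash0:
        endpoint: ingress... # replace with the ingress endpoint for this namespace
        dataset: team-a-dataset # hypothetical dataset name
        authorization:
          secretRef:
            name: team-a-authorization-secret # hypothetical secret name
            key: token
        apiEndpoint: https://api... # optional
```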

Automatic Namespace Monitoring

By default, a Dash0Monitoring resource has to be added to each namespace that you want to monitor (see Enable Dash0 Monitoring For a Namespace). With automatic namespace monitoring, you can let the Dash0 operator automate this. This is useful if you want to monitor all or almost all namespaces in your cluster. It is also useful if you create new namespaces frequently and want to have them monitored right away, without additional setup. It is best suited if almost all namespaces should be monitored in the same fashion.

Use the following Helm values to enable automatic namespace monitoring:

operator:
  dash0Export:
    # operator.dash0Export.enabled must be true to facilitate automatic namespace monitoring.
    # Refer to the section "Installation" for details.
    enabled: true
    ...
  autoMonitorNamespaces:
    # Setting operator.autoMonitorNamespaces.enabled=true activates automatic namespace monitoring.
    enabled: true

If automatic namespace monitoring is enabled, the operator will:

  • automatically add monitoring to all existing namespaces at startup, and
  • automatically add monitoring to new namespaces, as they are created.

Even when this feature is enabled, individual namespaces can opt out of automatic monitoring via label selectors. Set dash0.com/enable: "false" on a namespace to exclude it from automatic namespace monitoring.
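
As an illustration, a namespace that opts out could be declared as follows. The namespace name is hypothetical; only the label key and value are taken from the text above.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: internal-tools # hypothetical namespace name
  labels:
    # Label values are strings, so the value has to be quoted.
    dash0.com/enable: "false"
```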

The namespace label selector is configurable. For example, to use an opt-in approach instead of opt-out, something like this can be used:

operator:
  dash0Export:
    enabled: true
    ...
  autoMonitorNamespaces:
    enabled: true
    labelSelector: monitor-namespace-with-dash0==true

With this configuration, only namespaces that have the label monitor-namespace-with-dash0: "true" will be monitored.

The following namespaces will not be monitored by automatic namespace monitoring, regardless of the label selector:

  • kube-system
  • kube-node-lease
  • kube-public
  • the namespace of the Dash0 operator

You can deploy a monitoring resource to these namespaces manually though.

Automatically monitored namespaces use the following default settings:

  • instrumentWorkloads.mode: created-and-updated
  • Log Collection: Enabled
  • Event Collection: Enabled
  • Prometheus Scraping: Enabled
  • Synchronize Perses Dashboards: Enabled
  • Synchronize Prometheus Rules: Enabled

The defaults can be customized, as follows:

operator:
  dash0Export:
    enabled: true
    ...
  autoMonitorNamespaces:
    enabled: true
    monitoringTemplate:
      spec:
        instrumentWorkloads:
          mode: none
        logCollection:
          enabled: true
        eventCollection:
          enabled: true
        prometheusScraping:
          enabled: false
        synchronizePersesDashboards: false
        synchronizePrometheusRules: false

With the previous example, only logs and Kubernetes events would be collected in automatically monitored namespaces, no workload auto-instrumentation would happen, and Prometheus metrics would not be scraped.

All settings mentioned in Additional Configuration Per Namespace can be set with the monitoring template, with the exception of per-namespace exports. (Use operator.dash0Export to configure the export instead.)

The following snippet shows all possible settings:

operator:
  autoMonitorNamespaces:
    monitoringTemplate:
      spec:
        instrumentWorkloads:
          mode: none
        logCollection:
          enabled: true
        eventCollection:
          enabled: true
        prometheusScraping:
          enabled: false
        filter:
          traces:
            span:
              - 'attributes["http.route"] == "/ready"'
              - 'attributes["http.route"] == "/metrics"'
            spanevent:
              - 'attributes["grpc"] == true'
              - 'IsMatch(name, ".*grpc.*")'
          metrics:
            metric:
              - 'name == "k8s.replicaset.available"'
              - 'name == "k8s.replicaset.desired"'
            datapoint:
              - 'metric.type == METRIC_DATA_TYPE_SUMMARY'
              - 'resource.attributes["service.name"] == "my_service_name"'
          logs:
            log_records:
              - 'IsMatch(body, ".*password.*")'
              - 'severity_number < SEVERITY_NUMBER_WARN'
        transform:
          trace_statements:
            - 'truncate_all(span.attributes, 1024)'
          metric_statements:
            - conditions:
                - 'metric.type == METRIC_DATA_TYPE_SUM'
              statements:
                - 'truncate_all(datapoint.attributes, 1024)'
          log_statements:
            - 'truncate_all(log.attributes, 1024)'
        synchronizePersesDashboards: false
        synchronizePrometheusRules: false

If there are some namespaces which require different monitoring settings, exclude them from automatic namespace monitoring via the label selector (e.g. dash0.com/enable: "false") and deploy a Dash0Monitoring resource there manually.
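
A manually deployed resource for such a namespace could look like this. It is a sketch only: the namespace name legacy-batch is hypothetical, the namespace is assumed to already carry the opt-out label, and the chosen settings are just one possible deviation from the template.

```yaml
# Deployed manually into a namespace that is excluded from automatic monitoring.
apiVersion: operator.dash0.com/v1beta1
kind: Dash0Monitoring
metadata:
  name: dash0-monitoring-resource
  namespace: legacy-batch # hypothetical; labeled dash0.com/enable: "false"
spec:
  instrumentWorkloads:
    mode: none # no workload auto-instrumentation in this namespace
  prometheusScraping:
    enabled: false
```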

If the cluster has a lot of namespaces (e.g. close to or more than 1,000), it is recommended to set operator.collectors.compressConfigMaps to true. This will enable gzip compression for the ConfigMaps for the OpenTelemetry collectors.

Example:

operator:
  dash0Export:
    enabled: true
    ...
  collectors:
    compressConfigMaps: true
  autoMonitorNamespaces:
    enabled: true

Using Image Volumes for Auto-Instrumentation Files

When using auto-instrumentation of workloads, by default the operator adds an emptyDir volume and an init container to provide the instrumentation files to the workload.

Image volumes are a new Kubernetes feature that provides a better way to do this. They provide the following advantages:

  • No additional ephemeral storage usage.
  • Faster workload startup (because nothing needs to be copied over from the init container to the empty dir volume)

They were introduced in Kubernetes 1.31 as an alpha feature behind a feature gate. In version 1.33 they graduated to beta, but were still disabled by default. Starting with version 1.35, the image volume feature gate is enabled by default, although the feature was still considered beta. Image volumes finally became a stable feature in version 1.36.

Set the instrumentationDelivery setting in the operator configuration resource to determine under which circumstances image volumes will be used instead of the init container approach. When using the operator manager to create and manage the operator configuration resource (i.e. with operator.dash0Export.enabled=true), set operator.instrumentation.delivery in Helm to configure image volumes.

Allowed values for the instrumentation delivery setting:

  • auto: use image volumes if the Kubernetes version is 1.36 or later, otherwise use the init container approach.
  • image-volume: always use image volumes, also on Kubernetes versions older than 1.36. If the Kubernetes version is older than 1.31, the operator manager will log a warning and fall back to the init container approach, since image volumes are not supported in that version. Note that if you are using Kubernetes 1.34 or earlier, and you want to use this setting, you need to enable image volumes when configuring your cluster, since image volumes are disabled by default in versions older than 1.35.
  • init-container: always use the init container approach, regardless of the Kubernetes version. This is the default.

Note: Changing the instrumentation delivery setting for an existing operator installation will not trigger a bulk re-instrumentation of all existing workloads, even for namespaces that are set to instrumentWorkloads.mode=all. Once a workload has been successfully instrumented, there is no benefit in re-instrumenting it with a different delivery mechanism. The new setting will be applied when instrumenting newly deployed workloads, or when a workload is updated/re-deployed.
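
In a Helm values file, the delivery mechanism could be selected as sketched below. Only the operator.instrumentation.delivery value named above is used; all other operator settings are omitted.

```yaml
operator:
  instrumentation:
    # Allowed values: auto, image-volume, init-container (the default).
    delivery: auto
```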

Python Auto-Instrumentation

To enable auto-instrumentation for Python workloads, set operator.instrumentation.enablePythonAutoInstrumentation=true via Helm. If this setting is enabled for an existing operator installation, Python auto-instrumentation will be enabled immediately for workloads in namespaces that have a Dash0Monitoring resource with instrumentWorkloads.mode=all. This will cause all pods in these namespaces to be restarted. For workloads in namespaces that use instrumentWorkloads.mode=created-and-updated, it will become active with the next re-deployment of the workload. The setting has no effect on workloads in namespaces that use instrumentWorkloads.mode=none or do not have a Dash0Monitoring resource.
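
Expressed as a Helm values fragment, the opt-in would look roughly like this (a sketch that omits all other operator settings):

```yaml
operator:
  instrumentation:
    # Opt-in; applies immediately to namespaces with instrumentWorkloads.mode=all
    # and restarts the pods in those namespaces.
    enablePythonAutoInstrumentation: true
```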

Python auto-instrumentation is only supported for Python 3.9 or later. If the Dash0 Python auto-instrumentation detects an incompatible Python version (i.e. version 3.8 or older), it will automatically deactivate itself safely and print a warning to stderr:

[dash0] warning: cannot auto-instrument Python process: unsupported Python version: 3.8.0

This warning is also visible in the Dash0 UI's log view, unless log collection has been disabled for the namespace. Update the Python version to enable automatic Python instrumentation by Dash0 for this workload.

Python auto-instrumentation only works if the configured OTLP export protocol is http/protobuf. If the operator is managing the container's OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_PROTOCOL variables, this will be set correctly automatically. If the Dash0 Python auto-instrumentation detects an incompatible OTEL_EXPORTER_OTLP_PROTOCOL setting, it will automatically deactivate itself safely and print a warning to stderr:

[dash0] warning: cannot auto-instrument Python process: OTEL_EXPORTER_OTLP_PROTOCOL=grpc is not supported

This can only happen if the container is setting its own OTEL_EXPORTER_OTLP_ENDPOINT and/or OTEL_EXPORTER_OTLP_PROTOCOL. Remove these environment variables from the pod spec template to enable automatic Python instrumentation by Dash0 for this workload.

Dash0's Python auto-instrumentation is not compatible with workloads that are already instrumented, either manually or using the zero-code instrumentation, e.g. the opentelemetry-instrument wrapper. If existing instrumentation is detected, the Dash0 Python auto-instrumentation will automatically deactivate itself safely and print a warning to stderr:

[dash0] warning: cannot auto-instrument Python process: The application has OpenTelemetry dependencies which indicate
that it is already instrumented. The following problematic dependencies have been found: ...
Skipping the Dash0 Python auto-instrumentation to avoid double instrumentation.

This warning is also visible in the Dash0 UI's log view, unless log collection has been disabled for the namespace. Remove the existing instrumentation from the workload to enable automatic Python instrumentation by Dash0 for this workload, or leave the existing instrumentation in place, in which case Dash0 will refrain from instrumenting it.

Last but not least, due to the nature of Python's dependency management, Python auto-instrumentation has the potential to introduce dependency conflicts. The Dash0 Python auto-instrumentation checks for potential dependency conflicts before actually instrumenting a process. If a dependency conflict is detected, the Dash0 Python auto-instrumentation will automatically deactivate itself safely and print a warning to stderr:

[dash0] warning: cannot auto-instrument Python process: dependency conflicts: {'package-name': {'version_required': '>=20.0', 'version_found': '19.0'}}

This warning is also visible in the Dash0 UI's log view, unless log collection has been disabled for the namespace. Resolve the version conflicts to enable automatic Python instrumentation by Dash0 for this workload, for example by updating the dependency versions used by the workload. If the conflicting dependencies cannot be resolved, you might need to instrument this workload individually, for example by using the OpenTelemetry Python zero-code instrumentation.

Using a Kubernetes Secret for the Dash0 Authorization Token

If you want to provide the Dash0 authorization token via a Kubernetes secret instead of providing the token as a string, create the secret in the namespace where the Dash0 operator is installed. This also applies when providing a per-namespace export and API-sync config via a monitoring resource, i.e. the operator will always try to look up the auth token secret in the operator namespace. If you followed the guide above, the name of that namespace is dash0-system. The authorization token for your Dash0 organization can be copied from https://app.dash0.com → organization settings → "Auth Tokens". You can freely choose the name of the secret and the key of the token within the secret.

Create the secret by using the following command:

console
kubectl create secret generic \
  dash0-authorization-secret \
  --namespace dash0-system \
  --from-literal=token=auth_...your-token-here...

With this example command, you would create a secret with the name dash0-authorization-secret in the namespace dash0-system. If you installed (or plan to install) the operator into a different namespace, replace the --namespace parameter accordingly.
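
If you prefer a declarative manifest over the imperative command, an equivalent secret could be sketched as follows. It uses the same name, namespace, and key as the command above; stringData is the standard Kubernetes field for providing unencoded secret values.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: dash0-authorization-secret
  namespace: dash0-system # replace if the operator runs in a different namespace
type: Opaque
stringData:
  token: auth_...your-token-here...
```

Keep in mind that such a file contains the token in plain text, so it should not be committed to version control as-is.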

The name of the secret as well as the key of the token value within the secret must be provided when referencing the secret during helm install, or in the YAML file for the Dash0 operator configuration resource (in the secretRef property).

For creating the operator configuration resource with helm install, the command would look like this, assuming the secret has been created as shown above:

console
helm install \
  --wait \
  --namespace dash0-system \
  --set operator.dash0Export.enabled=true \
  --set operator.dash0Export.endpoint=REPLACE THIS WITH YOUR DASH0 INGRESS ENDPOINT \
  --set operator.dash0Export.apiEndpoint=REPLACE THIS WITH YOUR DASH0 API ENDPOINT \
  --set operator.dash0Export.secretRef.name=dash0-authorization-secret \
  --set operator.dash0Export.secretRef.key=token \
  dash0-operator \
  dash0-operator/dash0-operator

If you do not want to install the operator configuration resource via helm install but instead deploy it manually, and use a secret reference for the auth token, the following example YAML file would work with the secret created above:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - dash0:
        endpoint: ingress... # TODO REPLACE THIS WITH YOUR DASH0 INGRESS ENDPOINT
        authorization:
          secretRef:
            name: dash0-authorization-secret
            key: token
        apiEndpoint: https://api... # optional, see above

When deploying the operator configuration resource via kubectl, the following defaults apply:

  • If the name property is omitted, the name dash0-authorization-secret will be assumed.
  • If the key property is omitted, the key token will be assumed.

With these defaults in mind, the secretRef could have also been written as follows:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - dash0:
        endpoint: ingress... # TODO needs to be replaced with the actual value, see above
        authorization:
          secretRef: {}
        apiEndpoint: https://api... # optional, see above

Note: There are no defaults when using --set operator.dash0Export.secretRef.name and --set operator.dash0Export.secretRef.key with helm install, so for that approach the values must always be provided explicitly.
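
The same Helm settings can also be kept in a values file instead of individual --set flags. The following sketch mirrors the helm install command from this section; both secretRef values are spelled out because Helm applies no defaults for them.

```yaml
operator:
  dash0Export:
    enabled: true
    endpoint: ingress... # replace with your Dash0 ingress endpoint
    apiEndpoint: https://api... # replace with your Dash0 API endpoint
    secretRef:
      name: dash0-authorization-secret
      key: token
```

Such a file could then be passed to helm install via --values, for example with a (hypothetical) file name like dash0-values.yaml.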

Note that by default, Kubernetes secrets are stored unencrypted, and anyone with API access to the Kubernetes cluster will be able to read the value. Additional steps are required to make sure secret values are encrypted, if that is desired. See https://kubernetes.io/docs/concepts/configuration/secret/ for more information on Kubernetes secrets.

Dash0 Dataset Configuration

Use the spec.exports[].dash0.dataset property to configure the dataset that should be used for the telemetry data. By default, data will be sent to the dataset default. Here is an example for a configuration that uses a different Dash0 dataset:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - dash0:
        endpoint: ingress... # see above
        dataset: my-custom-dataset # This optional setting determines the Dash0 dataset to which telemetry will be sent.
        authorization: # see above
          ...
        apiEndpoint: https://api... # optional, see above

Configure Metrics Collection

By default, the operator collects metrics as follows:

  • The operator collects node, pod, container, and volume metrics from the API server on each kubelet via the Kubelet Stats Receiver, cluster-level metrics from the Kubernetes API server via the Kubernetes Cluster Receiver, and system metrics from the underlying nodes via the Host Metrics Receiver. Collecting these metrics can be disabled per cluster by setting kubernetesInfrastructureMetricsCollection.enabled: false in the Dash0 operator configuration resource (or setting the value operator.kubernetesInfrastructureMetricsCollectionEnabled to false when deploying the operator configuration resource via the Helm chart).
  • Namespace-scoped metrics (e.g. metrics related to a workload running in a specific namespace) will only be collected if the namespace is monitored, that is, there is a Dash0 monitoring resource in that namespace.
  • The Dash0 operator scrapes Prometheus endpoints on pods annotated with the prometheus.io/* annotations in monitored namespaces, as described in the section Scraping Prometheus Endpoints. This can be disabled per namespace by explicitly setting prometheusScraping.enabled: false in the Dash0 monitoring resource.
  • Metrics which are not namespace-scoped (for example node metrics like k8s.node.* or host metrics like system.cpu.utilization) will always be collected, unless metrics collection is disabled globally for the cluster (kubernetesInfrastructureMetricsCollection.enabled: false, see above). An operator configuration resource with exports settings has to be present in the cluster, otherwise no metrics collection takes place.
  • Disabling or enabling individual metrics via configuration is not supported.
  • Changing the frequency of metrics collection is not supported.
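
The cluster-wide switch mentioned in the first bullet can be set in a Helm values file. This is a minimal sketch that uses only the Helm value named above; all other operator settings are omitted.

```yaml
operator:
  # Disables the Kubelet Stats, Kubernetes Cluster, and Host Metrics collection
  # described above for the whole cluster.
  kubernetesInfrastructureMetricsCollectionEnabled: false
```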

Resource Attributes for Prometheus Scraping

When the operator scrapes Prometheus endpoints on pods, it does not have access to all the same metadata that is available to the OpenTelemetry SDK in an instrumented application. For that reason, resource attributes including the service name might be different. The operator makes an effort to derive reasonable resource attributes.

The service name is derived as follows:

  1. If the scraped service provides the target_info metric with a service_name attribute, that service name will be used. See https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md#resource-attributes
  2. If no service name was found via (1.), but the pod has the app.kubernetes.io/name label, the value of that label will be used as the service name. If the service name is derived from this pod label, the following pod labels (if present) will be mapped to resource attributes as well:
    • app.kubernetes.io/version to service.version
    • app.kubernetes.io/part-of to service.namespace
  3. If no service name was found via (1.) or (2.), no service name is set for the Prometheus metrics. If there is other telemetry (tracing, logs, OpenTelemetry metrics) for the same pod in Dash0, and these other signals carry a service name, the Prometheus metrics for this pod will be associated with that service name as well. This is actually the recommendation for handling Prometheus metrics for most users: If you do not have specific reasons to set the service name for Prometheus metrics, the best option is usually to not use the target_info metric or the app.kubernetes.io/name pod label, but let the Dash0 backend aggregate all telemetry into one service (to see everything in one place), with the service name taken from other signals than Prometheus metrics.

Note: In contrast to Resource Attributes for Workloads via Labels and Annotations, Prometheus scraping can only see pod labels, not workload level (deployment, daemonset, ...) labels.
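
To illustrate rule (2.), the following sketch shows a pod whose labels would determine the service name and version for its scraped metrics, provided the endpoint exposes no target_info metric with a service_name attribute. The pod name, image, and label values are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app # hypothetical
  annotations:
    prometheus.io/scrape: "true" # opts the pod in to Prometheus scraping
  labels:
    app.kubernetes.io/name: example-app # used as the service name
    app.kubernetes.io/version: "1.4.2" # mapped to service.version
spec:
  containers:
    - name: app
      image: registry.example.com/example-app:1.4.2 # hypothetical image
```

These labels have to be set on the pod itself (for a Deployment, in the pod template), since labels on the workload object are not visible to Prometheus scraping.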

Providing a Filelog Offset Volume

The operator's collector uses the filelog receiver to read pod log files for monitored workloads. When the collector is restarted (which can happen for various reasons, for example to apply configuration changes), it is important that the filelog receiver can continue reading the log files from where it left off. If the filelog receiver started to read all log files from the beginning again after a restart, log records would be duplicated, that is, they would appear multiple times in Dash0. For that purpose, the filelog receiver stores the log file offsets in persistent storage. By default, the offsets are stored in a config map in the operator's namespace. For small- to medium-sized clusters, this is usually sufficient, and it requires no additional configuration by users. For larger clusters or clusters with many short-lived pods, we recommend providing a persistent volume for storing offsets.

Any persistent volume that is accessible from the collector pods can be used for this purpose.

Here is an example with a hostPath volume (see also https://kubernetes.io/docs/concepts/storage/volumes/#hostpath for considerations around using hostPath volumes):

yaml
operator:
  collectors:
    filelogOffsetSyncStorageVolume:
      name: filelogreceiver-offsets
      hostPath:
        path: /data/dash0-operator/offset-storage
        type: DirectoryOrCreate

The directory in the hostPath volume will automatically be created with the correct permissions so that the OpenTelemetry collector container can write to it.

Here is another example based on persistent volume claims. (This assumes that a PersistentVolumeClaim named offset-storage-claim exists.) See also https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims and https://kubernetes.io/docs/concepts/storage/persistent-volumes/#reclaiming.

yaml
operator:
  collectors:
    filelogOffsetSyncStorageVolume:
      name: filelogreceiver-offsets
      persistentVolumeClaim:
        claimName: offset-storage-claim

Important: Since this volume is needed by a DaemonSet run by the Dash0 operator, the PersistentVolumeClaim needs to be set with the ReadWriteMany access mode.
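
A matching claim could be sketched as follows. The claim name is taken from the example above; the namespace, the requested size, and the availability of a storage class that supports ReadWriteMany are assumptions that depend on your cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: offset-storage-claim
  namespace: dash0-system # assumption: the namespace the operator is installed in
spec:
  accessModes:
    - ReadWriteMany # required because all collector DaemonSet pods mount the volume
  resources:
    requests:
      storage: 1Gi # hypothetical size
```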

Using a volume instead of the default config map approach is also helpful if you have webhooks in your cluster which process every config map update.

Known examples for this are:

  • The Open Policy Agent, in particular the OPA gatekeeper. When operating the OPA gatekeeper in the same cluster as the Dash0 operator, it is highly recommended to use a volume for filelog offsets. Using the default config map filelog offset storage in clusters with the OPA gatekeeper can lead to severe performance issues, since the default config map for filelog offsets is updated very frequently. This can cause the OPA gatekeeper to consume a lot of CPU and memory resources, potentially even leading to OOMKills of the OPA gatekeeper.
  • AKS clusters with the Azure Policy add-on: A managed instance of the OPA gatekeeper (see above) is installed by the Azure Policy add-on for AKS, i.e. the OPA gatekeeper is active in AKS clusters that have enabled the Azure Policy add-on. It is highly recommended to use a volume for filelog offsets for AKS clusters with the Azure Policy add-on.
  • The Kyverno admission controller. When operating Kyverno in the same cluster as the Dash0 operator, it is highly recommended to either use a volume for filelog offsets, or to exclude ConfigMaps (or all resource types) in the Dash0 operator's namespace from Kyverno's processing. Leaving Kyverno processing in place and using the config map filelog offset storage can lead to severe performance issues, since the default config map for filelog offsets is updated very frequently. This can cause Kyverno to consume a lot of CPU and memory resources, potentially even leading to OOMKills of the Kyverno admission controller.

Using cert-manager

When the Helm chart is installed, it generates TLS certificates on the fly for all components that need certificates (the operator's webhook service and its metrics service). This also happens when updating the operator, e.g. via helm upgrade. This is the default behavior, and it works out of the box without the need to manage certificates with a third-party solution.

As an alternative, you can use cert-manager to manage the certificates. To let cert-manager handle TLS certificates, provide the following additional settings when applying the Helm chart:

yaml
operator:
  # Settings for using cert-manager instead of auto-generating TLS certificates.
  certManager:
    # This disables the usage of automatically generated certificates by the Helm chart.
    # If this is set to true, the assumption is that certificates are managed by
    # cert-manager (see https://cert-manager.io/), and the other settings shown in this
    # snippet (certManager.secretName, certManager.certManagerAnnotations) become
    # required settings. It is recommended to set webhookService.name as well.
    useCertManager: true
    # The name of the secret used by cert-manager for the certificate.
    # The provided name must match the `secretName` in the `cert-manager.io/v1.Certificate`
    # resource's spec (see below).
    # Note: This secret is created and managed by cert-manager, you do not need to create
    # it manually.
    secretName: dash0-operator-certificate-secret
    # A map of additional annotations that are added to all Kubernetes resources that need
    # the certificate.
    # Usually this will be a single `cert-manager.io/inject-ca-from` annotation with the
    # namespace and name of the Certificate resource (see below), but other annotations
    # are possible as well.
    certManagerAnnotations:
      cert-manager.io/inject-ca-from: "dash0-system/dash0-operator-serving-certificate"
  webhookService:
    # A name override for the webhook service, defaults to dash0-operator-webhook-service.
    # The name of the webhook service must match the DNS names provided in the
    # cert-manager's Certificate resource.
    # For that reason it is recommended to set this value when using cert-manager, as it
    # guarantees that the names match, even if the Dash0 operator helm chart would change
    # the default name of the webhook service in a future release.
    name: dash0-operator-webhook-service-name

You will also need to provide cert-manager resources, for example a Certificate and an Issuer. Explaining all configuration options of cert-manager is out of scope for this documentation. Please refer to the cert-manager documentation for details on how to install and configure cert-manager in your cluster.

The following annotated example is a minimal configuration that matches the Dash0 operator configuration snippet shown above. Both configuration snippets assume that the Dash0 operator is installed in the default dash0-system namespace, and that the Certificate and Issuer resources are also deployed into that namespace.

yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  # NOTE: This value, together with the namespace this resource is deployed into, must
  # match the value of the cert-manager.io/inject-ca-from annotation provided in the
  # certManager.certManagerAnnotations setting, i.e. this matches the following annotation:
  # cert-manager.io/inject-ca-from: "dash0-system/dash0-operator-serving-certificate"
  name: dash0-operator-serving-certificate
  labels:
    app.kubernetes.io/name: certificate
    app.kubernetes.io/instance: serving-cert
    app.kubernetes.io/component: certificate
spec:
  # NOTE: Provide all DNS names that are used to access the webhook service.
  # The service names usually follow the patterns <service-name>.<namespace>.svc and
  # <service-name>.<namespace>.svc.cluster.local.
  # If you use a custom name for the Dash0 operator's webhook service (recommended), the
  # first part of the DNS names must match that name.
  # If you use the default name for the webhook service, the first part of the DNS names
  # must match the default name (dash0-operator-webhook-service).
  # If the Dash0 operator is installed into a different namespace, you need to change the
  # ".dash0-system." part of the DNS names accordingly.
  dnsNames:
    - dash0-operator-webhook-service-name.dash0-system.svc
    - dash0-operator-webhook-service-name.dash0-system.svc.cluster.local
  issuerRef:
    kind: Issuer
    # NOTE: Must match the name of the Issuer resource provided below.
    name: dash0-operator-selfsigned-issuer
  # NOTE: The secret name must match the operator.certManager.secretName setting above
  # provided to the Dash0 operator Helm chart. This secret is created and managed by
  # cert-manager, you do not need to create it manually.
  secretName: dash0-operator-certificate-secret
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  # NOTE: This issuer's name must match the issuerRef.name in the Certificate resource above.
  name: dash0-operator-selfsigned-issuer
  labels:
    app.kubernetes.io/name: certificate
    app.kubernetes.io/instance: serving-cert
    app.kubernetes.io/component: certificate
spec:
  # There are many options for configuring an Issuer, this example uses a simple self-signed
  # issuer. Make sure to configure the issuer according to your requirements.
  selfSigned: {}

Controlling On Which Nodes the Operator's Collector Pods Are Scheduled

Allow Scheduling on Tainted Nodes

The operator uses a Kubernetes daemonset to deploy the OpenTelemetry collector on each node, in order to collect telemetry from that node and the workloads running on it. If you use taints on certain nodes, Kubernetes will not schedule pods without a matching toleration there, which prevents the daemonset collector pods from running on these nodes. You can allow the daemonset collector pods to be scheduled there by configuring tolerations matching your taints for the collector pods. Tolerations can be configured as follows:

yaml
operator:
  collectors:
    daemonSetTolerations:
      - key: key1
        operator: Equal
        value: value1
        effect: NoSchedule
      - key: key2
        operator: Exists
        effect: NoSchedule

In the same fashion, tolerations can also be configured for the Dash0 operator manager (Helm value operator.tolerations), the OpenTelemetry collector deployment for collecting cluster metrics (Helm value operator.collectors.deploymentTolerations) and the OpenTelemetry target-allocator deployment (Helm value operator.targetAllocator.tolerations).
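As a sketch, the three additional toleration settings named above could be combined in a single Helm values file as follows. The taint key and value (dedicated / monitoring) are placeholders chosen for illustration and are not defined by the chart; use the taints that are actually present on your nodes.

```yaml
operator:
  # Tolerations for the Dash0 operator manager.
  tolerations:
    - key: dedicated # hypothetical taint key
      operator: Equal
      value: monitoring # hypothetical taint value
      effect: NoSchedule
  collectors:
    # Tolerations for the collector deployment that collects cluster metrics.
    deploymentTolerations:
      - key: dedicated
        operator: Equal
        value: monitoring
        effect: NoSchedule
  targetAllocator:
    # Tolerations for the OpenTelemetry target-allocator deployment.
    tolerations:
      - key: dedicated
        operator: Equal
        value: monitoring
        effect: NoSchedule
```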

Changing Helm settings while the operator is already running requires a helm upgrade/helm upgrade --reuse-values or similar to take effect.

Preventing Operator Scheduling on Specific Nodes

All the pods deployed by the operator have a default node anti-affinity for the dash0.com/enable=false node label. That is, if you add the dash0.com/enable=false label to a node, none of the pods owned by the operator will be scheduled on that node.

IMPORTANT: This includes the daemonset that the operator will set up to receive telemetry from the pods, which might lead to situations in which instrumented pods cannot send telemetry because the local node does not have a daemonset collector pod. In other words, if you want to monitor workloads with the Dash0 operator and use the dash0.com/enable=false node anti-affinity, make sure that the workloads you want to monitor have the same anti-affinity:

yaml
# Add this to your workloads
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "dash0.com/enable"
                operator: "NotIn"
                values: ["false"]

Custom Node Affinity

The node affinity for all pods deployed by the operator can be customized by setting the nodeAffinity field for the respective component.

yaml
operator:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "a-custom-label"
              operator: "In"
              values: ["custom_value"]
  collectors:
    daemonSetNodeAffinity: <custom_node_affinity>
    deploymentNodeAffinity: <custom_node_affinity>
  targetAllocator:
    nodeAffinity: <custom_node_affinity>

Disabling Auto-Instrumentation for Specific Workloads

In namespaces that are Dash0-monitoring enabled, all workloads are automatically instrumented for tracing and to improve OpenTelemetry resource attributes. This process will modify the Pod spec, e.g. by adding environment variables, Kubernetes labels and an init container. The modifications are described in detail in the section Automatic Workload Instrumentation.

You can disable these workload modifications for specific workloads by setting the label dash0.com/enable: "false" in the top level metadata section of the workload specification.

Note: The actual label selector for enabling or disabling workload modification can be customized in the Dash0 monitoring resource. The label dash0.com/enable: "false" can be used when no custom label selector has been configured in the Dash0 monitoring resource, see Using a Custom Label Selector to Control Auto-Instrumentation.

Here is an example for a deployment with this label:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: my-deployment-app
    dash0.com/enable: "false"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-deployment-app
  template:
    metadata:
      labels:
        app: my-deployment-app
    spec:
      containers:
        - name: my-deployment-app
          image: "some-image:latest"

The label can also be applied by using kubectl:

console
kubectl label --namespace $YOUR_NAMESPACE --overwrite deployment $YOUR_DEPLOYMENT_NAME dash0.com/enable=false

Note that setting dash0.com/enable: "false" will not prevent log collection for pods of workloads with that label, in namespaces that have Dash0 log collection enabled. Also, log collection is enabled by default for all monitored namespaces, unless spec.logCollection.enabled has been set to false explicitly in the respective Dash0 monitoring resource. Controlling log collection for individual workloads via Kubernetes labels is not supported. To disable log collection for specific workloads in namespaces where log collection is enabled, you can add a filter rule to the monitoring resource. Here is an example:

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0Monitoring
metadata:
  name: dash0-monitoring-resource
spec:
  filter:
    logs:
      log_records:
        - 'IsMatch(resource.attributes["k8s.pod.name"], "my-workload-name-*")'

Using a Custom Label Selector to Control Auto-Instrumentation

By providing a custom Kubernetes label selector in spec.instrumentWorkloads.labelSelector in a Dash0 monitoring resource, you can control which workloads in this namespace will be instrumented by the Dash0 operator.

  • Workloads which match this label selector will be instrumented, subject to the value of spec.instrumentWorkloads.mode.
  • Workloads which do not match this label selector will never be instrumented, regardless of the value of spec.instrumentWorkloads.mode.
  • The setting spec.instrumentWorkloads.labelSelector is ignored if spec.instrumentWorkloads.mode=none.

If not set explicitly, this label selector assumes the value "dash0.com/enable!=false" by default. That is, when no explicit label selector is provided via spec.instrumentWorkloads.labelSelector, workloads which:

  • do not have the label dash0.com/enable at all, or
  • have the label dash0.com/enable with a value other than "false"

will be instrumented, as explained in the previous section.

It is recommended to leave this setting unset (i.e. leave the default "dash0.com/enable!=false" in place), unless you have a specific use case that requires a different label selector.

One such use case is implementing an opt-in model for workload instrumentation instead of the usual opt-out model. That is, instead of instrumenting all workloads in a namespace by default and only disabling instrumentation for a few specific workloads, you want to deliberately turn on auto-instrumentation for a few specific workloads and leave all others uninstrumented. Use a label selector with equals (=) instead of not-equals (!=) to achieve this, for example auto-instrument-this-workload-with-dash0="true".
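A minimal sketch of such an opt-in setup is shown here. It assumes that the selector is written in the same string form as the default value (dash0.com/enable!=false), and it reuses the illustrative label key from the paragraph above; the resource and workload names are placeholders.

```yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0Monitoring
metadata:
  name: dash0-monitoring-resource
spec:
  instrumentWorkloads:
    # Opt-in model: only workloads carrying this label are instrumented.
    labelSelector: auto-instrument-this-workload-with-dash0=true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment # placeholder name
  labels:
    app: my-deployment-app
    # Matches the selector above, so this workload is instrumented.
    auto-instrument-this-workload-with-dash0: "true"
# ... the remaining deployment spec is unchanged
```

Workloads in the same namespace that do not carry the label are left uninstrumented.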

Note: Opting out of auto-instrumentation and workload modification via a label/label selector will not prevent log collection for pods in namespaces that have Dash0 log collection enabled, see previous section for details.

Specifying Additional Resource Attributes via Labels and Annotations

Note: The labels and annotations listed in this section can be specified at the pod level, or at the workload level (i.e., the cronjob, deployment, daemonset, job, replicaset, or statefulset). Pod labels and annotations take precedence over workload labels and annotations.

The following standard Kubernetes labels are mapped to resource attributes as follows:

  • The label app.kubernetes.io/name is mapped to service.name.
  • If app.kubernetes.io/name is set, and the label app.kubernetes.io/version is also set, it is mapped to service.version.
  • If app.kubernetes.io/name is set, and the label app.kubernetes.io/part-of is also set, it is mapped to service.namespace.

The operator will not combine pod labels with workload labels for this mapping. The labels app.kubernetes.io/version and app.kubernetes.io/part-of are only read from the pod labels if app.kubernetes.io/name is present on the pod. Similarly, the labels app.kubernetes.io/version and app.kubernetes.io/part-of are only read from the workload labels if app.kubernetes.io/name is present on the workload. Workload labels are not considered at all if app.kubernetes.io/name is present on the pod. This ensures that resource attributes are not partially based on pod and partially on workload labels, giving an inconsistent result.

Note: The OTEL_SERVICE_NAME environment variable and service.* key-value pairs specified in the OTEL_RESOURCE_ATTRIBUTES environment variable have precedence over attributes derived from the app.kubernetes.io/* labels.
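As an illustration of the mapping rules above, the following fragment sets all three labels on a workload. The workload name and label values are made up for this example, and the sketch assumes that the pods do not carry app.kubernetes.io/name themselves and that neither OTEL_SERVICE_NAME nor OTEL_RESOURCE_ATTRIBUTES overrides the result.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout # hypothetical workload
  labels:
    app.kubernetes.io/name: checkout # mapped to service.name=checkout
    app.kubernetes.io/version: "1.4.2" # mapped to service.version=1.4.2
    app.kubernetes.io/part-of: shop # mapped to service.namespace=shop
```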

Any annotation in the form of resource.opentelemetry.io/<key>: <value> is also mapped to the resource attribute <key>=<value>. For example, the following results in the my.attribute=my-value resource attribute:

yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    resource.opentelemetry.io/my.attribute: my-value

As with the app.kubernetes.io/* labels, the resource.opentelemetry.io/* annotations can be set on the pod as well as on the workload. In contrast to the app.kubernetes.io/* labels, mixing workload level and pod annotations is allowed, that is, you can set resource.opentelemetry.io/attribute-one on the workload and resource.opentelemetry.io/attribute-two on the pod, and both will be used. In case the same key is listed both on the workload and on the pod, the pod annotation takes precedence.

Key-value pairs with a specific key set via the OTEL_RESOURCE_ATTRIBUTES environment variable will override the value derived from a resource.opentelemetry.io/<key>: <value> annotation. Resource attributes set via the resource.opentelemetry.io/<key>: <value> annotations will override the resource attributes value set via app.kubernetes.io/* labels: for example, resource.opentelemetry.io/service.name has precedence over app.kubernetes.io/name.
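The precedence rules can be illustrated with a pod that sets service.name in all three places. This is only a sketch; the pod name, image and values are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: checkout # hypothetical pod
  labels:
    # Lowest precedence of the three sources.
    app.kubernetes.io/name: name-from-label
  annotations:
    # Overrides the value derived from the label.
    resource.opentelemetry.io/service.name: name-from-annotation
spec:
  containers:
    - name: checkout
      image: "some-image:latest"
      env:
        # Overrides the value derived from the annotation.
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: "service.name=name-from-env"
```

According to the rules above, telemetry from this pod would carry service.name=name-from-env.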

Sending Data to the OpenTelemetry Collectors Managed by the Dash0 Operator

Besides automatic workload instrumentation (which will make sure that the instrumented workloads send telemetry to the OpenTelemetry collectors managed by the operator), you can also send telemetry data from workloads that are not instrumented by the operator.

To do so, you need to add an OpenTelemetry SDK to your workload.

If the workload is in a namespace that is monitored by Dash0, the OpenTelemetry SDK will automatically be configured to send telemetry to the OpenTelemetry collectors managed by the Dash0 operator. This is because the operator automatically sets OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_PROTOCOL to the correct values when applying the automatic workload instrumentation.

If the workload is in a namespace that is not monitored by Dash0 (or if spec.instrumentWorkloads.mode is set to none in the respective Dash0 monitoring resource, or if the workload has opted out of auto-instrumentation via a label), you need to set the environment variable OTEL_EXPORTER_OTLP_ENDPOINT (and optionally also OTEL_EXPORTER_OTLP_PROTOCOL) yourself.

The DaemonSet OpenTelemetry collector managed by the Dash0 operator listens on host port 40318 for HTTP traffic and 40317 for gRPC traffic (unless the Helm chart has been deployed with operator.collectors.disableHostPorts=true, which disables the host ports for the collector pods). A service for the DaemonSet collector which listens on the standard ports (4318 for HTTP and 4317 for gRPC) is also available.

The preferred way of sending OTLP from your workload to the Dash0-managed collector is to use node-local traffic via the host port. To do so, add the following environment variables to your workload:

yaml
env:
  - name: K8S_NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://$(K8S_NODE_IP):40318"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"

Notes:

  • Listing the definition for K8S_NODE_IP before OTEL_EXPORTER_OTLP_ENDPOINT is crucial, since Kubernetes only expands $(VAR) references to variables that are defined earlier in the list.
  • Adding OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf is optional when the OpenTelemetry SDK in question uses that protocol as the default.
  • For gRPC, use OTEL_EXPORTER_OTLP_ENDPOINT=http://$(K8S_NODE_IP):40317 together with OTEL_EXPORTER_OTLP_PROTOCOL=grpc instead.
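Spelled out, the gRPC variant of the host port configuration would look roughly like this (same structure as the HTTP example, with only the port and protocol changed):

```yaml
env:
  # Must be listed before OTEL_EXPORTER_OTLP_ENDPOINT, which references it.
  - name: K8S_NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://$(K8S_NODE_IP):40317"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "grpc"
```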

To use the service endpoint instead of the host port, you need to know:

  • the Helm release name of the Dash0 operator (for example dash0-operator), and
  • the namespace where the Dash0 operator is installed (for example dash0-system).

With that information at hand, add the following to your workload:

yaml
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://${helm-release-name}-opentelemetry-collector-service.${namespace-of-the-dash0-operator}.svc.cluster.local:4318"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"

or, for gRPC:

yaml
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://${helm-release-name}-opentelemetry-collector-service.${namespace-of-the-dash0-operator}.svc.cluster.local:4317"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "grpc"

In both cases, ${helm-release-name} and ${namespace-of-the-dash0-operator} need to be replaced with the actual values.

If the workload is in a namespace that is monitored by Dash0 and workload instrumentation is enabled, Dash0 will automatically add Kubernetes-related OpenTelemetry resource attributes to your telemetry, even if the runtime in question is not yet supported by Dash0's auto-instrumentation. There is currently one caveat: The resource attribute auto-detection relies on the process or runtime in question using dynamic linking at startup (that is, binding to a flavor of libc), which is true for almost all runtimes. One notable exception is so-called freestanding (a.k.a. libc-free) binaries, for example most binaries built with Go.

Profiling

The Dash0 operator can be configured to accept, process, and export profiling data via OTLP.

Note: The Dash0 Operator does not currently support collecting profiles, see the Collecting Profiles with the OpenTelemetry eBPF Profiler section.

To enable profiling support, set the operator.profilingEnabled Helm value to true:

console
helm install \
  --set operator.profilingEnabled=true \
  ... \
  dash0-operator \
  dash0-operator/dash0-operator

Alternatively, you can set spec.profiling.enabled: true directly on the Dash0OperatorConfiguration custom resource:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration-resource
spec:
  profiling:
    enabled: true
  # ... other settings

When profiling is enabled, you can use the same spec.filter and spec.transform settings on your Dash0Monitoring resources to filter and transform profiling data, just like for traces, metrics, and logs. See the spec.filter and spec.transform sections for details.

Collecting Profiles with the OpenTelemetry eBPF Profiler

Since the standard OpenTelemetry auto-instrumentation agents do not yet emit OTLP profiles, a separate profiling agent is needed to generate profiling data, like the OpenTelemetry eBPF profiler.

The eBPF profiler is distributed as a specialized OpenTelemetry Collector (otelcol-ebpf-profiler) that includes a profiling receiver. It runs as a privileged DaemonSet with host PID access, since it relies on eBPF to collect stack traces from all processes on the node.

Below is an example of deploying the eBPF profiler to send profiles to the Dash0 operator's collector:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ebpf-profiler-config
  namespace: dash0-system # same namespace as the operator
data:
  config.yaml: |
    receivers:
      profiling:
    exporters:
      otlp/collector:
        endpoint: ${helm-release-name}-opentelemetry-collector-service.${namespace-of-the-dash0-operator}.svc.cluster.local:4317
        # The operator's OTLP receiver listens on plain-text gRPC (no TLS), so the exporter must
        # be configured accordingly. Without this, the gRPC exporter defaults to requiring TLS.
        tls:
          insecure: true
    service:
      telemetry:
        logs:
          level: info
      pipelines:
        profiles:
          receivers: [profiling]
          exporters: [otlp/collector]
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ebpf-profiler
  namespace: dash0-system # same namespace as the operator
  labels:
    app: ebpf-profiler
spec:
  selector:
    matchLabels:
      app: ebpf-profiler
  template:
    metadata:
      labels:
        app: ebpf-profiler
    spec:
      hostPID: true
      containers:
        - name: ebpf-profiler
          image: otel/opentelemetry-collector-ebpf-profiler:0.148.0
          args:
            - --config=file:/etc/otelcol/config.yaml
            - --feature-gates=service.profilesSupport
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol
            - name: proc
              mountPath: /proc
              readOnly: true
            - name: sys
              mountPath: /sys
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: ebpf-profiler-config
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys

Replace ${helm-release-name} and ${namespace-of-the-dash0-operator} with the actual Helm release name and namespace.

The eBPF profiler requires:

  • Linux kernel 5.4 or later.
  • The container must run as privileged (or with CAP_SYS_ADMIN, CAP_PERFMON, and CAP_BPF capabilities).
  • hostPID: true to observe processes running on the node.
  • Access to /proc and /sys from the host.

Once both profiling support in the operator and the eBPF profiler DaemonSet are deployed, profiling data will flow from the eBPF profiler through the operator's collector pipelines (including processors like k8s_attributes for Kubernetes metadata enrichment) and on to the configured backend.

Exporting Data to Other Observability Backends

Instead of spec.exports[].dash0 in the Dash0 operator configuration resource, you can also provide spec.exports[].http or spec.exports[].grpc to export telemetry data to arbitrary OTLP-compatible backends, or to another local OpenTelemetry collector.

Here is an example for HTTP:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - http:
        endpoint: ... # provide the OTLP HTTP endpoint of your observability backend here
        headers: # you can optionally provide additional headers, for example for authorization
          - name: X-My-Header
            value: my-value
        encoding: json # optional, can be "json" or "proto", defaults to "proto"

Here is an example for gRPC:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - grpc:
        endpoint: ... # provide the OTLP gRPC endpoint of your observability backend here
        headers: # you can optionally provide additional headers, for example for authorization
          - name: X-My-Header
            value: my-value

Export to multiple backends is also supported. The supplied backends can be either of the same type or of different types. In the following example the telemetry would be sent to two different datasets in Dash0 and in addition to a gRPC endpoint:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - dash0:
        dataset: dataset-one
        endpoint: ingress... # TODO needs to be replaced with the actual value, see above
        authorization:
          token: auth_... # TODO needs to be replaced with the actual value, see above
    - dash0:
        dataset: dataset-two
        endpoint: ingress... # TODO needs to be replaced with the actual value, see above
        authorization:
          token: auth_... # TODO needs to be replaced with the actual value, see above
    - grpc:
        endpoint: ... # provide the OTLP gRPC endpoint of your observability backend here

Note regarding TLS when using arbitrary OTLP-compatible backends

gRPC
  • By default, a secure (TLS) connection is assumed. The connection is only treated as insecure if insecure: true is set explicitly, or if the insecure field is omitted and the endpoint URL starts with http://.
  • When using TLS, you can set insecureSkipVerify: true to disable the verification of the server's certificate chain, which can be useful when using self-signed certificates.

Here's an example using insecureSkipVerify:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - grpc:
        endpoint: ... # provide the secure OTLP gRPC endpoint of your observability backend here
        insecureSkipVerify: true # disables the verification of the server's certificate chain

Please note that it is a validation error to set both insecure and insecureSkipVerify explicitly to true at the same time, since insecureSkipVerify is only applicable when using TLS.
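For the opposite case, a plain-text gRPC connection (for example to another collector inside the cluster), a sketch could look like this. The endpoint is a hypothetical in-cluster address used only for illustration.

```yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - grpc:
        # Hypothetical address, replace with your own OTLP gRPC endpoint.
        endpoint: my-collector.my-namespace.svc.cluster.local:4317
        # Plain-text connection without TLS. Must not be combined with
        # insecureSkipVerify: true.
        insecure: true
```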

HTTP
  • For HTTP, the connection security is automatically detected based on whether the endpoint URL starts with http:// or https://
  • When using TLS, you can set insecureSkipVerify: true to disable the verification of the server's certificate chain, which can be useful when using self-signed certificates.
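A corresponding sketch for an HTTP export to a backend with a self-signed certificate is shown here. The endpoint is a hypothetical address; TLS is selected by the https:// scheme.

```yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration
spec:
  exports:
    - http:
        # Hypothetical address, replace with your own OTLP HTTP endpoint.
        endpoint: https://otlp.example.internal:4318
        # Disables the verification of the server's certificate chain.
        insecureSkipVerify: true
```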

Disable Self-Monitoring

By default, self-monitoring is enabled for the Dash0 operator as soon as you deploy a Dash0 operator configuration resource with exports. That is, the operator will send self-monitoring telemetry to the configured Dash0 backend. Self-monitoring can be disabled via a setting on the Dash0 operator configuration resource. Dash0 does not recommend disabling the operator's self-monitoring.

Here is an example with self-monitoring disabled:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0OperatorConfiguration
metadata:
  name: dash0-operator-configuration-resource
spec:
  selfMonitoring:
    enabled: false
  exports:
    - # ... see above for details on the exports settings

Disable Dash0 Monitoring For a Namespace

If you want to stop monitoring a namespace with Dash0, remove the Dash0 monitoring resource from that namespace. For example, if you want to stop monitoring workloads in the namespace my-nodejs-applications, use the following command:

console
kubectl delete --namespace my-nodejs-applications Dash0Monitoring dash0-monitoring-resource

or, alternatively, by using the dash0-monitoring.yaml file created earlier:

console
kubectl delete --namespace my-nodejs-applications -f dash0-monitoring.yaml

Upgrading

To upgrade the Dash0 operator to a newer version, run the following commands:

console
helm repo update dash0-operator
helm upgrade --wait --namespace dash0-system dash0-operator dash0-operator/dash0-operator

CRD Version Upgrades

Occasionally, the custom resource definitions (CRDs) used by the Dash0 operator (Dash0OperatorConfiguration, Dash0Monitoring) will be updated to new versions. Whenever possible, this will happen in a way that requires no manual intervention by users. This section contains details about CRD version updates and version migrations.

Operator Version 0.71.0: operator.dash0.com/v1alpha1/Dash0Monitoring -> operator.dash0.com/v1beta1/Dash0Monitoring

With operator version 0.71.0, the Dash0 operator's Dash0Monitoring custom resource definition (CRD) is upgraded from version v1alpha1 to v1beta1. The operator handles both versions correctly, that is, versions v1alpha1 and v1beta1 are both fully supported. Here is what you need to know about this version update for Dash0Monitoring:

  • If you have existing Dash0Monitoring resources in version v1alpha1 in your cluster, they will be automatically converted on the fly, for example when the Dash0 operator reads the v1alpha1 resource version. At some point Kubernetes might also convert the resource permanently and store it in version v1beta1.
  • After the upgrade to version 0.71.0, you can still deploy new Dash0Monitoring resources in version v1alpha1 (for example via kubectl apply). They will be automatically converted and stored as v1beta1 resources by Kubernetes.
  • If you want to migrate a Dash0Monitoring template (e.g. a yaml file) from version v1alpha1 to v1beta1, follow these steps:
    • If the template specifies the workload instrumentation mode via spec.instrumentWorkloads, replace that with spec.instrumentWorkloads.mode. That is:
      yaml
      spec:
        instrumentWorkloads: created-and-updated
      becomes
      yaml
      spec:
        instrumentWorkloads:
          mode: created-and-updated
      If the template does not specify the workload instrumentation mode explicitly (that is, it relies on using the default instrumentation mode), no change is necessary here.
    • If the template contains the attribute spec.prometheusScrapingEnabled, replace that with spec.prometheusScraping.enabled. That is:
      yaml
      spec:
        prometheusScrapingEnabled: true
      becomes
      yaml
      spec:
        prometheusScraping:
          enabled: true
      The attribute spec.prometheusScraping.enabled is also already valid for v1alpha1, so this particular change can be applied independently of the CRD version change. If the template does not specify whether prometheusScraping is enabled or not (that is, it relies on using the default value), no change is necessary here.
  • We recommend updating your templates from v1alpha1 to v1beta1 at some point. However, there are currently no plans to remove support for version v1alpha1.
  • If you want to use the new trace context propagators option that has been added in version 0.71.0, you need to use version v1beta1 of the Dash0Monitoring resource. This includes updating your YAML templates to that version, as described above.
  • After upgrading to operator version 0.71.0 or later, you can no longer easily downgrade to a version before 0.71.0. In particular, this downgrade would require manually deleting all Dash0Monitoring resources in the cluster. The reason is that the Dash0Monitoring resources are now stored as version v1beta1 by Kubernetes and there is no automatic downward conversion from v1beta1 back to v1alpha1.
  • The only supported version for Dash0OperatorConfiguration is still v1alpha1, that is, trying to use operator.dash0.com/v1beta1/Dash0OperatorConfiguration will not work.

Uninstallation

To remove the Dash0 operator from your cluster, run the following command:

console
helm uninstall dash0-operator --namespace dash0-system

Depending on the command you used to install the operator, you may need to use a different Helm release name or namespace.

This will also automatically disable Dash0 monitoring for all namespaces by deleting the Dash0 monitoring resources in all namespaces. All workload modifications applied by the Dash0 operator will be reverted. This will restart the pods of all workloads that were previously instrumented by the Dash0 operator.

Optionally, after helm uninstall has finished, remove the namespace that has been created for the operator:

console
kubectl delete namespace dash0-system

If you choose to not remove the namespace, you might want to consider removing the secret with the Dash0 authorization token (if such a secret has been created):

console
kubectl delete secret --namespace dash0-system dash0-authorization-secret

If you later decide to install the operator again, you will need to perform the initial configuration steps again:

  1. Set up a Dash0 backend connection, and
  2. Enable Dash0 monitoring in each namespace you want to monitor, see Enable Dash0 Monitoring For a Namespace.

Unsupported Uninstallation Procedures

  • Do not delete the Dash0 operator controller deployment manually; always use helm uninstall to remove the operator.
  • Do not delete the Dash0 operator's namespace before running helm uninstall (this would also implicitly delete the operator deployment).

Deleting the Dash0 operator deployment without running helm uninstall will lead to an inconsistent state. In particular, the operator's admission webhooks are still registered, but the service that responds to the webhook requests has been removed, so all webhook requests will time out. This will make requests to delete Dash0 monitoring resources fail. In addition, the service that is responsible for removing the finalizer from the Dash0 monitoring resources is no longer there. This in turn makes it harder to delete namespaces with a Dash0 monitoring resource: the namespace will get stuck in the "Terminating" state because the finalizer in the monitoring resource is no longer handled.

To rectify this, follow these steps:

  1. Delete all Dash0 validating/mutating webhook configs manually (exact command depends on the Helm release name):
    console
    kubectl delete validatingwebhookconfiguration dash0-operator-monitoring-validator dash0-operator-operator-configuration-validator
    kubectl delete mutatingwebhookconfiguration dash0-operator-injector dash0-operator-monitoring-mutating dash0-operator-operator-configuration-mutating
  2. Remove the finalizer from all Dash0 monitoring resources:
    console
    kubectl patch dash0monitorings <name> -n <namespace> --type=json -p='[{"op":"remove","path":"/metadata/finalizers"}]'
  3. Delete the Dash0 monitoring resources:
    console
    kubectl delete dash0monitorings <name> -n <namespace>
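
Steps 2 and 3 have to be repeated for every Dash0 monitoring resource. The following is a minimal sketch of how the per-resource commands could be generated from a list of resources. The function name cleanup_commands and the resource name and namespace in the usage example are hypothetical placeholders; only the kubectl invocations themselves are taken from the steps above.

```python
import json
import shlex

# JSON patch from step 2: removes the finalizers list from the resource metadata.
REMOVE_FINALIZERS_PATCH = [{"op": "remove", "path": "/metadata/finalizers"}]


def cleanup_commands(resources):
    """Return the kubectl commands for steps 2 and 3, one pair per resource.

    `resources` is an iterable of (name, namespace) tuples. The helper only
    builds command strings; it does not talk to a cluster.
    """
    patch = json.dumps(REMOVE_FINALIZERS_PATCH, separators=(",", ":"))
    commands = []
    for name, namespace in resources:
        target = f"{shlex.quote(name)} -n {shlex.quote(namespace)}"
        commands.append(
            f"kubectl patch dash0monitorings {target} --type=json -p={shlex.quote(patch)}"
        )
        commands.append(f"kubectl delete dash0monitorings {target}")
    return commands


if __name__ == "__main__":
    # Hypothetical resource name and namespace, for illustration only.
    for command in cleanup_commands([("dash0-monitoring-resource", "my-namespace")]):
        print(command)
```

Review the printed commands before running them, since removing finalizers bypasses the operator's regular cleanup.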

Automatic Workload Instrumentation

In namespaces that are enabled for Dash0 monitoring, all supported workload types are automatically instrumented by the Dash0 operator, to achieve two goals:

  1. Enable tracing for supported runtimes out of the box, and
  2. Improve auto-detection of OpenTelemetry resource attributes.

This allows Dash0 users to avoid the hassle of manually adding the OpenTelemetry SDK to their applications or setting Kubernetes-related resource attributes by hand. Dash0 simply takes care of it automatically!

Automatic tracing only works for supported runtimes. For other runtimes, you can add an OpenTelemetry SDK to your workloads by other means.

Auto-detecting OpenTelemetry resource attributes works for all runtimes, that is, for runtimes that are supported by Dash0's auto-instrumentation as well as for workloads to which an OpenTelemetry SDK has been added otherwise. There is currently one caveat: The resource attribute auto-detection relies on the process or runtime in question using dynamic linking at startup (that is, binding to a flavor of libc), which is true for almost all runtimes. One notable exception is so-called freestanding (a.k.a. libc-free) binaries, for example most binaries built with Go.

The Dash0 operator will instrument the following workload types:

  • CronJobs
  • DaemonSets
  • Deployments
  • Jobs
  • Pods
  • ReplicaSets
  • StatefulSets

Note that Kubernetes jobs and Kubernetes pods are only instrumented at deploy time; existing jobs and pods cannot be instrumented since there is no way to restart them. For all other workload types, the operator can instrument existing workloads as well as new workloads at deploy time (depending on the setting of spec.instrumentWorkloads.mode in the Dash0 monitoring resource).

The instrumentation process is performed by modifying the Pod spec template (for CronJobs, DaemonSets, Deployments, Jobs, ReplicaSets, and StatefulSets) or the Pod spec itself (for standalone Pods).

The modifications that are performed for workloads are the following, depending on the chosen instrumentation delivery mechanism.

With instrumentation delivery image-volume:

  • Add an image volume named dash0-instrumentation to the pod spec, which contains the instrumentation files

With instrumentation delivery init-container:

  • Add an emptyDir volume named dash0-instrumentation to the pod spec
  • Add an init container named dash0-instrumentation that will copy the OpenTelemetry SDKs and distributions for supported runtimes to the dash0-instrumentation volume mount, so they are available in the target container's file system
  • Add the cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes=dash0-instrumentation annotation to the pod spec

Regardless of the instrumentation delivery:

  • Add a volume mount dash0-instrumentation to all containers of the pod
  • Add environment variables (OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_PROTOCOL, LD_PRELOAD, and several Dash0-specific variables prefixed with DASH0_) to all containers of the pod
  • Add the OpenTelemetry injector (see below for details) as a startup hook (via the LD_PRELOAD environment variable) to all containers of the pod
  • Add the following labels to the workload metadata:
    • dash0.com/instrumented: true or false depending on whether the workload has been successfully instrumented or not
    • dash0.com/operator-image: the fully qualified name of the Dash0 operator image that has instrumented this workload
    • dash0.com/instrumentation-image: the fully qualified name of the image that has been used to deliver instrumentation files to the workload
    • dash0.com/instrumented-by: either controller or webhook, depending on which component has instrumented this workload. The controller is responsible for instrumenting existing workloads while the webhook is responsible for instrumenting new workloads at deploy time.
  • Add the following annotations to the workload metadata:
    • dash0.com/instrumented-by: either controller or webhook, depending on which component has instrumented this workload. The controller is responsible for instrumenting existing workloads while the webhook is responsible for instrumenting new workloads at deploy time.

Notes:

  • Automatic tracing will only happen for supported runtimes. Nonetheless, the modifications outlined above are performed for every workload. One reason for that is that there is no way to tell which runtime a workload uses from the outside, e.g. on the Kubernetes level. The more important reason is that runtimes that are not (yet) supported for auto-instrumentation still benefit from the improved OpenTelemetry resource attribute detection.
  • The operator will add neither OTEL_EXPORTER_OTLP_ENDPOINT nor OTEL_EXPORTER_OTLP_PROTOCOL to containers that already have at least one of those environment variables set. A Kubernetes event of type Warning is created for workloads with affected containers.
  • The operator sets OTEL_EXPORTER_OTLP_ENDPOINT=http://$(NODE_IP):40318, that is, it tells the workload to send OTLP traffic to the HTTP port of the OpenTelemetry collector pod on the same host, which belongs to the OpenTelemetry collector DaemonSet managed by the operator. It also sets the protocol accordingly via OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf. The protocol http/protobuf is the recommended default according to the OpenTelemetry specification, and it is widely supported. The operator assumes that workloads which have an OpenTelemetry SDK use an SDK that respects OTEL_EXPORTER_OTLP_PROTOCOL and supports the http/protobuf protocol. If a workload's SDK either does not respect OTEL_EXPORTER_OTLP_PROTOCOL (and defaults to grpc) or does not support http/protobuf, the SDK will try to establish a gRPC connection to the collector's HTTP endpoint and will not be able to emit telemetry. SDKs without support for http/protobuf are rather rare, but one prominent example is the Kubernetes ingress-nginx. For these workloads, the recommended approach is to either set OTEL_EXPORTER_OTLP_ENDPOINT manually to the gRPC port or to disable workload instrumentation by the Dash0 operator. To set OTEL_EXPORTER_OTLP_ENDPOINT manually, you can add the following entries to the container's env section:
    - name: MY_NODE_IP
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://$(MY_NODE_IP):40317
    To disable workload instrumentation for a workload, you can opt out of auto-instrumentation via a workload label (i.e. dash0.com/enable: "false", see Disabling Auto-Instrumentation for Specific Workloads), or by not installing a Dash0 monitoring resource in the namespace where these workloads are located. The workloads can then be monitored by following the setup described in Sending Data to the OpenTelemetry Collectors Managed by the Dash0 Operator to have the workload send telemetry to the collectors managed by the Dash0 operator, using gRPC. Note that this is not relevant for workloads that do not have an OpenTelemetry SDK at all, since they will ignore OTEL_EXPORTER_OTLP_ENDPOINT.
  • In case the Dash0 operator Helm chart has been deployed with operator.collectors.forceUseServiceUrl=true or operator.collectors.disableHostPorts=true, OTEL_EXPORTER_OTLP_ENDPOINT is not set to http://$(NODE_IP):40318, but to the HTTP port of the DaemonSet collector's service URL http://${helm-release-name}-opentelemetry-collector-service.${namespace-of-the-dash0-operator}.svc.cluster.local:4318 instead.
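
The rules for the two OTLP environment variables in the notes above can be summarized in a small model. This is only an illustrative sketch, not the operator's code: the function names otlp_endpoint and otlp_env_to_add are hypothetical, and the default release name and namespace are placeholders for ${helm-release-name} and ${namespace-of-the-dash0-operator}.

```python
def otlp_endpoint(force_service_url=False, disable_host_ports=False,
                  release="dash0-operator", namespace="dash0-system"):
    """Return the endpoint value the notes describe for the given Helm settings."""
    if force_service_url or disable_host_ports:
        # HTTP port of the DaemonSet collector's service URL.
        return (f"http://{release}-opentelemetry-collector-service."
                f"{namespace}.svc.cluster.local:4318")
    # $(NODE_IP) is deliberately left unexpanded; Kubernetes resolves it.
    return "http://$(NODE_IP):40318"


def otlp_env_to_add(container_env, **endpoint_options):
    """Return the OTLP variables to add, or {} if the container is left alone."""
    managed = ("OTEL_EXPORTER_OTLP_ENDPOINT", "OTEL_EXPORTER_OTLP_PROTOCOL")
    if any(name in container_env for name in managed):
        # At least one of the two is already set, so neither is added.
        # The Warning event mentioned in the notes is not modeled here.
        return {}
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": otlp_endpoint(**endpoint_options),
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    }
```

For example, a container that already sets OTEL_EXPORTER_OTLP_PROTOCOL=grpc yields an empty result, which mirrors the case where the workload keeps its own exporter configuration.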

The remainder of this section provides a more detailed step-by-step description of how the Dash0 operator's workload instrumentation for tracing works internally, intended for the technically curious reader. You can safely skip this section if you are not interested in the technical details.

  1. The Dash0 operator adds the dash0-instrumentation init container with the Dash0 instrumentation image to the pod spec template of workloads. The instrumentation image contains OpenTelemetry SDKs and distributions for all supported runtimes and the OpenTelemetry injector binary.
  2. When the init container starts, it copies the OpenTelemetry distributions and the OpenTelemetry injector binary to a dedicated shared volume mount that has been added by the operator, so they are available in the target container's file system. When it has copied all files, the init container exits.
  3. The operator also adds environment variables to the target container to ensure that the OpenTelemetry SDK has the correct configuration and will get activated at startup. The activation of the OpenTelemetry SDK happens via an LD_PRELOAD hook. For that purpose, the Dash0 operator adds the LD_PRELOAD environment variable to the pod spec template of the workload. LD_PRELOAD is an environment variable that is evaluated by the dynamic linker/loader when a Linux executable starts. In general, it specifies a list of additional shared objects to be loaded before the actual code of the executable. In this specific case, the OpenTelemetry injector binary is added to the LD_PRELOAD list.
  4. At process startup, the OpenTelemetry injector adds additional environment variables to the running process by hooking into the application startup, finding the dlsym and setenv symbols, and then calling setenv to add or modify environment variables (like OTEL_RESOURCE_ATTRIBUTES, NODE_OPTIONS, JAVA_TOOL_OPTIONS and others). The reason for doing that at process startup and not when modifying the pod spec (where environment variables can also be added and modified) is that the original environment variables are not necessarily fully known at that time. Workloads will sometimes set environment variables in their Dockerfile or in an entrypoint script; those environment variables are only available at process runtime. For example, the OpenTelemetry injector sets (or appends to) NODE_OPTIONS to activate the Dash0 OpenTelemetry distribution for Node.js to collect tracing data from all Node.js workloads. For JVMs, the same is achieved by setting (or appending to) the JAVA_TOOL_OPTIONS environment variable, namely by adding a -javaagent option. For .NET or other CLR-based workloads, the CORECLR_PROFILER mechanism is used to add the OpenTelemetry .NET instrumentation. For Python auto-instrumentation, the OpenTelemetry SDK is prepended to PYTHONPATH. (Python auto-instrumentation needs to be enabled explicitly via Helm.)
  5. The OpenTelemetry injector also automatically improves Kubernetes-related resource attributes as follows: The operator sets the environment variables OTEL_INJECTOR_K8S_NAMESPACE_NAME, OTEL_INJECTOR_K8S_POD_NAME, OTEL_INJECTOR_K8S_POD_UID and OTEL_INJECTOR_K8S_CONTAINER_NAME on workloads. The OpenTelemetry injector binary picks these values up and uses them to populate the resource attributes k8s.namespace.name, k8s.pod.name, k8s.pod.uid and k8s.container.name via the OTEL_RESOURCE_ATTRIBUTES environment variable. If OTEL_RESOURCE_ATTRIBUTES is already set on the process, the key-value pairs for these attributes are appended to the existing value of OTEL_RESOURCE_ATTRIBUTES. If OTEL_RESOURCE_ATTRIBUTES was not set on the process, the OpenTelemetry injector will add OTEL_RESOURCE_ATTRIBUTES as a new environment variable.
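
The append-or-create behavior from step 5 can be modeled in a few lines. The real injector is a native binary loaded via LD_PRELOAD, so the Python below is only a sketch of the described logic. The function name merged_resource_attributes is hypothetical, and skipping variables that are unset as well as the order of the appended pairs are assumptions.

```python
# Mapping from the operator-provided variables to resource attribute keys,
# as listed in step 5.
INJECTOR_VARS = {
    "OTEL_INJECTOR_K8S_NAMESPACE_NAME": "k8s.namespace.name",
    "OTEL_INJECTOR_K8S_POD_NAME": "k8s.pod.name",
    "OTEL_INJECTOR_K8S_POD_UID": "k8s.pod.uid",
    "OTEL_INJECTOR_K8S_CONTAINER_NAME": "k8s.container.name",
}


def merged_resource_attributes(env):
    """Return the new OTEL_RESOURCE_ATTRIBUTES value for a process environment."""
    # Assumption: variables that are missing or empty contribute nothing.
    pairs = [f"{key}={env[var]}" for var, key in INJECTOR_VARS.items() if env.get(var)]
    existing = env.get("OTEL_RESOURCE_ATTRIBUTES", "")
    # Append to an existing value; otherwise the variable is created from scratch.
    return ",".join(([existing] if existing else []) + pairs)
```

A process that already sets OTEL_RESOURCE_ATTRIBUTES=service.name=cart keeps that pair at the front and gains the Kubernetes attributes after it.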

Scraping Prometheus Endpoints

The Dash0 operator automatically scrapes Prometheus endpoints on pods annotated with the prometheus.io/* annotations as defined by the Prometheus Helm chart.

The supported annotations are:

  • prometheus.io/scrape: Only scrape pods that have a value of true, except if prometheus.io/scrape-slow is set to true as well. Endpoints on pods annotated with this annotation are scraped every minute, i.e., scrape interval is 1 minute, unless prometheus.io/scrape-slow is also set to true.
  • prometheus.io/scrape-slow: If set to true, enables scraping for the pod with scrape interval of 5 minutes. If both prometheus.io/scrape and prometheus.io/scrape-slow are annotated on a pod with both values set to true, the pod will be scraped every 5 minutes.
  • prometheus.io/scheme: If the metrics endpoint is secured then you will need to set this to https.
  • prometheus.io/path: Override the metrics endpoint path if it is not the default /metrics.
  • prometheus.io/port: Override the metrics endpoint port if it is not the default 9102.

To be scraped, a pod annotated with the prometheus.io/scrape or prometheus.io/scrape-slow annotation must belong to a namespace that is configured to be monitored by the Dash0 operator (see Enable Dash0 Monitoring For a Namespace).
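
The way the annotations combine can be illustrated with a short sketch. It models only the annotation semantics listed above, not the namespace check, and it is not the operator's implementation. The function name scrape_settings is hypothetical, and defaulting the scheme to http is an assumption, since the text only names https for secured endpoints.

```python
def scrape_settings(annotations):
    """Return scrape settings for a pod's annotations, or None if it is not scraped."""
    scrape = annotations.get("prometheus.io/scrape") == "true"
    scrape_slow = annotations.get("prometheus.io/scrape-slow") == "true"
    if not (scrape or scrape_slow):
        return None
    return {
        # scrape-slow wins when both annotations are set to true.
        "interval_seconds": 300 if scrape_slow else 60,
        # Assumption: plain http unless the annotation says otherwise.
        "scheme": annotations.get("prometheus.io/scheme", "http"),
        "path": annotations.get("prometheus.io/path", "/metrics"),
        "port": int(annotations.get("prometheus.io/port", "9102")),
    }
```

A pod with only prometheus.io/scrape: "true" is therefore scraped every 60 seconds at /metrics on port 9102.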

The scraping of a pod is executed from the same Kubernetes node the pod resides on.

This feature can be disabled for a namespace by explicitly setting prometheusScraping.enabled: false in the Dash0 monitoring resource.

Note: To also have Kube state metrics (which are used extensively in Awesome Prometheus alerts) scraped and delivered to Dash0, you can annotate the kube-state-metrics pod with prometheus.io/scrape: "true" and add a Dash0 monitoring resource to the namespace it is running in.

Managing Dash0 Configurations

The Dash0 operator offers capabilities to manage certain Dash0 configurations as infrastructure as code, by deploying them as Kubernetes resources to a cluster and letting the operator synchronize them to the Dash0 API.

Managing Dash0 Dashboards

You can manage your Dash0 dashboards via the Dash0 operator.

Prerequisites for this feature:

  • A Dash0 operator configuration resource has to be installed in the cluster.
  • The operator configuration resource must have the apiEndpoint property.
  • The operator configuration resource must have at least one Dash0 export configured with authorization (either token or secret-ref).
  • The operator will only pick up Perses dashboard resources in namespaces that have a Dash0 monitoring resource deployed.
  • The operator will not synchronize Perses dashboard resources in namespaces where the Dash0 monitoring resource has the setting synchronizePersesDashboards set to false. (This setting is optional and defaults to true when omitted.)
  • Optional: In addition to the global/default API endpoint and authorization described above, it is possible to define namespace-specific overrides by providing one or more Dash0 export(s) with an API endpoint and token in the Dash0 monitoring resource.

The custom resource definition for Perses dashboards needs to be installed in the cluster. There are two ways to achieve this:

  1. Install the Perses dashboard custom resource definition with the following command:
    console
    kubectl apply --server-side -f https://raw.githubusercontent.com/perses/perses-operator/refs/tags/v0.4.0/config/crd/bases/perses.dev_persesdashboards.yaml
  2. Alternatively, install the full Perses operator: Go to https://github.com/perses/perses-operator and follow the installation instructions there.

With the prerequisites in place, you can manage Dash0 dashboards via the operator. The Dash0 operator will watch for Perses dashboard resources in all namespaces that have a Dash0 monitoring resource deployed, and synchronize the Perses dashboard resources with the Dash0 backend:

  • When a new Perses dashboard resource is created, the operator will create a corresponding dashboard via Dash0's API.
  • When a Perses dashboard resource is changed, the operator will update the corresponding dashboard via Dash0's API.
  • When a Perses dashboard resource is deleted, the operator will delete the corresponding dashboard via Dash0's API.

The dashboards created by the operator will be in read-only mode in Dash0.

If the Dash0 operator configuration resource has the dataset property set, the operator will create the dashboards in that dataset; otherwise they will be created in the default dataset.

You can opt out of synchronization for individual Perses dashboard resources by adding the Kubernetes label dash0.com/enable: false to the Perses dashboard resource. If this label is added to a dashboard which has previously been synchronized to Dash0, the operator will delete the corresponding dashboard in Dash0. Note that the spec.instrumentWorkloads.labelSelector in the monitoring resource does not affect the synchronization of Perses dashboards. The label to opt out of synchronization is always dash0.com/enable: false, even if a non-default label selector has been set in spec.instrumentWorkloads.labelSelector.
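
As a rough mental model, the create, update, delete and opt-out rules described above can be written as a single decision function. This is a simplified sketch under stated assumptions, not the operator's reconcile logic: the function name sync_action, the event names and the synced_before flag are all hypothetical.

```python
OPT_OUT_LABEL = "dash0.com/enable"


def sync_action(event, labels, synced_before):
    """Map a resource event to an action against the Dash0 API, or None.

    `event` is one of "created", "changed" or "deleted"; `labels` are the
    Kubernetes labels of the resource; `synced_before` says whether the
    resource already has a counterpart in Dash0.
    """
    opted_out = labels.get(OPT_OUT_LABEL) == "false"
    if event == "deleted" or opted_out:
        # Deleting or opting out removes the counterpart in Dash0, but only
        # if there is something to remove.
        return "delete" if synced_before else None
    return "create" if event == "created" else "update"
```

Adding the opt-out label to an already synchronized dashboard thus results in a delete, while a labeled dashboard that was never synchronized is simply ignored.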

When a Perses dashboard resource has been synchronized to Dash0, the operator will write a summary of that synchronization operation to the status of the Dash0 monitoring resource in the same namespace. This summary will also show whether the dashboard had any validation issues or an error occurred during synchronization:

yaml
Kind: Dash0Monitoring
...
Status:
  Perses Dashboard Synchronization Results:
    my-namespace/perses-dashboard-test:
      Synchronization Status: successful
      Synchronized At: 2024-10-25T12:02:12Z

Note: If you only want to manage dashboards, check rules, synthetic checks, views, notification channels, spam filters and signal-to-metrics rules via the Dash0 operator, and you do not want it to collect telemetry, you can set telemetryCollection.enabled to false in the Dash0 operator configuration resource. This will disable the telemetry collection by the operator, and it will also instruct the operator to not deploy the OpenTelemetry collector in your cluster.

Conversion Webhook for the PersesDashboard CRD

Starting with version v0.3.0, the persesdashboards.perses.dev CRD manifest provides two CRD versions, v1alpha1 and v1alpha2, with v1alpha2 as the storage version. Converting between them requires a conversion webhook. If you install only the standalone CRD without the Perses operator (the first option above), no conversion webhook is present in the cluster. The Kubernetes API server would silently prune fields when a v1alpha1 resource is deployed, resulting in an empty dashboard.

To prevent this, the Dash0 operator can host the conversion webhook itself. When the Dash0 operator discovers the PersesDashboard CRD, it will check if a conversion webhook is already configured. If it is, the Dash0 operator will assume that the full Perses operator including its conversion webhook has been deployed, and that no additional conversion webhook is required. If there is no conversion webhook configured, the Dash0 operator patches the CRD's spec.conversion stanza to point at its own webhook endpoint.

This can be disabled by setting the Helm value operator.iac.persesDashboard.autoPatchConversionWebhook to false (the default is true).

If you manage the PersesDashboard CRD via GitOps (Argo CD, Flux, etc.), configure it to ignore spec.conversion in the persesdashboards.perses.dev CRD. Alternatively, set operator.iac.persesDashboard.autoPatchConversionWebhook to false and make sure to only deploy resources in version v1alpha2, or add the conversion webhook to the CRD in your GitOps sources:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: persesdashboards.perses.dev
  ...
spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          name: dash0-operator-webhook-service-name
          namespace: operator-namespace
          path: /convert-persesdashboard
          port: 443
      conversionReviewVersions:
      - v1
  ...

If you leave operator.iac.persesDashboard.autoPatchConversionWebhook enabled and your GitOps tooling removes spec.conversion on every sync, the operator will re-apply the patch on each CRD update event and log a warning, so the back-and-forth patching between the Dash0 operator and the GitOps system is observable.
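
The decision whether the CRD gets patched can be sketched as follows. How the operator actually detects an existing conversion webhook is not spelled out here, so treating "strategy is Webhook and a webhook block is present" as "already configured" is an assumption, and the function name needs_conversion_patch is hypothetical.

```python
def needs_conversion_patch(crd, auto_patch=True):
    """Decide whether the CRD's spec.conversion stanza would be patched.

    `crd` is the CustomResourceDefinition manifest as a dict; `auto_patch`
    mirrors operator.iac.persesDashboard.autoPatchConversionWebhook.
    """
    if not auto_patch:
        return False
    conversion = crd.get("spec", {}).get("conversion") or {}
    # Assumption: a webhook counts as configured when both parts are present.
    already_configured = (
        conversion.get("strategy") == "Webhook" and bool(conversion.get("webhook"))
    )
    return not already_configured
```

With a GitOps tool that strips spec.conversion on every sync, this function would return True after each sync, which corresponds to the repeated patching described above.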

Managing Dash0 Check Rules

You can manage your Dash0 check rules via the Dash0 operator.

Prerequisites for this feature:

  • A Dash0 operator configuration resource has to be installed in the cluster.
  • The operator configuration resource must have the apiEndpoint property.
  • The operator configuration resource must have at least one Dash0 export configured with authorization (either token or secret-ref).
  • The operator will only pick up Prometheus rule resources in namespaces that have a Dash0 monitoring resource deployed.
  • The operator will not synchronize Prometheus rule resources in namespaces where the Dash0 monitoring resource has the setting synchronizePrometheusRules set to false. (This setting is optional and defaults to true when omitted.)
  • Optional: In addition to the global/default API endpoint and authorization described above, it is possible to define namespace-specific overrides by providing one or more Dash0 export(s) with an API endpoint and token in the Dash0 monitoring resource.

The custom resource definition for Prometheus rules also needs to be installed in the cluster. There are two ways to achieve this:

  1. Install the Prometheus rules custom resource definition with the following command:
    console
    kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.91.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml
  2. Alternatively, install the full kube-prometheus stack Helm chart: Go to https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack and follow the installation instructions there.

With the prerequisites in place, you can manage Dash0 check rules via the operator. The Dash0 operator will watch for Prometheus rule resources in all namespaces that have a Dash0 monitoring resource deployed, and synchronize the Prometheus rule resources with the Dash0 backend:

  • When a new Prometheus rule resource is created, the operator will create corresponding check rules via Dash0's API.
  • When a Prometheus rule resource is changed, the operator will update the corresponding check rules via Dash0's API.
  • When a Prometheus rule resource is deleted, the operator will delete the corresponding check rules via Dash0's API.

Note that a Prometheus rule resource can contain multiple groups, and each of those groups can have multiple rules. The Dash0 operator will create individual check rules for each rule in each group.

If the Dash0 operator configuration resource has the dataset property set, the operator will create the rules in that dataset; otherwise they will be created in the default dataset.

Prometheus Rule to Dash0 Check Rule Mapping

Prometheus rules will be mapped to Dash0 check rules as follows:

  • Each rules list item with an alert attribute in all groups will be converted to an individual check rule in Dash0.
  • Rules that have the record attribute set instead of the alert attribute will be ignored.
  • The name of the Dash0 check rule will be "${group_name} - ${alert}", where ${group_name} is the name attribute of the group in the Prometheus rule resource, and ${alert} is the value of the alert attribute.
  • The interval attribute of the group will be used for the setting "Evaluate every" for each Dash0 check rule in that group.
  • Other attributes in the Prometheus rule are converted to Dash0 check rule attributes as described in the table below.
  • Top-level Kubernetes annotations from the PrometheusRule metadata are added to each check rule's annotation map. If the same annotation appears in both the PrometheusRule metadata and an individual rule, the annotation value in the individual check rule takes priority and overrides the metadata annotation. This allows defining common annotations at the PrometheusRule level and selectively overriding them for specific rules as needed.
  • Some rule annotations and labels are interpreted by Dash0; these are described in the conversion table below. For example, to set the summary of the Dash0 check rule, add an annotation summary to the rules item in the Prometheus rule resource.
  • If expr contains the token $__threshold, and neither annotation dash0-threshold-degraded nor dash0-threshold-critical is present, the rule will be considered invalid and will not be synchronized to Dash0.
  • If the rule has the annotation dash0-enabled=false, the check rule will be synchronized but disabled in Dash0. This Prometheus annotation is not to be confused with the Kubernetes label dash0.com/enable: false, which disables synchronization of the entire Prometheus rules resource (and all its check rules) to Dash0 (see below).
  • The group attribute limit is not supported by Dash0 and will be ignored.
  • The group attribute partial_response_strategy is not supported by Dash0 and will be ignored.
  • All labels (except for the ones explicitly mentioned in the conversion table below) will be listed under "Additional labels".
  • All annotations (except for the ones explicitly mentioned in the conversion table below) will be listed under "Annotations".
| Prometheus alerting rule attribute | Dash0 Check rule field | Notes |
| --- | --- | --- |
| alert | Name of check rule, prefixed by the group name | must be a non-empty string |
| expr | "Expression" | must be a non-empty string |
| interval (from group) | "Evaluate every" | |
| for | "Grace periods"/"For" | default: "0s" |
| keep_firing_for | "Grace periods"/"Keep firing for" | default: "0s" |
| annotations/summary | "Summary" | |
| annotations/description | "Description" | |
| annotations/dash0-enabled | Denotes whether the check rule is enabled and should be evaluated. | default: "true"; must be either "true" or "false" |
| annotations/dash0-threshold-degraded | Will be used in place of the token $__threshold in the expression, to determine whether the check is degraded | If present, it needs to be a string that can be parsed to a float value according to the syntax described in https://go.dev/ref/spec#Floating-point_literals. |
| annotations/dash0-threshold-critical | Will be used in place of the token $__threshold in the expression, to determine whether the check is critical | If present, it needs to be a string that can be parsed to a float value according to the syntax described in https://go.dev/ref/spec#Floating-point_literals. |
| annotations/* | "Annotations" | |
| labels/* | "Additional labels" | |
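
To make the mapping rules more concrete, here is a minimal sketch that applies a subset of them to a PrometheusRule manifest given as a plain dict. It is an illustration of the rules listed above, not the operator's conversion code: the function name to_check_rules and the keys of the returned dicts are hypothetical, and labels as well as the float parsing of the threshold annotations are left out.

```python
THRESHOLD_ANNOTATIONS = ("dash0-threshold-degraded", "dash0-threshold-critical")


def to_check_rules(prometheus_rule):
    """Return (check_rules, invalid_rule_names) for a PrometheusRule dict."""
    shared = prometheus_rule.get("metadata", {}).get("annotations", {})
    check_rules, invalid = [], []
    for group in prometheus_rule.get("spec", {}).get("groups", []):
        for rule in group.get("rules", []):
            if "alert" not in rule:
                continue  # rules with only a record attribute are ignored
            name = f"{group['name']} - {rule['alert']}"
            expr = rule.get("expr", "")
            # Rule-level annotations override metadata-level annotations.
            annotations = {**shared, **rule.get("annotations", {})}
            has_threshold = any(key in annotations for key in THRESHOLD_ANNOTATIONS)
            if not rule["alert"] or not expr or ("$__threshold" in expr and not has_threshold):
                invalid.append(name)
                continue
            check_rules.append({
                "name": name,
                "expression": expr,
                "evaluate_every": group.get("interval"),
                "for": rule.get("for", "0s"),
                "keep_firing_for": rule.get("keep_firing_for", "0s"),
                "enabled": annotations.get("dash0-enabled", "true") == "true",
                "annotations": annotations,
            })
    return check_rules, invalid
```

An alerting rule named "K8s pod crash looping" in a group named "dash0/k8s" would come out as a check rule named "dash0/k8s - K8s pod crash looping", with "0s" for both grace periods unless the rule sets them.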

You can opt out of synchronization for individual Prometheus rules resources by adding the Kubernetes label dash0.com/enable: false to the resource. If this label is added to a Prometheus rules resource which has previously been synchronized to Dash0, the operator will delete all corresponding check rules in Dash0. Note that this refers to a Kubernetes label on the Kubernetes resource, and it will affect all check rules contained in this Prometheus rules resource. This mechanism is not to be confused with the Prometheus annotation dash0-enabled, which can be applied to individual rules in a Prometheus rules resource, and controls whether the check rule is enabled or disabled in Dash0. Please also note that the spec.instrumentWorkloads.labelSelector in the monitoring resource does not affect the synchronization of Prometheus rule resources. The label to opt out of synchronization is always dash0.com/enable: false, even if a non-default label selector has been set in spec.instrumentWorkloads.labelSelector.

When a Prometheus rules resource has been synchronized to Dash0, the operator will write a summary of that synchronization operation to the status of the Dash0 monitoring resource in the same namespace. This summary will also show whether any of the rules had validation issues or errors occurred during synchronization:

yaml
Kind: Dash0Monitoring
...
Status:
  Prometheus Rule Synchronization Results:
    my-namespace/prometheus-example-rules:
      Synchronization Status: successful
      Synchronized At: 2026-04-30T07:12:41Z
      Alerting Rules Total: 3
      Recording Rules Total: 1
      Invalid Rules Total: 0
      Synchronization Results:
        dash0ApiEndpoint: https://api.$region.dash0.com/
        dash0Dataset: default
        Synchronization Errors Total: 0
        Synchronized Rules Attributes:
          dash0/collector - exporter send failed spans:
            dash0Origin: dash0-operator_e70c4b03-7e8d-432d-b2cd-addff593076e_default_namespace_prometheus-example-rules_dash0|collector_exporter send failed spans
          dash0/k8s - K8s Deployment replicas mismatch:
            dash0Origin: dash0-operator_e70c4b03-7e8d-432d-b2cd-addff593076e_default_namespace_prometheus-example-rules_dash0|k8s_K8s Deployment replicas mismatch
          dash0/k8s - K8s pod crash looping:
            dash0Origin: dash0-operator_e70c4b03-7e8d-432d-b2cd-addff593076e_default_namespace_prometheus-example-rules_dash0|k8s_K8s pod crash looping
          dash0/k8s - job:http_requests:rate5m:
            dash0Origin: dash0-operator_e70c4b03-7e8d-432d-b2cd-addff593076e_default_namespace_prometheus-example-rules_dash0|k8s_job|http_requests|rate5m
        Synchronized Rules Total: 4

Note: If you only want to manage dashboards, check rules, synthetic checks, views, notification channels, spam filters and signal-to-metrics rules via the Dash0 operator, and you do not want it to collect telemetry, you can set telemetryCollection.enabled to false in the Dash0 operator configuration resource. This will disable the telemetry collection by the operator, and it will also instruct the operator to not deploy the OpenTelemetry collector in your cluster.

Managing Dash0 Synthetic Checks

You can manage your Dash0 synthetic checks via the Dash0 operator.

Prerequisites for this feature:

  • A Dash0 operator configuration resource has to be installed in the cluster.
  • The operator configuration resource must have the apiEndpoint property.
  • The operator configuration resource must have at least one Dash0 export configured with authorization (either token or secret-ref).
  • The operator will only pick up synthetic check resources in namespaces that have a Dash0 monitoring resource deployed.
  • Optional: In addition to the global/default API endpoint and authorization described above, it is possible to define namespace-specific overrides by providing one or more Dash0 export(s) with an API endpoint and token in the Dash0 monitoring resource.

With the prerequisites in place, you can manage Dash0 synthetic checks via the operator. The Dash0 operator will watch for synthetic check resources in all namespaces that have a Dash0 monitoring resource deployed, and synchronize the synthetic check resources with the Dash0 backend:

  • When a new synthetic check resource is created, the operator will create a corresponding synthetic check via Dash0's API.
  • When a synthetic check resource is changed, the operator will update the corresponding synthetic check via Dash0's API.
  • When a synthetic check resource is deleted, the operator will delete the corresponding synthetic check via Dash0's API.

The synthetic checks created by the operator will be in read-only mode in Dash0.

The custom resource definition for Dash0 synthetic checks can be found here. An easy way to get started is to create a synthetic check in the Dash0 UI and then download the YAML representation of that check with the button in the upper right corner. The downloaded YAML can then be deployed as the manifest of a synthetic check in your Kubernetes cluster. Once the check is managed via the operator, you might want to delete the synthetic check that was created directly in the Dash0 UI in the first step; otherwise it would show up as a duplicate in the Dash0 UI, i.e. two synthetic checks with the same name but different internal IDs.

If the Dash0 operator configuration resource has the dataset property set, the operator will create the synthetic checks in that dataset; otherwise they will be created in the default dataset.

You can opt out of synchronization for individual synthetic check resources by adding the Kubernetes label dash0.com/enable: false to the synthetic check resource. If this label is added to a synthetic check that has previously been synchronized to Dash0, the operator will delete the corresponding synthetic check in Dash0. Note that spec.instrumentWorkloads.labelSelector in the monitoring resource does not affect the synchronization of synthetic checks. The label to opt out of synchronization is always dash0.com/enable: false, even if a non-default label selector has been set in spec.instrumentWorkloads.labelSelector.
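
As a minimal sketch of the opt-out described above, the label goes into the metadata of the synthetic check resource. The resource name is a placeholder, and the apiVersion and spec are omitted. Kubernetes label values must be strings, so the value is quoted here to keep YAML from parsing it as a boolean.

```yaml
kind: Dash0SyntheticCheck
metadata:
  name: example-check # hypothetical name
  labels:
    # Quoted on purpose: an unquoted false is a YAML boolean, not a string.
    dash0.com/enable: "false"
# apiVersion and spec omitted for brevity
```

The same label applies to the other resource types managed by the operator that document this opt-out.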

When a synthetic check resource has been synchronized to Dash0, the operator will write a summary of that synchronization operation to its status. Note that in contrast to synchronizing Prometheus rules or Perses dashboards (which are third-party custom resources from the perspective of the operator, i.e. they are potentially owned and managed by another Kubernetes operator), the result of the synchronization operation will not be written to the status of the Dash0 monitoring resource in the same namespace, but to the synthetic check resource status directly. The status will also show whether the synthetic check had any validation issues or an error occurred during synchronization.

yaml
Kind: Dash0SyntheticCheck
...
Status:
  Synchronization Status: successful
  Synchronized At: 2025-09-05T11:47:56Z

Managing Dash0 Views

You can manage your Dash0 views via the Dash0 operator.

Pre-requisites for this feature:

  • A Dash0 operator configuration resource has to be installed in the cluster.
  • The operator configuration resource must have the apiEndpoint property.
  • The operator configuration resource must have at least one Dash0 export configured with authorization (either token or secret-ref).
  • The operator will only pick up Dash0 view resources in namespaces that have a Dash0 monitoring resource deployed.
  • Optional: In addition to the global/default API endpoint and authorization described above, it is possible to define namespace-specific overrides by providing one or more Dash0 export(s) with an API endpoint and token in the Dash0 monitoring resource.

With the prerequisites in place, you can manage Dash0 views via the operator. The Dash0 operator will watch for view resources in all namespaces that have a Dash0 monitoring resource deployed, and synchronize the view resources with the Dash0 backend:

  • When a new view resource is created, the operator will create a corresponding view via Dash0's API.
  • When a view resource is changed, the operator will update the corresponding view via Dash0's API.
  • When a view resource is deleted, the operator will delete the corresponding view via Dash0's API.

The views created by the operator will be in read-only mode in Dash0.

The custom resource definition for Dash0 views can be found here. An easy way to get started is to create a view in the Dash0 UI and then download the YAML representation of that view by using the "Download → YAML" action from the context menu of the view. The downloaded YAML can then be deployed as the manifest of a view in your Kubernetes cluster. Once the view is managed via the operator, you might want to delete the view that was created directly in the Dash0 UI in the first step; otherwise it would show up as a duplicate in the Dash0 UI, i.e. two views with the same name but different internal IDs.

If the Dash0 operator configuration resource has the dataset property set, the operator will create views in that dataset; otherwise they will be created in the default dataset.

You can opt out of synchronization for individual view resources by adding the Kubernetes label dash0.com/enable: false to the view resource. If this label is added to a view that has previously been synchronized to Dash0, the operator will delete the corresponding view in Dash0. Note that spec.instrumentWorkloads.labelSelector in the monitoring resource does not affect the synchronization of views. The label to opt out of synchronization is always dash0.com/enable: false, even if a non-default label selector has been set in spec.instrumentWorkloads.labelSelector.

When a view resource has been synchronized to Dash0, the operator will write a summary of that synchronization operation to its status. Note that in contrast to synchronizing Prometheus rules or Perses dashboards (which are third-party custom resources from the perspective of the operator, i.e. they are potentially owned and managed by another Kubernetes operator), the result of the synchronization operation will not be written to the status of the Dash0 monitoring resource in the same namespace, but to the view resource status directly. The status will also show whether the view had any validation issues or an error occurred during synchronization.

yaml
Kind: Dash0View
...
Status:
  Synchronization Status: successful
  Synchronized At: 2025-09-05T11:47:56Z

Managing Dash0 Notification Channels

You can manage your Dash0 notification channels via the Dash0 operator.

Pre-requisites for this feature:

  • A Dash0 operator configuration resource has to be installed in the cluster.
  • The operator configuration resource must have the apiEndpoint property.
  • The operator configuration resource must have at least one Dash0 export configured with authorization (either token or secret-ref).
  • The operator will only pick up notification channel resources in namespaces that have a Dash0 monitoring resource deployed.
  • Optional: In addition to the global/default API endpoint and authorization described above, it is possible to define namespace-specific overrides by providing one or more Dash0 export(s) with an API endpoint and token in the Dash0 monitoring resource.

With the prerequisites in place, you can manage Dash0 notification channels via the operator. The Dash0 operator will watch for notification channel resources in all namespaces that have a Dash0 monitoring resource deployed, and synchronize the notification channel resources with the Dash0 backend:

  • When a new notification channel resource is created, the operator will create a corresponding notification channel via Dash0's API.
  • When a notification channel resource is changed, the operator will update the corresponding notification channel via Dash0's API.
  • When a notification channel resource is deleted, the operator will delete the corresponding notification channel via Dash0's API.

Notification channels are organization-level resources. Unlike synthetic checks or views, they are not scoped to a dataset.

The custom resource definition for Dash0 notification channels can be found here.

You can opt out of synchronization for individual notification channel resources by adding the Kubernetes label dash0.com/enable: false to the notification channel resource. If this label is added to a notification channel which has previously been synchronized to Dash0, the operator will delete the corresponding notification channel in Dash0.

When a notification channel resource has been synchronized to Dash0, the operator will write a summary of that synchronization operation to its status. The result of the synchronization operation will be written directly to the notification channel resource status (not to the Dash0 monitoring resource). The status will also show whether any error occurred during synchronization.

yaml
Kind: Dash0NotificationChannel
...
Status:
  Synchronization Status: successful
  Synchronized At: 2025-09-05T11:47:56Z

Supported Notification Channel Types

The type field in the spec determines which type-specific config field must be set. Exactly one config field must be provided, matching the type.

Slack Webhook

Sends notifications to a Slack channel via an incoming webhook. This is the simplest Slack integration and requires no OAuth flow.

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: slack-alerts
spec:
  display:
    name: Slack Alerts
  type: slack
  slackConfig:
    webhookURL: "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
    channel: "#alerts"

Slack Bot

Sends notifications using the Dash0 Slack Bot, which is installed into a Slack workspace via OAuth. The Slack Bot integration supports richer formatting and centralized channel management compared to the simpler Slack Webhook integration. See the comparison table at the end of this section.

Prerequisites:

  • You must have admin permissions on the target Slack workspace.
  • You need access to the Dash0 UI to initiate the OAuth authorization flow.

Step 1: Install the Dash0 Slack App. In the Dash0 UI, navigate to Settings > Notification Channels, click Add Notification Channel and select Slack Bot. Click Authorize to start the OAuth flow. You will be redirected to Slack to authorize the Dash0 bot for your workspace. After authorization, Slack grants Dash0 a bot token scoped to your workspace. This is a one-time operation per workspace: once authorized, you can create multiple notification channels against different Slack channels in that workspace without repeating the OAuth flow. Note the Team ID displayed after authorization (e.g. T012345); you will need it for the teamId field.

Step 2: Invite the bot to target channels. The Dash0 bot must be explicitly added to each Slack channel it will post to. In Slack, open the target channel and run /invite @Dash0. Repeat this for every channel you want to receive notifications in.

Step 3: Create the notification channel. Once the bot is installed and invited to the target channel, create a Dash0NotificationChannel resource:

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: slack-bot-alerts
spec:
  display:
    name: Slack Bot Alerts
  type: slack_bot
  slackBotConfig:
    teamId: "T012345"
    channel: "#alerts"

You can also create notification channels programmatically using the Dash0 Go client library:

go
var config dash0.NotificationChannelSpec_Config
config.FromSlackBotConfig(dash0.SlackBotConfig{
	Channel: "#alerts",
	TeamId:  "T012345",
})

Multiple channels: A single OAuth installation supports creating multiple notification channels across different Slack channels in the same workspace. Create additional Dash0NotificationChannel resources with the same teamId but different channel values:

yaml
# Channel 1: production alerts
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: slack-bot-prod
spec:
  display:
    name: Production Alerts
  type: slack_bot
  slackBotConfig:
    teamId: "T012345"
    channel: "#prod-alerts"
---
# Channel 2: staging alerts
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: slack-bot-staging
spec:
  display:
    name: Staging Alerts
  type: slack_bot
  slackBotConfig:
    teamId: "T012345"
    channel: "#staging-alerts"

Email

Sends notifications to a list of email recipients.

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: email-alerts
spec:
  display:
    name: Email Alerts
  type: email_v2
  emailV2Config:
    recipients:
      - oncall@example.com
      - teamlead@example.com
    plaintext: false

Generic Webhook

Sends notifications to an arbitrary HTTP endpoint.

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: webhook-alerts
spec:
  display:
    name: Webhook Alerts
  type: webhook
  webhookConfig:
    url: "https://example.com/webhook"
    headers:
      X-Custom-Header: "my-value"
    followRedirects: false
    allowInsecure: false

Incident.io

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: incidentio-alerts
spec:
  display:
    name: Incident.io Alerts
  type: incidentio
  incidentioConfig:
    url: "https://api.incident.io/v2/alert_events/http/my-source"
    headers: "Bearer my-api-token"

OpsGenie

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: opsgenie-alerts
spec:
  display:
    name: OpsGenie Alerts
  type: opsgenie
  opsgenieConfig:
    instance: eu
    apiKey: "my-opsgenie-api-key"

PagerDuty

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: pagerduty-alerts
spec:
  display:
    name: PagerDuty Alerts
  type: pagerduty
  pagerdutyConfig:
    key: "my-integration-key"
    url: "https://events.pagerduty.com/v2/enqueue"

Microsoft Teams Webhook

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: teams-alerts
spec:
  display:
    name: Teams Alerts
  type: teams_webhook
  teamsWebhookConfig:
    url: "https://outlook.office.com/webhook/..."

Discord Webhook

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: discord-alerts
spec:
  display:
    name: Discord Alerts
  type: discord_webhook
  discordWebhookConfig:
    url: "https://discord.com/api/webhooks/..."

Google Chat Webhook

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: google-chat-alerts
spec:
  display:
    name: Google Chat Alerts
  type: google_chat_webhook
  googleChatWebhookConfig:
    url: "https://chat.googleapis.com/v1/spaces/.../messages?key=..."

iLert

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: ilert-alerts
spec:
  display:
    name: iLert Alerts
  type: ilert
  ilertConfig:
    url: "https://api.ilert.com/api/events/my-integration-key"

All Quiet

yaml
apiVersion: operator.dash0.com/v1beta1
kind: Dash0NotificationChannel
metadata:
  name: allquiet-alerts
spec:
  display:
    name: All Quiet Alerts
  type: all_quiet
  allQuietConfig:
    url: "https://allquiet.app/api/webhook/my-inbound-integration-id"

Optional Fields

frequency: Controls the notification frequency (e.g. 10m, 5m, 1h). Defaults to 10m if omitted.

routing: Defines which assets and filters determine when this channel is notified.

filters is a list of filter groups. The conditions within a single group (the inner list) are combined with AND, and the groups (the outer list) are combined with OR. This two-level structure is what allows expressing rules like "(A and B) or (C and D)", which a flat list could not.

yaml
spec:
  # ...type and config fields...
  frequency: 5m
  routing:
    assets:
      - kind: check_rule
        id: "rule-id"
        name: "My Check Rule"
        dataset: "default"
    filters:
      # Group 1: service.name is "checkout" AND severity is "error" or "critical"
      - - key: service.name
          operator: is
          value: checkout
        - key: severity
          operator: is_one_of
          values:
            - error
            - critical
      # Group 2: service.name is "payments" AND environment is "production"
      - - key: service.name
          operator: is
          value: payments
        - key: environment
          operator: is
          value: production

With the example above, the channel is notified when (service.name = checkout AND severity in [error, critical]) OR (service.name = payments AND environment = production).

Managing Spam Filters

You can manage your spam filters via the Dash0 operator. Spam filters allow you to drop unwanted telemetry data (logs, spans, or metrics) in the Dash0 ingestion pipeline based on attribute conditions before it is stored in Dash0.

Note: For filtering telemetry before it leaves your cluster, use filter rules in your monitoring resources.

Pre-requisites for this feature:

  • A Dash0 operator configuration resource has to be installed in the cluster.
  • The operator configuration resource must have the apiEndpoint property.
  • The operator configuration resource must have at least one Dash0 export configured with authorization (either token or secret-ref).
  • The operator will only pick up spam filter resources in namespaces that have a Dash0 monitoring resource deployed.
  • Optional: In addition to the global/default API endpoint and authorization described above, it is possible to define namespace-specific overrides by providing one or more Dash0 export(s) with an API endpoint and token in the Dash0 monitoring resource.

With the prerequisites in place, you can manage Dash0 spam filters via the operator. The Dash0 operator will watch for spam filter resources in all namespaces that have a Dash0 monitoring resource deployed, and synchronize the spam filter resources with the Dash0 backend:

  • When a new spam filter resource is created, the operator will create a corresponding spam filter via Dash0's API.
  • When a spam filter resource is changed, the operator will update the corresponding spam filter via Dash0's API.
  • When a spam filter resource is deleted, the operator will delete the corresponding spam filter via Dash0's API.

The custom resource definition for Dash0 spam filters can be found here.

Here is an example of a spam filter that drops all log records from the kube-system namespace:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0SpamFilter
metadata:
  name: drop-health-checks
spec:
  contexts:
    - log
  filter:
    - key: "k8s.namespace.name"
      operator: "is"
      value: "kube-system"

The contexts field specifies which signal types the filter applies to (e.g. log, span, metric or datapoint). The filter field contains one or more conditions. Each condition matches against an attribute key with an operator (e.g. is, is_not, contains, starts_with) and a value to compare against.
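
For illustration, a second filter might target a different signal type and operator. The sketch below assumes that health check spans carry an http.route attribute starting with /healthz. The resource name, attribute key and value are hypothetical; the span context and the starts_with operator are taken from the lists above.

```yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0SpamFilter
metadata:
  name: drop-health-check-spans # hypothetical name
spec:
  contexts:
    - span
  filter:
    # Assumed attribute and value; adjust to what your services actually emit.
    - key: "http.route"
      operator: "starts_with"
      value: "/healthz"
```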

If the Dash0 operator configuration resource has the dataset property set, the operator will create spam filters in that dataset; otherwise they will be created in the default dataset.

You can opt out of synchronization for individual spam filter resources by adding the Kubernetes label dash0.com/enable: false to the spam filter resource. If this label is added to a spam filter that has previously been synchronized to Dash0, the operator will delete the corresponding spam filter in Dash0. Note that spec.instrumentWorkloads.labelSelector in the monitoring resource does not affect the synchronization of spam filters. The label to opt out of synchronization is always dash0.com/enable: false, even if a non-default label selector has been set in spec.instrumentWorkloads.labelSelector.

When a spam filter resource has been synchronized to Dash0, the operator will write a summary of that synchronization operation to its status. The result of the synchronization operation will be written directly to the spam filter resource status (not to the Dash0 monitoring resource). The status will also show whether the spam filter had any validation issues or an error occurred during synchronization.

yaml
Kind: Dash0SpamFilter
...
Status:
  Synchronization Status: successful
  Synchronized At: 2026-05-01T12:00:00Z

Managing Dash0 Signal-to-Metrics

You can manage your Dash0 signal-to-metrics rules via the Dash0 operator. Signal-to-metrics rules derive custom metrics from spans or log records based on user-defined filter criteria. Span matches produce exponential histograms (latency distributions); log record matches produce monotonic counters.

Pre-requisites for this feature:

  • A Dash0 operator configuration resource has to be installed in the cluster.
  • The operator configuration resource must have the apiEndpoint property.
  • The operator configuration resource must have at least one Dash0 export configured with authorization (either token or secret-ref).
  • The operator will only pick up signal-to-metrics resources in namespaces that have a Dash0 monitoring resource deployed.
  • Optional: In addition to the global/default API endpoint and authorization described above, it is possible to define namespace-specific overrides by providing one or more Dash0 export(s) with an API endpoint and token in the Dash0 monitoring resource.

With the prerequisites in place, you can manage Dash0 signal-to-metrics rules via the operator. The Dash0 operator will watch for signal-to-metrics resources in all namespaces that have a Dash0 monitoring resource deployed, and synchronize the signal-to-metrics resources with the Dash0 backend:

  • When a new signal-to-metrics resource is created, the operator will create a corresponding rule via Dash0's API.
  • When a signal-to-metrics resource is changed, the operator will update the corresponding rule via Dash0's API.
  • When a signal-to-metrics resource is deleted, the operator will delete the corresponding rule via Dash0's API.

The custom resource definition for Dash0 signal-to-metrics rules can be found here.

Here is an example of a signal-to-metrics rule that derives a latency histogram for requests to a specific HTTP route:

yaml
apiVersion: operator.dash0.com/v1alpha1
kind: Dash0SignalToMetrics
metadata:
  name: checkout-latency
spec:
  enabled: true
  display:
    name: Checkout Service Latency
  match:
    signal: spans
    filters:
      - key: http.route
        operator: is
        value: /api/v1/checkout
  output:
    name: checkout.request.duration
    description: How long calls to our checkout API take
  interval: 60s

You can opt out of synchronization for individual signal-to-metrics resources by adding the Kubernetes label dash0.com/enable: false to the resource. If this label is added to a signal-to-metrics resource which has previously been synchronized to Dash0, the operator will delete the corresponding rule in Dash0.

When a signal-to-metrics resource has been synchronized to Dash0, the operator will write a summary of that synchronization operation to its status. The result of the synchronization operation will be written directly to the signal-to-metrics resource status (not to the Dash0 monitoring resource). The status will also show whether any error occurred during synchronization.

yaml
Kind: Dash0SignalToMetrics
...
Status:
  Synchronization Status: successful
  Synchronized At: 2026-05-16T12:00:00Z

Infrastructure-as-Code Only Mode (Disable Telemetry Collection)

If you use the Dash0 operator only for infrastructure-as-code purposes, and not to collect telemetry in the cluster, you can set the Helm flag operator.telemetryCollectionEnabled=false. In this mode, the operator will not deploy any OpenTelemetry collectors to the cluster, and no telemetry will be collected.

By default, the operator Helm chart will set up all necessary Kubernetes RBAC permissions to manage OpenTelemetry collectors and the target allocator. If operator.telemetryCollectionEnabled is set to false, these RBAC permissions will not be granted, and the operator will run with a reduced set of RBAC permissions.

If operator.telemetryCollectionEnabled=false is set, it will not be possible to set spec.telemetryCollection.enabled to true in the Dash0OperatorConfiguration, since the required RBAC permissions have not been granted. To enable telemetry collection later on, it is required to change this Helm value to true and perform a helm upgrade --install with the updated setting.
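
If you keep Helm settings in a values file rather than passing them as flags, the flag above would correspond to the following fragment. This is a sketch; the file name is a placeholder.

```yaml
# values.yaml (hypothetical file name)
operator:
  # No OpenTelemetry collectors are deployed, and the operator runs with reduced RBAC permissions.
  telemetryCollectionEnabled: false
```

A values file like this can be passed with Helm's --values (-f) option when running helm upgrade --install.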

Notes on AWS EKS

If your telemetry from an AWS EKS cluster is missing cloud.provider, cloud.platform and other cloud.* resource attributes, refer to the resource detection processor documentation. In particular, make sure that IMDS is available on your EKS nodes.

Notes on GKE Autopilot

When deploying the Dash0 operator to a GKE Autopilot cluster, provide the following additional setting when applying the Helm chart:

yaml
operator:
  gke:
    autopilot:
      enabled: true

GKE Autopilot restricts what workloads in an Autopilot cluster can do. With operator.gke.autopilot.enabled set to true, the Dash0 operator Helm chart deploys an auto.gke.io/AllowlistSynchronizer resource into the target cluster, which in turn will add the required auto.gke.io/WorkloadAllowlist resources for Dash0 workloads (the operator and the OpenTelemetry collectors it manages). This allows the Dash0 operator to work on GKE Autopilot clusters.

Not all restrictions can be lifted via a workload allowlist. The following features are not available on GKE Autopilot clusters:

  • Collecting utilization metrics with the kubeletstats receiver is disabled. Collecting these metrics requires access to the /pod endpoint of the kubelet API, which is not available in GKE Autopilot due to the lack of the nodes/proxy permission. The affected metrics are:
    • k8s.pod.cpu_limit_utilization,
    • k8s.pod.cpu_request_utilization,
    • k8s.pod.memory_limit_utilization, and
    • k8s.pod.memory_request_utilization
  • Collecting the extra metadata labels container.id and k8s.volume.type for the kubeletstats receiver metrics is disabled. Collecting these labels also requires access to the /pod endpoint of the kubelet API, which is not available in GKE Autopilot due to the lack of the nodes/proxy permission.

Refer to https://cloud.google.com/kubernetes-engine/docs/how-to/run-autopilot-partner-workloads for more information on AllowlistSynchronizer, WorkloadAllowlist, and related concepts.

Managing the AllowlistSynchronizer Manually

As an alternative to letting the Helm chart install the AllowlistSynchronizer, you can manage it manually:

yaml
operator:
  gke:
    autopilot:
      enabled: true
      deployAllowlistSynchronizer: false

With these settings, the Dash0 operator Helm chart will not deploy the AllowlistSynchronizer. Using these settings requires that you deploy the Dash0 AllowlistSynchronizer before installing the Dash0 operator. To do that, create the following file dash0-gke-autopilot-allowlist-synchronizer.yaml:

yaml
apiVersion: auto.gke.io/v1
kind: AllowlistSynchronizer
metadata:
  name: dash0-allowlist-synchronizer
spec:
  allowlistPaths:
    - Dash0/operator/*

Then deploy it as follows:

kubectl apply -f dash0-gke-autopilot-allowlist-synchronizer.yaml

When managing the AllowlistSynchronizer manually, you might need to update it from time to time for future Dash0 operator releases.

Notes on Azure AKS

In AKS clusters that have the Azure Policy add-on enabled, it is highly recommended to use a volume for filelog offsets instead of the default filelog offset config map. Using the default config map filelog offset storage in AKS clusters with this add-on can lead to severe performance issues.

Notes on the Open Policy Agent

In clusters that have the OPA Gatekeeper deployed, it is highly recommended to use a volume for filelog offsets instead of the default filelog offset config map. Using the default config map filelog offset storage in clusters with this component can lead to severe performance issues.

Notes on Kyverno Admission Controller

In clusters that have the Kyverno admission controller deployed, it is highly recommended to either use a volume for filelog offsets instead of the default filelog offset config map, or to exclude ConfigMaps (or all resource types) in the Dash0 operator's namespace from Kyverno's processing. Leaving Kyverno processing in place and using the config map filelog offset storage can lead to severe performance issues, since the default config map for filelog offsets is updated very frequently. This can cause Kyverno to consume a lot of CPU and memory resources, potentially even leading to OOMKills of the Kyverno admission controller.

Notes on GitOps

When deploying workloads via GitOps tools like ArgoCD or Flux in a cluster where the Dash0 operator is installed, some care is needed to avoid conflicts between the workload definition in the GitOps repository and the workload modifications that are applied automatically by the Dash0 operator. Otherwise, workload settings might flip-flop between what the GitOps system wants to apply and what the Dash0 operator does, or the GitOps system might overwrite the Dash0 operator's settings, thereby breaking telemetry collection for the workload.

Environment variable definitions in pod spec templates are the most likely source of conflict. To avoid conflicts, it is recommended to not define the following environment variables via GitOps:

  • OTEL_EXPORTER_OTLP_ENDPOINT
  • OTEL_EXPORTER_OTLP_PROTOCOL
  • OTEL_PROPAGATORS
  • LD_PRELOAD
  • DASH0_NODE_IP
  • DASH0_OTEL_COLLECTOR_BASE_URL
  • OTEL_INJECTOR_K8S_NAMESPACE_NAME
  • OTEL_INJECTOR_K8S_POD_NAME
  • OTEL_INJECTOR_K8S_POD_UID
  • OTEL_INJECTOR_K8S_CONTAINER_NAME
  • OTEL_INJECTOR_SERVICE_NAME
  • OTEL_INJECTOR_SERVICE_NAMESPACE
  • OTEL_INJECTOR_SERVICE_VERSION
  • OTEL_INJECTOR_RESOURCE_ATTRIBUTES

This recommendation does not apply to workloads that are excluded from workload instrumentation, to workloads in namespaces without a Dash0 monitoring resource, or to workloads in namespaces whose monitoring resource has spec.instrumentWorkloads.mode set to none.

Notes on ArgoCD

Like many other Helm charts, the Dash0 operator Helm chart generates TLS certificates for in-cluster communication, that is, for its services and webhooks. The certificates are regenerated every time the Dash0 operator Helm chart is applied. For users deploying the Dash0 operator via ArgoCD, and in particular without using ArgoCD's auto-sync feature, the certificates and derived data (ca.crt, tls.crt, tls.key, caBundle) will show up as a diff in the ArgoCD UI. The certificates are also regenerated every time the hard refresh option is used in ArgoCD, since this action will trigger rendering the Helm chart templates again, even if nothing has changed in the git repository.

To avoid this, you can instruct ArgoCD to ignore these particular differences. Here is an example for an argoproj.io/v1alpha1.Application resource with ignoreDifferences:

yaml
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dash0-operator
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  source:
    chart: dash0-operator
    repoURL: https://dash0hq.github.io/dash0-operator
    targetRevision: ...
  # ... your current spec for the dash0-operator ArgoCD application
  # Ignore certificates which are generated on the fly during Helm chart template rendering:
  ignoreDifferences:
    - kind: Secret
      name: dash0-operator-certificates
      jsonPointers:
        - /data/ca.crt
        - /data/tls.crt
        - /data/tls.key
    - group: admissionregistration.k8s.io
      kind: MutatingWebhookConfiguration
      name: dash0-operator-injector
      jsonPointers:
        - /webhooks/0/clientConfig/caBundle
    - group: admissionregistration.k8s.io
      kind: MutatingWebhookConfiguration
      name: dash0-operator-monitoring-mutating
      jsonPointers:
        - /webhooks/0/clientConfig/caBundle
    - group: admissionregistration.k8s.io
      kind: MutatingWebhookConfiguration
      name: dash0-operator-operator-configuration-mutating
      jsonPointers:
        - /webhooks/0/clientConfig/caBundle
    - group: admissionregistration.k8s.io
      kind: ValidatingWebhookConfiguration
      name: dash0-operator-monitoring-validator
      jsonPointers:
        - /webhooks/0/clientConfig/caBundle
    - group: admissionregistration.k8s.io
      kind: ValidatingWebhookConfiguration
      name: dash0-operator-operator-configuration-validator
      jsonPointers:
        - /webhooks/0/clientConfig/caBundle

Notes on Running The Operator on Apple Silicon

When running the operator on an Apple Silicon host (M1, M3 etc.), for example via Docker Desktop, some attention needs to be paid to the CPU architecture of images. The architecture of the Kubernetes node for this scenario will be arm64. When running a single-architecture amd64 image (as opposed to a single-architecture arm64 image or a multi-platform build containing amd64 as well as arm64) the operator will prevent the container from starting.

The reason for this is the interaction between Rosetta emulation and how the operator works. The Dash0 instrumentation image (which is added as an init container and contains the OpenTelemetry injector) is a multi-platform image, supporting both amd64 and arm64. When this image is pulled on an Apple Silicon machine, the arm64 variant is selected automatically. That is, the injector binary that is added via the init container is compiled for arm64. When the application from your amd64 application image is then started, the injector and the application are incompatible, as they have been built for two different CPU architectures.

Under normal circumstances, an amd64 image would not work on an arm64 Kubernetes node anyway, but in the case of Docker Desktop on macOS, this combination is enabled because Docker Desktop automatically runs amd64 images via Rosetta 2 emulation.

You can work around this issue with one of the following methods:

  • Use an amd64 Kubernetes node,
  • Build a multi-platform image for your application, or
  • Build the application as an arm64 image (e.g. by using --platform=linux/arm64 when building the image).
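
To see whether a workload is affected, it can help to compare the architecture of the cluster nodes with the architecture of the application image. The sketch below wraps the relevant kubectl and docker commands in small functions. All function names are hypothetical, the image names passed to them are placeholders, and build_multi_platform assumes that Docker Buildx is available and that you are allowed to push to the target registry.

```shell
# Prints the CPU architecture of every node in the current cluster,
# e.g. "arm64" for Docker Desktop on Apple Silicon.
node_archs() {
  kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
}

# Prints the architecture of a locally available image, e.g. "amd64".
image_arch() {
  docker image inspect --format '{{.Architecture}}' "$1"
}

# image_supports_arch NODE_ARCH IMAGE_ARCHS
# Succeeds if NODE_ARCH is contained in IMAGE_ARCHS, a comma- or
# space-separated list of architectures the image was built for.
image_supports_arch() {
  _node_arch=$1
  for _arch in $(printf '%s' "$2" | tr ',' ' '); do
    if [ "$_arch" = "$_node_arch" ]; then
      return 0
    fi
  done
  return 1
}

# Builds and pushes a multi-platform image from the current directory,
# so that arm64 nodes pull a native arm64 variant.
build_multi_platform() {
  docker buildx build --platform linux/amd64,linux/arm64 -t "$1" --push .
}
```

If image_supports_arch "$(node_archs)" "$(image_arch my-app:latest)" fails on a single-node Docker Desktop cluster, the image (my-app:latest is a placeholder) would only run under emulation and is a candidate for one of the workarounds listed above.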

Notes on Running The Operator on Docker Desktop

The hostmetrics receiver will be disabled when using Docker as the container runtime.

Notes on Running The Operator on Minikube

The hostmetrics receiver will be disabled when using Docker as the container runtime.

Troubleshooting

Create Heap Profiles

The instructions in this section are mainly meant to be used in a shared troubleshooting session with Dash0 support.

To get a heap profile from the operator manager container:

  1. Deploy the operator manager with the additional Helm value operator.pprofPort=1777.
  2. Take note of the namespace the operator is deployed in (default: dash0-system).
  3. Run kubectl get pod -n <operator-namespace> -l app.kubernetes.io/component=controller to get the name of the operator manager pod (usually something like dash0-operator-controller-xxxxxxxxx-xxxxx, but the name depends on the Helm release name).
  4. Using the information from the previous two steps, run kubectl port-forward -n <operator-namespace> <operator-manager-pod-name> 1777.
  5. In a separate shell, while the kubectl port-forward command from the previous step is still running, run curl http://localhost:1777/debug/pprof/heap > dash0-operator-manager-heap.out.
  6. Terminate the kubectl port-forward command.
  7. Redeploy the operator without the Helm setting operator.pprofPort=1777.

To get a heap profile from an OpenTelemetry collector daemonset container:

  1. Deploy the operator manager with the additional Helm value operator.collectors.enablePprofExtension=true.
  2. Take note of the namespace the operator is deployed in (default: dash0-system).
  3. Run kubectl top pod -n <operator-namespace> -l app.kubernetes.io/component=agent-collector to get the name of a collector pod that has high memory usage (usually something like dash0-operator-opentelemetry-collector-agent-daemonset-xxxxx, but the name depends on the Helm release name).
  4. Using the information from the previous two steps, run kubectl port-forward -n <operator-namespace> <collector-daemonset-pod-name> 1777.
  5. In a separate shell, while the kubectl port-forward command from the previous step is still running, run curl http://localhost:1777/debug/pprof/heap > dash0-daemonset-collector.out.
  6. Terminate the kubectl port-forward command.
  7. Redeploy the operator without the Helm setting operator.collectors.enablePprofExtension=true.

To get a heap profile from an OpenTelemetry collector deployment container:

  • Follow the same steps as for the collector daemonset, but use -l app.kubernetes.io/component=cluster-metrics-collector in step 3.
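
Steps 3 to 6 of the procedures above are the same for all three containers and can be combined into a small script. This is a minimal sketch rather than an official tool: the function names find_pod and capture_heap_profile are hypothetical, the fixed two-second wait for the port-forward is a simplification, and find_pod simply picks the first pod with the given component label instead of choosing a pod with high memory usage via kubectl top. It assumes that the operator has already been deployed with the corresponding pprof Helm value (step 1).

```shell
# find_pod NAMESPACE COMPONENT
# Prints the name of the first pod carrying the given component label
# (controller, agent-collector or cluster-metrics-collector).
find_pod() {
  kubectl get pod -n "$1" -l "app.kubernetes.io/component=$2" \
    -o jsonpath='{.items[0].metadata.name}'
}

# capture_heap_profile NAMESPACE POD OUTFILE [PORT]
# Forwards the pprof port, downloads the heap profile to OUTFILE and
# stops the port-forward again. Returns the exit status of curl.
capture_heap_profile() {
  _ns=$1
  _pod=$2
  _out=$3
  _port=${4:-1777}
  _status=0

  kubectl port-forward -n "$_ns" "$_pod" "$_port" >/dev/null 2>&1 &
  _pf_pid=$!
  sleep 2 # crude wait until the port-forward is established

  curl -sf "http://localhost:${_port}/debug/pprof/heap" > "$_out" || _status=$?

  # Terminate the port-forward regardless of whether curl succeeded.
  kill "$_pf_pid" 2>/dev/null || true
  wait "$_pf_pid" 2>/dev/null || true
  return "$_status"
}

# Example for the operator manager in the default namespace:
#   pod=$(find_pod dash0-system controller)
#   capture_heap_profile dash0-system "$pod" dash0-operator-manager-heap.out
```

Remember to redeploy the operator without the pprof setting afterwards, as described in step 7.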