
Observing Dapr with OpenTelemetry and Dash0

Dapr shifts service invocation, state, pub/sub and workflow orchestration into a sidecar - which makes runtime observability critical. Learn how Dapr and OpenTelemetry can be combined to deliver unified traces, metrics, and logs, and why aligning sidecar and app telemetry matters for production systems.

Dapr (Distributed Application Runtime) brings consistency to microservice development by moving common concerns into a sidecar: service invocation, pub/sub messaging, state stores, actors, workflows. This abstraction frees application code from boilerplate - but it also makes a large part of system behavior invisible unless you observe it.

Observability in a Dapr system isn’t just a nice-to-have. If the sidecar is responsible for handling requests, securing communication, retrying messages, or storing state, you need to see what it’s doing. And because these operations happen outside your application container, observability must unify two perspectives:

  • Application telemetry - traces, logs, and metrics from the code you wrote.
  • Runtime telemetry - traces, logs, and metrics from the Dapr sidecars and control plane.

The glue between these perspectives is OpenTelemetry (OTel). Dapr already emits telemetry in OpenTelemetry formats, and the OpenTelemetry Collector and OpenTelemetry Operator provide the machinery to collect, enrich, and export that data. The result is a consistent, vendor-neutral pipeline where both your app and Dapr runtime contribute signals to the same view.

If you want a deeper dive into how the Collector works, see our guide on building telemetry pipelines with the OpenTelemetry Collector. It walks through how the Collector ingests, processes, and exports signals in production setups.

Traces in Dapr

When you run services through Dapr, every interaction crosses at least one sidecar boundary. A client call enters the sidecar, flows into your app, and may trigger a pub/sub publish that fans out to other services via their sidecars. Without distributed tracing, this looks like a blur of disconnected requests.

Traces are the backbone of observability in Dapr. They reveal:

  • How requests flow between applications and Dapr sidecars.
  • Where latency is introduced (in your code, in the sidecar, or in a downstream component).
  • When retries and failures occur, including in asynchronous messaging.
  • How long workflows and actors run, even across restarts.

Because Dapr uses the W3C Trace Context standard, it propagates trace identifiers across services, messages, and workflows. That means a single trace_id can connect an HTTP call in one service, a message in a pub/sub topic, and the subscriber’s processing in another service - even though they’re decoupled in time and infrastructure.
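
For illustration, the propagated context travels as a W3C traceparent header with the shape version-traceID-parentSpanID-flags; the values below are example placeholders rather than output from a real system:

text
traceparent: 00-4bf92f3577b836478c5cf1ba6e0a962b-00f067aa0ba902b7-01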

This becomes especially powerful in asynchronous communication. When a service publishes an event, the Dapr sidecar injects the trace context into the message. The subscriber’s sidecar extracts it and starts a new span linked back to the publisher. You can see how long the message spent in transit, how long it took to process, and whether it was retried or dropped.

Dapr Workflows extend the same principle: each orchestration step is emitted as a span, grouped under a parent workflow span that represents the entire instance. Even workflows that run for hours or days appear as a coherent timeline.

And for actors, placement and activation events can be correlated across sidecars, letting you observe when actors are created, moved, or deactivated.

Configuring Dapr for tracing

Dapr supports OpenTelemetry natively. To enable tracing in Kubernetes, you define a Configuration resource pointing to an OTLP endpoint. For example:

yaml
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: observability
  namespace: <namespace>
spec:
  tracing:
    samplingRate: "1"
    otel:
      endpointAddress: "<name>.<namespace>.svc.cluster.local:4317"
      isSecure: false
      protocol: grpc
  metric:
    enabled: true
    recordErrorCodes: true
  features:
    - name: proxy.tracing
      enabled: true

With this configuration applied, any pod annotated with dapr.io/config: "observability" will emit spans to the OpenTelemetry Collector.
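
For example, the relevant annotations on an application Deployment might look like the following sketch; the service name, port, and image are placeholders rather than part of any official setup:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: todo-service                    # placeholder application name
  namespace: <namespace>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: todo-service
  template:
    metadata:
      labels:
        app: todo-service
      annotations:
        dapr.io/enabled: "true"         # inject the Dapr sidecar
        dapr.io/app-id: "todo-service"  # app id used in Dapr telemetry
        dapr.io/app-port: "8080"        # port the app container listens on
        dapr.io/config: "observability" # reference the Configuration above
    spec:
      containers:
        - name: todo-service
          image: <your-image>
          ports:
            - containerPort: 8080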

Inside the Collector, you can enrich spans with Kubernetes metadata using the k8sattributes processor, batch them for efficiency, and export them to your chosen backend.

Example pipeline (Collector config snippet):

yaml
processors:
  batch: {}
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.node.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
    passthrough: false
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: connection

The result is a distributed trace that shows both the Dapr runtime operations and the application spans in one continuous view.
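
For completeness, here is a sketch of how those processors could be wired into a traces pipeline. It assumes an OTLP receiver for spans from Dapr and your applications, plus an OTLP exporter (named otlp/dash0 purely for illustration) pointing at your backend - adjust the endpoint and credentials to your environment:

yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp/dash0:
    endpoint: "<your-otlp-endpoint>:4317"   # placeholder backend endpoint
    headers:
      Authorization: "Bearer <your-token>"  # placeholder credentials

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [otlp/dash0]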

If you are using the OpenTelemetry Collector Helm chart, you can enable the k8sattributes processor via its presets:

yaml
presets:
  kubernetesAttributes:
    enabled: true

Metrics in Dapr

If traces tell you how requests flow, metrics tell you how the runtime is performing over time. They are the continuous heartbeat of your Dapr sidecars and control-plane components. While traces help you debug a single workflow, metrics help you monitor system health, detect regressions, and set alerts.

Dapr sidecars and system services expose Prometheus-format metrics on port 9090. These cover everything from request counts and latencies to actor placement and certificate renewal.
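
Sidecar metrics are enabled by default, but they can be toggled or moved to a different port per pod via annotations. A minimal sketch, with the default values shown:

yaml
# pod-template annotations (fragment, not a complete manifest)
annotations:
  dapr.io/enable-metrics: "true"  # expose the sidecar metrics endpoint
  dapr.io/metrics-port: "9090"    # port of the Prometheus endpoint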

Together, they let you answer questions like:

  • How many requests per second is each service handling through Dapr?
  • Are any pub/sub messages being dropped?
  • Is the state store becoming a bottleneck?
  • Did the sidecar injector fail to attach to new pods?
  • When will the current mTLS issuer certificate expire?

Key Dapr metrics

Dapr emits a broad set of runtime metrics, but a few categories stand out as especially useful when operating applications in production.

Service Invocation. Metrics like dapr_http_server_request_count, dapr_http_server_latency, and dapr_grpc_io_server_completed_rpcs describe how many requests the sidecar is handling and how long they take. Together, they provide a view into throughput, error rates, and latency overhead introduced by the runtime.

Pub/Sub. The pub/sub subsystem exposes counters such as dapr_component_pubsub_ingress_count and dapr_component_pubsub_egress_count, as well as histograms like dapr_component_pubsub_ingress_latencies. These metrics are labeled with topic and process_status, making it possible to track message flow and detect drops.

State Stores. Metrics such as dapr_component_state_count and dapr_component_state_latencies show how often state stores are accessed and how responsive they are.

Control Plane Health. Dapr’s system components also publish key signals. The injector reports dapr_injector_sidecar_injection_failed_total, while the Sentry service tracks certificate signing outcomes (dapr_sentry_cert_sign_failure_total) and expiry (dapr_sentry_issuercert_expiry_timestamp).

Actors and Placement. For actor-based workloads, gauges such as dapr_placement_runtimes_total and dapr_scheduler_sidecars_connected show how many sidecars are connected to placement and scheduling services.

This is only a subset of what Dapr exposes. For a complete reference, see the full list of metrics in the Dapr documentation.

Figure 1: Overview of Dapr Metrics visualized in Dash0 Metrics Explorer

Configuring the OpenTelemetry Collector to Scrape Dapr Metrics

The OpenTelemetry Collector can scrape these metrics directly using the Prometheus receiver. A minimal configuration might look like this:

yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: dapr-sidecars
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - action: keep
              regex: "true"
              source_labels:
                - __meta_kubernetes_pod_annotation_dapr_io_enabled
            - action: replace
              replacement: $1
              source_labels:
                - __meta_kubernetes_namespace
              target_label: namespace
            - action: replace
              replacement: $1
              source_labels:
                - __meta_kubernetes_pod_name
              target_label: pod
            - action: replace
              regex: (.*);daprd
              replacement: $1-dapr
              source_labels:
                - __meta_kubernetes_pod_annotation_dapr_io_app_id
                - __meta_kubernetes_pod_container_name
              target_label: service
            - action: replace
              replacement: $1:9090
              source_labels:
                - __meta_kubernetes_pod_ip
              target_label: __address__
        - job_name: dapr
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - action: keep
              regex: dapr
              source_labels:
                - __meta_kubernetes_pod_label_app_kubernetes_io_name
            - action: keep
              regex: dapr
              source_labels:
                - __meta_kubernetes_pod_label_app_kubernetes_io_part_of
            - action: replace
              replacement: $1
              source_labels:
                - __meta_kubernetes_pod_label_app
              target_label: app
            - action: replace
              replacement: $1
              source_labels:
                - __meta_kubernetes_namespace
              target_label: namespace
            - action: replace
              replacement: $1
              source_labels:
                - __meta_kubernetes_pod_name
              target_label: pod
            - action: replace
              replacement: $1:9090
              source_labels:
                - __meta_kubernetes_pod_ip
              target_label: __address__

This configuration:

  • Discovers pods with the annotation dapr.io/enabled=true and scrapes their sidecar metrics.
  • Discovers Dapr control-plane pods by label and scrapes them as well.
  • Exports all metrics via OTLP to Dash0 (or any OTLP backend) once an exporter is wired into the metrics pipeline - see the sketch below.
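
A minimal sketch of that last step, reusing the otlp/dash0 exporter from the tracing section (the exporter name is illustrative - any OTLP exporter works):

yaml
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [k8sattributes, batch]
      exporters: [otlp/dash0]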

Example insights via PromQL

Once collected, Dapr metrics let you ask sharper questions than just “is it running?”. You can quantify performance, spot bottlenecks, and detect early warning signs.

Take request throughput, for example. With:

promql
sum by (app_id, status) (
  rate({otel_metric_name = "dapr_http_server_request_count", otel_metric_type = "sum"}[1m])
)

you can see how many HTTP requests each service is handling per second, broken down by status code. If you notice a sudden rise in 4xx or 5xx responses, it often points to problems in downstream dependencies rather than in the Dapr runtime itself.

Latency is another key dimension. Using:

promql
histogram_quantile(0.95,
  sum(
    rate({otel_metric_name = "dapr_http_server_latency", otel_metric_type = "histogram"}[5m])
  ) by (app_id)
)

you can calculate the 95th percentile HTTP latency for each service. This highlights how services perform under load and exposes “tail latencies” that users are most likely to feel. Even if averages look fine, rising p95 values are often the earliest signal of a performance regression.

For asynchronous workloads, pub/sub metrics are invaluable. The query:

promql
sum(
  rate({otel_metric_name = "dapr_component_pubsub_ingress_count", otel_metric_type = "sum", process_status="success"}[5m])
) by (app_id, topic)

shows the rate of successfully processed messages by application and topic. Tracking this over time confirms whether subscribers are keeping pace with publishers. A divergence between published and successfully processed messages usually indicates subscriber bottlenecks or misconfiguration.

Together, these kinds of queries move observability from “is the system alive?” to “is it healthy, fast, and reliable?” - and they provide a solid foundation for dashboards and alerting in production Dapr environments.

The OpenTelemetry Collector and Operator

The OpenTelemetry Collector is not just a sink for telemetry - it’s the control plane of observability. In a Dapr environment, it plays three essential roles:

1. Ingesting data from many sources:

  • OTLP spans emitted by Dapr sidecars and app containers.
  • Prometheus metrics scraped from sidecars and control-plane pods.
  • Logs if you route them through sidecars or agents.

2. Enriching telemetry:

  • The k8sattributes processor adds Kubernetes metadata (namespace, pod name, pod UID, node name).
  • This lets you unify spans from the sidecar and the application container under a single resource identity.

3. Exporting to any backend:

  • With exporters like OTLP, Prometheus remote write, or vendor-specific ones, the same telemetry can feed multiple systems.

Because Dapr already speaks OpenTelemetry (for traces) and Prometheus (for metrics), the Collector is the natural aggregation point.

The Operator as the Enabler

In Kubernetes, the OpenTelemetry Operator does more than just manage Collector instances. Its most powerful role in the context of Dapr is as the enabler of no-touch instrumentation.

Through its Instrumentation custom resource, the Operator can automatically inject language-specific agents into your application pods. For example, in a Java service it can mount the OpenTelemetry Java agent into the container at startup, set the necessary environment variables, and export spans to your Collector - all without you having to change application code. Similar support exists for .NET, Node.js, Python, and Go.

This makes it possible to instrument entire applications simply by applying an Instrumentation resource, such as:

yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation
  namespace: opentelemetry
spec:
  exporter:
    endpoint: http://<name>.<namespace>.svc.cluster.local:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: always_on
  resource:
    addK8sUIDAttributes: true

Once applied, the Operator takes care of injecting the right agent and wiring up the exporter. From the developer’s perspective, there’s no code change and no manual configuration - instrumentation just happens.
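
Injection is opted into per workload. For a Java service this is typically a single pod annotation referencing the Instrumentation resource - here assuming the opentelemetry/instrumentation resource from the example above:

yaml
# pod-template annotations (fragment, not a complete manifest)
annotations:
  # value can be "true" (Instrumentation in the same namespace)
  # or "<namespace>/<name>" to reference it explicitly
  instrumentation.opentelemetry.io/inject-java: "opentelemetry/instrumentation"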

This is particularly important in a Dapr environment. The sidecar already emits spans for service invocation, pub/sub, and state interactions. With auto-instrumentation, your application logic emits its own spans too, which means both sides of the picture are covered: the plumbing and the business code. Together, they form a complete trace.

The Operator can also be used to manage Collectors, scaling them or enabling features like the Target Allocator, but its role as the provider of auto-instrumentation is what really makes it a game-changer. It bridges the gap between runtime telemetry (from Dapr sidecars) and application telemetry (from your code) without demanding developer effort - true no-touch observability.

All of this is easier to appreciate when you can see the signals come together - so we built a demo that turns these concepts into something you can explore.

Trying it out yourself

Theory only goes so far. To truly understand how Dapr and OpenTelemetry work together, nothing beats hands-on experience with a real distributed system. We’ve built a comprehensive demo application that showcases every observability concept discussed in this post - from distributed tracing across service boundaries to unified metrics collection from both applications and infrastructure.

The Demo Architecture

The demo implements a todo list application as a microservices system, but it’s far from a toy example. It demonstrates production patterns you’d find in real Dapr deployments: multiple language runtimes, asynchronous messaging, state management, and operator-managed infrastructure, all contributing telemetry to a unified pipeline.

At its core, the system includes:

  • A React frontend instrumented with OpenTelemetry’s browser SDK.
  • A Java Spring Boot todo service persisting state in PostgreSQL through Dapr’s state API and publishing events.
  • A validation service demonstrating synchronous Dapr service-to-service calls.
  • A notification service consuming events via RabbitMQ, highlighting asynchronous messaging.
  • PostgreSQL managed by the CloudNativePG operator, with query performance metrics enabled.
  • RabbitMQ via the Cluster Operator, exposing broker and queue metrics.
  • OpenTelemetry Collector scraping Dapr sidecars, collecting infrastructure telemetry and exporting to Dash0.
  • OpenTelemetry Operator injecting language agents (like the Java agent) for no-touch auto-instrumentation.

What makes this demo especially valuable is that it mirrors real production complexity. It combines synchronous service-to-service calls with asynchronous pub/sub, and it integrates infrastructure components alongside application workloads. The result is a single telemetry pipeline where browser interactions, application spans, sidecar traces, and database or message broker metrics all come together.

Exploring the System

Once deployed, the observability story becomes visible almost immediately. When you interact with the frontend, you can follow a distributed trace that begins in the browser, passes through nginx, travels into the Dapr sidecar, and reaches the todo service. From there it continues into validation logic, persists to PostgreSQL, and eventually fans out asynchronously through RabbitMQ to the notification service. Each of these hops is represented as spans in a single trace, giving you a complete view of the request path.

From these traces, Dash0 automatically builds a resource map. This map reveals not just which services exist, but how they depend on one another: frontend to sidecars, sidecars to application services. Seeing the architecture laid out like this helps make sense of the relationships between components.

As you can see on the resource map, there’s currently a gap in the async spans: only the client spans are reported. That’s why you don’t see a direct relation between the todo-service-daprd and notification-service-daprd sidecars, even though messages are flowing between them. We’re actively working with the Dapr community to close this gap, so async communication will appear in the resource map in the future. You can follow the discussion in this GitHub issue.

Metrics complement the traces by showing how the system behaves over time. You’ll be able to observe request throughput via dapr_http_server_request_count, pub/sub reliability by watching dapr_component_pubsub_ingress_count with process_status labels, and state operation latencies through dapr_component_state_latencies. On top of that, PostgreSQL contributes database performance data, RabbitMQ exposes queue depths and broker health, and the Java agent provides JVM runtime metrics.

To push the system further, you can experiment with failures. Shut down a PostgreSQL pod and watch how Dapr retries state operations while traces light up with errors. Scale the notification service down to zero and see RabbitMQ queues begin to grow. Or inject artificial latency into the todo service and observe how it propagates through the trace, affecting every dependent service. These experiments show not just that observability works, but why it matters in production.

This demo isn’t just a sample - it’s a sandbox. By running it locally, you can explore how telemetry reveals both expected behavior and unexpected failures, building intuition for how Dapr and OpenTelemetry fit together in real-world systems.

Give it a try: dash0-examples/dapr.

Aligning App and Sidecar Telemetry

One of the lessons the demo makes clear is that application and runtime signals don’t always align by default. The sidecar and the service each emit their own telemetry, and without correlation they can show up in your backend as two separate “services.” To make sense of the traces and metrics, you need to connect the dots.

The OpenTelemetry Collector’s k8sattributes processor helps by tagging spans and metrics with the same Kubernetes metadata. With attributes like k8s.pod.uid attached, the sidecar’s spans and the application’s spans are recognized as part of the same pod, and therefore the same resource.

In practice, you may still want to distinguish them. In the demo, sidecars are named with a -daprd suffix - for example, the validation service appears as validation-service while its sidecar is validation-service-daprd. This makes it easy to tell at a glance which spans originate from business logic and which from the Dapr runtime. Both approaches are valid: unify them when you want correlation, or keep them separate when you want clarity of attribution. And because the demo uses the -daprd naming convention, you can explore both perspectives directly.

Final thoughts

Dapr moves critical responsibilities - service invocation, messaging, state, security - into a sidecar. That abstraction simplifies development, but it also hides half of your system’s behavior outside your code. Observability is how you bring it back into view.

OpenTelemetry makes Dapr observable by unifying signals. Traces connect workflows end-to-end, across synchronous calls and asynchronous queues. Metrics show runtime health, error rates, and latency over time. Logs add the narrative details that explain what happened and why. The Collector enriches these signals with Kubernetes metadata, and the Operator provides no-touch auto-instrumentation so your services emit their own spans without code changes.

The result is one coherent dataset you can export to any OTLP-compatible backend. In our case we used Dash0, but the pipeline is portable. Start with traces to understand workflows, add metrics to watch runtime health, and bring in logs for context.

And then, don’t just read about it - see it in action. The dash0-examples/dapr demo shows these concepts working together in a minimal production-like system. Run it, break it, and watch observability guide you back to understanding. That’s the real power of Dapr and OpenTelemetry combined.