Last updated: March 18, 2026

Optimize PromQL Query Performance

This guide explains how Dash0's metrics system works, why certain queries can be slow, and practical strategies to improve query performance — ordered from most to least impactful.

Understanding the Two Types of Metrics

Dash0 provides two fundamentally different types of metrics, each with distinct performance characteristics.

Pre-Computed Metrics: The Fast Path

When you send metrics directly to Dash0 using OpenTelemetry SDKs, the Prometheus receiver, or other metric-producing integrations, these values are pre-computed at the source. Querying them is fast because Dash0 simply retrieves already-calculated values from optimized storage. These metrics enjoy 13-month retention and are ideal for long-term trend analysis, capacity planning, and SLO tracking.

Synthetic Metrics: Flexibility with a Performance Tradeoff

Dash0's synthetic metrics, such as dash0.spans, dash0.logs, dash0.spans.duration, and dash0.span.events, work differently. Rather than pre-computing aggregations, Dash0 calculates these metrics on the fly at query time by scanning raw span and log data. This provides remarkable flexibility: you can filter, group, and aggregate by any attribute without defining metrics upfront.

The tradeoff is performance. Every synthetic metric query must scan the underlying raw data — longer time ranges require scanning more data. Queries over recent data (last 24 hours) typically perform well since this data resides in fast local storage. Older data lives in S3-backed storage, which adds retrieval latency. Additionally, synthetic metrics have a 30-day retention limit compared to 13 months for pre-computed metrics.

PromQL Query Types

PromQL supports two execution modes that affect both what data you get and how quickly you get it.

Instant Queries

An instant query evaluates your PromQL expression at one specific timestamp and returns a single result per matching time series. Instant queries are fast because they perform one evaluation regardless of how much historical data you're viewing.

promql
# Evaluates once, returns one value per series
rate(http_requests_total[$__rate_interval])

Range Queries

A range query evaluates your expression at multiple timestamps across a time range, running many instant queries at regular intervals (the "step"). Performance scales with (end_time - start_time) / step — a 7-day query with 15-second steps requires 40,320 evaluations, while the same query with 5-minute steps requires only 2,016.
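The evaluation count follows directly from that formula; a short Python sketch (with the same illustrative numbers) reproduces the figures above:

```python
def evaluations(window_seconds: int, step_seconds: int) -> int:
    """Number of instant evaluations a range query performs:
    (end_time - start_time) / step."""
    return window_seconds // step_seconds

week = 7 * 24 * 3600  # a 7-day query window in seconds

print(evaluations(week, 15))   # 15-second step -> 40320 evaluations
print(evaluations(week, 300))  # 5-minute step  -> 2016 evaluations
```

Coarsening the step by 20x cuts the work by 20x, which is why dashboards over long ranges should never use tiny fixed steps.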

The last_over_time() Alternative

When you need a single recent value with explicit control over the lookback window:

promql
last_over_time(http_requests_total[5m])

Optimization Strategies

1. Filter by Resource Attributes First

This is the single most effective optimization. Dash0's storage is optimized for queries filtered by resource attributes. Adding a service_name filter to a query scanning millions of spans can reduce execution time from seconds to milliseconds.

promql
# Good: filter before computation
sum(rate({otel_metric_name="dash0.spans", service_name="checkout-service"}[$__rate_interval]))

The most effective resource attributes for filtering:

  • service_name — almost always your first filter
  • k8s_namespace_name — Kubernetes namespace isolation
  • k8s_deployment_name — workload-level filtering
  • deployment_environment_name — separate production from staging
  • dash0_resource_name — works for all resource types

Note that in PromQL, dots in attribute names are replaced with underscores (e.g. k8s.deployment.name becomes k8s_deployment_name).
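The renaming is a straightforward character substitution; a minimal Python sketch (illustrating the convention, not Dash0's implementation) makes it concrete:

```python
def promql_label(otel_attribute: str) -> str:
    """Map an OpenTelemetry attribute name to its PromQL label form
    by replacing dots with underscores."""
    return otel_attribute.replace(".", "_")

print(promql_label("k8s.deployment.name"))          # k8s_deployment_name
print(promql_label("deployment.environment.name"))  # deployment_environment_name
```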

2. Use OpenTelemetry Attribute Filters

Beyond resource attributes, filtering on span and log attributes reduces the volume of data processed. For log queries, otel_log_severity_range is particularly powerful:

promql
# Only scan ERROR-level logs
sum by(k8s_deployment_name) (increase({otel_metric_name="dash0.logs", otel_log_severity_range="ERROR"}[$__interval])) > 0

For span queries, otel_span_status_code quickly isolates errors:

promql
sum by(service_name) (rate({otel_metric_name="dash0.spans", otel_span_status_code="ERROR"}[$__rate_interval]))

3. Align Range Selectors with Step Intervals

Use $__rate_interval for all rate() and increase() queries, and $__interval for aggregation functions like avg_over_time(). See Common Metric Query Issues for a detailed explanation of why misalignment causes missing data.
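To see why a selector narrower than the step drops data, here is a self-contained simulation (plain Python, with an illustrative 15-second scrape interval): each evaluation at time t only sees samples in (t - selector, t], so a 1-minute selector with a 5-minute step leaves 4-minute gaps between windows.

```python
scrape_interval = 15                              # seconds between samples
samples = list(range(0, 3600, scrape_interval))   # one hour of sample timestamps

def covered(step: int, selector: int) -> set:
    """Timestamps visible to a range query over one hour:
    each evaluation at t sees the window (t - selector, t]."""
    seen = set()
    for t in range(step, 3601, step):
        seen.update(s for s in samples if t - selector < s <= t)
    return seen

narrow = covered(step=300, selector=60)    # selector narrower than step: gaps
aligned = covered(step=300, selector=300)  # selector matches step: full coverage
print(len(narrow), len(aligned), len(samples))
```

The narrow variant sees only about a fifth of the samples, which is exactly the "missing data" symptom the linked article describes.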

4. Manage Metric Cardinality

Cardinality — the total number of unique time series — grows multiplicatively with each label. A metric with 5 services × 10 endpoints × 3 methods × 12 histogram buckets creates 1,800 time series from just four labels. High cardinality strains memory, slows queries, and increases costs.
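The multiplication in that example can be checked directly; a tiny Python sketch with the same illustrative label counts:

```python
from math import prod

# Distinct values per label (the example from the text)
label_value_counts = {
    "service": 5,
    "endpoint": 10,
    "http_method": 3,
    "histogram_bucket": 12,
}

total_series = prod(label_value_counts.values())
print(total_series)  # 1800 time series
```

Adding one more label with even 10 values would push this to 18,000 series, which is why every new label deserves scrutiny.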

Avoid high-cardinality labels:

Problematic label | Better alternative
user.id | user.tier (free/premium)
request.id | Remove entirely
k8s.pod.uid | k8s.deployment.name
net.sock.peer.addr | cloud.region

Use the Metric Explorer in Dash0 to view cardinality information at a glance — the cardinality score, count, and resource count columns help you quickly identify problematic metrics.

5. Materialize Frequently-Used Calculations

If you repeatedly query the same expensive aggregation, emit it as a pre-aggregated metric from your application. For latency percentiles you check constantly, emitting a histogram metric from your service will always outperform calculating histogram_quantile() over synthetic span duration data.
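As a conceptual sketch (plain Python, not the OpenTelemetry SDK's actual API), write-time bucketing looks like this; a percentile query then reads a handful of counters instead of rescanning every raw span:

```python
import bisect

# Hypothetical bucket boundaries in seconds (upper bounds, "le" semantics)
BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5]

class DurationHistogram:
    """Pre-aggregates request durations at write time so percentile
    queries never have to scan raw span data."""

    def __init__(self):
        self.counts = [0] * (len(BOUNDS) + 1)  # last slot is the overflow bucket
        self.total = 0

    def record(self, seconds: float) -> None:
        # bisect_left finds the first boundary >= seconds, i.e. the "le" bucket
        self.counts[bisect.bisect_left(BOUNDS, seconds)] += 1
        self.total += 1

h = DurationHistogram()
for duration in (0.004, 0.03, 0.03, 0.8):
    h.record(duration)
```

In production you would emit such a histogram through your metrics SDK; the point is that the expensive aggregation happens once per observation, not once per query.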

6. Choose Appropriate Time Ranges

Narrowing your query window directly improves performance for synthetic metrics:

  • Dashboards: Use relative time ranges ("Last 6 hours") rather than fixed ranges spanning weeks.
  • Alerts: Evaluate over short windows (1–5 minutes) where possible.
  • Investigations: Start narrow, then widen only if needed.

Data within the last 24 hours resides in fast local storage; older data requires object storage retrieval.

Expectations for Historical Span and Log Queries

For real-time monitoring and alerting, synthetic metrics perform well over short time ranges. A 5-minute rate() query filtered by service_name executes in milliseconds.

For historical analysis, emit pre-aggregated metrics:

promql
# Fast: pre-aggregated metric, 13-month retention
rate(http_server_request_duration_seconds_count{service_name="checkout"}[$__rate_interval])
# Slow over long ranges: synthetic metric scanning raw spans
rate({otel_metric_name="dash0.spans", service_name="checkout"}[$__rate_interval])

Retention Summary

Signal | Retention
Spans, logs, span events | 30 days
Pre-aggregated metrics | 13 months

If you need year-over-year comparisons or long-term SLO tracking, pre-aggregated metrics are your only option.

Summary

The most impactful optimizations are filtering by resource attributes (especially service_name), using proper interval variables ($__rate_interval), and choosing pre-aggregated metrics for historical analysis. Managing cardinality and keeping query time ranges appropriate for your use case further improve the experience.

For dashboards that need both real-time detail and historical context, consider a hybrid approach: synthetic metrics for recent data with tight filters, and pre-aggregated metrics for trend lines and long-term views.