Last updated: March 18, 2026
Optimize PromQL Query Performance
This guide explains how Dash0's metrics system works, why certain queries can be slow, and practical strategies to improve query performance — ordered from most to least impactful.
Understanding the Two Types of Metrics
Dash0 provides two fundamentally different types of metrics, each with distinct performance characteristics.
Pre-Computed Metrics: The Fast Path
When you send metrics directly to Dash0 using OpenTelemetry SDKs, the Prometheus receiver, or other metric-producing integrations, these values are pre-computed at the source. Querying them is fast because Dash0 simply retrieves already-calculated values from optimized storage. These metrics enjoy 13-month retention and are ideal for long-term trend analysis, capacity planning, and SLO tracking.
Synthetic Metrics: Flexibility with a Performance Tradeoff
Dash0's synthetic metrics, such as dash0.spans, dash0.logs, dash0.spans.duration, and dash0.span.events, work differently. Rather than pre-computing aggregations, Dash0 calculates these metrics on the fly at query time by scanning raw span and log data. This provides remarkable flexibility: you can filter, group, and aggregate by any attribute without defining metrics upfront.
The tradeoff is performance. Every synthetic metric query must scan the underlying raw data — longer time ranges require scanning more data. Queries over recent data (last 24 hours) typically perform well since this data resides in fast local storage. Older data lives in S3-backed storage, which adds retrieval latency. Additionally, synthetic metrics have a 30-day retention limit compared to 13 months for pre-computed metrics.
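To illustrate that flexibility, a query like the following computes an ad-hoc error breakdown directly from raw span data at query time; no metric had to be defined in advance. The http_route grouping attribute is illustrative — substitute any attribute your spans actually carry:

```promql
# Hypothetical ad-hoc query: rate of ERROR spans per route, computed
# from raw span data at query time
sum by (http_route) (
  rate({otel_metric_name="dash0.spans", otel_span_status_code="ERROR"}[5m])
)
```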
PromQL Query Types
PromQL supports two execution modes that affect both what data you get and how quickly you get it.
Instant Queries
An instant query evaluates your PromQL expression at one specific timestamp and returns a single result per matching time series. Instant queries are fast because they perform one evaluation regardless of how much historical data you're viewing.
```promql
# Evaluates once, returns one value per series
rate(http_requests_total[$__rate_interval])
```
Range Queries
A range query evaluates your expression at multiple timestamps across a time range, running many instant queries at regular intervals (the "step"). Performance scales with (end_time - start_time) / step — a 7-day query with 15-second steps requires 40,320 evaluations, while the same query with 5-minute steps requires only 2,016.
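As a sketch of that cost model, a range query runs the same expression once per step across the range:

```promql
# One evaluation per step across the range:
#   7 days / 15s step -> 40,320 evaluations
#   7 days / 5m step  ->  2,016 evaluations
sum(rate(http_requests_total[$__rate_interval]))
```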
The last_over_time() Alternative
When you need a single recent value with explicit control over the lookback window:
```promql
last_over_time(http_requests_total[5m])
```
Optimization Strategies
1. Filter by Resource Attributes First
This is the single most effective optimization. Dash0's storage is optimized for queries filtered by resource attributes. Adding a service_name filter to a query scanning millions of spans can reduce execution time from seconds to milliseconds.
```promql
# Good: filter before computation
sum(rate({otel_metric_name="dash0.spans", service_name="checkout-service"}[$__rate_interval]))
```
The most effective resource attributes for filtering:
- service_name — almost always your first filter
- k8s_namespace_name — Kubernetes namespace isolation
- k8s_deployment_name — workload-level filtering
- deployment_environment_name — separate production from staging
- dash0_resource_name — works for all resource types
Note that in PromQL, dots in attribute names are replaced with underscores (e.g. k8s.deployment.name → k8s_deployment_name).
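For example, filtering on the k8s.deployment.name resource attribute looks like this in PromQL (the deployment name is illustrative):

```promql
# k8s.deployment.name becomes k8s_deployment_name in PromQL
sum(rate({otel_metric_name="dash0.spans", k8s_deployment_name="checkout"}[$__rate_interval]))
```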
2. Use OpenTelemetry Attribute Filters
Beyond resource attributes, filtering on span and log attributes reduces the volume of data processed. For log queries, otel_log_severity_range is particularly powerful:
```promql
# Only scan ERROR-level logs
sum by(k8s_deployment_name) (increase({otel_metric_name="dash0.logs", otel_log_severity_range="ERROR"}[$__interval])) > 0
```
For span queries, otel_span_status_code quickly isolates errors:
```promql
sum by(service_name) (rate({otel_metric_name="dash0.spans", otel_span_status_code="ERROR"}[$__rate_interval]))
```
3. Align Range Selectors with Step Intervals
Use $__rate_interval for all rate() and increase() queries, and $__interval for aggregation functions like avg_over_time(). See Common Metric Query Issues for a detailed explanation of why misalignment causes missing data.
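As a sketch of the difference:

```promql
# Good: $__rate_interval tracks the dashboard's step and scrape interval
rate(http_requests_total[$__rate_interval])

# Risky: a fixed 1m window can be shorter than the step at wide zoom
# levels, leaving gaps between evaluations
rate(http_requests_total[1m])
```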
4. Manage Metric Cardinality
Cardinality — the total number of unique time series — grows multiplicatively with each label. A metric with 5 services × 10 endpoints × 3 methods × 12 histogram buckets creates 1,800 time series from just four labels. High cardinality strains memory, slows queries, and increases costs.
Avoid high-cardinality labels:
| Problematic label | Better alternative |
|---|---|
| user.id | user.tier (free/premium) |
| request.id | Remove entirely |
| k8s.pod.uid | k8s.deployment.name |
| net.sock.peer.addr | cloud.region |
Use the Metric Explorer in Dash0 to view cardinality information at a glance — the cardinality score, count, and resource count columns help you quickly identify problematic metrics.
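Beyond the Metric Explorer, you can probe cardinality ad hoc with plain PromQL. The metric name and the endpoint label below are illustrative:

```promql
# Total number of unique series behind a metric
count(http_requests_total)

# Series count broken down by a suspect label
count by (endpoint) (http_requests_total)
```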
5. Materialize Frequently-Used Calculations
If you repeatedly query the same expensive aggregation, emit it as a pre-aggregated metric from your application. For latency percentiles you check constantly, emitting a histogram metric from your service will always outperform calculating histogram_quantile() over synthetic span duration data.
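The contrast looks roughly like this, assuming your service emits the OpenTelemetry http.server.request.duration histogram; the second query is a sketch of the equivalent span-based calculation, and whether dash0.spans.duration exposes le buckets in exactly this shape is an assumption to verify against your data:

```promql
# Fast: quantile over a histogram the service already emits
histogram_quantile(0.95,
  sum by (le) (rate(http_server_request_duration_seconds_bucket{service_name="checkout"}[$__rate_interval])))

# Slow over long ranges: the same quantile recomputed from raw span
# durations on every query
histogram_quantile(0.95,
  sum by (le) (rate({otel_metric_name="dash0.spans.duration", service_name="checkout"}[$__rate_interval])))
```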
6. Choose Appropriate Time Ranges
Narrowing your query window directly improves performance for synthetic metrics:
- Dashboards: Use relative time ranges ("Last 6 hours") rather than fixed ranges spanning weeks.
- Alerts: Evaluate over short windows (1–5 minutes) where possible.
- Investigations: Start narrow, then widen only if needed.
Data within the last 24 hours resides in fast local storage; older data requires object storage retrieval.
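For example, an alert expression over a short window keeps every evaluation within fast local storage (the threshold and attributes are illustrative):

```promql
# Evaluates over only the last 5 minutes of span data
sum by (service_name) (
  rate({otel_metric_name="dash0.spans", otel_span_status_code="ERROR"}[5m])
) > 0.05
```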
Expectations for Historical Span and Log Queries
For real-time monitoring and alerting, synthetic metrics perform very well over short time ranges. A 5-minute rate() query filtered by service_name executes in milliseconds.
For historical analysis, emit pre-aggregated metrics:
```promql
# Fast: pre-aggregated metric, 13-month retention
rate(http_server_request_duration_seconds_count{service_name="checkout"}[$__rate_interval])

# Slow over long ranges: synthetic metric scanning raw spans
rate({otel_metric_name="dash0.spans", service_name="checkout"}[$__rate_interval])
```
Retention Summary
| Signal | Retention |
|---|---|
| Spans, logs, span events | 30 days |
| Pre-aggregated metrics | 13 months |
If you need year-over-year comparisons or long-term SLO tracking, pre-aggregated metrics are your only option.
Summary
The most impactful optimizations are filtering by resource attributes (especially service_name), using proper interval variables ($__rate_interval), and choosing pre-aggregated metrics for historical analysis. Managing cardinality and keeping query time ranges appropriate for your use case further improve the experience.
For dashboards that need both real-time detail and historical context, consider a hybrid approach: synthetic metrics for recent data with tight filters, and pre-aggregated metrics for trend lines and long-term views.