
Common Metric Query Gotchas

This guide covers common PromQL pitfalls that can lead to inaccurate dashboards and missed alerts in Prometheus. From misaligned time ranges to aggregation errors, these subtle mistakes can significantly impact your monitoring reliability and incident response. Learn how to avoid the most frequent PromQL traps with concrete examples and actionable solutions for building better alerts and dashboards.

Range misaligned to step sizes

One of the most subtle and common mistakes in PromQL is using a range selector that doesn't match your visualization or evaluation interval. This creates a dangerous situation where your charts and alerts don't reflect the actual behavior you're trying to monitor.

The Problem

When you query with a range selector like sum(increase({otel_metric_name="dash0.spans"}[1m])), you're calculating the increase over a 1-minute window. However, if your chart's step size (the interval between data points) is set to 2 minutes, the query is evaluated every 2 minutes, but each evaluation only looks at the most recent 1 minute of data.

This means you're missing half of the data. If you have a traffic spike that lasts 1 minute but occurs in the middle of a 2-minute evaluation window, you might only capture a fraction of it or miss it entirely.
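The arithmetic behind "missing half of the data" can be sketched in a few lines of Python. This is only an illustration of the idea, not how any query engine is implemented; the function names are hypothetical, and each evaluation is assumed to read exactly the `range_s` seconds that end at its timestamp.

```python
def lookback_windows(range_s: int, step_s: int, end_s: int) -> list[tuple[int, int]]:
    """Return the (start, end) seconds each evaluation reads.

    Assumes evaluations happen at step_s, 2 * step_s, ... up to end_s,
    and that each one looks back exactly range_s seconds.
    """
    return [(t - range_s, t) for t in range(step_s, end_s + 1, step_s)]


def covered_fraction(range_s: int, step_s: int) -> float:
    """Fraction of the timeline that falls inside at least one lookback window."""
    if range_s <= 0 or step_s <= 0:
        raise ValueError("range and step must be positive")
    # A range longer than the step overlaps with its neighbor but cannot cover more than 100%.
    return min(range_s, step_s) / step_s


# A 1m range evaluated every 2m: every other minute is never read.
windows = lookback_windows(range_s=60, step_s=120, end_s=360)
fraction = covered_fraction(range_s=60, step_s=120)
print(windows)   # [(60, 120), (180, 240), (300, 360)]
print(fraction)  # 0.5
```

The gaps between the windows (seconds 0 to 60, 120 to 180, and 240 to 300 here) are exactly where a short spike can go unnoticed.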

Concrete Example

Consider these underlying data points powered by Dash0’s synthetic dash0.spans metric. In this example we assume a step size of 2 minutes.

The following example illustrates a metric query that can yield surprising and incorrect results. The range covers only half of the step size, so part of the time between minutes two and four is never read by any evaluation. Spikes that occur in that gap wouldn’t appear in any data point. Notice how the range selector is explicitly set to 1m.

Let’s look at a query where the step size and range are aligned. This query explicitly sets the range to 2m, and hence, all the underlying data is covered by the query evaluation.
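To make the difference between the 1m and 2m ranges tangible, here is a small simulation. The counter values are made up for illustration and are not real dash0.spans data, and `simple_increase` is a deliberately simplified stand-in for increase(): it subtracts the first sample in the window from the last one and ignores the extrapolation and counter-reset handling that Prometheus performs.

```python
# Hypothetical cumulative counter, sampled once per minute (minute -> value).
# The jump between minutes 2 and 3 is the traffic spike.
counter = {0: 0, 1: 10, 2: 20, 3: 520, 4: 530}


def simple_increase(samples: dict[int, int], eval_minute: int, range_minutes: int) -> int:
    """Simplified increase(): last sample minus the sample range_minutes earlier."""
    return samples[eval_minute] - samples[eval_minute - range_minutes]


def evaluate(samples: dict[int, int], range_minutes: int, step_minutes: int, end_minute: int) -> dict[int, int]:
    """Evaluate the simplified increase at every step up to end_minute."""
    return {
        t: simple_increase(samples, t, range_minutes)
        for t in range(step_minutes, end_minute + 1, step_minutes)
    }


misaligned = evaluate(counter, range_minutes=1, step_minutes=2, end_minute=4)
aligned = evaluate(counter, range_minutes=2, step_minutes=2, end_minute=4)

print(misaligned)  # {2: 10, 4: 10}  -> the spike is invisible
print(aligned)     # {2: 20, 4: 510} -> the spike shows up at minute 4
```

With the 1m range, the data points sum to 20 even though the counter grew by 530. With the 2m range, the data points account for the full increase.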

However, a fixed range only stays aligned as long as the step size does not change, and the step size is dynamic. The Dash0 UI chooses it at render time based on the selected time range and the space available for charts. For example, it doesn’t make sense to query with a step size that results in 2000 data points when only 500 horizontal pixels are available. This would leave only 0.25 pixels per data point for rendering. To solve this, you can use the $__interval and $__rate_interval variables instead of the explicit 2m in the range selector. Both of these variables will remain aligned as closely as possible to the step size.
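As a rough mental model, a dynamic step size and the variable substitution could look like the sketch below. This is an assumption made for illustration, not Dash0's actual algorithm: the one-data-point-per-pixel rule, the function names, and the plain string replacement are all hypothetical.

```python
import math


def pick_step_seconds(time_range_s: int, width_px: int, min_step_s: int = 1) -> int:
    """Pick a step so that there is at most one data point per horizontal pixel.

    Hypothetical heuristic; min_step_s models a configured minimum step size.
    """
    if time_range_s <= 0 or width_px <= 0:
        raise ValueError("time range and width must be positive")
    return max(min_step_s, math.ceil(time_range_s / width_px))


def render_query(template: str, step_s: int) -> str:
    """Substitute the step for $__interval so the range always matches the step."""
    return template.replace("$__interval", f"{step_s}s")


template = 'sum(increase({otel_metric_name="dash0.spans"}[$__interval]))'

# A 6-hour time range on a 500-pixel-wide chart.
step = pick_step_seconds(time_range_s=6 * 3600, width_px=500)
print(step)                          # 44
print(render_query(template, step))  # ...[44s]))
```

Because the range is derived from the step on every render, zooming in or resizing the chart cannot reintroduce the gap that a hard-coded 1m or 2m range would cause.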

Info: Time series chart tooltips communicate the actual step size through the timestamps they present. Within dashboarding, you can even configure a minimum step size.

Impact on Alerting

This issue is even more critical for alerts. If your alert rule uses a range of 30s and is evaluated every 1m, every evaluation skips 30 seconds of data, so half of your data never reaches the alert. The solution is simple: make sure the range is equal to or larger than the evaluation interval. See the following screenshot for a correctly configured example.
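The rule of thumb can also be checked mechanically, for example in a lint step for alert definitions. The sketch below is a minimal example under stated assumptions: the helper names are hypothetical, and the duration parser only handles single-unit values such as 30s, 1m, or 2h rather than the full PromQL duration syntax.

```python
import re

# Seconds per unit for the simplified duration format handled here.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Parse a single-unit duration such as '30s' or '1m' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def uncovered_seconds(range_selector: str, evaluation_interval: str) -> int:
    """Seconds between two evaluations that no evaluation ever reads."""
    gap = parse_duration(evaluation_interval) - parse_duration(range_selector)
    return max(0, gap)


print(uncovered_seconds("30s", "1m"))  # 30 -> half of every minute is skipped
print(uncovered_seconds("1m", "1m"))   # 0  -> range matches the interval
print(uncovered_seconds("5m", "1m"))   # 0  -> overlapping windows, nothing skipped
```

Any result above zero means the rule has a blind spot between evaluations.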

Last updated: December 22, 2025