Last updated: May 29, 2026
Understand the Dash0 Alerting Model
Reference guide for Dash0's alerting model, an extension of the Prometheus alerting model.
Understanding the underlying mechanics helps when writing advanced check rules, debugging unexpected firing behaviour, or importing existing Prometheus alerts.
Prometheus Foundation
Dash0 builds on the standard Prometheus alerting model, extending it with additional severity levels while maintaining full compatibility. In the standard Prometheus model, an alerting rule contains a PromQL expression. The rule fires if that expression returns any results during evaluation; there is no built-in concept of severity. Each distinct result represents a separate firing instance.
Dash0 check rules are fully compatible with this model. Any valid Prometheus alerting rule can be used in Dash0 without modification.
Failed Check Severity
In Prometheus, an alert is created when an alerting rule returns a value. Alerting rules have no built-in notion of severity; severity is usually expressed through labels, and notifications are routed on those labels in Alertmanager.
In Dash0, failed checks that are still ongoing can have one of two severities:
- DEGRADED
- CRITICAL
By default, a failed check has severity CRITICAL, but you can change that by using thresholds.
The $__threshold Extension
Dash0 extends the Prometheus model with an optional $__threshold symbol that enables dual-severity alerting. When used in a check rule expression, it lets you specify two named severity levels — one for degraded and one for critical — with separate numeric thresholds. You can configure either one or both.
sum(rate({otel_metric_name="http.server.errors", service_name="checkout"}[5m])) > $__threshold
If a check rule does not use $__threshold, it behaves exactly like a Prometheus alert: any non-empty result set produces a failed check. If both thresholds are configured, the higher value maps to critical and the lower to degraded. Expressing the same behavior in plain Prometheus would require two separate alerting rules, one per threshold.
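The mapping from a query result to a severity can be sketched as follows. This is an illustrative model only, not Dash0's implementation: the function name `classify` and the threshold values are hypothetical, and the sketch assumes a `>` comparison as in the example expression above.

```python
from typing import Optional


def classify(value: float,
             degraded: Optional[float] = None,
             critical: Optional[float] = None) -> Optional[str]:
    """Return the severity for one result value, or None if nothing is exceeded.

    Either threshold may be left unset, mirroring the "one or both" option.
    Assumption: thresholds are compared with `>`.
    """
    # Check the critical threshold first so the more severe level wins.
    if critical is not None and value > critical:
        return "CRITICAL"
    if degraded is not None and value > degraded:
        return "DEGRADED"
    return None


# Hypothetical thresholds: degraded above 5 errors/s, critical above 20 errors/s.
for sample in (3.0, 10.0, 25.0):
    print(sample, classify(sample, degraded=5.0, critical=20.0))
```

With both thresholds set, a value between them is reported as degraded and a value above the higher one as critical.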
Health Status on the Service Map
Failed checks directly affect service health visualization in the Dash0 service map, providing at-a-glance operational status. Each service monitored by Dash0 has one of three health states, determined by its associated failed checks:
- Gray — healthy, no active check failures
- Yellow — degraded threshold exceeded
- Red — critical threshold exceeded
A failed check colors a service when the query result includes that service's service_name label. Aggregating away the service_name (for example, with an unqualified sum) produces results that are not associated with any service.
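As a rough sketch of how failed checks could translate into a service's color, consider the model below. The data shape (a list of label dictionaries with a `severity` key) and the function name `service_health` are assumptions made for illustration. The sketch also assumes that the most severe failed check determines the color when a service has several.

```python
from typing import Dict, List


def service_health(service: str, failed_checks: List[Dict[str, str]]) -> str:
    """Return 'gray', 'yellow', or 'red' for one service."""
    # Only failed checks that carry this service's service_name label count.
    severities = {
        check["severity"]
        for check in failed_checks
        if check.get("service_name") == service
    }
    # Assumption: the most severe failed check decides the color.
    if "CRITICAL" in severities:
        return "red"
    if "DEGRADED" in severities:
        return "yellow"
    return "gray"


checks = [
    {"service_name": "checkout", "severity": "DEGRADED"},
    {"service_name": "cart", "severity": "CRITICAL"},
    # No service_name label: the query aggregated it away, so this
    # failed check is not associated with any service.
    {"severity": "CRITICAL"},
]
for name in ("checkout", "cart", "search"):
    print(name, service_health(name, checks))
```

The third entry shows why keeping service_name in the query result matters: without the label, the failed check cannot color any service.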
One Rule, Multiple Failed Checks
Understanding how check rules produce failed checks helps you design queries that generate the appropriate number of alerts for your use case. A single check rule can produce any number of simultaneously failed checks — one per distinct result returned by the expression. A rule with no by clause produces a single aggregated result.
A rule grouped by service_name or operation_name produces one failed check per unique value of that label, each independently tracked and colored.
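The "one failed check per distinct result" behavior can be modeled with a short sketch. The input shape (pairs of a label set and a value), the `>` comparison, and the function name `failed_checks_for` are assumptions for illustration, not a Dash0 API.

```python
from typing import Dict, List, Tuple

# A label set is stored as a sorted tuple of (name, value) pairs so it can key a dict.
LabelSet = Tuple[Tuple[str, str], ...]


def failed_checks_for(results: List[Tuple[Dict[str, str], float]],
                      threshold: float) -> Dict[LabelSet, float]:
    """Return one entry per distinct label set whose value exceeds the threshold."""
    failed: Dict[LabelSet, float] = {}
    for labels, value in results:
        if value > threshold:
            failed[tuple(sorted(labels.items()))] = value
    return failed


# Grouped by service_name: each service is evaluated and tracked separately.
grouped = [
    ({"service_name": "checkout"}, 12.0),
    ({"service_name": "cart"}, 3.0),
    ({"service_name": "search"}, 9.0),
]
# No by clause: a single aggregated result with an empty label set.
ungrouped = [({}, 24.0)]

print(len(failed_checks_for(grouped, threshold=5.0)))
print(len(failed_checks_for(ungrouped, threshold=5.0)))
```

In the grouped case, only the services above the threshold produce failed checks; in the ungrouped case there is at most one failed check, and it carries no labels.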
Finite State Machine
Grace periods control when checks fire and resolve, preventing alert noise from transient spikes and flapping metrics. Each failed check transitions through states based on the configured grace periods:
- Trigger grace period — how many consecutive evaluation intervals the expression must exceed the threshold before the check fires and notifications are sent. Specified as a multiplier of the evaluation interval (e.g., 2× a 1-minute interval = 2 minutes). Prevents noise from transient spikes.
- Keep-firing grace period — how many evaluation intervals the check remains in a degraded or critical state after the expression drops below the threshold. Also specified as a multiplier. Prevents flapping.
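The two grace periods can be pictured as a small state machine. The sketch below is a simplified model under stated assumptions: the state names `inactive`, `pending`, and `firing` are borrowed from Prometheus, the class name `CheckStateMachine` is hypothetical, and the exact boundary handling (whether a check fires on or after the Nth interval) may differ in Dash0. Here a check fires once the breach has lasted the full trigger grace period, and resolves once the expression has stayed below the threshold for the full keep-firing grace period.

```python
class CheckStateMachine:
    """Simplified model of one failed check with trigger and keep-firing grace periods."""

    def __init__(self, trigger: int, keep_firing: int) -> None:
        self.trigger = trigger          # grace period, in evaluation intervals
        self.keep_firing = keep_firing  # grace period, in evaluation intervals
        self.state = "inactive"
        self._above = 0  # consecutive evaluations above the threshold
        self._below = 0  # consecutive evaluations below the threshold

    def evaluate(self, exceeded: bool) -> str:
        """Feed one evaluation result and return the resulting state."""
        if exceeded:
            self._above += 1
            self._below = 0
            if self.state != "firing":
                # After the first breach, `trigger` more intervals must elapse.
                self.state = "firing" if self._above > self.trigger else "pending"
        else:
            self._below += 1
            self._above = 0
            if self.state == "firing":
                # Stay firing until the keep-firing grace period has elapsed.
                if self._below > self.keep_firing:
                    self.state = "inactive"
            else:
                # A pending check that recovers never fires.
                self.state = "inactive"
        return self.state


check = CheckStateMachine(trigger=2, keep_firing=1)
history = [check.evaluate(x) for x in (True, True, True, False, False, True)]
print(history)
```

With a 1-minute evaluation interval, `trigger=2` in this model means the check fires two minutes after the first breach, and a transient spike that recovers earlier never leaves the pending state.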
Enablement Conditions
Enablement conditions provide a way to gate failed checks based on additional criteria evaluated alongside the main query. If the enablement condition is not met, the failed check does not fire even if the main query exceeds its thresholds.
This is useful for failed checks that should only be active when the system has relevant traffic or context.
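A minimal sketch of the gating logic, assuming an error-rate check that should only be active when the service receives enough traffic. The function name `should_fail` and all parameter names and values are hypothetical; how Dash0 evaluates enablement conditions internally is not described here.

```python
def should_fail(error_rate: float, threshold: float,
                request_rate: float, min_request_rate: float) -> bool:
    """Return True only if the main query exceeds its threshold AND the
    enablement condition (enough traffic) is met."""
    enabled = request_rate > min_request_rate  # enablement condition
    exceeded = error_rate > threshold          # main query vs. threshold
    return enabled and exceeded


# High error rate but almost no traffic: the check stays quiet.
print(should_fail(error_rate=0.5, threshold=0.1, request_rate=0.2, min_request_rate=1.0))
# Same error rate with meaningful traffic: the check fails.
print(should_fail(error_rate=0.5, threshold=0.1, request_rate=50.0, min_request_rate=1.0))
```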
Further Reading
- About Alert Monitoring — Overview of Dash0's alerting capabilities and how check rules monitor your systems.
- Create Check Rules — Set up check rules to monitor metrics, logs, spans, and web events using PromQL expressions and threshold values.
- Investigate Failed Checks — Troubleshoot failed checks by exploring the underlying telemetry and identifying root causes.
- Send Check Rule Notifications — Configure notification channels to keep your team informed when checks fail.
- Route Check Rule Notifications — Use label-based routing to direct alerts to the right teams automatically.