Last updated: December 13, 2025

Infrastructure Monitoring with OpenTelemetry Host Metrics

When you're monitoring infrastructure with OpenTelemetry, the Host Metrics Receiver (hostmetrics) is one of the most relevant components to reach for.

It can fully replace traditional agents such as the Prometheus Node Exporter, collecting essential system metrics like CPU, memory, disk, and network usage directly from the machine where the Collector is running.

Because this receiver needs direct access to the underlying system, it's intended for Collectors deployed in agent mode, for example as a DaemonSet on Kubernetes nodes or as a service on a VM or bare-metal host, rather than as a centralized gateway.

In this guide, you'll learn how to configure it as a Node Exporter alternative for monitoring your server infrastructure.

Quick start: collecting host metrics

To see it in action, let's spin up a Docker Compose setup that uses the OpenTelemetry Collector to scrape your host's vitals and send that data directly to Prometheus.

First, create a docker-compose.yaml to orchestrate the services:

yaml
# docker-compose.yaml
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.140.0
    container_name: otelcol
    volumes:
      - ./otelcol.yaml:/etc/otelcol-contrib/config.yaml
      - /:/hostfs:ro
    restart: unless-stopped
    depends_on:
      - prometheus
  prometheus:
    image: prom/prometheus:v3.7.3
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=15d
      - --web.enable-lifecycle
      # Enable Prometheus to accept OTLP writes directly
      - --web.enable-otlp-receiver
    ports:
      - 9090:9090
volumes:
  prometheus_data:

There are two key details here:

  1. You must mount the host's root directory (/) to /hostfs inside the container. Without this, the Collector would only see the statistics of its own tiny container, not your actual server.

  2. The --web.enable-otlp-receiver flag enables Prometheus to expose an OTLP endpoint, allowing the Collector to send data to it over standard HTTP.

Next, create the otelcol.yaml configuration file that enables the hostmetrics receiver:

yaml
# otelcol.yaml
receivers:
  hostmetrics:
    # Tell the scrapers to look at the mounted volume, not the container's /
    root_path: /hostfs
    collection_interval: 1m # Default is 1m, adjust based on resolution needs
    scrapers:
      cpu:
      memory:
      load:
      network:
      disk:
      filesystem:
processors:
  batch:
exporters:
  otlphttp/prometheus:
    # Push metrics to Prometheus's OTLP receiver
    metrics_endpoint: http://prometheus:9090/api/v1/otlp/v1/metrics
    tls:
      insecure: true
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [otlphttp/prometheus]

The most important setting here is root_path: /hostfs. It explicitly tells the hostmetrics receiver to read metrics from the host filesystem mounted in the previous step, instead of the container's own filesystem.

Finally, create a minimal prometheus.yml to prevent Prometheus from scraping any other targets, including itself:

yaml
# prometheus.yml
global:
  scrape_interval: 10s

With no scrape configurations defined, Prometheus starts normally but does not collect its own metrics.

Now, run the command below to launch the services:

bash
docker compose up -d

After a minute, navigate to http://localhost:9090 in your browser. You should see metrics like system_cpu_time_seconds_total and system_memory_usage_bytes flowing in from your host.

Prometheus screenshot showing metric

Understanding the host metrics scrapers

The hostmetrics receiver only acts as a scheduler that coordinates when metrics are collected. The actual work is handled by scrapers, which are modular plugins that you enable individually in the scrapers section of your configuration.

If you're coming from the Prometheus ecosystem, these scrapers are roughly equivalent to Node Exporter collectors.

They emit metrics that generally follow OpenTelemetry Semantic Conventions, which can look quite different from the metric names and labels you're used to in Prometheus.

The table below shows how the most commonly used scrapers map to the Node Exporter collectors you may already be familiar with:

| OTel Scraper | Metric Name | Node Exporter Equivalent |
| --- | --- | --- |
| cpu | system.cpu.* | node_cpu_* |
| memory | system.memory.* | node_memory_* |
| load | system.cpu.load_average.1m | node_load1 |
| filesystem | system.filesystem.* | node_filesystem_* |
| disk | system.disk.* | node_disk_* |
| network | system.network.* | node_network_* |

While this covers the most commonly used scrapers, there are a few details worth noting:

  • The default memory scraper only reports physical RAM usage. If you're investigating performance issues related to memory pressure, you'll also need to enable the paging scraper to see swap usage and page faults.

  • Pay close attention to the difference between the processes (plural) and process (singular) scrapers:

    • processes is lightweight, as it simply counts the number of running or blocked processes on the host.
    • process is expensive, as it collects detailed CPU and memory metrics for every process running on the system. Enabling it without a strict allowlist is one of the fastest ways to blow up your metric cardinality (see the sketch after this list).
  • The filesystem and disk scrapers are easy to confuse. The former measures free space and tells you when the disk is full, while the latter measures IOPS and throughput to help you debug slow I/O performance. In practice, you'll usually need both.
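
For instance, if memory pressure is a concern on a host, you might enable the paging and processes scrapers alongside memory, while restricting the expensive process scraper to an explicit allowlist. The snippet below is a minimal sketch that assumes the containerized quick-start setup from earlier; the process names are placeholders for whatever actually runs on your hosts:

yaml
# otelcol.yaml (sketch)
receivers:
  hostmetrics:
    root_path: /hostfs
    scrapers:
      memory:
      paging: # swap usage and page faults
      processes: # cheap counts of running/blocked processes
      process:
        # Only collect per-process metrics for an explicit allowlist
        include:
          names: ["nginx", "postgres"] # placeholders; scope this to your workloads
          match_type: strict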

Configuring scrapers

In production environments, you need to be intentional about what each scraper collects to avoid flooding your backend with low-value or redundant data.

The most basic and important tuning lever is controlling which metric data points are emitted. Every scraper defines a set of default metrics, which are enabled out of the box, and optional metrics, which remain disabled unless you explicitly turn them on.

You control this behavior through the metrics block in the scraper's configuration:

yaml
# otelcol.yaml
receivers:
  hostmetrics:
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
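
The same mechanism works in the opposite direction: a default metric can be switched off when you don't need it. As a small sketch, if you only chart utilization percentages, you could disable the cumulative system.cpu.time counter that the cpu scraper emits by default:

yaml
# otelcol.yaml (sketch)
receivers:
  hostmetrics:
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
          system.cpu.time:
            enabled: false # drop the default cumulative counter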

Reducing noise with filters

Some of the more resource-intensive scrapers, such as disk, filesystem, network, and process, support include and exclude rules that limit data collection to what actually matters. These filters help you avoid collecting metrics from irrelevant devices or especially noisy interfaces.

Filtering rules can be defined using exact string matches (strict) or regular expressions (regexp), depending on how much flexibility you need.

For example, to prevent the network scraper from generating data for the local loopback or Docker bridge interfaces, you can exclude them by name:

yaml
# otelcol.yaml
receivers:
  hostmetrics:
    scrapers:
      network:
        exclude:
          interfaces: ["lo", "docker0"]
          match_type: strict

The filesystem scraper is another common source of noise, especially on systems with many virtual or container-related mounts. In most cases, you'll want to exclude filesystems like tmpfs or overlay:

yaml
# otelcol.yaml
receivers:
  hostmetrics:
    scrapers:
      filesystem:
        exclude_fs_types:
          fs_types: ["tmpfs", "autofs", "overlay"]
          match_type: strict
        exclude_mount_points:
          mount_points: ["/var/lib/docker/*"]
          match_type: regexp

This kind of filtering keeps your metrics focused, reduces cardinality, and lowers ingestion and storage costs without sacrificing visibility into the parts of the system that matter.

Optimizing metric collection intervals

Not all metrics need to be collected at the same frequency. Highly volatile resources like CPU and memory benefit from high-resolution sampling to capture short-lived spikes, while slower-moving signals such as filesystem usage can be polled far less often without losing useful context.

To support this, you can define named instances of the hostmetrics receiver, each with its own collection_interval and a tailored set of scrapers. This lets you balance visibility against overhead more precisely:

yaml
receivers:
  # High-resolution scraping for volatile metrics
  hostmetrics/fast:
    collection_interval: 10s
    scrapers:
      cpu:
      memory:
  # Lower-resolution scraping for stable metrics
  hostmetrics/slow:
    collection_interval: 1m
    scrapers:
      filesystem:
      disk:
service:
  pipelines:
    metrics:
      # Both instances feed into the same pipeline
      receivers: [hostmetrics/fast, hostmetrics/slow]

Note that if you run multiple hostmetrics receiver instances in the same Collector pipeline, they must all use the same root_path setting to ensure they read from the same host filesystem.
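
In the containerized setup from the quick start, that means repeating root_path: /hostfs on every instance, roughly like this:

yaml
# otelcol.yaml (sketch: both instances point at the same mounted host filesystem)
receivers:
  hostmetrics/fast:
    root_path: /hostfs
    # [...]
  hostmetrics/slow:
    root_path: /hostfs
    # [...]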

Silencing permission errors

The process scraper is especially sensitive to permissions because it attempts to read detailed information for every process on the system, including those owned by root or other users. In restricted container environments, this often results in a steady stream of "permission denied" log messages.

You can suppress these specific errors without losing visibility into the processes the Collector is actually allowed to inspect:

yaml
process:
  mute_process_user_error: true # Mute "user does not exist" errors
  mute_process_io_error: true # Mute "permission denied" on I/O stats

Enriching host metrics with resource attributes

One of the biggest gotchas with the hostmetrics receiver is that it emits effectively "naked" metrics. Out of the box, it does not attach essential resource metadata such as host.name, service.namespace, or cloud.region to the telemetry it produces.

Without these attributes, your backend receives a stream of CPU, memory, and disk metrics with no reliable way to tell which host, environment, or service they came from. The fix is to enrich the data before it leaves the Collector.

Processors like resourcedetection and k8sattributes can automatically query the underlying platform and cloud APIs to discover and attach the correct metadata at runtime.

yaml
# otelcol.yaml
processors:
  resourcedetection/docker:
    detectors: [env, docker]
    timeout: 2s
    override: false

For the docker detector to work, the Collector container needs access to the Docker socket, and the env detector reads attributes from the OTEL_RESOURCE_ATTRIBUTES environment variable:
yaml
# docker-compose.yaml
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.140.0
    container_name: otelcol
    volumes:
      # [...] existing mounts from the quick start stay in place
      # the `docker` detector requires the Docker socket to be mounted
      - /var/run/docker.sock:/var/run/docker.sock
    # the `env` detector reads from OTEL_RESOURCE_ATTRIBUTES
    environment:
      - OTEL_RESOURCE_ATTRIBUTES=service.name=host-metrics-demo,service.version=1.0.0,deployment.environment.name=production
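
Defining the processor isn't enough on its own; it also has to be referenced in the metrics pipeline. Here's a minimal sketch that extends the quick-start pipeline from earlier:

yaml
# otelcol.yaml (sketch)
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [resourcedetection/docker, batch]
      exporters: [otlphttp/prometheus]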

In your prometheus.yml, you'll want to promote key resource attributes to metric labels so they can be used for querying, grouping, and filtering:

yaml
# prometheus.yml
otlp:
  promote_resource_attributes:
    - service.name
    - host.name

By default, Prometheus translates promoted OTLP attribute names into its traditional label format, which means host.name becomes host_name as shown below:

You can now query Prometheus metrics using promoted OpenTelemetry resource attributes

For a deeper dive into managing and standardizing these attributes, see our resource processor guide.

Visualizing and alerting on host metrics

Once host metrics are flowing to your observability backend, you'll need to turn the raw time series into dashboards and alerts that help you spot problems before users do.

Since the hostmetrics receiver follows OpenTelemetry semantic conventions, the metrics map cleanly to the same concepts you're already used to from Node Exporter. The difference is mostly naming, not intent.

One straightforward way to build dashboards atop these metrics is with Perses, a CNCF-backed dashboard system designed to work well with a wide variety of data sources (including Prometheus).

One of Perses' biggest strengths is Dashboards-as-Code. Instead of clicking through a UI to build panels (which is hard to version control), you can define your views and panels through Go or CUE.

While we won't use that model in this guide, it's a powerful shift away from manually maintained, click-built dashboards.

Running Perses locally

To see Perses in action, add the following service to your docker-compose.yaml file:

yaml
# docker-compose.yaml
services:
  # [...]
  perses:
    image: persesdev/perses:latest
    container_name: perses
    restart: unless-stopped
    ports:
      - 8080:8080
    depends_on:
      - prometheus

Start Perses with:

bash
docker compose up -d perses

Then open http://localhost:8080 in your browser to access the Perses UI.

Add a new Project in Perses

Click the ADD PROJECT button to add a new Project, then switch to the Datasources tab on the resulting page to add a new data source.

Configure it to point at your running Prometheus instance, then click SAVE:

Configuring Prometheus data source

Creating your first dashboard

Return to the Dashboards tab and click ADD DASHBOARD. After naming the dashboard, you can begin adding panels.

As a simple starting point, create a panel that shows memory utilization for your host and apply thresholds to make it obvious when the system is under pressure.

promql
100 * avg by (host_name) (system_memory_utilization_ratio{state="used"})

Configuring memory utilization gauge in Perses

Once your first panel is in place, you can continue adding panels to cover other key resources. At a minimum, a useful host-level dashboard should let you answer these questions at a glance:

  • Is this machine CPU-bound?
  • Is it under memory pressure?
  • Is disk space running out?
  • Is disk, network, or I/O throughput becoming a bottleneck?

If you've used Node Exporter dashboards in the past, most of them can be adapted with minimal effort by translating metric names and labels to their OpenTelemetry equivalents.

Alerting on infrastructure symptoms

A good infrastructure alert signals a real risk to system health, fires early enough for someone to respond, and includes enough context to guide the next step in investigation.

Some of the most effective triggers include:

  • Sustained high CPU utilization over a meaningful window
  • Memory usage approaching exhaustion or increasing swap activity
  • Filesystems exceeding safe utilization thresholds
  • Disk I/O latency or queue depth increasing sharply
  • Network error rates or dropped packets

What you generally want to avoid are alerts on brief spikes or single data points. Infrastructure metrics naturally exhibit high short-term variability, so alerts should be based on sustained conditions and reasonable aggregation windows.
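
As an illustration, the sketch below shows what such rules could look like as a Prometheus rule file. It assumes the OTLP-translated metric names used earlier in this guide (including system.cpu.utilization being enabled and host.name being promoted), uses a hypothetical host-alerts.yml filename, and would still need to be mounted into the Prometheus container and referenced via rule_files in prometheus.yml:

yaml
# host-alerts.yml (sketch; reference it via `rule_files` in prometheus.yml)
groups:
  - name: host-health
    rules:
      - alert: HighMemoryUtilization
        # Same metric as the dashboard panel; only fires after 15 minutes of sustained pressure
        expr: avg by (host_name) (system_memory_utilization_ratio{state="used"}) > 0.9
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Memory on {{ $labels.host_name }} has been above 90% for 15 minutes"
      - alert: SustainedHighCpu
        # Little idle time left across all cores over the evaluation window
        expr: avg by (host_name) (system_cpu_utilization_ratio{state="idle"}) < 0.1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "CPU on {{ $labels.host_name }} has been above 90% utilization for 15 minutes"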

From infrastructure symptoms to finding the root cause

Infrastructure metrics are most effective when they aren't viewed in isolation. To find the root cause of an issue, they must be correlated with other signals such as traces and logs.

A CPU saturation alert becomes actionable only when you can immediately answer:

  • Which services are running on this host?
  • Which request, batch job, or background task triggered the spike?
  • Did this coincide with a deployment, a traffic surge, or a downstream failure?

At Dash0, this correlation is the default. Because the platform is OpenTelemetry-native, host metrics collected via OTLP are automatically linked to service-level traces and logs through shared resource attributes and exemplars.

Dash0 is also compatible with Perses dashboards and uses PromQL across metrics, logs, and traces, which means you can:

  • Send native OTLP data without translation layers
  • Directly import and export Perses dashboards as JSON
  • Pivot from an infrastructure symptom to the exact trace or log line involved, using the same query language and mental model

Dash0 host metrics dashboard compatible with Perses

This approach eliminates vendor lock-in since your telemetry, dashboards, and queries remain portable OpenTelemetry, Perses, and PromQL artifacts that can be reused outside Dash0 or integrated with other tools as your stack evolves.

The result is a significantly shorter path from "this host looks unhealthy" to "here is the culprit", without the need to switch tools or guess at relationships.

Final thoughts

The Host Metrics receiver is one of the core building blocks of an infrastructure monitoring setup with OpenTelemetry, as it provides the baseline visibility every system relies on.

But metrics alone only tell part of the story. The payoff comes from being able to instantly correlate a spike in CPU usage with the specific service, trace, or database query causing it.

If you'd like to see what that looks like in practice, consider signing up for a free Dash0 trial.

Authors
Ayooluwa Isaiah