Last updated: September 24, 2025
Mastering the OpenTelemetry Journald Receiver
If you're running applications on a modern Linux system,
the systemd journal
is probably doing more work behind the scenes than you realize. It serves as the
central log service, capturing everything from kernel messages and boot-time
errors to the stdout
and stderr
of all your systemd
services.
The OpenTelemetry Collector's journald receiver provides a direct pipeline into this valuable stream of system and application telemetry.
The official documentation covers configuration, but it often leaves out the bigger picture: why this receiver matters and how to make the most of it. That's what this guide is for. We'll look at real-world scenarios, common permission pitfalls, and ways to turn raw journal entries into structured, actionable observability data.
Let's dive in!
What is systemd-journald
and why use this receiver?
Before looking at configuration, it helps to understand what sets
systemd-journald
apart from earlier logging systems.
Unlike plain-text log files such as /var/log/syslog
, journald
stores events
in a structured binary format. Each entry is a collection of key–value pairs
that capture both the message and its surrounding context.
By default, journald
enriches every log entry with metadata such as:
- _SYSTEMD_UNIT: The service or systemd target that generated the log.
- _PID: The process ID of the logging process.
- _UID: The user ID of the logging process.
- _HOSTNAME: The host where the log originated.
- PRIORITY: The standard syslog severity level (e.g., 3 for error, 6 for info).
- And many more.
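For example, you can inspect these fields for yourself by asking journalctl for JSON output:

```bash
# Print the most recent journal entry with all of its structured fields
journalctl -n 1 -o json-pretty
```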
This built-in metadata makes journald
a powerful source of observability data.
Instead of parsing unstructured text and trying to extract meaning, you start
with structured context out of the box.
To make the most of it, though, you still need to align these fields with the OpenTelemetry log data model.
By applying transformations at ingestion, you can normalize the data, promote key fields into top-level attributes, and ensure your logs integrate cleanly with traces and metrics.
In short, if you're running on Linux, the journald
receiver is the most
direct way to get logs into your OpenTelemetry pipeline.
Quick start: tailing your system logs
Let's start by getting your systemd logs flowing through the OpenTelemetry
Collector. Tailing recent journal entries is the quickest way to confirm that
the journald receiver is working.
First, create a minimal otelcol.yaml
for your Collector:
```yaml
# otelcol.yaml
receivers:
  journald:
    directory: /var/log/journal # defaults to /run/log/journal or /run/journal

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [journald]
      exporters: [debug]
```
This configuration wires the journald
receiver directly to a
debug exporter.
With no filters, every log entry available to the Collector process is ingested,
and because the receiver defaults to "tail mode", you'll only see new entries as
they arrive.
The debug
exporter prints them to stdout
with full verbosity, which means
you'll get the raw message as well as all the structured metadata
(_SYSTEMD_UNIT
, _PID
, PRIORITY
, etc.).
To run the journald
receiver in Docker, you need to address two main
challenges: the
official Collector image
lacks the necessary journalctl
binary, and the container needs access to the
host machine's journal logs. This involves creating a custom Docker image and
running it with the correct volume mounts.
Here's a Dockerfile
you can use:
```Dockerfile
FROM debian:13-slim

ARG OTEL_VERSION=0.135.0

RUN apt-get update && \
    apt-get install -y systemd wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN wget "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${OTEL_VERSION}/otelcol-contrib_${OTEL_VERSION}_linux_amd64.tar.gz" && \
    tar -xzf "otelcol-contrib_${OTEL_VERSION}_linux_amd64.tar.gz" && \
    mv "otelcol-contrib" /usr/local/bin/otelcol-contrib && \
    rm "otelcol-contrib_${OTEL_VERSION}_linux_amd64.tar.gz"

RUN chmod +x /usr/local/bin/otelcol-contrib

RUN groupadd --system --gid 10001 otel && \
    useradd --system --uid 10001 --gid otel otel

RUN usermod -aG systemd-journal otel

USER otel

ENTRYPOINT ["/usr/local/bin/otelcol-contrib"]
CMD ["--config", "/etc/otelcol-contrib/config.yaml"]
```
This Dockerfile
builds a container image that runs the OpenTelemetry Collector with access to systemd journal logs. It installs the necessary dependencies, downloads the requested version of otelcol-contrib
, and places it in /usr/local/bin
with the correct permissions.
To run securely, the image creates a dedicated non-root otel
user, adds it to the systemd-journal
group so it can read host logs, and switches to that user. The container starts by executing the Collector binary and expects a configuration file to be mounted at /etc/otelcol-contrib/config.yaml
.
When you run the container, you must mount your Collector configuration file and
the host's journal directory into the container so that the journald
receiver
can access it.
Here's a docker-compose.yml
file that does just that:
```yaml
# docker-compose.yml
services:
  otelcol:
    build: .
    container_name: otelcol
    restart: unless-stopped
    volumes:
      - ./otelcol.yaml:/etc/otelcol-contrib/config.yaml
      - /var/log/journal:/var/log/journal:ro
    command: ["--config=/etc/otelcol-contrib/config.yaml"]
```
With your Dockerfile
, otelcol.yaml
, and docker-compose.yml
files in the
same directory, you can start the Collector with a single command:
```bash
docker compose up -d
```
This basic pipeline confirms that the Collector has permission to read the journal and lets you inspect what data is available before you start filtering, enriching, or forwarding logs to a backend.
Once the Collector is up and running, you can check the logs with:
```bash
docker compose logs otelcol -f
```
You should immediately see logs from your system printed to the console. Here's what a single record looks like:
```text
LogRecord #0
ObservedTimestamp: 2025-09-24 08:13:57.494266297 +0000 UTC
Timestamp: 2025-09-24 08:12:37.178081 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Map({"MESSAGE":"Received disconnect from 180.101.88.228 port 11349:11: [preauth]","PRIORITY":"6","SYSLOG_IDENTIFIER":"sshd","_BOOT_ID":"0fb705b9f6e34383ab5dcf01f01cc301","_CAP_EFFECTIVE":"0","_COMM":"cat","_GID":"1000","_HOSTNAME":"falcon","_MACHINE_ID":"4a3dc42bf0564d50807d1553f485552a","_PID":"19983","_RUNTIME_SCOPE":"system","_STREAM_ID":"6073693a010545748e7bb93cf40d290e","_TRANSPORT":"stdout","_UID":"1000","__CURSOR":"s=e4e334c7ca514019a5be6442d7ecd6f9;i=e72f;b=0fb705b9f6e34383ab5dcf01f01cc301;m=2eab6a9e6e;t=63f879d94dee1;x=bfb32709d845fa9e","__MONOTONIC_TIMESTAMP":"200444386926","__SEQNUM":"59183","__SEQNUM_ID":"e4e334c7ca514019a5be6442d7ecd6f9"})
Trace ID:
Span ID:
Flags: 0
```
This output shows that the receiver is working correctly. By default, it has
captured a complete journal entry and placed all of its structured fields as
key-value pairs inside a Map
within the log's Body
.
You can see useful context directly in the map, such as the process that wrote the log (SYSLOG_IDENTIFIER: sshd) and the actual log line (MESSAGE: Received disconnect from 180.101.88.228 port 11349:11: [preauth]).
One thing you'll notice, though, is that all of these fields are sitting inside
the Body
map. The top-level Attributes
field is missing, and the
SeverityNumber
is unset.
In the next section, we'll look at how to further refine this data so that it fully complies with the OpenTelemetry log data model and semantic conventions.
Transforming logs with operators
The journald
receiver's main role is to get raw log entries from the system
journal into the Collector pipeline. These entries often need further
parsing and restructuring to fully align with the OpenTelemetry model.
To handle this, the receiver supports Stanza operators that can transform logs at ingestion time. Operators can move fields, flatten maps, parse messages, or drop unneeded data before the logs move downstream.
From the previous section, you saw that the structured journald
metadata is
all buried inside the Body
map. To make this data usable, the first step is to
move it into Attributes
:
```yaml
receivers:
  journald:
    directory: /var/log/journal
    operators:
      - type: move
        from: body
        to: attributes["body"]
```
Since move
cannot write directly to the attributes
root, we place the
content under the body
subfield. At this stage, everything sits under
Attributes.body
:
```text
ObservedTimestamp: 2025-09-24 08:20:56.515103556 +0000 UTC
Timestamp: 2025-09-24 08:10:47.894808 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Empty()
Attributes:
     -> body: Map({"MESSAGE":"Received disconnect from 180.101.88.228 port 11349:11: [preauth]","PRIORITY":"6","SYSLOG_IDENTIFIER":"sshd","_BOOT_ID":"0fb705b9f6e34383ab5dcf01f01cc301","_COMM":"cat","_GID":"1000","_HOSTNAME":"falcon","_MACHINE_ID":"4a3dc42bf0564d50807d1553f485552a","_PID":"19616","_RUNTIME_SCOPE":"system","_STREAM_ID":"f4dc341a347d44d58c3cee95fa362d92","_TRANSPORT":"stdout","_UID":"1000","__CURSOR":"s=e4e334c7ca514019a5be6442d7ecd6f9;i=e707;b=0fb705b9f6e34383ab5dcf01f01cc301;m=2ea480e314;t=63f8797115718;x=67d491c4912333da","__MONOTONIC_TIMESTAMP":"200328405780","__SEQNUM":"59143","__SEQNUM_ID":"e4e334c7ca514019a5be6442d7ecd6f9"})
Trace ID:
Span ID:
Flags: 0
```
Next,
flatten
the Attributes.body
map so that each journal field becomes a direct child of
Attributes
:
```yaml
receivers:
  journald:
    directory: /var/log/journal
    operators:
      - type: move
        from: body
        to: attributes["body"]
      - type: flatten
        field: attributes["body"]
```
The result is that all the journal fields are now directly nested under
Attributes
:
```text
ObservedTimestamp: 2025-09-24 08:22:59.057886367 +0000 UTC
Timestamp: 2025-09-24 08:12:37.178081 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Empty()
Attributes:
     -> _RUNTIME_SCOPE: Str(system)
     -> __CURSOR: Str(s=e4e334c7ca514019a5be6442d7ecd6f9;i=e72f;b=0fb705b9f6e34383ab5dcf01f01cc301;m=2eab6a9e6e;t=63f879d94dee1;x=bfb32709d845fa9e)
     -> PRIORITY: Str(6)
     -> __SEQNUM_ID: Str(e4e334c7ca514019a5be6442d7ecd6f9)
     -> MESSAGE: Str(Received disconnect from 180.101.88.228 port 11349:11: [preauth])
     -> _MACHINE_ID: Str(4a3dc42bf0564d50807d1553f485552a)
     -> __MONOTONIC_TIMESTAMP: Str(200444386926)
     -> _UID: Str(1000)
     -> __SEQNUM: Str(59183)
     -> _COMM: Str(cat)
     -> _TRANSPORT: Str(stdout)
     -> _CAP_EFFECTIVE: Str(0)
     -> _GID: Str(1000)
     -> SYSLOG_IDENTIFIER: Str(sshd)
     -> _BOOT_ID: Str(0fb705b9f6e34383ab5dcf01f01cc301)
     -> _PID: Str(19983)
     -> _STREAM_ID: Str(6073693a010545748e7bb93cf40d290e)
     -> _HOSTNAME: Str(falcon)
Trace ID:
Span ID:
Flags: 0
```
The Body
field is still empty, though, which isn't very useful if you want the
actual log message visible in queries. To fix this, move the MESSAGE
field
from Attributes
back into the Body
:
```yaml
receivers:
  journald:
    directory: /var/log/journal
    operators:
      - type: move
        from: body
        to: attributes["body"]
      - type: flatten
        field: attributes["body"]
      - type: move
        from: attributes["MESSAGE"]
        to: body
```
At this point the Body
contains the original log line, while all the
supporting metadata remains in Attributes
:
```text
ObservedTimestamp: 2025-09-24 08:24:39.676402302 +0000 UTC
Timestamp: 2025-09-24 08:12:37.178081 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(Received disconnect from 180.101.88.228 port 11349:11: [preauth])
. . .
```
You can go further by using the
regex_parser operator
to pull out the client IP and port from the SSH disconnect message and add them
to the Attributes
:
```yaml
receivers:
  journald:
    directory: /var/log/journal
    operators:
      - type: move
        from: body
        to: attributes["body"]
      - type: flatten
        field: attributes["body"]
      - type: move
        from: attributes["MESSAGE"]
        to: body
      - type: regex_parser
        parse_from: body
        regex:
          'Received disconnect from (?P<client_address>[\d.]+) port (?P<client_port>\d+)'
```
You will observe the client_address
and client_port
fields in the
Attributes
field as follows:
```text
Attributes:
     -> client_address: Str(180.101.88.228)
     -> client_port: Str(11349)
. . .
```
From this point, you can continue using other operators or the OpenTelemetry Transform Language (OTTL) for more advanced transformations. Common next steps include:
- Rename client_address to client.address and client_port to client.port to conform to OpenTelemetry semantic conventions.
- Map the numeric syslog PRIORITY to proper OpenTelemetry severity fields.
- Map the host and process context to their respective resource attributes, then drop noisy keys you no longer need.
You can see a full, practical example of these transformations here; a rough sketch of the first two steps is shown below.
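Here's how those first two steps might look with the transform processor and OTTL. Treat this as a sketch: the processor name and the specific PRIORITY-to-severity mappings are assumptions to adapt to your own pipeline.

```yaml
processors:
  transform/journald:
    log_statements:
      - context: log
        statements:
          # Rename to semantic-convention names (set is a no-op if the source attribute is absent)
          - set(attributes["client.address"], attributes["client_address"])
          - delete_key(attributes, "client_address")
          - set(attributes["client.port"], attributes["client_port"])
          - delete_key(attributes, "client_port")
          # Map the syslog PRIORITY field to OpenTelemetry severity
          - set(severity_number, SEVERITY_NUMBER_ERROR) where attributes["PRIORITY"] == "3"
          - set(severity_text, "ERROR") where attributes["PRIORITY"] == "3"
          - set(severity_number, SEVERITY_NUMBER_INFO) where attributes["PRIORITY"] == "6"
          - set(severity_text, "INFO") where attributes["PRIORITY"] == "6"
```

Remember to add transform/journald to the processors list of your logs pipeline so the statements actually run.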
Configuring the journald
receiver
The journald
receiver is designed to work well out of the box for common use
cases, but knowing its defaults helps you avoid surprises and ensures you're
collecting the right data consistently.
Choosing a starting position
By default, the start_at
parameter is set to end
. When the Collector starts,
the receiver only reads new logs written to the journal from that moment onward.
Historical logs already in the journal are skipped.
If you want to ingest all existing logs on the first run, change this setting to
beginning
:
```yaml
receivers:
  journald:
    start_at: beginning
```
Enabling cursor persistence
The receiver tracks its place in the journal with a cursor. By default, this
cursor is stored only in memory, so if the process restarts, the receiver loses
its position and resumes at the end of the journal (assuming start_at is set to
end). Any logs written to the journal while the Collector was down are never
ingested.
In production, it's best to configure a storage extension like
file_storage
.
This ensures the cursor is written to disk and survives restarts, allowing the
receiver to pick up exactly where it left off:
```yaml
receivers:
  journald:
    storage: file_storage/journald

extensions:
  file_storage/journald:
    directory: .

service:
  extensions: [file_storage/journald]
```
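If you run the Collector with the Docker setup from earlier, the cursor directory also needs to outlive the container. Here's a sketch, assuming you point the extension's directory at /var/lib/otelcol (a hypothetical path) and keep it writable for the non-root otel user from the Dockerfile:

```yaml
# docker-compose.yml (excerpt)
services:
  otelcol:
    volumes:
      - ./otelcol.yaml:/etc/otelcol-contrib/config.yaml
      - /var/log/journal:/var/log/journal:ro
      # Persists the journald cursor across restarts; make sure the
      # mounted path is writable by the otel user (UID 10001).
      - otelcol-storage:/var/lib/otelcol

volumes:
  otelcol-storage:
```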
Understanding default filters
By default, journalctl
filters such as units
, matches
, and identifiers
are unset. With no filters, the receiver collects logs from all systemd units,
which is rarely what you want in practice.
The only filter applied by default is priority
, which is set to info
to
exclude lower-level logs like debug
messages.
```yaml
receivers:
  journald:
    priority: info # the default
```
Let's look at filtering in more detail next.
Filtering journald logs
Tailing the entire journal is rarely useful and often wasteful. The journald
receiver provides the ability to query the journal for specific logs before they
ever enter the pipeline. This approach is far more efficient than ingesting
everything and
filtering later with a processor.
Behind the scenes, the receiver builds a journalctl
command. Each
configuration option maps directly to a journalctl
flag, giving you
fine-grained control over what gets collected.
1. Filtering by service
The most common use case is targeting logs from specific services you're
responsible for. The units
parameter accepts a list of systemd units to
monitor.
For example, you can collect logs only from nginx.service
and docker.service
with the following configuration:
```yaml
receivers:
  journald:
    units:
      - nginx.service
      - docker.service
```
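Under the hood, this is roughly equivalent to a journalctl query like the one below; treat it as an approximation, since the receiver also adds output and follow flags internally, and the default priority filter of info still applies:

```bash
# Approximate journalctl query for the configuration above
journalctl --unit=nginx.service --unit=docker.service --priority=info
```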
2. Filtering by priority
In many cases you only want to see problems rather than routine informational messages. The priority parameter lets you narrow collection to a chosen severity level and everything above it.
For example, setting the priority to warning
ensures that only warnings,
errors, critical alerts, and emergency messages are pulled into the pipeline:
```yaml
receivers:
  journald:
    priority: warning
```
3. Filtering with grep
The grep
option lets you narrow results to only those log entries whose
MESSAGE
field matches a given regular expression pattern. This is useful for
quickly isolating events that contain specific keywords.
For example, to capture only logs where the message contains the text
OOMKilled
, use:
```yaml
receivers:
  journald:
    grep: OOMKilled
```
This translates into a journalctl -g OOMKilled
command behind the scenes. You
can supply any valid regular expression, which makes it easy to search for
patterns like error codes, substrings of log messages, or application-specific
markers.
4. Filtering with dmesg
If you want to focus exclusively on low-level system events such as hardware
errors, driver logs, or kernel panics, use the dmesg
option. Enabling it adds
the _TRANSPORT=kernel
filter and limits output to logs from the current boot.
```yaml
receivers:
  journald:
    dmesg: true
```
This effectively runs journalctl -b -k
, ensuring that you only see entries
from the kernel ring buffer. You can build a highly focused view of critical
system-level events by combining it with other filters like priority.
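For instance, here's a sketch that narrows kernel logs down to errors and above (err follows journalctl's priority naming):

```yaml
receivers:
  journald:
    dmesg: true
    priority: err # kernel messages at error severity or above
```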
5. Filtering with identifiers
Every journal entry includes a SYSLOG_IDENTIFIER
field, which typically
records the name of the process that wrote the log. This is often different from
the systemd
unit name, and can be a useful way to filter when multiple
processes share the same unit or when you care about a specific binary's output.
For example, you might want to capture only logs produced by the CRON
process,
regardless of which unit invoked it:
```yaml
receivers:
  journald:
    identifiers:
      - CRON
```
If you provide multiple identifiers, they are treated as an OR
condition:
```yaml
receivers:
  journald:
    identifiers:
      - CRON
      - kernel
```
In this case, the receiver collects logs generated by either CRON
or kernel
.
Using identifiers is handy when a single unit manages multiple processes, or when you want to isolate output from a daemon without pulling in everything from its parent service.
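If you're not sure which identifiers exist on a host, journalctl can list every value recorded for the field before you commit to a receiver config:

```bash
# List every SYSLOG_IDENTIFIER value present in the journal
journalctl -F SYSLOG_IDENTIFIER
```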
6. Filtering on journal fields
For more complex scenarios, the matches
option allows you to filter on any
field in a journal entry. This gives you fine-grained control when you need to
build advanced queries. Each item in the matches
list acts as an OR
condition, while the key–value pairs within a single item act as AND
conditions.
For instance, if you want to capture only the logs from myapp.service
that
were generated by user ID 1001, you can combine both fields in the same match:
```yaml
receivers:
  journald:
    matches:
      # This single item means: _SYSTEMD_UNIT=myapp.service AND _UID=1001
      - _SYSTEMD_UNIT: myapp.service
        _UID: "1001"
```
In this configuration, both conditions must be true for a log to be collected.
You can also combine separate items in the list to express OR
logic. For
instance, if the goal is to collect all logs from either sshd.service
or
cron.service
, the configuration would look like this:
```yaml
receivers:
  journald:
    matches:
      - _SYSTEMD_UNIT: sshd.service
      - _SYSTEMD_UNIT: cron.service
```
To discover which fields are available on your system and their possible values, check out our journalctl guide.
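As a quick local check, journalctl itself can also enumerate fields and their values:

```bash
# List all field names currently used in the journal
journalctl --fields

# List the values a given field takes, e.g. every unit that has logged
journalctl -F _SYSTEMD_UNIT
```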
7. Combining multiple filters
When you need more than one filter, the journald
receiver lets you layer them
together in a logical way. The rules are simple:
- Different filtering options are combined with AND logic.
- Multiple values within the same option are combined with OR logic.
Let's look at an example:
```yaml
receivers:
  journald:
    priority: warning
    matches:
      - _SYSTEMD_UNIT: containerd.service
      - _SYSTEMD_UNIT: kubelet.service
    grep: failed|error
```
This configuration results in a journalctl
invocation similar to:
```bash
journalctl --priority=warning \
  --unit=kubelet.service --unit=containerd.service \
  -g 'failed|error'
```
The logs returned will satisfy all of these conditions:
- The log priority is warning or higher.
- The _SYSTEMD_UNIT is either kubelet.service or containerd.service.
- The MESSAGE field contains the string "failed" or "error" (case insensitive).
By combining filters in this way, you can zero in on the exact events you care about while leaving the rest of the noise behind.
Final thoughts
The journald
receiver is an essential component for any observability strategy
on Linux. By tapping directly into the system’s core logging service, you'll get
deep visibility into the health and behavior of both hosts and services. With
the right filtering and deployment setup, you can build a logging pipeline that
is efficient, reliable, and rich with context.
Once your logs are structured and flowing, the natural next step is to send them to an OpenTelemetry-native platform such as Dash0. There, you can correlate logs with traces and metrics to see the complete story of your system’s behavior and performance.
Thanks for reading!
