Last updated: May 27, 2025
Practical Structured Logging for Modern Applications
Logging began as a simple, developer-centric practice where printf
statements or basic logging functions were used to output plain text messages. These logs were unstructured, human-readable, and primarily used for local debugging.
But modern software's shift towards complex, distributed architectures quickly outpaced this traditional logging approach.
With logs now streaming from a multitude of services, their unstructured format makes them slow and clunky to use for production troubleshooting.
Structured logging addresses these challenges. By treating logs as structured data, you leave tedious manual searches and guesswork behind in favor of automated analysis and fast cross-service correlation.
In this article, you’ll learn how structured logging works, and how it fits into a broader observability strategy for understanding and operating complex systems.
Understanding the different approaches to logging
To appreciate the value of structured logging, it is necessary to understand how the practice of logging has evolved over time.
What is unstructured logging?
Unstructured logs are the traditional, free-form approach to application logging where each log entry is a plain text message interspersed with variable data:
2025-05-21 14:30:45 [ERROR] Failed to connect to database: Connection timeout after 30 seconds
In some cases, the logs can follow loose conventions, such as including key-value pairs within text logs, but without adhering to a consistent, machine-readable format.
2025-05-21 14:31:02 [INFO] User login succeeded: user_id=87654 IP=192.168.1.102 method=email
While these logs are readable to humans, they’re difficult for machines to parse reliably. Extracting meaningful data often requires brittle parsing logic that can easily break with minor changes in formatting.
This makes large-scale analysis, automated alerting, and correlation with other telemetry (like traces and metrics) nearly impossible, slowing down the debugging process.
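To make that concrete, here is a small illustrative sketch in Go of the kind of ad hoc parsing unstructured logs push you toward; the regular expression is tied to the exact message wording and silently breaks the moment that wording changes:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	line := "2025-05-21 14:31:02 [INFO] User login succeeded: user_id=87654 IP=192.168.1.102 method=email"

	// The pattern is coupled to this exact phrasing; renaming "IP" or
	// reordering the fields breaks the extraction without any warning.
	re := regexp.MustCompile(`user_id=(\d+) IP=([\d.]+) method=(\w+)`)

	if m := re.FindStringSubmatch(line); m != nil {
		fmt.Printf("user_id=%s ip=%s method=%s\n", m[1], m[2], m[3])
	}
}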
What is structured logging?
Instead of embedding context within free-form strings, structured logs organize data into well-defined fields with consistent naming and types.
The same database error from earlier might look like this in a structured JSON format:
{
  "timestamp": "2025-05-21T14:31:02Z",
  "level": "ERROR",
  "message": "Failed to connect to database",
  "error": {
    "type": "ConnectionTimeout",
    "timeout_seconds": 30
  }
}
This approach makes it trivial to query for all connection timeouts, analyze timeout durations, or correlate errors across services based on request IDs.
It’s also the gateway to making your logs a useful telemetry signal for observability, especially when supercharged with OpenTelemetry, as you’ll see later in this article.
Instrumenting your applications for JSON logging
To adopt structured logging in your applications, the first step is selecting a logging framework that supports structured output, preferably in JSON format. This requires an API capable of attaching contextual metadata directly to log records.
Most language ecosystems offer robust options, such as slog in Go or Pino in Node.js, both of which appear in the examples in this article.
Once you’ve chosen and configured your framework, you can start instrumenting your services. Start with high-traffic or critical paths where improved visibility delivers immediate value.
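In Go, for instance, switching to structured output can be as simple as installing the standard library's JSON handler for slog; a minimal sketch (the version attribute is purely illustrative):

package main

import (
	"log/slog"
	"os"
)

func main() {
	// Emit JSON records to stdout; every subsequent slog call in the process
	// produces machine-readable output instead of plain text.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
		Level: slog.LevelInfo,
	}))
	slog.SetDefault(logger)

	slog.Info("service started", slog.String("version", "1.4.2"))
}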
Given an existing unstructured log like this:
log.Print("Processing " + r.Method + " request to " + r.URL.Path)
You can apply structure by extracting the contextual details and placing them in their own fields:
slog.Info("Processing request",
	slog.String("path", r.URL.Path),
	slog.String("method", r.Method),
)
Which outputs a well-structured JSON log:
1234567{"time": "2025-05-21T08:10:47.61335465+01:00","level": "INFO","msg": "Processing request","path": "/hello","method": "GET"}
The value of structured logging comes from consistently attaching relevant contextual metadata to each log event. This gives you multiple ways to identify patterns and connect related events across your system.
A critical piece of log metadata is the request ID, which allows you to trace all logs related to a single request across services and components. You’ll typically generate this ID at the edge of your infrastructure and propagate it throughout the entire request path.
In your code, you’ll need a mechanism for keeping this request ID in scope so it can be included in your logs. In Go, for instance, you can use the context API:
func middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// The request ID is retrieved or created and added to the request context
		requestID := uuid.New().String()

		ctx := slogctx.Prepend(r.Context(), "request_id", requestID)

		r = r.WithContext(ctx)

		next.ServeHTTP(w, r)
	})
}

func createUserHandler(w http.ResponseWriter, r *http.Request) {
	u := createUser()

	// Logs using the request context will automatically include the ID
	slog.InfoContext(r.Context(), "user created", "user_id", u.id)
}
This results in logs like:
1234567{"time": "2025-05-21T18:43:23.290798-07:00","level": "INFO","msg": "user created","request_id": "dcefa10f-76c4-4ed3-9c3f-b6e940ad7621","user_id": "user-1234"}
You can apply this approach to include other contextual data. For example, if you’re integrating distributed traces, you may extract the trace ID, span ID, and trace flags from the request headers and map them to relevant context fields.
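As a rough sketch of that idea, and assuming an OpenTelemetry-instrumented handler chain has already extracted the incoming trace context into the request context, you could copy the active span's identifiers into the same slogctx-managed context used above:

package main

import (
	"fmt"
	"net/http"

	slogctx "github.com/veqryn/slog-context"
	"go.opentelemetry.io/otel/trace"
)

// traceLogMiddleware copies the active trace context into log fields so that
// every log emitted while handling the request carries trace_id and span_id.
func traceLogMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if sc := trace.SpanContextFromContext(r.Context()); sc.IsValid() {
			ctx := slogctx.Prepend(r.Context(),
				"trace_id", sc.TraceID().String(),
				"span_id", sc.SpanID().String(),
				"trace_flags", sc.TraceFlags().String(),
			)
			r = r.WithContext(ctx)
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	http.ListenAndServe(":8080", traceLogMiddleware(hello))
}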
Some best practices for instrumenting structured logs
To guide your instrumentation efforts and maximize the potential of your logs, consider the following recommendations (a short sketch after the list pulls several of them together):
- Enforce a consistent log schema across all sources for reliable querying and automation.
- Specify units directly in attribute names (such as memory_usage_bytes) for unambiguous measurement interpretation.
- If possible, include error stack traces in a structured format.
- Ensure request IDs are propagated to all logs created during the request handling.
- Include as many high-cardinality contextual attributes as is practical to unlock nuanced correlation capabilities.
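Here is a rough sketch pulling several of these recommendations together in a single slog call; the order type, field names, and values are hypothetical:

package main

import (
	"context"
	"log/slog"
	"os"
	"runtime/debug"
)

// order is a hypothetical payload used only to illustrate attribute naming.
type order struct {
	CustomerID  string
	AmountCents int64
}

func main() {
	slog.SetDefault(slog.New(slog.NewJSONHandler(os.Stdout, nil)))

	// In a real handler this context would already carry the request ID.
	ctx := context.Background()
	o := order{CustomerID: "cust-4821", AmountCents: 1999}

	slog.ErrorContext(ctx, "payment failed",
		slog.String("customer_id", o.CustomerID),  // high-cardinality attribute
		slog.Int64("amount_cents", o.AmountCents), // unit spelled out in the name
		slog.Group("error", // structured error details instead of a flat string
			slog.String("type", "CardDeclined"),
			slog.String("stacktrace", string(debug.Stack())),
		),
	)
}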
Handling legacy logs in a structured logging pipeline
Legacy systems and external dependencies often can’t emit structured logs directly. Fortunately, you can still bring them into your observability pipeline by transforming logs during or after ingestion.
In the best-case scenario, you’ll be able to turn on structured logging through a configuration option. For example, PostgreSQL traditionally outputs logs in plain text:
2025-05-21 10:21:50.236 UTC [3872] postgres@chinook LOG: statement: select albumid, title from album where artistid = 2;
Starting with PostgreSQL 15, native JSON logging is available, but it’s not enabled by default, so you’ll need to configure it explicitly:
postgresql.conf
log_destination = 'jsonlog'
With this setting enabled, the same log becomes structured output:
{
  "timestamp": "2025-05-21 10:21:50.236 UTC",
  "user": "postgres",
  "dbname": "chinook",
  "pid": 3872,
  "remote_host": "[local]",
  "session_id": "23mniw39.282x",
  "line_num": 1,
  "ps": "idle",
  "session_start": "2025-05-21 10:21:50.236",
  "vxid": "4/3",
  "txid": 0,
  "error_severity": "LOG",
  "message": "statement: select albumid, title from album where artistid = 2",
  "application_name": "psql",
  "backend_type": "client backend",
  "query_id": 0
}
In most cases, though, you have no choice but to ingest unstructured logs from these external systems or dependencies. In these situations, you have two main options:
1. Log parsing and enrichment during ingestion
You can convert raw, unstructured logs into structured data before they reach your backend by using a log processor within your telemetry pipeline. When using the OpenTelemetry Collector, the filelog receiver can ingest local log files, while its operators are used to parse and enrich them.
For example, the following configuration parses SSH authentication failure logs using the syslog_parser:
otelcol.yaml
receivers:
  filelog:
    include:
      - /var/log/auth.log
    start_at: beginning
    operators:
      - type: syslog_parser
        protocol: rfc3164
Before transformation:
You’ll see the following representation when using the debug exporter. Notice that the log lacks structure, with no timestamp, severity, or contextual fields:
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(<38>May 23 00:03:21 Ubuntu-20-04 sshd[3775989]: Invalid user wordpress from 37.238.10.118 port 52112)
Attributes:
     -> log.file.name: Str(auth.log)
After transformation:
With the syslog_parser, key metadata is parsed and added as structured attributes:
Timestamp: 2025-05-23 00:03:21 +0000 UTC
SeverityText: info
SeverityNumber: Info(9)
Body: Str(<38>May 23 00:03:21 Ubuntu-20-04 sshd[3775989]: Invalid user wordpress from 37.238.10.118 port 52112)
Attributes:
     -> log.file.name: Str(auth.log)
     -> hostname: Str(Ubuntu-20-04)
     -> message: Str(Invalid user wordpress from 37.238.10.118 port 52112)
     -> facility: Int(4)
     -> priority: Int(38)
     -> appname: Str(sshd)
     -> proc_id: Str(3775989)
You can further enhance this by parsing the message field with the regex_parser operator to extract additional attributes from the log message:
otelcol.yaml
receivers:
  filelog:
    include:
      - /home/ayo/auth.log
    start_at: beginning
    operators:
      - type: syslog_parser
        protocol: rfc3164
      - type: regex_parser
        parse_from: attributes.message
        regex: '^Invalid user (?P<username>\w+) from (?P<ip>[\d.]+) port (?P<port>\d+)$'
You’ll now see additional username, ip, and port attributes in the output:
Attributes:
     -> log.file.name: Str(auth.log)
     -> port: Str(52112)
     -> appname: Str(sshd)
     -> facility: Int(4)
     -> message: Str(Invalid user wordpress from 37.238.10.118 port 52112)
     -> priority: Int(38)
     -> hostname: Str(Ubuntu-20-04)
     -> username: Str(wordpress)
     -> proc_id: Str(3775989)
     -> ip: Str(37.238.10.118)
2. Post-ingestion transformation
Many observability platforms offer the ability to parse and restructure raw logs after they’ve been ingested, using custom pipelines, filters, or enrichment rules.
Some advanced systems go a step further by using automated classification and pattern recognition to extract semantic meaning from raw logs. This means identifying common log structures and converting them into structured attributes without requiring manual configuration.
For example, Dash0 natively understands Nginx access logs: it automatically parses them, maps relevant fields to OpenTelemetry conventions, and enriches each log with appropriate metadata.
This includes severity levels that are intelligently inferred from HTTP status codes, making it easier to filter and alert on events at scale.
While retrofitting structured logging into legacy systems takes extra effort, the benefits are often significant. Even partial structuring, like consistently extracting timestamps, log levels, and request identifiers, can markedly improve searchability and correlation across systems.
As more tools adopt native support for structured logging, achieving full consistency across your environment will become simpler. Until then, thoughtful transformation strategies and automated enrichment tools provide a practical bridge to modern observability.
Standardizing your logs with OpenTelemetry
OpenTelemetry is rapidly becoming the de facto standard for instrumenting cloud-native applications to capture and export logs, metrics, and traces.
Its approach to logging is designed to integrate with existing logging practices while promoting standardization and interoperability across telemetry signals.
Central to its logging support is the log data model, which specifies how logs should be structured.
This model ensures consistent semantics and formatting across different OpenTelemetry components and supported backends.
What an OpenTelemetry log looks like
Here’s an example of a log record that adheres to the OTLP model:
{
  "resourceLogs": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "my-first-observable-service"
            }
          }
        ],
        "droppedAttributesCount": 0
      },
      "schemaUrl": "https://opentelemetry.io/schemas/1.30.0",
      "scopeLogs": [
        {
          "logRecords": [
            {
              "attributes": [
                {
                  "key": "err",
                  "value": {
                    "stringValue": "..."
                  }
                },
                {
                  "key": "reqId",
                  "value": {
                    "intValue": "3"
                  }
                }
              ],
              "body": {
                "stringValue": "Something happened!"
              },
              "droppedAttributesCount": 0,
              "flags": 0,
              "observedTimeUnixNano": "1747916539302000000",
              "severityNumber": 17,
              "severityText": "error",
              "spanId": "54b056e14538943f",
              "timeUnixNano": "1747916539051000000",
              "traceId": "ec3d0ffbfa03f8a62499ae65f5000857"
            }
          ],
          "schemaUrl": "https://opentelemetry.io/schemas/1.30.0",
          "scope": {...}
        }
      ]
    }
  ],
  "resourceSpans": []
}
The resource attributes are metadata associated with the service that emitted the log. OpenTelemetry SDKs usually auto-populate these attributes with runtime details, but you can further enrich them with processors like:
- resourcedetection processor: Detects platform-specific attributes like cloud provider and region.
- k8sattributes processor: Adds Kubernetes pod, namespace, and container details.
- resource processor: Lets you add, remove, or override resource attributes manually.
OpenTelemetry also defines several key fields in its logs data model:
- severityNumber: A normalized log level (e.g., 9 for INFO, 17 for ERROR).
- body: The actual log message or log entry.
- attributes: A key-value map of contextual data relevant to the log event.
- traceId and spanId: The trace context, which enables correlation between logs and distributed traces.
- Timestamps: Letting you know when an event happened and when it was observed.
This structure ensures that logs collected from various services can be queried and correlated together, dramatically enhancing their utility in observability workflows.
Understanding Semantic Conventions
OpenTelemetry also defines Semantic Conventions, which are standard naming guidelines for commonly used telemetry attributes.
For example, instead of logging an error like this:
attributes: {
  err: {
    type: 'TypeError',
    message: "Cannot read property 'name' of null",
    stack: "<the stack trace>",
  }
}
OpenTelemetry semantic conventions recommend using the following attributes:
attributes: {
  'exception.type': 'TypeError',
  'exception.message': "Cannot read property 'name' of null",
  'exception.stacktrace': '<the stack trace>'
}
These conventions currently cover broad domains like HTTP, exceptions, and databases. For anything outside these areas, you’re encouraged to define your own attributes using OpenTelemetry’s general naming guidelines.
The most important thing here is consistency. If attribute names vary across services or teams, it’ll undermine your ability to query, filter, and correlate logs effectively.
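One lightweight way to enforce that consistency is to centralize custom attribute names in a small shared package that every service imports; a sketch (the package and constant names are hypothetical):

// Package logattr centralizes custom attribute names so every service
// logs the same field the same way, keeping queries and dashboards stable.
package logattr

const (
	RequestID       = "request_id"
	UserID          = "user_id"
	TenantID        = "tenant_id"
	OrderValueCents = "order_value_cents"
)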
Bridging logs into OpenTelemetry
As you look to standardize your logs with OpenTelemetry, a natural question pops up: "Do I need to overhaul all existing logging instrumentation and start from scratch?"
The comforting answer is, generally, no.
OpenTelemetry understands that most applications already have some logging instrumentation. So the goal isn't to force a massive rewrite of every log statement, but to bring them into the OpenTelemetry ecosystem.
When it comes to your application's own logging, the most common and often most seamless route is through what's known as a log bridge.
It often takes the familiar form of a 'handler', 'appender', or 'transport' in your logging library. Once in place, it captures your library's native log records and translates them into the OpenTelemetry data model while attaching useful context (such as trace and span IDs, if available).
import pino from "pino";

const logger = pino({
  transport: {
    targets: [
      {
        target: "pino-opentelemetry-transport", // Pino's OpenTelemetry bridge
      },
    ],
  },
});
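The Go ecosystem offers a comparable bridge for slog. Here is a sketch assuming the otelslog contrib bridge and the stdout log exporter; in production you would typically export to a Collector over OTLP instead:

package main

import (
	"context"

	"go.opentelemetry.io/contrib/bridges/otelslog"
	"go.opentelemetry.io/otel/exporters/stdout/stdoutlog"
	"go.opentelemetry.io/otel/log/global"
	sdklog "go.opentelemetry.io/otel/sdk/log"
)

func main() {
	// Print OpenTelemetry log records to stdout for demonstration purposes.
	exporter, err := stdoutlog.New()
	if err != nil {
		panic(err)
	}

	provider := sdklog.NewLoggerProvider(
		sdklog.WithProcessor(sdklog.NewBatchProcessor(exporter)),
	)
	defer provider.Shutdown(context.Background())
	global.SetLoggerProvider(provider)

	// The bridge converts slog records into the OpenTelemetry data model,
	// picking up trace context from the supplied context automatically.
	logger := otelslog.NewLogger("checkout-service")
	logger.InfoContext(context.Background(), "user created", "user_id", "user-1234")
}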
As a developer, your logging responsibilities remain straightforward:
- Assign the correct log level (e.g., INFO, ERROR, etc.).
- Write clear, meaningful messages.
- Include any relevant, event-specific attributes.
Then let the OpenTelemetry SDK and log bridge handle the rest.
The same principle also applies to your infrastructure logs. You don't need to discard established agents like Vector or Fluentd. Instead, configure them to forward their log data to the OpenTelemetry Collector.
Within the Collector, powerful processors can then parse, correlate, or enrich these logs before they are exported to your designated logging backend.
Correlating logs with traces for unified observability
One of the most compelling advantages of OpenTelemetry logs lies in how they intertwine with distributed traces.
When your application is instrumented for both tracing and logging, the SDK can automatically attach trace context to each log record:
- The traceId so you know which request it belongs to.
- The spanId so you know what operation it happened in.
The inclusion of these identifiers provides two-way visibility: you can view the logs associated with an outlier trace, or start from a log and explore the full request context.
It’s worth stressing that this correlation relies on a log bridge and the OpenTelemetry SDK; it is not the same as merely adding trace or span IDs as custom attributes in your log data.
For instance, if you ingest a log record like the one below (perhaps through the filelog receiver), where trace context exists only as attributes:
12345678{"level": "info","message": "Request to /","span_id": "93b6f3cf8ce8e712","timestamp": "2024-07-01T18:09:06.535Z","trace_flags": "01","trace_id": "027e383aa083bc5c4165a9c7abbe5694"}
You'll see that the standard traceId, spanId, and severity fields are missing (for example, when inspecting the record with the debug exporter).
In situations where a log bridge isn’t available, or if you’re dealing with logs from systems not instrumented with OpenTelemetry SDKs, you can use the OpenTelemetry Transform Language (OTTL) via the transform processor to extract the log attributes and set the standard OpenTelemetry fields:
otelcol.yaml
processors:
  transform/json_logs:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(time, Time(attributes["timestamp"], "%Y-%m-%dT%H:%M:%S%z"))
          - set(severity_number, 9) where attributes["level"] == "info"
          - set(severity_text, "INFO") where attributes["level"] == "info"
          - set(severity_number, 17) where attributes["level"] == "error"
          - set(severity_text, "ERROR") where attributes["level"] == "error"
          - set(trace_id.string, attributes["trace_id"]) where attributes["trace_id"] != ""
          - set(span_id.string, attributes["span_id"]) where attributes["span_id"] != ""
After applying such transformations, all the standard fields will be properly set and recognized by any observability backend that supports OTLP.
Even if you’re unable to use the transform processor for every case, Dash0 can intelligently parse common log attribute names (like trace_id and span_id) directly from your JSON logs and ensure they are correctly linked to traces within the platform.
Properly correlating logs and traces in this manner elevates your structured logs from being merely isolated data points to useful signals for achieving observability.
Final thoughts
Transitioning from plain text to structured, machine-readable data marks a fundamental shift that makes observability a first-class concern in software development.
OpenTelemetry enhances this shift by providing a unified, vendor-neutral standard for capturing and correlating telemetry signals across your entire stack.
Structured logging with OpenTelemetry doesn’t just improve log quality; it lays the groundwork for a system that is easier to observe, troubleshoot, and operate at scale.
As you adopt structured logging, consider how an OpenTelemetry-native platform like Dash0 can help you maximize its value through advanced visualization, correlation, and real-time analysis.
Thanks for reading!
