
Last updated: November 19, 2025

Trace-Aware PostgreSQL Logging with OpenTelemetry

In our previous guide, we focused on configuring PostgreSQL to emit structured JSON logs with the right mix of verbosity, query duration, and error capturing.

But even with structured logging enabled, database activity often remains isolated from the services calling it. If a query is slow, you know what happened, but not necessarily who triggered it or why.

To make PostgreSQL a fully observable part of your system, those logs need to flow into your telemetry pipeline, follow OpenTelemetry’s data model, and correlate with the traces generated by your applications.

In this guide, you'll learn how to configure the OpenTelemetry Collector to ingest and transform PostgreSQL logs into fully compliant trace-aware OpenTelemetry signals.

We'll cover:

  • Converting raw JSON lines into OTel attributes.
  • Mapping legacy fields to official Semantic Conventions.
  • Normalizing timestamps and severity levels according to the OTel specification.
  • Correlating database activity with application traces via context propagation.
  • Ensuring sensitive data doesn't leak into your backend.

By the end, you'll have end-to-end visibility from the initial HTTP request down to the SQL execution.

Let’s get started!

Prerequisites

Before ingesting PostgreSQL logs into an OpenTelemetry pipeline, make sure your database is already producing structured JSON logs in a predictable location as demonstrated in the previous tutorial.

Ingesting PostgreSQL logs into an OpenTelemetry pipeline

With PostgreSQL configured to emit structured JSON, the next step is ingestion. While deployment architectures vary, reading directly from the file system remains the most robust and universal method.

The filelog receiver is the standard tool for this task. It tails log files, handles rotation, and ingests lines as they appear. In the configuration below, we're pairing it with the debug exporter to create an immediate feedback loop, allowing you to verify exactly what the Collector sees:

yaml
# otelcol.yaml
receivers:
  filelog:
    include: [/var/lib/postgresql/18/main/log/*.json]
    start_at: end

processors:
  batch:

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [debug]

With this configuration, the Collector is successfully tailing the files and wrapping every JSON entry in an OpenTelemetry LogRecord. However, triggering a query reveals that the data is not yet fully usable:

text
LogRecord #0
ObservedTimestamp: 2025-11-17 13:19:57.866997311 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str({"timestamp":"2025-11-17 14:19:16.184 UTC","user":"postgres","dbname":"chinook","pid":5329,"remote_host":"[local]","session_id":"691b1c6f.14d1","line_num":2,"ps":"idle","session_start":"2025-11-17 14:00:31 UTC","vxid":"4/3","txid":0,"error_severity":"LOG","message":"statement: table album limit 10;","application_name":"psql","backend_type":"client backend","query_id":0})
Attributes:
-> log.file.name: Str(app.log.json)
Trace ID:
Span ID:
Flags: 0

While the log has been ingested, it is effectively opaque to the analysis backend. Several critical gaps stand out:

  • None of the JSON fields have been promoted to searchable OpenTelemetry attributes.
  • PostgreSQL emits LOG, ERROR, WARNING, and others, but SeverityText and SeverityNumber remain empty.
  • The timestamp defaults to the Unix epoch because we haven’t told the Collector how to interpret PostgreSQL’s timestamp format.
  • All the PostgreSQL log fields need normalization to OpenTelemetry’s attribute names.
  • Trace and span IDs are empty because PostgreSQL doesn't propagate distributed context on its own.

In the next section, you'll close these gaps by parsing the JSON, mapping the fields, and enriching the log record with trace context.

Parsing PostgreSQL JSON logs

Right now, the Collector treats each PostgreSQL log line as a single opaque JSON string. To unlock its value, you must parse that string and promote its contents to top-level attributes.

In this section, you'll use the transform processor to build a pipeline that:

  1. Extracts the JSON payload from the Body.
  2. Merges the fields into the log record’s attributes.
  3. Parses PostgreSQL’s timestamp field into the native OpenTelemetry Timestamp.
  4. Maps Postgres-specific severities to standard OpenTelemetry Severity Numbers.

Add the following configuration to your otelcol.yaml:

yaml
# otelcol.yaml
processors:
  transform/parse_postgres:
    log_statements:
      # 1. Handle only entries where the Body contains JSON
      - context: log
        conditions:
          - body != nil and Substring(body, 0, 2) == "{\""
        statements:
          # Parse the JSON string into a map and merge into attributes
          - merge_maps(attributes, ParseJSON(body), "upsert")
          # 2. Convert PostgreSQL timestamp string -> OpenTelemetry Timestamp
          - set(time, Time(attributes["timestamp"], "%Y-%m-%d %H:%M:%S.%f %Z")) where attributes["timestamp"] != nil
          - set(observed_time, time)
          - delete_key(attributes, "timestamp")
          # 3. Map PostgreSQL error_severity -> OTel SeverityText
          - set(severity_text, ToUpperCase(attributes["error_severity"])) where attributes["error_severity"] != nil
          # 4. Map PostgreSQL error_severity -> OTel SeverityNumber
          - set(severity_number, 5) where IsMatch(attributes["error_severity"], "^DEBUG")
          - set(severity_number, 9) where attributes["error_severity"] == "INFO"
          - set(severity_number, 10) where attributes["error_severity"] == "NOTICE"
          - set(severity_number, 12) where attributes["error_severity"] == "LOG"
          - set(severity_number, 13) where attributes["error_severity"] == "WARNING"
          - set(severity_number, 17) where attributes["error_severity"] == "ERROR"
          - set(severity_number, 21) where attributes["error_severity"] == "FATAL"
          - set(severity_number, 24) where attributes["error_severity"] == "PANIC"
          # Clean up fields that have been promoted or replaced
          - delete_key(attributes, "error_severity")

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch, transform/parse_postgres] # <- add it here
      exporters: [debug]

With these transformations applied, the Collector now "understands" your logs.

If you inspect the output now, you will see that the Body is still the original JSON (preserving the source of truth), but the Attributes map is now fully populated, the Timestamp is correct, and SeverityNumber allows for proper filtering in your backend.

text
LogRecord #0
ObservedTimestamp: 2025-11-17 14:19:16.184 +0000 UTC
Timestamp: 2025-11-17 14:19:16.184 +0000 UTC
SeverityText: LOG
SeverityNumber: Info4(12)
Body: Str(...)
Attributes:
-> log.file.name: Str(app.log.json)
-> ps: Str(idle)
-> message: Str(statement: table album limit 10;)
-> application_name: Str(psql)
-> remote_host: Str([local])
-> session_id: Str(691b1c6f.14d1)
-> session_start: Str(2025-11-17 14:00:31 WAT)
-> vxid: Str(4/3)
-> query_id: Double(0)
-> pid: Double(5329)
-> txid: Double(0)
-> line_num: Double(2)
-> backend_type: Str(client backend)
-> user: Str(postgres)
-> dbname: Str(chinook)
Trace ID:
Span ID:
Flags: 0

In the next section, you'll refine this further by mapping these raw Postgres fields to OpenTelemetry Semantic Conventions to ensure consistent correlation across your stack.

Mapping PostgreSQL fields to OpenTelemetry semantic conventions

Now that the Collector is parsing PostgreSQL’s JSON logs and fixing the timestamps and severities, the next step is to align PostgreSQL’s native fields with OpenTelemetry’s Semantic Conventions.

Semantic conventions provide a shared vocabulary for your data. When you use standard attribute names (like db.namespace instead of dbname), your observability backend can automatically group, correlate, and visualize your data.

However, PostgreSQL logs contain far more metadata than the current OpenTelemetry standards cover. To handle this, we'll take a hybrid approach:

  1. Mapping core fields to their official OpenTelemetry equivalents.
  2. Namespacing PostgreSQL-specific fields with postgresql.*.
  3. Preserving the original JSON payload in log.record.original for full-fidelity debugging.

Here is the updated transform configuration:

yaml
# otelcol.yaml
processors:
  transform/parse_postgres:
    log_statements:
      # [...]
      # 1. Map PostgreSQL connection fields
      - context: log
        statements:
          # Map standard semantic conventions
          - set(attributes["user.name"], attributes["user"])
          - set(attributes["db.namespace"], attributes["dbname"])
          - set(attributes["client.address"], attributes["remote_host"])
          - set(attributes["client.port"], attributes["remote_port"])
          - set(attributes["db.response.status_code"], attributes["state_code"])
          - set(attributes["process.status"], attributes["ps"])
          - set(attributes["process.parent_pid"], attributes["leader_pid"])
          - set(attributes["session.id"], attributes["session_id"])
          - set(attributes["code.function.name"], attributes["func_name"])
          - set(attributes["code.file.path"], attributes["file_name"])
          - set(attributes["code.line.number"], Int(attributes["file_line_num"]))
          - set(attributes["log.record.original"], body)
          - set(body, attributes["message"])
          # PostgreSQL-specific extensions
          - set(attributes["postgresql.session_id"], attributes["session_id"])
          - set(attributes["postgresql.txid"], attributes["txid"])
          - set(attributes["postgresql.vxid"], attributes["vxid"])
          - set(attributes["postgresql.query_id"], attributes["query_id"])
          - set(attributes["postgresql.internal_query"], attributes["internal_query"])
          - set(attributes["postgresql.backend_type"], attributes["backend_type"])
          - set(attributes["postgresql.detail"], attributes["detail"])
          - set(attributes["postgresql.hint"], attributes["hint"])
          - set(attributes["postgresql.context"], attributes["context"])
          - set(attributes["postgresql.line_num"], Int(attributes["line_num"]))
          - set(attributes["postgresql.cursor_position"], Int(attributes["cursor_position"]))
          - set(attributes["postgresql.internal_position"], Int(attributes["internal_position"]))
          - set(attributes["postgresql.application_name"], attributes["application_name"])
          # Clean up redundant fields
          - delete_key(attributes, "timestamp")
          - delete_key(attributes, "message")
          - delete_key(attributes, "error_severity")
          - delete_key(attributes, "dbname")
          - delete_key(attributes, "user")
          - delete_key(attributes, "statement")
          - delete_key(attributes, "state_code")
          - delete_key(attributes, "remote_host")
          - delete_key(attributes, "remote_port")
          - delete_key(attributes, "application_name")
          - delete_key(attributes, "leader_pid")
          - delete_key(attributes, "file_name")
          - delete_key(attributes, "func_name")
          - delete_key(attributes, "file_line_num")
          - delete_key(attributes, "session_id")
          - delete_key(attributes, "txid")
          - delete_key(attributes, "vxid")
          - delete_key(attributes, "query_id")
          - delete_key(attributes, "internal_query")
          - delete_key(attributes, "backend_type")
          - delete_key(attributes, "detail")
          - delete_key(attributes, "hint")
          - delete_key(attributes, "context")
          - delete_key(attributes, "line_num")
          - delete_key(attributes, "cursor_position")
          - delete_key(attributes, "internal_position")
      # Add a new group for resource attributes
      - statements:
          - set(resource.attributes["process.id"], log.attributes["pid"])
          - delete_key(log.attributes, "pid")

With this configuration applied, your logs are now fully interoperable with the rest of your stack:

text
2025-11-18T10:56:46.810Z info ResourceLog #0
Resource SchemaURL:
Resource attributes:
-> process.id: Int(5329)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2025-11-17 14:19:16.184 +0000 UTC
Timestamp: 2025-11-17 14:19:16.184 +0000 UTC
SeverityText: LOG
SeverityNumber: Info4(12)
Body: Str(statement: table album limit 10;)
Attributes:
-> log.file.name: Str(app.log.json)
-> db.namespace: Str(chinook)
-> log.record.original: Str({"timestamp":"2025-11-17 14:19:16.184 UTC","user":"postgres","dbname":"chinook","pid":5329,"remote_host":"[local]","session_id":"691b1c6f.14d1","line_num":2,"ps":"idle","session_start":"2025-11-17 14:00:31 WAT","vxid":"4/3","txid":0,"error_severity":"LOG","message":"statement: table album limit 10;","application_name":"psql","backend_type":"client backend","query_id":0})
-> session.id: Str(691b1c6f.14d1)
-> postgresql.backend_type: Str(client backend)
-> postgresql.line_num: Int(2)
-> client.address: Str([local])
-> postgresql.application_name: Str(psql)
-> postgresql.vxid: Str(4/3)
-> user.name: Str(postgres)
-> postgresql.query_id: Double(0)
-> session_start: Str(2025-11-17 14:00:31 WAT)
-> postgresql.session_id: Str(691b1c6f.14d1)
-> process.status: Str(idle)
-> postgresql.txid: Double(0)
Trace ID:
Span ID:
Flags: 0

PostgreSQL is now speaking OpenTelemetry, but some of the most important data (like the query itself and how long it took) is still trapped inside the Body string.

In the next section, we will parse this text to extract precise execution duration and isolate the SQL statement into its own attribute.

Parsing the PostgreSQL message field

Until now, we have treated the PostgreSQL message field as a single opaque string. However, this field is actually a composite that can contain:

  • A plain statement log such as statement: TABLE album LIMIT 10;
  • A duration plus statement: duration: 45.322 ms statement: SELECT * FROM track WHERE milliseconds > 500000;
  • Operational events such as checkpoint complete: wrote 405 buffers (2.5%) ...
  • Errors and warnings with human-readable descriptions.

Leaving this data trapped in a string limits your ability to filter slow queries or group logs by SQL statement. To fix this, we need to extract the specific query text and duration into their own attributes.

We can use the ExtractPatterns() function in OTTL to parse these variable formats. The regular expression below looks for an optional duration prefix followed by the statement:

yaml
processors:
  transform/parse_postgres:
    log_statements:
      # [...]
      - context: log
        statements:
          - merge_maps(attributes, ExtractPatterns(body, "(?:duration:\\s+(?P<pg_duration_ms>[0-9.]+)\\s+ms)?(?:\\s*statement:\\s+(?P<pg_statement>.*))?$"), "upsert")
          - set(attributes["db.query.text"], attributes["pg_statement"])
          - set(attributes["db.query.text"], attributes["statement"])
          - set(attributes["db.query.duration"], Double(attributes["pg_duration_ms"])) where attributes["pg_duration_ms"] != ""
          - delete_key(attributes, "pg_statement")
          - delete_key(attributes, "statement")
          - delete_key(attributes, "pg_duration_ms")

With this in place, a log line like:

json
{
  "timestamp": "2025-11-18 10:42:01.123 UTC",
  "user": "postgres",
  "dbname": "chinook",
  "error_severity": "LOG",
  "message": "duration: 45.322 ms statement: SELECT * FROM track WHERE milliseconds > 500000;",
  "application_name": "psql",
  "backend_type": "client backend"
}

Turns into an OpenTelemetry log record that includes:

  • db.query.text: SELECT * FROM track WHERE milliseconds > 500000;
  • db.query.duration: 45.322

Non-query messages (for example checkpoint logs or connection events) skip these regex matches and remain unchanged. You can add further statements to parse these messages if you wish.
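
For instance, here is a hedged sketch of one possible extension that promotes the buffer count from checkpoint messages into its own attribute. The regular expression and the postgresql.checkpoint.buffers_written attribute name are illustrative choices rather than official conventions:

yaml
processors:
  transform/parse_postgres:
    log_statements:
      # [...]
      # Illustrative extension: structure checkpoint completion messages
      - context: log
        conditions:
          - IsMatch(body, "^checkpoint complete:")
        statements:
          # Capture the number of buffers written (attribute name is illustrative)
          - merge_maps(attributes, ExtractPatterns(body, "^checkpoint complete:\\s+wrote\\s+(?P<pg_buffers_written>[0-9]+)\\s+buffers"), "upsert")
          - set(attributes["postgresql.checkpoint.buffers_written"], Int(attributes["pg_buffers_written"])) where attributes["pg_buffers_written"] != nil
          - delete_key(attributes, "pg_buffers_written")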

At this point, your pipeline is doing all the heavy lifting for you: PostgreSQL emits JSON, the Collector parses it, standardizes timestamps and severity, maps fields to semantic conventions, attaches resource metadata, and now even splits the overloaded message into structured fields for statements and durations.

In the next section, you’ll enrich these logs with resource attributes so that every record carries information about the database node, environment, and service identity.

Enriching PostgreSQL logs with resource attributes

Resource attributes describe where the telemetry came from and what produced it. They form the backbone of correlation across logs, metrics, and traces, and they ensure your observability backend can tie PostgreSQL activity to the rest of your system.

Unlike the log attributes we parsed earlier (which change with every query), resource attributes are constant for the lifecycle of the PostgreSQL instance.

Some of the most important resource attributes to set include:

  • service.name: a stable, human-readable identifier for the component.
  • db.system.name: identifies the database technology (postgresql here).
  • service.instance.id: a unique ID for the specific PostgreSQL node or container (often the hostname or container ID).
  • deployment.environment.name: describes the environment (staging, production, etc.).

Here is a simple example that uses the resource processor to apply static fields that will never change for this instance:

yaml
processors:
  resource:
    attributes:
      - key: service.name
        value: db-service
        action: upsert
      - key: db.system.name
        value: postgresql
        action: upsert
      - key: deployment.environment.name
        value: production
        action: upsert

Processors such as resourcedetection and k8sattributes also make it possible to automatically discover metadata from the host, Docker, Kubernetes, or Cloud Provider (AWS/GCP/Azure).

yaml
processors:
  resourcedetection:
    detectors: [system, docker, ec2] # add relevant detectors
    system:
      hostname_sources: [os]

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [resourcedetection, resource, transform/parse_postgres, batch] # <-
      exporters: [debug]

Your logs now carry a Resource block that identifies the infrastructure context:

text
2025-11-18T11:09:44.510Z info ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource attributes:
-> service.name: Str(db-service)
-> db.system.name: Str(postgresql)
-> deployment.environment.name: Str(production)
-> host.name: Str(3cb4a2c0ae38)
-> os.type: Str(linux)
-> process.id: Int(5329)

With parsing, semantic conventions, and resource attribution complete, you have a fully structured PostgreSQL log stream.

In the final section, you'll build on this foundation to correlate these logs with traces by propagating trace_id and span_id from your application into PostgreSQL, closing the loop between API calls and the exact SQL they triggered.

Adding trace context to PostgreSQL logs

At this stage of the pipeline, your PostgreSQL logs are parsed, enriched with semantic attributes, and annotated with resource metadata. The final step is to make them trace-aware so they can be correlated with the requests, spans, and services that triggered the queries.

PostgreSQL has no built-in notion of distributed tracing. It doesn't know which request initiated a query or how the resulting log relates to application-level spans. To bridge that gap, you'll need to embed trace context directly into the SQL statements themselves and let PostgreSQL write it out in the logs. Once the context is in the logs, the Collector can extract it and attach trace_id and span_id to each LogRecord.

This capability is made possible by SQLCommenter, a mechanism that appends a lightweight SQL comment containing the W3C traceparent header to every query executed by your application.

SQLCommenter originated at Google and is now integrated into many official OpenTelemetry instrumentation libraries, including frameworks like Rails, Django, and Spring, and Node.js ORMs such as Knex or Sequelize.
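
How you turn this on depends on your stack. As one rough sketch, the Python psycopg2 instrumentation exposes an enable_commenter flag; the connection string below is a placeholder, and a working tracing setup (TracerProvider plus exporter) is assumed to already exist:

python
import psycopg2
from opentelemetry.instrumentation.psycopg2 import Psycopg2Instrumentor

# enable_commenter tells the instrumentation to append a /*traceparent=...*/
# comment to every query it wraps (assumes tracing is already configured).
Psycopg2Instrumentor().instrument(enable_commenter=True, commenter_options={})

# Placeholder connection details for illustration only.
conn = psycopg2.connect("dbname=chinook user=postgres")
with conn.cursor() as cur:
    cur.execute("SELECT * FROM album;")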

When enabled in your application, the instrumentation intercepts the query just before it leaves the service and rewrites it. For example, a normal query:

sql
SELECT * FROM album;

Becomes:

sql
SELECT * FROM album; /*traceparent='00-866e8698eac009878483cfd029f62af9-bb14062982c63db3-01'*/

Because PostgreSQL logs the query string verbatim, this trace context is captured in your structured logs. Inside the message field of your JSON output, you will now see the traceparent embedded in the text:

json
{
  [...],
  "message": "statement: SELECT * from album; /*traceparent='00-4b9ce7b9aaf2cc874b523154ae53b89f-484b22db45f0bd1b-01'*/",
  "backend_type": "client backend",
  "query_id": 0
}

The data is now there. The final piece of the puzzle is teaching the Collector to extract that traceparent from the SQL comment and promote it to the official Trace ID and Span ID fields of the LogRecord.

Extracting trace_id and span_id inside the Collector

With the traceparent comment now embedded in the PostgreSQL log entry, the Collector can parse it to reconstruct the context.

We'll use the transform processor to:

  • Detect logs containing a traceparent.
  • Extract the IDs using a Regex capture group.
  • Promote them to the top-level Trace ID and Span ID fields.
  • Strip the comment from the SQL statement (to ensure query aggregation/metrics still work correctly).

Add this final logic to your transform block:

yaml
# otelcol.yaml
processors:
  transform/parse_postgres:
    log_statements:
      # [...]
      - context: log
        conditions:
          - IsMatch(attributes["db.query.text"], "traceparent")
        statements:
          - merge_maps(attributes, ExtractPatterns(attributes["db.query.text"], "/\\*traceparent='00-(?P<trace_id>[0-9a-f]{32})-(?P<span_id>[0-9a-f]{16})-(?P<trace_flags>[0-9a-f]{2})'\\*/"), "upsert")
          - set(trace_id.string, attributes["trace_id"])
          - set(span_id.string, attributes["span_id"])
          - set(flags, Int(attributes["trace_flags"]))
          - replace_pattern(attributes["db.query.text"], " /\\*traceparent='[^']+'\\*/", "")
          - replace_pattern(body, " /\\*traceparent='[^']+'\\*/", "")
          - delete_key(attributes, "trace_id")
          - delete_key(attributes, "span_id")
          - delete_key(attributes, "trace_flags")

After applying this transformation, your database logs are no longer isolated text files. They are fully integrated components of your distributed trace:

text
LogRecord #0
ObservedTimestamp: 2025-11-18 18:58:12.549 +0000 UTC
Timestamp: 2025-11-18 18:58:12.549 +0000 UTC
SeverityText: LOG
SeverityNumber: Info4(12)
Body: Str(statement: SELECT * from album;)
Attributes:
. . .
Trace ID: 4b9ce7b9aaf2cc874b523154ae53b89f
Span ID: 484b22db45f0bd1b
Flags: 1
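
Throughout this guide, the debug exporter has served as a feedback loop. To ship the enriched logs to an observability backend, you would swap it for an OTLP exporter. Here is a minimal sketch; the endpoint and authorization header are placeholders for your own backend's values:

yaml
# otelcol.yaml
exporters:
  otlphttp:
    endpoint: https://otlp.example.com # placeholder: your backend's OTLP endpoint
    headers:
      Authorization: "Bearer <YOUR_TOKEN>" # placeholder credentials

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [resourcedetection, resource, transform/parse_postgres, batch]
      exporters: [otlphttp]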

Final thoughts

By completing this pipeline, you have elevated PostgreSQL from a passive data store into an active participant in your observability stack. You can now:

  • Navigate instantly from a slow application trace directly to the exact PostgreSQL log showing what caused the bottleneck.

  • Filter logs by Trace ID to see the complete story of a request, crossing the boundary between your code and the database engine.

  • Measure accurate latency by comparing the application's view of the span duration against the database's reported execution time.

This is the power of OpenTelemetry: distinct signals (logs, traces, and metrics) from distinct sources (applications, databases, infrastructure) converging into a single, unified view of your system.

Thanks for reading!

Authors
Ayooluwa Isaiah