Last updated: November 19, 2025
Trace-Aware PostgreSQL Logging with OpenTelemetry
In our previous guide, we focused on configuring PostgreSQL to emit structured JSON logs with the right mix of verbosity, query duration, and error capturing.
But even with structured logging enabled, database activity often remains isolated from the services calling it. If a query is slow, you know what happened, but not necessarily who triggered it or why.
To make PostgreSQL a fully observable part of your system, those logs need to flow into your telemetry pipeline, follow OpenTelemetry’s data model, and correlate with the traces generated by your applications.
In this guide, you'll learn how to configure the OpenTelemetry Collector to ingest and transform PostgreSQL logs into fully compliant trace-aware OpenTelemetry signals.
We'll cover:
- Converting raw JSON lines into OTel attributes.
- Mapping legacy fields to official Semantic Conventions.
- Normalizing timestamps and severity levels according to the OTel specification.
- Correlating database activity with application traces via context propagation.
- Ensuring sensitive data doesn't leak into your backend.
By the end, you'll have end-to-end visibility from the initial HTTP request down to the SQL execution.
Let’s get started!
Prerequisites
Before ingesting PostgreSQL logs into an OpenTelemetry pipeline, make sure your database is already producing structured JSON logs in a predictable location as demonstrated in the previous tutorial.
Ingesting PostgreSQL logs into an OpenTelemetry pipeline
With PostgreSQL configured to emit structured JSON, the next step is ingestion. While deployment architectures vary, reading directly from the file system remains the most robust and universal method.
The `filelog` receiver is the standard tool for this task. It tails log files, handles rotation, and ingests lines as they appear. In the configuration below, we're pairing it with the `debug` exporter to create an immediate feedback loop, allowing you to verify exactly what the Collector sees:
```yaml
# otelcol.yaml
receivers:
  filelog:
    include: [/var/lib/postgresql/18/main/log/*.json]
    start_at: end

processors:
  batch:

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [debug]
```
With this configuration, the Collector is successfully tailing the files and wrapping every JSON entry in an OpenTelemetry LogRecord. However, triggering a query reveals that the data is not yet fully usable:
```text
LogRecord #0
ObservedTimestamp: 2025-11-17 13:19:57.866997311 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str({"timestamp":"2025-11-17 14:19:16.184 UTC","user":"postgres","dbname":"chinook","pid":5329,"remote_host":"[local]","session_id":"691b1c6f.14d1","line_num":2,"ps":"idle","session_start":"2025-11-17 14:00:31 UTC","vxid":"4/3","txid":0,"error_severity":"LOG","message":"statement: table album limit 10;","application_name":"psql","backend_type":"client backend","query_id":0})
Attributes:
     -> log.file.name: Str(app.log.json)
Trace ID:
Span ID:
Flags: 0
```
While the log has been ingested, it is effectively opaque to the analysis backend. Several critical gaps stand out:
- None of the JSON fields have been promoted to searchable OpenTelemetry attributes.
- PostgreSQL emits `LOG`, `ERROR`, `WARNING`, and others, but `SeverityText` and `SeverityNumber` remain empty.
- The timestamp defaults to the Unix epoch because we haven’t told the Collector how to interpret PostgreSQL’s timestamp format.
- All the PostgreSQL log fields need normalization to OpenTelemetry’s attribute names.
- Trace and span IDs are empty because PostgreSQL doesn't propagate distributed context on its own.
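Separately from these gaps, one optional hardening step on the ingestion side is worth noting: as configured above, the filelog receiver keeps its read offsets only in memory, so log lines written while the Collector is down or restarting are skipped. A minimal sketch, assuming the Collector can write to a local state directory such as `/var/lib/otelcol/storage`, pairs the receiver with the `file_storage` extension to persist offsets:

```yaml
# Optional: persist filelog offsets so Collector restarts don't skip log lines.
extensions:
  file_storage:
    directory: /var/lib/otelcol/storage # assumed writable state directory

receivers:
  filelog:
    include: [/var/lib/postgresql/18/main/log/*.json]
    start_at: end
    storage: file_storage # reference the storage extension above

service:
  extensions: [file_storage]
```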
In the next section, you'll close these gaps by parsing the JSON, mapping the fields, and enriching the log record with trace context.
Parsing PostgreSQL JSON logs
Right now the Collector is treating each PostgreSQL log line as a single opaque JSON string. To unlock its value, you must parse that string and promote its contents to top-level attributes.
In this section, you'll use the transform processor to build a pipeline that:
- Extracts the JSON payload from the `Body`.
- Merges the fields into the log record’s attributes.
- Parses PostgreSQL’s `timestamp` field into the native OpenTelemetry `Timestamp`.
- Maps PostgreSQL-specific severities to standard OpenTelemetry severity numbers.
Add the following configuration to your otelcol.yaml:
```yaml
# otelcol.yaml
processors:
  transform/parse_postgres:
    log_statements:
      # 1. Handle only entries where the Body contains JSON
      - context: log
        conditions:
          - body != nil and Substring(body, 0, 2) == "{\""
        statements:
          # Parse the JSON string into a map and merge into attributes
          - merge_maps(attributes, ParseJSON(body), "upsert")

          # 2. Convert PostgreSQL timestamp string -> OpenTelemetry Timestamp
          - set(time, Time(attributes["timestamp"], "%Y-%m-%d %H:%M:%S.%f %Z")) where attributes["timestamp"] != nil
          - set(observed_time, time)
          - delete_key(attributes, "timestamp")

          # 3. Map PostgreSQL error_severity -> OTel SeverityText
          - set(severity_text, ToUpperCase(attributes["error_severity"])) where attributes["error_severity"] != nil

          # 4. Map PostgreSQL error_severity -> OTel SeverityNumber
          - set(severity_number, 5) where IsMatch(attributes["error_severity"], "^DEBUG")
          - set(severity_number, 9) where attributes["error_severity"] == "INFO"
          - set(severity_number, 10) where attributes["error_severity"] == "NOTICE"
          - set(severity_number, 12) where attributes["error_severity"] == "LOG"
          - set(severity_number, 13) where attributes["error_severity"] == "WARNING"
          - set(severity_number, 17) where attributes["error_severity"] == "ERROR"
          - set(severity_number, 21) where attributes["error_severity"] == "FATAL"
          - set(severity_number, 24) where attributes["error_severity"] == "PANIC"

          # Clean up fields that have been promoted or replaced
          - delete_key(attributes, "error_severity")

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch, transform/parse_postgres] # <- add it here
      exporters: [debug]
```
With these transformations applied, the Collector now "understands" your logs. If you inspect the output, you will see that the `Body` is still the original JSON (preserving the source of truth), but the `Attributes` map is now fully populated, the `Timestamp` is correct, and `SeverityNumber` allows for proper filtering in your backend.
```text
LogRecord #0
ObservedTimestamp: 2025-11-17 14:19:16.184 +0000 UTC
Timestamp: 2025-11-17 14:19:16.184 +0000 UTC
SeverityText: LOG
SeverityNumber: Info4(12)
Body: Str(...)
Attributes:
     -> log.file.name: Str(app.log.json)
     -> ps: Str(idle)
     -> message: Str(statement: table album limit 10;)
     -> application_name: Str(psql)
     -> remote_host: Str([local])
     -> session_id: Str(691b1c6f.14d1)
     -> session_start: Str(2025-11-17 14:00:31 WAT)
     -> vxid: Str(4/3)
     -> query_id: Double(0)
     -> pid: Double(5329)
     -> txid: Double(0)
     -> line_num: Double(2)
     -> backend_type: Str(client backend)
     -> user: Str(postgres)
     -> dbname: Str(chinook)
Trace ID:
Span ID:
Flags: 0
```
In the next section, you'll refine this further by mapping these raw Postgres fields to OpenTelemetry Semantic Conventions to ensure consistent correlation across your stack.
Mapping PostgreSQL fields to OpenTelemetry semantic conventions
Now that the Collector is parsing PostgreSQL’s JSON logs and fixing the timestamps and severities, the next step is to align PostgreSQL’s native fields with OpenTelemetry’s Semantic Conventions.
Semantic conventions provide a shared vocabulary for your data. When you use standard attribute names (like `db.namespace` instead of `dbname`), your observability backend can automatically group, correlate, and visualize your data.
However, PostgreSQL logs contain far more metadata than the current OpenTelemetry standards cover. To handle this, we use a hybrid approach by:
- Mapping core fields to their official OpenTelemetry equivalents.
- Namespacing PostgreSQL-specific fields under `postgresql.*`.
- Preserving the original JSON payload in `log.record.original` for full-fidelity debugging.
Here is the updated transform configuration:
```yaml
# otelcol.yaml
processors:
  transform/parse_postgres:
    log_statements:
      # [...]

      # 1. Map PostgreSQL connection fields
      - context: log
        statements:
          # Map standard semantic conventions
          - set(attributes["user.name"], attributes["user"])
          - set(attributes["db.namespace"], attributes["dbname"])
          - set(attributes["client.address"], attributes["remote_host"])
          - set(attributes["client.port"], attributes["remote_port"])
          - set(attributes["db.response.status_code"], attributes["state_code"])
          - set(attributes["process.status"], attributes["ps"])
          - set(attributes["process.parent_pid"], attributes["leader_pid"])
          - set(attributes["session.id"], attributes["session_id"])
          - set(attributes["code.function.name"], attributes["func_name"])
          - set(attributes["code.file.path"], attributes["file_name"])
          - set(attributes["code.line.number"], Int(attributes["file_line_num"]))
          - set(attributes["log.record.original"], body)
          - set(body, attributes["message"])

          # PostgreSQL-specific extensions
          - set(attributes["postgresql.session_id"], attributes["session_id"])
          - set(attributes["postgresql.txid"], attributes["txid"])
          - set(attributes["postgresql.vxid"], attributes["vxid"])
          - set(attributes["postgresql.query_id"], attributes["query_id"])
          - set(attributes["postgresql.internal_query"], attributes["internal_query"])
          - set(attributes["postgresql.backend_type"], attributes["backend_type"])
          - set(attributes["postgresql.detail"], attributes["detail"])
          - set(attributes["postgresql.hint"], attributes["hint"])
          - set(attributes["postgresql.context"], attributes["context"])
          - set(attributes["postgresql.line_num"], Int(attributes["line_num"]))
          - set(attributes["postgresql.cursor_position"], Int(attributes["cursor_position"]))
          - set(attributes["postgresql.internal_position"], Int(attributes["internal_position"]))
          - set(attributes["postgresql.application_name"], attributes["application_name"])

          # Cleanup redundant fields
          - delete_key(attributes, "timestamp")
          - delete_key(attributes, "message")
          - delete_key(attributes, "error_severity")
          - delete_key(attributes, "dbname")
          - delete_key(attributes, "user")
          - delete_key(attributes, "statement")
          - delete_key(attributes, "state_code")
          - delete_key(attributes, "remote_host")
          - delete_key(attributes, "remote_port")
          - delete_key(attributes, "application_name")
          - delete_key(attributes, "leader_pid")
          - delete_key(attributes, "file_name")
          - delete_key(attributes, "func_name")
          - delete_key(attributes, "file_line_num")
          - delete_key(attributes, "session_id")
          - delete_key(attributes, "txid")
          - delete_key(attributes, "vxid")
          - delete_key(attributes, "query_id")
          - delete_key(attributes, "internal_query")
          - delete_key(attributes, "backend_type")
          - delete_key(attributes, "detail")
          - delete_key(attributes, "hint")
          - delete_key(attributes, "context")
          - delete_key(attributes, "line_num")
          - delete_key(attributes, "cursor_position")
          - delete_key(attributes, "internal_position")

      # Add a new group for resource attributes
      - statements:
          - set(resource.attributes["process.id"], log.attributes["pid"])
          - delete_key(log.attributes, "pid")
```
With this configuration applied, your logs are now fully interoperable with the rest of your stack:
```text
2025-11-18T10:56:46.810Z info ResourceLog #0
Resource SchemaURL:
Resource attributes:
     -> process.id: Int(5329)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2025-11-17 14:19:16.184 +0000 UTC
Timestamp: 2025-11-17 14:19:16.184 +0000 UTC
SeverityText: LOG
SeverityNumber: Info4(12)
Body: Str(statement: table album limit 10;)
Attributes:
     -> log.file.name: Str(app.log.json)
     -> db.namespace: Str(chinook)
     -> log.record.original: Str({"timestamp":"2025-11-17 14:19:16.184 UTC","user":"postgres","dbname":"chinook","pid":5329,"remote_host":"[local]","session_id":"691b1c6f.14d1","line_num":2,"ps":"idle","session_start":"2025-11-17 14:00:31 WAT","vxid":"4/3","txid":0,"error_severity":"LOG","message":"statement: table album limit 10;","application_name":"psql","backend_type":"client backend","query_id":0})
     -> session.id: Str(691b1c6f.14d1)
     -> postgresql.backend_type: Str(client backend)
     -> postgresql.line_num: Int(2)
     -> client.address: Str([local])
     -> postgresql.application_name: Str(psql)
     -> postgresql.vxid: Str(4/3)
     -> user.name: Str(postgres)
     -> postgresql.query_id: Double(0)
     -> session_start: Str(2025-11-17 14:00:31 WAT)
     -> postgresql.session_id: Str(691b1c6f.14d1)
     -> process.status: Str(idle)
     -> postgresql.txid: Double(0)
Trace ID:
Span ID:
Flags: 0
```
PostgreSQL is now speaking OpenTelemetry, but some of the most important data (like the query itself and how long it took) is still trapped inside the `Body` string.
In the next section, we will parse this text to extract precise execution duration and isolate the SQL statement into its own attribute.
Parsing the PostgreSQL message field
Until now, we have treated the PostgreSQL `message` field as a single opaque string. However, this field is actually a composite that can contain:
- A plain statement log such as `statement: TABLE album LIMIT 10;`
- A duration plus statement: `duration: 45.322 ms statement: SELECT * FROM track WHERE milliseconds > 500000;`
- Operational events such as `checkpoint complete: wrote 405 buffers (2.5%) ...`
- Errors and warnings with human-readable descriptions.
Leaving this data trapped in a string limits your ability to filter slow queries or group logs by SQL statement. To fix this, we need to extract the specific query text and duration into their own attributes.
We can use the `ExtractPatterns()` function in OTTL to parse these variable formats. The regular expression below looks for an optional duration prefix followed by the statement:
```yaml
processors:
  transform/parse_postgres:
    log_statements:
      # [...]
      - context: log
        statements:
          - merge_maps(attributes, ExtractPatterns(body, "(?:duration:\\s+(?P<pg_duration_ms>[0-9.]+)\\s+ms)?(?:\\s*statement:\\s+(?P<pg_statement>.*))?$"), "upsert")
          - set(attributes["db.query.text"], attributes["pg_statement"])
          - set(attributes["db.query.text"], attributes["statement"])
          - set(attributes["db.query.duration"], Double(attributes["pg_duration_ms"])) where attributes["pg_duration_ms"] != ""
          - delete_key(attributes, "pg_statement")
          - delete_key(attributes, "statement")
          - delete_key(attributes, "pg_duration_ms")
```
With this in place, a log line like:
```json
{
  "timestamp": "2025-11-18 10:42:01.123 UTC",
  "user": "postgres",
  "dbname": "chinook",
  "error_severity": "LOG",
  "message": "duration: 45.322 ms statement: SELECT * FROM track WHERE milliseconds > 500000;",
  "application_name": "psql",
  "backend_type": "client backend"
}
```
Turns into an OpenTelemetry log record that includes:
- `db.query.text`: `SELECT * FROM track WHERE milliseconds > 500000;`
- `db.query.duration`: `45.322`
Non-query messages (for example checkpoint logs or connection events) don't match these patterns and pass through unchanged. You can add further statements to parse them if you wish, as sketched below.
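For instance, here is a minimal sketch (not part of the pipeline above) that pulls the buffer count out of checkpoint messages. It assumes this group runs after the earlier one that replaced the `Body` with PostgreSQL's `message` text, and the `postgresql.checkpoint.buffers_written` attribute is an invented extension rather than an official semantic convention:

```yaml
# Hypothetical extension: parse checkpoint messages into a structured attribute.
processors:
  transform/parse_postgres:
    log_statements:
      - context: log
        conditions:
          - IsMatch(body, "^checkpoint complete")
        statements:
          # Capture the buffer count, e.g. "checkpoint complete: wrote 405 buffers (2.5%) ..."
          - merge_maps(attributes, ExtractPatterns(body, "wrote (?P<pg_buffers_written>[0-9]+) buffers"), "upsert")
          - set(attributes["postgresql.checkpoint.buffers_written"], Int(attributes["pg_buffers_written"])) where attributes["pg_buffers_written"] != nil
          - delete_key(attributes, "pg_buffers_written")
```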
At this point, your pipeline is doing all the heavy lifting for you: PostgreSQL emits JSON, the Collector parses it, standardizes timestamps and severity, maps fields to semantic conventions, attaches resource metadata, and now even splits the overloaded message into structured fields for statements and durations.
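One caveat before moving on: the extracted `db.query.text` is also where sensitive data tends to hide, since literal values in `WHERE` clauses can contain emails, IDs, or tokens. A minimal redaction sketch, assuming quoted strings and bare numeric literals are what you need to mask (and remembering that `Body` and `log.record.original` still hold the raw text), might look like this:

```yaml
# A redaction sketch; the patterns are assumptions, so adjust them to your own
# compliance requirements, and apply similar rules to body / log.record.original
# if the raw statement must not reach your backend.
processors:
  transform/parse_postgres:
    log_statements:
      - context: log
        conditions:
          - attributes["db.query.text"] != nil
        statements:
          # Mask quoted string literals, e.g. WHERE email = 'a@b.com' -> WHERE email = ?
          - replace_pattern(attributes["db.query.text"], "'[^']*'", "?")
          # Mask bare numeric literals
          - replace_pattern(attributes["db.query.text"], "\\b[0-9]+\\b", "?")
```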
In the next section, you’ll enrich these logs with resource attributes so that every record carries information about the database node, environment, and service identity.
Enriching PostgreSQL logs with resource attributes
Resource attributes describe where the telemetry came from and what produced it. They form the backbone of correlation across logs, metrics, and traces, and they ensure your observability backend can tie PostgreSQL activity to the rest of your system.
Unlike the log attributes we parsed earlier (which change with every query), resource attributes are constant for the lifecycle of the PostgreSQL instance.
Some of the most important resource attributes to set include:
- `service.name`: a stable, human-readable identifier for the component.
- `db.system.name`: identifies the database technology (`postgresql` here).
- `service.instance.id`: a unique ID for the specific PostgreSQL node or container (often the hostname or container ID).
- `deployment.environment.name`: describes the environment (`staging`, `production`, etc.).
Here is a simple example that uses the resource processor to apply static fields that will never change for this instance:
```yaml
processors:
  resource:
    actions:
      - key: service.name
        value: db-service
        action: upsert
      - key: db.system.name
        value: postgresql
        action: upsert
      - key: deployment.environment.name
        value: production
        action: upsert
```
Processors such as `resourcedetection` and `k8sattributes` also make it possible to automatically discover metadata from the host, Docker, Kubernetes, or cloud provider (AWS/GCP/Azure):
```yaml
processors:
  resourcedetection:
    detectors: [system, docker, ec2] # add relevant detectors
    system:
      hostname_sources: [os]

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [resourcedetection, resource, transform/parse_postgres, batch] # <-
      exporters: [debug]
```
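If your Collector runs inside Kubernetes, a `k8sattributes` sketch along the following lines could add pod and namespace metadata as well. Treat it as a starting point: the required RBAC setup is omitted, and pod association rules usually need tuning when logs are read from files rather than received over the network.

```yaml
# Optional sketch for Kubernetes deployments (RBAC and pod association omitted).
processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.node.name

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [k8sattributes, resourcedetection, resource, transform/parse_postgres, batch]
      exporters: [debug]
```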
Your logs now carry a Resource block that identifies the infrastructure context:
```text
2025-11-18T11:09:44.510Z info ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource attributes:
     -> service.name: Str(db-service)
     -> db.system.name: Str(postgresql)
     -> deployment.environment.name: Str(production)
     -> host.name: Str(3cb4a2c0ae38)
     -> os.type: Str(linux)
     -> process.id: Int(5329)
```
With parsing, semantic conventions, and resource attribution complete, you have a fully structured PostgreSQL log stream.
In the final section, you'll build on this foundation to correlate these logs
with traces by propagating trace_id and span_id from your application into
PostgreSQL, closing the loop between API calls and the exact SQL they triggered.
Adding trace context to PostgreSQL logs
At this stage of the pipeline, your PostgreSQL logs are parsed, enriched with semantic attributes, and annotated with resource metadata. The final step is to make them trace-aware so they can be correlated with the requests, spans, and services that triggered the queries.
PostgreSQL has no built-in notion of distributed tracing. It doesn't know which request initiated a query or how the resulting log relates to application-level spans. To bridge that gap, you'll need to embed trace context directly into the SQL statements themselves and let PostgreSQL write it out in the logs. Once the context is in the logs, the Collector can extract it and attach `trace_id` and `span_id` to each LogRecord.
This capability is made possible by SQLCommenter, a mechanism that appends a lightweight SQL comment containing OpenTelemetry’s `traceparent` header to every query executed by your application. SQLCommenter originated at Google and is now integrated into many official OpenTelemetry instrumentation libraries, including frameworks like Rails, Django, and Spring, and Node.js ORMs such as Knex or Sequelize.
When enabled in your application, the instrumentation intercepts the query just before it leaves the service and rewrites it. For example, a normal query:
```sql
SELECT * FROM album;
```
Becomes:
```sql
SELECT * FROM album; /*traceparent='00-866e8698eac009878483cfd029f62af9-bb14062982c63db3-01'*/
```
Because PostgreSQL logs the query string verbatim, this trace context is captured in your structured logs. Inside the `message` field of your JSON output, you will now see the `traceparent` embedded in the text:
```json
{
  [...],
  "message": "statement: SELECT * from album; /*traceparent='00-4b9ce7b9aaf2cc874b523154ae53b89f-484b22db45f0bd1b-01'*/",
  "backend_type": "client backend",
  "query_id": 0
}
```
The data is now there. The final piece of the puzzle is teaching the Collector to extract that `traceparent` from the SQL comment and promote it to the official Trace ID and Span ID fields of the LogRecord.
Extracting trace_id and span_id inside the Collector
With the `traceparent` comment now embedded in the PostgreSQL log entry, the Collector can parse it to reconstruct the context.
We'll use the transform processor to:
- Detect logs containing a `traceparent`.
- Extract the IDs using a regex capture group.
- Promote them to the top-level `Trace ID` and `Span ID` fields.
- Strip the comment from the SQL statement (to ensure query aggregation/metrics still work correctly).
Add this final logic to your transform block:
```yaml
# otelcol.yaml
processors:
  transform/parse_postgres:
    log_statements:
      # [...]
      - context: log
        conditions:
          - IsMatch(attributes["db.query.text"], "traceparent")
        statements:
          - merge_maps(attributes, ExtractPatterns(attributes["db.query.text"], "/\\*traceparent='00-(?P<trace_id>[0-9a-f]{32})-(?P<span_id>[0-9a-f]{16})-(?P<trace_flags>[0-9a-f]{2})'\\*/"), "upsert")
          - set(trace_id.string, attributes["trace_id"])
          - set(span_id.string, attributes["span_id"])
          - set(flags, Int(attributes["trace_flags"]))
          - replace_pattern(attributes["db.query.text"], "/\\*traceparent='[^']+'\\*/", "")
          - replace_pattern(body, " /\\*traceparent='[^']+'\\*/", "")
          - delete_key(attributes, "trace_id")
          - delete_key(attributes, "span_id")
          - delete_key(attributes, "trace_flags")
```
After applying this transformation, your database logs are no longer isolated text files. They are fully integrated components of your distributed trace:
```text
LogRecord #0
ObservedTimestamp: 2025-11-18 18:58:12.549 +0000 UTC
Timestamp: 2025-11-18 18:58:12.549 +0000 UTC
SeverityText: LOG
SeverityNumber: Info4(12)
Body: Str(statement: SELECT * from album;)
Attributes:
     . . .
Trace ID: 4b9ce7b9aaf2cc874b523154ae53b89f
Span ID: 484b22db45f0bd1b
Flags: 1
```
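Throughout this guide, the `debug` exporter has served as our feedback loop. To ship the finished pipeline to an OTLP-compatible backend, you'd swap it out for an OTLP exporter. Here is a minimal sketch in which the endpoint and authorization header are placeholders for your own backend's values:

```yaml
# A sketch for exporting the finished log pipeline over OTLP/HTTP.
exporters:
  otlphttp:
    endpoint: https://otlp.example.com # placeholder endpoint
    headers:
      authorization: "Bearer <YOUR_TOKEN>" # placeholder credential

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [resourcedetection, resource, transform/parse_postgres, batch]
      exporters: [otlphttp]
```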
Final thoughts
By completing this pipeline, you have elevated PostgreSQL from a passive data store into an active participant in your observability stack. You can now:
- Navigate instantly from a slow application trace directly to the exact PostgreSQL log showing what caused the bottleneck.
- Filter logs by Trace ID to see the complete story of a request, crossing the boundary between your code and the database engine.
- Measure accurate latency by comparing the application's view of the span duration against the database's reported execution time.
This is the power of OpenTelemetry: distinct signals (logs, traces, and metrics) from distinct sources (applications, databases, infrastructure) converging into a single, unified view of your system.
Thanks for reading!
