Last updated: September 28, 2025
Mastering the OpenTelemetry Filelog Receiver
The OpenTelemetry Collector is the Swiss Army knife of modern observability, but not all applications are born cloud-native. Many critical systems, from legacy applications and databases to infrastructure components like NGINX, still write their most valuable diagnostic data to local log files. This is where the filelog receiver comes in.
It's a component that tails log files, parses their contents, and transforms them into structured OpenTelemetry Log Records. While the official documentation provides a reference of its many configuration options, it can be difficult to grasp the key concepts needed to use it effectively.
This guide will take you from the basics of tailing a file to building a reliable, production-grade pipeline for ingesting, parsing, and enriching file-based logs. You'll learn how to handle complex formats like multiline stack traces, manage log rotation gracefully, and ensure no data is lost when the Collector restarts.
How the Filelog receiver works
Before we get into configuration details, it helps to picture how the receiver handles a log file throughout its lifecycle. You can think of it as a simple repeating four-step loop:
- Discover: The receiver scans the filesystem at regular intervals, using the `include` and `exclude` patterns you've set, to figure out which log files it should pay attention to.
- Read: Once a file is picked up, the receiver opens it and begins following along as new lines are written. The `start_at` setting decides whether it begins from the `beginning` or just tails new content from the `end`.
- Parse: Each line (or block of lines, if multiline parsing is used) runs through a series of operators (if configured). These operators parse the raw text, pull out key attributes, assign timestamps and severity levels, and ultimately structure the log data.
- Emit: Finally, the structured log records are passed into the Collector's pipeline, where they can be filtered, transformed further, or exported to your backend.
This `Discover -> Read -> Parse -> Emit` loop forms the foundation of everything the receiver does.
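Each of those steps maps to a specific configuration knob. As a quick preview, here's a minimal sketch of the settings that drive the Discover and Read phases; the paths are hypothetical, while `include`, `exclude`, `poll_interval`, and `start_at` are real filelog receiver options:

```yaml
receivers:
  filelog:
    # DISCOVER: which files to watch; the globs are re-evaluated on every poll
    include: [/var/log/myapp/*.log]
    exclude: [/var/log/myapp/debug-*.log]
    poll_interval: 200ms # the default; how often to rescan and read
    # READ: where to begin in files seen for the first time
    start_at: beginning
```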
Quick Start: tailing a log file
One of the most common cases is when your application is already writing logs in JSON format. For example, imagine you have an app logging to `/var/log/myapp/app.log`:
json12{"time":"2025-09-28 20:15:12","level":"INFO","message":"User logged in successfully","user_id":"u-123","source_ip":"192.168.1.100"}{"time":"2025-09-28 20:15:45","level":"WARN","message":"Password nearing expiration","user_id":"u-123"}
Here's the minimal Collector configuration to read and parse these logs:
```yaml
receivers:
  filelog:
    # 1. DISCOVER: Find all .log files in /var/log/myapp/
    include: [/var/log/myapp/*.log]
    # 2. READ: Start reading from the beginning of new files
    start_at: beginning
    # 3. PARSE: Use the json_parser operator
    operators:
      - type: json_parser
        # Tell the parser where to find the timestamp and how it's formatted
        timestamp:
          parse_from: attributes.time
          layout: "%Y-%m-%d %H:%M:%S"
        # Tell the parser which field contains the severity
        severity:
          parse_from: attributes.level

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
```
Here's a breakdown of the above configuration:
- `include`: Points the receiver to all `.log` files in `/var/log/myapp/`.
- `start_at: beginning`: Ensures the receiver processes the entire file the first time it sees it. By default (`end`), it would only capture new lines written after the Collector starts.
- `operators`: In this case, there's just one: the `json_parser`. Its job is to take each log line, interpret it as JSON, and then promote selected fields into the log record's core metadata.
- `timestamp` and `severity`: Within the `json_parser`, we're pulling the `time` and `level` fields out of the JSON and promoting them to the top-level `Timestamp` and `Severity` fields of each log record.
With the debug exporter, you'll see the parsed and structured output. Instead of just raw JSON, each field is now properly represented inside the log record:
```text
LogRecord #0
ObservedTimestamp: 2025-09-28 20:48:36.728437503 +0000 UTC
Timestamp: 2025-09-28 20:15:12 +0000 UTC
SeverityText: INFO
SeverityNumber: Info(9)
Body: Str({"time":"2025-09-28 20:15:12","level":"INFO","message":"User logged in successfully","user_id":"u-123","source_ip":"192.168.1.100"})
Attributes:
     -> user_id: Str(u-123)
     -> source_ip: Str(192.168.1.100)
     -> log.file.name: Str(app.log)
     -> time: Str(2025-09-28 20:15:12)
     -> level: Str(INFO)
     -> message: Str(User logged in successfully)
Trace ID:
Span ID:
Flags: 0
```
Now the Collector isn't just tailing a file; it's transforming raw JSON into structured OpenTelemetry log data that seamlessly flows through the rest of your pipeline.
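One optional refinement: as the debug output shows, the raw `time`, `level`, and `message` strings still linger as attributes after parsing. Here's a hedged sketch of cleaning them up with the stanza `remove` and `move` operators, appended to the same `operators` list:

```yaml
operators:
  - type: json_parser
    timestamp:
      parse_from: attributes.time
      layout: "%Y-%m-%d %H:%M:%S"
    severity:
      parse_from: attributes.level
  # The fields below were already promoted, so drop the raw copies
  - type: remove
    field: attributes.time
  - type: remove
    field: attributes.level
  # Make the human-readable message the record body instead of the raw JSON
  - type: move
    from: attributes.message
    to: body
```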
Parsing unstructured text with regular expressions
Most infrastructure logs don't come neatly packaged as JSON. More often, they're plain text strings that follow a loose pattern, such as web server access logs, database query logs, or custom application messages. These logs are human-readable but difficult for machines to work with until they're given some structure.
To bridge that gap, the Collector provides the regex_parser operator. By applying regular expressions with named capture groups, you can slice a raw log line into meaningful pieces and promote them into structured fields.
For example, if you're tailing an NGINX access log file that contains entries in the common log format:
```text
127.0.0.1 - - [28/Sep/2025:20:30:00 +0000] "GET /api/v1/users HTTP/1.1" 200 512
127.0.0.1 - - [28/Sep/2025:20:30:05 +0000] "POST /api/v1/login HTTP/1.1" 401 128
```
You can use the `regex_parser` to carve each entry into structured fields:

```yaml
receivers:
  filelog:
    include: [/var/log/nginx/access.log]
    start_at: beginning
    operators:
      - type: regex_parser
        # Use named capture groups to extract data
        regex: '^(?P<client_ip>[^ ]+) - - \[(?P<timestamp>[^\]]+)\] "(?P<http_method>[A-Z]+) (?P<http_path>[^ "]+)[^"]*" (?P<status_code>\d{3}) (?P<response_size>\d+)$'
        # Parse the extracted timestamp
        timestamp:
          parse_from: attributes.timestamp
          layout: "%d/%b/%Y:%H:%M:%S %z"
        # Map status codes to severities
        severity:
          parse_from: attributes.status_code
          mapping:
            info:
              - 2xx
              - 3xx
            warn: 4xx
            error: 5xx # range values like 5xx match all 500-level codes
```
The core of this setup is the `regex` field with named capture groups. Each group labels a slice of the line so the parser can turn it into an attribute: `client_ip` grabs the remote address, `timestamp` captures the bracketed time string, `http_method` and `http_path` pull the request pieces, `status_code` picks up the three-digit response code, and `response_size` records the byte count.
Once those attributes exist, the `timestamp` field parses the timestamp string into a proper datetime value, and the `severity` block translates status codes into meaningful severity levels using an explicit `mapping`: 2xx and 3xx responses as INFO, 4xx as WARN, and 5xx as ERROR.
The `debug` output confirms our success:
```text
LogRecord #0
ObservedTimestamp: 2025-09-28 21:17:42.31729069 +0000 UTC
Timestamp: 2025-09-28 20:30:00 +0000 UTC
SeverityText: 200
SeverityNumber: Info(9)
Body: Str(127.0.0.1 - - [28/Sep/2025:20:30:00 +0000] "GET /api/v1/users HTTP/1.1" 200 512)
Attributes:
     -> status_code: Str(200)
     -> response_size: Str(512)
     -> log.file.name: Str(access.log)
     -> client_ip: Str(127.0.0.1)
     -> timestamp: Str(28/Sep/2025:20:30:00 +0000)
     -> http_method: Str(GET)
     -> http_path: Str(/api/v1/users)
Trace ID:
Span ID:
Flags: 0
```
With a single expression and a couple of parsing steps, a flat NGINX access log is transformed into structured OpenTelemetry data. From there, your pipeline can enrich it further—for example, by mapping the captured fields to the OpenTelemetry Semantic Conventions for HTTP attributes.
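As an illustration of that enrichment, here's a hedged sketch using the Collector's transform processor and OTTL. The target attribute names follow the current HTTP semantic conventions, but treat the exact statements as a starting point rather than a drop-in config:

```yaml
processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Copy the captured fields onto their semantic-convention names
          - set(attributes["client.address"], attributes["client_ip"])
          - set(attributes["http.request.method"], attributes["http_method"])
          - set(attributes["url.path"], attributes["http_path"])
          - set(attributes["http.response.status_code"], Int(attributes["status_code"]))
          # Drop the original ad-hoc keys
          - delete_key(attributes, "client_ip")
          - delete_key(attributes, "http_method")
          - delete_key(attributes, "http_path")
          - delete_key(attributes, "status_code")
```

Remember to add `transform` to the `processors` list of your logs pipeline for the statements to run.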
Handling stack traces and multiline logs
Not all log entries fit neatly on a single line. A stack trace is a classic example:
```text
2025-09-28 21:05:42 [ERROR] Unhandled exception: Cannot read property 'foo' of undefined
TypeError: Cannot read property 'foo' of undefined
    at Object.<anonymous> (/usr/src/app/index.js:15:18)
    at Module._compile (node:internal/modules/cjs/loader:1254:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)
    at Module.load (node:internal/modules/cjs/loader:1117:32)
    at Module._load (node:internal/modules/cjs/loader:958:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:17:47
```
If you feed this straight into the Collector, the receiver will treat each line as its own log entry. That's not what you'd want here since the error message and every stack frame belong to the same record.
The fix is to use the `multiline` configuration, which tells the receiver how to group lines together:
```yaml
receivers:
  filelog:
    include: [/var/log/myapp/*.log]
    start_at: beginning
    multiline:
      # New entry starts when a line begins with "YYYY-MM-DD HH:MM:SS"
      line_start_pattern: ^\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}
    operators:
      - type: regex_parser
        regex: (?P<timestamp>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+\[(?P<severity>[A-Za-z]+)\]\s+(?P<message>.+)
        timestamp:
          parse_from: attributes.timestamp
          layout: "%Y-%m-%d %H:%M:%S"
        severity:
          parse_from: attributes.severity
```
Here, the `line_start_pattern` acts as the anchor: a new log entry begins only when a line starts with a date in the form `YYYY-MM-DD HH:MM:SS`. Any line that doesn't match is automatically folded into the body of the previous entry.
The result is that the entire stack trace, from the error message down through each `at ...` line, gets captured as one structured log record. This way, you don't lose context when analyzing errors.
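If you need finer control over grouping, for example when entries are better identified by their last line, the stanza `recombine` operator is an alternative to `multiline`. Here's a sketch under the same date-prefixed format, using a bracket-based pattern to sidestep escaping in the expression language:

```yaml
operators:
  - type: recombine
    combine_field: body
    # A new entry begins whenever the body starts with a date
    is_first_entry: body matches "^[0-9]{4}-[0-9]{2}-[0-9]{2}"
    # Keep interleaved lines from different files separate
    # (requires include_file_path: true on the receiver)
    source_identifier: attributes["log.file.path"]
```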
Handling log rotation seamlessly
Log files don't grow indefinitely; at some point, they'll get rotated.
The filelog receiver is built to handle common rotation patterns (like renaming `app.log` to `app.log.1`) automatically and without losing data.
It works by tracking files with a unique fingerprint (taken from the first few kilobytes) rather than just the filename. When a file is rotated, the receiver recognizes that the old file has been renamed, finishes reading it to the end, and then begins reading the new file from the start.
There’s no special configuration required for this behavior as it happens out of the box.
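One caveat: if your `include` glob is broad enough to match the rotated copies themselves, the receiver will happily ingest those too. A hypothetical sketch of guarding against that when rotated files are compressed:

```yaml
receivers:
  filelog:
    # Matches app.log as well as rotated copies like app.log.1
    include: [/var/log/myapp/app.log*]
    # But never read compressed archives
    exclude: [/var/log/myapp/*.gz]
```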
How to avoid lost or duplicate logs
What happens if the Collector process restarts? Without care, you risk either re-ingesting old data or skipping over new logs. If you set `start_at: beginning`, the receiver will reread all your log files and create massive duplication. If you set `start_at: end`, it will miss any logs written while the Collector was down.
The solution is checkpointing. By configuring a storage extension, you instruct the `filelog` receiver to save its position (the last read offset for each file) to disk.
```yaml
extensions:
  file_storage:
    directory: /var/otelcol/storage

receivers:
  filelog:
    include: [/var/log/myapp/*.log]
    start_at: beginning
    # Link the receiver to the storage extension
    storage: file_storage

# ... processors, exporters

service:
  # The extension must be enabled in the service section
  extensions: [file_storage]
  pipelines:
    logs:
      receivers: [filelog]
      # ...
```
With the `storage` extension enabled, the receiver will:
- On startup, check the storage directory for saved offsets.
- Resume reading from the saved offset for any file it was tracking, ensuring no data is lost or duplicated.
- Periodically update the storage with its latest progress.
This is an essential best practice for any production deployment.
Filelog receiver tips and best practices
When troubleshooting the filelog receiver, a few issues come up again and again. Let's look at the most common ones:
- The most common issue is that the logs don't show up. In almost every case, the cause is permissions. The fix is to ensure the user running the Collector can read not just the log files, but also the directories that contain them.
- Another frequent culprit is the `start_at` setting. By default it is set to `end`, which means the receiver will only collect new lines written after startup. If you are testing against an existing file that isn't actively being written to, change it to `beginning` so the entire file is ingested. Also, double-check your glob pattern: if you are trying to match files in nested directories, remember to use `**` (for example, `/var/log/**/*.log`).
- Another common frustration is when your regular expression doesn't match the log lines. When in doubt, test it outside the Collector first. Tools like Regex101 are invaluable for verifying your expression, especially if you select the "Golang" flavor to match the Collector's regex engine. Subtle whitespace or hidden characters are often the reason a pattern fails.
- Finally, if your logs are being duplicated on restart, enable a storage extension so the receiver can checkpoint its position in each file and resume cleanly, without data loss or duplication.
Final thoughts
The `filelog` receiver is an essential bridge between traditional file-based logging and the world of modern, structured observability. By mastering its core concepts of discovery, parsing with operators, and stateful checkpointing, you can reliably ingest data from any application that writes to a file.
Once you have transformed your raw text into well-structured OpenTelemetry logs, the full power of the Collector is at your disposal. You can now filter, enrich, and route this data to any observability backend, turning forgotten log files into a rich source of actionable insight.
