Last updated: July 6, 2025
Building Telemetry Pipelines with the OpenTelemetry Collector
Your services emit a torrent of telemetry—traces, metrics, and logs. It’s the lifeblood of modern observability. But how does that data get from your application and infrastructure to your backend?
For many, the answer is a chaotic web of vendor-specific agents, direct-to-backend SDK configurations, and disparate data shippers. This setup is brittle, expensive, hard to manage, and locks you into a single vendor’s ecosystem.
There is a better way.
Instead of managing a maze of point-to-point integrations, we’re going to build a telemetry pipeline: a centralized, vendor-neutral system that gives you complete control to collect, enrich, and route your observability data.
At the heart of this system is the OpenTelemetry Collector. It is a standalone service that acts as a universal receiver, a powerful processing engine, and a flexible dispatcher for telemetry data.
In this article, we’ll build a telemetry pipeline from the ground up. We’ll start with basic ingestion, layer in advanced processing, and finally branch into sophisticated data flows that unlock new insights.
Let’s get started!
The simplest possible pipeline
Every data pipeline needs an entry point and an exit. We will start by building the most basic version of an OpenTelemetry pipeline imaginable. The goal is to receive telemetry data and print it directly to the console, confirming that data is flowing correctly before we add complexity.
The Collector’s behavior is defined by a YAML configuration file. For this initial setup, you need to understand three top-level sections: `receivers`, `exporters`, and `service`.
Receivers
Receivers are the entry points for all telemetry data coming into the Collector from your applications and infrastructure.
They are configured to ingest data in various ways such as listening for network traffic, actively polling endpoints, reading from local sources (like files), or querying infrastructure APIs.
For example, the OTLP receiver sets up an endpoint that accepts data sent using the OpenTelemetry Protocol, while the Prometheus receiver periodically scrapes metrics from specified targets.
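As a sketch, a `receivers` section combining both approaches could look like this (the Prometheus scrape target is a hypothetical example):

```yaml
receivers:
  # Push-based: listen for OTLP data sent by instrumented applications.
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Pull-based: periodically scrape a Prometheus metrics endpoint.
  prometheus:
    config:
      scrape_configs:
        - job_name: my-app # hypothetical target
          scrape_interval: 30s
          static_configs:
            - targets: ["my-app:9090"]
```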
Exporters
Exporters are the final destinations for all telemetry data leaving the Collector after it has been processed.
They are responsible for translating data into the required format and transmitting it to various backend systems, such as observability platforms, databases, or message queues.
For example, the otlphttp exporter can send data to any OTLP-compatible backend over HTTP, while the debug exporter simply writes telemetry data to the console for debugging.
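As a sketch, an `exporters` section pairing the two could look like this (the backend URL is a placeholder):

```yaml
exporters:
  # Print telemetry to the console for local debugging.
  debug:
    verbosity: detailed
  # Ship telemetry to an OTLP/HTTP-compatible backend.
  otlphttp:
    endpoint: https://otlp.example.com # placeholder backend endpoint
    compression: gzip
```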
Service
The `service` section is the central orchestrator that activates and defines the flow of data through the Collector. No component is active unless it is enabled here.
It works by defining pipelines for each signal type (traces, metrics, or logs). Each pipeline specifies the exact path data will take by linking receivers, processors, and exporters.
For example, a traces pipeline could be configured to receive span data over OTLP, and fan it out to Jaeger through the OTLP exporter.
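As a minimal sketch, such a traces pipeline could be declared like this (assuming an `otlp/jaeger` exporter is defined in the `exporters` section):

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger] # assumes an OTLP exporter pointed at Jaeger
```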
To see the three components in action, let’s create our first configuration file:
```yaml
# otelcol.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [debug]
```
This configuration creates a simple pipeline for logs alone. It sets up an `otlp` receiver to accept log data sent over gRPC on port 4317. Any logs received are immediately passed, without any processing, to the `debug` exporter, which then prints a detailed OTLP representation to the Collector’s stderr.
To test your pipeline, you need an application that can generate and send telemetry data. A convenient tool for this is otelgen, which produces synthetic logs, traces, and metrics.
You can define and run the Collector and the `otelgen` tool using the following Docker Compose file:
```yaml
# docker-compose.yml
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.129.1
    container_name: otelcol
    volumes:
      - ./otelcol.yaml:/etc/otelcol-contrib/config.yaml
    restart: unless-stopped

  otelgen:
    image: ghcr.io/krzko/otelgen:v0.5.2
    container_name: otelgen
    command:
      [
        "--otel-exporter-otlp-endpoint",
        "otelcol:4317",
        "--insecure",
        "logs",
        "multi",
      ]
    depends_on:
      - otelcol

networks:
  otelnet:
    driver: bridge
```
The `otelgen` service is configured via its command arguments to send telemetry that matches our Collector’s setup:
- `--otel-exporter-otlp-endpoint otelcol:4317`: Tells `otelgen` to send data to the `otelcol` service on port 4317.
- `--insecure`: Disables TLS.
- `logs`: Instructs `otelgen` to generate log data specifically.
- `multi`: A subcommand that generates a continuous, varied stream of logs.
To see it in action, start both services in detached mode:
```sh
docker compose up -d
```
Once running, the `otelcol` service listens for OTLP data over gRPC on port 4317, and the `otelgen` service generates and sends a continuous stream of logs to it.
You can monitor the Collector’s output to verify that it’s receiving the logs:
```sh
docker compose logs otelcol -f
```
You will see a continuous stream of log data being printed to the console. A single log entry will be formatted like this, showing rich contextual information like the severity, body, and various attributes:
```
ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.26.0
Resource attributes:
     -> host.name: Str(node-1)
     -> k8s.container.name: Str(otelgen)
     -> k8s.namespace.name: Str(default)
     -> k8s.pod.name: Str(otelgen-pod-ab06ca8b)
     -> service.name: Str(otelgen)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope otelgen
LogRecord #0
ObservedTimestamp: 2025-07-06 11:21:57.085421018 +0000 UTC
Timestamp: 2025-07-06 11:21:57.085420886 +0000 UTC
SeverityText: Error
SeverityNumber: Error(17)
Body: Str(Log 3: Error phase: finish)
Attributes:
     -> worker_id: Str(3)
     -> service.name: Str(otelgen)
     -> trace_id: Str(46287c1c7b7eebea22af2b48b97f4a49)
     -> span_id: Str(f5777521efe11f94)
     -> trace_flags: Str(01)
     -> phase: Str(finish)
     -> http.method: Str(PUT)
     -> http.status_code: Int(403)
     -> http.target: Str(/api/v1/resource/3)
     -> k8s.pod.name: Str(otelgen-pod-8f215fc5)
     -> k8s.namespace.name: Str(default)
     -> k8s.container.name: Str(otelgen)
Trace ID:
Span ID:
Flags: 0
```
Understanding the debug exporter output
The output from the debug exporter shows the structured format of OpenTelemetry data (OTLP). It’s hierarchical, starting from the resource that generated the telemetry all the way down to the individual telemetry record. Let’s break down what you’re seeing.
ResourceLogs and Resource attributes
- `ResourceLog #0`: This is the top-level container. The `#0` indicates it’s the first resource log in this batch. All telemetry within this block comes from the same resource.
- `Resource attributes`: These are key-value pairs that describe the entity that produced the log. This could be a service, a container, or a host machine. In the example, attributes like `service.name` and `k8s.pod.name` apply to every log generated by this resource.
ScopeLogs
- `ScopeLogs #0`: Within a resource, telemetry is grouped by its origin, known as the instrumentation scope. This block contains all logs from a single scope.
- `InstrumentationScope`: This identifies the specific library or module that generated the log (in this case, `otelgen`). This is useful for knowing which part of your application emitted the log.
LogRecord
Within a single `ResourceLog` block, you may see multiple `LogRecord` entries, but they all belong to that same parent resource.
- `LogRecord #0`: This is the first log entry belonging to the resource. The key fields are:
  - `Timestamp`: When the event occurred.
  - `SeverityNumber`/`SeverityText`: The log level, such as `ERROR` or `INFO`.
  - `Body`: The actual log message content.
  - `Attributes`: These are key-value pairs that provide context specific to this single log event.
  - `Trace ID`/`Span ID`: These top-level fields are crucial for correlation. When populated, they directly link a log to a specific trace and span, allowing you to easily navigate between logs and traces in your observability backend.
Congratulations, you’ve built and verified your first telemetry pipeline! It’s simple, but it establishes the fundamental flow of data from a source, through the Collector, and to an exit point. Now, let’s make it more powerful.
Processing and transforming telemetry
Right now, your pipeline is just an empty conduit. Data goes in one end and comes out the other untouched. The real power of the Collector lies in its ability to process data in-flight. This is where processors come in.
Processors are intermediary components in a pipeline that can inspect, modify, filter, or enrich your telemetry. Let’s add a few essential processors to solve common problems and make the pipeline more intelligent.
Our new pipeline flow will look like this: `Receiver -> [Processors] -> Exporter`.
Batching telemetry for efficiency
Sending every single span or metric individually over the network is incredibly inefficient. It creates high network traffic and puts unnecessary load on the backend. The batch processor solves this by grouping telemetry into batches before exporting.
This isn’t an optional tweak; for any production workload, the batch processor is essential.
Go ahead and add it to your `processors` section. By default, it buffers data for a short period to create batches automatically:
```yaml
# otelcol.yaml
# Add this top-level 'processors' section
processors:
  batch:
    # You can customize the default values for more control
    # send_batch_size: 8192
    # timeout: 200ms

service:
  pipelines:
    logs:
      receivers: [otlp]
      # Add the processor to your pipeline's execution path.
      # Order matters here if you have multiple processors.
      processors: [batch]
      exporters: [debug]
```
With this simple addition, our Collector now buffers data for up to 200 milliseconds or until it has 8192 items, and then sends it all to the exporters in one efficient action.
Reducing noise by filtering telemetry
Observability data can be noisy. For example, frequent `DEBUG`-level logs are often useful in development but can be costly and redundant in production. Let’s add a bouncer to our pipeline to drop this noise at the source.
We’ll use the filter processor, which lets you drop telemetry data using the powerful OpenTelemetry Transformation Language (OTTL). Say you want to drop all logs below the `INFO` severity level; you can do so with the following modifications:
```yaml
# otelcol.yaml
processors:
  batch:
  # The filter processor lets you exclude telemetry data based on its attributes
  filter:
    logs:
      log_record:
        - severity_number < SEVERITY_NUMBER_INFO

service:
  pipelines:
    logs:
      receivers: [otlp]
      # The order is important. You want to drop data before batching it.
      processors: [filter, batch]
      exporters: [debug]
```
Now, any log with a severity number less than 9 (`INFO`) will be dropped by the Collector and will never reach the `debug` exporter.
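The same mechanism supports more targeted rules. For example, here is a sketch that additionally drops access logs for health-check endpoints, assuming your services emit an `http.target` attribute (adjust the attribute name and pattern to match your instrumentation):

```yaml
processors:
  filter:
    logs:
      log_record:
        # Drop anything below INFO severity.
        - severity_number < SEVERITY_NUMBER_INFO
        # Drop health-check access logs (hypothetical attribute and pattern).
        - IsMatch(attributes["http.target"], ".*/healthz")
```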
Modifying and enriching telemetry data
When you’d like to add, remove, or modify attributes in your telemetry data, there are a few general-purpose processors you can use:
- resource processor: For actions targeting resource-level attributes (e.g., `host.name`, `service.name`).
- attributes processor: For simple actions on the attributes of individual logs, spans, or metrics.
- transform processor: The most powerful of the three, for performing complex transformations on any part of your telemetry data.
Some common use cases for these processors include:
- Redacting or removing sensitive information from telemetry before it leaves your systems.
- Enriching data by adding static attributes.
- Renaming or standardizing attributes to conform to semantic conventions across different services.
- Correcting malformed or misplaced data sent by older or misconfigured instrumentation.
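As a sketch, the resource and attributes processors could cover the first few of these use cases like this; the attribute names are illustrative, not something `otelgen` emits:

```yaml
processors:
  # Enrich every signal with a static, environment-level attribute.
  resource:
    attributes:
      - key: deployment.environment.name
        value: production
        action: upsert
  # Redact sensitive data and standardize a legacy attribute name.
  attributes:
    actions:
      - key: user.email # hypothetical sensitive attribute
        action: delete
      - key: http.response.status_code # copy the value from a legacy key...
        from_attribute: http.status
        action: upsert
      - key: http.status # ...then remove the old key
        action: delete
```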
Let’s examine the structure of the OTLP log records being sent by the otelgen tool once again:
```
ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.26.0
Resource attributes:
     -> host.name: Str(node-1)
     -> k8s.container.name: Str(otelgen)
     -> k8s.namespace.name: Str(default)
     -> k8s.pod.name: Str(otelgen-pod-b9919c90)
     -> service.name: Str(otelgen)
LogRecord #0
ObservedTimestamp: 2025-07-03 16:01:40.264711241 +0000 UTC
Timestamp: 2025-07-03 16:01:40.264711041 +0000 UTC
SeverityText: Fatal
SeverityNumber: Fatal(21)
Body: Str(Log 1763: Fatal phase: finish)
Attributes:
     -> worker_id: Str(1763)
     -> service.name: Str(otelgen)
     -> trace_id: Str(a85d432127e63d667508563efd73af52)
     -> span_id: Str(34c07d59e6cfa2d9)
     -> trace_flags: Str(01)
     -> phase: Str(finish)
     -> http.method: Str(POST)
     -> http.status_code: Int(200)
     -> http.target: Str(/api/v1/resource/1763)
     -> k8s.pod.name: Str(otelgen-pod-b9919c90)
     -> k8s.namespace.name: Str(default)
     -> k8s.container.name: Str(otelgen)
Trace ID:
Span ID:
Flags: 0
```
There are three issues here that deviate from the latest OpenTelemetry semantic conventions:
- Misplaced trace context: The `trace_id`, `span_id`, and `trace_flags` values are incorrectly placed inside the `Attributes` map, while the dedicated top-level `Trace ID`, `Span ID`, and `Flags` fields are empty.
- Redundant attributes: Resource attributes like `k8s.pod.name` and `service.name` are duplicated in the log record’s `Attributes`.
- Deprecated attributes: HTTP attributes like `http.method`, `http.target`, and `http.status_code` have all been deprecated in favor of newer attributes (`http.request.method`, `url.path`, and `http.response.status_code`).
The transform processor is the perfect tool for fixing these issues. Add the following modifications to your `otelcol.yaml` file:
```yaml
# otelcol.yaml
processors:
  transform:
    log_statements:
      # Move trace context from attributes to the correct top-level fields
      - context: log
        statements:
          - set(trace_id.string, attributes["trace_id"])
          - set(span_id.string, attributes["span_id"])
          - set(flags, Int(attributes["trace_flags"]))
      # Delete the original, now redundant, trace context attributes
      - context: log
        statements:
          - delete_key(attributes, "trace_id")
          - delete_key(attributes, "span_id")
          - delete_key(attributes, "trace_flags")
      # Delete the duplicated resource attributes from the log record's attributes
      - context: log
        statements:
          - delete_key(attributes, "k8s.pod.name")
          - delete_key(attributes, "k8s.namespace.name")
          - delete_key(attributes, "k8s.container.name")
          - delete_key(attributes, "service.name")

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [filter, transform, batch] # Add the transform processor to the pipeline
      exporters: [debug]
```
This configuration uses OTTL statements to clean up our log records:
- `set(trace_id.string, ...)`: This function takes the value from the `trace_id` key within the attributes map and sets it as the top-level `Trace ID` for the log record. The same logic applies to the other `set()` statements.
- `delete_key(attributes, ...)`: After moving the values, this function removes the original keys from the attributes map to eliminate redundancy.
You can recreate the containers to see it in action:
```sh
docker compose up --force-recreate -d
```
When you check the logs, you’ll notice that the outgoing log data is now correctly formatted, smaller in size, and fully compliant with semantic conventions, with the `Trace ID` and `Span ID` fields properly populated:
```
2025-07-03T16:36:49.418Z  info  ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.26.0
Resource attributes:
     -> host.name: Str(node-1)
     -> k8s.container.name: Str(otelgen)
     -> k8s.namespace.name: Str(default)
     -> k8s.pod.name: Str(otelgen-pod-3efafa6f)
     -> service.name: Str(otelgen)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope otelgen
LogRecord #0
ObservedTimestamp: 2025-07-03 16:36:48.41161663 +0000 UTC
Timestamp: 2025-07-03 16:36:48.411616563 +0000 UTC
SeverityText: Error
SeverityNumber: Error(17)
Body: Str(Log 38: Error phase: finish)
Attributes:
     -> worker_id: Str(38)
     -> phase: Str(finish)
     -> url.path: Str(/api/v1/resource/340)
     -> http.response.status_code: Int(200)
     -> http.request.method: Str(GET)
Trace ID: 86713e2736d6f6a398047b9317b11398
Span ID: d06e86785766aa64
Flags: 1
```
Ensuring resilience with the Memory Limiter
An overloaded service could suddenly send a massive flood of data, overwhelming the Collector and causing it to run out of memory and crash. This would create a total visibility outage.
The memory_limiter processor acts as a safety valve to prevent this. It monitors memory usage and starts rejecting data if it exceeds a configured limit, enforcing backpressure on the data source.
```yaml
# otelcol.yaml
processors:
  batch:
  filter: # ...
  transform: # ...
  memory_limiter:
    # How often to check the collector's memory usage.
    check_interval: 1s
    # The hard memory limit in Mebibytes (MiB). If usage exceeds this,
    # the collector will start rejecting new data.
    limit_mib: 400
    # Expected spike headroom. The soft limit is limit_mib - spike_limit_mib
    # (300 MiB here); data is accepted again once usage drops below it.
    spike_limit_mib: 100

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      exporters: [debug]
```
Note that the `memory_limiter` should come first in your pipeline’s processor list. If it’s over the limit, you want to reject data immediately, before wasting CPU cycles on other processing.
Your pipeline is now not just efficient and correct, but also resilient against overloads.
Handling multiple signals with parallel pipelines
So far, you’ve built a simple pipeline for processing logs. However, a key strength of the OpenTelemetry Collector is its ability to handle traces, metrics, and logs simultaneously within a single instance. You can achieve this by defining parallel pipelines, one for each signal type, in the `service` section.
Let’s expand the Collector configuration to also process traces. The goal is to receive traces from an application, batch them for efficiency, and then send them to a Jaeger instance for visualization, all while the existing `logs` pipeline continues to operate independently.
To send traces to Jaeger, you need a new exporter. The Collector allows you to define multiple components of the same type by giving them unique names using the `type/name` syntax:
```yaml
# otelcol.yaml
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317 # The address of the Jaeger gRPC endpoint
    tls:
      insecure: true # Use TLS in production
```
Now, you can add a new pipeline to the service section specifically for traces. This pipeline will:
- Reuse the same `otlp` receiver we already defined.
- Reuse the `batch` and `memory_limiter` processors.
- Send its data to the new `otlp/jaeger` exporter.
Here is the complete `service` section showing both the `logs` and `traces` pipelines running in parallel:
```yaml
# otelcol.yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      exporters: [debug]

    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]
```
To test this, you need to update your `docker-compose.yml` to run a Jaeger instance and a second `otelgen` service configured to generate traces:
```yaml
# docker-compose.yml
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.129.1
    container_name: otelcol
    volumes:
      - ./otelcol.yaml:/etc/otelcol-contrib/config.yaml
    restart: unless-stopped

  jaeger:
    image: jaegertracing/all-in-one:1.71.0
    container_name: jaeger
    ports:
      - 16686:16686

  otelgen-logs:
    image: ghcr.io/krzko/otelgen:v0.5.2
    container_name: otelgen-logs
    command:
      [
        "--otel-exporter-otlp-endpoint",
        "otelcol:4317",
        "--insecure",
        "logs",
        "multi",
      ]
    depends_on:
      - otelcol

  otelgen-traces:
    image: ghcr.io/krzko/otelgen:v0.5.2
    container_name: otelgen-traces
    command:
      [
        "--otel-exporter-otlp-endpoint",
        "otelcol:4317",
        "--insecure",
        "--duration",
        "86400",
        "traces",
        "multi",
      ]
    depends_on:
      - otelcol

networks:
  otelnet:
    driver: bridge
```
Notice we now have two `otelgen` services: `otelgen-logs` sends logs as before, and `otelgen-traces` sends traces to the same OTLP endpoint on our Collector.
Recreate the containers with the updated configuration:
```sh
docker compose up --force-recreate --remove-orphans -d
```
The easiest way to verify the `traces` pipeline is to check the Jaeger UI. Open your web browser and navigate to http://localhost:16686. In the Jaeger UI, select otelgen from the Service dropdown menu and click Find Traces.
You will see a list of traces generated by the `otelgen-traces` service, confirming that your new `traces` pipeline is successfully receiving, processing, and exporting trace data to Jaeger.
With this setup, you have a single Collector instance efficiently managing two completely separate data flows, demonstrating the power and flexibility of defining multiple pipelines.
Fanning out to multiple destinations
A key advantage of the OpenTelemetry Collector is its ability to easily route telemetry to multiple destinations at once, a concept often called “fanning out”. This is done by simply adding more exporters to a pipeline’s `exporters` list.
Let’s demonstrate this by forwarding both our logs and traces to Dash0, in addition to our existing destinations:
```yaml
# otelcol.yaml
exporters:
  # [...]
  otlphttp/dash0:
    # Environment variables in the collector config are automatically expanded
    endpoint: ${env:DASH0_ENDPOINT}
    headers:
      Authorization: Bearer ${env:DASH0_TOKEN}
      Dash0-Dataset: ${env:DASH0_DATASET}

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      exporters: [debug, otlphttp/dash0]

    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger, otlphttp/dash0]
```
With this change, both pipelines now fan out the processed data to each of their configured exporters, and the logs and traces will appear in your Dash0 dashboard alongside the existing destinations.
This capability provides incredible flexibility, allowing you to experiment with new backends, migrate between vendors without downtime, or satisfy long-term compliance and archival needs without changing your application code.
Chaining pipelines with connectors
You can create powerful data processing flows by generating new telemetry signals from your existing data.
This is possible with connectors. A connector is a special component that acts as both an exporter for one pipeline and a receiver for another, allowing you to chain pipelines together.
Let’s demonstrate this by building a system that generates an error count metric from the `otelgen` log data. The count connector is perfect for this.
First, you’ll need to define the `count` connector and configure it to create a metric named `log_error.count` that increments every time it sees a log with a severity of `ERROR` or higher:
```yaml
# otelcol.yaml
connectors:
  count/log_errors:
    logs:
      log_error.count:
        description: count of errors logged
        conditions:
          - severity_number >= SEVERITY_NUMBER_ERROR
```
To use this, go ahead and update your `service` configuration to create a new metrics pipeline. The `count/log_errors` connector will serve as the bridge: it will be an exporter for the `logs` pipeline and a receiver for the new `metrics` pipeline:
```yaml
# otelcol.yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      # The connector is added as a destination for logs.
      exporters: [debug, otlphttp/dash0, count/log_errors]

    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, otlphttp/dash0]

    metrics:
      # This new pipeline receives data exclusively from the connector.
      receivers: [count/log_errors]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/dash0]
```
This configuration is a game-changer because it allows you to derive new insights from existing data streams directly within the Collector. The data flow is now:
- The logs pipeline processes logs and sends a copy to the `count/log_errors` connector.
- The `count/log_errors` connector inspects these logs, generates a new `log_error.count` metric based on our condition, and passes this metric along.
- The `metrics` pipeline receives the newly generated metric, batches it, and sends it to your backend.
After relaunching the services, you’ll see the new `log_error.count` metric appear in your dashboard, all without adding a single line of metrics instrumentation code to your application.
This is a basic example, but it demonstrates the power of a true pipeline architecture. The same principle can be used for more advanced scenarios, like using the spanmetrics connector to automatically generate full RED metrics (request rates, error counts, and duration histograms) directly from your trace data.
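As a sketch, wiring the spanmetrics connector into the pipelines from earlier could look like this (the histogram buckets and dimensions are illustrative):

```yaml
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [5ms, 25ms, 100ms, 250ms, 1s]
    dimensions:
      - name: http.request.method

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      # The connector receives a copy of every span alongside the other exporters.
      exporters: [otlp/jaeger, otlphttp/dash0, spanmetrics]
    metrics:
      # Span-derived RED metrics join the log-derived error counts.
      receivers: [count/log_errors, spanmetrics]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/dash0]
```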
Understanding Collector distributions
When you use the OpenTelemetry Collector, you are not running a single, monolithic application. Instead, you use a distribution: a specific binary packaged with a curated set of components (receivers, processors, exporters, and extensions).
This model exists to allow you to use a version of the Collector that is tailored to your specific needs or even create your own. There are three primary types of distributions you will encounter:
1. Official OpenTelemetry distributions
The OpenTelemetry project maintains several official distributions. The two most common are:
- Core (`otelcol`): A minimal, lightweight distribution that includes only the most essential and stable components. It provides a stable foundation but has limited functionality.
- Contrib (`otelcol-contrib`): The most comprehensive version, which includes almost every component from both the core and contrib repositories. It is the recommended distribution for getting started, as it provides the widest range of capabilities for connecting to various sources and destinations without needing to build a custom version.
2. Vendor distributions
Some observability vendors provide their own Collector distributions. These are typically based on the `otelcol-contrib` distribution but are pre-configured with the vendor’s specific exporter and other recommended settings. Using a vendor distribution can simplify the process of sending data to that vendor’s platform.
3. Custom distributions
For production environments, the recommended best practice is to build your own custom distribution. This involves creating a lightweight, fit-for-purpose Collector binary that contains only the components you need.
You can create a custom distribution with the OpenTelemetry Collector Builder (`ocb`) tool: you write a simple manifest file that lists the components you want to include, then run `ocb` to compile your custom binary.
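As a sketch, a manifest covering only the components used in this article might look like the following; the module versions are illustrative and should match your target Collector release:

```yaml
# builder-config.yaml
dist:
  name: otelcol-custom
  description: Minimal Collector for our logs, traces, and metrics pipelines
  output_path: ./dist

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.129.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.129.0
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.129.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor v0.129.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor v0.129.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.129.0
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.129.0
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.129.0
connectors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/countconnector v0.129.0
```

Running `ocb --config builder-config.yaml` then compiles the custom binary into the configured `output_path` directory.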
Debugging and observing your pipeline
A critical piece of infrastructure like your telemetry pipeline must itself be observable and easy to debug. If the Collector is dropping data, experiencing high latency, or is unhealthy, you need to know.
Fortunately, the Collector is instrumented out-of-the-box and provides several tools for validation and observation.
Validating your configuration
Before deploying the Collector, you should always validate that your `otelcol.yaml` file is syntactically correct. The primary way to do this is with the `validate` subcommand, which checks the configuration file for errors without starting the full Collector service:
```sh
otelcol-contrib validate --config=otelcol.yaml
```
If the configuration is valid, the command exits silently. If there are errors, it prints them to the console.
For a more visual approach, you can use the OtelBin web tool. This tool allows you to paste your configuration, visualize the resulting pipeline, and validate it against various Collector distributions.
If you’re writing complex OTTL statements for the `transform` or `filter` processor, you will also find the OTTL Playground to be a useful resource for understanding how different configurations impact the OTLP data transformation.
Live debugging
When building a pipeline, you’ll often need to inspect the data flowing through it in real time. As you’ve already seen, the `debug` exporter is the primary way to do this.
By adding it to any pipeline’s `exporters` list, you can print the full content of traces, metrics, or logs to the console and verify that your receivers and processors are working as expected.
For debugging the Collector components themselves, you can enable the zPages extension:
```yaml
# otelcol.yaml
extensions:
  zpages: # default endpoint is localhost:55679

service:
  extensions: [zpages]
  pipelines:
    # ... your pipelines
```
Once the Collector is running, you can access several useful debugging pages in your browser, such as /debug/pipelinez to view your pipeline components or /debug/tracez to see recently sampled traces.
Observing the Collector’s internal telemetry
In production environments, you’ll need to monitor the Collector’s health and performance over time. This is configured under the `service.telemetry` section.
By default, the Collector sends its own internal logs to stderr, and that’s often the first place you’ll check when there’s a problem with your pipeline. For metrics, the Collector can also expose its own data in a Prometheus-compatible format.
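Here is a minimal sketch of that configuration, assuming a recent Collector release that supports the `readers` syntax under `service.telemetry` (port 8888 is the conventional default):

```yaml
service:
  telemetry:
    logs:
      level: info
    metrics:
      level: detailed
      readers:
        # Expose internal metrics for Prometheus to scrape.
        - pull:
            exporter:
              prometheus:
                host: 0.0.0.0
                port: 8888
```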
You can now scrape this endpoint with a Prometheus instance to monitor key health indicators like `otelcol_exporter_send_failed_spans_total`, `otelcol_processor_batch_send_size`, and `otelcol_receiver_accepted_spans`. For more details, see the official documentation on Collector telemetry.
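A corresponding Prometheus scrape job could look like this (the target assumes the Collector container is reachable as `otelcol`):

```yaml
# prometheus.yml (sketch)
scrape_configs:
  - job_name: otelcol-internal
    scrape_interval: 15s
    static_configs:
      - targets: ["otelcol:8888"]
```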
Going to production: Collector deployment patterns
How you run the Collector in production is a critical architectural decision. Your deployment strategy affects the scalability, security, and resilience of your entire observability setup. The two fundamental roles a Collector can play are that of an agent or a gateway, which can be combined into several common patterns.
1. Agent-only deployment
The simplest pattern is to deploy a Collector agent on every host or as a sidecar to every application pod. In this model, each agent is responsible for collecting, processing, and exporting telemetry directly to one or more backends.
```
Application → OpenTelemetry Collector (Agent) → Observability Backend
```
This approach is easy to start with, but it offers limited durability: agents typically buffer in memory, so a single node failure can lead to data loss.
2. Agent and gateway deployment
A more robust production pattern enhances the agent deployment with a new, centralized gateway layer. In this model, the agent’s role is simplified: it handles local collection and metadata enrichment before forwarding all telemetry to the gateway.
This gateway is a standalone, centralized service consisting of one or more Collector instances that receive telemetry from all agents. It’s the ideal place for heavy processing like PII scrubbing, filtering, and tail-based sampling, which ensures rules are applied consistently before data leaves your environment.
```
Application → Collector (Agent) → Collector (Gateway) → Observability Backend
```
This layered approach provides the best of both worlds: agents handle local collection and metadata enrichment efficiently, while the gateway provides centralized control, security, and processing.
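As a sketch, the agent-side configuration reduces to local collection plus a single OTLP exporter pointed at the gateway (the gateway address is hypothetical):

```yaml
# Agent-side otelcol.yaml (sketch)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 200
    spike_limit_mib: 50
  batch:

exporters:
  # Forward everything to the central gateway instead of a backend.
  otlp/gateway:
    endpoint: otel-gateway.observability.svc.cluster.local:4317 # hypothetical address
    tls:
      insecure: false

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/gateway]
```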
High-scale deployment with a message queue
When you’re dealing with massive data volumes or require extreme durability, the standard pattern is to introduce an event queue (like Apache Kafka) between your agents and a fleet of Collectors that act as consumers.
```
Application → Collector (Agent) → Message Queue → Collector (Aggregator) → Backend(s)
```
This pattern provides two key advantages. The message queue acts as a massive buffer for durability; even if the aggregator fleet is down, agents can continue sending data to the queue, preventing data loss.
It also provides load-leveling by decoupling the agents from the aggregators, which smooths out traffic spikes and allows the aggregators to consume data at a steady rate.
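As a rough sketch, the handoff can be built with the Collector's Kafka exporter and receiver; the broker addresses and topic name are placeholders, and exact setting names can vary between Collector versions:

```yaml
# Agent side (sketch): publish spans to Kafka instead of a backend.
exporters:
  kafka:
    brokers: ["kafka-1:9092", "kafka-2:9092"] # placeholder brokers
    topic: otlp_spans

# Aggregator side (sketch): consume the same topic and export to the backend.
receivers:
  kafka:
    brokers: ["kafka-1:9092", "kafka-2:9092"]
    topic: otlp_spans
```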
You’re now a pipeline architect
We’ve journeyed from a simple data pass-through to a powerful, multi-stage pipeline that enriches, filters, secures, and routes telemetry data, even generating new, valuable signals along the way.
By adopting the pipeline mindset, you gain:
- Centralized control: Manage your entire telemetry flow from one place.
- Vendor neutrality: Swap backends with a simple config change. Use multiple vendors at once.
- Efficiency and cost savings: Filter noise and batch data at the source to slash your observability bill.
- Enhanced security: Scrub sensitive data before it ever leaves your infrastructure.
- Powerful capabilities: Unlock advanced patterns like metric generation that would be complex or impossible otherwise.
You are no longer just configuring an agent; you’re designing a scalable telemetry infrastructure that empowers your team to observe and improve your systems.
So start small by putting a Collector in front of one of your services. Add a `batch` processor and a `filter`, then see what you can build. The control and flexibility you’ll gain are transformative.
Don’t forget to check out the OpenTelemetry Collector documentation for more details.
Thanks for reading!
