Last updated: July 6, 2025

Building Telemetry Pipelines with the OpenTelemetry Collector

Your services emit a torrent of telemetry—traces, metrics, and logs. It’s the lifeblood of modern observability. But how does that data get from your application and infrastructure to your backend?

For many, the answer is a chaotic web of vendor-specific agents, direct-to-backend SDK configurations, and disparate data shippers. This setup is brittle, expensive, hard to manage, and locks you into a single vendor’s ecosystem.

There is a better way.

Instead of managing a maze of point-to-point integrations, we’re going to build a telemetry pipeline: a centralized, vendor-neutral system that gives you complete control to collect, enrich, and route your observability data.

At the heart of this system is the OpenTelemetry Collector. It is a standalone service that acts as a universal receiver, a powerful processing engine, and a flexible dispatcher for telemetry data.

In this article, we’ll build a telemetry pipeline from the ground up. We’ll start with basic ingestion, layer in advanced processing, and finally branch into sophisticated data flows that unlock new insights.

Let’s get started!

The simplest possible pipeline

Every data pipeline needs an entry point and an exit. We will start by building the most basic version of an OpenTelemetry pipeline imaginable. The goal is to receive telemetry data and print it directly to the console, confirming that data is flowing correctly before we add complexity.

The Collector’s behavior is defined by a YAML configuration file. For this initial setup, you need to understand three top-level sections: receivers, exporters, and service.

Receivers

Receivers are the entry points for all telemetry data coming into the Collector from your applications and infrastructure.

They are configured to ingest data in various ways such as listening for network traffic, actively polling endpoints, reading from local sources (like files), or querying infrastructure APIs.

For example, the OTLP receiver sets up an endpoint that accepts data sent using the OpenTelemetry Protocol, while the Prometheus receiver periodically scrapes metrics from specified targets.
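For instance, a receivers section combining both might look like this (a minimal sketch; the Prometheus scrape target and job name are illustrative):

yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: example-app # illustrative job name
          scrape_interval: 30s
          static_configs:
            - targets: ["app:9090"] # illustrative scrape target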

Exporters

Exporters are the final destinations for all telemetry data leaving the Collector after it has been processed.

They are responsible for translating data into the required format and transmitting it to various backend systems, such as observability platforms, databases, or message queues.

For example, the otlphttp exporter can send data to any OTLP-compatible backend over HTTP, while the debug exporter simply writes telemetry data to the console for debugging.
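As a small sketch (the endpoint URL is a placeholder for your own backend):

yaml
exporters:
  otlphttp:
    endpoint: https://otlp.example.com # placeholder OTLP/HTTP backend
  debug:
    verbosity: detailed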

Service

The service section is the central orchestrator that activates and defines the flow of data through the Collector. No component is active unless it is enabled here.

It works by defining pipelines for each signal type (traces, metrics, or logs). Each pipeline specifies the exact path data will take by linking receivers, processors, and exporters.

For example, a traces pipeline could be configured to receive span data over OTLP, and fan it out to Jaeger through the OTLP exporter.
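Expressed as configuration, that example would look roughly like this (a sketch assuming an otlp receiver and an otlp/jaeger exporter are defined elsewhere in the same file):

yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger]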

To see the three components in action, let’s create our first configuration file:

yaml
otelcol.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [debug]

This configuration creates a simple pipeline for logs alone. It sets up an otlp receiver to accept log data sent over gRPC on port 4317. Any logs received are immediately passed without any processing to the debug exporter, which then prints a detailed OTLP representation to the Collector’s stderr.

To test your pipeline, you need an application that can generate and send telemetry data. A convenient tool for this is otelgen, which produces synthetic logs, traces, and metrics.

You can define and run the Collector and the otelgen tool using the following Docker Compose file:

yaml
docker-compose.yml
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.129.1
    container_name: otelcol
    volumes:
      - ./otelcol.yaml:/etc/otelcol-contrib/config.yaml
    restart: unless-stopped

  otelgen:
    image: ghcr.io/krzko/otelgen:v0.5.2
    container_name: otelgen
    command:
      [
        "--otel-exporter-otlp-endpoint",
        "otelcol:4317",
        "--insecure",
        "logs",
        "multi",
      ]
    depends_on:
      - otelcol

networks:
  otelnet:
    driver: bridge

The otelgen service is configured via its command arguments to send telemetry that matches our Collector’s setup:

  • --otel-exporter-otlp-endpoint otelcol:4317: Tells otelgen to send data to the otelcol service on port 4317.
  • --insecure: Disables TLS.
  • logs: Instructs otelgen to generate log data specifically.
  • multi: A subcommand that generates a continuous, varied stream of logs.

To see it in action, start both services in detached mode:

sh
docker compose up -d

Once running, the otelcol service listens for OTLP data over gRPC on port 4317, and the otelgen service generates and sends a continuous stream of logs to it.

You can monitor the Collector’s output to verify that it’s receiving the logs:

sh
docker compose logs otelcol -f

You will see a continuous stream of log data being printed to the console. A single log entry will be formatted like this, showing rich contextual information like the severity, body, and various attributes:

output
ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.26.0
Resource attributes:
-> host.name: Str(node-1)
-> k8s.container.name: Str(otelgen)
-> k8s.namespace.name: Str(default)
-> k8s.pod.name: Str(otelgen-pod-ab06ca8b)
-> service.name: Str(otelgen)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope otelgen
LogRecord #0
ObservedTimestamp: 2025-07-06 11:21:57.085421018 +0000 UTC
Timestamp: 2025-07-06 11:21:57.085420886 +0000 UTC
SeverityText: Error
SeverityNumber: Error(17)
Body: Str(Log 3: Error phase: finish)
Attributes:
-> worker_id: Str(3)
-> service.name: Str(otelgen)
-> trace_id: Str(46287c1c7b7eebea22af2b48b97f4a49)
-> span_id: Str(f5777521efe11f94)
-> trace_flags: Str(01)
-> phase: Str(finish)
-> http.method: Str(PUT)
-> http.status_code: Int(403)
-> http.target: Str(/api/v1/resource/3)
-> k8s.pod.name: Str(otelgen-pod-8f215fc5)
-> k8s.namespace.name: Str(default)
-> k8s.container.name: Str(otelgen)
Trace ID:
Span ID:
Flags: 0

Understanding the debug exporter output

The output from the debug exporter shows the structured format of OpenTelemetry data (OTLP). It’s hierarchical, starting from the resource that generated the telemetry all the way down to the individual telemetry record. Let’s break down what you’re seeing.

ResourceLogs and Resource attributes

  • ResourceLog #0: This is the top-level container. The #0 indicates it’s the first resource log in this batch. All telemetry within this block comes from the same resource.
  • Resource attributes: These are key-value pairs that describe the entity that produced the log. This could be a service, a container, or a host machine. In the example, attributes like service.name and k8s.pod.name apply to every log generated by this resource.

ScopeLogs

  • ScopeLogs #0: Within a resource, telemetry is grouped by its origin, known as the instrumentation scope. This block contains all logs from a single scope.
  • InstrumentationScope: This identifies the specific library or module that generated the log (in this case, otelgen). This is useful for knowing which part of your application emitted the log.

LogRecord

Within a single ResourceLog block, you may see multiple LogRecord entries, but they all belong to that same parent resource.

  • LogRecord #0: This is the first log entry belonging to the resource. The key fields are:
    • Timestamp: When the event occurred.
    • SeverityNumber / SeverityText: The log level, such as ERROR or INFO.
    • Body: The actual log message content.
    • Attributes: These are key-value pairs that provide context specific to this single log event.
    • Trace ID / Span ID: These top-level fields are crucial for correlation. When populated, they directly link a log to a specific trace and span, allowing you to easily navigate between logs and traces in your observability backend.

Congratulations, you’ve built and verified your first telemetry pipeline! It’s simple, but it establishes the fundamental flow of data from a source, through the Collector, and to an exit point. Now, let’s make it more powerful.

Processing and transforming telemetry

Right now, your pipeline is just an empty conduit. Data goes in one end and comes out the other untouched. The real power of the Collector lies in its ability to process data in-flight. This is where processors come in.

Processors are intermediary components in a pipeline that can inspect, modify, filter, or enrich your telemetry. Let’s add a few essential processors to solve common problems and make the pipeline more intelligent.

Our new pipeline flow will look like this: Receiver -> [Processors] -> Exporter.

Batching telemetry for efficiency

Sending every single span or metric individually over the network is incredibly inefficient. It creates high network traffic and puts unnecessary load on the backend. The batch processor solves this by grouping telemetry into batches before exporting.

Note that this isn’t an optional tweak; for any production workload, the batch processor is essential.

Go ahead and add it to your processors section. By default, it buffers data for a short period to create batches automatically:

yaml
otelcol.yaml
# Add this top-level 'processors' section
processors:
  batch:
    # You can customize the default values for more control
    # send_batch_size: 8192
    # timeout: 200ms

service:
  pipelines:
    logs:
      receivers: [otlp]
      # Add the processor to your pipeline's execution path.
      # Order matters here if you have multiple processors.
      processors: [batch]
      exporters: [debug]

With this simple addition, our Collector now buffers data for up to 200 milliseconds or until it has 8192 items, and then sends it all to the exporters in one efficient action.

Reducing noise by filtering telemetry

Observability data can be noisy. For example, frequent DEBUG level logs are often useful in development but can be costly and redundant in production. Let’s add a bouncer to our pipeline to drop this noise at the source.

We’ll use the filter processor, which lets you drop telemetry data using the powerful OpenTelemetry Transformation Language (OTTL). Say you want to drop all logs below the INFO severity level; you can do so with the following modifications:

yaml
otelcol.yaml
processors:
  batch:
  # The filter processor lets you exclude telemetry data based on its attributes
  filter:
    logs:
      log_record:
        - severity_number < SEVERITY_NUMBER_INFO

service:
  pipelines:
    logs:
      receivers: [otlp]
      # The order is important. You want to drop data before batching it.
      processors: [filter, batch]
      exporters: [debug]

Now, any log with a severity number less than 9 (INFO) will be dropped by the Collector and will never reach the debug exporter.
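The same mechanism works for attribute-based conditions. For instance, here is a sketch that also drops health-check logs (the /healthz path pattern is hypothetical; a record matching any listed condition is dropped):

yaml
processors:
  filter:
    logs:
      log_record:
        - severity_number < SEVERITY_NUMBER_INFO
        - IsMatch(attributes["http.target"], "/healthz.*")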

Modifying and enriching telemetry data

When you’d like to add, remove, or modify attributes in your telemetry data, there are a few general-purpose processors you can use, such as the attributes, resource, and transform processors.

Some common use cases for these processors include:

  • Redacting or removing sensitive information from telemetry before it leaves your systems.
  • Enriching data by adding static attributes.
  • Renaming or standardizing attributes to conform to semantic conventions across different services.
  • Correcting malformed or misplaced data sent by older or misconfigured instrumentation.

Let’s examine the structure of the OTLP log records being sent by the otelgen tool once again:

text
output
ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.26.0
Resource attributes:
-> host.name: Str(node-1)
-> k8s.container.name: Str(otelgen)
-> k8s.namespace.name: Str(default)
-> k8s.pod.name: Str(otelgen-pod-b9919c90)
-> service.name: Str(otelgen)
LogRecord #0
ObservedTimestamp: 2025-07-03 16:01:40.264711241 +0000 UTC
Timestamp: 2025-07-03 16:01:40.264711041 +0000 UTC
SeverityText: Fatal
SeverityNumber: Fatal(21)
Body: Str(Log 1763: Fatal phase: finish)
Attributes:
-> worker_id: Str(1763)
-> service.name: Str(otelgen)
-> trace_id: Str(a85d432127e63d667508563efd73af52)
-> span_id: Str(34c07d59e6cfa2d9)
-> trace_flags: Str(01)
-> phase: Str(finish)
-> http.method: Str(POST)
-> http.status_code: Int(200)
-> http.target: Str(/api/v1/resource/1763)
-> k8s.pod.name: Str(otelgen-pod-b9919c90)
-> k8s.namespace.name: Str(default)
-> k8s.container.name: Str(otelgen)
Trace ID:
Span ID:
Flags: 0

There are three issues here that deviate from the latest OpenTelemetry semantic conventions:

  1. Misplaced trace context: The trace_id, span_id, and trace_flags values are incorrectly placed inside the Attributes map, while the dedicated top-level Trace ID, Span ID, and Flags fields are empty.
  2. Redundant attributes: Resource attributes like k8s.pod.name and service.name are duplicated in the log record’s Attributes.
  3. Deprecated attributes: HTTP attributes like http.method, http.target, and http.status_code have all been deprecated in favor of http.request.method, url.path, and http.response.status_code respectively.

The transform processor is the perfect tool for fixing these issues. Add the following modifications to your otelcol.yaml file:

yaml
otelcol.yaml
processors:
  transform:
    log_statements:
      # Move trace context from attributes to the correct top-level fields
      - context: log
        statements:
          - set(trace_id.string, attributes["trace_id"])
          - set(span_id.string, attributes["span_id"])
          - set(flags, Int(attributes["trace_flags"]))
      # Delete the original, now redundant, trace context attributes
      - context: log
        statements:
          - delete_key(attributes, "trace_id")
          - delete_key(attributes, "span_id")
          - delete_key(attributes, "trace_flags")
      # Delete the duplicated resource attributes from the log record's attributes
      - context: log
        statements:
          - delete_key(attributes, "k8s.pod.name")
          - delete_key(attributes, "k8s.namespace.name")
          - delete_key(attributes, "k8s.container.name")
          - delete_key(attributes, "service.name")

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [filter, transform, batch] # Add the transform processor to the pipeline
      exporters: [debug]

This configuration uses OTTL statements to clean up our log records:

  • set(trace_id.string, ...): This function takes the value from the trace_id key within the attributes map and sets it as the top-level Trace ID for the log record. The same logic applies to the other set statements.
  • delete_key(attributes, ...): After moving the values, this function removes the original keys from the attributes map to eliminate redundancy.
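The configuration above addresses the first two issues. The third one, the deprecated HTTP attributes, can be handled with a few more OTTL statements in the same transform processor; a sketch along these lines:

yaml
processors:
  transform:
    log_statements:
      # ... statements from above ...
      # Rename deprecated HTTP attributes to their current equivalents
      - context: log
        statements:
          - set(attributes["http.request.method"], attributes["http.method"])
          - set(attributes["http.response.status_code"], attributes["http.status_code"])
          - set(attributes["url.path"], attributes["http.target"])
          - delete_key(attributes, "http.method")
          - delete_key(attributes, "http.status_code")
          - delete_key(attributes, "http.target")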

You can recreate the containers to see it in action:

sh
docker compose up --force-recreate -d

When you check the logs, you’ll notice that the outgoing log data is now correctly formatted, smaller in size, and aligned with current semantic conventions, with the Trace ID and Span ID fields properly populated:

text
output
2025-07-03T16:36:49.418Z info ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.26.0
Resource attributes:
-> host.name: Str(node-1)
-> k8s.container.name: Str(otelgen)
-> k8s.namespace.name: Str(default)
-> k8s.pod.name: Str(otelgen-pod-3efafa6f)
-> service.name: Str(otelgen)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope otelgen
LogRecord #0
ObservedTimestamp: 2025-07-03 16:36:48.41161663 +0000 UTC
Timestamp: 2025-07-03 16:36:48.411616563 +0000 UTC
SeverityText: Error
SeverityNumber: Error(17)
Body: Str(Log 38: Error phase: finish)
Attributes:
-> worker_id: Str(38)
-> phase: Str(finish)
-> url.path: Str(/api/v1/resource/340)
-> http.response.status_code: Int(200)
-> http.request.method: Str(GET)
Trace ID: 86713e2736d6f6a398047b9317b11398
Span ID: d06e86785766aa64
Flags: 1

Ensuring resilience with the Memory Limiter

An overloaded service could suddenly send a massive flood of data, overwhelming the Collector and causing it to run out of memory and crash. This would create a total visibility outage.

The memory_limiter processor acts as a safety valve to prevent this. It monitors memory usage and starts rejecting data if it exceeds a configured limit, enforcing backpressure on the data source.

yaml
otelcol.yaml
processors:
  batch:
  filter: # ...
  transform: # ...
  memory_limiter:
    # How often to check the Collector's memory usage.
    check_interval: 1s
    # The hard memory limit in mebibytes (MiB). Reaching it forces a
    # garbage collection, and data is refused until usage drops.
    limit_mib: 400
    # Spike headroom: the soft limit is limit_mib - spike_limit_mib
    # (300 MiB here). New data is refused while usage exceeds the soft limit.
    spike_limit_mib: 100

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      exporters: [debug]

Note that the memory_limiter should come first in your pipeline’s processor list. If it’s over the limit, you want to reject data immediately, before wasting CPU cycles on other processing.

Your pipeline is now not just efficient and correct, but also resilient against overloads.

Handling multiple signals with parallel pipelines

So far, you’ve built a simple pipeline for processing logs. However, a key strength of the OpenTelemetry Collector is its ability to handle traces, metrics, and logs simultaneously within a single instance. You can achieve this by defining parallel pipelines, one for each signal type, in the service section.

Let’s expand the Collector configuration to also process traces. The goal is to receive traces from an application, batch them for efficiency, and then send them to a Jaeger instance for visualization, all while the existing logs pipeline continues to operate independently.

To send traces to Jaeger, you need a new exporter. The Collector allows you to define multiple components of the same type by giving them unique names using the type/name syntax:

yaml
otelcol.yaml
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317 # The address of the Jaeger gRPC endpoint
    tls:
      insecure: true # Use TLS in production

Now, you can add a new pipeline to the service section specifically for traces. This pipeline will:

  • Reuse the same otlp receiver we already defined.
  • Reuse the batch and memory_limiter processors.
  • Send its data to the new otlp/jaeger exporter.

Here is the complete service section showing both the logs and traces pipelines running in parallel:

yaml
otelcol.yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      exporters: [debug]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]

To test this, you need to update your docker-compose.yml to run a Jaeger instance and a second otelgen service configured to generate traces:

yaml
docker-compose.yaml
services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.129.1
    container_name: otelcol
    volumes:
      - ./otelcol.yaml:/etc/otelcol-contrib/config.yaml
    restart: unless-stopped

  jaeger:
    image: jaegertracing/all-in-one:1.71.0
    container_name: jaeger
    ports:
      - 16686:16686

  otelgen-logs:
    image: ghcr.io/krzko/otelgen:v0.5.2
    container_name: otelgen-logs
    command:
      [
        "--otel-exporter-otlp-endpoint",
        "otelcol:4317",
        "--insecure",
        "logs",
        "multi",
      ]
    depends_on:
      - otelcol

  otelgen-traces:
    image: ghcr.io/krzko/otelgen:v0.5.2
    container_name: otelgen-traces
    command:
      [
        "--otel-exporter-otlp-endpoint",
        "otelcol:4317",
        "--insecure",
        "--duration",
        "86400",
        "traces",
        "multi",
      ]
    depends_on:
      - otelcol

networks:
  otelnet:
    driver: bridge

Notice we now have two otelgen services: otelgen-logs sends logs as before, and otelgen-traces sends traces to the same OTLP endpoint on our Collector.

Recreate the containers with the updated configuration:

sh
docker compose up --force-recreate --remove-orphans -d

The easiest way to verify the traces pipeline is to check the Jaeger UI. Open your web browser and navigate to http://localhost:16686. In the Jaeger UI, select otelgen from the Service dropdown menu and click Find Traces.

Find otelgen traces in Jaeger

You will see a list of traces generated by the otelgen-traces service, confirming that your new traces pipeline is successfully receiving, processing, and exporting trace data to Jaeger.

With this setup, you have a single Collector instance efficiently managing two completely separate data flows, demonstrating the power and flexibility of defining multiple pipelines.

Fanning out to multiple destinations

A key advantage of the OpenTelemetry Collector is its ability to easily route telemetry to multiple destinations at once, a concept often called “fanning out”. This is done by simply adding more exporters to a pipeline’s exporters list.

Let’s demonstrate this by forwarding both our logs and traces to Dash0, in addition to our existing destinations:

yaml
otelcol.yaml
exporters:
  # [...]
  otlphttp/dash0:
    # Environment variables in the collector config are automatically expanded
    endpoint: ${env:DASH0_ENDPOINT}
    headers:
      Authorization: Bearer ${env:DASH0_TOKEN}
      Dash0-Dataset: ${env:DASH0_DATASET}

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      exporters: [debug, otlphttp/dash0]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, otlphttp/dash0]
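Since the Collector expands ${env:...} references at startup, you also need to make these variables available to the container, for example via the environment section in docker-compose.yml (a sketch; the values themselves come from your Dash0 account, supplied through a local .env file or your shell environment):

yaml
services:
  otelcol:
    # ...
    environment:
      - DASH0_ENDPOINT
      - DASH0_TOKEN
      - DASH0_DATASET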

With this change, both pipelines now fan out the processed data to the specified exporters. You’ll see the data in your Dash0 dashboard as follows:

Otelgen traces and logs in Dash0

This capability provides incredible flexibility, allowing you to experiment with new backends, migrate between vendors without downtime, or satisfy long-term compliance and archival needs without changing your application code.

Chaining pipelines with connectors

You can create powerful data processing flows by generating new telemetry signals from your existing data.

This is possible with connectors. A connector is a special component that acts as both an exporter for one pipeline and a receiver for another, allowing you to chain pipelines together.

Let’s demonstrate this by building a system that generates an error count metric from the otelgen log data. The count connector is perfect for this.

First, you’ll need to define the count connector and configure it to create a metric named log_error.count that increments every time it sees a log with a severity of ERROR or higher:

yaml
otelcol.yaml
connectors:
  count/log_errors:
    logs:
      log_error.count:
        description: count of errors logged
        conditions:
          - severity_number >= SEVERITY_NUMBER_ERROR

To use this, go ahead and update your service configuration to create a new metrics pipeline. The count/log_errors connector will serve as the bridge: it will be an exporter for the logs pipeline and a receiver for the new metrics pipeline:

yaml
otelcol.yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, filter, transform, batch]
      # The connector is added as a destination for logs.
      exporters: [debug, otlphttp/dash0, count/log_errors]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, otlphttp/dash0]
    metrics:
      # This new pipeline receives data exclusively from the connector.
      receivers: [count/log_errors]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/dash0]

This configuration is a game-changer because it allows you to derive new insights from existing data streams directly within the Collector. The data flow is now:

  1. The logs pipeline processes logs and sends a copy to the count/log_errors connector.
  2. The count/log_errors connector inspects these logs, generates a new log_error.count metric based on our condition, and passes this metric along.
  3. The metrics pipeline receives the newly generated metric, batches it, and sends it to your backend.

After relaunching the services, you’ll see the new log_error.count metric appear in your dashboard, all without adding a single line of metrics instrumentation code to your application.

Log error count metric in Dash0

This is a basic example, but it demonstrates the power of a true pipeline architecture. The same principle can be used for more advanced scenarios, like using the spanmetrics connector to automatically generate full RED metrics (request rates, error counts, and duration histograms) directly from your trace data.
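As a rough sketch of that idea (the spanmetrics connector ships with the contrib distribution; it is shown here with its default settings and in isolation from the count connector setup above):

yaml
connectors:
  spanmetrics: # generates request, error, and duration metrics from spans

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, spanmetrics]
    metrics:
      receivers: [spanmetrics]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/dash0]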

Understanding Collector distributions

When you use the OpenTelemetry Collector, you are not running a single, monolithic application. Instead, you use a distribution: a specific binary packaged with a curated set of components (receivers, processors, exporters, and extensions).

This model exists to allow you to use a version of the Collector that is tailored to your specific needs or even create your own. There are three primary types of distributions you will encounter:

1. Official OpenTelemetry distributions

The OpenTelemetry project maintains several official distributions. The two most common are:

  • Core (otelcol): This is a minimal, lightweight distribution that includes only the most essential and stable components. It provides a stable foundation but has limited functionality.
  • Contrib (otelcol-contrib): This is the most comprehensive version, which includes almost every component from both the core and contrib repositories. It is the recommended distribution for getting started, as it provides the widest range of capabilities for connecting to various sources and destinations without needing to build a custom version.

2. Vendor distributions

Some observability vendors provide their own Collector distributions. These are typically based on the otelcol-contrib distribution but are pre-configured with the vendor’s specific exporter and other recommended settings. Using a vendor distribution can simplify the process of sending data to that vendor’s platform.

3. Custom distributions

For production environments, the recommended best practice is to build your own custom distribution. This involves creating a lightweight, fit-for-purpose Collector binary that contains only the components you need.

You can create a custom distribution using the OpenTelemetry Collector Builder (ocb) tool. It involves creating a simple manifest file that lists the components you want to include, and then running the ocb tool to compile your custom binary.
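A builder manifest might look roughly like this (the module versions and component list are illustrative; align them with the Collector release you're targeting):

yaml
dist:
  name: my-otelcol
  description: Custom Collector containing only the components we use
  output_path: ./dist

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.129.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.129.0
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.129.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.129.0
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.129.0

Running something like ocb --config=manifest.yaml then compiles the custom binary into the output_path directory.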

Debugging and observing your pipeline

A critical piece of infrastructure like your telemetry pipeline must itself be observable and easy to debug. If the Collector is dropping data, experiencing high latency, or is unhealthy, you need to know.

Fortunately, the Collector is instrumented out-of-the-box and provides several tools for validation and observation.

Validating your configuration

Before deploying the Collector, you should always validate that your configuration file is syntactically correct. The primary way to do this is with the validate subcommand, which checks the configuration file for errors without starting the full Collector service:

sh
otelcol-contrib validate --config=otelcol.yaml

If the configuration is valid, the command will exit silently. If there are errors, it will print them to the console:

OpenTelemetry collector validate command

For a more visual approach, you can use the OtelBin web tool. This tool allows you to paste your configuration, visualize the resulting pipeline, and validate it against various Collector distributions.

visualizing collector configuration through otelbin

If you’re writing complex OTTL statements for the transform or filter processors, you will also find the OTTL Playground to be a useful resource for understanding how different configurations affect the OTLP data transformation.

Live debugging

When building a pipeline, you’ll often need to inspect the data flowing through it in real time. As you’ve already seen, the debug exporter is the primary way to do this.

By adding it to any pipeline’s exporters list, you can print the full content of traces, metrics, or logs to the console, and verify that your receivers and processors are working as expected.

For debugging the Collector components themselves, you can enable the zPages extension:

yaml
otelcol.yaml
extensions:
  zpages: # default endpoint is localhost:55679

service:
  extensions: [zpages]
  pipelines:
    # ... your pipelines

Once the Collector is running, you can access several useful debugging pages in your browser, such as /debug/pipelinez to view your pipeline components or /debug/tracez to see recently sampled traces.

OpenTelemetry Collector zPages

Observing the Collector’s internal telemetry

In production environments, you’ll need to monitor the Collector’s health and performance over time. This is configured under the service.telemetry section.

By default, the Collector sends its own internal logs to stderr, and it’s often the first place you’ll check when there’s a problem with your pipeline. For metrics, the Collector can expose its own data in a Prometheus-compatible format:

OpenTelemetry Collector zPages
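The exact fields vary between Collector versions, but a minimal sketch of the telemetry settings for recent releases looks roughly like this, exposing internal metrics on port 8888:

yaml
service:
  telemetry:
    logs:
      level: info
    metrics:
      level: detailed
      readers:
        - pull:
            exporter:
              prometheus:
                host: 0.0.0.0
                port: 8888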

You can now scrape this endpoint with a Prometheus instance to monitor key health indicators like otelcol_exporter_send_failed_spans_total, otelcol_processor_batch_send_size, and otelcol_receiver_accepted_spans. For more details, see the official documentation on Collector telemetry.

Going to production: Collector deployment patterns

How you run the Collector in production is a critical architectural decision. Your deployment strategy affects the scalability, security, and resilience of your entire observability setup. The two fundamental roles a Collector can play are that of an agent or a gateway, which can be combined into several common patterns.

1. Agent-only deployment

The simplest pattern is to deploy a Collector agent on every host or as a sidecar to every application pod. In this model, each agent is responsible for collecting, processing, and exporting telemetry directly to one or more backends.

Application → OpenTelemetry Collector (Agent) → Observability Backend

This approach is easy to start with but it offers limited durability, as agents typically buffer in memory, meaning a single node failure can lead to data loss.

2. Agent and gateway deployment

A more robust production pattern enhances the agent deployment with a new, centralized gateway layer. In this model, the agent’s role is simplified: it handles local collection and metadata enrichment before forwarding all telemetry to the gateway.

This gateway is a standalone, centralized service consisting of one or more Collector instances that receive telemetry from all agents. It’s the ideal place for heavy processing like PII scrubbing, filtering, and tail-based sampling, which ensures rules are applied consistently before data leaves your environment.

Application → Collector (Agent) → Collector (Gateway) → Observability Backend

This layered approach provides the best of both worlds: agents handle local collection and metadata enrichment efficiently, while the gateway provides centralized control, security, and processing.
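On the agent side, forwarding to the gateway is just another OTLP exporter pointing at the gateway's endpoint (a sketch; the hostname is illustrative):

yaml
exporters:
  otlp/gateway:
    endpoint: otel-gateway.internal:4317 # illustrative gateway address
    tls:
      insecure: true # enable TLS with proper certificates in production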

High-scale deployment with a message queue

When you’re dealing with massive data volumes or require extreme durability, the standard pattern is to introduce an event queue (like Apache Kafka) between your agents and a fleet of Collectors that act as consumers.

Application → Collector (Agent) → Message Queue → Collector (Aggregator) → Backend(s)

This pattern provides two key advantages. The message queue acts as a massive buffer for durability; even if the aggregator fleet is down, agents can continue sending data to the queue, preventing data loss.

It also provides load-leveling by decoupling the agents from the aggregators, which smooths out traffic spikes and allows the aggregators to consume data at a steady rate.
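With the contrib distribution, the Kafka exporter and receiver make this pattern straightforward; a rough sketch (the broker address and topic name are illustrative, and settings such as authentication are omitted):

yaml
# On the agents
exporters:
  kafka:
    brokers: ["kafka:9092"]
    topic: otlp_spans

# On the aggregators
receivers:
  kafka:
    brokers: ["kafka:9092"]
    topic: otlp_spans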

You’re now a pipeline architect

We’ve journeyed from a simple data pass-through to a powerful, multi-stage pipeline that enriches, filters, secures, and routes telemetry data, even generating new, valuable signals along the way.

By adopting the pipeline mindset, you gain:

  • Centralized control: Manage your entire telemetry flow from one place.
  • Vendor neutrality: Swap backends with a simple config change. Use multiple vendors at once.
  • Efficiency and cost savings: Filter noise and batch data at the source to slash your observability bill.
  • Enhanced security: Scrub sensitive data before it ever leaves your infrastructure.
  • Powerful capabilities: Unlock advanced patterns like metric generation that would be complex or impossible otherwise.

You are no longer just configuring an agent; you’re now designing a scalable telemetry infrastructure that ultimately empowers your team to observe and improve their systems.

So start small by putting a Collector in front of one of your services. Add a batch processor and a filter, then see what you can build. The control and flexibility you’ll gain are transformative.

Don’t forget to check out the OpenTelemetry Collector documentation for more details.

Thanks for reading!

Authors
Ayooluwa Isaiah