Last updated: August 4, 2025
Mastering the OpenTelemetry Batch Processor
The batch processor is one of the most fundamental and critical components in the OpenTelemetry Collector. It sits in nearly every production pipeline, working quietly to make your telemetry processing more efficient, reliable, and cost-effective. Its job is simple: it collects individual spans, metrics, and logs and groups them into batches before passing them to the next component.
While the concept is straightforward, mastering its configuration is key to building a high-performance observability pipeline. In this article, we'll explore the nuances of its settings, common patterns, and best practices that will help you tune your Collector for any workload.
Let's get started!
Why you absolutely need the batch processor
Before diving into the configuration, it's crucial to understand why batching is so important, and why skipping it is almost always a mistake that leads to inefficient and expensive pipelines.
Here's what batching does for you:
- Reduces network overhead: Sending thousands of individual log lines or trace spans over the network is incredibly inefficient. Batching combines many data points into a single request, drastically reducing the number of outgoing connections and the associated overhead.
- Improves compression: Compressing a single 200-byte JSON payload doesn't yield much benefit. Compressing a 2MB batch of 10,000 similar JSON payloads, however, can result in a significant size reduction. Batching gives compression algorithms more data to work with, leading to better compression ratios (see the example after this list).
- Lowers egress costs: For cloud-based workloads, network egress can be a major cost driver. By reducing the number of requests and improving compression, batching directly translates to lower cloud bills.
- Reduces load on backends: Sending a high volume of small requests can overwhelm your observability backend or any intermediate Collectors. Batching smooths out traffic, sending larger, less frequent requests that are often easier for backends to ingest efficiently.
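Batching also amplifies the benefit of exporter-side compression. As a rough illustration (the endpoint below is a placeholder, and the otlphttp exporter already compresses with gzip by default), you can make the setting explicit:
```yaml
exporters:
  otlphttp:
    # Placeholder endpoint for illustration only.
    endpoint: https://otel.example.com:4318
    # gzip is typically the default; shown explicitly here. Larger batches
    # give the compressor more redundant data to work with.
    compression: gzip
```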
In short, the batch processor is your primary tool for controlling the flow, cost, and performance of your telemetry data.
Quick start: adding the batch processor
For most use cases, the default settings are a great starting point. To enable the batch processor, simply add it to the processors section of your configuration and include it in your pipelines.
```yaml
processors:
  batch:
    # Default settings: sends a batch when it reaches 8192 items
    # or after 200ms, whichever comes first.

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```
This simple configuration already provides significant efficiency gains over a pipeline without batching.
Understanding the available batching mechanisms
You can think of the batch processor as a ferry which has two rules for when it departs:
- It leaves when it's full.
- It leaves after waiting a certain amount of time at the dock, even if it's not full.
The batch processor operates on the exact same logic, controlled by three key parameters: timeout, send_batch_size, and send_batch_max_size.
1. timeout
This setting controls the maximum latency introduced by the batch processor. If you set timeout: 10s, your telemetry could be delayed by up to 10 seconds before being sent to the next stage.
A shorter timeout is good for near-real-time data, while a longer timeout is better for maximizing batch size and reducing costs, especially for less time-sensitive data like logs.
The default setting is 200ms.
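As a rough sketch (the batch/traces and batch/logs names are just illustrative aliases), you can run separately tuned instances of the processor for different signals:
```yaml
processors:
  # Illustrative aliases: one instance per signal, tuned differently.
  batch/traces:
    timeout: 200ms   # keep the default for near-real-time traces
  batch/logs:
    timeout: 30s     # tolerate more delay to build larger log batches
```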
2. send_batch_size
This setting controls the target size of a batch. It is the number of items (spans, log records, or metric data points) that will trigger the batch to be sent, regardless of the timeout.
In a high-traffic environment, you'll likely hit this limit long before the timeout is reached. Increasing this value allows for larger, more efficient batches, but also increases the Collector's memory footprint, as it needs to hold more data in memory.
The default is 8192 items.
3. send_batch_max_size
This is a critical but often misunderstood setting. The send_batch_size is a trigger, but it doesn't prevent a batch from growing larger. For example, if your timeout is long and you receive a sudden flood of data, the batch in memory could grow well beyond send_batch_size.
send_batch_max_size is the safety valve. It ensures that no matter what, the batch sent to the next component is no larger than this value. This is essential when your backend has a strict request size limit (e.g., "no requests over 4MB").
How they interact:
- A batch is sent when either timeout is reached or the number of items in the buffer reaches send_batch_size.
- If send_batch_max_size is set, it acts as a final check. Before sending the batch, the processor asks: "Is this batch bigger than send_batch_max_size?" If yes, it splits the batch into smaller chunks that respect the limit.
Note that send_batch_max_size must be greater than or equal to send_batch_size to be considered valid. It defaults to 0 (disabled), so it has no effect unless you set it.
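Putting the three settings together, here is a sketch with arbitrary values (the batch/annotated name is illustrative), annotated with which rule applies:
```yaml
processors:
  batch/annotated:
    # Rule 1: flush whatever is buffered after at most 10 seconds.
    timeout: 10s
    # Rule 2: flush as soon as 5000 items accumulate, whichever comes first.
    send_batch_size: 5000
    # Safety valve: never emit more than 8000 items in one batch; larger
    # buffers are split. Must be >= send_batch_size (0 disables the check).
    send_batch_max_size: 8000
```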
Configuration recipes
Let's look at how to tune these settings for different goals.
Minimizing latency for real-time debugging
When you need data to appear in your backend as quickly as possible, you want small, frequent batches. The default timeout value of 200ms is already a good starting point, so you probably don't need to change anything.
```yaml
processors:
  batch:
```
Maximizing cost savings
When you're sending high-volume data to cold storage, latency is less important than cost efficiency, but keep the additional memory requirements in mind.
```yaml
processors:
  batch/cost_optimized:
    timeout: 60s
    send_batch_size: 16384
```
Handling strict backend limits
Many observability backends and APIs impose strict limits on request body size. For example, a backend might reject any request larger than 1MB. If your processor sends batches that exceed this limit, your exporter will receive errors, and you'll risk losing data.
The send_batch_max_size setting is the solution. It acts as a hard ceiling on the size of any batch sent to the next component in the pipeline.
Since the setting is based on the number of items (spans, logs, etc.), you'll need to estimate a safe value based on your average data point size. If your average log record is about 1KB, a send_batch_max_size of 1000 puts your batches at roughly 1MB, so choose a slightly lower value if you need headroom under the limit for larger-than-average records.
```yaml
processors:
  batch/strict_backend:
    timeout: 5s
    # Trigger a send when the batch reaches 1000 items.
    send_batch_size: 1000
    # Enforce a hard limit of 1000 items per batch. This prevents the
    # timeout from creating a massive batch that would be rejected.
    send_batch_max_size: 1000
```
Setting send_batch_size and send_batch_max_size to the same value, as shown above, is a common and effective pattern for ensuring consistent batch sizes that respect your backend's limits.
Multi-tenant batching with metadata_keys
In a multi-tenant architecture, you might receive data from different customers (or tenants) at a single OTLP endpoint. You may need to process or route this data differently based on which tenant it belongs to. For example, each tenant might have a unique API key or a different backend destination.
The batch processor can handle this by creating separate, independent batchers for each unique combination of metadata values.
Let's say you're running a SaaS platform where customers send telemetry to your Collector, and each request includes an X-Tenant-ID HTTP header. You may need to batch data on a per-tenant basis.
The configuration could look like this:
- First, configure your receivers to include the request metadata (include_metadata: true).
- Then, configure the batch processor to use the relevant metadata as a grouping key.
```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        # This is ESSENTIAL for metadata batching.
        include_metadata: true

processors:
  batch/multitenant:
    # Create a new, independent batcher for each unique value of 'tenant_id'.
    metadata_keys:
      - tenant_id
    # Set a safety limit on the number of tenants.
    metadata_cardinality_limit: 2000

exporters:
  # ... your exporters

service:
  pipelines:
    traces:
      receivers: [otlp]
      # The processor needs access to the gRPC/HTTP metadata, so it must be
      # configured to handle it.
      processors: [batch/multitenant]
      exporters: [otlphttp] # or routing processor
```
This is a powerful feature, but it comes with a significant resource cost. Each unique metadata combination spawns a new batcher in memory, with its own buffer (send_batch_size) and timer (timeout).
If you have 1000 tenants, the single batch processor will maintain 1000 independent batchers inside the Collector, consuming roughly 1000 times the memory of one.
The metadata_cardinality_limit is a crucial safeguard. It prevents a malicious or runaway client from exhausting the Collector's memory by sending thousands of unique metadata values.
Always monitor the otelcol_processor_batch_metadata_cardinality metric to track how many batchers are active.
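How you surface that metric depends on your Collector version; as a rough sketch, the Collector publishes its internal metrics (with an otelcol_ prefix, on a Prometheus endpoint at port 8888 by default), and their verbosity can be tuned under service.telemetry:
```yaml
service:
  telemetry:
    metrics:
      # Raise the verbosity of the Collector's self-observability metrics.
      # The exact level at which individual metrics appear varies by version.
      level: detailed
```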
Batch processor tips and best practices
- The batch processor should almost always be placed after any sampling or filtering processors (filter, probabilistic_sampler) but before any processors that perform external lookups or heavy computation (k8sattributes, resource). You certainly don't want to waste time batching data you're about to drop, and you also want to provide well-formed batches to processors that make network calls to enrich data, as they often operate more efficiently on batches.
- If users complain that their data is taking too long to appear, the first place to check is the timeout setting in your batch processor. A long timeout is a common cause of perceived data loss, when in fact the data is just waiting in the Collector's buffer.
- If your exporter logs show 4xx errors indicating that requests are too large, the solution is almost always to configure send_batch_max_size.
- It is almost never a good idea to have a network-based exporter (otlphttp, jaeger, prometheusremotewrite) without a batch processor immediately before it in the pipeline (see the sketch after this list). The efficiency gains are too significant to ignore.
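As a minimal sketch of that last point (assuming the otlp receiver, filter processor, and otlphttp exporter are configured elsewhere in the file), a logs pipeline might look like this:
```yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      # Drop unwanted data first, then batch immediately before the
      # network-based exporter.
      processors: [filter, batch]
      exporters: [otlphttp]
```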
Final thoughts
The batch processor is a foundational component for building performant, scalable, and cost-effective OpenTelemetry pipelines. By understanding the interplay between its time and size-based triggers, you can fine-tune its behavior to meet the specific needs of your workloads.
Treat it not as an optional add-on, but as a mandatory first step in any production-grade Collector configuration.
