
Last updated: January 5, 2026

Mastering the OpenTelemetry Memory Limiter Processor

The OpenTelemetry Collector plays a critical role in any modern observability stack. Like any high-traffic service, it has resource limits.

A sudden data surge or a memory-heavy processor can push it beyond those limits, potentially consuming all available memory and causing the Collector to crash.

The memory limiter processor guards against this kind of failure by continuously monitoring the Collector’s memory usage and stepping in once usage crosses a configured threshold. Instead of letting the process crash, it temporarily slows down data ingestion to give the system room to recover.

In this guide, you’ll learn what the memory limiter does, how to configure it for different environments, why it works best alongside Go's GOMEMLIMIT setting, and what metrics to track to prove that it works.

Let's get started!

Why collector memory usage can spike

Before setting memory limits, it’s helpful to understand what drives high memory usage in the Collector. Several common factors can cause spikes:

  • Sudden traffic surges: A burst of logs, metrics, or traces from your applications can overwhelm the Collector’s ability to process and export data. This causes internal buffers to grow and memory consumption to rise.

  • Buffering components: Processors like batch, and exporters with queuing enabled (such as otlphttp), hold data in memory by design. If a downstream service is slow or unreachable, these buffers can grow quickly (see the sketch after this list).

  • Resource-intensive processing: Some processors, like transform with heavy regex use or components that aggregate span data, consume more memory per operation. These costs add up quickly under load.

  • High cardinality: Although often a backend concern, high-cardinality data can also affect the Collector. When processors handle many unique attributes or labels, memory usage can rise significantly.
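To make the buffering point concrete, here is a minimal sketch of two such components: the batch processor accumulating data until a size or time threshold is hit, and an otlphttp exporter with its sending queue enabled. The values and the endpoint are purely illustrative.

yaml
processors:
  batch:
    # Telemetry accumulates in memory until one of these thresholds is reached.
    send_batch_size: 8192
    timeout: 5s

exporters:
  otlphttp:
    # Placeholder endpoint for illustration only.
    endpoint: https://backend.example.com:4318
    sending_queue:
      enabled: true
      # Each queued batch sits in memory; if the backend slows down or goes
      # offline, this queue fills up and memory usage climbs with it.
      queue_size: 5000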

The memory limiter acts as a circuit breaker, stepping in when memory use climbs too high to help prevent a crash.

How the memory limiter works


The processor operates on a simple but effective two-level threshold system: a soft limit and a hard limit. You can think of it as traffic management for your data pipeline:

  • 🟢 Normal operation: As long as memory usage stays below the soft limit, data flows unimpeded.

  • 🟡 Soft limit reached: Once memory usage exceeds the soft limit, the limiter starts throttling. It refuses incoming data from receivers by sending back a non-permanent error. This backpressure gives the Collector time to recover. The soft limit is calculated as limit - spike_limit.

  • 🔴 Hard limit reached: If memory usage continues to climb and crosses the hard limit (defined by limit_mib or limit_percentage), the processor takes more aggressive action. In addition to refusing data, it forces the Go runtime to perform a garbage collection (GC) cycle. This is a last-resort attempt to reclaim memory and prevent an OOM crash. Once memory usage falls back below the soft limit, the Collector returns to normal operation.

Note that the limiter relies on upstream components to correctly handle its retryable errors. Standard OpenTelemetry receivers already do this, but if you're using custom components, ensure that they can buffer or retry rejected data to prevent data loss.

Quick start: a basic safety net

The memory limiter is essential in any production Collector setup. For it to work properly, it must be the first processor in each pipeline. This allows it to apply backpressure directly to receivers before any downstream processing increases memory usage.

Below is a simple configuration for a Collector running on a system with 4 GB of available memory:

yaml
processors:
  memory_limiter:
    # Check memory usage every 5 seconds.
    check_interval: 5s
    # The hard memory limit for the Collector's heap.
    # Set to a value safely below the total memory available to the process.
    limit_mib: 3500
    # The size of the "buffer" zone.
    # The soft limit will be (3500 - 500) = 3000 MiB.
    spike_limit_mib: 500

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter MUST be the first processor.
      processors: [memory_limiter, batch]
      exporters: [otlp]

This setup enforces a hard memory limit of 3500 MiB and a soft limit of 3000 MiB, with memory checks running once every five seconds. The client will see an error that looks like the one below when the limit is enforced:

A retryable failure response from the OpenTelemetry Collector

Memory limiter configuration deep dive

Let’s break down the key memory limiter settings and when to use each.

1. check_interval

This setting controls how frequently the processor checks the Collector’s memory usage. By default, it’s disabled (0s), but for most environments, a one-second interval is recommended.

Using a shorter interval, such as under one second, allows the limiter to react more quickly to sudden spikes in memory use, though it introduces a slight increase in CPU overhead.

On the other hand, using a longer interval reduces that overhead but means the limiter needs a larger buffer (via the spike_limit) to safely absorb memory growth between checks.

2. limit_mib vs limit_percentage

The hard memory limit can be configured using either limit_mib or limit_percentage, but not both.

The limit_mib option sets a fixed memory cap in mebibytes (MiB) and is best suited for environments with consistent memory allocation, such as virtual machines or physical servers.

In contrast, limit_percentage defines the limit as a percentage of available system memory and is only supported on Linux systems that use cgroups, such as containers. This makes it ideal for Kubernetes or Docker deployments where memory limits may change dynamically.

For instance, if a container is assigned 2 GiB of memory and limit_percentage is set to 80, the memory limiter will automatically calculate a hard limit of 1638 MiB.
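As a sketch, the same idea expressed in configuration (the percentages are illustrative and should be tuned to your workload):

yaml
processors:
  memory_limiter:
    check_interval: 1s
    # With a 2 GiB cgroup limit, 80% works out to roughly 1638 MiB.
    limit_percentage: 80
    # The buffer between the soft and hard limits, as a share of total memory.
    spike_limit_percentage: 20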

3. spike_limit_mib vs spike_limit_percentage

These options define the size of the buffer between the soft and hard limits. You can specify this buffer either as a fixed value in MiB (spike_limit_mib) or as a percentage of total memory (spike_limit_percentage).

In general, the buffer should be large enough to absorb expected surges in memory usage between checks. A good starting point is about 20 percent of your hard limit.

If your workload involves sharp traffic spikes or unpredictable load patterns, increasing this buffer can give the limiter more time to respond effectively and prevent an out-of-memory crash.
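For example, a fixed-size configuration tuned for bursty traffic might pair a faster check interval with a buffer of roughly 20 percent of the hard limit. This is only a sketch; the numbers are illustrative:

yaml
processors:
  memory_limiter:
    # Check frequently so spikes are caught between measurements.
    check_interval: 1s
    limit_mib: 3500
    # ~20% of the hard limit; the soft limit becomes 3500 - 700 = 2800 MiB.
    spike_limit_mib: 700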

Integrating GOMEMLIMIT with the memory limiter

For a truly robust production deployment, the memory limiter processor should be used in tandem with the GOMEMLIMIT environment variable as they provide two complementary layers of defense.

GOMEMLIMIT is a feature of the Go runtime, not the OpenTelemetry Collector itself. It acts as a soft cap on memory usage, guiding the garbage collector (GC) to free memory more aggressively as usage approaches the specified limit. This helps smooth out memory growth and reduces the chance of sudden spikes. In effect, it’s a proactive measure that keeps memory consumption in check before it becomes a problem.

The memory limiter, on the other hand, operates at the application level. It serves as a circuit breaker that steps in when memory usage has already exceeded a defined threshold. If garbage collection alone isn't enough to rein in memory growth, the memory limiter provides a safety net by throttling data and triggering additional GC cycles. This makes it a reactive safeguard that kicks in only when needed.

Using both together creates a strong defense. GOMEMLIMIT reduces the likelihood of ever hitting the memory limiter’s threshold, while the limiter ensures the Collector remains stable if memory usage suddenly spikes beyond what the Go runtime can handle on its own.

As a best practice, set GOMEMLIMIT to about 80 to 90 percent of the memory available to the Collector process (for example, its container memory limit). This gives the runtime enough room to manage memory before the limiter has to take action.

In a Kubernetes deployment, for example, if you allocate 2 GiB of memory to the Collector container, you can set GOMEMLIMIT to approximately 1843 MiB (90 percent of 2 GiB). The memory limiter’s configuration would then use a hard limit slightly above that, such as 1900 MiB.

Here's what that might look like in practice:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  template:
    spec:
      containers:
        - name: otel-collector
          # Request and limit memory for the container
          resources:
            requests:
              memory: "1Gi"
            limits:
              memory: "2Gi"
          env:
            # Set GOMEMLIMIT to 90% of the 2Gi limit (approx 1843MiB)
            - name: GOMEMLIMIT
              value: "1843MiB"
          # The Collector config would then use limit_percentage
          # or a corresponding limit_mib (e.g., 1900MiB)
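The matching memory limiter configuration for that container could then look something like the sketch below, using limit_mib (limit_percentage works just as well here); the exact values are illustrative:

yaml
processors:
  memory_limiter:
    check_interval: 1s
    # Hard limit slightly above GOMEMLIMIT (1843 MiB) and below the 2 Gi container limit.
    limit_mib: 1900
    # Roughly 20% of the hard limit as the buffer zone.
    spike_limit_mib: 380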

Verifying and monitoring the limiter

To ensure the memory limiter is working as intended, you can monitor both the Collector's logs and its internal metrics.

Reviewing collector logs

When the memory limiter activates, it writes clear log messages that indicate when throttling starts, when a forced garbage collection is triggered, and when normal operation resumes. For example:

text
# Throttling begins after crossing the soft limit
info memorylimiter/memorylimiter.go:102 Memory usage is above soft limit, refusing data. {"kind": "processor", "name": "memory_limiter", "data_type": "traces", "memory_usage": "3052MiB", "soft_limit": "3000MiB"}
# Garbage collection is forced after exceeding the hard limit
info memorylimiter/memorylimiter.go:121 Memory usage is above hard limit, forcing GC. {"kind": "processor", "name": "memory_limiter", "memory_usage": "3515MiB", "hard_limit": "3500MiB"}
# Throttling ends once memory drops below the soft limit
info memorylimiter/memorylimiter.go:135 Memory usage is now below soft limit, resuming normal operation. {"kind": "processor", "name": "memory_limiter", "memory_usage": "2890MiB", "soft_limit": "3000MiB"}

These logs confirm that the limiter is actively monitoring memory and responding as designed. Watching for these messages is the most direct way to verify the limiter’s behavior.

Monitoring collector metrics

You should also monitor the Collector's internal metrics alongside its logs to get a real-time view of memory pressure. These metrics can be scraped with Prometheus or exported to an OTLP endpoint.
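How these metrics are exposed depends on your Collector version. On many releases, a minimal sketch of the internal telemetry settings looks like the following; newer versions configure the same thing through metric readers rather than the address field:

yaml
service:
  telemetry:
    metrics:
      # Emit detailed internal metrics, including the processor counters below.
      level: detailed
      # Serve the metrics in Prometheus format (8888 is the conventional port).
      address: 0.0.0.0:8888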

The key metrics to watch are:

  • otelcol_process_memory_rss: The current resident set size (RSS) memory usage of the Collector in bytes.
  • otelcol_process_runtime_total_sys_memory_bytes: The total bytes of memory obtained from the operating system.
  • otelcol_processor_refused_spans / otelcol_processor_refused_log_records / otelcol_processor_refused_metric_points: These counters track the volume of data blocked by the memory limiter. A rising count here confirms the limiter is actively shedding load to prevent a crash.

By plotting these metrics over time and drawing lines for your configured soft and hard limits, you can get a clear visual representation of how close the Collector is to its limits and how effectively the processor is capping memory usage.

Memory limiter processor dashboard in Dash0

In Dash0, these internal Collector metrics live alongside your application telemetry, making it easy to correlate memory pressure in the Collector with upstream traffic spikes, noisy services, or downstream exporter slowdowns. Instead of guessing why the memory limiter is firing, you can see the cause directly.

If you notice the "refused" metrics incrementing frequently, it means your Collector is constantly throttling ingestion to survive. This is a strong signal that your Collector cluster needs more capacity.
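If you want to be alerted before this becomes chronic, one option is a Prometheus alerting rule over the refused counters. The sketch below is illustrative; depending on how you scrape the Collector, the metric name may carry a _total suffix, and the thresholds and durations should match your own tolerance:

yaml
groups:
  - name: otel-collector-memory-limiter
    rules:
      - alert: CollectorRefusingTelemetry
        # Fires if the memory limiter has been refusing spans for 10 minutes straight.
        expr: rate(otelcol_processor_refused_spans[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "OpenTelemetry Collector is refusing spans due to memory pressure"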

Key takeaways and best practices for production readiness

Before you lock in a Collector configuration and call it production-ready, keep these best practices in mind for the memory limiter processor:

  1. Treat it as mandatory. Running a Collector without a memory limiter is a gamble with your data pipeline's stability. In production, this processor is the primary safeguard standing between a sudden traffic surge and an OOM crash that kills the entire process.

  2. Position matters. The limiter must sit at the front of your processing pipeline to reject excess data before it consumes memory. If you place it after other processors, you are wasting resources processing data that the limiter might eventually drop anyway.

  3. Pair it with GOMEMLIMIT to manage the Go runtime's garbage collection aggressively before the limiter ever needs to step in. This combination creates a safety net where the limiter only activates when the runtime is truly overwhelmed.

  4. Match the config to the infrastructure. One size does not fit all. Use limit_mib for predictable, static environments like VMs, but switch to limit_percentage for containerized environments like Kubernetes so that safety thresholds scale automatically without manual reconfiguration.

  5. A working limiter prevents crashes, but frequent throttling means you're actively refusing telemetry data. If your metrics show constant refusals, don't just increase the limits—treat it as a capacity signal to scale your cluster or reduce upstream volume.

Final thoughts

Using the memory limiter processor effectively can be the difference between a fragile pipeline and a resilient observability stack.

By properly tuning these thresholds and verifying them with metrics, you'll ensure that your OpenTelemetry Collector acts as a stable foundation rather than a bottleneck, capable of handling unpredictable traffic spikes without crashing.

Try Dash0 today to observe your OpenTelemetry Collectors and catch memory pressure before it turns into data loss or downtime.

Authors
Ayooluwa Isaiah