Last updated: May 21, 2025

Scrubbing Sensitive Data from Logs, Metrics, and Traces with OpenTelemetry

Telemetry data is crucial for monitoring and debugging modern applications, but it often contains sensitive information that should never leave your systems.

For instance, authentication tokens and PII can appear in HTTP headers or URLs, and custom attributes can carry business-specific or regulated information.

Without proper safeguards, such data could move unchecked through your observability pipeline and end up in third-party backends, violating privacy standards and compliance requirements. It's your responsibility to prevent that.

To address this, the OpenTelemetry Collector offers three key processors designed for redacting sensitive data from spans, logs, and metrics before they are exported:

  1. Attributes processor: For modifying or removing known fields.
  2. Redaction processor: For filtering attributes and masking values using patterns.
  3. Transform processor: For advanced or conditional redaction logic using the OpenTelemetry Transformation Language (OTTL).

Each serves a different purpose and is suited to different redaction needs. In the sections that follow, I'll walk you through how to use them effectively.

Modifying telemetry fields with the attributes processor

The attributes processor provides a straightforward way to alter individual attributes within spans, logs, or metrics. It gives you direct control over how known fields should be updated, removed, or anonymized before they're exported from the Collector.

For scrubbing sensitive data in these attributes, the most relevant actions are update, delete, and hash:

  • update: Replaces an attribute's value with a static placeholder.
  • delete: Removes the attribute entirely.
  • hash: Anonymizes a value by converting it to a SHA256 hash, which preserves uniqueness without revealing the original value.

Here's a sample configuration that applies these actions across telemetry types:

yaml
processors:
  attributes/sensitive_data:
    actions:
      - key: user.email
        value: "[REDACTED]"
        action: update
      # with `delete` and `hash` you can specify a key and/or a regex pattern
      # for attribute names
      - pattern: auth.*
        key: auth
        action: delete
      - key: client.ip
        action: hash

service:
  pipelines:
    traces:
      processors: [attributes/sensitive_data, ...]
    metrics:
      processors: [attributes/sensitive_data, ...]
    logs:
      processors: [attributes/sensitive_data, ...]

In this example, the user.email field is overwritten with a fixed placeholder. Any attribute matching the pattern auth.* is deleted entirely, which might include values like auth.token or auth.header.

The client.ip field is hashed to retain uniqueness without exposing the raw IP address, which helps you count or correlate unique clients without capturing identifiable information.

Before redaction:

Sensitive values present before applying the attributes processor

After redaction:

Sensitive values absent or redacted after applying the attributes processor

When to use the attributes processor

The attributes processor is useful when:

  • You know exactly which attribute keys contain sensitive data in advance.
  • You only need simple actions like attribute deletion or redaction.

For situations where you need to handle dynamic or unknown attribute names, you'll want the redaction processor, so let's look at that next.

Filtering and masking with the redaction processor

When sensitive data may appear in unpredictable places or under multiple attribute names, the redaction processor is the one to reach for. It's designed to protect telemetry by applying configurable rules that remove, mask, or hash data across spans, logs, and metrics.

It provides two powerful approaches to redaction, which we'll consider below:

1. Enforcing an attribute allowlist

Allowlisting provides a strong guarantee in high-compliance environments or systems with fast-changing telemetry schemas, where it’s safer to block unknown fields than try to redact them after the fact.

Here’s a configuration that demonstrates this approach:

yaml
processors:
  redaction/allowlist:
    allow_all_keys: false
    allowed_keys:
      - http.method
      - http.url
      - http.status_code

service:
  pipelines:
    traces:
      processors: [redaction/allowlist, ...]

In this setup, allow_all_keys: false ensures that all attributes are discarded unless they appear in the allowed_keys list.

As a result, only the HTTP method, URL, and status code will be preserved. Any extra fields, whether sensitive or not, are removed automatically.

Before allowlisting:

Span attributes before applying allowlist

After allowlisting:

Span attributes after applying allowlist

This approach works best when you can determine a reliable set of "safe" fields and can afford to discard everything else. But it may also reduce observability by removing potentially useful context.

2. Masking sensitive data with pattern detection

An alternative approach is to redact data by matching patterns in either attribute keys or their values.

The redaction processor supports regex-based matching to detect and mask sensitive content, regardless of where it appears.

Here’s a typical configuration:

yaml
processors:
  redaction/mask_patterns:
    allow_all_keys: true
    blocked_key_patterns:
      - .*token.*
      - .*password.*
    blocked_values:
      - "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}" # Email addresses

In this example, all attributes are retained (allow_all_keys: true), but any key that matches patterns like token or password will be masked.

Additionally, any value resembling an email address is replaced with a mask (****).

Before masking:

Span attributes before masking

After masking:

Span attributes after masking

For use cases that demand anonymization instead of masking, you can instruct the processor to hash matched patterns using a preferred algorithm. This allows you to retain uniqueness and cardinality without revealing the original data.

yaml
processors:
  redaction/mask_patterns:
    hash_function: sha3

Hashing matched patterns with SHA-3 instead of masking

When necessary, you can also fine-tune the processor to let certain attributes pass through untouched, even if they match a blocked pattern. This is useful when some sensitive-looking fields are known to be safe.

yaml
processors:
  redaction/mask_patterns:
    ignored_keys:
      - user.token.key # this key won't be redacted by `blocked_key_patterns`
    allowed_values:
      # dash0 email addresses will be allowed to pass through even though they
      # match one of the `blocked_values`
      - .+@dash0.com

In this case, user.token.key is excluded from key pattern matching, and any email address ending in @dash0.com is allowed even though it matches the general email pattern in blocked_values.

This kind of handling allows for more nuanced redaction policies, ensuring that trusted data can remain available while untrusted content is filtered out.

When to use the redaction processor

The redaction processor shines when attribute keys are unpredictable or sensitive data can appear anywhere in the payload.

It's especially effective for enforcing strict allowlists, masking data with known patterns, and cleaning up telemetry from legacy systems.

For more complex redaction needs or conditional processing, the transform processor offers the most flexibility, so let's explore that next.

Using OTTL for flexible data redaction

The most advanced tool for redacting sensitive data in the OpenTelemetry Collector is the transform processor.

It uses the OpenTelemetry Transformation Language (OTTL) to apply fine-grained transformations through context-aware logic and composable functions.

This processor is particularly useful when your redaction rules depend on multiple conditions or span attributes, or when each telemetry signal requires different redaction logic.

Here’s its simplest configuration structure:

yaml
transform:
  error_mode: <ignore|silent|propagate>
  <trace|metric|log>_statements:
    - string
    - string
    - string

The error_mode setting controls how the processor handles runtime errors. You can choose to ignore them, silence them, or propagate them upstream. Transformation rules are written in OTTL and grouped under trace_statements, log_statements, or metric_statements depending on the telemetry type.

Let’s look at a practical example. Suppose your traces include URLs containing email addresses as query parameters, and you want to redact these without touching the rest of the string. Here's how you'd configure the processor:

yaml
processors:
  transform/redact_sensitive:
    error_mode: ignore
    trace_statements:
      - keep_keys(span.attributes, ["http.url", "http.method", "http.status_code"])
      - replace_pattern(span.attributes["http.url"], "email=[^&]+", "email=[REDACTED]")

service:
  pipelines:
    traces:
      processors: [transform/redact_sensitive, ...]

The keep_keys() function specifies an allowlist of attributes to keep, while the replace_pattern() function acts on the http.url attribute by redacting the email query parameter if present:

Transform processor redacts sensitive email field

If you want to hash the matched value instead, you can use the following OTTL statement:

yaml
replace_pattern(span.attributes["http.url"], "email=([^&]+)", Concat(["email=", SHA256("$1")], ""))

This transforms the value into:

http://localhost:3000/?email=c1b43b36df09f0bf0e0612b54e90e47df2e8722dd90821b3b0613dc8a4f7d5f4

You can also use the transform processor to delete fields entirely:

yaml
delete_key(span.attributes, "http.request.header.authorization")
delete_matching_keys(span.attributes, "password.*")
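
In a full configuration, these statements sit under trace_statements just like the earlier examples. Here's a minimal sketch; the transform/delete_fields name is just a label chosen for illustration:

yaml
processors:
  transform/delete_fields:
    error_mode: ignore
    trace_statements:
      # remove the Authorization header outright
      - delete_key(span.attributes, "http.request.header.authorization")
      # remove any attribute whose key matches "password.*"
      - delete_matching_keys(span.attributes, "password.*")

service:
  pipelines:
    traces:
      processors: [transform/delete_fields, ...]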

Conditional redaction

The power of OTTL becomes clear when redaction needs to happen conditionally. You can attach a conditions block to a group of statements:

yaml
processors:
  transform/conditional_redaction:
    log_statements:
      - conditions:
          - log.severity_number < 17
        statements:
          - delete_key(log.attributes, "req")
          - delete_key(log.attributes, "res")

This configuration deletes the req and res attributes only when the log severity is below the error threshold (in the OpenTelemetry log data model, severity_number 17 corresponds to ERROR). You can also apply conditions inline using where clauses:

yaml
processors:
  transform/conditional_redaction:
    log_statements:
      - delete_key(log.attributes, "req") where log.severity_number < 17

When to use the transform processor

The transform processor is the right choice when redaction depends on multiple conditions, or when you need different behavior across telemetry types. It's also well-suited to complex cases where multiple transformations need to be executed in sequence.

While the configuration syntax is more verbose and requires a deeper understanding of OTTL, it offers the most flexibility and power for all kinds of telemetry manipulation.
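
To make the per-signal point concrete, here's a sketch of a single transform processor carrying a separate rule set for each telemetry type; the processor name and rules are illustrative:

yaml
processors:
  transform/per_signal:
    error_mode: ignore
    trace_statements:
      # redact email query parameters in trace URLs
      - replace_pattern(span.attributes["http.url"], "email=[^&]+", "email=[REDACTED]")
    log_statements:
      # drop password-like attributes from logs
      - delete_matching_keys(log.attributes, "password.*")
    metric_statements:
      # remove user emails from metric data point attributes
      - delete_key(datapoint.attributes, "user.email")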

Validating your redaction setup

A misconfigured processor could leave sensitive data exposed or strip out useful context unintentionally. Therefore, it's important to verify that your redaction rules are functioning correctly.

One way to test your configuration is by enabling the built-in debug exporter in the OpenTelemetry Collector.

yaml
exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      exporters: [debug, ...]
    metrics:
      exporters: [debug, ...]
    logs:
      exporters: [debug, ...]

Once this is set up, run the Collector and generate some representative telemetry either from your application or with test data.

You can then review the console output to confirm that sensitive fields are either removed, masked, or hashed as intended.
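
If the redaction processor is part of your pipeline, it can also help here: recent Collector versions support a summary setting that annotates telemetry with what was redacted or masked. A sketch, assuming your Collector version supports this option:

yaml
processors:
  redaction/mask_patterns:
    allow_all_keys: true
    blocked_key_patterns:
      - .*token.*
    # attach attributes describing which keys were redacted or masked
    summary: debug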

If you're working with the transform processor, the OTTL Playground is another useful tool for testing your statements and observing how various configurations impact your data in real time.

Some best practices for sensitive data redaction

When implementing sensitive data redaction in your telemetry pipelines, consider the following best practices:

Redact at the source when possible

While the OpenTelemetry Collector provides powerful redaction capabilities, the most secure approach is to prevent sensitive data from entering the telemetry pipeline in the first place.

Therefore, only collect data that serves an observability purpose and implement redaction at the instrumentation layer where possible to reduce the risk of potentially sensitive data being transmitted or stored accidentally downstream.

Apply defense in depth

No single redaction method is foolproof. Using multiple strategies such as combining allowlists with pattern matching helps ensure different types of sensitive data are caught.

This layered approach provides redundancy so that if one rule misses something, another may still catch it.
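
In Collector terms, layering simply means chaining processors in a pipeline; they run in the order listed. A sketch using the processors from this guide:

yaml
service:
  pipelines:
    traces:
      processors:
        # first: drop anything not explicitly allowed
        - redaction/allowlist
        # then: mask known sensitive patterns in what remains
        - redaction/mask_patterns
        # finally: conditional, fine-grained cleanup
        - transform/redact_sensitive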

Test your redaction rules

Before deploying redaction rules to production, test them with representative data to ensure they’re catching all the sensitive information you expect.

You could also consider creating a validation pipeline that checks for common patterns of sensitive data.
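
One way to sketch such a check is with the transform processor: tag any log whose body still looks like it contains an email address, then watch for that tag in your debug output. The attribute name here is made up, and the regex is only a rough heuristic:

yaml
processors:
  transform/leak_check:
    error_mode: ignore
    log_statements:
      # flag records that still appear to contain an email address
      - set(log.attributes["redaction.leak_suspected"], true) where IsMatch(log.body, "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+")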

Balance redaction with observability

Aggressive redaction can sometimes impair your ability to troubleshoot issues, so you must strive for a balance that protects sensitive data while maintaining enough context for effective observability.

For instance, you may need to track user-level behavior without storing identifiable information. In such cases, hashing user identifiers can preserve uniqueness and enable useful metrics like "number of affected users".

However, this only works if the original values have high entropy. If the set of possible values is small or predictable, hashed values can be reverse-engineered. In those cases, a salted hash or more robust anonymization method is needed.
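
With the transform processor, a simple salted hash can be written in OTTL. A minimal sketch, assuming the salt is injected through the Collector's environment variable substitution rather than hardcoded:

yaml
processors:
  transform/salted_hash:
    trace_statements:
      # prepend a salt before hashing so small or predictable value sets
      # can't be reversed with a precomputed lookup table
      - set(span.attributes["user.id"], SHA256(Concat(["${env:HASH_SALT}", span.attributes["user.id"]], "")))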

Final thoughts

The OpenTelemetry Collector offers robust tools for redacting sensitive data without sacrificing the visibility required to operate and debug your systems with confidence.

By combining all the techniques explored in this guide, you can build a telemetry pipeline that enables observability without compromising user privacy or security.

Thanks for reading!

Authors
Ayooluwa Isaiah