
Observability at the Edge: OpenTelemetry in Ingress Controllers

OpenTelemetry has crossed an important threshold.

Nearly half of organizations now report running it in production, with many more actively evaluating it, according to the latest CNCF Annual Survey. As discussed in the previous post, that level of adoption changes the role OpenTelemetry plays in the ecosystem.

It is no longer just another integration point - it is becoming the de facto mechanism for producing and collecting correlated telemetry across modern platforms. If OpenTelemetry is becoming that layer, we should expect it to behave predictably at the boundaries of our systems - especially at ingress, where every request first crosses into the cluster.

In the previous post, we argued that “supports OpenTelemetry” is no longer a sufficient description. This post looks at what that support actually looks like in practice at the edge of the system.

If you're a platform engineer, your goal is to make observability a self-service capability. Developers shouldn't have to think about pipelines, formats, or correlation. They should get useful insights out of the box. But in reality, it often looks very different. Traces start mid-story. Logs don't correlate. Metrics live in a completely different system.

A lot of that starts at the ingress layer. Ingress controllers and gateways are the first observable hop for every external request. They terminate TLS, enforce routing rules, apply policy, and introduce latency before a request ever reaches an application. Before your beautifully instrumented service sees anything, the ingress layer has already made decisions that shape what the trace looks like. If you don't observe this layer properly, your traces literally start mid-story.

If OpenTelemetry is going to function as a shared integration layer across the CNCF ecosystem, it has to function well here.

Over the past few months, I’ve been looking closely at how several widely used ingress and gateway implementations integrate with OpenTelemetry. This builds on earlier work evaluating ingress controllers like NGINX, Contour, Traefik, and Emissary, as well as collaborations with projects such as Linkerd and Dapr. The goal of this evaluation wasn’t comparison or benchmarking. It was to understand what OpenTelemetry support actually looks like in practice when real traffic flows through real systems.

And when you look across these implementations, certain patterns repeat.

Tracing has largely stabilized

If you enable tracing on a modern ingress controller today, it generally works the way you’d expect.

Spans are exported via OTLP. W3C Trace Context is respected. If a request arrives with a traceparent header, the ingress continues that trace. If it doesn’t, a new one begins at the edge. You can see the boundary between external traffic and internal services clearly in your trace view.
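The continue-or-start decision is simple to sketch. The following is a minimal illustration of what an edge proxy does with an incoming `traceparent` header - not any controller’s actual implementation - using the W3C format `version-traceid-parentid-flags`:

```python
import re
import secrets

# W3C Trace Context traceparent: version-traceid-parentid-flags
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)

def edge_span_context(headers: dict) -> dict:
    """Return the trace context an ingress span would use.

    If the request carries a valid traceparent header, the edge
    continues that trace; otherwise a new trace begins at the edge.
    """
    m = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    # All-zero trace or span IDs are invalid per the spec.
    if m and m.group(2) != "0" * 32 and m.group(3) != "0" * 16:
        return {
            "trace_id": m.group(2),
            "parent_span_id": m.group(3),  # caller's span becomes the parent
            "span_id": secrets.token_hex(8),
            "new_trace": False,
        }
    # No usable context: the trace starts here, at the boundary.
    return {
        "trace_id": secrets.token_hex(16),
        "parent_span_id": None,
        "span_id": secrets.token_hex(8),
        "new_trace": True,
    }
```

The same logic is what makes the external/internal boundary visible in a trace view: the edge span is either a child of the caller’s span or the root of a new trace.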

That’s not a trivial achievement. A few years ago, ingress layers were often opaque from a tracing perspective. Today, they usually participate cleanly in distributed traces without requiring invasive customization.

This convergence didn’t happen by accident. Most of the ingress controllers in this space build on Envoy, which has had native OpenTelemetry tracing support for several years now. And tracing was the first signal to reach stable status in the OpenTelemetry specification, giving projects a clear target to implement against. The combination of a mature proxy runtime and a stable spec created a baseline that most projects have been able to meet without heroic effort.

There are still differences. Some implementations create explicit upstream client spans. Others keep the model minimal. Attribute sets vary. But the baseline capability - ingress as a first-class trace participant - has largely converged across projects.

Tracing, at least, feels native.

Metrics still reflect Prometheus gravity

Metrics are more nuanced. In many ingress implementations, they are still exposed primarily in Prometheus format and scraped, even when traces and logs are emitted over OTLP. This isn’t surprising. Prometheus has been foundational in the CNCF ecosystem for years, and ingress controllers historically built their metrics around that model.

So what you often end up with is a hybrid: Traces are pushed. Logs may be pushed. Metrics are scraped.

The OpenTelemetry Collector frequently bridges the gap, scraping Prometheus endpoints and forwarding metrics into the broader pipeline.
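A minimal sketch of that bridge - the target address and port are placeholders, not taken from any specific controller:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: ingress
          scrape_interval: 15s
          static_configs:
            - targets: ["ingress-controller.example:10254"]  # placeholder metrics endpoint

exporters:
  otlp:
    endpoint: otel-backend.example:4317  # placeholder

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```

The Collector scrapes on the Prometheus side and pushes OTLP on the other, which is exactly the hybrid described above.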

Operationally, this works extremely well. There’s nothing inherently problematic about it. But it does mean that signals are not always modeled through the same conventions. Prometheus metrics follow established naming traditions and dimensional patterns, while OpenTelemetry defines semantic conventions and explicit units designed to align metrics with traces and logs under a shared resource model. That alignment is what enables meaningful correlation across signals, rather than leaving each signal to be interpreted in isolation.

When those models differ at the source, coherence becomes something the pipeline assembles rather than something the component emits directly.

It’s less a flaw than a reflection of ecosystem history.

The Collector becomes the alignment layer

One of the clearest recurring themes is the role of the OpenTelemetry Collector.

Even when ingress controllers emit clean OTLP spans, the Collector typically does significant work: enriching telemetry with Kubernetes attributes, normalizing fields, associating metrics with resource identity, and routing signals to different backends.

What this looks like in practice: a Collector config with a k8sattributes processor to inject pod and workload metadata, a transform processor to normalize attribute names across signals, a batch processor to manage throughput, and separate exporters routing traces, logs, and metrics to different backends. It works, but that configuration is the integration, and it's rarely trivial to get right.
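As a rough sketch of that kind of configuration - backend endpoints are placeholders, and the transform statement is illustrative rather than copied from any real deployment:

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}

processors:
  # Inject pod and workload metadata from the Kubernetes API.
  k8sattributes: {}
  # Normalize attribute names so signals line up (statement is illustrative).
  transform:
    trace_statements:
      - context: span
        statements:
          - set(attributes["http.request.method"], attributes["http.method"]) where attributes["http.method"] != nil
  # Batch to manage throughput toward the backends.
  batch: {}

exporters:
  otlp/traces:
    endpoint: traces-backend.example:4317  # placeholder
  otlp/logs:
    endpoint: logs-backend.example:4317    # placeholder

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, transform, batch]
      exporters: [otlp/traces]
    logs:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [otlp/logs]
```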

In many setups, cross-signal coherence only really appears after Collector processing. So you compensate. You build transforms, add enrichment pipelines, stitch signals together so they look consistent. And eventually, observability works - but only because the platform team is holding it all together.

That’s not an indictment of source-level telemetry. The Collector was designed to centralize these concerns. But it does mean that OpenTelemetry integration is often expressed as a combination of source behavior and pipeline alignment.

In small deployments, that distinction is subtle. In large platform environments, it shapes how much operational complexity accumulates in configuration versus in the component itself. And it reveals a trade-off: the more alignment happens in the pipeline, the less intent is expressed at the source. When telemetry is not modeled consistently at the source, the work doesn’t disappear - it moves into the platform.

Semantics are improving - but not uniform

Another pattern that emerges when you look closely is semantic variation.

Most ingress telemetry is understandable. You can see HTTP method, status code, duration, upstream behavior. But alignment with the latest OpenTelemetry semantic conventions varies.

Some projects track changes closely. Others still emit attributes like http.method and http.status_code, which have been superseded by http.request.method and http.response.status_code in current conventions. Logs may carry expressive field names that don’t directly align with standardized OpenTelemetry attributes. Metrics often follow Prometheus naming patterns rather than OpenTelemetry’s semantic model.
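Pipelines often paper over this drift with renames. A sketch of what that looks like in a Collector transform processor - the statements are illustrative, and the exact syntax depends on the Collector version:

```yaml
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Map superseded names onto current HTTP semantic conventions.
          - set(attributes["http.request.method"], attributes["http.method"]) where attributes["http.method"] != nil
          - delete_key(attributes, "http.method")
          - set(attributes["http.response.status_code"], attributes["http.status_code"]) where attributes["http.status_code"] != nil
          - delete_key(attributes, "http.status_code")
```

Every such statement is a small piece of semantic alignment that the source didn’t provide.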

Nothing breaks because of this. Operators can still debug real issues. But as more tooling - including automated analysis and AI-assisted workflows - begins to rely on structured, consistent telemetry, those differences become more visible.

Semantic precision matters more when OpenTelemetry becomes infrastructure.

Resource identity is often derived

Resource modeling follows a similar pattern.

Ingress components frequently emit a stable service.name. But detailed Kubernetes context - pod identifiers, workload names, cluster information - is often added downstream by the Collector using Kubernetes metadata processors.
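In Collector terms, that downstream enrichment is typically the k8sattributes processor. A minimal sketch - the field list is illustrative, not prescriptive:

```yaml
processors:
  k8sattributes:
    extract:
      metadata:
        # Identity that the ingress itself usually does not emit.
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.deployment.name
        - k8s.node.name
```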

This approach is widely used and generally effective. But it means that stable identity can depend on deployment topology and pipeline configuration, not just on what the component emits at the source. In practice, this often results in sparse resource attributes at the source, with identity only becoming complete after enrichment.

When resource identity is incomplete or inconsistent, the effects are subtle but real. Dashboards silently split the same workload into separate entries after a pod reschedule. Alerts fire on services that no longer exist. Correlation queries return partial results because service.name doesn’t match across signals. These aren’t catastrophic failures - they’re the kind of slow erosion that makes operators trust their tooling less over time.

There’s a difference between telemetry that is intrinsically self-describing and telemetry that becomes self-describing after enrichment. That difference tends to show up later, when systems scale or change.

Final thoughts

Across the ecosystem, “supports OpenTelemetry” is usually accurate. Ingress controllers emit spans, logs can be correlated, metrics are available - everything can flow into an OpenTelemetry pipeline.

But that binary label hides meaningful variation in how integration actually works. In some cases, OpenTelemetry is clearly the primary integration surface. In others, it coexists with legacy or parallel models. In some cases, semantic alignment is deliberate at the source. In others, it emerges through pipeline normalization. In some systems, resource identity is explicit. In others, it is inferred.

None of this maps cleanly to better or worse. It reflects architectural trade-offs, project history, and ecosystem gravity.

But it does affect where operational effort lives.

As OpenTelemetry becomes foundational infrastructure rather than optional instrumentation, the conversation naturally shifts. The question is no longer simply whether a project supports OpenTelemetry. It’s how it participates in a shared telemetry model - and where alignment actually happens.

In the next posts, I’ll look at individual ingress and gateway implementations in detail. Not to grade them, but to make these patterns concrete - showing where alignment happens at the source, where it depends on the pipeline, and how that affects real-world usage. Only after exploring those examples will we step back and ask whether the ecosystem might benefit from more precise language to describe what we’re seeing.

If you prefer video, I recently covered these findings in a Platform Engineering webinar on observability at the edge.

For now, it’s enough to recognize the shape. OpenTelemetry support is no longer a checkbox. It’s an architectural characteristic that shapes how systems are observed in practice.