Telemetry is the data your systems emit about themselves. Learn the signal types, how telemetry flows from code to backend, and how it differs from observability.

What Is Telemetry?

Telemetry is the automatic collection of data from a running system and its transmission somewhere else for monitoring and analysis. The word comes from the Greek tele (far off) and metron (measure), and the concept predates software by a century: it described sending meteorological readings over telegraph wires in the 1800s, and it still describes a spacecraft beaming sensor data back to mission control.

If you landed here from a software or DevOps context, though, telemetry means something more specific: the data your applications and infrastructure emit about their own behavior while they run. Request latencies, error counts, memory usage, the path a request took through a dozen services. This article explains what that data actually consists of, how it gets from your code to a place you can query it, and why telemetry is not the same thing as observability or monitoring, even though people use the three words interchangeably.

What telemetry actually is

Strip away the industry-specific framing and telemetry is one idea: a system measures something about itself and ships that measurement to a remote receiver, without a human standing there reading a gauge. A heart monitor doing this is medical telemetry. A Formula 1 car streaming tire temperatures to the pit wall is automotive telemetry. A payment service recording how long each database query took is software telemetry. Same mechanics, different sensors.

In software the "sensor" is your code, or a library wrapped around your code, or an agent watching the runtime. The measurement is a number, an event, or a record. The remote receiver is an observability backend where the data lands, gets stored, and becomes queryable. The interesting part, and the part most definitions skip, is everything that happens between the measurement and the backend.

The signal types

Modern software telemetry is organized into distinct signal types, each answering a different operational question. OpenTelemetry, the open standard that most of the industry has converged on, defines four.

Traces capture the path of a single request as it moves through your system. A trace is a tree of spans, where each span represents one unit of work: an incoming HTTP request, a database call, a cache lookup, a call to a downstream service. Each span records when it started, how long it took, and a set of attributes describing what happened. Spans share a trace ID, so you can reconstruct the whole journey of a request even when it crossed ten service boundaries. For more on how this works in practice, see how distributed tracing works in microservices.

Here is what a single span looks like, simplified to its essentials:

json

{
  "name": "GET /api/orders/{id}",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "parent_span_id": "a2fb4a1d1a96d312",
  "start_time": "2026-06-30T10:14:22.301Z",
  "end_time": "2026-06-30T10:14:22.548Z",
  "attributes": {
    "http.request.method": "GET",
    "http.response.status_code": 200,
    "server.address": "orders.internal"
  }
}

That 247-millisecond span is one node in a tree. The parent_span_id tells you which operation called it, and the shared trace_id ties it to every other span in the same request.
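To make the tree idea concrete, here is a minimal Python sketch that rebuilds a trace from a flat list of spans using only parent_span_id. It is an illustration, not how any particular backend does it: the parent span ("POST /checkout") and the child span ("SELECT orders") are invented for the example, and only the middle span comes from the JSON above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sample data: the orders span from the example above, plus an
# invented parent ("POST /checkout") and an invented child ("SELECT orders").
SPANS = [
    {"name": "POST /checkout", "span_id": "a2fb4a1d1a96d312", "parent_span_id": None,
     "start_time": "2026-06-30T10:14:22.250Z", "end_time": "2026-06-30T10:14:22.600Z"},
    {"name": "GET /api/orders/{id}", "span_id": "00f067aa0ba902b7",
     "parent_span_id": "a2fb4a1d1a96d312",
     "start_time": "2026-06-30T10:14:22.301Z", "end_time": "2026-06-30T10:14:22.548Z"},
    {"name": "SELECT orders", "span_id": "b7ad6b7169203331",
     "parent_span_id": "00f067aa0ba902b7",
     "start_time": "2026-06-30T10:14:22.320Z", "end_time": "2026-06-30T10:14:22.500Z"},
]

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def duration_ms(span):
    """Whole milliseconds between a span's start and end timestamps."""
    start = datetime.strptime(span["start_time"], TIME_FORMAT)
    end = datetime.strptime(span["end_time"], TIME_FORMAT)
    return (end - start) // timedelta(milliseconds=1)


def render_tree(spans):
    """Group spans by parent, then walk down from the root (parent is None)."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        siblings = sorted(children.get(parent_id, []), key=lambda s: s["start_time"])
        for span in siblings:
            lines.append("  " * depth + f"{span['name']} ({duration_ms(span)} ms)")
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return lines


for line in render_tree(SPANS):
    print(line)
```

Each level of indentation in the output is one hop down the call chain, which is essentially what a trace waterfall view draws for you.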

Metrics are numeric measurements aggregated over time. The common instruments are a counter that only goes up (total requests served), a gauge that moves in both directions (current memory usage), and a histogram that captures a distribution (request latency bucketed into ranges). Metrics are cheap to store and fast to query, which makes them the right tool for dashboards and alerting. They tell you that something is wrong, like a latency spike at 10:14, but rarely why.
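The three instrument kinds are easier to tell apart in code. The following is a hand-rolled Python sketch, not the OpenTelemetry SDK: the class names, method names, and bucket boundaries are assumptions chosen for illustration.

```python
import bisect


class Counter:
    """Monotonic: the value can only increase."""

    def __init__(self):
        self.value = 0

    def add(self, amount=1):
        if amount < 0:
            raise ValueError("a counter only goes up")
        self.value += amount


class Gauge:
    """Point-in-time reading: each set() overwrites the last value."""

    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket; bucket i holds values <= bounds[i]."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One extra bucket at the end for values above the largest bound.
        self.counts = [0] * (len(self.bounds) + 1)

    def record(self, value):
        self.counts[bisect.bisect_left(self.bounds, value)] += 1


requests_served = Counter()
memory_mb = Gauge()
latency_ms = Histogram(bounds=[50, 100, 250, 500])  # illustrative boundaries

for observed in (12, 80, 247, 900):
    requests_served.add()
    latency_ms.record(observed)

memory_mb.set(512)
memory_mb.set(480)  # a gauge can go down as well as up
```

Notice what the histogram throws away: it remembers that one request landed in the 100 to 250 ms bucket, not which request it was. That loss of detail is why metrics are cheap, and why they rarely tell you why.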

Logs are timestamped records of discrete events. They have the longest history of any signal, since virtually every language has shipped some form of logging for decades. What changed with OpenTelemetry is correlation: logs now carry the trace ID and span ID of the request that produced them, so a log line stops being an isolated string in a file and becomes a link straight to the span that emitted it. Structured logs (key/value or JSON) are strongly preferred over free-text in production, because a stable schema is the difference between parsing at scale and grep-and-pray.

text

2026-06-30T10:14:22.540Z level=ERROR service=orders trace_id=4bf92f3577b34da6a3ce929d0e0e4736 msg="payment authorization failed" order_id=88213 reason="gateway timeout"

Because that log carries the same trace_id as the span above, you can jump from the error straight to the full request trace and see exactly which downstream call timed out.
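How does the trace ID get onto the log line in the first place? In practice an instrumentation library usually handles it, but the mechanism can be sketched with Python's standard logging module. Everything here is a stand-in under stated assumptions: the current_trace context variable, the TraceContextFilter class, and the hard-coded service name are hypothetical, and the IDs are copied from the span example earlier.

```python
import contextvars
import io
import logging

# Hypothetical holder for "the trace context of the request being handled".
current_trace = contextvars.ContextVar("current_trace", default=None)


class TraceContextFilter(logging.Filter):
    """Copies the active trace context onto every log record."""

    def filter(self, record):
        ctx = current_trace.get() or {}
        record.trace_id = ctx.get("trace_id", "-")
        record.span_id = ctx.get("span_id", "-")
        return True  # never drop the record, only enrich it


stream = io.StringIO()  # captured in memory so the output is easy to inspect
handler = logging.StreamHandler(stream)
handler.addFilter(TraceContextFilter())
handler.setFormatter(logging.Formatter(
    'level=%(levelname)s service=orders trace_id=%(trace_id)s '
    'span_id=%(span_id)s msg="%(message)s"'
))

logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

# Pretend a request is in flight with the IDs from the span example.
current_trace.set({
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7",
})
logger.error("payment authorization failed")

print(stream.getvalue(), end="")
```

The application code only calls logger.error(...). The IDs are attached by the filter, which is the property you want: correlation that does not depend on every developer remembering to pass a trace ID around.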

Profiles are the newest signal, and they record resource usage at the code level: which functions consumed CPU, where memory was allocated, where a thread spent its time waiting on a lock. A profile is essentially a continuous flame graph of your running process. OpenTelemetry's eBPF-based continuous profiling can profile most languages on Linux without any code changes, and the profiling data model has been stabilizing through 2026. The payoff is correlation again: from a slow span you can jump to the profile for that exact time window and find the specific function eating the CPU.

You will often see traces, metrics, and logs called the "three pillars of observability." It is a useful mnemonic, but treat it with mild suspicion. The number of signals matters far less than whether they share context. Three uncorrelated pillars are three separate haystacks. The reason OpenTelemetry bothers to thread a single trace ID through all of them is so they stop being pillars and start being one connected dataset.

How telemetry data flows

Telemetry has to travel from inside your process to a backend you can query, and that journey has four stages.

Generation happens through instrumentation. Auto-instrumentation libraries patch common frameworks (your HTTP server, your database driver, your message queue client) and emit spans and metrics with no code from you. The OpenTelemetry Node.js auto-instrumentation, for example, automatically wraps Express, Postgres, Redis, and a dozen others. For anything specific to your business logic, you add manual instrumentation: a custom span around a checkout flow, a counter for coupons redeemed.

Collection usually routes through the OpenTelemetry Collector, a standalone process that receives telemetry, transforms it, and forwards it on. The Collector follows a receivers, processors, exporters pipeline: receivers accept incoming data, processors batch it, drop noise, redact PII, or sample it, and exporters send it to one or more destinations.
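The receivers, processors, exporters shape is simple enough to mimic in a few lines of Python. This is a conceptual sketch only: the real Collector is a separate process driven by a configuration file, and the function names, the /healthz route, and the email pattern below are all assumptions made for illustration.

```python
import re

# Deliberately simple pattern; real PII redaction needs more care than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def drop_health_checks(batch):
    """Processor: discard noisy records (hypothetical /healthz route)."""
    return [r for r in batch if r.get("route") != "/healthz"]


def redact_emails(batch):
    """Processor: mask email addresses in the message field."""
    return [
        {**r, "msg": EMAIL.sub("[REDACTED]", r["msg"])} if "msg" in r else r
        for r in batch
    ]


def run_pipeline(received, processors, exporters):
    """Apply each processor in order, then hand the batch to every exporter."""
    batch = list(received)
    for process in processors:
        batch = process(batch)
    for export in exporters:
        export(batch)
    return batch


# Two in-memory "backends" stand in for real export destinations.
primary, secondary = [], []

received = [
    {"route": "/healthz", "msg": "ok"},
    {"route": "/api/orders", "msg": "receipt sent to jane@example.com"},
    {"route": "/api/orders", "msg": "order created"},
]

result = run_pipeline(
    received,
    processors=[drop_health_checks, redact_emails],
    exporters=[primary.extend, secondary.extend],
)
print(result)
```

Adding a second destination is one more entry in the exporters list, and the code that produced the records is untouched.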

Transmission uses OTLP (the OpenTelemetry Protocol), a single wire format over gRPC or HTTP that carries all four signal types. This is the part that ended the bad old days of vendor-specific agents and wire formats. You instrument once against the OTLP standard, and you can point that data at any compatible backend without rewriting your application.

Storage and analysis happen in the backend, where the data is persisted and made queryable through dashboards, trace views, and alerting.

The flow looks like this:

text

┌──────────────────────────────┐
│  Your application            │  ← SDK + auto/manual instrumentation
│  (generates spans, metrics,  │
│   logs, profiles)            │
└──────────────┬───────────────┘
               │  OTLP (gRPC / HTTP)
               ▼
┌──────────────────────────────┐
│  OpenTelemetry Collector     │  ← receivers → processors → exporters
│  (batch, filter, redact,     │
│   sample, route)             │
└──────────────┬───────────────┘
               │  OTLP
               ▼
┌──────────────────────────────┐
│  Observability backend       │  ← store, correlate, query, alert
└──────────────────────────────┘

The Collector in the middle is what decouples your code from your vendor. Swap backends or add a second destination by editing one config file, and your application never notices.

Telemetry vs observability vs monitoring

These three terms get used as synonyms constantly, and the conflation causes real confusion in design discussions. They are not the same thing.

Telemetry is the data and the act of producing it. It is the raw material: the spans, metrics, logs, and profiles your systems emit.

Monitoring is watching known signals against expectations. You decide in advance what to measure (error rate, p99 latency, disk usage), set thresholds, and get alerted when a threshold is crossed. Monitoring answers questions you already knew to ask. It is excellent for "is the thing I expect to break, breaking?"
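In code, monitoring in this sense is little more than a precomputed statistic compared against a fixed limit. The Python sketch below assumes a 500 ms p99 threshold and a nearest-rank percentile; both are illustrative choices, not recommendations.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[max(rank, 1) - 1]


def check_latency(latencies_ms, threshold_ms=500):
    """Answer one question decided in advance: is p99 above the limit?"""
    p99 = percentile(latencies_ms, 99)
    return {"p99_ms": p99, "alert": p99 > threshold_ms}


healthy = [100] * 98 + [400, 900]        # a single slow outlier
degraded = [100] * 97 + [400, 900, 900]  # enough slow requests to move p99

print(check_latency(healthy))
print(check_latency(degraded))
```

The check can tell you that p99 crossed the line. It has no way to tell you which requests were slow or what they had in common, because that question was never encoded in it.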

Observability is a property of a system: the degree to which you can understand its internal state from the telemetry it produces, including states you never anticipated. The test of observability is whether you can answer a brand-new question about your system in production without shipping new code to collect new data. That only works if the underlying telemetry is rich and correlated enough to support questions you didn't think of in advance.

So the relationship is layered. Telemetry is the data. Monitoring and observability are two different things you do with it. You can have plenty of telemetry and poor observability if your signals are siloed and can't be cross-referenced, which is exactly the failure mode that correlated, OpenTelemetry-native data is designed to prevent.

Common pitfalls

The hard part of telemetry in production is rarely generating it. It is generating the right amount and keeping it useful.

The first trap is cardinality explosion. Every unique combination of attribute values on a metric creates a separate time series. Tag a request-count metric with user_id, and a system with a million users now has a million time series for one metric. Storage and query cost climb fast, and dashboards slow to a crawl. The fix is discipline about which attributes belong on metrics (low-cardinality dimensions like http.response.status_code) versus which belong on traces and logs (high-cardinality detail like a specific order ID).
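The arithmetic behind the explosion is a plain product, which a few lines of Python make visible. The attribute names follow the ones used in this article, but the distinct-value counts are invented for illustration, and the result is a worst-case upper bound that assumes every combination actually occurs.

```python
from math import prod


def max_series(distinct_values):
    """Upper bound on time series: the product of distinct values per attribute."""
    return prod(distinct_values.values())


# Illustrative counts of distinct values per attribute.
low_cardinality = {
    "http.request.method": 5,
    "http.response.status_code": 10,
    "service": 20,
}
with_user_id = {**low_cardinality, "user_id": 1_000_000}

print(f"{max_series(low_cardinality):,}")  # 1,000
print(f"{max_series(with_user_id):,}")     # 1,000,000,000
```

One extra attribute multiplied the worst case by a million. The same user_id on a span or a log record costs nothing comparable, because those signals store individual events rather than one series per combination.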

The second is collecting data you never query. It is tempting to instrument everything and figure out what matters later. In practice this produces enormous bills and noise you have to wade through during an incident. Decide what questions you need to answer first, then collect to support those questions. Define your SLOs and golden signals before you decide how many logs to ship.

The third is sampling that hides the incidents you care about. High-traffic systems can't afford to keep every trace, so they sample. Naive head-based sampling (decide at the start of a request whether to keep it) is cheap but throws away the rare slow or failed requests that you most want to see. Tail-based sampling, where the Collector decides after a request finishes and can preferentially keep errors and outliers, costs more compute but keeps the traces that actually matter.
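The difference between the two strategies comes down to what information is available when the decision is made. Here is a minimal Python sketch: the function names, the 1000 ms slow threshold, and the span fields ("error", "duration_ms") are assumptions, and the head sampler is only loosely modelled on ratio-based sampling by trace ID.

```python
import random


def head_sample(trace_id, rate):
    """Decide at request start, using nothing but the trace ID.

    Mapping the ID onto [0, 1] makes the decision repeatable: any service
    that sees the same trace ID reaches the same verdict.
    """
    return int(trace_id[-8:], 16) / 0xFFFFFFFF < rate


def tail_sample(spans, rate, slow_ms=1000, rng=random.random):
    """Decide after the request has finished, with every span in hand."""
    if any(span["error"] for span in spans):
        return True  # always keep failures
    if max(span["duration_ms"] for span in spans) >= slow_ms:
        return True  # always keep slow outliers
    return rng() < rate  # keep only a fraction of the boring traffic


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
failed = [
    {"duration_ms": 40, "error": False},
    {"duration_ms": 3000, "error": True},
]
boring = [{"duration_ms": 40, "error": False}]

print(head_sample(trace_id, rate=0.01))  # the failure is invisible at this point
print(tail_sample(failed, rate=0.0))
print(tail_sample(boring, rate=0.0))
```

The head sampler never looks at the spans at all, which is exactly why it is cheap and exactly why it cannot favor the requests that went wrong. The tail sampler has to hold every span of a trace until the request completes, and that buffering is where the extra compute and memory go.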

The fourth, and the one that quietly defeats observability, is uncorrelated signals. If your logs don't carry trace IDs, your metrics live in a different system from your traces, and your profiles are a separate tool entirely, then during an incident you are manually joining datasets by timestamp at three in the morning. The whole point of consistent semantic conventions and shared context is so the join is already done for you.

Final thoughts

Telemetry is the data your systems emit about themselves, organized into four signal types (traces, metrics, logs, and profiles) that travel from instrumented code, through a collection pipeline, to a backend over OTLP. It is the raw material of both monitoring and observability, but on its own it is just data. Its value depends entirely on whether the signals are correlated tightly enough to answer questions you didn't anticipate.

That correlation is hard to bolt on after the fact, which is why it pays to standardize on OpenTelemetry from the start and to send that data somewhere built to keep the signals connected rather than stored in four separate silos.

Dash0 is OpenTelemetry-native: it ingests OTLP directly and keeps your traces, logs, and metrics correlated with infrastructure data through shared context, so jumping from a latency spike to the trace to the log that explains it is a click, not a manual timestamp hunt. Start a free trial to send your telemetry to a backend that treats the four signals as one connected dataset. No credit card required.