Last updated: September 5, 2025

Code Red Newsletter #15

Hi there,

Welcome back! This time we’re zooming in on what happens when observability stops being an afterthought and starts being a platform feature. From GA launches to Kafka crises, the throughline is clear: observability at scale only works when it’s engineered into the platform itself.

In Focus: Observability for Platform Engineering

Platforms aren’t just about golden paths and Kubernetes clusters anymore. They’re about shipping observability as part of the product. If developers are still wiring exporters by hand or guessing which dashboard is “blessed,” you don’t have a platform - you have plumbing.

That’s why we teamed up with platformengineering.org for the new course Observability for Platform Engineering. It’s a free field guide for platform teams who want to make observability self-service, standardized, and developer-friendly:

  • Self-service instrumentation (no more wiki spelunking for YAML snippets)
  • Open standards baked in (hi, OpenTelemetry 👋)
  • Defaults with guardrails so you don’t blow the budget before you’ve finished your coffee

The point? Move observability from “something devs fight for in sprint planning” to “something the platform just provides.” All carefully curated by Michele Mancioppi and yours truly, Kasper Borg Nissen.
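To make “defaults with guardrails” concrete, here’s a minimal sketch of the kind of bootstrap module a platform team might hand to every service, written against the OpenTelemetry Python SDK. The collector endpoint, sampling ratio, and attribute values are illustrative, not something the course prescribes:

    # Hypothetical platform-provided defaults; values below are illustrative only.
    import os

    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.sdk.trace.sampling import ParentBasedTraceIdRatio

    def init_tracing(service_name: str) -> None:
        """Wire up tracing with the platform's blessed defaults and guardrails."""
        provider = TracerProvider(
            resource=Resource.create({
                "service.name": service_name,
                "deployment.environment": os.getenv("PLATFORM_ENV", "dev"),
            }),
            # Guardrail: head-sample 10% of traces by default so a chatty service
            # doesn't blow the telemetry budget before the coffee is finished.
            sampler=ParentBasedTraceIdRatio(0.1),
        )
        # Guardrail: batched export to the platform-run collector, not straight to a vendor.
        provider.add_span_processor(
            BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
        )
        trace.set_tracer_provider(provider)

A developer calls init_tracing("checkout") and gets traces flowing to the platform’s collector with the guardrails already applied - no wiki spelunking required.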

Register for the course.

Synthetic Monitoring: Know Before Your Users Do

We’ve just launched Synthetic Monitoring at Dash0, and it’s a game-changer for platform observability. Instead of waiting for users to stumble into errors, synthetic checks simulate real traffic against your services so you can spot issues before they ripple into Slack channels and support queues.

What makes this special is the integration: synthetics aren’t bolted on; they’re wired directly into your traces. That means developers always know whether an alert was triggered by a real user session or a platform-driven check. No more duct-tape cron jobs, no more guesswork - just observability, productized and delivered as part of your platform.
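The pattern behind it is easy to picture. Here’s a rough sketch - not Dash0’s actual implementation, and the attribute names are invented for illustration - of a synthetic check that marks its own telemetry so the backend can separate probe traffic from real user sessions:

    # Illustrative synthetic check; "check.synthetic" and "check.target" are
    # made-up attribute names, not an established convention.
    import requests
    from opentelemetry import trace

    tracer = trace.get_tracer("synthetic-checks")

    def run_check(url: str) -> bool:
        with tracer.start_as_current_span("synthetic-check") as span:
            span.set_attribute("check.synthetic", True)  # probe traffic, not a user
            span.set_attribute("check.target", url)
            response = requests.get(url, timeout=5)
            span.set_attribute("http.response.status_code", response.status_code)
            return response.ok

    if __name__ == "__main__":
        print(run_check("https://example.com/healthz"))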

Read more about Synthetic Monitoring.

The Five Stages of SRE Maturity: From Chaos to Operational Excellence

OneUptime charts the SRE journey through five familiar stages: chaos, reactivity, proactivity, optimization, and mastery. Early on it’s all about firefighting and tribal knowledge; later come SLOs, error budgets, and blameless postmortems. As maturity grows, automation takes over, observability deepens, and reliability becomes so ingrained that users barely notice when things go wrong.

It’s a reminder that the path to operational excellence isn’t just about tools - it’s about building systems and practices that scale with your platform.

Read the post.

Kafka Performance Crisis: How We Scaled OpenTelemetry Log Ingestion by 150%

Bindplane tells the story of a customer whose log pipeline, built on the OpenTelemetry Collector with a Kafka receiver, started to crumble under load. Throughput stalled at ~12K events per second per partition, and consumer lag ballooned. After weeks of tuning - moving the batch processor, switching to the Franz-Go client, and fixing an inefficient encoding setup - they pushed throughput to ~30K EPS per partition. Across 16 partitions, that meant scaling from 192K to nearly 480K EPS, a 150% improvement.

It’s a reminder that even the observability pipeline itself needs observability - and careful engineering - when it’s part of the platform.

Read the article.

Website Monitoring Goes GA on Monday

A platform isn’t complete if developers have to glue together curl scripts to check uptime. That’s why Dash0’s Website Monitoring is going GA on September 8.

Global checks, latency alerts, and proactive notifications - all delivered as part of your platform’s observability layer. It’s one less tool devs have to configure, and one more signal that just comes with the platform.

Learn more about website monitoring here.

Code RED LIVE: Beyond Hype - The Real Impact of AI on Observability

In this live episode, Mirko joins CTO Ben Blackmore and Principal AI Engineer Lariel Fernandes for a candid conversation about what AI really means for observability. Rather than adding to the hype, they dig into the realities of how AI shows up in engineering workflows today, why OpenTelemetry is the backbone that makes those AI-driven insights possible, and what “agentic observability” could actually look like in practice.

The consensus: AI won’t replace observability pipelines, but it will reshape them - turning raw signals into actionable intelligence that platforms can deliver straight to developers.

Watch the recording.

Choice Cuts

Even our snack-sized reads point back to platform observability.

Observing Dapr with OpenTelemetry and Dash0

Dapr moves service calls, pub/sub, state, and workflows into a sidecar - which is great for developers, but tricky for observability. This deep dive shows how OpenTelemetry can stitch sidecar and app signals back together into one coherent view. From end-to-end traces across async pub/sub to metrics on sidecar health, it’s a practical guide for making Dapr-based platforms observable. There’s even a hands-on demo you can run and break yourself.

And yes, this one’s by me - so consider it a first-person field report straight from the trenches.
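If you just want the gist of the app side, here’s a minimal sketch: a service instrumented with OpenTelemetry calls a peer through its Dapr sidecar, and the outgoing HTTP request carries trace context so the app’s spans and the sidecar’s spans land in the same trace. The port, app ID, and method name are illustrative; the post and demo have the real setup.

    # Sketch only: assumes a tracer provider is already configured and that the
    # Dapr sidecar (HTTP API on port 3500 by default) exports its own traces too.
    import requests
    from opentelemetry.instrumentation.requests import RequestsInstrumentor

    # Auto-instrument outgoing HTTP calls: each request gets a client span plus a
    # W3C traceparent header, which the sidecar picks up and forwards.
    RequestsInstrumentor().instrument()

    DAPR_HTTP = "http://localhost:3500"

    def place_order(order: dict) -> dict:
        # Service invocation through the sidecar; "orders" and "create" are made up.
        resp = requests.post(
            f"{DAPR_HTTP}/v1.0/invoke/orders/method/create",
            json=order,
            timeout=5,
        )
        resp.raise_for_status()
        return resp.json()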

Read the blog post.

How to Name Your Span Attributes

Consistency is what makes observability-as-a-platform work. In this post, Juraci Paixão Kröhling walks through the do’s and don’ts of naming span attributes - when to lean on existing semantic conventions, how to design custom attributes that stay vendor-neutral, and why company-prefixed chaos only makes data harder to use.
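A quick illustration of the idea (the custom names below are invented for this sketch; the post has the actual guidance):

    # Prefer existing semantic conventions; keep custom attributes in a clear,
    # vendor-neutral namespace; don't re-invent what a convention already covers.
    from opentelemetry import trace

    tracer = trace.get_tracer("checkout")

    with tracer.start_as_current_span("checkout") as span:
        # Good: standard HTTP semantic convention, understood by every backend.
        span.set_attribute("http.request.method", "POST")

        # Good: a custom attribute in a consistent, meaningful namespace.
        span.set_attribute("checkout.cart.item_count", 3)

        # Avoid: a home-grown name for something the conventions already define.
        span.set_attribute("my_http_method", "POST")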

Read the guide.

What Is OTLP and Why It’s the Future of Observability

OTLP isn’t just another wire format - it’s the common protocol that makes telemetry portable across tools, vendors, and platforms. This explainer breaks down what OTLP is, why it was designed the way it is, and how it’s becoming the foundation for interoperable observability pipelines. If you want one protocol to rule your traces, metrics, and logs, start here.
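To see what “portable” looks like in code, here’s a tiny sketch: nothing backend-specific is hard-coded, because the exporter reads the spec-defined OTEL_EXPORTER_OTLP_ENDPOINT (and, when auth is needed, OTEL_EXPORTER_OTLP_HEADERS) environment variables. Pointing the same service at a different OTLP-compatible backend becomes a config change, not a code change.

    # Sketch: OTLP export configured entirely through standard environment
    # variables, e.g. OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.example.com:4317
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

    provider = TracerProvider()
    # No endpoint in code: the exporter picks it up from the environment, which is
    # what makes the same telemetry pipeline work against any OTLP backend.
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)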

Read the explainer.

If there’s a pattern in this issue, it’s this: observability works best when it’s part of the platform. From website checks to Kafka tuning, from span names to AI enrichment, the message is the same - make it productized, standardized, and accessible.

We’ll be back in two weeks with more stories from the frontlines of observability. Until then, may your spam filters be sharp, your OTLP endpoints be stable, and your developers never ask, “do we even have tracing enabled here?”

Kasper, out.

Author
Kasper Borg Nissen