Last updated: September 1, 2025

Code RED Newsletter #6

Hi there,

KubeCon + CloudNativeCon is a wrap – and wow, that was a wild one.

Sunny skies in London? That alone felt like a bug in the simulation. But paired with a packed venue, high-signal talks, and hallway tracks that might as well have had their own entries on the schedule - it all added up to a standout event.

I also had the pleasure of delivering a keynote on bridging Observability and Platform Engineering - definitely a highlight for me. Thank you to everyone who was there!

Naturally, this newsletter focuses on the conference that just wrapped. Let’s start with some trend spotting. My top 3 takeaways from the week were:

  • Platform engineering is becoming a foundational practice across the tech industry.
  • Observability is evolving - fast. OpenTelemetry is leading the charge.
  • AI/ML is entering a new era - 2025 will be the year of Agentic AI.

Let’s break them down one by one:

Platform engineering is becoming a foundational practice across the tech industry

Still one of the most talked-about (and walked-about) topics at the conference. It continues to be the connective tissue across many CNCF projects - and the crowd size at any platform talk confirmed it’s not slowing down. More organizations are treating their platforms as products, investing in developer experience, and shifting left without shifting chaos. Golden paths are becoming real, not just slides.

Observability is evolving - fast. OpenTelemetry is leading the charge

The pace of change here is hard to miss, and OpenTelemetry is clearly setting it. The days of fragmented telemetry data and vendor lock-in are fading. Engineers now expect correlation across traces, metrics, and logs, and semantic conventions are making that possible. The OpenTelemetry Collector is positioning itself as the last instrumentation agent you’ll ever need. And Perses? It got well-deserved keynote love. Looks like the community is starting to rally behind it as the declarative dashboard spec we’ve all been waiting for.

AI/ML is entering a new era - 2025 will be the year of Agentic AI

There’s always AI at KubeCon, but this year felt different. The energy has shifted from "how do we run AI on Kubernetes?" to "how can AI help us build, operate, and troubleshoot better systems?" 

Agentic AI took the spotlight - and with kagent officially launched, it’s never been easier to deploy AI agents that do more than just generate YAML with questionable confidence. Think debugging assistants, automated incident summaries, and actual ops value - no LLM-powered magic 8-balls here.

On a personal note - this was my last KubeCon + CloudNativeCon as a co-chair. I’ve officially passed the torch to Abby Bangser, and I can’t wait to see what she brings to the role. She’s going to crush it.

Dash0 didn’t just show up at KubeCon - we got mobbed. Our booth was one of the busiest at the conference, and for good reason. The red overalls turned heads, the live demos kept people coming back, and the backpacks? Gone in record time. But it wasn’t just about swag - the conversations were deep, thoughtful, and confirmed what we’re hearing everywhere: teams are hungry for better observability, smarter alerting, and real correlation across signals. Dash0 is hitting a nerve - in the best possible way.

Let’s get into this week’s recommendations:

Building an Observability Solution With ClickHouse at Dash0

In this deep dive, Miel Donkers walks through how we built Dash0’s observability backend on ClickHouse - and why we’d do it all over again. From correlating high-cardinality OpenTelemetry signals to optimizing query performance at scale, ClickHouse turned out to be the right tool for the job.

With clever use of materialized columns, hybrid hot/cold storage, and carefully tuned table schemas (yes, we benchmarked everything), Dash0 delivers real-time insights without blowing up storage costs. And the best part? It’s built by folks who’ve lived and breathed both observability and ClickHouse for years.

Read the full post here.

Code RED Podcast #21: From Outages to Optimization: How ilert Solves Incident Response with Birol Yildiz

Incident response is a lot like juggling chainsaws: stressful, risky, and not a great time for learning on the fly. Birol Yildiz joins the Code RED podcast to chat about how ilert helps teams flip the script on outages – turning chaos into clarity.

From automated escalations to real-time resolution workflows, this episode is a must-listen for anyone who's ever uttered the words “Is it down for everyone or just me?”

Listen to the episode here.

OTel Sucks (But Also Rocks!)

This blog post is the written sibling to the KubeCon talk of the same name. Engineers from Delivery Hero, Atlassian, and Pismo share the brutal truths and beautiful wins of OpenTelemetry. Yes, the Collector moves too fast. Yes, semantic convention changes make dashboards cry. But also: vendor neutrality, modularity, and a thriving community that actually listens. It’s raw, real, and reassuring for anyone tangled in trace spaghetti.

Read the full post here.

Insights from the OpenTelemetry Developer Experience Survey

218 developers shared the joys and pains of working with OpenTelemetry, and the results are in: we need better docs, richer examples, and simpler debugging.

Top gripes included unclear SDK configs, difficulty knowing why exporters fail, and Collector configs that read like ancient scrolls. But the community is listening, and improvements are underway. If you've ever muttered “Why isn’t this trace showing up?” - you’re not alone.

Read the full survey summary here.

AI-SRE Tooling: Between Hype and Help

AI won’t replace your SREs (yet), but it can definitely make them faster. In this sharp analysis, Andrew Mallaband explores the current state of AI-powered tooling for incident response - and why a human-in-the-loop co-pilot model is where the real value lies.

The secret sauce? Causal graphs + LLMs + solid telemetry = fewer pages at 3 AM. Tools like NoFire.AI, Senser, and Dash0’s own Triage are showing that with the right data, AI can guide RCA, not just guess at it.

Read the full post here.

Choice cuts

This is where we serve up spicy reads and crispy insights from around the ecosystem.

Perses Closes the Observability Gap with Declarative Dashboards

Dashboards have lagged behind in the "as code" game - until now. Perses, a CNCF Sandbox project, brings declarative dashboards with CRDs, GitOps support, and real version control (goodbye final-FINAL-v3.json).

Even better? Dash0 supports Perses out of the box, making it easy to scale, standardize, and stop babysitting your dashboards.

Read the full story here.

AI Agents: Observability's Next Leap?

Xata’s take on AI agents flips the on-call script: why wake an engineer when an agent can investigate, summarize, and even act?

Their Postgres-savvy Xata Agent already handles root cause analysis, learns from incidents, and even writes your postmortems. It’s early days, but observability might just be going full co-pilot.

Read the post here.

Whether you were with us in London or tuning in virtually, this KubeCon was one for the books. Observability is no longer a side quest – it’s core to building resilient, scalable, developer-friendly platforms.

The Dash0 crew will keep shipping, keep supporting, and yes – keep rocking the overalls.

Catch you in two weeks.

Kasper, out.

Authors
Kasper Borg Nissen