Last updated: September 1, 2025
Code RED Newsletter #9
Hi there,
AI civilizations, Minecraft memes, and tracing agentic workflows - welcome to the wild frontier of AI observability.
You may have seen the study already: 1,000 AI agents dropped into Minecraft… and they built a society. Villages. Markets. Even memes.
“We explored cultural transmission through the spontaneous generation of memes.”
It’s hilarious. It’s fascinating. And it’s also a glimpse into where the industry is heading.
Across the ecosystem - from cloud platforms to dev tools to open-source frameworks - AI agents are everywhere. They’re generating code, managing environments, making decisions, and integrating into developer workflows at a surprising pace.
This isn’t just a trend. It’s a shift in how we build and operate systems.
And engineers are feeling it. To thrive, we’ll need to move from “using AI tools” to orchestrating AI agents - debugging them, evaluating them, and observing how they interact with our systems and each other.
That’s what this edition is all about.
We’ll explore how observability is evolving to meet this moment: from semantic conventions and tracing protocols to practical stories of what happens when agents get a little… creative.
Whether you’re scaling GenAI in prod or just curious about how Minecraft turned into a meme factory, this one’s for you.
In Focus: Observability for AI Agents
AI Agent Observability - Evolving Standards and Best Practices
OpenTelemetry is beginning to define how observability should work for agentic applications. This post introduces new semantic conventions that cover both frameworks (like CrewAI and LangGraph) and the agent applications built on top of them. It also proposes patterns for handling trace propagation, identity, and error semantics. If you're building anything with reasoning loops or tool calls, this blog is essential reading.
Read the blog here.
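To get a feel for what these conventions look like in practice, here's a minimal, library-free sketch of the attribute set an agent-invocation span might carry. The attribute names follow the draft GenAI semantic conventions; the helper function and its values are purely illustrative, and real code would set these on an OpenTelemetry span rather than a dict.

```python
# Illustrative sketch: the attributes a span for a single agent
# invocation might carry under the draft OpenTelemetry GenAI
# semantic conventions. The helper itself is hypothetical.

def agent_span_attributes(agent_name, model, input_tokens, output_tokens):
    """Return semconv-style attributes for one agent invocation."""
    return {
        "gen_ai.operation.name": "invoke_agent",    # operation type
        "gen_ai.agent.name": agent_name,            # logical agent identity
        "gen_ai.request.model": model,              # model requested
        "gen_ai.usage.input_tokens": input_tokens,  # prompt-side tokens
        "gen_ai.usage.output_tokens": output_tokens,
    }

attrs = agent_span_attributes("triage-agent", "gpt-4o", 812, 64)
```

The point of standardized names like these is that any backend can aggregate agent spans across frameworks without per-tool mapping logic.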
AI + OpenTelemetry @ OTel Night Berlin
In this talk, Dash0’s Lariel Fernandes explores what happens when we point AI at telemetry itself. By using LLMs to classify logs, detect signal patterns, and support triage, we start to see how GenAI can help reduce alert fatigue and increase insight density. It’s not just automation - it’s augmentation. The talk also covers early challenges around hallucination, feedback loops, and practical deployment.
Watch the recording from OTel Night Berlin here.
Proposal: Adding OpenTelemetry Trace Support to MCP (Model Context Protocol)
The Model Context Protocol is quickly becoming the connective tissue of multi-agent systems. But until now, it’s been hard to know what’s actually happening inside it.
This proposal adds OpenTelemetry trace support directly into the MCP spec - giving client agents visibility into tool execution timelines, server-side latencies, and failures. It’s a major step toward full-stack traceability for agent workflows.
Join the discussion here.
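To make the idea concrete, here's a sketch of what propagating W3C trace context on an MCP tool call could look like. The placement of the `traceparent` value inside the request's `_meta` field is an assumption based on the open proposal; the final spec may land on a different shape.

```python
# Sketch: attaching W3C trace context to an MCP tools/call request.
# The `_meta` placement is an assumption from the open proposal,
# not settled spec.
import secrets

def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C traceparent header value (version 00)."""
    trace_id = secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 bytes  -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

def mcp_tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tools/call JSON-RPC request with trace context."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": name,
            "arguments": arguments,
            "_meta": {"traceparent": make_traceparent()},
        },
    }

req = mcp_tool_call("git.status", {"repo": "."})
```

With context flowing across the protocol boundary, a server-side tool execution can show up as a child span of the client agent's reasoning loop.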
A company gave 1,000 AI agents access to Minecraft - and they built a society
It started as an experiment in emergent behavior and ended in a full-fledged simulation of digital civilization. The agents built towns, traded resources, and - most bizarrely - started sharing memes. The researchers behind the project were testing how language and social structure could emerge from agent interactions. It worked. And it was weirdly beautiful.
Read the article.
Code RED Podcast: Engineering Intelligence: How to Build LLM Applications at Scale with Marc Klingen of Langfuse
Marc Klingen, CEO of Langfuse, joins the podcast to talk about observability for LLM chains and agent execution.
He shares what it takes to build structured traces for GenAI - including prompt tracking, response evaluation, and feedback-driven fine-tuning.
The conversation is packed with hands-on lessons from teams running LLMs in production today.
Listen to the episode here.
OpenTelemetry for Generative AI
As LLMs become core infrastructure, we need to monitor them with the same rigor as microservices. This piece introduces OpenTelemetry’s new semantic conventions and a Python instrumentation library for OpenAI calls. It covers trace spans for prompts, metrics for token usage, and events for user input and model response. It’s the start of an open, standardized telemetry model for generative systems.
Read the blog post here.
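The token-usage metrics described in the post essentially boil down to a histogram keyed by model and token type. A minimal, library-free sketch of that shape, with attribute names following the draft GenAI conventions and the recorder class itself being illustrative:

```python
# Illustrative recorder mirroring a token-usage histogram keyed by
# model and token type. Real code would use an OpenTelemetry
# Histogram instrument; this class is a stand-in.
from collections import defaultdict

class TokenUsageRecorder:
    def __init__(self):
        self._points = defaultdict(list)

    def record(self, model: str, token_type: str, count: int) -> None:
        # Attribute names follow the draft GenAI semantic conventions.
        key = (("gen_ai.request.model", model),
               ("gen_ai.token.type", token_type))
        self._points[key].append(count)

    def total(self, model: str, token_type: str) -> int:
        key = (("gen_ai.request.model", model),
               ("gen_ai.token.type", token_type))
        return sum(self._points[key])

rec = TokenUsageRecorder()
rec.record("gpt-4o", "input", 812)
rec.record("gpt-4o", "input", 400)
rec.record("gpt-4o", "output", 64)
```

Dimensioning by token type is what makes cost dashboards possible: input and output tokens are usually priced differently, so they need to stay separable at query time.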
Keeping Up With AI: The Painful New Mandate for Software Engineers
According to The New Stack, the shift to AI-native software development is accelerating. Engineers are being asked to orchestrate teams of AI agents, design prompt chains, and understand new observability patterns. The article warns that we need to invest in skills like clear specification writing and agent oversight. Otherwise, we risk being outpaced by the tools we’re meant to control.
Read the blog post here.
MCP: May Cause Pwnage - Backdoors in Disguise
A new blog post, provocatively titled MCP: May Cause Pwnage, details how developers are unintentionally exposing powerful AI agents to the public internet.
The post outlines a series of vulnerabilities in the Model Context Protocol ecosystem - ranging from debug tools listening on the wrong interface to command execution via CSRF, DNS rebinding, and argument injection. In just a few days, the researchers found over 100 exposed servers online, many running tools like Playwright or Git clients with default configs. It’s a timely reminder that as we embrace agentic infrastructure, we also need to lock it down like any other production system.
Read the write-up here.
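One recurring root cause in the write-up is simply a debug tool bound to all interfaces instead of loopback. A quick stdlib-only sanity check along these lines can catch it; the address list here is illustrative, and in practice you'd feed it from `ss -ltn` or your process manager.

```python
# Quick sanity check: flag listeners bound to all interfaces.
# Example addresses below are illustrative only.

UNSAFE_BINDS = {"0.0.0.0", "::", ""}

def is_publicly_bound(bind_addr: str) -> bool:
    """True if a listener on this address is reachable from other hosts."""
    host = bind_addr.rsplit(":", 1)[0].strip("[]")
    return host in UNSAFE_BINDS

# A local MCP debug server should only ever listen on loopback:
assert is_publicly_bound("127.0.0.1:8931") is False
assert is_publicly_bound("0.0.0.0:8931") is True
```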
Choice cuts
Every Code RED issue ends with a few standout links that didn’t quite fit the main thread - but are too good not to share. Think of it as your observability snack pack.
Microsoft Build 2025: The age of AI agents is here
At Build, Microsoft doubled down on long-running copilots, open agentic frameworks, and deep IDE integrations. It wasn’t just a showcase - it was a statement. Expect to see “agent” become a standard dropdown in your dev tools soon.
Watch the keynote here.
A Modern Approach to Log Levels with OpenTelemetry
Not all logs are equal. This deep dive explores how OpenTelemetry standardizes severity across languages, frameworks, and protocols. It’s your guide to cutting through log noise - without losing important signals.
Read more here.
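The core of that standardization is OpenTelemetry's numeric severity scale: TRACE spans 1-4, DEBUG 5-8, INFO 9-12, WARN 13-16, ERROR 17-20, and FATAL 21-24. Here's a small sketch (the mapping function is illustrative, not from any official SDK) of translating Python's stdlib logging levels onto that scale:

```python
# Sketch: mapping Python logging levels onto OpenTelemetry's
# severity-number ranges (TRACE 1-4, DEBUG 5-8, INFO 9-12,
# WARN 13-16, ERROR 17-20, FATAL 21-24).
import logging

_LEVEL_TO_OTEL = {
    logging.DEBUG: 5,      # DEBUG
    logging.INFO: 9,       # INFO
    logging.WARNING: 13,   # WARN
    logging.ERROR: 17,     # ERROR
    logging.CRITICAL: 21,  # FATAL
}

def to_otel_severity(python_level: int) -> int:
    """Map a stdlib logging level to the nearest OTel SeverityNumber."""
    # Round down to the nearest known level; below DEBUG falls to TRACE.
    for level in sorted(_LEVEL_TO_OTEL, reverse=True):
        if python_level >= level:
            return _LEVEL_TO_OTEL[level]
    return 1  # TRACE

assert to_otel_severity(logging.WARNING) == 13
```

Because the numbers are ordered, backends can filter on "ERROR and above" with a single comparison, no matter which language or framework emitted the log.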
So where are we now?
AI agents are everywhere. Observability is catching up. And in the middle of it all, engineers are being asked to debug black boxes that talk back, refactor themselves, and sometimes open security holes the size of a Minecraft portal.
We’ve seen new semantic conventions, traceable agent workflows, powerful debugging tools - and also: exposed servers, DNS rebinding, and backdoors called from a browser tab.
In short: it’s an exciting time. But let’s not get lost in the meme stream. Stay curious. Stay secure. And always, always trace your agents.
Until next time - watch your ports, rotate your tokens, and maybe don’t let your LLMs run shell commands unsupervised.
Kasper, out!
