Introduction

Observability tools are powerful, but the cost of figuring out what to do next is still too high. Dashboards, alerts, traces, and logs all work, but only if you already know where to look. Agent0 is Dash0’s agentic AI platform, designed to help teams explore, understand, and act on their telemetry without starting from scratch every time.

Agent0 is not a single chatbot but a family of specialized AI agents. A generalist agent first interprets your request to understand your intent, then routes it to the right specialized agent depending on what you’re trying to do.

This guide shows how to get the most value out of Agent0 in practice. You can read it end-to-end or jump directly to the sections that match what you’re trying to accomplish. Embedded videos show each workflow in action.

NOTE: This guide reflects how we use Agent0 internally and how we see teams use it in real environments. The prompts here aren’t exhaustive; they’re just starting points. The best way to learn Agent0 is to ask real questions about your own systems and follow the next steps it suggests.

1) Environment Overview and First Steps

Best for: new environments, new hires, first day on call

Agent involved: The Pathfinder

When you first open an observability environment, the hardest question is usually where to start. Agent0 is designed to answer that directly by scanning what’s already instrumented and summarizing the current state of the system.

Example prompts:

“I’m new to this Dash0 environment. What telemetry do I have and what should I look at first?”
“Give me an overview of what’s currently monitored in [environment/workspace].”
“What services are active right now in [service.namespace] and how healthy are they?”

Agent0 identifies active services, evaluates health signals, and highlights what’s working, what’s noisy, and what may be missing. From there, you can quickly narrow your focus with follow-ups like “Which services should I prioritize?” or “If I were on call right now, where would you start?” This workflow is especially useful during onboarding and handoffs, where context matters more than raw metrics.

2) Troubleshooting and Incident Investigation

Best for: on-call, incident response, retrospectives, recurring issues

Agent involved: The Seeker

Troubleshooting usually starts broad, then narrows quickly. Agent0 can summarize what’s happening across the environment, then drill into a specific service or endpoint using traces, logs, and metrics together.

Start broad (incident summary prompts):

“Why are users seeing errors right now in [service.namespace]?”
“Explain the recent incidents in [environment/workspace].”
“Summarize all alerts and outages from [time range] (for example: last 24h / this week).”

Agent0 correlates signals and produces a narrative explanation of what happened, backed by real data. From there, follow-ups like “Which service contributed most to these errors?” or “What changed shortly before the spike?” help you narrow scope fast.

Go deep (service/endpoint investigation prompts):

“Show me the failed traces for [service.name] in the last [time range] and explain what’s going wrong.”
“Why is [endpoint/route] returning elevated errors?”
“What’s causing latency spikes for [endpoint/operation]?”

Agent0 surfaces likely causes and suggests what to inspect next, which keeps the investigation moving even when you’re not sure what to ask after the first answer.

3) Dashboard Creation Without Starting from an Empty Canvas

Best for: new teams, new services, new environments

Agent involved: The Artist

Dashboards are often treated as static artifacts, but in practice they’re living tools. Instead of manually selecting metrics and writing queries, you can ask Agent0 to do it for you.

Example prompts:

“Create a dashboard for [service.name] with the most important performance metrics.”
“Build a dashboard showing health for my top [N] services in [service.namespace].”
“Create a dashboard focused on [workflow] reliability (for example: checkout).”

Agent0 identifies active services, finds usable metrics, validates PromQL queries, and builds dashboards that automatically adapt as services change. Dashboards become a baseline rather than a one-off artifact.

4) Alerting and Check Rules

Best for: reliability, on-call readiness

Agents involved: The Artist (and The Oracle, when you need query help)

Once dashboards exist, alerts should build on the same logic teams already trust.

Example prompts:

“What alerts should I set up for services in [service.namespace]?”
“Create an alert for elevated error rates on critical services.”
“What SLIs make sense for [service.name]?”

Agent0 proposes alert patterns, recommends thresholds, explains why they make sense, and turns existing PromQL into actionable check rules. This keeps dashboards and alerts aligned and avoids duplicated effort.
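
To make the idea of a threshold-based check concrete, here is a minimal Python sketch of the logic such a rule encodes: fire only when the error ratio stays above a threshold for several consecutive evaluations, similar in spirit to the `for` clause of a Prometheus alerting rule. This is not Dash0’s check rule format or Agent0’s output; `CheckRule`, `evaluate`, and the threshold values are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckRule:
    """Hypothetical, simplified stand-in for a threshold-based check rule."""
    name: str
    threshold: float   # fire when the error ratio exceeds this value...
    for_samples: int   # ...for this many consecutive evaluations


def evaluate(rule: CheckRule, error_ratios: list[float]) -> bool:
    """Return True if the most recent `for_samples` values all breach the threshold."""
    if rule.for_samples < 1:
        raise ValueError("for_samples must be at least 1")
    if len(error_ratios) < rule.for_samples:
        return False  # not enough history yet to decide
    recent = error_ratios[-rule.for_samples:]
    return all(value > rule.threshold for value in recent)


# Illustrative values only: 5% error ratio sustained over three evaluations.
rule = CheckRule(name="high-error-rate", threshold=0.05, for_samples=3)
firing = evaluate(rule, [0.01, 0.06, 0.07, 0.09])
```

Requiring several consecutive breaches is one common way to keep a short spike from paging anyone; the right threshold and duration depend on the service.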

5) PromQL Help and Query Exploration

Best for: developers, platform teams, anyone new to PromQL

Agent involved: The Oracle

PromQL shows up everywhere. Most people know what they want to measure but not how to write the query, or the query works but no one is completely sure what question it answers. Agent0 helps bridge those gaps.

Example prompts:

“Write a PromQL query for p95 latency per service in [service.namespace].”
“Explain this query step by step and what question it answers: [paste PromQL].”
“How do I calculate error rate per endpoint for [service.name]?”

Agent0 explains queries in plain language, connects them to real operational questions, and suggests safe modifications. This lowers the learning curve and reduces copy-paste mistakes.
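
As a rough illustration of what the first and third prompts above might translate to, the sketch below wraps two common PromQL patterns in small Python helpers so the placeholders can be filled in. The metric names (`http_server_request_duration_seconds_bucket`, `http_server_requests_total`) and label names (`service_namespace`, `service_name`, `http_route`, `status_class`) are assumptions for illustration; the names in your environment depend on your instrumentation, and the queries Agent0 produces may look different.

```python
# Hypothetical metric names; adjust to whatever your instrumentation emits.
DURATION_BUCKETS = "http_server_request_duration_seconds_bucket"
REQUESTS_TOTAL = "http_server_requests_total"


def p95_latency_by_service(namespace: str, window: str = "5m") -> str:
    """PromQL for p95 latency per service, estimated from histogram buckets."""
    selector = f'{DURATION_BUCKETS}{{service_namespace="{namespace}"}}'
    return (
        "histogram_quantile(0.95, "
        f"sum by (service_name, le) (rate({selector}[{window}])))"
    )


def error_rate_by_endpoint(service: str, window: str = "5m") -> str:
    """PromQL for the fraction of requests per route that ended in a 5xx."""
    errors = f'{REQUESTS_TOTAL}{{service_name="{service}", status_class="5xx"}}'
    total = f'{REQUESTS_TOTAL}{{service_name="{service}"}}'
    return (
        f"sum by (http_route) (rate({errors}[{window}]))"
        f" / sum by (http_route) (rate({total}[{window}]))"
    )


print(p95_latency_by_service("shop"))
print(error_rate_by_endpoint("checkout"))
```

Both queries aggregate with `sum by (...)` before dividing or taking the quantile, and the latency query keeps the `le` label because `histogram_quantile` needs the bucket boundaries. Note that a route with no errors in the window produces no series on the left-hand side of the division, so it drops out of the error-rate result rather than showing as zero.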

6) Large Traces and Performance Analysis

Best for: slow requests, intermittent failures

Agent involved: The Threadweaver

Large traces are hard to reason about manually. With Agent0, you can turn a single trace into a broader performance investigation.

Example prompts:

“Summarize this trace.”
“Why is this request slow?”
“What’s the critical path here?”

Agent0 analyzes the trace, compares it against similar traces in the same time window, flags anomalies, and highlights bottlenecks. One trace becomes a doorway to understanding many.
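
For readers unfamiliar with the term, the critical path is the chain of spans that determines when the request finishes. The sketch below shows a deliberately simplified version of the idea in Python: starting from the root span, follow the child that finishes last at each level. It is not how Agent0 analyzes traces, and a full critical-path analysis also has to account for children that run one after another; the `Span` type and the span names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical, minimal span: a name, start/end times, and child spans."""
    name: str
    start: float
    end: float
    children: list["Span"] = field(default_factory=list)


def critical_path(span: Span) -> list[str]:
    """Follow the last-finishing child at each level, starting from the root."""
    path = [span.name]
    while span.children:
        span = max(span.children, key=lambda child: child.end)
        path.append(span.name)
    return path


# Illustrative trace: times are milliseconds since the request started.
root = Span("GET /checkout", 0, 100, [
    Span("auth", 0, 10),
    Span("charge", 10, 95, [
        Span("db", 12, 30),
        Span("payment-api", 30, 94),
    ]),
])
print(critical_path(root))
```

In this made-up trace the path runs through `charge` and `payment-api`, which is where optimization effort would pay off; speeding up `auth` or `db` alone would not make the request finish sooner.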

7) Onboarding a New Service

Best for: new services, migrations, architecture changes

Agent involved: The Pathfinder

When adding a new service, Agent0 helps you instrument it correctly from the start.

Example prompts:

“How should I instrument [service.name] with OpenTelemetry?”
“What telemetry should I expect once [service.name] is set up?”
“What signals are missing for effective debugging in [service.name]?”

Agent0 guides instrumentation choices, naming conventions, and validation steps so teams gain confidence early.

8) Prioritization, Summaries, and Decision Support

Best for: leads, managers, daily operations

Agent involved: The Seeker

Agent0 is also useful for answering higher-level questions about what to do next.

Example prompts:

“What’s the biggest reliability risk right now in [service.namespace]?”
“What should the team focus on improving this week?”
“Summarize what I missed today in [environment/workspace].”

These prompts turn telemetry into decisions, not just insights.

Conclusion

Agent0 is designed to meet teams where they are. Whether you start from an alert, a trace, a dashboard, or a blank environment, Agent0 uses the context you already have and guides you step by step toward understanding and action, turning observability data into outcomes.

A Practical Guide to Agentic Observability in Dash0