Last updated: January 17, 2026

Log Levels Explained: A Better Strategy with OpenTelemetry

Few parts of logging generate more confusion and debate than deciding how to use severity levels.

Some folks advocate for simplifying things by sticking to INFO and ERROR, but logging needs are rarely that binary.

Most events live somewhere between "everything is fine" and "wake someone up now", and collapsing those hides important clues during diagnosis and triage.

A fuller set of levels gives you space to express what is routine, what is unusual, and what is genuinely broken. It also helps keep verbosity under control, so you can raise or lower detail intentionally instead of logging everything all the time.

This article will break down how log levels fit into a modern logging strategy, how OpenTelemetry standardizes them across your entire infrastructure, and how to avoid the common mistakes that undermine their value.

Why legacy severity models fail in distributed systems

Most of the log levels in use today trace back to Syslog, which introduced a standardized set of severities in Unix systems decades ago. That model shaped how logging evolved across languages and platforms, and that's why terms like DEBUG, INFO, WARN, and ERROR feel so familiar.

In practice, though, log levels have never been truly standardized. Different libraries interpret them differently, some frameworks omit levels entirely, and teams often redefine their meaning over time.

What one service logs as WARN, another might log as ERROR. Some systems collapse levels, others invent custom ones, and many logs arrive without any explicit severity at all.

Inconsistent log levels lead to severe confusion

This inconsistency becomes a real problem in distributed systems. When logs from dozens of services, languages, and runtimes are aggregated, severity labels alone are no longer reliable. Filtering, alerting, and analysis all depend on shared semantics, and ad hoc conventions do not scale.

OpenTelemetry addresses this by defining a common severity model that goes beyond textual labels. Alongside a severity name, each log entry carries a normalized numeric value that represents its relative importance. This makes it possible to compare, filter, and alert on logs consistently, regardless of where they originated or which logging library produced them.

With a shared standard in place, severity levels stop being loose conventions and become a dependable signal across the entire system.

Let's start by examining the most common levels and how they should be used in practice.

Level | When to use | Action required
TRACE | Granular execution flow (loop steps, large payloads). | None
DEBUG | Diagnostic state for specific contexts. | None
INFO | A standard business or operational event occurred. | None
WARN | System is functioning, but with degradation risk. | Create ticket
ERROR | A request or operation failed, but the application is still running. | Alert on spikes/trends
FATAL | The application cannot start or must terminate immediately. | Alert immediately

TRACE and DEBUG: Detailed information for troubleshooting

TRACE and DEBUG logs provide detailed diagnostic information that's most useful during development and troubleshooting.

They expose internal application logic, decision paths, variable values, and other state details that help you understand how the system is functioning and interacting with other systems.

These levels are usually disabled in production because of their verbosity and performance impact, but they remain essential when troubleshooting difficult issues.

While TRACE is nominally more fine-grained than DEBUG, many libraries and teams treat the two levels interchangeably, effectively collapsing the distinction.

When to use TRACE and DEBUG

Use TRACE or DEBUG when the goal is to understand detailed execution paths or internal system behavior, for example:

  • During local development to understand code flow without using a debugger.
  • Logging entry and exit of functions along with arguments.
  • Capturing variable contents, payloads, or intermediate values.
  • Profiling algorithms with step-by-step state changes.
  • Logging interactions across services during integration testing.
json
{"level":"TRACE","msg":"enter processOrderRequest","order_id":"ORD-28734"}
{"level":"TRACE","msg":"validating inventory item","item_id":"A123","stock":12}
{"level":"TRACE","msg":"validating inventory item","item_id":"B456","stock":5}
{"level":"DEBUG","msg":"attempting database connection","retries":3,"timeout_ms":5000}
{"level":"DEBUG","msg":"user record retrieved","user_id":45678,"status":"active"}
{"level":"DEBUG","msg":"shutdown signal received","signal":"SIGTERM","waiting_for_active_connections":true}

INFO: A narrative of normal operations

INFO logs record routine operational events in a service. They describe what the system is doing under normal conditions and often correspond to meaningful business-level actions.

It's typically the default level in most logging frameworks because it captures normal system behavior without introducing unnecessary detail.

When to use INFO

Use INFO when capturing meaningful but non-problematic behavior, such as:

  • Successful API or service operations.
  • Incoming service requests.
  • Configuration or state changes.
  • Maintenance tasks or scheduled jobs.
json
{"level":"INFO","msg":"database connection established","database":"users_db"}
{"level":"INFO","msg":"order created successfully","order_id":"ORD-28734","customer_id":12345,"items_count":3,"total":129.99}
{"level":"INFO","msg":"payment processed","payment_id":"PMT-9876","amount":129.99,"method":"credit_card"}
{"level":"INFO","msg":"configuration updated","environment":"production","updated_keys":["session_timeout","max_connections"]}
{"level":"INFO","msg":"scheduled database backup completed","success":true,"size_mb":1254,"location":"s3://backups/db-20250512.bak"}
{"level":"INFO","msg":"server shutdown initiated","mode":"graceful"}

WARN: Potential issues requiring attention

WARN logs occupy a specific niche: they describe state that is technically valid but practically undesirable. The golden rule for WARN is actionability. If a developer sees this log, is there a specific task they should eventually perform? If the answer is "no, that just happens sometimes", it belongs in INFO (or DEBUG).

A common mistake is logging every automatic retry as a WARN. In production systems, transient network blips are normal. If your system successfully retries and recovers, that is standard behavior, so log it as INFO.

WARN should be reserved for when you are nearing a failure threshold or when a fallback mechanism with lower fidelity is triggered.

When to use WARN

Use WARN when an event requires eventual human review but no immediate intervention:

  • Resource usage nearing critical thresholds.
  • Deprecated APIs or features being exercised.
  • Misconfigurations or fallback behavior that still allow the system to function.
  • Slower-than-expected behavior from dependencies.
json
{"level":"WARN","msg":"database connection pool nearing capacity","current":85,"max":100}
{"level":"WARN","msg":"slow database query detected","duration_ms":3250,"query":"SELECT * FROM orders WHERE created_at > ?"}
{"level":"WARN","msg":"deprecated API endpoint used","endpoint":"/api/v1/users"}
{"level":"WARN","msg":"disk space running low","usage_percent":85,"threshold":80,"path":"/var/log"}
{"level":"WARN","msg":"API rate limit threshold approaching","client_id":"45AC78B3","current_rate":178,"limit":200,"window":"1m"}

ERROR: Problems impacting functionality

ERROR logs indicate failures that prevent an operation from completing as expected. The service is still running, but a specific action has failed.

A single error is not always cause for alarm, but repeated failures in a short period often indicate deeper issues that warrant investigation.

Handled or recoverable failures that eventually succeed, such as retries or fallback paths, are usually better logged as INFO. ERROR should be reserved for cases where the operation ultimately fails.

Effective error logs should include enough context to explain what was attempted, what failed, and why. Without that context, errors are difficult to diagnose and easy to misinterpret.
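
To make this concrete, here's a minimal Python sketch (the gateway endpoint, field names, and a structured JSON log formatter are assumptions for illustration): transient retry attempts are logged at INFO, and ERROR is reserved for the case where the operation ultimately fails, with enough context attached to diagnose it.
python
import logging
import time

import requests  # third-party HTTP client, used here purely for illustration

logger = logging.getLogger("app.payments")

def charge(payment_id: str, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        try:
            return requests.post(
                "https://gateway.example/charge",  # hypothetical endpoint
                json={"payment_id": payment_id},
                timeout=5,
            )
        except requests.RequestException as exc:
            if attempt < max_attempts:
                # Transient failure we expect to recover from: routine, not an ERROR.
                logger.info(
                    "payment attempt failed, retrying",
                    extra={"payment_id": payment_id, "attempt": attempt, "error": str(exc)},
                )
                time.sleep(2 ** attempt)
            else:
                # The operation ultimately failed: log it once, with context.
                logger.error(
                    "payment processing failed after retries",
                    extra={"payment_id": payment_id, "attempts": attempt, "error": str(exc)},
                )
                raise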

When to use ERROR

Use ERROR when an operation fails or cannot be completed, for example:

  • External or internal dependencies fail.
  • Operations cannot be completed.
  • Requests time out or permissions are denied.
json
{"level":"ERROR","msg":"payment processing failed","user_id":45678,"amount":129.99,"error_code":"GATEWAY_TIMEOUT"}
{"level":"ERROR","msg":"notification email delivery failed","user_id":45678,"error":"SMTP connection refused"}
{"level":"ERROR","msg":"database query failed","query":"UPDATE users SET status = ? WHERE id = ?","error":"lock wait timeout exceeded"}
{"level":"ERROR","msg":"file upload could not be processed","filename":"report.xlsx","reason":"invalid or corrupt format"}

FATAL: When your application can't continue

FATAL logs represent unrecoverable conditions that cause the application to terminate by design. They exist to document why a process exited, not to prevent it from doing so.

In containerized and serverless environments, recovery is handled by the platform. When the application reaches a state it cannot safely continue from, it should exit immediately and allow the orchestrator to restart it.

FATAL logs are useful for explaining these intentional crashes, such as startup failures or invariant violations, where understanding the exit reason matters for subsequent diagnosis.
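
As a rough sketch of that pattern in Python (the environment variable and helper function are hypothetical), a startup check records why the process is exiting and then terminates, leaving recovery to the orchestrator:
python
import logging
import os
import sys

logger = logging.getLogger("startup")

def load_encryption_key():
    # Hypothetical helper: reads key material from an environment variable.
    return os.environ.get("APP_ENCRYPTION_KEY")

if load_encryption_key() is None:
    # CRITICAL is Python's closest equivalent to FATAL: document the reason for
    # the exit, then terminate and let the platform restart the process.
    logger.critical("required encryption keys missing, exiting")
    sys.exit(1)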

When to use FATAL

Use FATAL when the system is in a non-recoverable state, for example:

  • Required configuration, secrets, or encryption keys are missing at startup.
  • A security invariant is violated and continued execution would be unsafe.
  • Data corruption or integrity checks fail.
  • The application detects a state it was explicitly designed to never recover from.
json
{"level":"FATAL","msg":"required encryption keys missing, exiting"}
{"level":"FATAL","msg":"database unavailable after startup checks, exiting","retries":10}
{"level":"FATAL","msg":"data integrity check failed, exiting","check":"user_index_consistency"}

How OpenTelemetry standardizes log levels

Traditionally, log severity has been represented almost entirely through textual labels. Whether a log is marked as WARN, ERROR, or CRITICAL depends on the logging library, the language, and often the preferences of the team that wrote the code.

However, once these logs are aggregated across services, those labels alone are not enough to reason about severity consistently.

OpenTelemetry addresses this by defining a severity model that separates what a log was called from how severe it actually is. Its log data model introduces two complementary fields to describe log severity:

  1. SeverityText: A textual label such as INFO, ERROR, or WARN as emitted by the source logging framework. Because this value originates at the source, it carries no standardized meaning on its own. Different libraries and teams may use the same label to mean very different things, so OpenTelemetry treats it primarily as informational.

  2. SeverityNumber: A numeric value between 1 and 24 that represents the relative severity of the log record. This is the authoritative signal used by OpenTelemetry-aware tooling for filtering, comparison, and alerting. By relying on numbers rather than names, severity becomes consistent across languages, runtimes, and logging frameworks.

Each severity category maps to a defined numeric range:

SeverityNumber Range | Category
1-4 | TRACE
5-8 | DEBUG
9-12 | INFO
13-16 | WARN
17-20 | ERROR
21-24 | FATAL

Within a category, higher numbers represent greater severity. For example, an ERROR log with a SeverityNumber of 18 signals a more serious condition than one with a value of 17. This allows OpenTelemetry-native tools to order, threshold, and compare log records with more precision than name-based levels alone.
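
To make the numeric model concrete, here's a small Python sketch (the record dictionaries are illustrative) that maps a SeverityNumber back to its category using the ranges above and filters for "error or worse":
python
# Lower bound of each category, taken from the SeverityNumber ranges above.
SEVERITY_FLOOR = {"TRACE": 1, "DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17, "FATAL": 21}

def category(severity_number: int) -> str:
    """Map a SeverityNumber (1-24) to its category name."""
    if not 1 <= severity_number <= 24:
        raise ValueError("SeverityNumber must be between 1 and 24")
    for name, floor in reversed(list(SEVERITY_FLOOR.items())):
        if severity_number >= floor:
            return name

# Filtering operates on numbers, not labels, so it works across log sources.
records = [
    {"severity_text": "ERROR", "severity_number": 17, "body": "payment processing failed"},
    {"severity_text": "Warning", "severity_number": 14, "body": "disk space running low"},
    {"severity_text": "FATAL", "severity_number": 21, "body": "data integrity check failed"},
]
errors_or_worse = [r for r in records if r["severity_number"] >= 17]

print(category(18))          # "ERROR"
print(len(errors_or_worse))  # 2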

Compatibility with existing logging systems

This severity model is designed to interoperate with existing logging standards rather than replace them. Legacy systems like Syslog already define a hierarchy of severities, and OpenTelemetry provides a clear mapping between those levels and its own severity ranges:

Syslog | OpenTelemetry | SeverityNumber
Emergency | FATAL | 24
Alert | FATAL | 23
Critical | FATAL | 21
Error | ERROR | 19
Warning | WARN | 14
Notice | INFO | 12
Informational | INFO | 9
Debug | DEBUG | 5

Rather than relying on name matching, this mapping prioritizes semantic equivalence. For example, both Notice and Informational map to INFO, but Notice is assigned a higher severity number to reflect its greater operational significance. Likewise, Emergency, Alert, and Critical are all treated as FATAL, while still preserving their relative urgency through distinct numeric values.

Once severity is normalized across your entire infrastructure, log analysis and interpretation becomes far simpler and more reliable. You no longer need to guess what a given label means or maintain per-library translation rules. Instead, you can operate directly on severity numbers.

In observability tools, this enables precise queries and alerts that work across all log sources. For example, to view every log that represents an error or worse, regardless of which service or framework produced it, you can filter using:

text
SeverityNumber >= 17

This approach turns log levels from loose conventions into a dependable signal that scales with cloud-native and distributed systems.

Mapping existing severity labels to OpenTelemetry

Log levels can be standardized with OpenTelemetry

In an ideal world, every log record would arrive in your observability pipelines with the correctly populated severity fields. In practice, however, most systems start from existing logs that were never designed with distributed systems in mind.

OpenTelemetry was explicitly designed to accommodate this reality. When logs are ingested through OpenTelemetry SDKs or log bridges, common log levels are automatically mapped to appropriate SeverityNumber values based on their semantic meaning to ensure consistency and preserve severity information as logs move downstream.

However, not all ingestion paths provide enough information to do this automatically. When logs are collected from files, streams, or external systems using generic receivers, severity information is often missing, inconsistent, or ambiguous.

In those cases, an explicit mapping step becomes necessary in the OpenTelemetry Collector. Depending on how logs are produced and collected, this may involve parsing severity out of raw log lines, translating framework- or team-specific labels into OpenTelemetry's severity ranges, or assigning a sensible default when no level is present at all.
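
The exact mechanism depends on your pipeline (in the Collector, this is typically handled by processors such as the transform processor), but the underlying logic is straightforward. Here's an illustrative Python sketch of that normalization step; the label set, field names, and the INFO default are assumptions rather than anything mandated by the specification:
python
# Illustrative mapping from common source labels to OpenTelemetry severity.
# The numeric values are example choices within each category's range.
LABEL_TO_OTEL = {
    "trace": ("TRACE", 1),
    "debug": ("DEBUG", 5),
    "info": ("INFO", 9),
    "notice": ("INFO", 12),
    "warn": ("WARN", 13),
    "warning": ("WARN", 13),
    "error": ("ERROR", 17),
    "critical": ("FATAL", 21),
    "fatal": ("FATAL", 21),
}

def normalize_severity(record: dict) -> dict:
    """Attach SeverityText/SeverityNumber equivalents to a parsed log record."""
    raw = str(record.get("level", "")).strip().lower()
    # Defaulting unlabeled logs to INFO is a policy choice, not part of the spec.
    text, number = LABEL_TO_OTEL.get(raw, ("INFO", 9))
    record["severity_text"] = text
    record["severity_number"] = number
    return record

print(normalize_severity({"level": "Warning", "msg": "disk space running low"}))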

The goal is not to rewrite all existing logging overnight, but to progressively align with OpenTelemetry's model so that your logs become a reliable signal across the entire system.

Once this alignment is in place, filtering, alerting, and analysis no longer depend on the quirks of individual libraries or teams. Instead, they operate on a shared definition of what each log level actually means.

Embracing dynamic log level control

Traditional approaches to log levels often create a frustrating paradox: either you log too little and miss critical clues during an incident, or you log too much and suffer performance hits or ballooning observability costs.

There is such a thing as too much logging

In many systems, log levels are treated as static configuration. They're set at deploy time and rarely changed without a full redeploy. This rigidity slows down troubleshooting and limits visibility precisely when it is needed most.

Modern observability practices address this with dynamic log level control, which allows verbosity to be adjusted at runtime without redeploying or restarting services. Instead of turning on DEBUG logging everywhere, you can temporarily increase detail only where it is needed.

This makes it possible to:

  • Raise log verbosity for a single service or component.
  • Scope detailed logging to a specific request, user, or transaction.
  • Capture additional context during a live incident, then revert to normal levels once the issue is understood.
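
For example, with Python's standard logging module, a single component's verbosity can be changed at runtime without a restart; how the change is triggered (an admin endpoint, a signal handler, a configuration watcher) is left open in this sketch:
python
import logging

def set_log_level(logger_name: str, level_name: str) -> None:
    """Adjust verbosity for one named logger at runtime, without a redeploy."""
    # getLevelName() returns the numeric level when given a name, e.g. "DEBUG" -> 10.
    level = logging.getLevelName(level_name.upper())
    logging.getLogger(logger_name).setLevel(level)

# During an incident: raise verbosity only for the checkout component...
set_log_level("app.checkout", "DEBUG")

# ...and revert to the steady-state level once the issue is understood.
set_log_level("app.checkout", "INFO")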

Some logging frameworks also support writing detailed logs continuously into an in-memory ring buffer. Under normal conditions, the buffer simply overwrites itself and nothing is persisted. If an ERROR or FATAL condition occurs, the buffer is flushed alongside the error log, preserving the context leading up to the failure without incurring ongoing volume or cost.

In-memory ring buffer can save logging costs without reducing context
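
Python's standard library offers an approximation of this pattern with logging.handlers.MemoryHandler, which buffers records in memory and flushes them once a record at or above a chosen level arrives (strictly speaking, it also flushes when the buffer fills rather than overwriting older entries like a true ring buffer):
python
import logging
from logging.handlers import MemoryHandler

# The target handler is whatever actually persists or ships logs; the console here for brevity.
target = logging.StreamHandler()

# Hold up to 1000 records in memory and flush them only when an ERROR (or worse) arrives.
buffered = MemoryHandler(capacity=1000, flushLevel=logging.ERROR, target=target)

logger = logging.getLogger("app.checkout")
logger.setLevel(logging.DEBUG)
logger.addHandler(buffered)

logger.debug("cart validated")            # held in memory, not persisted
logger.debug("inventory reserved")        # held in memory, not persisted
logger.error("payment gateway timeout")   # flushes the buffered DEBUG context too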

By limiting both the scope and lifetime of verbose logging, teams can capture rich diagnostic detail when it matters most, while keeping the steady-state logging path lean and predictable.

Making log levels actionable

On their own, log levels are just severity labels attached to log entries. They become far more useful when your observability tools act on them for filtering, visual prioritization, and alerting.

At Dash0, severity levels are presented as a clear visual hierarchy that helps guide attention at a glance. This follows our deliberate data-in-color philosophy:

  • Red highlights error and fatal logs that demand immediate attention.
  • Yellow surfaces warnings that may indicate emerging issues.
  • Gray keeps informational and debug logs in the background unless you intentionally drill into them.

Log levels in Dash0 showing data in color philosophy

Log levels also play a central role in alerting. While FATAL events usually warrant immediate notification, many real-world problems emerge gradually. Spikes in ERROR logs or sustained increases in WARN entries often signal a degrading service well before it reaches a hard failure. Alerting on these patterns is usually more effective than reacting to individual events.

Context is what makes this actionable in practice. A DEBUG or INFO log is rarely useful in isolation, but becomes valuable when it can be tied to the same request as an ERROR that occurred moments later in another service. OpenTelemetry enables this by propagating trace context alongside logs, allowing logs to be grouped and explored by request instead of as disconnected events.

This makes it possible to start from an error, identify the affected traces, and inspect related DEBUG or TRACE logs only for those requests. Instead of increasing verbosity globally, you get depth exactly where it is needed.

Final thoughts

Log levels are a small but important part of a sound logging strategy. Used well, they convey intent, control verbosity, and make system behavior easier to understand without overwhelming engineers with noise.

OpenTelemetry reinforces this by standardizing severity through SeverityText and SeverityNumber, making log levels consistent across languages, frameworks, and services and enabling reliable analysis at scale.

Log levels are not the primary value of a log entry, but when used consistently, they help separate routine behavior from real issues and surface detailed context only when it's genuinely needed.

Collapsing everything into "normal" and "error" buckets discards that signal without providing any real benefit.

Thanks for reading!

Authors
Ayooluwa Isaiah