Tracing and Observability

DCAF provides built-in support for distributed tracing and observability, allowing you to track requests through the entire agent execution pipeline from your application to the LLM provider.


Overview

Tracing in DCAF is built around four key identifiers that flow through the system:

| Field | Purpose | Example |
|-------|---------|---------|
| user_id | Identifies the user making requests | "user-123" |
| session_id | Groups related runs into a session | "session-abc" |
| run_id | Unique identifier for a single execution | "run-xyz" |
| request_id | HTTP request correlation ID | "req-456" |

These identifiers are:

- Passed to the LLM provider (Agno SDK) for end-to-end tracing
- Included in response metadata for correlation
- Available in logs throughout the execution pipeline
- Compatible with OpenTelemetry and other observability platforms
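To make the shape of these identifiers concrete, here is a minimal, self-contained sketch. `TracingIds` is a hypothetical stand-in written for illustration, not a DCAF class. It only shows how the four fields can travel together and how unset values can be dropped before they are attached to logs or metadata.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class TracingIds:
    """Hypothetical container for the four tracing identifiers (not a DCAF class)."""

    user_id: Optional[str] = None
    session_id: Optional[str] = None
    run_id: Optional[str] = None
    request_id: Optional[str] = None

    def as_metadata(self) -> dict:
        """Return only the identifiers that were actually set."""
        return {key: value for key, value in asdict(self).items() if value is not None}


ids = TracingIds(user_id="user-123", session_id="session-abc", run_id="run-xyz")
print(ids.as_metadata())
```

Dropping unset fields keeps log entries and metadata free of `None` placeholders, so a missing identifier is visibly absent instead of looking like a real value.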


Quick Start

Option 1: Via AgentRequest

The simplest way to add tracing is through the AgentRequest:

from dcaf.core.application.dto import AgentRequest

request = AgentRequest(
    content="What pods are running?",
    tools=[kubectl_tool],
    # Tracing fields
    user_id="user-123",
    session_id="session-abc",
    run_id="run-xyz",
    request_id="req-456",
)

response = await agent_service.execute(request)

# Tracing IDs are returned in response metadata
print(response.metadata)
# {'run_id': 'run-xyz', 'session_id': 'session-abc', 'user_id': 'user-123', 'request_id': 'req-456'}

Option 2: Via PlatformContext

For more control, use the PlatformContext value object:

from dcaf.core.domain.value_objects import PlatformContext

# Create context with tracing
context = PlatformContext(
    tenant_id="tenant-1",
    tenant_name="acme-corp",
    user_id="user-123",
    session_id="session-abc",
    run_id="run-xyz",
    request_id="req-456",
)

# Or add tracing to an existing context
context = PlatformContext.from_dict({"tenant_id": "tenant-1"})
context = context.with_tracing(
    user_id="user-123",
    session_id="session-abc",
    run_id="run-xyz",
)

# Pass as dict in request
request = AgentRequest(
    content="Deploy my app",
    context=context.to_dict(),
    tools=[deploy_tool],
)

Tracing Fields

user_id

Identifies the user making the request. Use this for:

- User-level analytics and quotas
- Audit trails showing who performed actions
- Personalization and context

request = AgentRequest(
    content="Delete the old pods",
    user_id="alice@company.com",  # Or user ID from your auth system
    tools=[kubectl_tool],
)

session_id

Groups related runs into a logical session. Use this for:

- Conversation continuity tracking
- Session-level analytics
- Grouping related agent interactions

# Generate session ID at the start of a conversation
import uuid
session_id = f"session-{uuid.uuid4()}"

# Use it for all requests in the conversation
request1 = AgentRequest(content="What's running?", session_id=session_id, ...)
request2 = AgentRequest(content="Delete pod-1", session_id=session_id, ...)

run_id

Unique identifier for a single agent execution. Use this for:

- Correlating logs across services
- Debugging specific executions
- Linking to external tracing systems

import uuid

request = AgentRequest(
    content="Scale deployment to 5 replicas",
    run_id=f"run-{uuid.uuid4()}",
    tools=[scale_tool],
)
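The session_id and run_id fields have different lifetimes: one value per conversation versus one value per execution. The sketch below illustrates that relationship with a hypothetical `Conversation` helper. It is not part of DCAF, and the `session-` and `run-` prefixes simply follow the examples above.

```python
import uuid


class Conversation:
    """Hypothetical helper: one session_id per conversation, a fresh run_id per turn."""

    def __init__(self) -> None:
        # Created once and reused for every request in the conversation.
        self.session_id = f"session-{uuid.uuid4()}"

    def next_turn_ids(self) -> dict:
        # A new run_id is generated for each agent execution.
        return {"session_id": self.session_id, "run_id": f"run-{uuid.uuid4()}"}


conversation = Conversation()
first = conversation.next_turn_ids()
second = conversation.next_turn_ids()
print(first["session_id"] == second["session_id"], first["run_id"] != second["run_id"])
```

The returned dictionary could be unpacked into a request (for example `AgentRequest(content=..., **ids)`), assuming the field names match those shown in this guide.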

request_id

HTTP request correlation ID. Use this for:

- End-to-end request tracing
- Correlating with API gateway logs
- Debugging request flows

# Typically passed from your HTTP framework
import uuid

from fastapi import Request

@app.post("/chat")
async def chat(request: Request, body: ChatRequest):
    request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))

    agent_request = AgentRequest(
        content=body.message,
        request_id=request_id,
        tools=[...],
    )
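The header lookup and fallback in the handler above can be factored into a small, framework-independent helper. The sketch below is one possible version. `resolve_request_id` is a hypothetical name, and treating a blank header as missing is an assumption of this sketch, not documented DCAF behaviour.

```python
import uuid
from typing import Mapping


def resolve_request_id(headers: Mapping[str, str], header_name: str = "X-Request-ID") -> str:
    """Return the caller-supplied correlation ID, or generate one if it is missing or blank."""
    wanted = header_name.lower()
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == wanted and value.strip():
            return value.strip()
    return str(uuid.uuid4())


print(resolve_request_id({"x-request-id": "req-456"}))
```

Framework header objects are usually case-insensitive already; the explicit lowercase comparison only matters when the headers arrive as a plain dictionary, for example in tests.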

How Tracing Flows Through the System

┌─────────────────────┐
│   HTTP Request      │  X-Request-ID, user from JWT, etc.
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   AgentRequest      │  user_id, session_id, run_id, request_id
│   + PlatformContext │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   AgentService      │  Logs tracing context
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   AgnoAdapter       │  Passes to Agno SDK:
│                     │  - run_id → agno_agent.arun(run_id=...)
│                     │  - session_id → agno_agent.arun(session_id=...)
│                     │  - user_id → agno_agent.arun(user_id=...)
│                     │  - metadata → agno_agent.arun(metadata={...})
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Agno SDK          │  Native tracing support
│   (LLM Provider)    │  Integrates with observability platforms
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   AgentResponse     │  metadata contains tracing IDs
└─────────────────────┘
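The flow in the diagram can be approximated with a few plain functions. This is a simplified sketch, not DCAF's implementation: `FakeAgent`, `adapter_invoke`, and `service_execute` are hypothetical stand-ins, and sending request_id inside `metadata` is an assumption made here because the diagram shows no dedicated parameter for it.

```python
import asyncio


class FakeAgent:
    """Stand-in for the SDK agent; it only records the keyword arguments it receives."""

    def __init__(self) -> None:
        self.received = {}

    async def arun(self, content, *, run_id=None, session_id=None, user_id=None, metadata=None):
        self.received = {
            "run_id": run_id,
            "session_id": session_id,
            "user_id": user_id,
            "metadata": metadata or {},
        }
        return f"echo: {content}"


async def adapter_invoke(agent, content, tracing):
    """Adapter layer: maps the tracing IDs onto the agent call."""
    # Assumption of this sketch: request_id has no dedicated parameter, so it rides in metadata.
    metadata = {"request_id": tracing["request_id"]} if tracing.get("request_id") else {}
    return await agent.arun(
        content,
        run_id=tracing.get("run_id"),
        session_id=tracing.get("session_id"),
        user_id=tracing.get("user_id"),
        metadata=metadata,
    )


async def service_execute(agent, content, tracing):
    """Service layer: runs the adapter and echoes the IDs back in response metadata."""
    text = await adapter_invoke(agent, content, tracing)
    return {"text": text, "metadata": {k: v for k, v in tracing.items() if v is not None}}


agent = FakeAgent()
tracing = {
    "user_id": "user-123",
    "session_id": "session-abc",
    "run_id": "run-xyz",
    "request_id": "req-456",
}
response = asyncio.run(service_execute(agent, "What pods are running?", tracing))
print(response["metadata"])
```

The point of the sketch is that every layer forwards the same values unchanged, so the IDs seen by the SDK and the IDs returned to the caller always match.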

Accessing Tracing Context

In Response Metadata

After execution, tracing IDs are available in the response:

response = await agent_service.execute(request)

# Access tracing metadata
run_id = response.metadata.get("run_id")
session_id = response.metadata.get("session_id")
user_id = response.metadata.get("user_id")
request_id = response.metadata.get("request_id")
tenant_id = response.metadata.get("tenant_id")

In PlatformContext

The PlatformContext provides a helper method to extract only tracing fields:

context = PlatformContext.from_dict(request.context or {})

# Get only tracing fields (safe to log, no sensitive data)
tracing = context.get_tracing_dict()
# {'user_id': 'user-123', 'session_id': 'session-abc', ...}

logger.info("Processing request", extra=tracing)
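With the standard library logger, fields passed through `extra` become attributes on the log record, but they only appear in the output if the formatter references them. The sketch below shows this using only the standard library. The `tracing` dictionary is sample data standing in for the result of `get_tracing_dict()`, and the logger name `tracing_demo` is arbitrary.

```python
import io
import logging

# Sample data standing in for context.get_tracing_dict()
tracing = {"user_id": "user-123", "session_id": "session-abc", "run_id": "run-xyz"}

stream = io.StringIO()
handler = logging.StreamHandler(stream)
# Fields passed via `extra` become record attributes, so the format string can name them.
handler.setFormatter(
    logging.Formatter("%(levelname)s %(message)s run_id=%(run_id)s user_id=%(user_id)s")
)

logger = logging.getLogger("tracing_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("Processing request", extra=tracing)
output = stream.getvalue()
print(output, end="")
```

A formatter that names `%(run_id)s` fails to format records logged without that field, so this pattern works best on a dedicated logger, or with a structured logging library that handles missing keys.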

Integration with Agno Debug Mode

DCAF automatically syncs Agno's debug logging with Python's logging level:

# Enable Agno debug mode (verbose tracing)
LOG_LEVEL=DEBUG python your_agent.py

# Or set AGNO_DEBUG directly
AGNO_DEBUG=true python your_agent.py

When debug mode is enabled, you'll see:

- Detailed message flow logging
- Tool call parameters and results
- Agno SDK internal operations
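The two switches above can be read as a single rule: debug mode is on if AGNO_DEBUG is truthy or LOG_LEVEL is DEBUG. The sketch below expresses that rule. `debug_enabled` is a hypothetical function, and the accepted truthy spellings ("1", "true", "yes") are an assumption of this sketch, not confirmed DCAF behaviour.

```python
import os


def debug_enabled(env=None) -> bool:
    """Return True if either environment switch asks for debug output."""
    env = os.environ if env is None else env
    # Assumed truthy spellings; DCAF may accept a different set.
    if env.get("AGNO_DEBUG", "false").strip().lower() in {"1", "true", "yes"}:
        return True
    return env.get("LOG_LEVEL", "INFO").strip().upper() == "DEBUG"


print(debug_enabled({"LOG_LEVEL": "DEBUG"}))
```

Accepting a mapping as an argument keeps the function easy to test without modifying the real process environment.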


Integration with OpenTelemetry

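Agno supports OpenTelemetry for distributed tracing. To enable it, install the instrumentation, SDK, and exporter packages:

pip install openinference-instrumentation-agno opentelemetry-sdk opentelemetry-exporter-otlp-proto-http

Then configure a tracer provider and initialize the instrumentation before running your agent:

from openinference.instrumentation.agno import AgnoInstrumentor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Configure OpenTelemetry
provider = TracerProvider()
exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# Initialize instrumentation
AgnoInstrumentor().instrument()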

# Your DCAF code will now emit traces
response = await agent_service.execute(request)

With OpenTelemetry enabled, each agent execution creates spans that include:

- The tracing IDs you provided (run_id, session_id, etc.)
- Tool call durations and parameters
- LLM request/response timing
- Token usage metrics


Logging Best Practices

Structured Logging with Tracing

import uuid

import structlog

from dcaf.core.application.dto import AgentRequest

# Configure structured logging
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

logger = structlog.get_logger()

# Include tracing in all log entries
async def handle_chat(user_id: str, session_id: str, message: str):
    run_id = f"run-{uuid.uuid4()}"

    # Bind tracing context to logger
    log = logger.bind(
        user_id=user_id,
        session_id=session_id,
        run_id=run_id,
    )

    log.info("Starting agent execution")

    request = AgentRequest(
        content=message,
        user_id=user_id,
        session_id=session_id,
        run_id=run_id,
        tools=[...],
    )

    response = await agent_service.execute(request)

    log.info("Agent execution complete",
             has_pending=response.has_pending_approvals)

    return response

Log Output Example

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "info",
  "event": "Starting agent execution",
  "user_id": "user-123",
  "session_id": "session-abc",
  "run_id": "run-xyz"
}
{
  "timestamp": "2024-01-15T10:30:02Z",
  "level": "info",
  "event": "Agno: Tracing context",
  "run_id": "run-xyz",
  "session_id": "session-abc",
  "user_id": "user-123"
}
{
  "timestamp": "2024-01-15T10:30:05Z",
  "level": "info",
  "event": "Agent execution complete",
  "user_id": "user-123",
  "session_id": "session-abc",
  "run_id": "run-xyz",
  "has_pending": false
}
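Because every entry carries the same run_id, the log stream can be filtered down to a single execution. The sketch below assumes one JSON object per line, which is how a JSON renderer typically writes entries (the example above is pretty-printed for readability). The sample lines, including the `run-other` entry, are made up for illustration, and `events_for_run` is a hypothetical helper.

```python
import json

# Made-up sample log lines, one JSON object per line
log_lines = [
    '{"timestamp": "2024-01-15T10:30:00Z", "level": "info", "event": "Starting agent execution", "run_id": "run-xyz"}',
    '{"timestamp": "2024-01-15T10:30:01Z", "level": "info", "event": "Starting agent execution", "run_id": "run-other"}',
    "plain text line from another library",
    '{"timestamp": "2024-01-15T10:30:05Z", "level": "info", "event": "Agent execution complete", "run_id": "run-xyz"}',
]


def events_for_run(lines, run_id):
    """Return the event names of all JSON log entries that belong to one run."""
    events = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(entry, dict) and entry.get("run_id") == run_id:
            events.append(entry.get("event"))
    return events


print(events_for_run(log_lines, "run-xyz"))
```

The same filter works for session_id or user_id by changing the key, which is the practical benefit of binding all identifiers to every log entry.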

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| LOG_LEVEL | INFO | Python log level. Set to DEBUG for Agno verbose mode |
| AGNO_DEBUG | false | Enable Agno debug mode directly |
| AGNO_MONITOR | false | Enable Agno monitoring dashboard |

See Also