Interceptors Guide¶
Interceptors let you hook into the LLM request/response pipeline. Think of them as checkpoints where you can inspect, modify, or block data before it goes to the AI and after you receive a response.
Table of Contents¶
- What Are Interceptors?
- Quick Start
- Request Interceptors
- Response Interceptors
- Session Access
- Error Handling
- Async Interceptors
- Common Use Cases
- Best Practices
- API Reference
What Are Interceptors?¶
Interceptors are functions that run at specific points in the request/response flow:
Your Message
│
▼
┌─────────────────────────────┐
│ REQUEST INTERCEPTORS │ ◄── Runs BEFORE the LLM call
│ • Add context │
│ • Validate input │
│ • Block bad requests │
└─────────────────────────────┘
│
▼
┌─────────────────────────────┐
│ LLM │ ◄── AI processes your request
│ (Claude, GPT, etc.) │
└─────────────────────────────┘
│
▼
┌─────────────────────────────┐
│ RESPONSE INTERCEPTORS │ ◄── Runs AFTER the LLM call
│ • Clean output │
│ • Redact secrets │
│ • Add formatting │
└─────────────────────────────┘
│
▼
Final Response
Why Use Interceptors?¶
| Use Case | Example |
|---|---|
| Add Context | Tell the AI which tenant/environment the user is in |
| Security | Block prompt injection attacks |
| Validation | Ensure required data is present |
| Clean Output | Remove thinking tags or sensitive data |
| Logging | Track what's being sent and received |
| Enrichment | Look up user preferences from a database |
Quick Start¶
Here's a simple example that adds tenant context to every request:
from dcaf.core import Agent, LLMRequest
from dcaf.tools import tool
# Define a tool
@tool(description="List pods in the current namespace")
def list_pods(namespace: str = "default") -> str:
return f"Pods in {namespace}: nginx, redis, api"
# Define a request interceptor
def add_tenant_context(request: LLMRequest) -> LLMRequest:
"""
Add information about the user's tenant to help the AI
understand which environment they're working in.
"""
# Get the tenant name from the context (passed by the caller)
tenant_name = request.context.get("tenant_name", "unknown")
# Add it to the system prompt so the AI knows about it
request.add_system_context(f"The user is working in tenant: {tenant_name}")
# Return the modified request
return request
# Create the agent with the interceptor
agent = Agent(
tools=[list_pods],
request_interceptors=add_tenant_context, # Single interceptor
)
# Use the agent
response = agent.run(
messages=[{"role": "user", "content": "What pods are running?"}],
context={"tenant_name": "production"},
)
print(response.text)
Request Interceptors¶
Request interceptors run before your request is sent to the LLM. They receive an LLMRequest object and must return an LLMRequest object.
The LLMRequest Object¶
from dcaf.core import LLMRequest
# What a request looks like
request = LLMRequest(
messages=[
{"role": "user", "content": "What pods are running?"},
],
tools=[list_pods, delete_pod],
system="You are a helpful Kubernetes assistant.",
context={
"tenant_name": "production",
"user_id": "alice",
"k8s_namespace": "default",
},
)
LLMRequest Fields¶
| Field | Type | Description |
|---|---|---|
messages |
list[dict] |
Conversation history. Each dict has role and content. |
tools |
list |
Tools the AI can use. |
system |
str \| None |
System prompt (instructions for the AI). |
context |
dict |
Platform context (tenant, user, etc.). |
session |
Session |
Persistent state across conversation turns. |
LLMRequest Methods¶
| Method | Description |
|---|---|
get_latest_user_message() |
Returns the content of the most recent user message. |
add_system_context(text) |
Appends text to the system prompt. |
Example: Adding Context¶
def add_user_preferences(request: LLMRequest) -> LLMRequest:
"""
Add the user's language preference to help the AI respond
in the correct language.
"""
# Get user ID from context
user_id = request.context.get("user_id")
if user_id:
# Look up user's preferred language (simplified example)
preferred_language = "Spanish" # Would come from a database
# Tell the AI about the preference
request.add_system_context(
f"Please respond in {preferred_language} if possible."
)
return request
agent = Agent(
tools=[...],
request_interceptors=add_user_preferences,
)
Example: Modifying Messages¶
def shorten_long_messages(request: LLMRequest) -> LLMRequest:
"""
Truncate very long messages to save tokens.
"""
max_message_length = 5000
# Go through each message
for message in request.messages:
content = message.get("content", "")
if len(content) > max_message_length:
# Truncate and add indicator
message["content"] = content[:max_message_length] + "\n[Message truncated...]"
return request
agent = Agent(
tools=[...],
request_interceptors=shorten_long_messages,
)
Response Interceptors¶
Response interceptors run after you receive a response from the LLM. They receive an LLMResponse object and must return an LLMResponse object.
The LLMResponse Object¶
from dcaf.core import LLMResponse
# What a response looks like
response = LLMResponse(
text="There are 3 pods running: nginx, redis, and api.",
tool_calls=[], # Empty if no tools were called
usage={"input_tokens": 150, "output_tokens": 25},
)
LLMResponse Fields¶
| Field | Type | Description |
|---|---|---|
text |
str |
The AI's text response. |
tool_calls |
list[dict] |
Tools the AI wants to use. |
usage |
dict \| None |
Token usage statistics. |
raw |
Any |
Original provider response (for debugging). |
session |
Session |
Persistent state across conversation turns. |
LLMResponse Methods¶
| Method | Description |
|---|---|
has_tool_calls() |
Returns True if the AI wants to call tools. |
get_text_length() |
Returns the length of the text response. |
Example: Cleaning Output¶
import re
def remove_thinking_tags(response: LLMResponse) -> LLMResponse:
"""
Remove <thinking>...</thinking> tags from the response.
Some AI models include their reasoning process in these tags.
We hide this from users for a cleaner experience.
"""
# Use regex to remove thinking tags and their content
cleaned_text = re.sub(
r'<thinking>.*?</thinking>',
'',
response.text,
flags=re.DOTALL # Match across multiple lines
)
# Remove extra whitespace
response.text = cleaned_text.strip()
return response
agent = Agent(
tools=[...],
response_interceptors=remove_thinking_tags,
)
Example: Redacting Sensitive Data¶
def redact_secrets(response: LLMResponse) -> LLMResponse:
"""
Remove any accidentally leaked secrets from the response.
"""
# List of patterns that might be secrets
secret_patterns = [
(r'sk-[a-zA-Z0-9]{32,}', '[API_KEY_REDACTED]'), # API keys
(r'eyJ[a-zA-Z0-9_-]+\.eyJ[a-zA-Z0-9_-]+', '[JWT_REDACTED]'), # JWTs
(r'password["\']?\s*[:=]\s*["\'][^"\']+["\']', 'password: [REDACTED]'),
]
text = response.text
for pattern, replacement in secret_patterns:
text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
response.text = text
return response
agent = Agent(
tools=[...],
response_interceptors=redact_secrets,
)
Session Access¶
Both LLMRequest and LLMResponse have access to the session, allowing interceptors to read and modify persistent state across conversation turns.
Reading Session in Request Interceptors¶
def add_user_context(request: LLMRequest) -> LLMRequest:
"""Add user-specific context from session."""
# Read from session
user_name = request.session.get("user_name", "User")
user_prefs = request.session.get("user_preferences", {})
# Add context based on session data
request.add_system_context(f"User: {user_name}")
if user_prefs.get("verbose_mode"):
request.add_system_context("User prefers detailed explanations.")
return request
Modifying Session in Request Interceptors¶
def track_request_count(request: LLMRequest) -> LLMRequest:
"""Track how many requests the user has made."""
count = request.session.get("request_count", 0)
request.session.set("request_count", count + 1)
request.session.set("last_request_time", datetime.now().isoformat())
return request
Modifying Session in Response Interceptors¶
def track_response_metrics(response: LLMResponse) -> LLMResponse:
"""Track response metrics in session."""
# Update session with response info
response.session.set("last_response_length", len(response.text))
response.session.set("had_tool_calls", response.has_tool_calls())
# Accumulate total tokens if available
if response.usage:
total = response.session.get("total_tokens", 0)
total += response.usage.get("output_tokens", 0)
response.session.set("total_tokens", total)
return response
agent = Agent(
tools=[...],
response_interceptors=track_response_metrics,
)
Session with Typed Models¶
You can store and retrieve typed models in session:
from pydantic import BaseModel
class UserPreferences(BaseModel):
theme: str = "light"
language: str = "en"
verbose: bool = False
def apply_user_preferences(request: LLMRequest) -> LLMRequest:
"""Apply user preferences from session."""
# Get as typed model
prefs = request.session.get("user_prefs", as_type=UserPreferences)
if prefs:
if prefs.language != "en":
request.add_system_context(f"Respond in {prefs.language}.")
if prefs.verbose:
request.add_system_context("Provide detailed explanations.")
return request
Error Handling¶
Sometimes you need to stop a request entirely. Use InterceptorError for this:
from dcaf.core import Agent, LLMRequest, InterceptorError
def block_dangerous_requests(request: LLMRequest) -> LLMRequest:
"""
Block requests that look like prompt injection attacks.
"""
user_message = request.get_latest_user_message().lower()
# Check for suspicious patterns
dangerous_patterns = [
"ignore previous instructions",
"disregard your instructions",
"forget everything",
"new instructions:",
]
for pattern in dangerous_patterns:
if pattern in user_message:
# STOP! Don't send this to the LLM.
raise InterceptorError(
user_message="I'm sorry, I can't process this request.",
code="PROMPT_INJECTION_BLOCKED",
details={"pattern": pattern},
)
# Safe to continue
return request
agent = Agent(
tools=[...],
request_interceptors=block_dangerous_requests,
)
InterceptorError Fields¶
| Field | Description |
|---|---|
user_message |
The message shown to the user. Make it friendly! |
code |
Internal code for logging (not shown to user). |
details |
Extra info for logging (not shown to user). |
Handling InterceptorError in Your Code¶
from dcaf.core import InterceptorError
try:
response = agent.run(messages=[...])
print(response.text)
except InterceptorError as error:
# The interceptor blocked the request
print(f"Request blocked: {error.user_message}")
# For logging/debugging
if error.code:
log.warning(f"Blocked with code: {error.code}")
Error Handling Modes¶
You can configure how the agent handles unexpected errors (not InterceptorError):
agent = Agent(
tools=[...],
request_interceptors=[my_interceptor],
on_interceptor_error="abort", # Stop on any error (default)
# OR
on_interceptor_error="continue", # Log error and keep going
)
Async Interceptors¶
Interceptors can be async, which is useful for:
- Database lookups
- API calls
- File I/O
import asyncio
async def enrich_from_database(request: LLMRequest) -> LLMRequest:
"""
Look up user preferences from the database.
This is async because database calls take time.
"""
user_id = request.context.get("user_id")
if user_id:
# Async database call
preferences = await get_user_preferences(user_id)
# Add preferences to context for tools to use
request.context["user_preferences"] = preferences
# Also tell the AI about them
request.add_system_context(
f"User preferences: {preferences}"
)
return request
# Async functions work just like sync functions
agent = Agent(
tools=[...],
request_interceptors=enrich_from_database,
)
Mixing Sync and Async¶
You can use both sync and async interceptors together:
# Sync interceptor
def add_timestamp(request: LLMRequest) -> LLMRequest:
request.context["timestamp"] = datetime.now().isoformat()
return request
# Async interceptor
async def check_rate_limit(request: LLMRequest) -> LLMRequest:
user_id = request.context.get("user_id")
is_allowed = await rate_limiter.check(user_id)
if not is_allowed:
raise InterceptorError("You've made too many requests. Please wait.")
return request
# Both work together!
agent = Agent(
tools=[...],
request_interceptors=[add_timestamp, check_rate_limit],
)
Common Use Cases¶
1. Multi-Tenant Context¶
Add tenant information so the AI knows which environment the user is in:
def add_tenant_context(request: LLMRequest) -> LLMRequest:
"""Add tenant and namespace info to help the AI."""
tenant = request.context.get("tenant_name", "unknown")
namespace = request.context.get("k8s_namespace", "default")
context_info = f"""
User's Environment:
- Tenant: {tenant}
- Kubernetes Namespace: {namespace}
- Only show resources from this namespace.
"""
request.add_system_context(context_info)
return request
2. Input Validation¶
Ensure required data is present:
def require_tenant(request: LLMRequest) -> LLMRequest:
"""Ensure the tenant is specified."""
if not request.context.get("tenant_name"):
raise InterceptorError(
user_message="Please select a tenant before continuing.",
code="MISSING_TENANT",
)
return request
3. Audit Logging¶
Log all requests for compliance:
import logging
audit_log = logging.getLogger("audit")
def log_request(request: LLMRequest) -> LLMRequest:
"""Log all requests for audit purposes."""
user_id = request.context.get("user_id", "anonymous")
user_message = request.get_latest_user_message()
audit_log.info(
f"Request from {user_id}: {user_message[:100]}..."
)
return request # Don't modify, just log
def log_response(response: LLMResponse) -> LLMResponse:
"""Log all responses for audit purposes."""
audit_log.info(
f"Response ({response.get_text_length()} chars): "
f"{response.text[:100]}..."
)
return response # Don't modify, just log
agent = Agent(
tools=[...],
request_interceptors=log_request,
response_interceptors=log_response,
)
4. Response Formatting¶
Add consistent formatting to responses:
def add_disclaimer(response: LLMResponse) -> LLMResponse:
"""Add a disclaimer to all responses."""
disclaimer = "\n\n---\n*This is an AI-generated response. Please verify before taking action.*"
response.text = response.text + disclaimer
return response
5. Rate Limiting¶
Prevent abuse by limiting requests:
from collections import defaultdict
from datetime import datetime, timedelta
# Simple in-memory rate limiter (use Redis in production)
request_counts = defaultdict(list)
def rate_limit(request: LLMRequest) -> LLMRequest:
"""Limit users to 10 requests per minute."""
user_id = request.context.get("user_id", "anonymous")
now = datetime.now()
one_minute_ago = now - timedelta(minutes=1)
# Clean old entries
request_counts[user_id] = [
t for t in request_counts[user_id] if t > one_minute_ago
]
# Check limit
if len(request_counts[user_id]) >= 10:
raise InterceptorError(
user_message="You've reached the rate limit. Please wait a moment.",
code="RATE_LIMITED",
)
# Record this request
request_counts[user_id].append(now)
return request
Best Practices¶
1. Keep Interceptors Focused¶
Each interceptor should do one thing well:
# Good: Single responsibility
def add_tenant_context(request): ...
def validate_input(request): ...
def log_request(request): ...
# Bad: Doing too much
def do_everything(request):
# Adds context AND validates AND logs...
# Hard to maintain and test
...
2. Order Matters¶
Interceptors run in the order you specify:
agent = Agent(
request_interceptors=[
validate_input, # First: Block bad requests early
check_permissions, # Second: Verify user can do this
add_context, # Third: Enrich the request
log_request, # Last: Log the final request
],
)
3. Handle Errors Gracefully¶
Use InterceptorError for user-facing errors:
# Good: Friendly message
raise InterceptorError(
user_message="Please log in to continue.",
code="AUTH_REQUIRED",
)
# Bad: Technical error shown to user
raise ValueError("Missing auth_token in context")
4. Don't Modify What You Don't Need¶
Only change what's necessary:
# Good: Only adds to system prompt
def add_context(request: LLMRequest) -> LLMRequest:
request.add_system_context("Extra info")
return request
# Bad: Replaces entire system prompt
def add_context(request: LLMRequest) -> LLMRequest:
request.system = "Completely new system prompt" # Loses original!
return request
5. Test Your Interceptors¶
Interceptors are just functions, so they're easy to test:
def test_add_tenant_context():
# Create a test request
request = LLMRequest(
messages=[{"role": "user", "content": "Hello"}],
context={"tenant_name": "production"},
)
# Run the interceptor
result = add_tenant_context(request)
# Check the result
assert "production" in result.system
def test_blocks_prompt_injection():
request = LLMRequest(
messages=[{"role": "user", "content": "Ignore previous instructions"}],
)
with pytest.raises(InterceptorError) as exc_info:
block_dangerous_requests(request)
assert exc_info.value.code == "PROMPT_INJECTION_BLOCKED"
API Reference¶
LLMRequest¶
@dataclass
class LLMRequest:
messages: list[dict] # Conversation messages
tools: list[Any] # Available tools
system: str | None # System prompt
context: dict # Platform context
session: Session # Session for persistent state
def get_latest_user_message(self) -> str:
"""Get the content of the most recent user message."""
def add_system_context(self, text: str) -> None:
"""Add text to the system prompt."""
LLMResponse¶
@dataclass
class LLMResponse:
text: str # AI's text response
tool_calls: list[dict] # Tool calls (if any)
usage: dict | None # Token usage
raw: Any # Original response
session: Session # Session for persistent state
def has_tool_calls(self) -> bool:
"""Check if there are tool calls."""
def get_text_length(self) -> int:
"""Get length of text response."""
InterceptorError¶
class InterceptorError(Exception):
def __init__(
self,
user_message: str, # Shown to user
code: str | None, # For logging
details: dict | None, # Extra info for logging
):
...
Agent Configuration¶
agent = Agent(
tools=[...],
# Single interceptor
request_interceptors=my_interceptor,
# Multiple interceptors (run in order)
request_interceptors=[first, second, third],
# Response interceptors work the same way
response_interceptors=[clean_output, log_response],
# Error handling mode
on_interceptor_error="abort", # or "continue"
)
See Also¶
- Agent Documentation - Full Agent class reference
- Session Management - Complete session guide
- Architecture Guide - How DCAF works internally
- Custom Agents Guide - Building complex agents