Working with AWS Bedrock Guide¶
This guide covers how to effectively use AWS Bedrock with DCAF, including configuration, model selection, tool integration, and best practices.
Table of Contents¶
- Introduction
- Setup and Configuration
- Model Selection
- Using the BedrockLLM Client
- Tool Integration
- Streaming
- Performance Optimization
- Error Handling
- Best Practices
Introduction¶
DCAF uses AWS Bedrock's Converse API to interact with foundation models. The Converse API provides:
- Unified interface across the Bedrock models that support the Converse API
- Tool calling (function calling) support
- Streaming responses
- Multi-turn conversations
- Consistent message format
Key Concepts¶
- Model ID: Identifies the model to use (e.g., us.anthropic.claude-3-5-sonnet-20240620-v1:0)
- Converse API: AWS Bedrock's unified API for all models
- Tool Config: How tools are defined for LLM consumption
- Inference Config: Parameters like temperature, max tokens
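To make these concepts concrete, here is a minimal sketch of the request shape that the Converse API expects (the keyword arguments accepted by the boto3 `bedrock-runtime` client's `converse` call). The helper name `build_converse_request` is hypothetical and not part of DCAF; `BedrockLLM` is assumed to assemble something similar internally.

```python
from typing import Any, Optional


def build_converse_request(
    model_id: str,
    user_text: str,
    system_prompt: Optional[str] = None,
    max_tokens: int = 512,
    temperature: float = 0.2,
) -> dict[str, Any]:
    """Assemble keyword arguments for a Converse call (illustrative helper)."""
    request: dict[str, Any] = {
        "modelId": model_id,
        # Converse messages carry a list of content blocks, not a bare string.
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }
    if system_prompt:
        request["system"] = [{"text": system_prompt}]
    return request


request = build_converse_request(
    "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    "What is the capital of France?",
    system_prompt="Answer in one sentence.",
)
print(request["messages"][0]["content"][0]["text"])
```

The Model ID, message list, and Inference Config each map to one top-level key; a Tool Config would be added under `toolConfig` in the same way.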
Setup and Configuration¶
Prerequisites¶
- AWS Account with Bedrock access
- IAM permissions for Bedrock
- Model access enabled in AWS console
Required IAM Permissions¶
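{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.*",
        "arn:aws:bedrock:*::foundation-model/amazon.*",
        "arn:aws:bedrock:*:*:inference-profile/us.anthropic.*"
      ]
    }
  ]
}
The inference-profile resource is needed when invoking cross-region model IDs (those with the us. prefix); the foundation-model entries alone cover only single-region model IDs.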
Environment Setup¶
# .env file
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_SESSION_TOKEN=your_session_token # If using temporary credentials
AWS_REGION=us-east-1
# Optional: Boto3 configuration
BOTO3_READ_TIMEOUT=20
BOTO3_CONNECT_TIMEOUT=10
BOTO3_MAX_ATTEMPTS=3
BOTO3_RETRY_MODE=standard
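How DCAF consumes the optional `BOTO3_*` variables is not shown in this guide. As a rough sketch, assuming they map one-to-one onto botocore `Config` options, they could be read like this. The function name `boto3_settings_from_env` is hypothetical, and the fallback values simply mirror the sample `.env` above.

```python
import os
from typing import Any, Mapping


def boto3_settings_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Map the optional BOTO3_* variables to botocore Config keyword arguments."""
    return {
        "read_timeout": int(env.get("BOTO3_READ_TIMEOUT", "20")),
        "connect_timeout": int(env.get("BOTO3_CONNECT_TIMEOUT", "10")),
        "retries": {
            "max_attempts": int(env.get("BOTO3_MAX_ATTEMPTS", "3")),
            "mode": env.get("BOTO3_RETRY_MODE", "standard"),
        },
    }


# Pass a plain dict instead of os.environ to see the effect of one override.
settings = boto3_settings_from_env({"BOTO3_READ_TIMEOUT": "60"})
print(settings)
# With botocore installed, the result can be unpacked as Config(**settings).
```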
Creating the LLM Client¶
from dcaf.llm import BedrockLLM
import dotenv
# Load environment
dotenv.load_dotenv(override=True)
# Option 1: Defaults (recommended for most cases)
llm = BedrockLLM(region_name="us-east-1")
# Option 2: Custom boto3 config
from botocore.config import Config
custom_config = Config(
read_timeout=60,
connect_timeout=15,
retries={
'max_attempts': 5,
'mode': 'adaptive'
}
)
llm = BedrockLLM(region_name="us-east-1", boto3_config=custom_config)
Model Selection¶
Available Models¶
| Model | ID | Best For |
|---|---|---|
| Claude 3.5 Sonnet | us.anthropic.claude-3-5-sonnet-20240620-v1:0 | General purpose, balanced |
| Claude 3 Sonnet | us.anthropic.claude-3-sonnet-20240229-v1:0 | Cost-effective general |
| Claude 3.5 Haiku | us.anthropic.claude-3-5-haiku-20241022-v1:0 | Fast, simple tasks |
| Claude 4 Opus | us.anthropic.claude-opus-4-20250514-v1:0 | Complex reasoning |
Cross-Region Inference¶
Use the us. prefix for cross-region inference profiles:
# Cross-region (recommended for availability)
model_id = "us.anthropic.claude-3-5-sonnet-20240620-v1:0"
# Single region
model_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"
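Switching between the two forms is a matter of adding or removing the geography prefix. The helpers below are a small illustrative sketch; their names are hypothetical, and the default list of geography prefixes is an assumption that should be checked against the inference profiles actually available in your account.

```python
def to_cross_region(model_id: str, geo: str = "us") -> str:
    """Return the inference-profile form of a model ID (geo prefix + base ID)."""
    prefix = f"{geo}."
    return model_id if model_id.startswith(prefix) else prefix + model_id


def to_single_region(model_id: str, geos: tuple[str, ...] = ("us", "eu", "apac")) -> str:
    """Strip a known geography prefix, leaving the base model ID."""
    for geo in geos:
        prefix = f"{geo}."
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id


base_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"
print(to_cross_region(base_id))
print(to_single_region("us." + base_id))
```

Not every model has a profile in every geography, so a prefixed ID produced this way still has to exist in Bedrock to be usable.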
Choosing the Right Model¶
# For fast, simple operations (routing, classification)
fast_model = "us.anthropic.claude-3-5-haiku-20241022-v1:0"
# For general purpose (most agents)
general_model = "us.anthropic.claude-3-5-sonnet-20240620-v1:0"
# For complex reasoning (advanced analysis)
complex_model = "us.anthropic.claude-opus-4-20250514-v1:0"
Using the BedrockLLM Client¶
Basic Invocation¶
from dcaf.llm import BedrockLLM
llm = BedrockLLM()
response = llm.invoke(
messages=[
{"role": "user", "content": "What is the capital of France?"}
],
model_id="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
max_tokens=100
)
# Extract text
text = response['output']['message']['content'][0]['text']
print(text) # "The capital of France is Paris."
With System Prompt¶
response = llm.invoke(
messages=[
{"role": "user", "content": "Explain quantum computing"}
],
model_id="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
system_prompt="""You are a physics teacher for high school students.
Use simple language and helpful analogies.
Keep explanations under 3 paragraphs.""",
max_tokens=500
)
Multi-Turn Conversation¶
conversation = [
{"role": "user", "content": "My name is Alice"},
{"role": "assistant", "content": "Nice to meet you, Alice!"},
{"role": "user", "content": "What did I just tell you?"}
]
response = llm.invoke(
messages=conversation,
model_id="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
max_tokens=100
)
# "You told me your name is Alice."
Inference Parameters¶
response = llm.invoke(
messages=[{"role": "user", "content": "Write a creative story"}],
model_id="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
max_tokens=1000, # Maximum output tokens
temperature=0.8, # Higher = more creative (0-1)
top_p=0.9 # Nucleus sampling parameter
)
| Parameter | Range | Description |
|---|---|---|
| max_tokens | 1-4096+ | Maximum tokens to generate |
| temperature | 0-1 | Randomness (0=deterministic) |
| top_p | 0-1 | Nucleus sampling threshold |
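The ranges in the table can be checked before a request is sent. The sketch below validates the three parameters and returns them under the key names used by the Converse API's `inferenceConfig` (`maxTokens`, `temperature`, `topP`). The function `build_inference_config` is hypothetical; whether `BedrockLLM` performs similar validation is not stated here.

```python
def build_inference_config(max_tokens=None, temperature=None, top_p=None) -> dict:
    """Validate sampling parameters and return a Converse-style inferenceConfig."""
    config = {}
    if max_tokens is not None:
        if max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")
        config["maxTokens"] = int(max_tokens)
    for key, value in (("temperature", temperature), ("topP", top_p)):
        if value is None:
            continue  # omitted parameters fall back to the model's defaults
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} must be between 0 and 1, got {value}")
        config[key] = float(value)
    return config


print(build_inference_config(max_tokens=1000, temperature=0.8, top_p=0.9))
```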
Tool Integration¶
Defining Tools¶
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g., San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
]
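The definition above uses an `input_schema` key, while the Converse API itself expects each tool wrapped in a `toolSpec` with the schema under `inputSchema.json`. The sketch below shows that translation. It assumes `BedrockLLM` does an equivalent conversion internally, which this guide does not confirm; the function names are hypothetical.

```python
def to_converse_tool(tool: dict) -> dict:
    """Wrap an input_schema-style tool definition in Converse's toolSpec envelope."""
    spec = {
        "name": tool["name"],
        "inputSchema": {"json": tool["input_schema"]},
    }
    if tool.get("description"):
        spec["description"] = tool["description"]
    return {"toolSpec": spec}


def to_tool_config(tools: list[dict]) -> dict:
    """Build the toolConfig section of a Converse request."""
    return {"tools": [to_converse_tool(tool) for tool in tools]}


example_tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }
]
tool_config = to_tool_config(example_tools)
print(tool_config["tools"][0]["toolSpec"]["name"])
```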
Invoking with Tools¶
response = llm.invoke(
messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
model_id="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
tools=tools
)
# Check for tool use
content = response['output']['message']['content']
for block in content:
    if 'toolUse' in block:
        tool_use = block['toolUse']
        print(f"Tool: {tool_use['name']}")
        print(f"Input: {tool_use['input']}")
Tool Choice Strategies¶
# Auto (default) - Model decides
response = llm.invoke(
messages=...,
tools=tools,
tool_choice="auto"
)
# Any - Must use a tool
response = llm.invoke(
messages=...,
tools=tools,
tool_choice="any"
)
# Specific - Must use this tool
response = llm.invoke(
messages=...,
tools=tools,
tool_choice={"type": "tool", "name": "get_weather"}
)
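For reference, the Converse API expresses the same three strategies as `{"auto": {}}`, `{"any": {}}`, and `{"tool": {"name": ...}}`. The mapping below is a sketch of how the `tool_choice` values shown above could be translated; the function name is hypothetical, and it is an assumption that `BedrockLLM` performs a translation like this.

```python
def to_converse_tool_choice(tool_choice) -> dict:
    """Translate a tool_choice value into the Converse toolChoice structure."""
    if tool_choice in ("auto", "any"):
        return {tool_choice: {}}
    if isinstance(tool_choice, dict) and tool_choice.get("type") == "tool":
        return {"tool": {"name": tool_choice["name"]}}
    raise ValueError(f"Unsupported tool_choice: {tool_choice!r}")


print(to_converse_tool_choice("auto"))
print(to_converse_tool_choice({"type": "tool", "name": "get_weather"}))
```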
Complete Tool Loop¶
def process_with_tools(user_message: str, tools: list, tool_functions: dict, max_turns: int = 10):
    """Run the tool loop until the model answers without requesting a tool."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):  # cap the loop so it cannot run forever
        # Call LLM
        response = llm.invoke(
            messages=messages,
            model_id="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
            tools=tools
        )
        content = response['output']['message']['content']
        # Check for tool use
        tool_uses = [b for b in content if 'toolUse' in b]
        if not tool_uses:
            # No tools requested: return the first text block
            return next((b['text'] for b in content if 'text' in b), "")
        # Execute tools
        tool_results = []
        for block in tool_uses:
            tool_use = block['toolUse']
            tool_name = tool_use['name']
            tool_input = tool_use['input']
            tool_id = tool_use['toolUseId']
            # Execute the tool
            if tool_name in tool_functions:
                result = tool_functions[tool_name](**tool_input)
            else:
                result = f"Unknown tool: {tool_name}"
            tool_results.append({
                "toolResult": {
                    "toolUseId": tool_id,
                    "content": [{"text": str(result)}]
                }
            })
        # Add assistant message with tool use
        messages.append({
            "role": "assistant",
            "content": content
        })
        # Add tool results
        messages.append({
            "role": "user",
            "content": tool_results
        })
    raise RuntimeError(f"No final answer after {max_turns} tool turns")
# Usage
def get_weather(location: str, unit: str = "celsius"):
    # Stub: a real implementation would call a weather service
    temperature = "22°C" if unit == "celsius" else "72°F"
    return f"Weather in {location}: {temperature}, sunny"
result = process_with_tools(
    "What's the weather in NYC?",
    tools=tools,
    tool_functions={"get_weather": get_weather}
)
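In the loop above, an exception raised inside a tool function propagates and ends the conversation. One variation, sketched here, catches the failure and reports it to the model through the `status` field that Converse tool results accept (`"success"` or `"error"`). The helper `run_tool` is hypothetical and not part of DCAF.

```python
def run_tool(tool_use: dict, tool_functions: dict) -> dict:
    """Execute one toolUse block and return a Converse toolResult block."""
    name = tool_use["name"]
    try:
        if name not in tool_functions:
            raise LookupError(f"Unknown tool: {name}")
        text, status = str(tool_functions[name](**tool_use["input"])), "success"
    except Exception as exc:  # report the failure to the model instead of crashing
        text, status = f"{type(exc).__name__}: {exc}", "error"
    return {
        "toolResult": {
            "toolUseId": tool_use["toolUseId"],
            "content": [{"text": text}],
            "status": status,
        }
    }


def demo_weather(location: str) -> str:
    return f"Weather in {location}: sunny"


ok = run_tool(
    {"toolUseId": "t1", "name": "demo_weather", "input": {"location": "NYC"}},
    {"demo_weather": demo_weather},
)
missing = run_tool({"toolUseId": "t2", "name": "nope", "input": {}}, {})
print(ok["toolResult"]["status"], missing["toolResult"]["status"])
```

Returning an error result lets the model apologize, retry with different input, or pick another tool, at the cost of hiding the exception from the caller.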
Streaming¶
Basic Streaming¶
import sys
for event in llm.invoke_stream(
    messages=[{"role": "user", "content": "Tell me a story"}],
    model_id="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    max_tokens=500
):
    if "contentBlockDelta" in event:
        delta = event["contentBlockDelta"].get("delta", {})
        if "text" in delta:
            sys.stdout.write(delta["text"])
            sys.stdout.flush()
Event Types¶
for event in llm.invoke_stream(...):
    if "messageStart" in event:
        print("Stream started")
    elif "contentBlockStart" in event:
        # New content block (text or tool use)
        start = event["contentBlockStart"]
        if "toolUse" in start.get("start", {}):
            print(f"Tool: {start['start']['toolUse']['name']}")
    elif "contentBlockDelta" in event:
        delta = event["contentBlockDelta"]["delta"]
        if "text" in delta:
            print(delta["text"], end="")
        elif "toolUse" in delta:
            # Tool input streaming
            pass
    elif "contentBlockStop" in event:
        print()  # End of block
    elif "messageStop" in event:
        print(f"\nStop reason: {event['messageStop']['stopReason']}")
Streaming with Tools¶
accumulated_text = ""
current_tool = None
for event in llm.invoke_stream(
    messages=[{"role": "user", "content": "What's the weather?"}],
    model_id="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    tools=tools
):
    if "contentBlockStart" in event:
        start = event["contentBlockStart"].get("start", {})
        if "toolUse" in start:
            current_tool = {
                "name": start["toolUse"]["name"],
                "id": start["toolUse"]["toolUseId"],
                "input": ""
            }
    elif "contentBlockDelta" in event:
        delta = event["contentBlockDelta"]["delta"]
        if "text" in delta:
            accumulated_text += delta["text"]
            print(delta["text"], end="", flush=True)
        elif "toolUse" in delta and current_tool:
            current_tool["input"] += delta["toolUse"].get("input", "")
    elif "contentBlockStop" in event:
        if current_tool:
            print(f"\nTool call: {current_tool['name']}")
            current_tool = None
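The tool input in a stream arrives as fragments of a JSON string, so it has to be parsed once the content block closes. The sketch below folds a sequence of events into the final text and a list of parsed tool calls. It runs on hand-written sample events rather than a live stream; `collect_stream` is a hypothetical helper, and the events follow the shapes used in the examples above.

```python
import json


def collect_stream(events) -> tuple[str, list[dict]]:
    """Fold Converse stream events into the final text and parsed tool calls."""
    text_parts, tool_calls, current = [], [], None
    for event in events:
        if "contentBlockStart" in event:
            start = event["contentBlockStart"].get("start", {})
            if "toolUse" in start:
                current = {
                    "id": start["toolUse"]["toolUseId"],
                    "name": start["toolUse"]["name"],
                    "input": "",
                }
        elif "contentBlockDelta" in event:
            delta = event["contentBlockDelta"].get("delta", {})
            if "text" in delta:
                text_parts.append(delta["text"])
            elif "toolUse" in delta and current is not None:
                current["input"] += delta["toolUse"].get("input", "")
        elif "contentBlockStop" in event and current is not None:
            # The block is complete, so the accumulated fragments form valid JSON.
            current["input"] = json.loads(current["input"] or "{}")
            tool_calls.append(current)
            current = None
    return "".join(text_parts), tool_calls


sample_events = [
    {"contentBlockDelta": {"delta": {"text": "Checking "}}},
    {"contentBlockDelta": {"delta": {"text": "the weather."}}},
    {"contentBlockStop": {}},
    {"contentBlockStart": {"start": {"toolUse": {"toolUseId": "t1", "name": "get_weather"}}}},
    {"contentBlockDelta": {"delta": {"toolUse": {"input": '{"location": '}}}},
    {"contentBlockDelta": {"delta": {"toolUse": {"input": '"Tokyo"}'}}}},
    {"contentBlockStop": {}},
    {"messageStop": {"stopReason": "tool_use"}},
]
text, calls = collect_stream(sample_events)
print(text)
print(calls)
```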
Performance Optimization¶
Timeout Configuration¶
from botocore.config import Config
# For fast, short responses
fast_config = Config(
read_timeout=10,
connect_timeout=5
)
# For long, complex responses
slow_config = Config(
read_timeout=120,
connect_timeout=30
)
llm_fast = BedrockLLM(boto3_config=fast_config)
llm_slow = BedrockLLM(boto3_config=slow_config)
Retry Configuration¶
# Aggressive retry for production
production_config = Config(
retries={
'max_attempts': 10,
'mode': 'adaptive' # Standard retry behavior plus client-side rate limiting
}
)
# Light retry for development
dev_config = Config(
retries={
'max_attempts': 2,
'mode': 'standard'
}
)
Latency Optimization¶
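# Use latency-optimized mode (available only for select models and regions;
# check the Bedrock documentation before relying on it)
response = llm.invoke(
    messages=[...],
    model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    performance_config={"latency": "optimized"}
)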
Message Optimization¶
# Let BedrockLLM normalize messages automatically
messages = [
{"role": "user", "content": "Hello"},
{"role": "user", "content": "How are you?"}, # Will be merged
]
# Messages are automatically normalized:
# [{"role": "user", "content": "Hello\nHow are you?"}]
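The merge shown above can be pictured as a single pass over the message list. This is only a sketch of the idea for plain-string content; the real normalization inside `BedrockLLM` may handle more cases (content blocks, tool results), and the function name `merge_consecutive` is hypothetical.

```python
def merge_consecutive(messages: list[dict]) -> list[dict]:
    """Merge adjacent messages that share a role, joining their text with newlines."""
    merged: list[dict] = []
    for message in messages:
        if merged and merged[-1]["role"] == message["role"]:
            merged[-1]["content"] += "\n" + message["content"]
        else:
            merged.append(dict(message))  # copy so the caller's list is untouched
    return merged


raw_messages = [
    {"role": "user", "content": "Hello"},
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "Fine, thanks."},
]
print(merge_consecutive(raw_messages))
```

Merging matters because the Converse API expects user and assistant turns to alternate.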
Error Handling¶
Common Errors¶
import time
from botocore.exceptions import ClientError
try:
    response = llm.invoke(messages=..., model_id=...)
except ClientError as e:
    error_code = e.response['Error']['Code']
    if error_code == 'ExpiredTokenException':
        # Refresh credentials
        print("Credentials expired - refresh with dcaf env-update-aws-creds")
    elif error_code == 'ResourceNotFoundException':
        # Model not found
        print("Model not found - check model ID and region")
    elif error_code == 'ThrottlingException':
        # Rate limited: back off before the caller retries the request
        time.sleep(5)
    elif error_code == 'ValidationException':
        # Invalid request
        print("Invalid request - check message format")
    elif error_code == 'ServiceUnavailableException':
        # Service issue
        print("Bedrock temporarily unavailable")
    else:
        print(f"Unexpected error: {e}")
Retry Wrapper¶
import time
from botocore.exceptions import ClientError
# Only transient errors are worth retrying
RETRYABLE_CODES = {'ThrottlingException', 'ServiceUnavailableException'}
def invoke_with_retry(llm, max_retries=3, **kwargs):
    """Invoke LLM with exponential backoff retry."""
    for attempt in range(max_retries):
        try:
            return llm.invoke(**kwargs)
        except ClientError as e:
            error_code = e.response['Error']['Code']
            # Validation errors, expired tokens, and other non-transient errors
            # are raised immediately; so is the final failed attempt
            if error_code not in RETRYABLE_CODES or attempt == max_retries - 1:
                raise
            wait = 2 ** attempt
            print(f"Retrying in {wait}s...")
            time.sleep(wait)
    # Reached only when max_retries is zero or negative
    raise RuntimeError("max_retries must be at least 1")
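The wrapper above waits exactly `2 ** attempt` seconds, so many clients throttled at the same moment will also retry at the same moment. A common variation, sketched here, adds random jitter and an upper bound to the delay. The function `backoff_delay` and its default values are illustrative assumptions, not part of DCAF or boto3.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, rng=random) -> float:
    """Full-jitter delay: a random value between 0 and min(cap, base * 2**attempt)."""
    return rng.uniform(0, min(cap, base * (2 ** attempt)))


# Upper bounds grow as 1, 2, 4, 8, ... seconds and stop growing at the cap.
for attempt in range(6):
    print(f"attempt {attempt}: sleep up to {min(30.0, 2 ** attempt)}s, "
          f"drew {backoff_delay(attempt):.2f}s")
```

Replacing `wait = 2 ** attempt` with `wait = backoff_delay(attempt)` keeps the rest of the wrapper unchanged.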
Best Practices¶
1. Use Appropriate Model for Task¶
# Routing/classification - use fast model
routing_response = llm.invoke(
messages=[...],
model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0", # Fast
max_tokens=10
)
# Main task - use capable model
main_response = llm.invoke(
messages=[...],
model_id="us.anthropic.claude-3-5-sonnet-20240620-v1:0", # Capable
max_tokens=1000
)
2. Set Appropriate Limits¶
# Short responses
response = llm.invoke(
    messages=[{"role": "user", "content": "Yes or no: Is 2+2=4?"}],
    model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    max_tokens=10,
    temperature=0 # Deterministic
)
# Creative tasks
response = llm.invoke(
    messages=[{"role": "user", "content": "Write a poem"}],
    model_id="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    max_tokens=500,
    temperature=0.8 # Creative
)
3. Use System Prompts Effectively¶
system_prompt = """
Role: You are a Kubernetes expert assistant.
Guidelines:
- Always specify namespaces in commands
- Prefer kubectl over direct API calls
- Explain what each command does
- Warn about destructive operations
Format:
- Use code blocks for commands
- Keep explanations concise
"""
response = llm.invoke(
    messages=[...],
    model_id="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    system_prompt=system_prompt
)
4. Handle Empty Responses¶
response = llm.invoke(messages=..., model_id=...)
content = response.get('output', {}).get('message', {}).get('content', [])
if not content:
    print("No content in response")
else:
    for block in content:
        if 'text' in block:
            print(block['text'])
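The same defensive lookup can be wrapped in a small helper so that callers always receive a string. This is a sketch; `extract_text` is a hypothetical name, and it joins every text block rather than taking only the first.

```python
def extract_text(response: dict, default: str = "") -> str:
    """Join all text blocks of a Converse response, or return default if none exist."""
    content = response.get("output", {}).get("message", {}).get("content", [])
    parts = [block["text"] for block in content if "text" in block]
    return "".join(parts) if parts else default


sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"text": "The capital of France "},
                {"text": "is Paris."},
            ],
        }
    }
}
print(extract_text(sample_response))
print(extract_text({}, default="(no content)"))
```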
5. Monitor Usage¶
response = llm.invoke(messages=..., model_id=...)
# Check token usage
usage = response.get('usage', {})
print(f"Input tokens: {usage.get('inputTokens', 0)}")
print(f"Output tokens: {usage.get('outputTokens', 0)}")
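To monitor usage across many calls rather than one, the per-response counts can be accumulated. The class below is a minimal sketch: `UsageTracker` is a hypothetical name, and the prices passed to `estimated_cost` are placeholders that must be replaced with the current rates for your model and region.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running totals of token usage across Converse responses."""
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, response: dict) -> None:
        usage = response.get("usage", {})
        self.input_tokens += usage.get("inputTokens", 0)
        self.output_tokens += usage.get("outputTokens", 0)
        self.calls += 1

    def estimated_cost(self, input_price_per_1k: float, output_price_per_1k: float) -> float:
        """Cost estimate from caller-supplied prices per 1,000 tokens."""
        return (
            self.input_tokens * input_price_per_1k
            + self.output_tokens * output_price_per_1k
        ) / 1000


tracker = UsageTracker()
tracker.record({"usage": {"inputTokens": 100, "outputTokens": 20}})
tracker.record({"usage": {"inputTokens": 50, "outputTokens": 30}})
print(tracker)
# Placeholder prices, for illustration only.
print(tracker.estimated_cost(input_price_per_1k=1.0, output_price_per_1k=2.0))
```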