Working with Google Vertex AI¶
DCAF supports Google Vertex AI as a unified provider for both Google Gemini and Anthropic Claude models, providing zero-configuration deployment on Google Cloud Platform.
Overview¶
The google provider offers access to multiple model families through Vertex AI:
Google Gemini Models:
- Gemini 3: Latest generation with advanced reasoning
- Gemini 2.x: High-performance models with thinking budgets
- Gemini 1.5: Large context windows and efficient inference
Anthropic Claude Models (via Vertex AI):
- Claude Opus 4: Most capable Claude model for complex tasks
- Claude Sonnet 4: Balanced performance and cost
- Claude Haiku 3.5: Fast, cost-effective for simple tasks
DCAF automatically detects which model family you're using based on the model ID and routes to the correct backend — no extra configuration needed.
Installation¶
Install the required dependencies for the models you plan to use:
# For Gemini models
pip install google-generativeai google-auth
# For Claude models on Vertex AI
pip install 'anthropic[vertex]'
# Or install DCAF with full Google/Vertex AI support
pip install dcaf[gemini]
Configuration¶
Zero Configuration on GCP (Recommended)¶
When running on GCP (GKE, GCE, Cloud Run), DCAF automatically detects your project and location:
from dcaf.core import Agent

# Gemini — project/location auto-detected on GCP
agent = Agent(
    provider="google",
    model="gemini-2.5-pro",
    system_prompt="You are a helpful assistant."
)

# Claude on Vertex AI — same provider, just change the model ID
agent = Agent(
    provider="google",
    model="claude-sonnet-4@20250514",
    system_prompt="You are a helpful assistant."
)
DCAF detects the model family from the model ID (e.g., IDs starting with claude are routed to the Anthropic Vertex AI backend) and uses the appropriate Agno model class automatically.
Environment Variables (Optional)¶
Override auto-detected values if needed:
export GOOGLE_CLOUD_PROJECT="your-project-id"
export DCAF_GOOGLE_MODEL_LOCATION="us-central1" # Optional, defaults to us-central1
Quick Start¶
Basic Gemini Agent¶
from dcaf.core import Agent

agent = Agent(
    provider="google",
    model="gemini-2.5-pro",
    system_prompt="You are a helpful assistant."
)

response = agent.run([
    {"role": "user", "content": "What's the capital of France?"}
])
print(response.text)
Basic Claude on Vertex AI Agent¶
from dcaf.core import Agent

agent = Agent(
    provider="google",
    model="claude-sonnet-4@20250514",
    system_prompt="You are a helpful assistant."
)

response = agent.run([
    {"role": "user", "content": "What's the capital of France?"}
])
print(response.text)
Available Gemini Models¶
Gemini 3 (Latest)¶
gemini-3-pro-preview - Most capable model with advanced reasoning
gemini-3-flash - Fast inference with strong reasoning
Gemini 2.x¶
gemini-2.5-flash - Fast model with thinking support
gemini-2.5-pro - More capable, supports thinking budget
gemini-2.0-flash - Previous generation flash
Gemini 1.5¶
gemini-1.5-flash - Lightweight, fast responses
gemini-1.5-pro - Large context window (2M tokens)
Available Claude Models (via Vertex AI)¶
Anthropic Claude models are available through Vertex AI using the same provider="google" configuration. DCAF automatically detects Claude model IDs and routes them through the Vertex AI Anthropic backend.
Model ID Format
Vertex AI Claude model IDs use the format model-name@version (e.g., claude-sonnet-4@20250514). This differs from the direct Anthropic API format.
Claude 4¶
claude-opus-4@20250805 - Most capable model for complex, multi-step tasks
claude-sonnet-4@20250514 - Balanced performance and cost
Claude 3.5¶
claude-3-5-haiku@20241022 - Fast and cost-effective
Check Vertex AI Model Garden
Available Claude model versions may change. Check the Vertex AI Model Garden or Google Cloud documentation for the latest available models and versions.
Model Configuration¶
Temperature and Max Tokens¶
Control generation behavior:
agent = Agent(
    provider="google",
    model="gemini-3-flash",
    model_config={
        "temperature": 0.7,  # 0.0 to 1.0 (default: 0.1)
        "max_tokens": 8192,  # Maximum output tokens
    }
)
Advanced Model Configuration¶
Pass additional Gemini-specific parameters:
agent = Agent(
    provider="google",
    model="gemini-3-pro-preview",
    model_config={
        "thinking_level": "high",  # "low" or "high" (Gemini 3 only)
        "top_p": 0.9,
        "top_k": 40,
    }
)
Note: Gemini 3 models use thinking_level, while Gemini 2.5 models use thinking_budget. See Agno's documentation for model-specific parameters.
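For comparison, a minimal sketch of a Gemini 2.5 configuration using thinking_budget, passed through model_config like the other parameters above. The budget value of 1024 is an illustrative assumption; see Agno's documentation for valid ranges:

agent = Agent(
    provider="google",
    model="gemini-2.5-pro",
    model_config={
        "thinking_budget": 1024,  # token budget for internal reasoning (Gemini 2.5 only; value is illustrative)
        "temperature": 0.7,
    }
)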
Using Tools with Gemini¶
Gemini excels at tool use and function calling:
from dcaf.core import Agent
from dcaf.tools import tool

@tool(description="Search for current information")
def search(query: str) -> str:
    """Search the web for information."""
    # Your search implementation
    return f"Results for: {query}"

@tool(description="Get weather information")
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: Sunny, 72°F"

agent = Agent(
    provider="google",
    model="gemini-2.5-flash",
    tools=[search, get_weather],
    system_prompt="You are a helpful assistant with access to search and weather tools."
)

response = agent.run([
    {"role": "user", "content": "What's the weather in Paris and any recent news?"}
])
print(response.text)
Streaming Responses¶
Use streaming for real-time token generation:
from dcaf.core import Agent

agent = Agent(
    provider="google",
    model="gemini-3-flash",
)

for event in agent.stream([
    {"role": "user", "content": "Write a short story about AI."}
]):
    if event.type == "text_delta":
        print(event.data.text, end="", flush=True)
    elif event.type == "complete":
        print("\n\nDone!")
REST Server with Gemini¶
Expose a Gemini agent as a REST API:
from dcaf.core import Agent, serve
from dcaf.tools import tool

@tool(description="Analyze code for issues")
def analyze_code(code: str, language: str) -> str:
    """Analyze code and return suggestions."""
    return f"Analyzing {language} code..."

agent = Agent(
    name="code-reviewer",
    description="AI code review assistant",
    provider="google",
    model="gemini-3-flash",
    tools=[analyze_code]
)

# Start server with A2A support
serve(agent, port=8000, a2a=True)
Access via:
- HTTP: POST http://localhost:8000/api/chat
- A2A: GET http://localhost:8000/.well-known/agent.json
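For a quick smoke test, here is a minimal client sketch using the requests library. The request body shown is an assumption (a messages-style payload mirroring the agent.run() examples); check DCAF's REST API documentation for the authoritative schema:

import requests

# Hypothetical request body — verify the exact schema against DCAF's REST API docs
resp = requests.post(
    "http://localhost:8000/api/chat",
    json={"messages": [{"role": "user", "content": "Review this function for bugs."}]},
    timeout=30,
)
print(resp.status_code)
print(resp.json())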
Multi-Agent Systems with Gemini¶
Use Gemini in multi-agent architectures:
from dcaf.core import Agent

# Specialist agent using Gemini
# (assumes a web_search tool defined as in "Using Tools with Gemini" above)
research_agent = Agent(
    name="researcher",
    provider="google",
    model="gemini-2.5-flash",
    tools=[web_search],
    system_prompt="You are a research specialist. Gather information from the web."
)

# Orchestrator using Claude on Bedrock
orchestrator = Agent(
    name="orchestrator",
    provider="bedrock",
    model="anthropic.claude-3-sonnet-20240229-v1:0",
    aws_profile="my-profile",
    tools=[research_agent.as_tool()],  # Gemini agent as a tool
    system_prompt="Route research tasks to the specialist."
)

response = orchestrator.run([
    {"role": "user", "content": "Research the latest AI developments"}
])
Vertex AI (Default)¶
The Google provider always uses Vertex AI for both Gemini and Claude models. Project and location are auto-detected on GCP:
from dcaf.core import Agent

# Gemini on Vertex AI - auto-detected!
agent = Agent(
    provider="google",
    model="gemini-2.5-pro",
)

# Claude on Vertex AI - same provider, same auto-detection!
agent = Agent(
    provider="google",
    model="claude-sonnet-4@20250514",
)
How Auto-Detection Works¶
DCAF automatically detects your GCP environment; a sketch of the equivalent lookups follows this list:
- google.auth.default(): Gets project ID from ADC (works with Workload Identity)
- Metadata service: Falls back to http://metadata.google.internal/ for project/zone
- Default location: Uses us-central1 if location can't be detected
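The following sketch shows roughly what these lookups amount to, using the real google-auth and metadata-server APIs. It is illustrative only; DCAF's actual internals may differ:

import google.auth
import requests

def detect_project_and_location() -> tuple[str | None, str]:
    # 1. ADC: works with Workload Identity, attached service accounts, and local gcloud ADC
    _, project = google.auth.default()
    # 2. Fall back to the GCE/GKE metadata service for the project ID
    if not project:
        try:
            resp = requests.get(
                "http://metadata.google.internal/computeMetadata/v1/project/project-id",
                headers={"Metadata-Flavor": "Google"},
                timeout=1,
            )
            project = resp.text
        except requests.RequestException:
            project = None
    # 3. Default location when none can be detected
    return project, "us-central1"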
Explicit Configuration¶
Override auto-detected values if needed:
agent = Agent(
    provider="google",
    model="gemini-2.5-pro",
    google_project_id="my-project",  # Explicit project
    google_location="europe-west1",  # Explicit region
)
Or via environment variables:
export GOOGLE_CLOUD_PROJECT="my-project"
export DCAF_GOOGLE_MODEL_LOCATION="europe-west1"
Requirements¶
- GCP project with Vertex AI API enabled
- Application Default Credentials configured:
  - Local dev: gcloud auth application-default login
  - GKE: Workload Identity with appropriate IAM bindings
  - GCE/Cloud Run: Attached service account
- IAM role: roles/aiplatform.user on the service account
- For Claude models: Claude must be enabled in your project's Vertex AI Model Garden
Model Selection Guide¶
Gemini Models¶
| Model | Best For | Context | Speed | Cost |
|---|---|---|---|---|
| gemini-3-pro-preview | Complex reasoning, multi-step tasks | Large | Slow | High |
| gemini-3-flash | General-purpose, balanced performance | Large | Fast | Low |
| gemini-2.5-pro | Advanced capabilities, thinking | Large | Medium | Medium |
| gemini-2.5-flash | Fast inference, good reasoning | Large | Very Fast | Low |
| gemini-1.5-pro | Huge context (2M tokens) | Massive | Medium | Medium |
| gemini-1.5-flash | Quick tasks, simple queries | Large | Very Fast | Very Low |
Claude Models (via Vertex AI)¶
| Model | Best For | Context | Speed | Cost |
|---|---|---|---|---|
| claude-opus-4@20250805 | Complex reasoning, agentic tasks | 200K | Slow | High |
| claude-sonnet-4@20250514 | Balanced performance, tool use | 200K | Fast | Medium |
| claude-3-5-haiku@20241022 | Quick tasks, high throughput | 200K | Very Fast | Low |
Error Handling¶
Handle provider-specific errors:
from dcaf.core import Agent

agent = Agent(
    provider="google",
    model="gemini-3-flash",  # or "claude-sonnet-4@20250514"
)

try:
    response = agent.run([
        {"role": "user", "content": "Hello!"}
    ])
    print(response.text)
except ImportError as e:
    print("Required package not installed:")
    print("  Gemini: pip install google-generativeai google-auth")
    print("  Claude: pip install 'anthropic[vertex]'")
except ValueError as e:
    print(f"Configuration error: {e}")
    print("Ensure GOOGLE_CLOUD_PROJECT and DCAF_GOOGLE_MODEL_LOCATION are set")
except Exception as e:
    print(f"Error: {e}")
Best Practices¶
1. Choose the Right Model¶
# For production - use flash models for speed and cost
production_agent = Agent(
    provider="google",
    model="gemini-3-flash",  # Fast, cost-effective
)

# For complex reasoning - use pro models
research_agent = Agent(
    provider="google",
    model="gemini-3-pro-preview",  # Advanced reasoning
)
2. Monitor Token Usage¶
Gemini models have different context windows and pricing:
agent = Agent(
    provider="google",
    model="gemini-3-flash",
    model_config={
        "max_tokens": 2048,  # Limit output to control costs
    }
)
3. Test with Flash, Deploy with Pro¶
import os

# Development/testing
if os.getenv("ENV") == "development":
    model = "gemini-3-flash"
else:
    model = "gemini-3-pro-preview"

agent = Agent(
    provider="google",
    model=model,
)
Comparison: Gemini vs Claude vs GPT¶
| Feature | Gemini (Vertex AI) | Claude (Vertex AI) | Claude (Bedrock) | GPT-4 |
|---|---|---|---|---|
| Tool Use | Excellent | Excellent | Excellent | Good |
| Reasoning | Strong (G3) | Excellent | Excellent | Strong |
| Speed | Very Fast (Flash) | Fast (Haiku) | Fast | Medium |
| Context | 2M (1.5 Pro) | 200K | 200K | 128K |
| Cost | Low (Flash) | Low (Haiku) | Medium | High |
| Provider | google | google | bedrock | openai / azure |
Troubleshooting¶
Project or Location Not Found¶
# Check if environment variables are set
echo $GOOGLE_CLOUD_PROJECT
echo $GOOGLE_CLOUD_LOCATION
# Set them if not on GCP (for local development)
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"
# Or use gcloud to set up ADC
gcloud auth application-default login
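To confirm what auto-detection will pick up, you can resolve ADC directly from Python using the real google-auth API:

import google.auth

# Resolves Application Default Credentials and the associated project ID
credentials, project = google.auth.default()
print(f"ADC project: {project}")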
Import Error¶
# For Gemini models
pip install google-generativeai google-auth
# For Claude models on Vertex AI
pip install 'anthropic[vertex]'
# Or upgrade if already installed
pip install --upgrade google-generativeai anthropic
Rate Limiting¶
Gemini API has rate limits. Handle gracefully:
import time
from dcaf.core import Agent

agent = Agent(provider="google", model="gemini-3-flash")

for i in range(10):
    try:
        response = agent.run([{"role": "user", "content": f"Request {i}"}])
        print(response.text)
    except Exception as e:
        if "429" in str(e) or "rate limit" in str(e).lower():
            print("Rate limited, waiting...")
            time.sleep(60)
        else:
            raise
Examples¶
Complete examples available in the repository:
- examples/gemini_basic.py - Basic Gemini usage
- examples/gemini_tools.py - Tool use with Gemini
- examples/gemini_multi_agent.py - Multi-agent with Gemini
DuploCloud / JIT Credential Injection¶
When DCAF is deployed inside DuploCloud (e.g. on GKE), Vertex AI credentials can be injected at request time via platform_context instead of relying on static ADC. This is the preferred approach for short-lived (JIT) credentials.
How it works¶
- The caller includes a GCP scope with a service-account-access-token in platform_context.scopes.
- DCAF extracts the token and passes it directly to the Vertex AI client, bypassing ADC entirely.
- Gemini models use google.oauth2.credentials.Credentials(token=...) as the credentials kwarg; Vertex Claude models use AnthropicVertex(access_token=...) (see the sketch after this list).
- The model instance is not cached for token-authenticated requests (tokens are short-lived and change per request).
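A minimal sketch of the token plumbing described above, using the real google-auth and anthropic SDK calls the list names. The token extraction from platform_context is DCAF internals and is omitted here; the token value and project/region are placeholders:

from google.oauth2.credentials import Credentials
from anthropic import AnthropicVertex

access_token = "ya29...."  # injected service-account-access-token from platform_context (placeholder)

# Gemini path: wrap the raw token as OAuth2 credentials
gemini_credentials = Credentials(token=access_token)

# Vertex Claude path: pass the token straight to the Anthropic Vertex client
claude_client = AnthropicVertex(
    project_id="my-project",
    region="us-central1",
    access_token=access_token,
)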
When no token is present, both models fall back to ADC as normal — existing GKE Workload Identity or GOOGLE_APPLICATION_CREDENTIALS deployments are unaffected.
For full wire-format details and how the CredentialManager handles GCP scopes for subprocess tools, see the Credential Injection guide.
Resources¶
- Vertex AI Documentation
- Gemini API Documentation
- Claude on Vertex AI
- Agno Gemini Guide
- Agno Vertex AI Claude Guide
- DCAF Documentation
Support¶
For issues with the Google provider in DCAF:
- Check this guide
- Review Agno's Gemini docs or Agno's Vertex AI Claude docs
- Verify ADC is configured: gcloud auth application-default login
- Check Vertex AI quotas in GCP Console
- For Claude models, verify access is enabled in Model Garden
- Open an issue on GitHub with logs
How Model Detection Works¶
When you set provider="google", DCAF inspects the model ID to determine which Vertex AI backend to use:
| Model ID Pattern | Backend | Agno Class |
|---|---|---|
| Starts with claude | Anthropic on Vertex AI | agno.models.vertexai.claude.Claude |
| Everything else | Google Gemini on Vertex AI | agno.models.google.Gemini |
This means you can switch between Gemini and Claude models by changing only the model ID — no provider or configuration changes required.
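A minimal sketch of the routing rule described in the table, using the Agno classes it names. This is illustrative only; DCAF's actual factory code may differ:

from agno.models.google import Gemini
from agno.models.vertexai.claude import Claude

def resolve_model_class(model_id: str):
    # Claude IDs (e.g. "claude-sonnet-4@20250514") route to the Anthropic Vertex AI backend
    if model_id.startswith("claude"):
        return Claude
    # Everything else (e.g. "gemini-2.5-pro") routes to Gemini on Vertex AI
    return Gemini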
Next Steps: