LLM Layer¶
The LLM layer provides a unified, provider-agnostic interface for making direct LLM calls. It is the single source of truth for model creation and configuration in DCAF.
Overview¶
The LLM layer sits between callers and the underlying model providers:
┌──────────────────┐      ┌──────────────────┐
│   Agent / Agno   │      │  Direct callers  │
│ (orchestration)  │      │  (e.g. routing)  │
└────────┬─────────┘      └────────┬─────────┘
         │                         │
         │ get_model()             │ invoke()
         │                         │
         ▼                         ▼
┌─────────────────────────────────────┐
│              LLM Layer              │
│           (dcaf.core.llm)           │
│                                     │
│  - Provider detection               │
│  - Credential resolution            │
│  - Model instantiation              │
│  - Direct LLM calls                 │
└──────────────────┬──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│          AgnoModelFactory           │
│  Bedrock│Anthropic│OpenAI│Google│...│
└─────────────────────────────────────┘
Two consumers, one layer:
- Agent orchestration — calls llm.get_model() to get an Agno model instance for the agent loop.
- Direct callers — call llm.invoke() / llm.ainvoke() for single-shot LLM calls without agent machinery.
Quick Start¶
from dcaf.core import LLM, create_llm
# Create from environment variables (DCAF_PROVIDER, DCAF_MODEL, etc.)
llm = create_llm()
# Or with explicit configuration
llm = create_llm(provider="google", model="gemini-2.0-flash")
# Make a direct call (sync)
response = llm.invoke(
messages=[{"role": "user", "content": "Hello"}],
system_prompt="You are helpful.",
)
print(response.text)
print(response.tool_calls)
print(response.usage)
# Async call
response = await llm.ainvoke(
messages=[{"role": "user", "content": "Hello"}],
)
API Reference¶
create_llm()¶
Factory function that creates an LLM instance from environment variables.
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| provider | str \| None | DCAF_PROVIDER env var, or "bedrock" | Provider name |
| model | str \| None | DCAF_MODEL env var, or provider default | Model identifier |
| **overrides | | | Additional kwargs: temperature, max_tokens, aws_profile, aws_region, api_key, google_project_id, google_location |
Environment Variables¶
| Variable | Description |
|---|---|
| DCAF_PROVIDER | Provider name (bedrock, anthropic, openai, azure, google, ollama) |
| DCAF_MODEL | Model identifier |
| DCAF_TEMPERATURE | Sampling temperature (0.0–1.0) |
| DCAF_MAX_TOKENS | Maximum response tokens |
| AWS_PROFILE | AWS profile (for Bedrock) |
| AWS_REGION | AWS region (for Bedrock) |
| ANTHROPIC_API_KEY | API key (for Anthropic) |
| OPENAI_API_KEY | API key (for OpenAI) |
| GOOGLE_PROJECT_ID | Project ID (for Google) |
Examples¶
# Everything from environment
llm = create_llm()
# Override provider and model
llm = create_llm(provider="google", model="gemini-2.0-flash")
# Override temperature
llm = create_llm(temperature=0.0, max_tokens=200)
# Bedrock with specific AWS config
llm = create_llm(provider="bedrock", aws_region="us-west-2", aws_profile="prod")
LLM¶
The main class for direct LLM interaction.
class LLM:
def __init__(
self,
provider: str,
model: str,
temperature: float = 0.1,
max_tokens: int = 4096,
**provider_kwargs,
)
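create_llm() is the usual entry point, but LLM can also be constructed directly; a minimal sketch, with an illustrative provider-specific kwarg:
from dcaf.core import LLM
llm = LLM(
    provider="openai",
    model="gpt-4o",
    temperature=0.2,
    max_tokens=1024,
    api_key="...",  # passed through **provider_kwargs; illustrative
)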
Properties¶
| Property | Type | Description |
|---|---|---|
| model_id | str | The model identifier |
| provider | str | The provider name (lowercased) |
Methods¶
invoke() — Synchronous direct call¶
def invoke(
self,
messages: list[dict[str, Any]],
system_prompt: str | None = None,
tools: list[dict[str, Any]] | None = None,
tool_choice: str | dict[str, Any] | None = None,
max_tokens: int | None = None,
temperature: float | None = None,
) -> LLMResponse
Single-shot LLM call. No tool execution loop, no agent orchestration.
response = llm.invoke(
messages=[{"role": "user", "content": "What is 2+2?"}],
system_prompt="Answer concisely.",
max_tokens=100,
temperature=0.0,
)
print(response.text) # "4"
ainvoke() — Async direct call¶
async def ainvoke(
self,
messages: list[dict[str, Any]],
system_prompt: str | None = None,
tools: list[dict[str, Any]] | None = None,
tool_choice: str | dict[str, Any] | None = None,
max_tokens: int | None = None,
temperature: float | None = None,
) -> LLMResponse
Same as invoke() but async.
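From synchronous code, a minimal sketch that drives ainvoke() with asyncio (assumes llm was created via create_llm() as above):
import asyncio
async def main():
    response = await llm.ainvoke(
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.text)
asyncio.run(main())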
get_model() — Get underlying Agno model¶
Returns the Agno model instance (e.g., AwsBedrock, Gemini). Used by the agent orchestration layer to feed the model into AgnoAgent.
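A minimal sketch of that path; the AgnoAgent constructor shown here is assumed for illustration, not part of this module's API:
llm = create_llm(provider="bedrock")
model = llm.get_model()  # e.g. an AwsBedrock instance
# Assumed: the orchestration layer hands the model to the agent loop
agent = AgnoAgent(model=model)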
cleanup() — Release resources¶
Releases resources held by the underlying model factory.
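A minimal sketch pairing a call with cleanup() so resources are released even when the call raises:
llm = create_llm()
try:
    response = llm.invoke(messages=[{"role": "user", "content": "Hello"}])
finally:
    llm.cleanup()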
LLMResponse¶
Response from a direct LLM call. This is the base class for AgentResponse.
@dataclass
class LLMResponse:
text: str | None = None
tool_calls: list[dict[str, Any]] = field(default_factory=list)
usage: dict[str, int] = field(default_factory=dict)
raw: ModelResponse | None = field(default=None, repr=False)
| Field | Type | Description |
|---|---|---|
| text | str \| None | The model's text response, or None if only tool calls were returned |
| tool_calls | list[dict] | Normalized tool calls. Each entry has name and input keys |
| usage | dict[str, int] | Token usage: input_tokens, output_tokens, total_tokens |
| raw | ModelResponse \| None | The underlying Agno ModelResponse for advanced use |
Relationship to AgentResponse¶
AgentResponse extends LLMResponse:
@dataclass
class AgentResponse(LLMResponse):
# Inherits: text, tool_calls, usage, raw
needs_approval: bool = False
pending_tools: list[PendingToolCall] = field(default_factory=list)
conversation_id: str = ""
is_complete: bool = True
session: dict[str, Any] = field(default_factory=dict)
This means every AgentResponse is also an LLMResponse, and code that works with LLMResponse will also work with AgentResponse.
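For example, a helper typed against the base class accepts either; an illustrative sketch:
def log_usage(response: LLMResponse) -> None:
    # Works for LLMResponse and AgentResponse alike
    print(response.usage.get("total_tokens", 0))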
Tool Calling¶
The LLM layer supports tool calling via the standard tool schema format:
tools = [
{
"name": "get_weather",
"description": "Get the current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state",
}
},
"required": ["location"],
},
}
]
response = llm.invoke(
messages=[{"role": "user", "content": "What's the weather in NYC?"}],
tools=tools,
tool_choice={"name": "get_weather"}, # Force this tool
)
if response.tool_calls:
tool_call = response.tool_calls[0]
print(tool_call["name"]) # "get_weather"
print(tool_call["input"]) # {"location": "NYC"}
Supported Providers¶
| Provider | DCAF_PROVIDER value | Default model |
|---|---|---|
| AWS Bedrock | bedrock | us.anthropic.claude-3-5-haiku-20241022-v1:0 |
| Anthropic | anthropic | claude-sonnet-4-20250514 |
| OpenAI | openai | gpt-4o |
| Azure OpenAI | azure | gpt-4o |
| Google Vertex AI | google | gemini-2.0-flash |
| Ollama | ollama | llama3 |
Usage in Channel Routing¶
The SlackResponseRouter uses the LLM layer for direct calls without any agent overhead:
from dcaf.core.llm import create_llm
llm = create_llm() # Reads DCAF_PROVIDER, DCAF_MODEL from env
response = llm.invoke(
messages=[{"role": "user", "content": thread_text}],
system_prompt=routing_prompt,
tools=[routing_tool_schema],
tool_choice={"name": "slack_routing_decision"},
max_tokens=200,
temperature=0.0,
)
should_respond = response.tool_calls[0]["input"]["should_respond"]
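The routing_tool_schema referenced above is defined by the router itself; an illustrative shape, assuming a single boolean decision field:
routing_tool_schema = {
    "name": "slack_routing_decision",
    "description": "Decide whether the bot should reply in this thread",
    "input_schema": {
        "type": "object",
        "properties": {
            "should_respond": {
                "type": "boolean",
                "description": "True if the bot should reply",
            }
        },
        "required": ["should_respond"],
    },
}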
See Also¶
- Environment Configuration — Full list of env vars
- Custom Agents — Agent orchestration layer
- Channel Routing — SlackResponseRouter usage
- Working with Bedrock
- Working with Google Vertex AI