Building an AI Agent From Scratch With the AI SDK
The AI SDK has changed a lot since the early days of streamText and useChat. Version 6, released in December 2025, introduced first-class agents, tool execution approval, MCP support, DevTools, reranking, and a bunch of smaller things that make it feel like a different library. It now has over 20 million monthly downloads and companies like Thomson Reuters and Clay are running production agents on it.
This post starts with the basics and builds up to a complete agent with tools, structured output, and a streaming Next.js UI. If you already know the fundamentals, skip ahead.
The idea behind the AI SDK
Every LLM provider has its own API, its own response format, its own way of handling streams and tool calls. If you want to switch from OpenAI to Anthropic, you rewrite a lot of code. The AI SDK gives you a single interface that works across all of them. You change one line (the model string) and everything else stays the same.
It has three main parts:
- AI SDK Core handles text generation, structured data, tool calling, embeddings, image generation, transcription, speech, and more
- AI SDK UI gives you React hooks like useChat and useCompletion for building chat interfaces
- AI SDK RSC lets you stream React Server Components from the server (experimental)
Setting up
Install the core package and a provider. We will use Anthropic in this guide, but you can swap it for any other provider.
pnpm install ai @ai-sdk/anthropic zod
Add your API key to .env.local:
ANTHROPIC_API_KEY=your_key_here
That is all you need.
Generating text
The two core functions are generateText (waits for the full response) and streamText (streams tokens as they come in). Use generateText for background tasks and streamText for anything a user is looking at.
import { generateText } from 'ai';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4.5',
prompt: 'Explain what a monad is in one paragraph.',
});
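The model string follows the provider/model format. By default the SDK resolves these strings through the Vercel AI Gateway, which uses its own credentials (AI_GATEWAY_API_KEY) rather than your Anthropic key. To call Anthropic directly with ANTHROPIC_API_KEY, import the provider instead: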
import { anthropic } from '@ai-sdk/anthropic';
const { text } = await generateText({
model: anthropic('claude-sonnet-4-5'),
prompt: 'Explain what a monad is in one paragraph.',
});
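To make the provider/model format concrete, here is a minimal sketch of how such a string could be split into its two halves. This is not the SDK's implementation; parseModelId and ModelRef are hypothetical names used only for illustration.

```typescript
// Hypothetical helper, not part of the AI SDK.
interface ModelRef {
  provider: string;
  modelId: string;
}

// Splits "provider/model" at the first slash and rejects malformed ids.
function parseModelId(id: string): ModelRef {
  const slash = id.indexOf('/');
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), modelId: id.slice(slash + 1) };
}

const ref = parseModelId('anthropic/claude-sonnet-4.5');
console.log(ref.provider, ref.modelId);
```

Splitting at the first slash keeps any later slashes inside the model id, which matters for providers that use nested model names.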
For streaming:
import { streamText } from 'ai';
const result = streamText({
model: 'anthropic/claude-sonnet-4.5',
prompt: 'Write a short story about a debugging session at 3am.',
});
for await (const chunk of result.textStream) {
process.stdout.write(chunk);
}
streamText starts immediately and uses backpressure, meaning it only generates tokens as fast as you consume them.
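If backpressure is new to you, a pull-based generator is a reasonable mental model. The sketch below is an analogy, not how the SDK is built: tokenSource and takeTokens are made-up helpers, and a real model stream is asynchronous.

```typescript
// Counts how many tokens the source has actually produced.
let produced = 0;

// Stand-in for a token source; work happens only when the consumer pulls.
function* tokenSource(tokens: string[]): Generator<string> {
  for (const token of tokens) {
    produced += 1;
    yield token;
  }
}

// Pulls at most `count` tokens from the source.
function takeTokens(source: Generator<string>, count: number): string[] {
  const taken: string[] = [];
  for (let i = 0; i < count; i++) {
    const next = source.next();
    if (next.done) break;
    taken.push(next.value);
  }
  return taken;
}

// The consumer asks for two tokens, so only two are ever produced.
const firstTwo = takeTokens(tokenSource(['It', ' was', ' 3am', '.']), 2);
console.log(firstTwo.join(''), produced);
```

The remaining two tokens are never generated because nothing asked for them. That is the property backpressure gives you.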
Structured output
Sometimes you do not want free-form text. You want a JSON object that matches a specific shape. In AI SDK 6 you get this by passing an output specification to generateText or streamText (earlier versions used the separate generateObject and streamObject functions). You define a Zod schema and the SDK validates the model's output against it.
import { generateText, Output } from 'ai';
import { z } from 'zod';
const result = await generateText({
model: 'anthropic/claude-sonnet-4.5',
output: Output.object({
schema: z.object({
title: z.string(),
summary: z.string(),
tags: z.array(z.string()),
sentiment: z.enum(['positive', 'negative', 'neutral']),
}),
}),
prompt: `Analyze this review: "The battery life is incredible but the camera is disappointing for the price."`,
});
console.log(result.output);
The Output object supports several formats: Output.object() for single objects, Output.array() for lists, Output.choice() for picking from options, Output.json() for unstructured JSON, and Output.text() for plain text (the default).
Since AI SDK 6, you can also use structured output together with tool calling in a single call. The model calls tools first, gathers information, and then returns a structured object at the end.
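To see what "conforms to a schema" means in practice, here is a hand-written check for the review shape above. It is a rough sketch of the kind of validation a schema library does for you; parseReview is a hypothetical helper, and with the SDK you would rely on Zod instead.

```typescript
type Sentiment = 'positive' | 'negative' | 'neutral';

interface ReviewAnalysis {
  title: string;
  summary: string;
  tags: string[];
  sentiment: Sentiment;
}

const SENTIMENTS: readonly string[] = ['positive', 'negative', 'neutral'];

// Hypothetical validator: returns a typed object or throws on a mismatch.
function parseReview(raw: string): ReviewAnalysis {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== 'object' || value === null) {
    throw new Error('Expected a JSON object');
  }
  const v = value as Record<string, unknown>;
  if (typeof v.title !== 'string') throw new Error('title must be a string');
  if (typeof v.summary !== 'string') throw new Error('summary must be a string');
  if (!Array.isArray(v.tags) || !v.tags.every(t => typeof t === 'string')) {
    throw new Error('tags must be an array of strings');
  }
  if (typeof v.sentiment !== 'string' || SENTIMENTS.indexOf(v.sentiment) === -1) {
    throw new Error('sentiment must be positive, negative, or neutral');
  }
  return {
    title: v.title,
    summary: v.summary,
    tags: v.tags as string[],
    sentiment: v.sentiment as Sentiment,
  };
}

const review = parseReview(
  '{"title":"Mixed","summary":"Great battery, weak camera.","tags":["battery","camera"],"sentiment":"neutral"}'
);
console.log(review.sentiment);
```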
Tool calling
Tools are how you give a model the ability to do things beyond generating text. You define a tool with a description, an input schema, and an execute function. The model decides when to call it based on the conversation.
import { tool } from 'ai';
import { z } from 'zod';
const weatherTool = tool({
description: 'Get the current weather for a city',
inputSchema: z.object({
city: z.string().describe('The city name'),
}),
execute: async ({ city }) => {
const res = await fetch(
`https://api.weatherapi.com/v1/current.json?key=${process.env.WEATHER_API_KEY}&q=${city}`
);
const data = await res.json();
return {
temperature: data.current.temp_c,
condition: data.current.condition.text,
humidity: data.current.humidity,
};
},
});
You pass tools to generateText or streamText and the model will call them when it thinks it needs to:
import { generateText, stepCountIs } from 'ai';
const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4.5',
  tools: { weather: weatherTool },
  // allow up to five model/tool round trips
  stopWhen: stepCountIs(5),
  prompt: 'What is the weather like in Berlin and Tokyo right now?',
});
The stopWhen setting is important (earlier versions called it maxSteps). Without it, generation ends after a single step: the tool runs, but the model never gets to read the result. With stopWhen: stepCountIs(5), the model can call tools, read the results, call more tools, and keep going for up to five steps until it has enough information to respond. This is what makes multi-step reasoning possible.
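The loop that a step limit enables can be sketched without the SDK. Everything below is hypothetical (runToolLoop, FakeModel, the scripted model, and the canned weather result); it mimics the shape of the loop, not the SDK's internals.

```typescript
type ToolCall = { toolName: string; input: string };
type ModelTurn =
  | { type: 'tool-calls'; calls: ToolCall[] }
  | { type: 'text'; text: string };
type ToolFn = (input: string) => Promise<string>;
// A fake model decides its next turn from the tool results gathered so far.
type FakeModel = (toolResults: string[]) => ModelTurn;

async function runToolLoop(
  model: FakeModel,
  tools: Map<string, ToolFn>,
  stepLimit: number,
): Promise<{ text: string; steps: number }> {
  const toolResults: string[] = [];
  for (let step = 1; step <= stepLimit; step++) {
    const turn = model(toolResults);
    if (turn.type === 'text') {
      return { text: turn.text, steps: step }; // the model is done
    }
    for (const call of turn.calls) {
      const run = tools.get(call.toolName);
      if (run === undefined) throw new Error(`Unknown tool: ${call.toolName}`);
      toolResults.push(await run(call.input)); // feed results into the next step
    }
  }
  return { text: '', steps: stepLimit }; // limit reached before a final answer
}

// Scripted stand-in for an LLM: asks about two cities, then answers.
const fakeModel: FakeModel = results =>
  results.length === 0
    ? {
        type: 'tool-calls',
        calls: [
          { toolName: 'weather', input: 'Berlin' },
          { toolName: 'weather', input: 'Tokyo' },
        ],
      }
    : { type: 'text', text: `Results: ${results.join('; ')}` };

const fakeTools = new Map<string, ToolFn>([
  ['weather', async city => `${city}: 18C`],
]);

runToolLoop(fakeModel, fakeTools, 5).then(r => console.log(r.text, r.steps));
```

With a limit of 1 the same call returns an empty text, because the loop ends right after the tools run and before the model can use their results.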
Tool execution approval
In AI SDK 6, you can require human approval before a tool runs. This is critical for anything that has real-world consequences like deleting data, sending emails, or running shell commands.
import fs from 'node:fs/promises';
const deleteFileTool = tool({
description: 'Delete a file from the filesystem',
inputSchema: z.object({
path: z.string().describe('The file path to delete'),
}),
needsApproval: true,
execute: async ({ path }) => {
await fs.unlink(path);
return { deleted: path };
},
});
You can also make needsApproval a function that decides based on the input:
needsApproval: async ({ path }) => path.includes('/production/'),
On the frontend, you check the invocation state and show approve/deny buttons. More on that when we build the UI.
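The gating logic itself is small. Here is a minimal sketch of it, assuming a simplified synchronous tool; GatedTool, callWithApproval, and the status names are invented for this example, and the SDK's real flow runs through the UI message stream rather than a direct function call.

```typescript
type ApprovalRule<I> = boolean | ((input: I) => boolean);

interface GatedTool<I, O> {
  needsApproval: ApprovalRule<I>;
  execute: (input: I) => O;
}

type GateResult<O> =
  | { status: 'executed'; output: O }
  | { status: 'approval-requested' }
  | { status: 'denied' };

// Runs the tool only if no approval is needed or the user approved.
function callWithApproval<I, O>(
  tool: GatedTool<I, O>,
  input: I,
  decision?: 'approve' | 'deny',
): GateResult<O> {
  const required =
    typeof tool.needsApproval === 'function'
      ? tool.needsApproval(input)
      : tool.needsApproval;
  if (required && decision === undefined) return { status: 'approval-requested' };
  if (required && decision === 'deny') return { status: 'denied' };
  return { status: 'executed', output: tool.execute(input) };
}

// In-memory stand-in for the filesystem so the sketch has no side effects.
const deleted: string[] = [];
const deleteFile: GatedTool<{ path: string }, { deleted: string }> = {
  needsApproval: ({ path }) => path.includes('/production/'),
  execute: ({ path }) => {
    deleted.push(path);
    return { deleted: path };
  },
};

console.log(callWithApproval(deleteFile, { path: '/tmp/a.txt' }).status);
console.log(callWithApproval(deleteFile, { path: '/production/db.sql' }).status);
```

The key property is that execute is never reached for a gated input until a decision arrives.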
Sending custom output to the model
By default, whatever your tool returns gets stringified as JSON and sent back to the model. That is fine for small payloads, but if your tool returns a 10KB document, you are wasting tokens. The toModelOutput function lets you control what the model actually sees:
const searchTool = tool({
description: 'Search the knowledge base',
inputSchema: z.object({ query: z.string() }),
execute: async ({ query }) => {
const results = await searchKnowledgeBase(query);
return results;
},
toModelOutput: async ({ output }) => ({
type: 'text',
value: output.map(r => `- ${r.title}: ${r.snippet}`).join('\n'),
}),
});
The execute function returns the full data (for your app to use), and toModelOutput returns a condensed version for the model.
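To get a feel for the savings, you can compare the two representations directly. This sketch uses made-up search results and a hypothetical toModelText helper that mirrors the mapping above; the sizes depend entirely on your data.

```typescript
interface SearchResult {
  title: string;
  snippet: string;
  body: string; // full document text, useful to the app but not to the model
}

// Same condensing rule as the toModelOutput example: one line per result.
function toModelText(results: SearchResult[]): string {
  return results.map(r => `- ${r.title}: ${r.snippet}`).join('\n');
}

// Made-up data: two results, each carrying a large body.
const results: SearchResult[] = [
  { title: 'Token caching', snippet: 'Reuses prompt prefixes.', body: 'x'.repeat(5000) },
  { title: 'Rate limits', snippet: 'Requests per minute.', body: 'y'.repeat(5000) },
];

const fullJson = JSON.stringify(results); // what the model would see by default
const condensed = toModelText(results); // what the model sees with a custom output

console.log(fullJson.length, condensed.length);
```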
Building agents
Before AI SDK 6, you would pass tools and a step limit to generateText every time. That works, but when you want the same agent in a chat UI, a background job, and an API endpoint, you end up copying config everywhere.
The ToolLoopAgent class solves this. Define your agent once, use it anywhere:
import { ToolLoopAgent } from 'ai';
const researchAgent = new ToolLoopAgent({
model: 'anthropic/claude-sonnet-4.5',
instructions: `You are a research assistant. Use the available tools to find
information and provide well-sourced answers. Always cite your sources.`,
tools: {
search: searchTool,
weather: weatherTool,
fetchPage: fetchPageTool,
},
});
Now you can call it from anywhere:
const result = await researchAgent.generate({
prompt: 'What are the top 3 trending topics in AI this week?',
});
console.log(result.text);
Or stream it:
const result = researchAgent.stream({
prompt: 'What are the top 3 trending topics in AI this week?',
});
for await (const chunk of result.textStream) {
process.stdout.write(chunk);
}
The agent loops automatically: it calls the model, executes tool calls, feeds results back, and repeats until the model is done (up to 20 steps by default).
Call options
You can make agents configurable per request using callOptionsSchema. This is useful for passing user context, selecting models dynamically, or injecting retrieved documents for RAG:
import { ToolLoopAgent } from 'ai';
import { z } from 'zod';
const supportAgent = new ToolLoopAgent({
model: 'anthropic/claude-sonnet-4.5',
callOptionsSchema: z.object({
userId: z.string(),
accountType: z.enum(['free', 'pro', 'enterprise']),
}),
prepareCall: ({ options, ...settings }) => ({
...settings,
instructions: `You are a support agent. The user has a ${options.accountType} account. Their ID is ${options.userId}.`,
}),
tools: { lookupOrder: lookupOrderTool },
});
const result = await supportAgent.generate({
prompt: 'Where is my order?',
options: { userId: 'usr_abc123', accountType: 'pro' },
});
Structured output from agents
You can combine agents with structured output. The agent calls tools to gather data, then returns a typed object at the end:
import { ToolLoopAgent, Output } from 'ai';
import { z } from 'zod';
const analysisAgent = new ToolLoopAgent({
model: 'anthropic/claude-sonnet-4.5',
tools: { search: searchTool, fetchPage: fetchPageTool },
output: Output.object({
schema: z.object({
summary: z.string(),
sources: z.array(z.object({
title: z.string(),
url: z.string(),
relevance: z.number().min(0).max(1),
})),
confidence: z.number().min(0).max(1),
}),
}),
});
const { output } = await analysisAgent.generate({
prompt: 'What is the current state of WebAssembly support in browsers?',
});
console.log(output.summary);
console.log(output.sources);
Connecting to a Next.js UI
This is where everything comes together. We will build a chat interface that streams responses from our agent.
The agent definition
// agents/research-agent.ts
import { ToolLoopAgent, InferAgentUIMessage } from 'ai';
import { searchTool } from '@/tools/search';
import { fetchPageTool } from '@/tools/fetch-page';
export const researchAgent = new ToolLoopAgent({
model: 'anthropic/claude-sonnet-4.5',
instructions: 'You are a helpful research assistant. Search the web when needed and provide sourced answers.',
tools: {
search: searchTool,
fetchPage: fetchPageTool,
},
});
export type ResearchAgentMessage = InferAgentUIMessage<typeof researchAgent>;
The API route
// app/api/chat/route.ts
import { createAgentUIStreamResponse } from 'ai';
import { researchAgent } from '@/agents/research-agent';
export async function POST(request: Request) {
const { messages } = await request.json();
return createAgentUIStreamResponse({
agent: researchAgent,
uiMessages: messages,
});
}
The chat page
// app/page.tsx
'use client';
import { useState } from 'react';
import { useChat } from '@ai-sdk/react';
import type { ResearchAgentMessage } from '@/agents/research-agent';
export default function ChatPage() {
  // useChat no longer manages the input field, so keep it in local state
  const [input, setInput] = useState('');
  const { messages, sendMessage, status } = useChat<ResearchAgentMessage>();
  const isLoading = status === 'submitted' || status === 'streaming';
  return (
    <div>
      <div>
        {messages.map(message => (
          <div key={message.id}>
            <strong>{message.role}:</strong>
            {message.parts.map((part, i) => {
              switch (part.type) {
                case 'text':
                  return <p key={i}>{part.text}</p>;
                case 'tool-search':
                  return (
                    <div key={i}>
                      {/* input can be incomplete while the call is still streaming */}
                      Searching: {part.input?.query}
                      {part.state === 'output-available' && (
                        <span> ({part.output.results.length} results)</span>
                      )}
                    </div>
                  );
                default:
                  return null;
              }
            })}
          </div>
        ))}
      </div>
      <form
        onSubmit={event => {
          event.preventDefault();
          if (!input.trim()) return;
          sendMessage({ text: input });
          setInput('');
        }}
      >
        <input
          value={input}
          onChange={event => setInput(event.target.value)}
          placeholder="Ask me anything..."
          disabled={isLoading}
        />
        <button type="submit" disabled={isLoading}>
          Send
        </button>
      </form>
    </div>
  );
}
The types flow end-to-end. The ResearchAgentMessage type is inferred from the agent definition, so when you switch on part.type, TypeScript knows exactly what properties are available. If your search tool returns { results: Array<{ title: string, url: string }> }, then part.output.results is typed correctly in your component.
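The narrowing works because message parts form a discriminated union. Here is a hand-written, simplified version of that idea; the real types are inferred by InferAgentUIMessage and cover more states than the two shown, and MessagePart and describePart are names made up for this sketch.

```typescript
type SearchOutput = { results: Array<{ title: string; url: string }> };

// Simplified stand-in for the inferred part types.
type MessagePart =
  | { type: 'text'; text: string }
  | { type: 'tool-search'; state: 'input-available'; input: { query: string } }
  | {
      type: 'tool-search';
      state: 'output-available';
      input: { query: string };
      output: SearchOutput;
    };

// Switching on `type`, then checking `state`, unlocks `output` safely.
function describePart(part: MessagePart): string {
  switch (part.type) {
    case 'text':
      return part.text;
    case 'tool-search':
      return part.state === 'output-available'
        ? `Searching: ${part.input.query} (${part.output.results.length} results)`
        : `Searching: ${part.input.query}`;
  }
}

console.log(
  describePart({
    type: 'tool-search',
    state: 'output-available',
    input: { query: 'wasm' },
    output: { results: [{ title: 'WebAssembly', url: 'https://webassembly.org' }] },
  }),
);
```

Accessing part.output before checking the state would be a compile error here, which is the same protection the inferred types give you in the component.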
MCP support
The Model Context Protocol is a standard for connecting AI models to external tools and data sources. Think of it as a universal plug for AI integrations. Instead of writing custom tool implementations, you connect to an MCP server that already exposes tools.
AI SDK 6 has full MCP support through the @ai-sdk/mcp package:
import { createMCPClient } from '@ai-sdk/mcp';
const mcpClient = await createMCPClient({
transport: {
type: 'http',
url: 'https://your-mcp-server.com/mcp',
headers: { Authorization: 'Bearer your-token' },
},
});
const tools = await mcpClient.tools();
You can pass those tools directly to generateText, streamText, or a ToolLoopAgent. The client also supports OAuth authentication, resources (for reading data from the server), prompts (reusable templates), and elicitation (the server asking the user for input mid-operation).
DevTools
Debugging multi-step agent flows used to mean adding console.log everywhere and trying to piece together what happened. AI SDK DevTools gives you a visual inspector for every LLM call.
Wrap your model with the middleware:
import { wrapLanguageModel, gateway } from 'ai';
import { devToolsMiddleware } from '@ai-sdk/devtools';
const model = wrapLanguageModel({
model: gateway('anthropic/claude-sonnet-4.5'),
middleware: devToolsMiddleware(),
});
Run npx @ai-sdk/devtools and open http://localhost:4983. You will see every step of every call: input, output, tool calls, token usage, timing, and raw provider requests.
Reranking
If you are building RAG (retrieval-augmented generation), you probably retrieve a bunch of documents and dump them all into the prompt. Reranking lets you sort them by relevance first, so the model gets better context:
import { rerank } from 'ai';
import { cohere } from '@ai-sdk/cohere';
const query = 'How does token caching work?';
// searchVectorDB is your own retrieval function
const documents = await searchVectorDB(query);
const { ranking } = await rerank({
  model: cohere.reranking('rerank-v3.5'),
  documents: documents.map(d => d.content),
  query,
  topN: 5,
});
Now you pass only the top 5 most relevant documents to the model instead of all 50 you retrieved.
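If you want to see the shape of the operation without calling a reranking model, here is a local sketch. The word-overlap scorer is a crude stand-in for a real reranker, and rerankLocally, overlapScore, and the field names on RankedEntry are assumptions made for this example.

```typescript
interface RankedEntry {
  originalIndex: number; // position of the document in the input array
  score: number;
}

// Stand-in scorer: fraction of query words that appear in the document.
function overlapScore(query: string, document: string): number {
  const words = query.toLowerCase().split(/\W+/).filter(w => w.length > 0);
  if (words.length === 0) return 0;
  const text = document.toLowerCase();
  const hits = words.filter(w => text.indexOf(w) !== -1).length;
  return hits / words.length;
}

// Scores every document, sorts by score descending, keeps the best topN.
function rerankLocally(query: string, documents: string[], topN: number): RankedEntry[] {
  return documents
    .map((doc, originalIndex) => ({ originalIndex, score: overlapScore(query, doc) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

const docs = [
  'Billing FAQ for enterprise plans',
  'How token caching reduces latency',
  'Caching strategies for HTTP responses',
];
const top = rerankLocally('How does token caching work?', docs, 2);
// Map ranking entries back to the documents you will put in the prompt.
const context = top.map(entry => docs[entry.originalIndex]);
console.log(context);
```

Keeping the original index in each entry is what lets you map scores back to your own document objects, including metadata the reranker never saw.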
Stream smoothing
LLMs often emit tokens in bursts. You get a chunk of 10 words, then silence, then another burst. This makes the UI feel jittery. The smoothStream transform evens out the delivery:
import { smoothStream, streamText } from 'ai';
const result = streamText({
model: 'anthropic/claude-sonnet-4.5',
prompt: 'Tell me about the history of the internet.',
experimental_transform: smoothStream(),
});
You can also write custom transforms. For example, a transform that converts all text to uppercase, or one that stops the stream if the model generates something inappropriate.
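Both ideas can be sketched with async generators over a simplified chunk type. Treat this as an illustration of the concept only: the SDK's own transforms are built on TransformStream and richer chunk types, and Chunk, upperCase, stopOn, fromArray, and collectText are all made up for this example.

```typescript
// Simplified chunk shape, assumed for this sketch.
type Chunk = { type: 'text-delta'; text: string } | { type: 'finish' };

// Rewrites every text chunk to uppercase and passes other chunks through.
async function* upperCase(source: AsyncIterable<Chunk>): AsyncGenerator<Chunk> {
  for await (const chunk of source) {
    if (chunk.type === 'text-delta') {
      const upper: Chunk = { type: 'text-delta', text: chunk.text.toUpperCase() };
      yield upper;
    } else {
      yield chunk;
    }
  }
}

// Ends the stream early when a blocked word shows up in a text chunk.
async function* stopOn(source: AsyncIterable<Chunk>, blocked: string): AsyncGenerator<Chunk> {
  for await (const chunk of source) {
    if (
      chunk.type === 'text-delta' &&
      chunk.text.toLowerCase().indexOf(blocked.toLowerCase()) !== -1
    ) {
      const finish: Chunk = { type: 'finish' };
      yield finish;
      return; // stop consuming the source
    }
    yield chunk;
  }
}

// Helpers for trying the transforms on canned data.
async function* fromArray(chunks: Chunk[]): AsyncGenerator<Chunk> {
  for (const chunk of chunks) yield chunk;
}

async function collectText(source: AsyncIterable<Chunk>): Promise<string> {
  let text = '';
  for await (const chunk of source) {
    if (chunk.type === 'text-delta') text += chunk.text;
  }
  return text;
}

const sample: Chunk[] = [
  { type: 'text-delta', text: 'hello ' },
  { type: 'text-delta', text: 'world' },
  { type: 'finish' },
];
collectText(upperCase(fromArray(sample))).then(text => console.log(text));
```

Note that stopOn only inspects one chunk at a time, so a blocked word split across two chunks would slip through. A production filter would need to buffer.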
What changed from earlier versions
If you used the AI SDK before version 6, here is a quick summary of what is different:
- Agents are first class. ToolLoopAgent replaces the pattern of passing tools and maxSteps inline every time.
- Tool execution approval. needsApproval on tools for human-in-the-loop workflows.
- Output specification. generateObject and generateText with structured output are now unified through Output.object(), Output.array(), etc.
- MCP is stable. Full support for connecting to MCP servers with HTTP, SSE, OAuth, resources, prompts, and elicitation.
- DevTools. Visual debugger for LLM calls.
- Reranking. Native rerank function for sorting documents by relevance.
- Type-safe UI. InferAgentUIMessage gives you end-to-end types from agent definition to React component.
- toModelOutput. Control what the model sees from tool results separately from what your app gets.
- Standard JSON Schema. Any schema library that implements the Standard JSON Schema V1 spec works, not just Zod.
