Building an AI Agent From Scratch With the AI SDK

The AI SDK has changed a lot since the early days of streamText and useChat. Version 6, released in December 2025, introduced first-class agents, tool execution approval, MCP support, DevTools, reranking, and a number of smaller changes that together make it feel like a different library. It now has over 20 million monthly downloads, and companies like Thomson Reuters and Clay are running production agents on it.

This post starts with the basics and builds up to a complete agent with tools, structured output, and a streaming Next.js UI. If you already know the fundamentals, skip ahead.

The idea behind the AI SDK

Every LLM provider has its own API, its own response format, its own way of handling streams and tool calls. If you want to switch from OpenAI to Anthropic, you rewrite a lot of code. The AI SDK gives you a single interface that works across all of them. You change one line (the model string) and everything else stays the same.
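
To make the "single interface" idea concrete, here is a minimal sketch of the adapter pattern behind it. The two response types are trimmed down to the one field used here, and UnifiedResult, fromOpenAIStyle, and fromAnthropicStyle are hypothetical names for illustration, not part of the AI SDK:

```typescript
// Simplified response shapes, reduced to the fields used in this sketch.
// Real provider responses carry many more fields.
type OpenAIStyleResponse = {
  choices: { message: { content: string } }[];
};

type AnthropicStyleResponse = {
  content: { type: 'text'; text: string }[];
};

// Hypothetical unified result, standing in for what a shared interface returns.
type UnifiedResult = { text: string };

function fromOpenAIStyle(res: OpenAIStyleResponse): UnifiedResult {
  // The generated text lives on the first choice's message.
  return { text: res.choices[0]?.message.content ?? '' };
}

function fromAnthropicStyle(res: AnthropicStyleResponse): UnifiedResult {
  // The generated text is spread across content blocks.
  return { text: res.content.map(block => block.text).join('') };
}

const viaOpenAI = fromOpenAIStyle({ choices: [{ message: { content: 'Hello' } }] });
const viaAnthropic = fromAnthropicStyle({ content: [{ type: 'text', text: 'Hello' }] });

console.log(viaOpenAI.text === viaAnthropic.text); // true
```

Application code that only touches the unified shape does not care which adapter produced it, which is why swapping providers can come down to a one-line change.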

It has three main parts:

  • AI SDK Core handles text generation, structured data, tool calling, embeddings, image generation, transcription, speech, and more
  • AI SDK UI gives you React hooks like useChat and useCompletion for building chat interfaces
  • AI SDK RSC lets you stream React Server Components from the server (experimental)

Setting up

Install the core package and a provider. We will use Anthropic in this guide, but you can swap it for any other provider.

bash
pnpm install ai @ai-sdk/anthropic zod

Add your API key to .env.local:

bash
ANTHROPIC_API_KEY=your_key_here

That is all you need.

Generating text

The two core functions are generateText (waits for the full response) and streamText (streams tokens as they come in). Use generateText for background tasks and streamText for anything a user is looking at.

ts
import { generateText } from 'ai';

const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4.5',
  prompt: 'Explain what a monad is in one paragraph.',
});

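The model string follows the provider/model format. Plain strings like this are resolved through the SDK's default provider, which is the Vercel AI Gateway unless you configure a different one, so this form needs gateway credentials rather than the ANTHROPIC_API_KEY set above. To call Anthropic directly with that key, import the provider instead: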

ts
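import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const { text } = await generateText({
  // Anthropic's own model IDs use hyphens; 'claude-sonnet-4-5' is the Sonnet 4.5 alias
  model: anthropic('claude-sonnet-4-5'),
  prompt: 'Explain what a monad is in one paragraph.',
});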

For streaming:

ts
import { streamText } from 'ai';

const result = streamText({
  model: 'anthropic/claude-sonnet-4.5',
  prompt: 'Write a short story about a debugging session at 3am.',
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

streamText starts immediately and uses backpressure, meaning it only generates tokens as fast as you consume them.
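
If backpressure is new to you, the following analogy may help. It uses a plain synchronous generator rather than the SDK's async streams, and tokenSource and produced are hypothetical names, but the pull-based behavior is the same idea: nothing is produced until the consumer asks for it.

```typescript
// Counts how many tokens the producer has actually generated.
let produced = 0;

// A lazy producer: the loop body only advances when the consumer pulls the next value.
function* tokenSource(tokens: string[]): Generator<string> {
  for (const token of tokens) {
    produced += 1;
    yield token;
  }
}

const source = tokenSource(['Debugging', ' at', ' 3am', ' again']);

// Pull only two tokens; the remaining two are never generated.
const first = source.next().value;
const second = source.next().value;

console.log(first + second); // "Debugging at"
console.log(produced); // 2
```

The practical consequence for streamText is that you have to consume the stream for the generation to run to completion.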

Structured output

Sometimes you do not want free-form text. You want a JSON object that matches a specific shape. Earlier versions of the SDK handled this with generateObject and streamObject. In AI SDK 6, you pass an output specification to generateText or streamText instead: you define a Zod schema and the SDK makes sure the model's output conforms to it.

ts
import { generateText, Output } from 'ai';
import { z } from 'zod';

const result = await generateText({
  model: 'anthropic/claude-sonnet-4.5',
  output: Output.object({
    schema: z.object({
      title: z.string(),
      summary: z.string(),
      tags: z.array(z.string()),
      sentiment: z.enum(['positive', 'negative', 'neutral']),
    }),
  }),
  prompt: `Analyze this review: "The battery life is incredible but the camera is disappointing for the price."`,
});

console.log(result.output);

The Output object supports several formats: Output.object() for single objects, Output.array() for lists, Output.choice() for picking from options, Output.json() for unstructured JSON, and Output.text() for plain text (the default).

Since AI SDK 6, you can also use structured output together with tool calling in a single call. The model calls tools first, gathers information, and then returns a structured object at the end.

Tool calling

Tools are how you give a model the ability to do things beyond generating text. You define a tool with a description, an input schema, and an execute function. The model decides when to call it based on the conversation.

ts
import { tool } from 'ai';
import { z } from 'zod';

const weatherTool = tool({
  description: 'Get the current weather for a city',
  inputSchema: z.object({
    city: z.string().describe('The city name'),
  }),
  execute: async ({ city }) => {
    const res = await fetch(
      `https://api.weatherapi.com/v1/current.json?key=${process.env.WEATHER_API_KEY}&q=${city}`
    );
    const data = await res.json();
    return {
      temperature: data.current.temp_c,
      condition: data.current.condition.text,
      humidity: data.current.humidity,
    };
  },
});

You pass tools to generateText or streamText and the model will call them when it thinks it needs to:

ts
import { generateText, stepCountIs } from 'ai';

const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4.5',
  tools: { weather: weatherTool },
  stopWhen: stepCountIs(5), // allow up to 5 model/tool steps
  prompt: 'What is the weather like in Berlin and Tokyo right now?',
});

The stopWhen setting is important (older versions of the SDK called this option maxSteps). Without it, generation stops after a single step, so the model can request tool calls but never gets to read their results. With stopWhen: stepCountIs(5), the model can call tools, read the results, call more tools, and keep going until it has enough information to respond. This is what makes multi-step reasoning possible.
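
The loop itself is simple enough to sketch in plain TypeScript. This is a simplified model of the idea, not the SDK's implementation: the types, runToolLoop, and the scripted stand-in for a model are all hypothetical, and everything is synchronous to keep it short.

```typescript
// Hypothetical, simplified types; the SDK's real types are richer.
type ToolCall = { toolName: string; input: string };
type ModelTurn = { text?: string; toolCalls: ToolCall[] };
type Model = (history: string[]) => ModelTurn;
type Tools = Record<string, (input: string) => string>;

function runToolLoop(model: Model, tools: Tools, prompt: string, stepLimit: number) {
  const history: string[] = [prompt];
  let steps = 0;

  while (steps < stepLimit) {
    steps += 1;
    const turn = model(history);

    // No tool calls means the model has produced its final answer.
    if (turn.toolCalls.length === 0) {
      return { text: turn.text ?? '', steps };
    }

    // Run each requested tool and feed the result back for the next step.
    for (const call of turn.toolCalls) {
      const run = tools[call.toolName];
      const result = run ? run(call.input) : `unknown tool: ${call.toolName}`;
      history.push(`${call.toolName}(${call.input}) -> ${result}`);
    }
  }

  // Step limit reached before the model produced a final answer.
  return { text: '', steps };
}

// A scripted stand-in for a model: asks for the weather first, then answers.
const scriptedModel: Model = history => {
  const toolResults = history.slice(1);
  if (toolResults.length === 0) {
    return {
      toolCalls: [
        { toolName: 'weather', input: 'Berlin' },
        { toolName: 'weather', input: 'Tokyo' },
      ],
    };
  }
  return { text: `Done: ${toolResults.join('; ')}`, toolCalls: [] };
};

const fakeTools: Tools = { weather: city => `sunny in ${city}` };

console.log(runToolLoop(scriptedModel, fakeTools, 'Weather in Berlin and Tokyo?', 5).steps); // 2
console.log(runToolLoop(scriptedModel, fakeTools, 'Weather in Berlin and Tokyo?', 1).text); // ""
```

With a limit of one step, the tools run but the model never gets a second turn to use their results, which is why the final text comes back empty.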

Tool execution approval

In AI SDK 6, you can require human approval before a tool runs. This is critical for anything that has real-world consequences like deleting data, sending emails, or running shell commands.

ts
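import { unlink } from 'node:fs/promises';
import { tool } from 'ai';
import { z } from 'zod';

const deleteFileTool = tool({
  description: 'Delete a file from the filesystem',
  inputSchema: z.object({
    path: z.string().describe('The file path to delete'),
  }),
  // Pause and wait for a human decision before execute runs
  needsApproval: true,
  execute: async ({ path }) => {
    await unlink(path);
    return { deleted: path };
  },
});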

You can also make needsApproval a function that decides based on the input:

ts
needsApproval: async ({ path }) => path.includes('/production/'),

On the frontend, you check the invocation state and show approve/deny buttons. More on that when we build the UI.
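
Conceptually, the two forms of needsApproval reduce to one question asked before each tool call. Here is a minimal sketch of that decision, assuming a synchronous predicate for brevity; ApprovalRule and requiresApproval are hypothetical names, not SDK exports:

```typescript
// Hypothetical shape: approval is either a fixed flag or a predicate on the tool input.
type ApprovalRule<Input> = boolean | ((input: Input) => boolean);

function requiresApproval<Input>(rule: ApprovalRule<Input> | undefined, input: Input): boolean {
  // A predicate decides per call; a flag applies to every call; no rule means no approval step.
  if (typeof rule === 'function') return rule(input);
  return rule ?? false;
}

type DeleteInput = { path: string };

const productionOnly: ApprovalRule<DeleteInput> = ({ path }) => path.includes('/production/');

console.log(requiresApproval(productionOnly, { path: '/production/db.sqlite' })); // true
console.log(requiresApproval(productionOnly, { path: '/tmp/scratch.txt' })); // false
console.log(requiresApproval(true, { path: '/tmp/scratch.txt' })); // true
console.log(requiresApproval(undefined, { path: '/tmp/scratch.txt' })); // false
```

The predicate form is handy when only some inputs are risky, so routine calls go through without interrupting the user.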

Sending custom output to the model

By default, whatever your tool returns gets stringified as JSON and sent back to the model. That is fine for small payloads, but if your tool returns a 10KB document, you are wasting tokens. The toModelOutput function lets you control what the model actually sees:

ts
const searchTool = tool({
  description: 'Search the knowledge base',
  inputSchema: z.object({ query: z.string() }),
  execute: async ({ query }) => {
    const results = await searchKnowledgeBase(query);
    return results;
  },
  toModelOutput: async ({ output }) => ({
    type: 'text',
    value: output.map(r => `- ${r.title}: ${r.snippet}`).join('\n'),
  }),
});

The execute function returns the full data (for your app to use), and toModelOutput returns a condensed version for the model.
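
To get a feel for the difference, here is a small sketch comparing the two payloads. The SearchResult shape and the sample data are made up for illustration, with body standing in for a large document:

```typescript
// Hypothetical search result shape; `body` stands in for a large document.
type SearchResult = { title: string; snippet: string; url: string; body: string };

const sampleResults: SearchResult[] = [
  { title: 'Token caching', snippet: 'How prompt caching works', url: 'https://example.com/a', body: 'x'.repeat(5000) },
  { title: 'Rate limits', snippet: 'Requests per minute', url: 'https://example.com/b', body: 'y'.repeat(5000) },
];

// What the model would see by default: the whole result serialized as JSON.
const fullPayload = JSON.stringify(sampleResults);

// What the model sees after condensing: one short line per result.
const condensed = sampleResults.map(r => `- ${r.title}: ${r.snippet}`).join('\n');

console.log(fullPayload.length > 10000); // true
console.log(condensed.length < 100); // true
```

Character counts are only a rough proxy for tokens, but the gap of two orders of magnitude carries over.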

Building agents

Before AI SDK 6, you would pass tools and a step limit (maxSteps in older releases, stopWhen more recently) to generateText every time. That works, but when you want the same agent in a chat UI, a background job, and an API endpoint, you end up copying config everywhere.

The ToolLoopAgent class solves this. Define your agent once, use it anywhere:

ts
import { ToolLoopAgent } from 'ai';

const researchAgent = new ToolLoopAgent({
  model: 'anthropic/claude-sonnet-4.5',
  instructions: `You are a research assistant. Use the available tools to find 
    information and provide well-sourced answers. Always cite your sources.`,
  tools: {
    search: searchTool,
    weather: weatherTool,
    fetchPage: fetchPageTool,
  },
});

Now you can call it from anywhere:

ts
const result = await researchAgent.generate({
  prompt: 'What are the top 3 trending topics in AI this week?',
});

console.log(result.text);

Or stream it:

ts
const result = researchAgent.stream({
  prompt: 'What are the top 3 trending topics in AI this week?',
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

The agent loops automatically: it calls the model, executes tool calls, feeds results back, and repeats until the model is done (up to 20 steps by default).

Call options

You can make agents configurable per request using callOptionsSchema. This is useful for passing user context, selecting models dynamically, or injecting retrieved documents for RAG:

ts
import { ToolLoopAgent } from 'ai';
import { z } from 'zod';

const supportAgent = new ToolLoopAgent({
  model: 'anthropic/claude-sonnet-4.5',
  callOptionsSchema: z.object({
    userId: z.string(),
    accountType: z.enum(['free', 'pro', 'enterprise']),
  }),
  prepareCall: ({ options, ...settings }) => ({
    ...settings,
    instructions: `You are a support agent. The user has a ${options.accountType} account. Their ID is ${options.userId}.`,
  }),
  tools: { lookupOrder: lookupOrderTool },
});

const result = await supportAgent.generate({
  prompt: 'Where is my order?',
  options: { userId: 'usr_abc123', accountType: 'pro' },
});

Structured output from agents

You can combine agents with structured output. The agent calls tools to gather data, then returns a typed object at the end:

ts
import { ToolLoopAgent, Output } from 'ai';
import { z } from 'zod';

const analysisAgent = new ToolLoopAgent({
  model: 'anthropic/claude-sonnet-4.5',
  tools: { search: searchTool, fetchPage: fetchPageTool },
  output: Output.object({
    schema: z.object({
      summary: z.string(),
      sources: z.array(z.object({
        title: z.string(),
        url: z.string(),
        relevance: z.number().min(0).max(1),
      })),
      confidence: z.number().min(0).max(1),
    }),
  }),
});

const { output } = await analysisAgent.generate({
  prompt: 'What is the current state of WebAssembly support in browsers?',
});

console.log(output.summary);
console.log(output.sources);

Connecting to a Next.js UI

This is where everything comes together. We will build a chat interface that streams responses from our agent.

The agent definition

ts
// agents/research-agent.ts
import { ToolLoopAgent, InferAgentUIMessage } from 'ai';
import { searchTool } from '@/tools/search';
import { fetchPageTool } from '@/tools/fetch-page';

export const researchAgent = new ToolLoopAgent({
  model: 'anthropic/claude-sonnet-4.5',
  instructions: 'You are a helpful research assistant. Search the web when needed and provide sourced answers.',
  tools: {
    search: searchTool,
    fetchPage: fetchPageTool,
  },
});

export type ResearchAgentMessage = InferAgentUIMessage<typeof researchAgent>;

The API route

ts
// app/api/chat/route.ts
import { createAgentUIStreamResponse } from 'ai';
import { researchAgent } from '@/agents/research-agent';

export async function POST(request: Request) {
  const { messages } = await request.json();
  return createAgentUIStreamResponse({
    agent: researchAgent,
    uiMessages: messages,
  });
}

The chat page

tsx
// app/page.tsx
'use client';

import { useState } from 'react';
import { useChat } from '@ai-sdk/react';
import type { ResearchAgentMessage } from '@/agents/research-agent';

export default function ChatPage() {
  // useChat no longer manages the input field, so keep it in local state
  const [input, setInput] = useState('');
  const { messages, sendMessage, status } = useChat<ResearchAgentMessage>();
  // A request is in flight while it is submitted or streaming
  const isLoading = status === 'submitted' || status === 'streaming';

  return (
    <div>
      <div>
        {messages.map(message => (
          <div key={message.id}>
            <strong>{message.role}:</strong>
            {message.parts.map((part, i) => {
              switch (part.type) {
                case 'text':
                  return <p key={i}>{part.text}</p>;
                case 'tool-search':
                  return (
                    <div key={i}>
                      Searching: {part.input?.query}
                      {part.state === 'output-available' && (
                        <span> ({part.output.results.length} results)</span>
                      )}
                    </div>
                  );
                default:
                  return null;
              }
            })}
          </div>
        ))}
      </div>

      <form
        onSubmit={event => {
          event.preventDefault();
          if (!input.trim()) return;
          sendMessage({ text: input });
          setInput('');
        }}
      >
        <input
          value={input}
          onChange={event => setInput(event.target.value)}
          placeholder="Ask me anything..."
          disabled={isLoading}
        />
        <button type="submit" disabled={isLoading}>
          Send
        </button>
      </form>
    </div>
  );
}

The types flow end-to-end. The ResearchAgentMessage type is inferred from the agent definition, so when you switch on part.type, TypeScript knows exactly what properties are available. If your search tool returns { results: Array<{ title: string, url: string }> }, then part.output.results is typed correctly in your component.
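
The mechanism behind this is ordinary discriminated-union narrowing. Here is a hand-written stand-in for the inferred type, cut down to two tool states; the real inferred type has more part kinds and states, and Part and describePart are hypothetical names:

```typescript
// Hand-written stand-in for the inferred message part type.
type SearchOutput = { results: { title: string; url: string }[] };

type Part =
  | { type: 'text'; text: string }
  | { type: 'tool-search'; state: 'input-available'; input: { query: string } }
  | { type: 'tool-search'; state: 'output-available'; input: { query: string }; output: SearchOutput };

function describePart(part: Part): string {
  switch (part.type) {
    case 'text':
      return part.text;
    case 'tool-search':
      // `output` only exists once the state is narrowed to 'output-available'.
      return part.state === 'output-available'
        ? `Searched "${part.input.query}": ${part.output.results.length} results`
        : `Searching "${part.input.query}"...`;
  }
}

console.log(describePart({ type: 'tool-search', state: 'input-available', input: { query: 'wasm' } }));
// Searching "wasm"...
```

Accessing part.output outside the 'output-available' branch is a compile-time error, which is the same protection you get in the chat component.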

MCP support

The Model Context Protocol is a standard for connecting AI models to external tools and data sources. Think of it as a universal plug for AI integrations. Instead of writing custom tool implementations, you connect to an MCP server that already exposes tools.

AI SDK 6 has full MCP support through the @ai-sdk/mcp package:

ts
import { createMCPClient } from '@ai-sdk/mcp';

const mcpClient = await createMCPClient({
  transport: {
    type: 'http',
    url: 'https://your-mcp-server.com/mcp',
    headers: { Authorization: 'Bearer your-token' },
  },
});

const tools = await mcpClient.tools();

You can pass those tools directly to generateText, streamText, or a ToolLoopAgent. The client also supports OAuth authentication, resources (for reading data from the server), prompts (reusable templates), and elicitation (the server asking the user for input mid-operation).

DevTools

Debugging multi-step agent flows used to mean adding console.log everywhere and trying to piece together what happened. AI SDK DevTools gives you a visual inspector for every LLM call.

Wrap your model with the middleware:

ts
import { wrapLanguageModel, gateway } from 'ai';
import { devToolsMiddleware } from '@ai-sdk/devtools';

const model = wrapLanguageModel({
  model: gateway('anthropic/claude-sonnet-4.5'),
  middleware: devToolsMiddleware(),
});

Run npx @ai-sdk/devtools and open http://localhost:4983. You will see every step of every call: input, output, tool calls, token usage, timing, and raw provider requests.

Reranking

If you are building RAG (retrieval-augmented generation), you probably retrieve a bunch of documents and dump them all into the prompt. Reranking lets you sort them by relevance first, so the model gets better context:

ts
import { rerank } from 'ai';
import { cohere } from '@ai-sdk/cohere';

const query = 'How does token caching work?';

// searchVectorDB is your own retrieval function
const documents = await searchVectorDB(query);

const { ranking } = await rerank({
  model: cohere.reranking('rerank-v3.5'),
  documents: documents.map(d => d.content),
  query,
  topN: 5,
});

Now you pass only the top 5 most relevant documents to the model instead of all 50 you retrieved.
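
The selection step is easy to picture without a reranking model. The sketch below assumes you already have a relevance score per document; Scored and topDocuments are hypothetical names, and the shape is deliberately simpler than what rerank returns:

```typescript
// Hypothetical scored entry: an index into the original list plus a relevance score.
type Scored = { index: number; score: number };

function topDocuments(documents: string[], scores: Scored[], topN: number): string[] {
  return [...scores]
    .sort((a, b) => b.score - a.score) // highest relevance first
    .slice(0, topN)
    .map(entry => documents[entry.index])
    .filter((doc): doc is string => doc !== undefined); // skip out-of-range indices
}

const retrieved = ['Billing FAQ', 'Token caching guide', 'Cache invalidation notes'];
const scored: Scored[] = [
  { index: 0, score: 0.12 },
  { index: 1, score: 0.93 },
  { index: 2, score: 0.58 },
];

console.log(topDocuments(retrieved, scored, 2));
// [ 'Token caching guide', 'Cache invalidation notes' ]
```

The reranking model's job is to produce those scores; trimming to the top N is what keeps the prompt small.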

Stream smoothing

LLMs often emit tokens in bursts. You get a chunk of 10 words, then silence, then another burst. This makes the UI feel jittery. The smoothStream transform evens out the delivery:

ts
import { smoothStream, streamText } from 'ai';

const result = streamText({
  model: 'anthropic/claude-sonnet-4.5',
  prompt: 'Tell me about the history of the internet.',
  experimental_transform: smoothStream(),
});
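
To illustrate the re-chunking half of that idea, here is a rough sketch that turns uneven bursts into word-sized chunks. It is not how smoothStream is implemented, it leaves out the pacing delay between chunks, and rechunkByWord is a hypothetical helper:

```typescript
// Split buffered text into word-sized chunks, keeping trailing whitespace attached.
function rechunkByWord(bursts: string[]): string[] {
  const chunks: string[] = [];
  let buffer = '';

  for (const burst of bursts) {
    buffer += burst;
    // Emit every complete word: a run of non-space characters followed by whitespace.
    let match = /^\s*\S+\s+/.exec(buffer);
    while (match !== null) {
      chunks.push(match[0]);
      buffer = buffer.slice(match[0].length);
      match = /^\s*\S+\s+/.exec(buffer);
    }
  }

  // Flush whatever is left once the stream ends.
  if (buffer.length > 0) chunks.push(buffer);
  return chunks;
}

console.log(rechunkByWord(['The internet be', 'gan as ARPANET in 1969.']));
// [ 'The ', 'internet ', 'began ', 'as ', 'ARPANET ', 'in ', '1969.' ]
```

Note how the word split across the two bursts is held back until it is complete, so the UI never renders half a word.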

You can also write custom transforms. For example, a transform that converts all text to uppercase, or one that stops the stream if the model generates something inappropriate.

What changed from earlier versions

If you used the AI SDK before version 6, here is a quick summary of what is different:

  • Agents are first class. ToolLoopAgent replaces the pattern of passing tools and a step limit inline every time.
  • Tool execution approval. needsApproval on tools for human-in-the-loop workflows.
  • Output specification. generateObject and generateText with structured output are now unified through Output.object(), Output.array(), etc.
  • MCP is stable. Full support for connecting to MCP servers with HTTP, SSE, OAuth, resources, prompts, and elicitation.
  • DevTools. Visual debugger for LLM calls.
  • Reranking. Native rerank function for sorting documents by relevance.
  • Type-safe UI. InferAgentUIMessage gives you end-to-end types from agent definition to React component.
  • toModelOutput. Control what the model sees from tool results separately from what your app gets.
  • Standard JSON Schema. Any schema library that implements the Standard JSON Schema V1 spec works, not just Zod.