The agent loop you need fits in about 30 lines of TypeScript. The rest is knowing what breaks at scale — and why.

This tutorial builds a working agent with @anthropic-ai/sdk v0.100.1 from scratch: you wire up your first tool call, extend it into a proper agentic loop, add session and persistent memory, then expose the whole thing as an MCP server that Claude Desktop and Claude Code can discover without any extra client-specific code.

Who this is for

TypeScript developers who want to ship a real agent — not a chatbot wrapper. You should be comfortable with async/await and have Node.js 18+ installed.

What you’re building

By the end you’ll have a runnable TypeScript agent that calls tools across multiple turns, keeps session memory in the message array, optionally persists memory across sessions with pgvector, and exposes its tools through an MCP server that any MCP client can use.

Install the packages:

npm install @anthropic-ai/sdk@0.100.1 @modelcontextprotocol/sdk@1.29.0

Set ANTHROPIC_API_KEY in your environment. The SDK reads it automatically.

Step 1: Define a tool

A tool is a JSON Schema object you pass to the API alongside your message. Claude reads the description, decides whether the tool is relevant, and returns a tool_use block when it wants to call it.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from env

const tools: Anthropic.Tool[] = [
  {
    name: "get_weather",
    description:
      "Get the current weather for a city. Returns temperature in Celsius and a short condition string. Use when the user asks about weather at a specific location. Does not forecast future conditions.",
    input_schema: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "City and state, e.g. San Francisco, CA",
        },
      },
      required: ["location"],
    },
    strict: true,
  },
];

The description is the most important field. Claude uses it to decide when — and when not — to call the tool. Three to four sentences works best: what the tool does, what it returns, when to use it, and what it doesn’t cover.

strict: true guarantees that Claude’s tool inputs match your JSON Schema. Without it, you can occasionally get missing required fields or unexpected types at runtime. Turn it on.

Failure mode: Vague descriptions produce wrong tool selections. If your agent calls get_weather for non-weather questions, tighten the description first, before touching the code. Tool-selection mistakes are usually description problems, not code problems.

Step 2: Make a single tool call

The four-step handshake: send your request with the tools array, extract the tool_use block from the response, run the tool in your code, send the result back.

// Step 1: send user request with tools
const response = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  tools,
  tool_choice: { type: "auto" },
  messages: [
    { role: "user", content: "What is the weather in Tokyo?" },
  ],
});

// Step 2: extract the tool_use block — it may NOT be at index 0
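const toolUse = response.content.find(
  (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
);
if (!toolUse) {
  // stop_reason was probably "end_turn": Claude answered without calling a tool
  throw new Error(`Expected a tool call, got stop_reason=${response.stop_reason}`);
}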

// Step 3: run your tool (call a real weather API here)
const weatherData = { temperature: 22, condition: "Partly cloudy" };

// Step 4: return the result
const followup = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  tools,
  messages: [
    { role: "user", content: "What is the weather in Tokyo?" },
    { role: "assistant", content: response.content },
    {
      role: "user",
      content: [
        {
          type: "tool_result",
          tool_use_id: toolUse.id,
          content: JSON.stringify(weatherData),
        },
      ],
    },
  ],
});

const reply = followup.content.find((b) => b.type === "text");
if (reply?.type === "text") console.log(reply.text);

Failure mode: A response can contain a text block before the tool_use block — Claude often narrates what it’s about to do. Never index with response.content[0]. Always filter by block.type === "tool_use". If stop_reason is "end_turn" instead of "tool_use", Claude decided no tool was needed; check your tool descriptions.
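A small helper makes the filter-by-type rule hard to forget. This is a sketch: the block types are simplified local stand-ins for the SDK's content block union (only text and tool_use are modeled), so it runs without the SDK installed, and partitionContent is a hypothetical name.

```typescript
// Simplified stand-ins for the SDK's content block types (assumption: only the
// fields this helper reads are modeled).
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock | { type: string };

// Split a response's content array by block type instead of trusting positions.
function partitionContent(content: ContentBlock[]): {
  text: string;
  toolUses: ToolUseBlock[];
} {
  const texts: string[] = [];
  const toolUses: ToolUseBlock[] = [];
  for (const block of content) {
    if (block.type === "text") texts.push((block as TextBlock).text);
    else if (block.type === "tool_use") toolUses.push(block as ToolUseBlock);
    // Any other block type (e.g. thinking) is ignored here.
  }
  return { text: texts.join(""), toolUses };
}
```

With a narrating response, text holds the narration and toolUses still finds the call, whatever its index.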

Failure mode: tool_result blocks must go in a user message immediately after the assistant’s tool_use message. Sending them in an assistant message causes an API error.
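If you build or edit histories by hand, a pre-flight check can catch a misplaced tool_result before the API does. A minimal sketch, assuming simplified local message types rather than the SDK's MessageParam; checkToolResultPlacement is a hypothetical helper that enforces the rule above: tool_result blocks belong in a user message that directly follows the assistant message containing the matching tool_use ids.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };
type Message = { role: "user" | "assistant"; content: string | Block[] };

// Returns a list of problems; an empty list means placement looks valid.
function checkToolResultPlacement(messages: Message[]): string[] {
  const problems: string[] = [];
  messages.forEach((msg, i) => {
    if (typeof msg.content === "string") return;
    const results = msg.content.filter(
      (b): b is Extract<Block, { type: "tool_result" }> => b.type === "tool_result"
    );
    if (results.length === 0) return;
    if (msg.role !== "user") {
      problems.push(`message ${i}: tool_result blocks must be in a user message`);
      return;
    }
    // Collect the tool_use ids from the immediately preceding assistant message
    const prev = messages[i - 1];
    const prevIds = new Set<string>();
    if (prev && prev.role === "assistant" && typeof prev.content !== "string") {
      for (const b of prev.content) if (b.type === "tool_use") prevIds.add(b.id);
    }
    for (const r of results) {
      if (!prevIds.has(r.tool_use_id)) {
        problems.push(`message ${i}: ${r.tool_use_id} has no matching tool_use just before it`);
      }
    }
  });
  return problems;
}
```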

Step 3: The TypeScript AI agent loop

A single round-trip is fine for one-shot tasks. For anything multi-step, like “book four weekly standup meetings”, you need a loop. The loop keeps calling the API while stop_reason is "tool_use" and stops on anything else, normally "end_turn".
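One gap to close first: runTool below handles create_calendar_event, but the tools array from Step 1 only defines get_weather, so Claude has no way to call it. A possible definition follows. The field names (title, start, duration_minutes, attendees) are illustrative assumptions, not a real calendar API, and runTool only reads title. It is typed structurally so it runs on its own; in the tutorial you would add it to the tools array.

```typescript
// Hypothetical calendar tool; field names are illustrative, not a real API.
const calendarTool = {
  name: "create_calendar_event",
  description:
    "Create a single calendar event and invite attendees. Returns the new event_id and a status string. Use once per occurrence when the user asks to schedule meetings; call it repeatedly for a recurring series. Does not check attendee availability.",
  input_schema: {
    type: "object" as const,
    properties: {
      title: { type: "string", description: "Event title, e.g. Weekly standup" },
      start: {
        type: "string",
        description: "Start time in ISO 8601 with offset, e.g. 2026-01-05T09:00:00-08:00",
      },
      duration_minutes: { type: "integer", description: "Length of the event in minutes" },
      attendees: {
        type: "array",
        items: { type: "string" },
        description: "Attendee email addresses",
      },
    },
    required: ["title", "start", "duration_minutes", "attendees"],
    additionalProperties: false,
  },
  strict: true,
};

// In the tutorial: tools.push(calendarTool) next to get_weather
```

The description follows the same four-part pattern as get_weather: what it does, what it returns, when to use it, and what it does not cover.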

function runTool(name: string, input: Record<string, unknown>): unknown {
  if (name === "get_weather") {
    return { temperature: 22, condition: "Partly cloudy" };
  }
  if (name === "create_calendar_event") {
    return { event_id: "evt_abc123", status: "created", title: input.title };
  }
  return { error: `Unknown tool: ${name}` };
}

// The messages array is the agent's session memory — grow it each turn
const messages: Anthropic.MessageParam[] = [
  {
    role: "user",
    content:
      "Schedule weekly standups for 4 Mondays starting next week at 9am. Invite alice@, bob@, [email protected].",
  },
];

let response = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  tools,
  messages,
});

while (response.stop_reason === "tool_use") {
  // Claude may request multiple tools in a single turn
  const toolUseBlocks = response.content.filter(
    (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
  );

  const toolResults = toolUseBlocks.map((toolUse) => ({
    type: "tool_result" as const,
    tool_use_id: toolUse.id,
    content: JSON.stringify(
      runTool(toolUse.name, toolUse.input as Record<string, unknown>)
    ),
  }));

  // Append the full assistant turn and all tool results to history
  messages.push({ role: "assistant", content: response.content });
  messages.push({ role: "user", content: toolResults });

  response = await client.messages.create({
    model: "claude-opus-4-8",
    max_tokens: 1024,
    tools,
    messages,
  });
}

// Claude's final natural-language response
for (const block of response.content) {
  if (block.type === "text") console.log(block.text);
}

stop_reason stays "tool_use" while Claude needs another tool call and becomes "end_turn" when it is done. Your code does not need to know in advance how many turns that takes.
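The loop above has no upper bound, so a tool that keeps failing can keep Claude retrying, and a response that stops on "max_tokens" leaves the loop as if it had finished. A hedged sketch of a bounded variant: runAgentLoop and its createMessage parameter are hypothetical names, the API call is injected so the loop can run against a stub, and the types are simplified stand-ins for the SDK's. With the real client, createMessage would wrap client.messages.create({ model, max_tokens, tools, messages }), and you may need to adapt the types to Anthropic.MessageParam.

```typescript
// Simplified stand-ins for the SDK's message and content types.
type Text = { type: "text"; text: string };
type ToolUse = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };
type Msg = { role: "user" | "assistant"; content: string | Array<Text | ToolUse | ToolResult> };
type Reply = { stop_reason: string | null; content: Array<Text | ToolUse> };

async function runAgentLoop(
  messages: Msg[],
  createMessage: (messages: Msg[]) => Promise<Reply>, // e.g. wraps client.messages.create
  runTool: (name: string, input: Record<string, unknown>) => unknown,
  maxTurns = 10
): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await createMessage(messages);

    if (response.stop_reason === "max_tokens") {
      // The answer (or a tool call) was cut off mid-generation
      throw new Error("Response truncated: raise max_tokens");
    }
    // Keep every assistant turn, including the final one, in session memory
    messages.push({ role: "assistant", content: response.content });

    if (response.stop_reason !== "tool_use") {
      return response.content
        .filter((b): b is Text => b.type === "text")
        .map((b) => b.text)
        .join("");
    }

    // Answer every tool_use from this turn in one user message
    const results: ToolResult[] = response.content
      .filter((b): b is ToolUse => b.type === "tool_use")
      .map((b) => ({
        type: "tool_result" as const,
        tool_use_id: b.id,
        content: JSON.stringify(runTool(b.name, b.input)),
      }));
    messages.push({ role: "user", content: results });
  }
  throw new Error(`Agent did not finish within ${maxTurns} turns`);
}
```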

Failure mode: Processing only the first tool_use block when Claude returns several. Claude can pack multiple independent tool calls into one turn. Filter for all of them, run all of them, return all results in a single user message. If you send them piecemeal, the API errors on the broken conversation structure.
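When tools do real I/O, the calls Claude packs into one turn can also run concurrently. A sketch, assuming each tool is an async function looked up in a registry object (runToolsConcurrently is a hypothetical name). Promise.all keeps results in the same order as the tool_use blocks, and all of them still go back in a single user message. Only do this when the calls in a turn don't depend on each other's side effects.

```typescript
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type AsyncTool = (input: Record<string, unknown>) => Promise<unknown>;

// Run every tool_use from one assistant turn concurrently and build the single
// user message that answers all of them.
async function runToolsConcurrently(
  toolUses: ToolUseBlock[],
  registry: Record<string, AsyncTool>
): Promise<{ role: "user"; content: ToolResultBlock[] }> {
  const content = await Promise.all(
    toolUses.map(async (toolUse): Promise<ToolResultBlock> => {
      const tool = registry[toolUse.name];
      const output = tool
        ? await tool(toolUse.input)
        : { error: `Unknown tool: ${toolUse.name}` };
      return {
        type: "tool_result",
        tool_use_id: toolUse.id,
        content: JSON.stringify(output),
      };
    })
  );
  return { role: "user", content };
}
```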

Failure mode: Not telling Claude when a tool fails. Signal errors with is_error: true:

{
  type: "tool_result",
  tool_use_id: toolUse.id,
  content: "ConnectionError: calendar API returned HTTP 503. Retry after 30 seconds.",
  is_error: true,
}

Claude will adapt — usually retry with a correction or explain the situation. A bare "failed" gives Claude nothing to work with; it will often apologize and stop instead of trying again.
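To make that the default rather than something each tool has to remember, you can wrap every call so thrown exceptions become is_error results carrying the error's name and message. A sketch; safeToolResult is a hypothetical helper and the result type is a simplified version of the SDK's.

```typescript
type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};

// Run a tool and turn any thrown error into an is_error result Claude can act on.
async function safeToolResult(
  toolUseId: string,
  run: () => unknown
): Promise<ToolResultBlock> {
  try {
    const output = await run();
    return { type: "tool_result", tool_use_id: toolUseId, content: JSON.stringify(output) };
  } catch (err) {
    // Keep the error's name and message: a bare "failed" gives Claude nothing to work with
    const message = err instanceof Error ? `${err.name}: ${err.message}` : String(err);
    return { type: "tool_result", tool_use_id: toolUseId, content: message, is_error: true };
  }
}
```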

Step 4: Memory

Session memory

The messages array above is session memory. Every turn sees the full conversation history. Zero infrastructure required. This is sufficient for single-session, short-lived agents.

Limit: claude-opus-4-8 has a 1M-token context window. A loop that calls tools with large results — database queries, web pages, file contents — can fill it fast. Track usage.input_tokens on each response. When it approaches ~750K, summarize old turns:

// Replace old history with a condensed summary message
const summaryMessage: Anthropic.MessageParam = {
  role: "user",
  content:
    "[Context summary: user requested 4 weekly standups. Events created: Oct 6, 13, 20, 27 at 9am with alice@, bob@, [email protected].]",
};
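// Keep only the last 10 messages and prepend the summary. The first kept message
// must not be a tool_result turn whose tool_use was cut, or the API rejects it.
messages.splice(0, Math.max(0, messages.length - 10), summaryMessage);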
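Two details the snippet above leaves to you: when to compact, and where to cut. If the first kept message is a user turn of tool_result blocks whose tool_use was dropped, the API rejects the request, so the cut should land on a plain user message. A sketch under those assumptions: the 750K threshold comes from above, shouldCompact, findSafeCut, and compact are hypothetical names, and generating the summary text (for example, by asking Claude to summarize the dropped turns) is left to the caller.

```typescript
type Block = { type: string; [key: string]: unknown };
type Message = { role: "user" | "assistant"; content: string | Block[] };

const COMPACT_AT_INPUT_TOKENS = 750_000; // threshold suggested above

// Call with response.usage after each API call.
function shouldCompact(usage: { input_tokens: number }): boolean {
  return usage.input_tokens >= COMPACT_AT_INPUT_TOKENS;
}

// A user message that is not answering tool calls is a safe place to start history.
function isPlainUserMessage(m: Message): boolean {
  return (
    m.role === "user" &&
    (typeof m.content === "string" || m.content.every((b) => b.type !== "tool_result"))
  );
}

// Index of the first message to keep: at least `keep` recent messages, moved earlier
// (keeping more, never fewer) until it lands on a plain user message. 0 = drop nothing.
function findSafeCut(messages: Message[], keep = 10): number {
  let cut = Math.max(0, messages.length - Math.max(1, keep));
  while (cut > 0 && !isPlainUserMessage(messages[cut])) cut--;
  return cut;
}

// Replace messages[0..cut) with a summary folded into the first kept user message,
// which also avoids sending two consecutive user turns.
function compact(messages: Message[], cut: number, summary: string): Message[] {
  if (cut === 0) return messages;
  const first = messages[cut];
  const note = `[Context summary: ${summary}]`;
  const merged: Message = {
    role: "user",
    content:
      typeof first.content === "string"
        ? `${note}\n\n${first.content}`
        : [{ type: "text", text: note }, ...first.content],
  };
  return [merged, ...messages.slice(cut + 1)];
}
```

With the const messages array from Step 3, one way to apply it: compute cut with findSafeCut(messages), summarize messages.slice(0, cut), then call messages.splice(0, messages.length, ...compact(messages, cut, summary)).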

Persistent cross-session memory

For agents that need to remember context across separate sessions, embed past turns, store them in Postgres with pgvector, retrieve semantically relevant context before each new turn, and inject it into the system prompt.

npm install pg
npm install -D @types/pg   # TypeScript type definitions for pg
# Neon (serverless Postgres with pgvector pre-installed) removes the setup overhead

-- Run this once in your Postgres instance
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE memories (
  id        SERIAL PRIMARY KEY,
  user_id   TEXT NOT NULL,
  content   TEXT NOT NULL,
  embedding vector(1024) NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);
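-- HNSW has no training step, so it can be built on an empty table (pgvector 0.5.0+);
-- an ivfflat index created before any rows exist gives poor recall
CREATE INDEX ON memories USING hnsw (embedding vector_cosine_ops);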

import Anthropic from "@anthropic-ai/sdk";
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });
const client = new Anthropic();

async function embedText(text: string): Promise<number[]> {
  // Use any embedding API — voyage-4 (Voyage AI), text-embedding-3-small (OpenAI, needs vector(1536)), etc.
  // Must return a vector whose length matches the column dimension above (1024)
  throw new Error("Implement with your chosen embedding provider");
}

async function saveMemory(userId: string, content: string): Promise<void> {
  const embedding = await embedText(content);
  await db.query(
    "INSERT INTO memories (user_id, content, embedding) VALUES ($1, $2, $3::vector)",
    [userId, content, JSON.stringify(embedding)]
  );
}

async function recallMemories(
  userId: string,
  query: string,
  topK = 5
): Promise<string[]> {
  const queryEmbedding = await embedText(query);
  const { rows } = await db.query(
    `SELECT content
     FROM memories
     WHERE user_id = $1
     ORDER BY embedding <=> $2::vector   -- cosine distance operator (pgvector)
     LIMIT $3`,
    [userId, JSON.stringify(queryEmbedding), topK]
  );
  return rows.map((r: { content: string }) => r.content);
}

async function agentTurn(userId: string, userMessage: string): Promise<string> {
  const memories = await recallMemories(userId, userMessage);
  const memoryBlock =
    memories.length > 0
      ? `\n\nRelevant context from past sessions:\n${memories.map((m) => `- ${m}`).join("\n")}`
      : "";

  const response = await client.messages.create({
    model: "claude-opus-4-8",
    max_tokens: 1024,
    system: `You are a helpful assistant.${memoryBlock}`,
    messages: [{ role: "user", content: userMessage }],
  });

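  // Collect text blocks by type rather than trusting content[0]
  const reply = response.content
    .filter((b): b is Anthropic.TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("");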

  // Save asynchronously — don't block the response
  saveMemory(userId, `User: ${userMessage}\nAssistant: ${reply}`).catch(
    console.error
  );

  return reply;
}

Failure mode: Awaiting saveMemory before returning the reply. The write is non-critical for this turn; let it happen in the background.
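Fire-and-forget has one catch: if the process exits before the write settles (a deploy, or a serverless function returning), that memory is lost. A hedged sketch of a small tracker that keeps the response path fast but lets you await pending writes on shutdown; PendingWrites is a hypothetical helper, not part of any library.

```typescript
// Tracks in-flight background writes so they can be awaited before exit.
class PendingWrites {
  private inflight = new Set<Promise<void>>();

  track(write: Promise<void>): void {
    const p: Promise<void> = write
      .catch((err) => console.error("memory write failed:", err))
      .finally(() => this.inflight.delete(p));
    this.inflight.add(p);
  }

  // Resolves once every write tracked so far has settled.
  async flush(): Promise<void> {
    await Promise.all(Array.from(this.inflight));
  }

  get size(): number {
    return this.inflight.size;
  }
}
```

In agentTurn you would call pending.track(saveMemory(...)) instead of attaching .catch directly, and await pending.flush() from your shutdown handler (for example, on SIGTERM).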

Failure mode: Swapping embedding providers mid-project without migrating the column. Dimensions must match exactly: pgvector rejects a 1536-dimensional embedding written to a vector(1024) column. Even when sizes happen to match, vectors from different models are not comparable, so re-embed every stored row when you switch.
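A sketch of embedText against Voyage AI's HTTP embeddings endpoint, with a dimension guard so a model swap fails loudly at write time instead of silently at query time. Assumptions: the voyage-3 model, which returns 1024-dimensional vectors by default to match the vector(1024) column; a VOYAGE_API_KEY environment variable; and an injectable fetchImpl parameter so it can be tested without network access. Check Voyage's current documentation for model names and dimensions before relying on it.

```typescript
const EMBEDDING_DIM = 1024; // must match vector(1024) in the memories table

// Minimal fetch signature so tests can pass a stub; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function embedText(
  text: string,
  inputType: "document" | "query" = "document",
  fetchImpl: FetchLike = fetch,
  model = "voyage-3" // assumption: 1024-dimensional output by default
): Promise<number[]> {
  const res = await fetchImpl("https://api.voyageai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ input: [text], model, input_type: inputType }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: HTTP ${res.status}`);
  const body = (await res.json()) as { data: { embedding: number[] }[] };
  const embedding = body.data[0].embedding;
  // Fail at write time, not query time, if the model and the column disagree
  if (embedding.length !== EMBEDDING_DIM) {
    throw new Error(`Expected ${EMBEDDING_DIM} dimensions, got ${embedding.length}`);
  }
  return embedding;
}
```

Pass "query" as inputType from recallMemories and "document" from saveMemory; Voyage uses input_type to tailor embeddings for retrieval.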

If you want a higher-level abstraction over this pattern: Mem0 (mem0ai) supports 20+ vector store backends (Qdrant, Chroma, Redis, Pinecone). Mastra (@mastra/mem0) wraps similar retrieval with a TypeScript-first API. Both add a dependency for what is otherwise a handful of SQL queries and an HTTP call — evaluate whether the abstraction is worth it for your use case.

Step 5: Expose your tools as an MCP server

MCP (Model Context Protocol) lets you publish your agent’s tools as a standardized server. Claude Desktop, Claude Code, and any other MCP client can discover and call them without any client-specific integration work on your end.

Pin to @modelcontextprotocol/sdk v1.29.0. v2 is pre-alpha as of June 2026.

npm install zod   # MCP SDK uses Zod for input/output schema validation

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import * as z from "zod";

const server = new McpServer(
  { name: "my-agent-tools", version: "1.0.0" },
  {
    // Optional usage hint — Claude reads this to understand how to use your tools together
    instructions:
      "Provides weather lookups and calendar event creation. Call get_weather before create_calendar_event when scheduling weather-dependent events.",
  }
);

server.registerTool(
  "get_weather",
  {
    title: "Get weather",
    description:
      "Get current weather for a city. Returns temperature in Celsius and a short condition string.",
    inputSchema: z.object({
      location: z.string().describe("City and state, e.g. San Francisco, CA"),
    }),
    outputSchema: z.object({
      temperature: z.number(),
      condition: z.string(),
    }),
  },
  async ({ location }) => {
    // Call a real weather API here
    const result = { temperature: 22, condition: "Partly cloudy" };
    return {
      content: [{ type: "text", text: JSON.stringify(result) }],
      structuredContent: result,
    };
  }
);

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
}

main().catch(console.error);
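The handler above builds the same payload twice, once as a JSON text block for clients that only read content and once as structuredContent to match outputSchema. With several tools, a small helper keeps the two in step. A sketch; toToolResult is a hypothetical name, and the return shape mirrors the handler above rather than importing the SDK's result type.

```typescript
// Build an MCP tool result that carries the same data as text and as structuredContent.
function toToolResult<T extends Record<string, unknown>>(data: T): {
  content: { type: "text"; text: string }[];
  structuredContent: T;
} {
  return {
    content: [{ type: "text", text: JSON.stringify(data) }],
    structuredContent: data,
  };
}

// Inside a registerTool handler:
//   async ({ location }) => toToolResult({ temperature: 22, condition: "Partly cloudy" })
```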

For remote deployments — accessible over the network from multiple clients — swap in the HTTP transport:

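import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import http from "node:http";

// Stateless mode: sessionIdGenerator: undefined and a fresh transport per request.
// buildMcpServer() is a hypothetical factory that constructs the McpServer and
// calls registerTool as above; one McpServer should not be shared by concurrent transports.
const httpServer = http.createServer(async (req, res) => {
  const mcp = buildMcpServer();
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined,
  });
  res.on("close", () => {
    transport.close();
    mcp.close();
  });
  await mcp.connect(transport);
  await transport.handleRequest(req, res);
});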

httpServer.listen(3000);

Transport guide:

Target	Transport
Claude Desktop / Claude Code (local process)	`StdioServerTransport`
Remote MCP server (any network client)	`StreamableHTTPServerTransport`
Backwards compatibility only	HTTP + SSE — don’t start new projects here

Failure mode: Confusing MCP with the direct API tool use from Steps 1–3. Direct @anthropic-ai/sdk tool use is how your agent calls functions inside its own process. MCP is how you publish those tools so other agents or clients — running in a different process or on a different machine — can discover and call them. You use both in the same project for different reasons.

Failure mode: Omitting the instructions field on McpServer. Claude reads it to understand ordering and dependencies between your tools. Without it, clients have no context beyond the individual tool descriptions.

For a complete step-by-step walkthrough of scaffolding and registering a TypeScript MCP server with Claude Code, see How to build an MCP server for Claude Code.

When to use a framework instead

Two worth knowing.

Vercel AI SDK v6: choose this if you need a single API across multiple model providers (Claude, GPT-4o, Gemini) without rewriting the loop each time, or if you’re building on Next.js and want built-in streaming hooks (useChat, useCompletion). The trade-off: Claude-specific features such as extended thinking, strict: true schemas, and Anthropic server tools like web_search sit behind provider-specific options, where supported, rather than in the shared API.

LangChain.js v1 / LangGraph: choose this if your agent needs stateful, graph-based orchestration such as human-in-the-loop approval, branching decision trees, or multi-agent handoffs. For a straight tool-calling loop, the abstraction adds overhead without payoff. LangGraph is a common choice for graph-based agent architectures in TypeScript.

For everything in this tutorial, direct @anthropic-ai/sdk is the right call. You see the full message array, the exact tool_use blocks, and the real API shapes. When something breaks at 3am, that transparency matters.

Where this breaks at scale

Rate limits

By default the SDK retries connection errors and HTTP 408, 409, 429, and 5xx responses twice, with short exponential backoff. That is enough for brief bursts. For sustained quota exhaustion or tighter control, override maxRetries on the client or per request:

// Disable retries entirely (e.g. to implement your own logic)
const client = new Anthropic({ maxRetries: 0 });

// Or increase retries for a specific call
const response = await client.messages.create(
  { model: "claude-opus-4-8", max_tokens: 1024, tools, messages },
  { maxRetries: 5 }
);
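If you disable retries to take over, your own logic needs at least exponential backoff with jitter. A minimal sketch; withRetry is a hypothetical helper that reads only a numeric status field from the thrown error (the SDK's API errors carry one), retries the same statuses the SDK does by default, and rethrows errors without a status, such as connection failures, which you may want to add.

```typescript
// Same HTTP statuses the SDK retries by default
const isRetryable = (status: number) =>
  status === 408 || status === 409 || status === 429 || status >= 500;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `call` on retryable HTTP statuses with exponential backoff and full jitter.
async function withRetry<T>(
  call: () => Promise<T>,
  { attempts = 5, baseMs = 500, maxMs = 30_000 } = {}
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status =
        err && typeof err === "object" ? (err as { status?: unknown }).status : undefined;
      const retryable = typeof status === "number" && isRetryable(status);
      if (!retryable || attempt >= attempts) throw err;
      // Full jitter: wait a random time up to the capped exponential delay
      const cap = Math.min(maxMs, baseMs * 2 ** (attempt - 1));
      await sleep(Math.random() * cap);
    }
  }
}
```

For example: withRetry(() => client.messages.create({ model: "claude-opus-4-8", max_tokens: 1024, tools, messages })), with the client created using maxRetries: 0 so the two layers don't multiply.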

Token overhead from tool definitions

Your tool definitions are sent as input tokens on every request, so with 20 tools the overhead compounds across a long session. Cache definitions that don’t change between turns:

const tools: Anthropic.Tool[] = [
  {
    name: "get_weather",
    description: "...",
    input_schema: { /* ... */ },
    cache_control: { type: "ephemeral" }, // default 5-minute lifetime, refreshed on each cache hit
  },
];

Prompt caching is particularly useful in long-running agent sessions where the tool set is stable but the message history grows. For a production deep-dive on caching, batching, and rate limits, see Claude API 2026: Prompt Caching, Tool Use & Batches.
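A cache breakpoint covers the whole prompt prefix up to and including the marked block, and tools come first in that prefix, so one cache_control on the last tool caches the full tool list. A sketch of a helper that adds it without mutating your definitions; withCachedTools is a hypothetical name, the Tool type is a simplified stand-in for Anthropic.Tool, and prefixes shorter than the model's minimum cacheable length are not cached at all.

```typescript
type Tool = {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>;
  cache_control?: { type: "ephemeral" };
};

// Return a copy of `tools` with a cache breakpoint on the final definition.
function withCachedTools<T extends Tool>(tools: T[]): T[] {
  if (tools.length === 0) return tools;
  const last = tools[tools.length - 1];
  return [...tools.slice(0, -1), { ...last, cache_control: { type: "ephemeral" } }];
}
```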

Context window exhaustion

1M tokens sounds generous until a tool returns 50KB of JSON. Track usage.input_tokens on each response; when it crosses ~750K, trigger the summarization approach from Step 4. An agent that hits the context limit mid-task will fail with a hard API error, not a graceful degradation.
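One way to keep a single oversized result from eating the window is to cap it before it enters history and tell Claude what was cut. A sketch; capToolResult is a hypothetical helper, the 20,000-character default is an arbitrary assumption to tune, and a character count only approximates tokens.

```typescript
// Truncate a serialized tool result and say so, so Claude knows data is missing.
function capToolResult(serialized: string, maxChars = 20_000): string {
  if (serialized.length <= maxChars) return serialized;
  const omitted = serialized.length - maxChars;
  return (
    serialized.slice(0, maxChars) +
    `\n[truncated: ${omitted} more characters omitted; request a narrower query for the rest]`
  );
}
```

In the Step 3 loop this would wrap the serialized output, as in content: capToolResult(JSON.stringify(runTool(...))). A truncated JSON string is no longer valid JSON, which is why the appended note spells out what happened.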

Parallelism

A single agent instance is sequential: one API call at a time. For parallel workloads (process 100 support tickets simultaneously), run multiple instances against a shared task queue. BullMQ + Redis is a proven pattern for this; see best job queue for Node.js and TypeScript in 2026 for a full comparison of BullMQ, Trigger.dev, and Inngest. On the MCP side, StreamableHTTPServerTransport scales horizontally when it runs in stateless mode (sessionIdGenerator: undefined); with session IDs enabled, each client has to keep reaching the instance that holds its session.
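Inside one process you can still overlap several independent agent runs, because each loop spends most of its time waiting on the API; a concurrency cap keeps the total under your rate limit. This is a simpler in-process alternative, not a replacement for a durable queue, which survives crashes and spreads work across machines. mapWithConcurrency is a hypothetical helper.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each lane pulls the next unclaimed index until the list is exhausted
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++; // safe: this read-and-increment runs synchronously
      results[i] = await worker(items[i], i);
    }
  }
  const laneCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

For example, mapWithConcurrency(tickets, 5, (ticket) => handleTicket(ticket)), where handleTicket is your own function that runs the agent loop for one ticket.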

Prompt injection via tool results

Tool results can contain content from untrusted sources: web pages, database rows, third-party API responses. An attacker who controls a tool result can embed instructions redirecting Claude’s behavior. Keep untrusted content inside tool_result blocks — don’t copy it into system prompts or plain user messages. The tool_result block is a structural signal to Claude that the content is external data, not trusted instructions from you.

How to build an AI agent in TypeScript — tools, memory, MCP