The agent loop you need fits in about 30 lines of TypeScript. The rest is knowing what breaks at scale, and why.

This guide builds a working agent from scratch with @anthropic-ai/sdk v0.100.1: you wire up your first tool call, extend it into a proper agentic loop, add session memory and persistent memory, then expose everything as an MCP server so that Claude Desktop and Claude Code can discover it without client-specific code.

Who this is for

TypeScript developers who want to ship a real agent to production, not a chatbot wrapper. You should be comfortable with async/await and have Node.js 18+ installed.

What you'll build

By the end of this guide you will have a runnable TypeScript agent: tool calls across multiple turns, session memory held directly in the messages array, optional persistent memory via pgvector, and an MCP server that exposes the agent's tools to any MCP client.

Install the packages:

npm install @anthropic-ai/sdk@0.100.1 @modelcontextprotocol/sdk@1.29.0

Set ANTHROPIC_API_KEY in your environment. The SDK reads it automatically.

Step 1: Define a tool

A tool is a name, a description, and a JSON Schema for its input, sent to the API alongside your messages. Claude reads the description, decides whether the tool fits, and returns a tool_use block when it wants to call it.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from env

const tools: Anthropic.Tool[] = [
  {
    name: "get_weather",
    description:
      "Get the current weather for a city. Returns temperature in Celsius and a short condition string. Use when the user asks about weather at a specific location. Does not forecast future conditions.",
    input_schema: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "City and state, e.g. San Francisco, CA",
        },
      },
      required: ["location"],
    },
    strict: true,
  },
];

The description is the most important field. Claude relies on it to decide when, and when not, to call the tool. Three to four sentences work best: what the tool does, what it returns, when to use it, and what its limits are.

strict: true guarantees that the tool input Claude produces always conforms to your JSON Schema. Without it, you can hit missing required fields or wrong data types at runtime. Turn it on.

Common pitfall: A vague description leads to the wrong tool being chosen. If your agent calls get_weather for questions unrelated to weather, tighten the description first; don't rush to change the code. Model behavior problems are almost never code bugs.

Step 2: Make a single tool call

The flow has four steps: send a request with the tools array, extract the tool_use block from the response, run the tool in your own code, and send the result back.

// Step 1: send the request with tools
const response = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  tools,
  tool_choice: { type: "auto" },
  messages: [
    { role: "user", content: "What is the weather in Tokyo?" },
  ],
});

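// Step 2: extract the tool_use block; it may NOT be at index 0
const toolUse = response.content.find(
  (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
);
// Fail loudly if Claude answered without calling the tool
if (!toolUse) throw new Error(`No tool_use block (stop_reason: ${response.stop_reason})`);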

// Step 3: run the tool (call a real weather API here)
const weatherData = { temperature: 22, condition: "Partly cloudy" };

// Step 4: send the result back
const followup = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  tools,
  messages: [
    { role: "user", content: "What is the weather in Tokyo?" },
    { role: "assistant", content: response.content },
    {
      role: "user",
      content: [
        {
          type: "tool_result",
          tool_use_id: toolUse.id,
          content: JSON.stringify(weatherData),
        },
      ],
    },
  ],
});

const reply = followup.content.find((b) => b.type === "text");
if (reply?.type === "text") console.log(reply.text);

Common pitfall: The response may contain a text block before the tool_use block, because Claude often describes what it is about to do. Never use response.content[0]. Always filter on block.type === "tool_use". If stop_reason is "end_turn" instead of "tool_use", Claude decided it did not need the tool; review the tool description.

Common pitfall: tool_result blocks must go in a user message immediately after the assistant message that contains the tool_use. Putting them in an assistant message causes an API error.
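
The first pitfall can be folded into one small helper. This is a minimal sketch that uses hypothetical local types (ContentBlock, ToolUseBlock) and a hypothetical function name (pendingToolCalls) mirroring the shape of the API's content blocks, so that it runs without the SDK; it is not part of the SDK itself.

```typescript
// Hypothetical local types that mirror the shape of API content blocks.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock;

// Returns every tool_use block in a response, regardless of position.
// An empty array means Claude answered without requesting a tool.
function pendingToolCalls(content: ContentBlock[]): ToolUseBlock[] {
  return content.filter(
    (block): block is ToolUseBlock => block.type === "tool_use"
  );
}

// A text block precedes the tool call here, so content[0] would be the wrong block.
const sample: ContentBlock[] = [
  { type: "text", text: "Let me check the weather." },
  { type: "tool_use", id: "toolu_01", name: "get_weather", input: { location: "Tokyo" } },
];
const calls = pendingToolCalls(sample);
console.log(calls.map((c) => c.name));
```

With the real SDK you would pass response.content and narrow on Anthropic.ToolUseBlock instead of the local type.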

Step 3: The TypeScript AI agent loop

A single request-response round is enough for a one-shot task. For anything multi-step, such as "schedule a weekly standup for 4 weeks", you need a loop. The loop keeps running while stop_reason is "tool_use" and normally ends on "end_turn".

function runTool(name: string, input: Record<string, unknown>): unknown {
  if (name === "get_weather") {
    return { temperature: 22, condition: "Partly cloudy" };
  }
  if (name === "create_calendar_event") {
    return { event_id: "evt_abc123", status: "created", title: input.title };
  }
  return { error: `Unknown tool: ${name}` };
}

// The messages array is the agent's session memory; it grows with every turn
const messages: Anthropic.MessageParam[] = [
  {
    role: "user",
    content:
      "Schedule weekly standups for 4 Mondays starting next week at 9am. Invite alice@, bob@, [email protected].",
  },
];

let response = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  tools,
  messages,
});

while (response.stop_reason === "tool_use") {
  // Claude may request several tools in one turn
  const toolUseBlocks = response.content.filter(
    (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
  );

  const toolResults = toolUseBlocks.map((toolUse) => ({
    type: "tool_result" as const,
    tool_use_id: toolUse.id,
    content: JSON.stringify(
      runTool(toolUse.name, toolUse.input as Record<string, unknown>)
    ),
  }));

  // Append the full assistant turn and the tool results to the history
  messages.push({ role: "assistant", content: response.content });
  messages.push({ role: "user", content: toolResults });

  response = await client.messages.create({
    model: "claude-opus-4-8",
    max_tokens: 1024,
    tools,
    messages,
  });
}

// Claude's final natural-language response
for (const block of response.content) {
  if (block.type === "text") console.log(block.text);
}

stop_reason alternates between "tool_use" (more calls needed) and "end_turn" (Claude is done). Your code does not need to know in advance how many turns there will be.
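
The loop above exits on any stop_reason other than "tool_use", which also covers "max_tokens", where the reply was cut off rather than finished. A minimal sketch of a guard follows, assuming a hypothetical nextStep helper and a turn cap of your choosing (maxTurns is not an SDK option):

```typescript
type Step = "run_tools" | "done" | "truncated" | "abort";

// Hypothetical helper: decides what the loop should do after each response.
// `turn` counts completed tool rounds, starting at 0.
function nextStep(stopReason: string | null, turn: number, maxTurns: number): Step {
  if (stopReason === "tool_use") {
    // Cap the number of tool rounds so a confused agent cannot loop forever.
    return turn < maxTurns ? "run_tools" : "abort";
  }
  // The reply hit max_tokens and is incomplete; raise the limit or continue.
  if (stopReason === "max_tokens") return "truncated";
  // Every other stop reason is treated as finished in this sketch.
  return "done";
}

console.log(nextStep("tool_use", 3, 10));
console.log(nextStep("max_tokens", 1, 10));
```

In the loop you would call nextStep(response.stop_reason, turn, maxTurns) after each response and only run tools when it returns "run_tools".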

Common pitfall: Handling only the first tool_use block when Claude returns several. Claude can place multiple independent tool calls in one turn. Filter all of them, run all of them, and return all the results in a single user message. Sending them one at a time makes the API reject the request with a conversation-structure error.

Common pitfall: Not telling Claude when a tool fails. Use is_error: true to signal it:

{
  type: "tool_result",
  tool_use_id: toolUse.id,
  content: "ConnectionError: calendar API returned HTTP 503. Retry after 30 seconds.",
  is_error: true,
}

Claude handles it from there, typically by retrying with a correction or by explaining the situation. If you return only "failed", Claude has nothing to work with; it will usually apologize and stop instead of retrying.
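
One way to make this automatic is to wrap every tool invocation so that a thrown error becomes a descriptive is_error result. The sketch below assumes a hypothetical toToolResult helper and a local ToolResult type that mirrors the tool_result block; it is synchronous for brevity, whereas real tools are usually async.

```typescript
// Hypothetical local type that mirrors the tool_result block shape.
type ToolResult = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};

// Runs a tool and converts a thrown error into an is_error result that
// names the failure, so Claude has something concrete to act on.
function toToolResult(toolUseId: string, run: () => unknown): ToolResult {
  try {
    const content = JSON.stringify(run()) ?? "null";
    return { type: "tool_result", tool_use_id: toolUseId, content };
  } catch (err) {
    const detail = err instanceof Error ? `${err.name}: ${err.message}` : String(err);
    return { type: "tool_result", tool_use_id: toolUseId, content: detail, is_error: true };
  }
}

const ok = toToolResult("toolu_01", () => ({ temperature: 22 }));
const failed = toToolResult("toolu_02", () => {
  throw new Error("calendar API returned HTTP 503");
});
console.log(ok.content, failed.content);
```

Include retry hints such as "Retry after 30 seconds" in the error message when you have them, since the message text is all Claude sees.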

Step 4: Memory

Session memory

The messages array above is the session memory. Every turn sees the full conversation history. No extra infrastructure is needed. This is enough for agents that run a single short session.

The limit: claude-opus-4-8 has a 1M-token context window. A loop that calls tools returning large results (database queries, web pages, file contents) can fill it quickly. Track usage.input_tokens on every response. As you approach ~750K, summarize the older turns:

// Replace the older history with a summary message
const summaryMessage: Anthropic.MessageParam = {
  role: "user",
  content:
    "[Context summary: user requested 4 weekly standups. Events created: Oct 6, 13, 20, 27 at 9am with alice@, bob@, [email protected].]",
};
// Keep the 10 most recent messages; put the summary at the front.
// Check that the cut does not separate a tool_result from its tool_use.
messages.splice(0, messages.length - 10, summaryMessage);
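
A fixed cut of "the last 10 messages" can begin with a tool_result whose matching tool_use was just removed, which the API rejects. The sketch below shows one way to avoid that, using hypothetical local types and a hypothetical trimHistory helper; unlike the splice above, it returns a new array instead of mutating in place.

```typescript
// Hypothetical local types that mirror the shape of the messages array.
type Block = { type: string };
type Message = { role: "user" | "assistant"; content: string | Block[] };

// True when a message carries tool_result blocks tied to the previous assistant turn.
function hasToolResult(m: Message): boolean {
  return Array.isArray(m.content) && m.content.some((b) => b.type === "tool_result");
}

// Keeps about the last `keep` messages behind a summary. If the first kept
// message is a tool_result, the cut steps back one message so the assistant
// turn holding the matching tool_use is kept as well.
function trimHistory(messages: Message[], summary: Message, keep: number): Message[] {
  let cut = messages.length - keep;
  if (cut <= 0) return messages; // nothing to trim yet
  const first = messages[cut];
  if (first && hasToolResult(first)) cut -= 1;
  return [summary, ...messages.slice(cut)];
}

const history: Message[] = [
  { role: "user", content: "Schedule four standups" },
  { role: "assistant", content: [{ type: "tool_use" }] },
  { role: "user", content: [{ type: "tool_result" }] },
  { role: "assistant", content: [{ type: "tool_use" }] },
  { role: "user", content: [{ type: "tool_result" }] },
  { role: "assistant", content: [{ type: "text" }] },
];
const summary: Message = { role: "user", content: "[Context summary: ...]" };
const trimmed = trimHistory(history, summary, 2);
console.log(trimmed.map((m) => m.role));
```

Here keep is 2, but three messages survive behind the summary because the cut moved back to keep the tool_use and tool_result pair together.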

Persistent memory across sessions

For agents that need to remember context across separate sessions, embed the previous turns, store them in Postgres with pgvector, retrieve the semantically relevant context before each new turn, and inject it into the system prompt.

npm install pg
# Neon (serverless Postgres with pgvector available out of the box) lets you skip the setup

-- Run once in your Postgres instance
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE memories (
  id        SERIAL PRIMARY KEY,
  user_id   TEXT NOT NULL,
  content   TEXT NOT NULL,
  embedding vector(1024) NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops);

import { Anthropic } from "@anthropic-ai/sdk";
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });
const client = new Anthropic();

async function embedText(text: string): Promise<number[]> {
  // Use any embedding API: voyage-4 (Voyage AI), text-embedding-3-small (OpenAI, needs vector(1536)), etc.
  // It must return a vector whose dimension matches the column (1024)
  throw new Error("Implement with your chosen embedding provider");
}

async function saveMemory(userId: string, content: string): Promise<void> {
  const embedding = await embedText(content);
  await db.query(
    "INSERT INTO memories (user_id, content, embedding) VALUES ($1, $2, $3::vector)",
    [userId, content, JSON.stringify(embedding)]
  );
}

async function recallMemories(
  userId: string,
  query: string,
  topK = 5
): Promise<string[]> {
  const queryEmbedding = await embedText(query);
  const { rows } = await db.query(
    `SELECT content
     FROM memories
     WHERE user_id = $1
     ORDER BY embedding <=> $2::vector   -- cosine distance operator (pgvector)
     LIMIT $3`,
    [userId, JSON.stringify(queryEmbedding), topK]
  );
  return rows.map((r: { content: string }) => r.content);
}

async function agentTurn(userId: string, userMessage: string): Promise<string> {
  const memories = await recallMemories(userId, userMessage);
  const memoryBlock =
    memories.length > 0
      ? `\n\nRelevant context from past sessions:\n${memories.map((m) => `- ${m}`).join("\n")}`
      : "";

  const response = await client.messages.create({
    model: "claude-opus-4-8",
    max_tokens: 1024,
    system: `You are a helpful assistant.${memoryBlock}`,
    messages: [{ role: "user", content: userMessage }],
  });

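  // Find the text block instead of assuming it sits at index 0
  const textBlock = response.content.find(
    (block): block is Anthropic.TextBlock => block.type === "text"
  );
  const reply = textBlock?.text ?? "";

  // Save in the background; do not block the response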
  saveMemory(userId, `User: ${userMessage}\nAssistant: ${reply}`).catch(
    console.error
  );

  return reply;
}

Common pitfall: Awaiting saveMemory before returning the reply. The write does not affect the current turn; let it run in the background.

Common pitfall: Switching embedding providers midway without migrating the column. Dimensions must match exactly: embeddings from a 1536-dimension model cannot be stored in, or compared against, a 1024-dimension column.
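
A cheap safeguard is to validate the dimension in application code before the vector reaches Postgres, so a provider change fails with a readable message. This is a minimal sketch: toVectorLiteral and EMBEDDING_DIM are hypothetical names, and the output is the bracketed text form that the $3::vector cast in saveMemory accepts.

```typescript
const EMBEDDING_DIM = 1024; // must match the vector(1024) column

// Hypothetical helper: validates an embedding and formats it as a pgvector
// text literal such as "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: number[], dim: number = EMBEDDING_DIM): string {
  if (embedding.length !== dim) {
    throw new Error(`Embedding has ${embedding.length} dimensions, column expects ${dim}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

console.log(toVectorLiteral([0.1, 0.2, 0.3], 3));
```

In saveMemory and recallMemories you could pass toVectorLiteral(embedding) in place of JSON.stringify(embedding).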

If you want a higher-level abstraction for this pattern: Mem0 (mem0ai) supports 20+ vector store backends (Qdrant, Chroma, Redis, Pinecone). Mastra (@mastra/mem0) wraps a similar retrieval flow in a TypeScript-first API. Both add a dependency for what is essentially a few SQL statements and one HTTP call, so weigh whether the abstraction is worth it for your use case.

Step 5: Expose tools as an MCP server

MCP (Model Context Protocol) lets you publish your agent's tools as a standard server. Claude Desktop, Claude Code, and any other MCP client can discover and call them without you writing integration code for each client.

Pin @modelcontextprotocol/sdk to v1.29.0. v2 is still pre-alpha as of June 2026.

npm install zod   # the MCP SDK uses Zod to validate input/output schemas

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import * as z from "zod";

const server = new McpServer(
  { name: "my-agent-tools", version: "1.0.0" },
  {
    // Usage hints: Claude reads this to understand how the tools work together
    instructions:
      "Provides weather lookups and calendar event creation. Call get_weather before create_calendar_event when scheduling weather-dependent events.",
  }
);

server.registerTool(
  "get_weather",
  {
    title: "Get weather",
    description:
      "Get current weather for a city. Returns temperature in Celsius and a short condition string.",
    inputSchema: z.object({
      location: z.string().describe("City and state, e.g. San Francisco, CA"),
    }),
    outputSchema: z.object({
      temperature: z.number(),
      condition: z.string(),
    }),
  },
  async ({ location }) => {
    // Call a real weather API here
    const result = { temperature: 22, condition: "Partly cloudy" };
    return {
      content: [{ type: "text", text: JSON.stringify(result) }],
      structuredContent: result,
    };
  }
);

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
}

main().catch(console.error);

For remote deployments, where multiple clients need access over the network, switch to the HTTP transport:

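import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import http from "http";

// Assumption: buildMcpServer() is your own factory that wraps the McpServer
// construction and registerTool calls shown above. Each request gets its own
// server and transport, so nothing is shared between clients.
const httpServer = http.createServer(async (req, res) => {
  const mcpServer = buildMcpServer();
  // Stateless mode: with sessionIdGenerator undefined no session is tracked.
  // A generator such as randomUUID makes the transport stateful, and it must
  // then be reused for every request of that session.
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined,
  });
  res.on("close", () => {
    void transport.close();
    void mcpServer.close();
  });
  await mcpServer.connect(transport);
  await transport.handleRequest(req, res);
});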

httpServer.listen(3000);

Transport selection:

Goal	Transport
Claude Desktop / Claude Code (local process)	`StdioServerTransport`
Remote MCP server (any network client)	`StreamableHTTPServerTransport`
Backward compatibility only	HTTP + SSE (do not use for new projects)

Common pitfall: Confusing MCP with the direct API tool use from Steps 1 to 3. Tool use in @anthropic-ai/sdk is how your agent calls functions inside its own process. MCP is how you publish those tools so that another agent or client, running in a different process or on a different machine, can discover and call them. You use both in the same project, for different purposes.

Common pitfall: Skipping the instructions field on McpServer. Claude reads it to understand ordering and dependencies between tools. Without it, the client has no context beyond each individual tool's description.

For a full step-by-step walkthrough of building and registering a TypeScript MCP server with Claude Code, see How to build an MCP server for Claude Code.

When to use a framework instead

Two frameworks are worth knowing.

Vercel AI SDK v6: choose this if you need one API across several model providers (Claude, GPT-4o, Gemini) without rewriting the loop each time, or if you are building on Next.js and want the built-in streaming hooks (useChat, useCompletion). The trade-off: you lose direct access to Claude-specific features such as extended thinking, strict: true schemas, and Anthropic server tools like web_search.

LangChain.js v1 / LangGraph: choose this if the agent needs stateful graph orchestration: human-in-the-loop approval, branching decision trees, multi-agent handoff. For a simple tool-calling loop, the abstraction adds complexity with no benefit. LangGraph has become the standard choice for graph-based agent architectures in TypeScript.

For everything in this guide, using @anthropic-ai/sdk directly is the right choice. You see the whole messages array, the exact tool_use blocks, and the real shape of the API. When something breaks at 3 a.m., that transparency matters.

What breaks at scale

Rate limits

By default the SDK retries 429 responses (along with 408, 409, connection errors, and >=500) twice with exponential backoff, which is enough for short bursts. For sustained quota exhaustion, or when you need tighter control, override maxRetries at the client level or per request:

// Disable retries entirely (for example, to handle the logic yourself)
const client = new Anthropic({ maxRetries: 0 });

// Or raise the retry count for a specific request
const response = await client.messages.create(
  { model: "claude-opus-4-8", max_tokens: 1024, tools, messages },
  { maxRetries: 5 }
);
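
If you disable the SDK's retries with maxRetries: 0 and handle them yourself, the core of the logic is the delay schedule. The sketch below is one possible schedule, not the SDK's internal one: backoffDelayMs and its default values are hypothetical, and it assumes you read the retry-after response header yourself when the server sends one.

```typescript
// Hypothetical helper for a hand-rolled retry loop (used with maxRetries: 0).
// `attempt` is 0-based; `retryAfterSeconds` comes from the retry-after
// response header when the server provides it.
function backoffDelayMs(
  attempt: number,
  retryAfterSeconds?: number,
  baseMs = 500,
  capMs = 30_000
): number {
  // Prefer the server's hint, but never wait longer than the cap.
  if (retryAfterSeconds !== undefined && retryAfterSeconds >= 0) {
    return Math.min(retryAfterSeconds * 1000, capMs);
  }
  // Exponential growth: base, 2x base, 4x base, and so on, up to the cap.
  return Math.min(baseMs * 2 ** attempt, capMs);
}

console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n)));
```

The schedule is deterministic to keep the example easy to check; in production, adding random jitter to each delay helps keep many workers from retrying at the same instant.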

Token overhead from tool definitions

Your tool definitions are sent as input tokens with every request, so they add overhead to every turn. With 20 tools, the cost adds up over a long session. Cache the definitions that do not change between turns:

const tools: Anthropic.Tool[] = [
  {
    name: "get_weather",
    description: "...",
    input_schema: { /* ... */ },
    cache_control: { type: "ephemeral" }, // cached for 5 minutes by default; each cache hit refreshes the TTL
  },
];

Prompt caching is especially useful in long-running agent sessions, where the tool set is fixed but the message history keeps growing. See Claude API 2026: Prompt Caching, Tool Use and Batch for a deeper look at caching, batching, and rate limits in production.
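
To confirm that caching is actually taking effect, you can inspect the usage object on each response. The sketch below assumes a hypothetical cacheHitRatio helper and a local Usage type holding the three usage fields involved (input_tokens, cache_creation_input_tokens, cache_read_input_tokens).

```typescript
// Hypothetical local type with the usage fields relevant to prompt caching.
type Usage = {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
};

// Fraction of prompt tokens served from cache for one response (0 to 1).
function cacheHitRatio(usage: Usage): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  // input_tokens counts only the uncached part of the prompt, so the full
  // prompt size is the sum of all three fields.
  const total = usage.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}

console.log(cacheHitRatio({ input_tokens: 100, cache_read_input_tokens: 900 }));
```

A ratio that stays at 0 after the first turn usually means the cached prefix changes between requests, for example because the tool list is rebuilt in a different order. Note also that once caching is on, the full prompt size is the sum of the three fields, not input_tokens alone.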

Context window exhaustion

1M tokens sounds generous until a tool returns 50KB of JSON. Track usage.input_tokens on every response; once it passes ~750K, trigger the summarization approach from Step 4. An agent that hits the context limit mid-run fails with a hard API error, not a recoverable one.
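
Alongside summarization, it can help to cap each tool result before it enters the messages array. This is a minimal sketch with a hypothetical clampToolResult helper and an arbitrary default limit; it truncates by characters, which is only a rough proxy for tokens.

```typescript
// Hypothetical helper: caps a tool result before it is added to the history.
function clampToolResult(text: string, maxChars = 20_000): string {
  if (text.length <= maxChars) return text;
  const omitted = text.length - maxChars;
  // Say that the result was cut so Claude can ask for a narrower query.
  return `${text.slice(0, maxChars)}\n[truncated: ${omitted} characters omitted]`;
}

console.log(clampToolResult("abcdef", 3));
```

In the loop from Step 3 you could wrap the JSON.stringify(runTool(...)) call with clampToolResult before building the tool_result block.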

Parallel processing

A single agent instance runs sequentially, one API call at a time. For parallel workloads (processing 100 support tickets concurrently), run multiple instances against a shared task queue. BullMQ + Redis is a production-proven pattern for this; see the best job queues for TypeScript in 2026 for a comparison of BullMQ, Trigger.dev, and Inngest. An MCP server using StreamableHTTPServerTransport in stateless mode (sessionIdGenerator: undefined) keeps no per-session state, so horizontal scaling works without changes.

Prompt injection via tool results

Tool results can contain content from untrusted sources: web pages, database rows, third-party API responses. An attacker who controls a tool result can embed instructions that redirect Claude's behavior. Keep untrusted content inside tool_result blocks; do not copy it into the system prompt or a regular user message. The tool_result block is a structural signal that tells Claude this is external data, not a trusted instruction from you.

Building a TypeScript AI agent: tools, memory, and MCP