
LLM structured outputs: JSON mode, function calling, and Zod

Grammar-constrained sampling is the only reliable LLM primitive. How OpenAI, Anthropic, Zod, and Vercel AI SDK v6 compare — and where each still fails you.

Your LLM pipeline worked fine in testing. Then, at 3 a.m., a missing required field in the model’s JSON response crashed your job processor. The schema validator threw, nothing caught it, and the queue stalled for six hours.

JSON mode does not prevent this. It guarantees syntactically valid JSON — not that the JSON matches your schema. Structured Outputs does. Understanding the difference, and where each provider’s implementation still fails you, is the difference between a resilient pipeline and a 3 a.m. page.

Who this is for

You are already building production LLM apps. You know what a Zod schema is. You want to understand what each mechanism actually guarantees, not a “what is JSON?” walkthrough.

The core distinction: JSON mode vs structured outputs

JSON mode (response_format: { type: "json_object" }) tells the model to produce syntactically valid JSON. That is all it does. The shape, field names, and types are still unconstrained — the model guesses based on your prompt.

Structured Outputs uses grammar-constrained sampling at the decoder level. The sampler itself is constrained to token sequences that produce schema-valid output. This is not post-hoc validation and not prompt engineering. The constraint is structural.
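To make the mechanism concrete, here is a toy sketch of constrained decoding. The vocabulary, the scores, and the two-string "grammar" are all invented for illustration. Real implementations compile a JSON Schema into a grammar and mask logits over the full tokenizer vocabulary, and this is not either provider's actual implementation.

```typescript
// Toy "grammar": the complete outputs the schema allows.
const VALID_OUTPUTS = ['{"ok":true}', '{"ok":false}'];

// Toy vocabulary with hypothetical model preference scores (stand-ins for logits).
const SCORES: Record<string, number> = {
  "Sure! ": 0.9,
  '"yes"': 0.8,
  '{"ok":': 0.7,
  "true": 0.6,
  "false": 0.5,
  "}": 0.4,
};
const VOCAB = Object.keys(SCORES);

// A token is allowed only if the extended prefix can still grow into a valid output.
function isAllowed(prefix: string, token: string): boolean {
  return VALID_OUTPUTS.some((out) => out.startsWith(prefix + token));
}

// Greedy decoding: pick the highest-scoring token, optionally after masking.
function decode(constrained: boolean, maxSteps = 4): string {
  let text = "";
  for (let step = 0; step < maxSteps; step++) {
    if (VALID_OUTPUTS.includes(text)) break; // output is complete
    const candidates = constrained
      ? VOCAB.filter((token) => isAllowed(text, token))
      : VOCAB;
    if (candidates.length === 0) break;
    const next = candidates.reduce((a, b) =>
      (SCORES[a] ?? 0) >= (SCORES[b] ?? 0) ? a : b
    );
    text += next;
  }
  return text;
}

const constrainedOutput = decode(true);
const freeOutput = decode(false);
```

With masking on, the highest-scoring token ("Sure! ") is never eligible, so the decoder can only walk paths that end in a valid object. With masking off, the same scores produce text that does not parse at all.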

OpenAI’s announcement:

“While both ensure valid JSON is produced, only Structured Outputs ensure schema adherence.”

Anthropic’s docs describe the mechanism identically:

“constraining the model’s token sampling to schema-valid outputs (a technique called grammar-constrained sampling)”

Both providers point to the same conclusion: for production pipelines that parse model output into typed structs, JSON mode is the wrong primitive. Structured Outputs is what you want.

JSON mode: what it is and why it falls short

JSON mode exists as a lightweight guarantee. It is useful for exploration and when your downstream code can handle any valid JSON shape. Production parsing pipelines are rarely that forgiving.

The failure modes are described qualitatively here, because no reliable benchmarks of JSON mode error rates could be verified. What is documented: the model can produce JSON that is syntactically valid but missing required fields, using wrong types, or carrying keys you did not ask for. Your validator catches this, but only if you have one. Many pipelines do not.
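The three failure shapes named above can be sketched without any library. The sample strings and the findProblems helper are hypothetical; the point is only that JSON.parse succeeds on all of them while the shape check does not.

```typescript
// Intended shape: { name: string; date: string; participants: string[] }
// Returns a list of schema problems; an empty list means the value conforms.
function findProblems(value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["not an object"];
  }
  const record = value as Record<string, unknown>;
  const problems: string[] = [];

  for (const key of ["name", "date"]) {
    if (!(key in record)) problems.push(`missing field: ${key}`);
    else if (typeof record[key] !== "string") problems.push(`wrong type: ${key}`);
  }

  const participants = record["participants"];
  if (participants === undefined) {
    problems.push("missing field: participants");
  } else if (
    !Array.isArray(participants) ||
    !participants.every((p) => typeof p === "string")
  ) {
    problems.push("wrong type: participants");
  }

  const allowed = new Set(["name", "date", "participants"]);
  for (const key of Object.keys(record)) {
    if (!allowed.has(key)) problems.push(`unexpected key: ${key}`);
  }
  return problems;
}

// Every sample is syntactically valid JSON; only the first matches the shape.
const samples = [
  '{"name":"Science fair","date":"Friday","participants":["Alice","Bob"]}',
  '{"name":"Science fair","participants":"Alice, Bob"}',
  '{"name":"Science fair","date":"Friday","participants":[],"confidence":0.9}',
];
const report = samples.map((sample) => findProblems(JSON.parse(sample)));
```

In a real pipeline a Zod schema with safeParse does this job; the hand-rolled version is only there to keep the sketch dependency-free.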

Use JSON mode when: you are prototyping, the schema is simple and prompt-adherence is sufficient, or you are targeting a model that does not support Structured Outputs.

Use Structured Outputs when: the output feeds a typed data pipeline, schema violations would cause downstream errors, or you are at anything approaching production scale.

Function calling: OpenAI and Anthropic mechanics

Function calling (tool use) is the earlier mechanism for eliciting structured responses. Both providers expose it differently.

OpenAI function calling

OpenAI’s function-calling API lets you define tools with JSON Schema inputs. The model returns a tool_calls array when it decides to invoke a tool. The caller executes the function and returns the result.

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is the weather in Paris?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Returns weather data for a city",
        parameters: {
          type: "object",
          properties: {
            city: { type: "string" },
            unit: { type: "string", enum: ["celsius", "fahrenheit"] },
          },
          required: ["city"],
        },
      },
    },
  ],
});

Function calling without strict: true still does not guarantee schema adherence — the model may omit required properties or use wrong types. For the guarantee, you need the Structured Outputs API (section below).
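Until strict mode is on, the arguments string of each tool call deserves the same suspicion as any other model output. A minimal sketch of defensive parsing for the get_weather tool above follows; WeatherArgs, ParseResult, and parseWeatherArgs are hypothetical names, and the input is assumed to be the JSON string found at tool_calls[i].function.arguments.

```typescript
type WeatherArgs = { city: string; unit?: "celsius" | "fahrenheit" };
type ParseResult =
  | { ok: true; args: WeatherArgs }
  | { ok: false; reason: string };

// rawArguments is the JSON string the model produced for the tool call.
function parseWeatherArgs(rawArguments: string): ParseResult {
  let value: unknown;
  try {
    value = JSON.parse(rawArguments);
  } catch {
    return { ok: false, reason: "arguments are not valid JSON" };
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, reason: "arguments are not an object" };
  }
  const record = value as Record<string, unknown>;

  const city = record["city"];
  if (typeof city !== "string") {
    return { ok: false, reason: "city is missing or not a string" };
  }

  const rawUnit = record["unit"];
  if (rawUnit === undefined) {
    return { ok: true, args: { city } };
  }
  if (rawUnit === "celsius" || rawUnit === "fahrenheit") {
    return { ok: true, args: { city, unit: rawUnit } };
  }
  return { ok: false, reason: "unit is not an allowed value" };
}

const good = parseWeatherArgs('{"city":"Paris","unit":"celsius"}');
const missingCity = parseWeatherArgs('{"unit":"celsius"}');
const truncated = parseWeatherArgs('{"city":');
```

Returning a result union instead of throwing keeps the decision about retrying or failing the job in the caller's hands.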

Anthropic tool use

Anthropic’s tool use API reached GA on May 30, 2024. Each tool definition requires name, description, and input_schema (JSON Schema). When Claude calls a tool, the API returns stop_reason: "tool_use".

The tool_choice parameter controls invocation behavior:

| Mode | Behavior | Default? |
| --- | --- | --- |
| auto | Model decides whether to call a tool | Yes, when tools provided |
| any | Must call one of the provided tools | No |
| tool | Forces a specific named tool | No |
| none | No tools may be used | Yes, when no tools provided |

One behavior worth knowing: when tool_choice is any or tool, the API prefills the assistant message. This suppresses any natural-language preamble before tool_use blocks — even if you ask for one in the prompt. Plan for this in your UI.

OpenAI Structured Outputs API

OpenAI Structured Outputs reached GA on August 6, 2024. The API shape uses response_format: { type: "json_schema" } with strict: true. The Node.js SDK ships a zodResponseFormat helper that converts a Zod schema to the required format and returns a typed parsed result; the Python SDK offers the equivalent through Pydantic models.

import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const client = new OpenAI();

const CalendarEventSchema = z.object({
  name: z.string(),
  date: z.string(),
  participants: z.array(z.string()),
});

const completion = await client.beta.chat.completions.parse({
  model: "gpt-4o-2024-08-06",
  messages: [
    { role: "user", content: "Alice and Bob are going to a science fair on Friday." }
  ],
  response_format: zodResponseFormat(CalendarEventSchema, "event"),
});

const event = completion.choices[0].message.parsed;
// event is typed as { name: string; date: string; participants: string[] }

In May 2025, OpenAI expanded strict mode to support parallel tool calling and added new JSON Schema features: string validation patterns (email, uri, date-time), and numeric/array range constraints.

What the guarantee does not cover

Three failure modes bypass the schema guarantee on every Structured Outputs implementation:

  1. Safety refusal — the model declines to generate for policy reasons. The response has no structured output; your code must handle message.refusal.
  2. Token-limit truncation — the response is cut off at max_tokens. The Python SDK raises LengthFinishReasonError on finish_reason == "length". The output is incomplete and schema-invalid.
  3. Content filter block — output is blocked post-generation.

Schema adherence is only guaranteed for normal completions. Handle all three cases explicitly.

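const completion = await client.beta.chat.completions.parse({
  /* same request options as the previous example */
});
const choice = completion.choices[0];
const message = choice.message;

// Some SDK versions throw from parse() itself on truncation or content
// filtering, so also wrap the call above in try/catch.
if (message.refusal) {
  throw new Error(`Model refused: ${message.refusal}`);
}
if (choice.finish_reason === "length") {
  throw new Error("Response truncated at max_tokens");
}
if (choice.finish_reason === "content_filter") {
  throw new Error("Response blocked by content filter");
}

const data = message.parsed; // safe here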

Anthropic strict tool use

Anthropic’s strict tool use adds grammar-constrained sampling to function calling. Setting strict: true on a tool definition combined with tool_choice: { type: "any" } gives you a dual guarantee: a tool will be called AND its inputs will strictly match the schema.

Strict tool use exited public beta on January 29, 2026 — no beta header required after that date. It is available on Claude Sonnet 4.5, Claude Opus 4.5, Claude Haiku 4.5, and all later Claude API models.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const tools = [
  {
    name: "get_weather",
    description: "Returns weather data for a city",
    input_schema: {
      type: "object" as const,
      properties: {
        city: { type: "string", description: "City name" },
        unit: { type: "string", enum: ["celsius", "fahrenheit"] },
      },
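      required: ["city"],
      // Anything other than false is unsupported in strict mode (see schema limitations)
      additionalProperties: false,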
    },
    strict: true,
  },
];

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  tools,
  tool_choice: { type: "any" },
  messages: [{ role: "user", content: "What is the weather in Paris?" }],
});

// response.stop_reason === "tool_use"
const toolUseBlock = response.content.find((b) => b.type === "tool_use");

The same three failure modes from the OpenAI section apply here too — safety refusals, truncation, and content filter blocks all bypass the guarantee.
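A rough sketch of handling these cases on the Anthropic side follows. The types are simplified local stand-ins for the Messages API response, classify is a hypothetical helper, and treating stop_reason "refusal" as the refusal signal is an assumption to confirm against the model you target. Content filter blocks are not modeled here because this article does not establish how they surface.

```typescript
// Simplified local types mirroring the relevant response fields.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface MessageLike {
  stop_reason: string | null;
  content: ContentBlock[];
}

type ToolOutcome =
  | { kind: "ok"; name: string; input: unknown }
  | { kind: "truncated" }
  | { kind: "refused" }
  | { kind: "no_tool_call" };

function classify(message: MessageLike): ToolOutcome {
  // Truncation: the tool input may be cut off mid-object.
  if (message.stop_reason === "max_tokens") return { kind: "truncated" };
  // Assumption: refusals are reported through stop_reason "refusal".
  if (message.stop_reason === "refusal") return { kind: "refused" };

  const block = message.content.find((b) => b.type === "tool_use");
  if (block === undefined || block.type !== "tool_use") {
    return { kind: "no_tool_call" };
  }
  return { kind: "ok", name: block.name, input: block.input };
}

const truncatedOutcome = classify({ stop_reason: "max_tokens", content: [] });
const okOutcome = classify({
  stop_reason: "tool_use",
  content: [
    { type: "tool_use", id: "toolu_example", name: "get_weather", input: { city: "Paris" } },
  ],
});
const textOnlyOutcome = classify({
  stop_reason: "end_turn",
  content: [{ type: "text", text: "I cannot look that up." }],
});
```

Only the "ok" branch carries an input worth trusting; every other branch should go to retry or error handling rather than to your parser.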

Claude Mythos Preview does not support forced tool use. Requests with tool_choice: { type: "any" } or tool_choice: { type: "tool", name: "..." } return a 400 error on that model. If you are targeting Mythos Preview, use tool_choice: auto and rely on prompting — the strict: true schema constraint still applies to whatever tool the model chooses to call.

Anthropic schema limitations

Anthropic’s structured outputs do not support:

  • Recursive schemas
  • Numerical constraints (minimum, maximum, multipleOf)
  • String length constraints (minLength, maxLength)
  • additionalProperties set to anything other than false

The official SDKs strip unsupported constraints client-side and move them to description fields. Numeric range and string length validation must happen in your application code. This is a meaningful limitation if your schema relies on these constraints for correctness — a z.number().min(0).max(100) silently becomes a plain number field at the API layer.
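To see what that stripping amounts to, here is an illustrative sketch of the transformation on a plain JSON Schema object. It is not the SDK's implementation: stripUnsupported, the key list, and the description format are assumptions made for the example.

```typescript
type JsonSchema = Record<string, unknown>;

// Constraint keywords treated as unsupported in this sketch.
const UNSUPPORTED_KEYS = new Set([
  "minimum", "maximum", "multipleOf", "minLength", "maxLength",
]);

function isSchema(value: unknown): value is JsonSchema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Removes unsupported keywords and records them in the description instead.
function stripUnsupported(schema: JsonSchema): JsonSchema {
  const out: JsonSchema = {};
  const notes: string[] = [];
  for (const [key, value] of Object.entries(schema)) {
    if (UNSUPPORTED_KEYS.has(key)) {
      notes.push(`${key}: ${String(value)}`);
    } else if (key === "properties" && isSchema(value)) {
      const props: JsonSchema = {};
      for (const [name, sub] of Object.entries(value)) {
        props[name] = isSchema(sub) ? stripUnsupported(sub) : sub;
      }
      out[key] = props;
    } else if (key === "items" && isSchema(value)) {
      out[key] = stripUnsupported(value);
    } else {
      out[key] = value;
    }
  }
  if (notes.length > 0) {
    const existing = out["description"];
    const prefix = typeof existing === "string" ? `${existing} ` : "";
    out["description"] = `${prefix}(${notes.join(", ")})`;
  }
  return out;
}

const stripped = stripUnsupported({
  type: "object",
  properties: {
    score: { type: "number", minimum: 0, maximum: 100, description: "Score" },
  },
  required: ["score"],
  additionalProperties: false,
});
```

After this step the model sees the range only as a hint in the description, which is why the range check has to live in your own code.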

Whether the Vercel AI SDK handles Anthropic’s schema limitations gracefully when targeting Claude models is an open question. The SDK layer behavior is unverified — test it with your specific schema before relying on it in production.

Zod for type-safe parsing

Zod is the de-facto validation layer in TypeScript LLM pipelines. Two methods matter:

.parse() throws on failure. It returns a strongly-typed deep clone of the input on success. Use it when schema violation is always a bug.

.safeParse() never throws. It returns a discriminated union: { success: true; data: T } | { success: false; error: ZodError }. Use it for LLM outputs where partial or invalid responses are expected.

import { z } from "zod";

const ArticleSchema = z.object({
  title: z.string(),
  tags: z.array(z.string()),
  publishedAt: z.string().datetime(),
});

// Option A: throws — use when schema violation is always a bug
const article = ArticleSchema.parse(llmOutput);

// Option B: discriminated union — use for LLM outputs
const result = ArticleSchema.safeParse(llmOutput);
if (!result.success) {
  console.error(result.error.issues);
  // e.g. [{ path: ["publishedAt"], code: "invalid_format", ... }] on Zod v4 (the code was "invalid_string" on v3)
} else {
  const article = result.data; // fully typed
}

// Option C: schemas with async refinements or transforms need the async variants
const asyncResult = await ArticleSchema.safeParseAsync(llmOutput);

On error, access result.error.issues (not errors — the errors alias was removed in Zod v4). Verified on v4.4.3.

Vercel AI SDK v6: generateText with structured output

Vercel AI SDK v6 introduces a breaking change to structured output. generateObject and streamObject are deprecated and will be removed. The replacement is generateText and streamText with an output parameter.

| Concept | v5 (deprecated) | v6 (current) |
| --- | --- | --- |
| Structured object | generateObject({ schema }) | generateText({ output: Output.object({ schema }) }) |
| Streaming | streamObject({ schema }) | streamText({ output: Output.object({ schema }) }) |
| Partial stream | partialObjectStream | partialOutputStream |
| Import | { generateObject } from 'ai' | { generateText, Output } from 'ai' |

Many tutorials still use v5 syntax. If you copy-paste SDK examples from articles older than late 2025, you are likely looking at deprecated code.

Automated migration: npx @ai-sdk/codemod upgrade v6

v6 code example

import { generateText, streamText, Output } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const ArticleSchema = z.object({
  summary: z.string().describe("One-paragraph summary"),
  temperature: z.number(),
  recommendation: z.string(),
});

// generateText with structured output
const { output } = await generateText({
  model: anthropic("claude-sonnet-4-6"),
  output: Output.object({ schema: ArticleSchema }),
  prompt: "Analyze the weather in San Francisco for a developer blog post.",
});
// output is typed as z.infer<typeof ArticleSchema>

// Streaming version
const { partialOutputStream } = streamText({
  model: anthropic("claude-sonnet-4-6"),
  output: Output.object({ schema: ArticleSchema }),
  prompt: "...",
});
for await (const partial of partialOutputStream) {
  console.log(partial);
}

v6 offers five output modes via Output.*:

| Mode | Use case |
| --- | --- |
| Output.object({ schema }) | Single structured object with schema validation |
| Output.array({ element }) | Array of typed objects; per-element validation |
| Output.choice({ options }) | Classification from a fixed string set |
| Output.json() | Unstructured JSON, no schema enforcement |
| Output.text() | Plain text (default) |

One architectural change from v5: structured output now counts as a step in multi-step tool-calling loops. If you combine tools with structured output, configure stopWhen explicitly. The error type on failure is AI_NoObjectGeneratedError, which preserves text, response, usage, and cause for debugging.

Zod, Valibot, Arktype, and any library implementing the Standard JSON Schema interface work natively in v6.

Comparison table

| Mechanism | Schema guarantee | Failure modes that bypass guarantee | Notable schema limitations |
| --- | --- | --- | --- |
| JSON mode (OpenAI / Anthropic) | Syntactically valid JSON only | N/A (no schema guarantee) | None relevant |
| OpenAI Structured Outputs (strict: true) | Grammar-constrained | Safety refusal, max_tokens truncation, content filter | None (supports most JSON Schema features) |
| OpenAI function calling (no strict) | None | All of the above | |
| Anthropic strict tool use (strict: true + any) | Grammar-constrained | Safety refusal, max_tokens truncation, content filter | No recursive schemas, no numeric/string length constraints |
| Anthropic tool use (no strict) | None | All of the above | |
| Vercel AI SDK v6 Output.object | Depends on target model | Propagates from underlying provider | Depends on model; Anthropic gaps pass through |

Recommendations

For new production pipelines on OpenAI: use client.beta.chat.completions.parse() with zodResponseFormat. Handle message.refusal and finish_reason === "length" explicitly.

For new production pipelines on Anthropic: use strict tool use with tool_choice: { type: "any" }. Do not rely on numeric or string-length constraints in your Zod schemas at the API layer: they are silently dropped, so enforce them in application code after parsing.

For multi-provider abstractions: the Vercel AI SDK v6 is a reasonable choice, but verify Anthropic schema behavior with your specific schemas before shipping. SDK-level behavior for unsupported Anthropic schema features is not fully documented.

For validation layer: .safeParse() everywhere in LLM output processing. .parse() only when a schema violation is genuinely a programmer error.

For streaming use cases: streamText with Output.object in v6. The partialOutputStream iterator gives you progressively typed partial objects.

For cost optimization: once schema compliance is stable, “LLM cost routing: when Haiku beats Opus and when it does not” covers when classification and extraction workloads can move to cheaper models without degrading output quality.

For token budget management: if large system prompts push structured-output responses against max_tokens, “Prompt caching in 2026 — Anthropic, OpenAI, and Gemini compared” explains how all three major providers handle prefix caching, cutting repeated-context costs by up to 90%.

Gotchas and edge cases

OpenAI Python SDK — nested Pydantic models with field descriptions

If you use nested Pydantic models in strict mode and add Field(description=...) to a field whose type is another Pydantic model, the SDK sends invalid JSON Schema to the API and you get a 400 BadRequestError. The root cause: JSON Schema for a $ref alongside extra properties requires inline expansion; the prior code path skipped recursive strict coercion on the expanded object.

Fix: upgrade openai-python to a version that includes PR #2025 (merged January 17, 2025). Any version after that date is safe.

# Python — what triggers the bug (fixed in openai-python post-2025-01-17)
from openai import OpenAI
from pydantic import BaseModel, Field

class Address(BaseModel):
    street: str
    city: str

class Person(BaseModel):
    # This Field(description=...) on a nested model type caused the 400 error
    address: Address = Field(description="Home address")
    name: str

Anthropic — no numeric constraints at the API

z.number().min(0).max(100) in your Zod schema produces no constraint at the Anthropic API layer. The SDK strips minimum, maximum, and multipleOf silently. Validate ranges in application code after parsing.
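A minimal, dependency-free sketch of that application-level check follows; RangeRule and checkRanges are hypothetical names. If the tool input is already being validated with Zod, running safeParse with the original, fully constrained schema achieves the same thing.

```typescript
type RangeRule = { field: string; min: number; max: number };

// Returns one message per violated rule; an empty list means all ranges hold.
function checkRanges(
  data: Record<string, unknown>,
  rules: RangeRule[]
): string[] {
  const violations: string[] = [];
  for (const { field, min, max } of rules) {
    const value = data[field];
    if (typeof value !== "number" || Number.isNaN(value)) {
      violations.push(`${field}: expected a number`);
    } else if (value < min || value > max) {
      violations.push(`${field}: ${value} is outside [${min}, ${max}]`);
    }
  }
  return violations;
}

const rules: RangeRule[] = [{ field: "score", min: 0, max: 100 }];
const inRange = checkRanges({ score: 42 }, rules);
const outOfRange = checkRanges({ score: 150 }, rules);
```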

Anthropic — extended thinking incompatibility with forced tool_choice

When extended thinking is enabled, tool_choice: { type: "any" } and tool_choice: { type: "tool", name: "..." } are not supported and return a runtime error. Only tool_choice: { type: "auto" } (the default) and tool_choice: { type: "none" } work alongside extended thinking.

The production recommendation in this article — strict tool use with tool_choice: { type: "any" } — does not apply when you add extended thinking. If you enable both, the API will reject the request. Use tool_choice: auto and rely on prompting when extended thinking is required.
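One way to keep this rule out of individual call sites is a small helper that derives tool_choice from the request options. This is a sketch under the constraints described above; ToolChoice is a simplified local type, and ChoiceOptions and pickToolChoice are hypothetical names.

```typescript
type ToolChoice =
  | { type: "auto" }
  | { type: "any" }
  | { type: "tool"; name: string }
  | { type: "none" };

interface ChoiceOptions {
  extendedThinking: boolean;
  forcedToolName?: string;
}

function pickToolChoice(options: ChoiceOptions): ToolChoice {
  // Forced tool use is rejected alongside extended thinking, so fall back to auto.
  if (options.extendedThinking) return { type: "auto" };
  if (options.forcedToolName !== undefined) {
    return { type: "tool", name: options.forcedToolName };
  }
  // Default for strict pipelines: some tool must be called.
  return { type: "any" };
}

const withThinking = pickToolChoice({ extendedThinking: true, forcedToolName: "get_weather" });
const forced = pickToolChoice({ extendedThinking: false, forcedToolName: "get_weather" });
const strictDefault = pickToolChoice({ extendedThinking: false });
```

When the helper returns auto, the tool call itself is no longer guaranteed, so the caller still needs a path for responses that contain no tool_use block.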

Zod v4 — errors alias removed

error.errors was an alias for error.issues in Zod v3. It was removed in v4. If you are upgrading from v3, search for .errors and replace with .issues.

Vercel AI SDK v6 — generateObject still exists but is deprecated

generateObject still works in v6 but is deprecated and will be removed. Running it now produces no warning at runtime. Watch for it in your codebase with:

grep -r "generateObject\|streamObject" src/ --include="*.ts"

Anthropic × Vercel AI SDK schema compatibility

Whether the Vercel AI SDK v6 translates Zod schemas for Anthropic by stripping unsupported features is unverified. A GitHub issue (#13355) confirms the API rejects schemas with unsupported properties — but whether the SDK handles this translation before the request hits the API is not documented. Test your specific schemas against Claude models before relying on the SDK as an abstraction layer for this.
