LLM structured outputs: JSON mode, function calling, and Zod
Grammar-constrained sampling is the only reliable LLM primitive. How OpenAI, Anthropic, Zod, and Vercel AI SDK v6 compare — and where each still fails you.
By Ethan
Your LLM pipeline worked fine in testing. Then, at 3 a.m., a missing closing bracket in the model’s JSON response crashed your job processor. The schema validator threw, nothing caught it, and the queue stalled for six hours.
JSON mode alone does not protect you from this. It aims for syntactically valid JSON, not JSON that matches your schema, and a response cut off at max_tokens can still end mid-object. Structured Outputs adds schema adherence for completed responses. Knowing that difference, and where each provider’s implementation still fails you, is what separates a resilient pipeline from a 3 a.m. page.
Who this is for
You are already building production LLM apps. You know what a Zod schema is. You want to understand what each mechanism actually guarantees, not a “what is JSON?” walkthrough.
The core distinction: JSON mode vs structured outputs
JSON mode (response_format: { type: "json_object" }) tells the model to produce syntactically valid JSON. That is all it does. The shape, field names, and types are still unconstrained — the model guesses based on your prompt.
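To make the gap concrete, here is a minimal sketch of a response that JSON mode would accept but a typed pipeline would not. The CalendarEvent shape and the isCalendarEvent guard are hypothetical names chosen for illustration; they are not part of any SDK.

```typescript
// Hypothetical target shape, for illustration only.
interface CalendarEvent {
  name: string;
  date: string;
  participants: string[];
}

// Hand-written type guard: checks the shape that JSON mode never checks.
function isCalendarEvent(value: unknown): value is CalendarEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const participants = v["participants"];
  return (
    typeof v["name"] === "string" &&
    typeof v["date"] === "string" &&
    Array.isArray(participants) &&
    participants.every((p) => typeof p === "string")
  );
}

// Syntactically valid JSON, but "participants" is a string and "date" is missing.
const rawEvent = '{"name": "Science fair", "participants": "Alice, Bob"}';
const parsedEvent: unknown = JSON.parse(rawEvent); // does not throw
const shapeOk = isCalendarEvent(parsedEvent); // false: valid JSON, wrong shape
```

JSON.parse succeeding tells you nothing about the fields, which is why a separate shape check (or a schema-constrained API) is needed.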
Structured Outputs uses grammar-constrained sampling at the decoder level. The sampler itself is constrained to token sequences that produce schema-valid output. This is not post-hoc validation and not prompt engineering. The constraint is structural.
OpenAI’s announcement:
“While both ensure valid JSON is produced, only Structured Outputs ensure schema adherence.”
Anthropic’s docs describe the mechanism identically:
“constraining the model’s token sampling to schema-valid outputs (a technique called grammar-constrained sampling)”
Both vendors point to the same conclusion: for production pipelines that parse model output into typed structs, JSON mode is the wrong primitive. Structured Outputs is what you want.
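The idea of constraining the sampler can be illustrated with a toy decoder. This is a deliberately simplified sketch and not how either provider implements it: the vocabulary, the scores, and the tiny state machine standing in for a compiled schema grammar are all made up for the example.

```typescript
// Toy state machine standing in for a compiled schema grammar. Real systems
// compile JSON Schema into a grammar over the model's actual token vocabulary.
type State = "start" | "key" | "colon" | "value" | "end" | "done";

const allowed: Record<State, Record<string, State>> = {
  start: { "{": "key" },
  key: { '"city"': "colon" },
  colon: { ":": "value" },
  value: { '"Paris"': "end", '"Rome"': "end" },
  end: { "}": "done" },
  done: {},
};

// Greedy decode: at each step, mask every token the grammar forbids, then
// pick the highest-scoring token among the survivors.
function constrainedDecode(scores: Record<string, number>): string {
  const score = (token: string): number => scores[token] ?? -Infinity;
  let state: State = "start";
  let out = "";
  while (state !== "done") {
    const legal = Object.keys(allowed[state]);
    const best = legal.reduce((a, b) => (score(a) >= score(b) ? a : b));
    const next = allowed[state][best];
    if (next === undefined) throw new Error(`no transition for ${best}`);
    out += best;
    state = next;
  }
  return out;
}

// The unconstrained favourite is prose ("Sure!"), but it is never legal in
// any state, so the output is still schema-shaped.
const toyScores = { "Sure!": 9, "{": 1, '"city"': 1, ":": 1, '"Paris"': 3, '"Rome"': 2, "}": 1 };
const decoded = constrainedDecode(toyScores);
```

The point of the sketch is the masking step: the model’s preferences only ever choose among tokens the grammar allows, so an invalid shape cannot be emitted in the first place.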
JSON mode: what it is and why it falls short
JSON mode exists as a lightweight guarantee. It is useful for exploration and when your downstream code can handle any valid JSON shape. Production parsing pipelines are rarely that forgiving.
The failure modes are documented qualitatively; no reliable benchmark of JSON mode error rates held up under verification. What is documented: the model can produce JSON that is syntactically valid but is missing required fields, uses the wrong types, or includes keys you did not ask for. Your validator catches this, but only if you have one. Many pipelines do not.
Use JSON mode when: you are prototyping, the schema is simple and prompt-adherence is sufficient, or you are targeting a model that does not support Structured Outputs.
Use Structured Outputs when: the output feeds a typed data pipeline, schema violations would cause downstream errors, or you are at anything approaching production scale.
Function calling: OpenAI and Anthropic mechanics
Function calling (tool use) is the earlier mechanism for eliciting structured responses. Both providers expose it differently.
OpenAI function calling
OpenAI’s function-calling API lets you define tools with JSON Schema inputs. The model returns a tool_calls array when it decides to invoke a tool. The caller executes the function and returns the result.
import OpenAI from "openai";
const client = new OpenAI();
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is the weather in Paris?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Returns weather data for a city",
        parameters: {
          type: "object",
          properties: {
            city: { type: "string" },
            unit: { type: "string", enum: ["celsius", "fahrenheit"] },
          },
          required: ["city"],
        },
      },
    },
  ],
});
Function calling without strict: true still does not guarantee schema adherence — the model may omit required properties or use wrong types. For the guarantee, you need the Structured Outputs API (section below).
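Because non-strict function calling gives no schema guarantee, the arguments string has to be parsed and checked before use. A minimal sketch follows; the ToolCall interface mirrors only the fields used here, and parseWeatherArgs and isUnit are hypothetical helpers rather than SDK functions.

```typescript
// Minimal local mirror of the fields used from a chat completion tool call.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

type Unit = "celsius" | "fahrenheit";

interface WeatherArgs {
  city: string;
  unit?: Unit;
}

function isUnit(value: unknown): value is Unit {
  return value === "celsius" || value === "fahrenheit";
}

// Hypothetical helper: returns validated arguments, or null when the model's
// output is not usable. It never throws on malformed input.
function parseWeatherArgs(call: ToolCall): WeatherArgs | null {
  if (call.function.name !== "get_weather") return null;
  let raw: unknown;
  try {
    raw = JSON.parse(call.function.arguments);
  } catch {
    return null; // not even valid JSON
  }
  if (typeof raw !== "object" || raw === null) return null;
  const args = raw as Record<string, unknown>;
  const city = args["city"];
  if (typeof city !== "string") return null; // required field missing or mistyped
  const unit = args["unit"];
  if (unit === undefined) return { city };
  return isUnit(unit) ? { city, unit } : null; // reject values outside the enum
}
```

Returning null instead of throwing keeps the decision about retries or fallbacks with the caller.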
Anthropic tool use
Anthropic’s tool use API reached GA on May 30, 2024. Each tool definition requires name, description, and input_schema (JSON Schema). When Claude calls a tool, the API returns stop_reason: "tool_use".
The tool_choice parameter controls invocation behavior:
| Mode | Behavior | Default? |
|---|---|---|
| auto | Model decides whether to call a tool | Yes, when tools provided |
| any | Must call one of the provided tools | No |
| tool | Forces a specific named tool | No |
| none | No tools may be used | Yes, when no tools provided |
One behavior worth knowing: when tool_choice is any or tool, the API prefills the assistant message. This suppresses any natural-language preamble before tool_use blocks — even if you ask for one in the prompt. Plan for this in your UI.
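One way to plan for the missing preamble is to treat text as optional when reading the response. The sketch below assumes only the two content block shapes it declares locally; splitContent is a hypothetical helper, not part of the Anthropic SDK.

```typescript
// Minimal local mirror of the two content block shapes used here.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface SplitResponse {
  preamble: string | null; // null when the model produced no text before the tool call
  toolCalls: Array<{ id: string; name: string; input: unknown }>;
}

// Hypothetical helper: separates optional preamble text from tool calls so the
// UI can render a fallback when no preamble exists.
function splitContent(content: ContentBlock[]): SplitResponse {
  const text = content
    .filter((b): b is Extract<ContentBlock, { type: "text" }> => b.type === "text")
    .map((b) => b.text)
    .join("");
  const toolCalls = content
    .filter((b): b is Extract<ContentBlock, { type: "tool_use" }> => b.type === "tool_use")
    .map(({ id, name, input }) => ({ id, name, input }));
  return { preamble: text.length > 0 ? text : null, toolCalls };
}
```

With forced tool choice the preamble is expected to be null, so the UI should show its own status text (for example “Checking the weather…”) rather than wait for one from the model.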
OpenAI Structured Outputs API
OpenAI Structured Outputs reached GA on August 6, 2024. The API shape uses response_format: { type: "json_schema" } with strict: true. The Node.js SDK ships a zodResponseFormat helper that converts a Zod schema to the required format and returns a typed parsed result; the Python SDK offers the equivalent through Pydantic models passed to parse().
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
const client = new OpenAI();
// Zod schema describing the object the model must return
const CalendarEventSchema = z.object({
  name: z.string(),
  date: z.string(),
  participants: z.array(z.string()),
});
const completion = await client.beta.chat.completions.parse({
  model: "gpt-4o-2024-08-06",
  messages: [
    { role: "user", content: "Alice and Bob are going to a science fair on Friday." },
  ],
  response_format: zodResponseFormat(CalendarEventSchema, "event"),
});
const event = completion.choices[0].message.parsed;
// event is typed as { name: string; date: string; participants: string[] } | null
// (null when the model refuses, so check it before use)
In May 2025, OpenAI expanded strict mode to support parallel tool calling and added new JSON Schema features: string validation patterns (email, uri, date-time), and numeric/array range constraints.
What the guarantee does not cover
Three failure modes bypass the schema guarantee on every Structured Outputs implementation:
- Safety refusal: the model declines to generate for policy reasons. The response has no structured output; your code must handle message.refusal.
- Token-limit truncation: the response is cut off at max_tokens. The Python SDK raises LengthFinishReasonError on finish_reason == "length". The output is incomplete and schema-invalid.
- Content filter block: output is blocked post-generation.
Schema adherence is only guaranteed for normal completions. Handle all three cases explicitly.
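const completion = await client.beta.chat.completions.parse({ ... });
const choice = completion.choices[0];
const message = choice.message;
if (message.refusal) {
  throw new Error(`Model refused: ${message.refusal}`);
}
// Depending on SDK version, parse() may already throw for the two finish
// reasons below; the explicit checks are a defensive fallback.
if (choice.finish_reason === "length") {
  throw new Error("Response truncated at max_tokens");
}
if (choice.finish_reason === "content_filter") {
  throw new Error("Response blocked by content filter");
}
const data = message.parsed; // safe here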
Anthropic strict tool use
Anthropic’s strict tool use adds grammar-constrained sampling to function calling. Setting strict: true on a tool definition combined with tool_choice: { type: "any" } gives you a dual guarantee: a tool will be called AND its inputs will strictly match the schema.
Strict tool use exited public beta on January 29, 2026 — no beta header required after that date. It is available on Claude Sonnet 4.5, Claude Opus 4.5, Claude Haiku 4.5, and all later Claude API models.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const tools = [
  {
    name: "get_weather",
    description: "Returns weather data for a city",
    input_schema: {
      type: "object" as const,
      properties: {
        city: { type: "string", description: "City name" },
        unit: { type: "string", enum: ["celsius", "fahrenheit"] },
      },
      required: ["city"],
    },
    strict: true,
  },
];
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  tools,
  tool_choice: { type: "any" },
  messages: [{ role: "user", content: "What is the weather in Paris?" }],
});
// response.stop_reason === "tool_use"
const toolUseBlock = response.content.find((b) => b.type === "tool_use");
The same three failure modes from the OpenAI section apply here too — safety refusals, truncation, and content filter blocks all bypass the guarantee.
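A small classifier keeps those cases from reaching the code that reads the tool input. This is a sketch under assumptions: "tool_use" comes from the example above and "max_tokens" is the truncation stop reason, while "refusal" is assumed here and should be checked against the API reference for your model version. classifyStop and Outcome are hypothetical names.

```typescript
type Outcome =
  | { kind: "ok" }
  | { kind: "truncated" }
  | { kind: "refused" }
  | { kind: "no_tool_call"; stopReason: string | null };

// Hypothetical helper: decides whether the tool input can be trusted.
function classifyStop(stopReason: string | null, hasToolUseBlock: boolean): Outcome {
  // Truncation is checked first: a tool_use block may exist but be cut off.
  if (stopReason === "max_tokens") return { kind: "truncated" };
  // Assumed value for safety refusals; verify for your API version.
  if (stopReason === "refusal") return { kind: "refused" };
  if (stopReason === "tool_use" && hasToolUseBlock) return { kind: "ok" };
  // Anything else (for example a plain text answer) has no usable tool call.
  return { kind: "no_tool_call", stopReason };
}
```

Only the "ok" branch should go on to read the tool input; the other branches are candidates for a retry, a larger max_tokens, or an error surfaced to the caller.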
Claude Mythos Preview does not support forced tool use. Requests with tool_choice: { type: "any" } or tool_choice: { type: "tool", name: "..." } return a 400 error on that model. If you are targeting Mythos Preview, use tool_choice: auto and rely on prompting — the strict: true schema constraint still applies to whatever tool the model chooses to call.
Anthropic schema limitations
Anthropic’s structured outputs do not support:
- Recursive schemas
- Numerical constraints (minimum, maximum, multipleOf)
- String length constraints (minLength, maxLength)
- additionalProperties set to anything other than false
The official SDKs strip unsupported constraints client-side and move them to description fields. Numeric range and string length validation must happen in your application code. This is a meaningful limitation if your schema relies on these constraints for correctness — a z.number().min(0).max(100) silently becomes a plain number field at the API layer.
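The stripping behaviour described above can be approximated in a few lines, which helps when you want to see exactly what the API will receive. This is a sketch of the idea and not the SDKs’ actual implementation; stripUnsupported, isObject, and the description format are all hypothetical.

```typescript
// Keywords this article lists as unsupported by Anthropic structured outputs.
const UNSUPPORTED = ["minimum", "maximum", "multipleOf", "minLength", "maxLength"] as const;

type JsonSchema = { [key: string]: unknown };

function isObject(v: unknown): v is JsonSchema {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Hypothetical helper: returns a copy of the schema with unsupported keywords
// removed and summarised in `description`, recursing into `properties` and `items`.
function stripUnsupported(schema: JsonSchema): JsonSchema {
  const out: JsonSchema = {};
  const notes: string[] = [];
  for (const [key, value] of Object.entries(schema)) {
    if ((UNSUPPORTED as readonly string[]).includes(key)) {
      notes.push(`${key}: ${String(value)}`);
    } else if (key === "properties" && isObject(value)) {
      out["properties"] = Object.fromEntries(
        Object.entries(value).map(([name, sub]) => [name, isObject(sub) ? stripUnsupported(sub) : sub]),
      );
    } else if (key === "items" && isObject(value)) {
      out["items"] = stripUnsupported(value);
    } else {
      out[key] = value;
    }
  }
  if (notes.length > 0) {
    const existing = schema["description"];
    const prefix = typeof existing === "string" ? `${existing} ` : "";
    out["description"] = `${prefix}(${notes.join(", ")})`;
  }
  return out;
}
```

Diffing the input and output of a helper like this against your real schemas shows which constraints you will need to enforce yourself after parsing.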
Whether the Vercel AI SDK handles Anthropic’s schema limitations gracefully when targeting Claude models is an open question. The SDK layer behavior is unverified — test it with your specific schema before relying on it in production.
Zod for type-safe parsing
Zod is the de-facto validation layer in TypeScript LLM pipelines. Two methods matter:
.parse() throws on failure. It returns a strongly-typed deep clone of the input on success. Use it when schema violation is always a bug.
.safeParse() never throws. It returns a discriminated union: { success: true; data: T } | { success: false; error: ZodError }. Use it for LLM outputs where partial or invalid responses are expected.
import { z } from "zod";
// llmOutput below stands for the already JSON-parsed model response
const ArticleSchema = z.object({
  title: z.string(),
  tags: z.array(z.string()),
  publishedAt: z.string().datetime(),
});
// Option A: throws; use when schema violation is always a bug
const article = ArticleSchema.parse(llmOutput);
// Option B: discriminated union; use for LLM outputs
const result = ArticleSchema.safeParse(llmOutput);
if (!result.success) {
  console.error(result.error.issues);
  // e.g. one issue with path ["publishedAt"]; its code is "invalid_format" in Zod v4 ("invalid_string" in v3)
} else {
  const article = result.data; // fully typed
}
// Option C: schemas with async refinements or transforms require the async variants
const asyncResult = await ArticleSchema.safeParseAsync(llmOutput);
On error, access result.error.issues (not errors — the errors alias was removed in Zod v4). Verified on v4.4.3.
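The safeParse result lends itself to a bounded retry loop that feeds validation issues back into the next attempt. The sketch below is written against a structural type shaped like the safeParse result, so it runs without importing Zod; whether a given Zod version is assignable to that type is an assumption to verify. generateValidated and SafeParser are hypothetical names.

```typescript
// Structural type shaped like a safeParse result.
type SafeParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { issues: Array<{ path: PropertyKey[]; message: string }> } };

interface SafeParser<T> {
  safeParse(input: unknown): SafeParseResult<T>;
}

// Hypothetical helper: calls `generate` up to `maxAttempts` times and passes
// the previous validation issues back so the caller can add them to the prompt.
async function generateValidated<T>(
  generate: (previousIssues: string[]) => Promise<unknown>,
  schema: SafeParser<T>,
  maxAttempts = 3,
): Promise<T> {
  let issues: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = schema.safeParse(await generate(issues));
    if (result.success) return result.data;
    // String() keeps symbol path segments from throwing inside join().
    issues = result.error.issues.map((i) => `${i.path.map(String).join(".")}: ${i.message}`);
  }
  throw new Error(`No valid output after ${maxAttempts} attempts: ${issues.join("; ")}`);
}
```

Capping the attempts matters: an unbounded retry on a schema the model cannot satisfy turns one bad response into a runaway bill.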
Vercel AI SDK v6: generateText with structured output
Vercel AI SDK v6 introduces a breaking change for structured output. generateObject and streamObject are deprecated and will be removed. The replacement is generateText and streamText with an output parameter.
| Concept | v5 (deprecated) | v6 (current) |
|---|---|---|
| Structured object | generateObject({ schema }) | generateText({ output: Output.object({ schema }) }) |
| Streaming | streamObject({ schema }) | streamText({ output: Output.object({ schema }) }) |
| Partial stream | partialObjectStream | partialOutputStream |
| Import | { generateObject } from 'ai' | { generateText, Output } from 'ai' |
Many tutorials still use v5 syntax. If you copy-paste SDK examples from articles older than late 2025, you are likely looking at deprecated code.
Automated migration: npx @ai-sdk/codemod upgrade v6
v6 code example
import { generateText, streamText, Output } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";
const ArticleSchema = z.object({
  summary: z.string().describe("One-paragraph summary"),
  temperature: z.number(),
  recommendation: z.string(),
});
// generateText with structured output
const { output } = await generateText({
  model: anthropic("claude-sonnet-4-6"),
  output: Output.object({ schema: ArticleSchema }),
  prompt: "Analyze the weather in San Francisco for a developer blog post.",
});
// output is typed as z.infer<typeof ArticleSchema>
// Streaming version
const { partialOutputStream } = streamText({
  model: anthropic("claude-sonnet-4-6"),
  output: Output.object({ schema: ArticleSchema }),
  prompt: "...",
});
for await (const partial of partialOutputStream) {
  console.log(partial);
}
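Partial objects emitted during streaming are generally incomplete until the stream ends, so UI code benefits from checking which fields have arrived before rendering them. A minimal sketch under that assumption; missingKeys, WeatherPost, and fakeStream are hypothetical names, and the async generator only stands in for partialOutputStream.

```typescript
// Hypothetical helper: lists the required keys a partial object still lacks.
function missingKeys<T extends object>(partial: Partial<T>, required: Array<keyof T>): Array<keyof T> {
  return required.filter((key) => partial[key] === undefined);
}

interface WeatherPost {
  summary: string;
  temperature: number;
  recommendation: string;
}

// Stand-in for a partial output stream: each item has more fields filled in.
async function* fakeStream(): AsyncGenerator<Partial<WeatherPost>> {
  yield { summary: "Foggy" };
  yield { summary: "Foggy morning", temperature: 14 };
  yield { summary: "Foggy morning", temperature: 14, recommendation: "Bring a jacket" };
}

// Records how many required fields were still missing at each step.
async function trackProgress(): Promise<number[]> {
  const missingCounts: number[] = [];
  for await (const partial of fakeStream()) {
    const missing = missingKeys<WeatherPost>(partial, ["summary", "temperature", "recommendation"]);
    missingCounts.push(missing.length);
  }
  return missingCounts;
}
```

Treat only the final object as complete; intermediate partials are for progressive rendering, not for writing to a database.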
v6 offers five output modes via Output.*:
| Mode | Use case |
|---|---|
| Output.object({ schema }) | Single structured object with schema validation |
| Output.array({ element }) | Array of typed objects; per-element validation |
| Output.choice({ options }) | Classification from a fixed string set |
| Output.json() | Unstructured JSON, no schema enforcement |
| Output.text() | Plain text (default) |
One architectural change from v5: structured output now counts as a step in multi-step tool-calling loops. If you combine tools with structured output, configure stopWhen explicitly. The error type on failure is AI_NoObjectGeneratedError, which preserves text, response, usage, and cause for debugging.
Zod, Valibot, Arktype, and any library implementing the Standard JSON Schema interface work natively in v6.
Comparison table
| Mechanism | Schema guarantee | Failure modes that bypass guarantee | Notable schema limitations |
|---|---|---|---|
| JSON mode (OpenAI / Anthropic) | Syntactically valid JSON only | N/A — no schema guarantee | None relevant |
| OpenAI Structured Outputs (strict: true) | Grammar-constrained | Safety refusal, max_tokens truncation, content filter | Supports a subset of JSON Schema; every property must be listed in required and additionalProperties must be false |
| OpenAI function calling (no strict) | None | All of the above | N/A |
| Anthropic strict tool use (strict: true + any) | Grammar-constrained | Safety refusal, max_tokens truncation, content filter | No recursive schemas, no numeric/string length constraints |
| Anthropic tool use (no strict) | None | All of the above | N/A |
| Vercel AI SDK v6 Output.object | Depends on target model | Propagates from underlying provider | Depends on model; Anthropic gaps pass through |
Recommendations
For new production pipelines on OpenAI: use client.beta.chat.completions.parse() with zodResponseFormat. Handle message.refusal and finish_reason === "length" explicitly.
For new production pipelines on Anthropic: use strict tool use with tool_choice: { type: "any" }. Do not rely on the API to enforce numeric or string-length constraints from your Zod schemas; they are silently dropped, so re-validate them in application code after parsing.
For multi-provider abstractions: the Vercel AI SDK v6 is a reasonable choice, but verify Anthropic schema behavior with your specific schemas before shipping. SDK-level behavior for unsupported Anthropic schema features is not fully documented.
For validation layer: .safeParse() everywhere in LLM output processing. .parse() only when a schema violation is genuinely a programmer error.
For streaming use cases: streamText with Output.object in v6. The partialOutputStream iterator gives you progressively typed partial objects.
For cost optimization: once schema compliance is stable, see “LLM cost routing: when Haiku beats Opus and when it does not”, which covers when classification and extraction workloads can move to cheaper models without degrading output quality.
For token budget management: if large system prompts push structured-output responses against max_tokens, see “prompt caching in 2026 — Anthropic, OpenAI, and Gemini compared”, which explains how all three major providers handle prefix caching, cutting repeated-context costs by up to 90%.
Gotchas and edge cases
OpenAI Python SDK — nested Pydantic models with field descriptions
If you use nested Pydantic models in strict mode and add Field(description=...) to a field whose type is another Pydantic model, the SDK sends invalid JSON Schema to the API and you get a 400 BadRequestError. The root cause: JSON Schema for a $ref alongside extra properties requires inline expansion; the prior code path skipped recursive strict coercion on the expanded object.
Fix: upgrade openai-python to a version that includes PR #2025 (merged January 17, 2025). Any version after that date is safe.
# Python: what triggers the bug (fixed in openai-python post-2025-01-17)
from openai import OpenAI
from pydantic import BaseModel, Field
class Address(BaseModel):
    street: str
    city: str
class Person(BaseModel):
    # This Field(description=...) on a nested model type caused the 400 error
    address: Address = Field(description="Home address")
    name: str
Anthropic — no numeric constraints at the API
z.number().min(0).max(100) in your Zod schema produces no constraint at the Anthropic API layer. The SDK strips minimum, maximum, and multipleOf silently. Validate ranges in application code after parsing.
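A dependency-free sketch of that application-level check follows. RangeRule and checkRanges are hypothetical names; if you already have the original Zod schema, running its safeParse on the tool input gives the same protection, because Zod enforces min and max locally.

```typescript
interface RangeRule {
  field: string;
  min?: number;
  max?: number;
}

// Hypothetical helper: returns one message per violated rule; an empty array
// means every checked field is a number inside its range.
function checkRanges(data: Record<string, unknown>, rules: RangeRule[]): string[] {
  const problems: string[] = [];
  for (const { field, min, max } of rules) {
    const value = data[field];
    if (typeof value !== "number" || Number.isNaN(value)) {
      problems.push(`${field}: expected a number`);
    } else if (min !== undefined && value < min) {
      problems.push(`${field}: ${value} is below ${min}`);
    } else if (max !== undefined && value > max) {
      problems.push(`${field}: ${value} is above ${max}`);
    }
  }
  return problems;
}

// A schema-valid tool input can still carry 140, because the range
// constraint never reached the API.
const rangeProblems = checkRanges({ score: 140 }, [{ field: "score", min: 0, max: 100 }]);
```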
Anthropic — extended thinking incompatibility with forced tool_choice
When extended thinking is enabled, tool_choice: { type: "any" } and tool_choice: { type: "tool", name: "..." } are not supported and return a runtime error. Only tool_choice: { type: "auto" } (the default) and tool_choice: { type: "none" } work alongside extended thinking.
The production recommendation in this article — strict tool use with tool_choice: { type: "any" } — does not apply when you add extended thinking. If you enable both, the API will reject the request. Use tool_choice: auto and rely on prompting when extended thinking is required.
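If the same request builder serves both configurations, the downgrade can be made explicit in one place. The sketch below is illustrative: resolveToolChoice and its options object are hypothetical, and the supportsForcedToolUse flag is an assumption for models that reject forced tool use.

```typescript
type ToolChoice =
  | { type: "auto" }
  | { type: "any" }
  | { type: "tool"; name: string }
  | { type: "none" };

interface ToolChoiceOptions {
  extendedThinking: boolean;
  supportsForcedToolUse?: boolean; // assumed flag; defaults to true
}

// Hypothetical helper: falls back to "auto" whenever forcing a tool would be
// rejected, and reports whether a downgrade happened so callers can log it.
function resolveToolChoice(
  preferred: ToolChoice,
  { extendedThinking, supportsForcedToolUse = true }: ToolChoiceOptions,
): { toolChoice: ToolChoice; downgraded: boolean } {
  const forced = preferred.type === "any" || preferred.type === "tool";
  if (forced && (extendedThinking || !supportsForcedToolUse)) {
    return { toolChoice: { type: "auto" }, downgraded: true };
  }
  return { toolChoice: preferred, downgraded: false };
}
```

When downgraded is true, the tool-call guarantee is gone, so the caller has to handle a response that contains no tool_use block at all.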
Zod v4 — errors alias removed
error.errors was an alias for error.issues in Zod v3. It was removed in v4. If you are upgrading from v3, search for .errors and replace with .issues.
Vercel AI SDK v6 — generateObject still exists but is deprecated
generateObject still works in v6 but is deprecated and will be removed. Running it now produces no warning at runtime. Watch for it in your codebase with:
grep -r "generateObject\|streamObject" src/ --include="*.ts"
Anthropic × Vercel AI SDK schema compatibility
Whether the Vercel AI SDK v6 translates Zod schemas for Anthropic by stripping unsupported features is unverified. A GitHub issue (#13355) confirms the API rejects schemas with unsupported properties — but whether the SDK handles this translation before the request hits the API is not documented. Test your specific schemas against Claude models before relying on the SDK as an abstraction layer for this.
References
- OpenAI Structured Outputs announcement
- OpenAI Structured Outputs guide
- OpenAI changelog
- OpenAI Python SDK PR #2025 (Pydantic $ref fix)
- Anthropic tool use — implement tool use
- Anthropic strict tool use
- Anthropic structured outputs — schema limitations
- Anthropic API release notes
- Zod documentation
- Vercel AI SDK v6 — generating structured data
- Vercel AI SDK v6 migration guide