Chapter 106, Lesson 2

Zod schemas as the model contract

How the Vercel AI SDK turns a model's free text into a typed, validated object, using a Zod schema as the contract with generateObject and streamObject.

A customer pastes a paragraph into your invoice app:

“2x logo design at $400, 1x brand guidelines at $1200, due net 30.”

You don’t want a model to talk back about this. You want a row:

{
  items: [
    { description: 'logo design', quantity: 2, unitAmount: 400 },
    { description: 'brand guidelines', quantity: 1, unitAmount: 1200 },
  ],
  dueDate: '2026-07-15',
}

The next line of your code inserts that object with Drizzle. Prose won’t do. You need a typed value, with the keys you expect, in the types you expect, ready to hand to the next function. The job here is to map free text onto a shape your database already understands.

If you’d only met streamText from the last lesson, the temptation is to reach for it: write a system prompt that begs the model to “respond in JSON,” then JSON.parse the reply and hope. That works in the demo and breaks in production, and the start of this lesson explains why it’s a class of bug rather than a shortcut.

The right tool is the focus of this lesson: generateObject and streamObject. You hand them a Zod schema, and the SDK turns that schema into the model’s instructions, validates the reply against it, retries when the model misses, and hands you back a typed object. At the call site, result.object.unitAmount is a number, with no cast and no parsing. Because the contract lives in the schema rather than in prose you’ve hand-tuned for one model, this is the swap-friendly call shape: the same schema works against Claude, GPT, or Gemini behind the model handle you set up earlier.

You’ve already met both halves of this. The previous lesson gave you the route-handler seam: authedRoute, the rate-limit and quota gates, onFinish for the usage and audit writes, maxOutputTokens on every call, and the model handle imported from lib/llm/models.ts. And you’ve known Zod 4 since you first started validating forms. This lesson is that same seam with one call swapped, and your Zod schemas pointed at a new reader: the model.

Start with the decision, before any syntax. An experienced engineer on the 2026 stack does not ask a model to “respond in JSON” and parse the string. It’s worth being precise about why, because the failure is invisible in testing.

When you prompt for JSON and JSON.parse the reply, you’re trusting the model to produce well-formed, correctly-keyed JSON every single time, with nothing enforcing it. Three things go wrong, and all three pass your first demo:

  • The model renames or invents keys. You asked for unitAmount; it returns unit_price on one call and amount on another. Your result.unitAmount is silently undefined.
  • It wraps prose around the JSON. “Here’s the data you asked for:” followed by a ```json fence. JSON.parse throws on the first non-{ character, and now you’re writing a regex to fish the object out of a string.
  • It drifts on the next model update. The format you tuned against this quarter’s model shifts when the provider ships a new one. Nothing in your code changed; the output did.

generateObject removes all three by construction. It constrains the model’s output to your schema, validates the reply with Zod, retries the model when the output doesn’t fit, returns a typed object, and absorbs provider differences: the same schema produces the same shape whether the call lands on OpenAI, Anthropic, or Google.

That last point is the one the whole lesson hangs on. Free-form streamText couples your contract to prompt-engineering, and prompt-engineering is tuned to a specific model, so changing the model means re-tuning. generateObject moves the contract into the schema, and a Zod schema is provider-independent. A surface built on structured output therefore swaps providers cleanly, which is the same abstraction discipline you set up when you put every model behind lib/llm/models.ts. The rule is short: whenever the workload allows structured output, reach for it.

The contrast is the teaching here, so start with the prompt-and-parse call site. Its structured counterpart follows in the next section.

const result = await generateText({
  model: fastModel,
  system: 'Extract the line item. Respond with ONLY valid JSON, no prose.',
  prompt,
  maxOutputTokens: 500,
});
try {
  const item = JSON.parse(result.text);
} catch {
  // and now what? the model wrapped it in a code fence again
}

Fragile. The model can rename keys, wrap prose around the JSON, or drift on the next model update, and nothing here catches it. JSON.parse throws on the first non-{ character, leaving you to write a regex to fish the object out of a string.
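To make the first two failure modes concrete, here is a minimal sketch in plain TypeScript with no SDK involved. The two reply strings are hypothetical stand-ins for what a model might return, and tryParse is an illustrative helper, not part of any library.

```typescript
type ParseOutcome =
  | { ok: true; value: Record<string, unknown> }
  | { ok: false; errorName: string };

// Illustrative helper: wrap JSON.parse so both outcomes are plain values.
function tryParse(text: string): ParseOutcome {
  try {
    return { ok: true, value: JSON.parse(text) as Record<string, unknown> };
  } catch (err) {
    return { ok: false, errorName: err instanceof Error ? err.name : 'UnknownError' };
  }
}

// Hypothetical replies; neither comes from a real model call.
const fence = '`'.repeat(3);
const fencedReply =
  'Here is the data you asked for:\n' + fence + 'json\n{"unitAmount": 400}\n' + fence;
const renamedReply = '{"description": "logo design", "quantity": 2, "unit_price": 400}';

const fenced = tryParse(fencedReply); // parse fails: the reply does not start with JSON
const renamed = tryParse(renamedReply); // parse succeeds, but the expected key is absent

const unitAmount = renamed.ok ? renamed.value['unitAmount'] : undefined;
console.log(fenced.ok, renamed.ok, unitAmount); // false true undefined
```

The second case is the quieter one: nothing throws, and the missing key only shows up later as an undefined value in arithmetic or in an insert.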

The minimal call: schema in, typed object out

Here’s the spine the rest of the lesson builds on: three lines of decision wrapped around one Zod object.

Start with the schema. We’ll use one shape for the entire lesson, an invoice line item, so you track a single contract instead of re-reading a new one in every section. Keep it bare for now; descriptions come next, staged so you see the plain shape first.

import { z } from 'zod';
const invoiceLineItemSchema = z.object({
  description: z.string(),
  quantity: z.number(),
  unitAmount: z.number(),
});

That’s a normal Zod schema, with nothing AI-specific about it. The interesting part is what generateObject does with it.

const { object, usage, finishReason } = await generateObject({
  model: fastModel,
  schema: invoiceLineItemSchema,
  prompt,
  maxOutputTokens: 500,
});
const lineTotal = object.quantity * object.unitAmount;

model: fastModel is the model handle, imported from lib/llm/models.ts and never an inline openai('gpt-X'), the same rule as the last lesson. fastModel is the right pick for extraction: it’s cheap, and the work is mechanical enough not to need the expensive model.

schema is the contract: the Zod object itself. The SDK serializes it and sends it to the model as the spec for what to produce. This one field is the whole reason we’re not using streamText.

maxOutputTokens is still non-optional. Structured output is not exempt from the cost cap; every call in this course carries one.

The return is destructured to { object, usage, finishReason }. object is typed by the schema, so object.unitAmount is a number, not any. That’s the TypeScript win: no cast, no JSON.parse, no post-validation. You go straight from model to typed value.

Notice what is not there. There’s no JSON.parse, and no safeParse after the call. The SDK already validated the model’s output against your schema before it handed you object, so adding your own parse step would re-check work that’s already done.

What makes “serialize the schema and send it” possible is JSON Schema, the universal way to describe a JSON shape that every provider’s structured-output mode speaks. Your Zod schema is the source, and JSON Schema is the wire format the model receives. You never write it by hand, but knowing it’s the intermediary explains the next section.

Descriptions are the model’s documentation

The highest-leverage habit in schema design for AI is also the one beginners skip: a .describe() on every field that isn’t self-explanatory. This is where a flaky extraction becomes a reliable one, and it costs you one method call.

Here’s the mechanism. When the SDK serializes your schema to JSON Schema, each field’s .describe() string rides along as that field’s documentation, and the model reads it the way you’d read a spec before filling out a form. A bare field is a name and a type and nothing else, so the model guesses. A described field is an instruction.
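As a rough picture of what the model receives, the object below is a hand-written approximation of the JSON Schema for a partly described line item. The exact output depends on the SDK and Zod versions, so treat the literal as an assumption. undocumentedFields is a hypothetical helper that lists the properties the model would have to guess at.

```typescript
// Hand-written approximation of the wire format; real SDK output may differ in detail.
interface JsonSchemaProperty {
  type: 'string' | 'number';
  description?: string;
}

interface JsonSchemaObject {
  type: 'object';
  properties: Record<string, JsonSchemaProperty>;
  required: string[];
}

const lineItemJsonSchema: JsonSchemaObject = {
  type: 'object',
  properties: {
    // A bare field: a name and a type, nothing else.
    description: { type: 'string' },
    // Described fields: the description string rides along as documentation.
    quantity: { type: 'number', description: 'How many units were billed' },
    unitAmount: {
      type: 'number',
      description: 'Price of one unit in dollars, not the line total',
    },
  },
  required: ['description', 'quantity', 'unitAmount'],
};

// Hypothetical helper: which fields carry no instruction for the model?
function undocumentedFields(schema: JsonSchemaObject): string[] {
  return Object.entries(schema.properties)
    .filter(([, prop]) => prop.description === undefined)
    .map(([name]) => name);
}

const bareFields = undocumentedFields(lineItemJsonSchema);
console.log(bareFields); // only 'description' is left bare
```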

Watch what that does to a date. dueDate: z.iso.datetime() with no description leaves the model to invent a format, so it hands you "30 days" on one call, "net 30" on another, and "2026-07-15" on a third. Add the documentation, z.iso.datetime().describe('ISO 8601 datetime, the date the invoice must be paid by'), and it extracts cleanly and consistently. Note z.iso.datetime(), not z.string(): the top-level format builder is the convention for dates, and it carries its own JSON Schema shape so the model knows it’s a datetime, not free text.

So the reflex is to give every non-obvious field a .describe(), so the schema reads like a spec for a human contractor. It’s the same Zod you’ve always written; the only new thing is the reader. The clearest place this matters is units, so look at the before and after of the same schema.

const invoiceLineItemSchema = z.object({
  description: z.string(),
  quantity: z.number(),
  unitAmount: z.number(),
});

A teaching foil, not a shape you’d ship. The fields are typed correctly but undocumented, so the model has to guess. unitAmount could come back as dollars, cents, or the line total, and you won’t know which until a wrong invoice ships.
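For the other half of that comparison, here is one way the described version might look. The description strings are illustrative assumptions rather than the lesson's reference answer, and dollars are assumed as the unit because the opening example is priced in dollars.

```typescript
import { z } from 'zod';

// A sketch of the documented shape; the wording of each description is an assumption.
const describedLineItemSchema = z.object({
  // Self-explanatory by name, so it stays bare.
  description: z.string(),
  quantity: z.number().describe('Number of units billed for this line'),
  // The ambiguous field gets the detail: the unit, and what the value is not.
  unitAmount: z
    .number()
    .describe('Price of ONE unit in dollars (400 means $400), not cents and not the line total'),
});

type LineItem = z.infer<typeof describedLineItemSchema>;

const sample: LineItem = { description: 'logo design', quantity: 2, unitAmount: 400 };
const accepted = describedLineItemSchema.safeParse(sample).success;
const rejected = describedLineItemSchema.safeParse({ ...sample, quantity: '2' }).success;
console.log(accepted, rejected); // true false
```

The descriptions change nothing about what safeParse accepts. Their only audience is the model, which is why the types still have to carry the structural floor on their own.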

One nuance, which foreshadows a cost watch-out later: descriptions are part of the prompt, so they cost input tokens on every call. Be generous on the fields that are genuinely ambiguous, and terse on the obvious ones. A field named description doesn’t need a paragraph explaining that it’s a description. Write specs, not essays.

Now write one yourself. The exercise below gives you a half-finished line-item schema. Tighten it so every fixture lands the right way. The fixtures pin the shape of the contract, which is the part you can prove in the browser.

Here's a half-described invoice line-item schema. Tighten it so every fixture passes. The field types are the structural floor — get those right, and watch the `^?` query firm up the inferred `LineItem` as you go. Descriptions don't change what `safeParse` accepts, so they aren't graded here, but write them anyway: in a real call they're what the model reads.

Test scenario Value
well-formed line item {"description":"logo design","quantity":2,"unitAmount":400}
quantity sent as a string {"description":"logo design","quantity":"2","unitAmount":…
missing description {"quantity":2,"unitAmount":400}
free line item (zero amount) {"description":"goodwill credit","quantity":1,"unitAmount…

The drill grades the schema’s shape, meaning its types and required fields, because that’s what safeParse checks and what runs in the browser. The description discipline rides in the prose and in the call, not in the grader. That split is honest about what’s testable here, and it still puts the real skill under your hands: designing a schema that survives contact with a model.

What the model can render: schema-shape constraints

Not every Zod schema survives the trip through JSON Schema. Many serialize cleanly and extract reliably; a few break the export or quietly degrade the model’s accuracy. This is a short reference section: learn the rule, then sort the cases.

The top level must be an object. Your root is always z.object(...). Most providers reject a bare array or a bare primitive at the top of a structured-output call. You’ll see in a moment how to “return a list” without violating this, since there’s a mode for it.

These are safe inside the object: strings, numbers, booleans, z.enum([...]), nested objects, and arrays of objects. All of them serialize and extract cleanly. This covers the large majority of real schemas.

Unions cost accuracy, so prefer a discriminator. A plain z.union([...]) asks the model to infer which branch its output should match, which is a shape it has to guess at. A z.discriminatedUnion('kind', [...]) gives it an explicit kind label to pick instead. Reserve bare z.union for genuinely shapeless alternatives, and reach for the discriminated form whenever the variants are tagged. The idea is to give the model a label to choose, not a shape to infer.
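The same idea can be seen in plain TypeScript, independent of Zod. In this sketch the Adjustment type and its two variants are hypothetical: the kind field is the label, and the reader dispatches on it instead of inspecting which keys happen to be present.

```typescript
// Hypothetical variants; `kind` is the explicit label each branch carries.
type Adjustment =
  | { kind: 'discount'; percent: number }
  | { kind: 'fee'; amount: number };

// Dispatch on the label. TypeScript narrows each branch, so no shape-guessing is needed.
function applyAdjustment(subtotal: number, adjustment: Adjustment): number {
  switch (adjustment.kind) {
    case 'discount':
      return subtotal * (1 - adjustment.percent / 100);
    case 'fee':
      return subtotal + adjustment.amount;
  }
}

const afterDiscount = applyAdjustment(2000, { kind: 'discount', percent: 25 });
const afterFee = applyAdjustment(2000, { kind: 'fee', amount: 25 });
console.log(afterDiscount, afterFee); // 1500 2025
```

A z.discriminatedUnion('kind', [...]) hands the model the same deal: pick a label first, then fill in the fields that label implies.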

Three things break or degrade structured output, and they’re worth memorizing as a set. z.any() and z.unknown() have nothing to serialize, so the model gets zero guidance about what goes there. z.transform() is code, and JSON Schema can’t represent code, so the SDK can’t send it. A recursive schema, one that references itself, blows up the JSON Schema export. If a field needs one of these, the structured-output call is the wrong tool for that field.

Sort the cases below into the two buckets. Each chip is a small schema fragment; decide whether it survives serialization or breaks it.

Each fragment is a piece of a schema you'd send to a model. Sort each into whether it serializes to JSON Schema cleanly, or breaks / degrades the structured-output call.

Buckets:

  • Model-safe: serializes and extracts cleanly
  • Breaks or degrades: won't serialize, or the model can't comply

Fragments:

  • z.object({ ... }) at the root
  • z.enum(['draft', 'sent', 'paid'])
  • z.array(invoiceLineItemSchema) inside an object
  • z.discriminatedUnion('kind', [...])
  • z.any()
  • a recursive z.lazy(() => nodeSchema) tree
  • z.string().transform((s) => s.trim())
  • a bare z.array(...) at the top level

The schema is the floor, the prompt is the suggestion

This section is one decision, and getting it wrong is the most expensive beginner mistake in structured output. The decision is which constraints belong in the schema, and which belong in the prompt.

Here’s the mechanism that forces the call. A Zod .refine() runs at validation time, on the object the model already returned. If the refinement fails, the SDK doesn’t patch the object; it retries the model, and every retry is a full, paid call. So a constraint the model can’t reliably satisfy isn’t a guardrail. It’s a cost amplifier that thrashes the retry loop and burns your budget, one failed call at a time.
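A little arithmetic shows how quickly this adds up. The sketch assumes, as described above, that every validation miss triggers one more full call until maxRetries is exhausted, and that each attempt misses independently with the same probability. The miss rates used are made-up illustration values, not measurements.

```typescript
// Worst case: the first attempt plus every allowed retry.
function worstCaseCalls(maxRetries: number): number {
  return 1 + maxRetries;
}

// Expected number of paid calls when each attempt fails validation with
// probability `missRate`: 1 + p + p^2 + ... + p^maxRetries.
function expectedCalls(missRate: number, maxRetries: number): number {
  let total = 0;
  let probabilityOfReaching = 1; // chance that this attempt is made at all
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    total += probabilityOfReaching;
    probabilityOfReaching *= missRate;
  }
  return total;
}

const worst = worstCaseCalls(2);
const satisfiable = expectedCalls(0, 2); // a constraint the model always meets
const hopeful = expectedCalls(0.5, 2); // a format it hits only half the time
console.log(worst, satisfiable, hopeful); // 3 1 1.75
```

Under these assumptions, a refinement the model misses half the time raises the average bill by 75 percent and still fails outright one time in eight.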

That gives you a clean cut, which is the section’s title made concrete:

  • Hard structural constraints go in the schema: types, enums, required fields. The model can always satisfy these, because they’re about shape, and shape is what structured-output mode enforces natively. The schema is the floor: every output clears it or the call fails.
  • Soft constraints go in the prompt: formatting conventions, house style, “invoice numbers follow INV-XXXX.” These shape the common case without making a single miss expensive. The prompt is the suggestion.

The classic anti-example is a model-generated invoice number. Look at the wrong way and the right way side by side.

const invoiceSchema = z.object({
  invoiceNumber: z.string().refine((s) => s.startsWith('INV-')),
  // ...
});

Burns a retry on every miss. When the model returns INV/0001 or 2026-INV-1, the refine rejects it, and the SDK pays for a fresh call to try again.
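The other half of the comparison follows from the rule above: keep the schema structural and move the convention into the prompt. This is a sketch under assumptions: looseInvoiceSchema and systemPrompt are illustrative names, and the prompt wording is invented for the example.

```typescript
import { z } from 'zod';

// Structural floor only: the field must be a string, nothing more.
const looseInvoiceSchema = z.object({
  invoiceNumber: z.string().describe('Invoice number as it should appear on the invoice'),
});

// The house convention lives in the prompt, where a miss costs nothing extra.
const systemPrompt =
  'Extract the invoice fields. Invoice numbers follow the pattern INV-XXXX, for example INV-0001.';

// An off-format value still clears the schema instead of failing validation.
const offFormat = looseInvoiceSchema.safeParse({ invoiceNumber: 'INV/0001' });
console.log(offFormat.success, systemPrompt.includes('INV-XXXX')); // true true
```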

This doesn’t mean never refine. There’s a legitimate use: cross-field invariants the model controls and can satisfy, like endDate being on or after startDate, or a total equalling the sum of its lines. There, .refine is a real guard against a genuinely wrong object, not a thrash. The line to hold is the difference between a constraint the model can always meet and a format you’re hoping it hits.
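A cross-field invariant of that kind is just a predicate over the whole object. The sketch below writes it as a plain function so it can be read on its own. The InvoiceDraft shape and the totalMatchesLines name are assumptions for illustration, and a function like this is the sort of check you might hand to .refine on the object schema.

```typescript
interface DraftLine {
  quantity: number;
  unitAmount: number;
}

interface InvoiceDraft {
  lines: DraftLine[];
  total: number;
}

// True when the stated total equals the sum of quantity * unitAmount across lines.
// The small tolerance absorbs floating-point noise on fractional amounts.
function totalMatchesLines(invoice: InvoiceDraft): boolean {
  const sum = invoice.lines.reduce(
    (acc, line) => acc + line.quantity * line.unitAmount,
    0,
  );
  return Math.abs(sum - invoice.total) < 0.005;
}

const lines: DraftLine[] = [
  { quantity: 2, unitAmount: 400 },
  { quantity: 1, unitAmount: 1200 },
];

const consistent = totalMatchesLines({ lines, total: 2000 });
const inconsistent = totalMatchesLines({ lines, total: 1900 });
console.log(consistent, inconsistent); // true false
```

The model produced both the lines and the total, so it has everything it needs to satisfy this check. That is what separates it from a format the model can only be hoped to hit.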

Picking the output shape: object, enum, array, and streaming

So far every call has been generateObject returning one record. That’s the default, but it’s one of four shapes, and the experienced move is to pick among them by naming the workload, not by reaching for the one you used last time. Each maps to a sentence you can say out loud about the task.

One structured record → generateObject with a z.object. The default you already know. Extract a line item, fill a form, parse one thing into one shape.

One value from a known set → output: 'enum'. When the answer isn’t a record at all but a single label, like a sentiment, an intent, or a priority bucket, the schema is overkill. generateObject({ model: fastModel, output: 'enum', enum: ['low', 'medium', 'high'], prompt, maxOutputTokens: 20 }) returns a single string from the set you gave it. Same retry behavior, less overhead, cheaper call. Reach for it when the result is a label, not a record.

A list of records → output: 'array'. When the workload is “extract all the line items,” you want an array. You could wrap it: z.object({ items: z.array(invoiceLineItemSchema) }) works and clears the top-level-object rule. But output: 'array' is the idiom that names the workload directly, and generateObject({ model: fastModel, output: 'array', schema: invoiceLineItemSchema, prompt, maxOutputTokens: 800 }) hands you object typed as LineItem[]. This is also the answer to the top-level-object rule, since the mode handles the wrapping for you.

A large output read field-by-field → streamObject. Same schemas, but instead of waiting for the whole object, it streams partial objects as the fields populate. When the schema is big, like a multi-section summary or a long list of extracted lines, and the user reads the result top-to-bottom as it arrives, this turns a four-second spinner into four seconds of progressive fill. The route handler returns result.toTextStreamResponse(), and on the client the useObject hook renders the partial object as it grows. You’ll build that client side in the next lesson; here, just know streamObject is the server primitive that feeds it.
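On the consuming side, the main adjustment is that every field may be missing until the stream finishes. The sketch simulates that with a hand-written sequence of partial objects, so no SDK is involved. renderPartial and the snapshots are hypothetical, and a real stream may split fields differently.

```typescript
interface LineItem {
  description: string;
  quantity: number;
  unitAmount: number;
}

// While streaming, any field may not have arrived yet.
type PartialLineItem = Partial<LineItem>;

// Render whatever has arrived, with a placeholder for what has not.
function renderPartial(item: PartialLineItem): string {
  const description = item.description ?? '?';
  const quantity = item.quantity ?? '?';
  const total =
    item.quantity !== undefined && item.unitAmount !== undefined
      ? String(item.quantity * item.unitAmount)
      : '?';
  return `${quantity} x ${description} = ${total}`;
}

// Hand-written stand-ins for the partial objects a stream might emit.
const snapshots: PartialLineItem[] = [
  {},
  { description: 'logo design' },
  { description: 'logo design', quantity: 2 },
  { description: 'logo design', quantity: 2, unitAmount: 400 },
];

const frames = snapshots.map(renderPartial);
console.log(frames.join('\n'));
```

Only the last frame is safe to persist. The earlier ones are for display, which is why the insert belongs after the stream completes and not inside the render path.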

The lesson isn’t the four call signatures; it’s the order you ask the questions in. Walk it.

Which output shape?

For reference after the walk, here are the four call sites in one place. The model and maxOutputTokens discipline is the same throughout; only the shape changes.

// One record
const { object } = await generateObject({
  model: fastModel,
  schema: invoiceLineItemSchema,
  prompt,
  maxOutputTokens: 500,
});

// One label
const { object: priority } = await generateObject({
  model: fastModel,
  output: 'enum',
  enum: ['low', 'medium', 'high'],
  prompt,
  maxOutputTokens: 20,
});

// A list
const { object: items } = await generateObject({
  model: fastModel,
  output: 'array',
  schema: invoiceLineItemSchema,
  prompt,
  maxOutputTokens: 800,
});

// A large output, streamed
const result = streamObject({
  model: chatModel,
  schema: invoiceSummarySchema,
  prompt,
  maxOutputTokens: 1500,
});

The same seam, the retry knob, and the audit write

Here’s the part that should feel anticlimactic, and that’s the point: structured output lives behind the exact same route handler you built in the last lesson. You don’t learn a second handler shape. You swap one call.

generateObject sits inside the same authedRoute('member', schema, fn) wrapper, behind the same rate-limit and quota gates, as streamText. One thing does shift, and it’s worth being precise about. generateObject is the awaited, non-streaming primitive, so it has no onFinish: it resolves with { object, usage, finishReason } directly, and your token accounting and the llm.call.completed audit write run inline after the await, reading usage off that result. streamObject is the one that keeps an onFinish. It returns before the call completes, so the handler returns result.toTextStreamResponse() immediately, and onFinish owns the post-call write because there’s no later line to run it on. The audit event lands either way; it’s just written in two different slots. The rest of the stack is unchanged, and only the one call differs.

export const POST = authedRoute(
  'member',
  extractLineItemSchema,
  async ({ body, orgId }, request) => {
    const { object, usage, finishReason } = await generateObject({
      model: fastModel,
      schema: invoiceLineItemSchema,
      prompt: body.text,
      maxOutputTokens: 500,
      maxRetries: 1,
      abortSignal: request.signal,
    });
    await recordLlmUsage({ orgId, usage, event: 'llm.call.completed' });
    if (finishReason !== 'stop' || !object) {
      return problem(422, 'Could not extract a line item from that text.');
    }
    const line = await insertInvoiceLine(orgId, object);
    return Response.json({ line });
  },
);

authedRoute is the same wrapper from the last lesson: auth, role check, body validation, rate-limit, and quota, all lifted out of the handler body. The quota layer is the withLlmQuota(...) composition wrapped around authedRoute from the cost chapter. Structured output doesn’t get a different seam.

The generateObject call is the one line that differs from the streamText handler. Everything around it is unchanged.

generateObject is awaited, so usage comes back on the resolved result, with no onFinish. The token accounting and the llm.call.completed audit write run right here, inline after the await. (streamObject is the one that keeps an onFinish, because it returns before the call completes.)

maxRetries is the retry knob. When the model returns output that fails Zod parsing, the SDK retries, and the documented default is two. Each retry is a full paid call, so it’s a cost lever, not a reliability dial: a well-described schema with a sane floor rarely misses, so maxRetries: 1 is often right and caps the spend. Raise it only when the workload genuinely benefits.

The finishReason !== 'stop' || !object check is the defensive guard. Even with retries, the model can genuinely fail to comply and hand back an empty object, or the call can hit a non-stop finish reason. Check before the Drizzle insert and return a clean failure, so object.description never blows up on null.

The two new things in that handler are both one line. maxRetries: 1 is the cost knob covered in the step above, and it’s the reason the floor-vs-suggestion section mattered: a schema the model can reliably satisfy almost never spends a retry. The finishReason !== 'stop' || !object guard is the reminder that not everything always succeeds. Most of the time object arrives fully populated, but the rare genuine failure, where the model truly can’t comply or generation errors out, should surface as a clean 422 rather than a crash on object.description. One check before the insert buys you that.

The structured-output lifecycle, end to end

One picture consolidates the whole lesson. The thing to see is the schema’s dual role: it travels out to the model as documentation, and comes back as the validator. The retry loop you were warned about sits right in the middle, where the cost lives.

sequenceDiagram
    actor Client
    participant Handler as Route handler
    participant SDK as AI SDK
    participant Model

    Client->>Handler: free-text input
    Note over Handler: authedRoute + quota gate pass (Lesson 1)
    Handler->>SDK: generateObject({ schema })
    SDK->>Model: prompt + schema as JSON Schema
    Note right of SDK: schema goes OUT as the spec
    Model-->>SDK: raw output
    SDK->>SDK: validate with Zod
    Note right of SDK: schema comes BACK as the guard
    loop on validation failure, up to maxRetries
        SDK->>Model: retry (a full paid call)
        Model-->>SDK: raw output
    end
    SDK-->>Handler: typed object + usage
    Note over Handler: handler writes the audit event after the await (Lesson 1)
    Handler-->>Client: response
The schema makes two trips: out to the model as the JSON Schema spec, and back through the SDK as the Zod validator. The retry loop is where over-strict schemas spend your budget.

Trace that same flow yourself, in order. It’s the cleanest recall check of everything above.

Put the steps of a `generateObject` call in the order they happen, from the user's text to the response.

User input reaches the route handler
authedRoute and the quota gate pass
generateObject serializes the schema to JSON Schema and sends it with the prompt
The model returns structured output
The SDK validates the output against the Zod schema
On a validation miss, the SDK retries the model
The typed object returns and the handler writes the audit event after the await
The handler responds to the client

You can now write the structured-output calls the invoice app actually needs: extract line items from a pasted description with generateObject, classify an inbound payment email into a status bucket with output: 'enum', and draft an invoice description from a brief. Each is the same seam from the last lesson with a Zod schema as the contract, and because that contract is provider-independent, each one swaps providers without a re-tune.

The reflex to carry forward is that even when the surface looks conversational, like the ask-your-invoices chat you’ll build later, the tools that conversation calls use this exact Zod discipline for their inputs and outputs. Structured output isn’t a niche; it’s the spine under a lot of what an AI feature does. It’s the swap-friendly call shape, so prefer it whenever the workload is extraction, classification, or form-fill.