Chapter 108Lesson 2

Streaming route under auth with the agentic loop

The starter boots, but the chat rail on /invoices is dead: the textarea is disabled, and POST /api/chat answers every request with a 501. Your job in this lesson is to bring that endpoint to life. By the end, acting as a member of org-acme, you type a question into a temporary chat box and watch a text answer stream back — and if anyone who isn’t an authenticated org member hits the same route, it refuses them before the model ever runs.

This is the spine the rest of the project hangs off. Lessons 3, 4, and 5 add the tool that reads real invoice data, the per-user token budget, and the polished UI — but none of that can exist until there is a streaming route that only runs for the right caller, with the agentic loop already capped. So the slice you build here is deliberately narrow: a streaming answer with no tools yet, gated by auth, with the cost ceilings bolted on before there’s anything expensive to run. To prove it works without the real chat UI (that comes in the last lesson of this chapter), you’ll stand up a throwaway chat box that renders whatever streams back as plain text.

There’s no screenshot to chase. The finished result is behavioral: type tell me a joke about invoices, watch text appear in the temporary box; flip the inspector’s BYPASS_AUTHED_ROUTE flag and watch the same request come back 401; open /inspector after a conversation and find exactly one llm.finish row sitting in the LLM audit-events tail.

Your mission

You’re wiring POST /api/chat to stream an LLM answer, and the order in which you do it is the whole lesson. The reflex an experienced engineer reaches for here is cap and wrap first, then add capability — the guardrails go in before the thing they guard exists, so capability can never be added without them. Three guardrails: the auth boundary, the step cap, and the output-token ceiling. All three land in this lesson, while the route still has nothing to call.

Start with the boundary. The route is wrapped in authedRoute('member', …), the same auth wrapper that guards every mutation in this app — the Request/Response sibling of authedAction, built back in The authedRoute twin. Calling streamText from a bare POST handler is the canonical bug class for an LLM endpoint: it answers any caller on the internet, burns tokens on their behalf, and has no orgId to scope an answer to. Streaming a response forces a route handler rather than a Server Action — actions can’t stream — which is exactly why authedRoute exists as the route-handler twin of authedAction.

Then the loop cap. streamText runs an agentic loop: once tools exist, the model can call a tool, read the result, and decide to call another, round and round. stopWhen: stepCountIs(5) caps that at five steps. You set it now, in this lesson, even though there are no tools to call yet and the cap will never fire — because the moment a tool lands in the next lesson, the runaway-loop window opens, and a cap added after the fact is a cap you forgot to add under pressure. The SDK’s default is stepCountIs(20), far too loose for a surface that bills a token budget per user. Riding alongside it is maxOutputTokens: 1024, the ceiling on a single answer’s length. Treat a missing output cap as exactly as serious as a missing auth check: both are the difference between a bounded cost and an unbounded one.

The system prompt is the third piece, and it is a controller, not a greeting. It does four jobs: it scopes answers to the acting org by name, it forces the model to ground numeric claims in a tool call rather than inventing them, it refuses questions about any other org’s data, and it defines what to do when a tool hands back an error. One subtlety worth internalizing now: the org name gets templated into the prompt string, but the user’s typed question never does — it stays in the messages array. That separation is the prompt-injection boundary. Trusted instructions live in the system prompt you control; untrusted user input stays in the message stream where the model treats it as data to answer, not instructions to obey.

Two seams from the Vercel AI SDK v5 surface exactly once here. convertToModelMessages translates the rich UIMessage[] shape the client renders (with its parts and tool invocations) into the flat ModelMessage[] shape the model expects on the wire — forgetting it is the classic first-week-with-v5 bug. And toUIMessageStreamResponse() is what makes the response a stream the client’s useChat can parse; its lookalike toTextStreamResponse() speaks a different protocol that useChat won’t understand. Both are owned by streamText, generateText, and the route-handler seam — lean on it rather than re-deriving them.

One trade to name rather than gold-plate: the route’s input schema accepts z.array(z.unknown()) for messages. Validating the full UIMessage shape with Zod is heavy and redundant, because convertToModelMessages does the real structural validation downstream. Naming that trade is the senior move; over-validating it would be wasted code.

Finally, the audit lineage. Every finished conversation writes one row to an append-only log, the same one-row-per-event discipline you built in The append-only audit log. Two things to keep straight: this is a different table from that chapter’s auditLogs (it’s the LLM-specific events log), so the writer is not pushAudit; and against real Postgres each write is a single INSERT into llm_audit_events. Here a single array push stands in for that bounded one-row write.

Out of scope, named so you don’t reach for them: the getInvoiceStats tool and its per-step audit are the next lesson; the withLlmQuota reservation that will eventually wrap this whole route is the lesson after; the real parts-rendering chat client is the last lesson of this chapter. onError returns a sanitized log and never leaks a raw error to the client — that’s the one error-handling rule that applies here and now.

Typing a question such as “tell me a joke about invoices” streams a text answer back into the temporary chat box.

untested

With BYPASS_AUTHED_ROUTE on, POST /api/chat is refused with 401 and the model never runs.

untested

A question that asks for another organization’s data is refused in the answer text (the system prompt as controller).

untested

Every completed conversation writes exactly one llm.finish row (with finishReason: 'stop') to the LLM audit-events tail, scoped to the acting org.

untested

Coding time

Implement the four files this slice touches against the brief above — src/lib/llm/prompts.ts, src/lib/llm/audit.ts, src/app/api/chat/route.ts, and a throwaway smoke-test src/app/(app)/invoices/invoice-chat.tsx — then open the walkthrough to compare.

Reference solution and walkthrough

We’ll build in dependency order: the prompt and the audit writers first (the route imports both), then the route itself, then the temporary client that exercises it.

The system prompt

src/lib/llm/prompts.ts ships as a one-line placeholder. Replace it with the four-rule controller:

import 'server-only';

// The system prompt is the controller. It templates the org display name (never
// user input — that stays in `messages`, the prompt-injection rule) and carries
// three load-bearing rules: tools are the only doorway to numbers, cross-org
// questions are refused, and a tool `{ error }` is read back as a graceful note.
export const invoiceQAPrompt = (ctx: { orgName: string }): string =>
  [
    `You answer questions about invoices for ${ctx.orgName} only.`,
    'Always call getInvoiceStats before stating any numeric fact about invoices — never guess counts, totals, or status breakdowns from memory.',
    `Refuse questions about any other organization's invoices; you can only see ${ctx.orgName}'s data.`,
    'If getInvoiceStats returns an { error }, tell the user the stats are unavailable right now and to try again — do not invent numbers.',
  ].join('\n');

Four rules joined with newlines into one string. Notice that two of them already name getInvoiceStats, a tool that doesn’t exist until the next lesson — that’s intentional. The prompt describes the contract you’re about to build the model against, and writing it now means the moment the tool lands, the model already knows to reach for it. The import 'server-only' marks this module as server-side: a system prompt is an instruction surface you never want shipped to the browser, and the import makes a bundling mistake a build error.

The single line carrying the most weight is ${ctx.orgName} — and equally, what is absent. The org name is trusted data you control, so it gets interpolated. The user’s question is untrusted, so it never touches this string; it flows separately through the messages array. That split is the prompt-injection boundary in one decision.

The audit writers

src/lib/llm/audit.ts ships with both functions stubbed to no-ops, with the argument types already in place. Fill in the two pushes:

import 'server-only';

import { pushLlmAuditEvent } from '@/server/store';

type StepArgs = {
  userId: string;
  orgId: string;
  finishReason?: string;
  usage?: unknown;
  toolCalls?: unknown;
};

type FinishArgs = {
  userId: string;
  orgId: string;
  finishReason?: string;
  usage?: unknown;
};

// One append-only row per agentic step. The SQL lineage's "bounded one-row
// transaction" is a single push here.
export const writeLlmStepEvent = async (args: StepArgs): Promise<void> => {
  pushLlmAuditEvent({
    userId: args.userId,
    orgId: args.orgId,
    event: 'llm.step',
    payload: {
      finishReason: args.finishReason,
      usage: args.usage,
      toolCalls: args.toolCalls,
    },
  });
};

// One append-only row per finished conversation.
export const writeLlmFinishEvent = async (args: FinishArgs): Promise<void> => {
  pushLlmAuditEvent({
    userId: args.userId,
    orgId: args.orgId,
    event: 'llm.finish',
    payload: {
      finishReason: args.finishReason,
      usage: args.usage,
    },
  });
};

Each writer is a single append-only pushLlmAuditEvent call — the in-memory stand-in for one INSERT into llm_audit_events. The event discriminant ('llm.step' vs 'llm.finish') and a jsonb-shaped payload are the whole row. This is the same one-row-per-event discipline from The append-only audit log, pointed at a different table: that chapter’s writes go through pushAudit into auditLogs, these go through pushLlmAuditEvent into the LLM events log. Keeping the two logs separate keeps the lifecycle audit (who archived which invoice) from getting tangled with the LLM audit (what the model did per step).

Two writers, but in this lesson the route only calls writeLlmFinishEvent. writeLlmStepEvent is built now and sits unused until the next lesson wires the per-step audit — both writers live in one file, so it’s natural to fill them in together rather than come back for the second one later.

The route handler

This is the heart of the lesson. src/app/api/chat/route.ts ships as a 501 stub. Replace it with the wrapped, capped streaming handler:

import { convertToModelMessages, stepCountIs, streamText } from 'ai';
import { z } from 'zod';
import { authedRoute } from '@/lib/authed-route';
import { writeLlmFinishEvent } from '@/lib/llm/audit';
import { chatModel } from '@/lib/llm/models';
import { invoiceQAPrompt } from '@/lib/llm/prompts';
import type { InvoiceUIMessage } from '@/lib/llm/tools';

export const POST = authedRoute(
  'member',
  z.strictObject({ messages: z.array(z.unknown()) }),
  async (input, ctx) => {
    const org = await ctx.db.query.organization.findFirst({
      where: (o) => o.id === ctx.orgId,
    });
    const orgName = org?.name ?? 'your organization';

    const result = streamText({
      model: chatModel,
      system: invoiceQAPrompt({ orgName }),
      messages: convertToModelMessages(input.messages as InvoiceUIMessage[]),
      stopWhen: stepCountIs(5),
      maxOutputTokens: 1024,
      onFinish: ({ usage, finishReason }) =>
        writeLlmFinishEvent({
          userId: ctx.userId,
          orgId: ctx.orgId,
          finishReason,
          usage,
        }),
      onError: ({ error }) => {
        console.error('[chat] stream error', { code: 'stream_error' });
        void error;
      },
    });

    return result.toUIMessageStreamResponse();
  },
);

The auth boundary. authedRoute('member', …) is a route handler (not a Server Action — actions can’t stream), wrapped so the model only runs for an authenticated org member. ctx is flat: ctx.userId / ctx.orgId, never ctx.user.id. The schema accepts z.array(z.unknown()) on purpose — convertToModelMessages does the real structural validation downstream, so the route does not duplicate it.

import { convertToModelMessages, stepCountIs, streamText } from 'ai';
import { z } from 'zod';
import { authedRoute } from '@/lib/authed-route';
import { writeLlmFinishEvent } from '@/lib/llm/audit';
import { chatModel } from '@/lib/llm/models';
import { invoiceQAPrompt } from '@/lib/llm/prompts';
import type { InvoiceUIMessage } from '@/lib/llm/tools';

export const POST = authedRoute(
  'member',
  z.strictObject({ messages: z.array(z.unknown()) }),
  async (input, ctx) => {
    const org = await ctx.db.query.organization.findFirst({
      where: (o) => o.id === ctx.orgId,
    });
    const orgName = org?.name ?? 'your organization';

    const result = streamText({
      model: chatModel,
      system: invoiceQAPrompt({ orgName }),
      messages: convertToModelMessages(input.messages as InvoiceUIMessage[]),
      stopWhen: stepCountIs(5),
      maxOutputTokens: 1024,
      onFinish: ({ usage, finishReason }) =>
        writeLlmFinishEvent({
          userId: ctx.userId,
          orgId: ctx.orgId,
          finishReason,
          usage,
        }),
      onError: ({ error }) => {
        console.error('[chat] stream error', { code: 'stream_error' });
        void error;
      },
    });

    return result.toUIMessageStreamResponse();
  },
);

The context carries ids, not the org’s display name — so the route fetches it. ctx.db.query.organization.findFirst is a store facade shaped exactly like Drizzle’s db.query.* read. The ?? 'your organization' fallback keeps the prompt sensible if the lookup ever misses.

import { convertToModelMessages, stepCountIs, streamText } from 'ai';
import { z } from 'zod';
import { authedRoute } from '@/lib/authed-route';
import { writeLlmFinishEvent } from '@/lib/llm/audit';
import { chatModel } from '@/lib/llm/models';
import { invoiceQAPrompt } from '@/lib/llm/prompts';
import type { InvoiceUIMessage } from '@/lib/llm/tools';

export const POST = authedRoute(
  'member',
  z.strictObject({ messages: z.array(z.unknown()) }),
  async (input, ctx) => {
    const org = await ctx.db.query.organization.findFirst({
      where: (o) => o.id === ctx.orgId,
    });
    const orgName = org?.name ?? 'your organization';

    const result = streamText({
      model: chatModel,
      system: invoiceQAPrompt({ orgName }),
      messages: convertToModelMessages(input.messages as InvoiceUIMessage[]),
      stopWhen: stepCountIs(5),
      maxOutputTokens: 1024,
      onFinish: ({ usage, finishReason }) =>
        writeLlmFinishEvent({
          userId: ctx.userId,
          orgId: ctx.orgId,
          finishReason,
          usage,
        }),
      onError: ({ error }) => {
        console.error('[chat] stream error', { code: 'stream_error' });
        void error;
      },
    });

    return result.toUIMessageStreamResponse();
  },
);

The two cost caps are non-negotiable. stopWhen: stepCountIs(5) is set with no tools to call yet, so it can’t fire this lesson — it’s set first on purpose, so the next lesson adds a tool into an already-capped loop. maxOutputTokens: 1024 ceilings a single answer. convertToModelMessages translates the UIMessage shape to the model’s wire shape.

import { convertToModelMessages, stepCountIs, streamText } from 'ai';
import { z } from 'zod';
import { authedRoute } from '@/lib/authed-route';
import { writeLlmFinishEvent } from '@/lib/llm/audit';
import { chatModel } from '@/lib/llm/models';
import { invoiceQAPrompt } from '@/lib/llm/prompts';
import type { InvoiceUIMessage } from '@/lib/llm/tools';

export const POST = authedRoute(
  'member',
  z.strictObject({ messages: z.array(z.unknown()) }),
  async (input, ctx) => {
    const org = await ctx.db.query.organization.findFirst({
      where: (o) => o.id === ctx.orgId,
    });
    const orgName = org?.name ?? 'your organization';

    const result = streamText({
      model: chatModel,
      system: invoiceQAPrompt({ orgName }),
      messages: convertToModelMessages(input.messages as InvoiceUIMessage[]),
      stopWhen: stepCountIs(5),
      maxOutputTokens: 1024,
      onFinish: ({ usage, finishReason }) =>
        writeLlmFinishEvent({
          userId: ctx.userId,
          orgId: ctx.orgId,
          finishReason,
          usage,
        }),
      onError: ({ error }) => {
        console.error('[chat] stream error', { code: 'stream_error' });
        void error;
      },
    });

    return result.toUIMessageStreamResponse();
  },
);

onFinish writes the single llm.finish row via writeLlmFinishEvent, scoped to ctx.orgId. onError logs a sanitized, code-only line and drops the raw error with void error — nothing about a model failure leaks to the client.

import { convertToModelMessages, stepCountIs, streamText } from 'ai';
import { z } from 'zod';
import { authedRoute } from '@/lib/authed-route';
import { writeLlmFinishEvent } from '@/lib/llm/audit';
import { chatModel } from '@/lib/llm/models';
import { invoiceQAPrompt } from '@/lib/llm/prompts';
import type { InvoiceUIMessage } from '@/lib/llm/tools';

export const POST = authedRoute(
  'member',
  z.strictObject({ messages: z.array(z.unknown()) }),
  async (input, ctx) => {
    const org = await ctx.db.query.organization.findFirst({
      where: (o) => o.id === ctx.orgId,
    });
    const orgName = org?.name ?? 'your organization';

    const result = streamText({
      model: chatModel,
      system: invoiceQAPrompt({ orgName }),
      messages: convertToModelMessages(input.messages as InvoiceUIMessage[]),
      stopWhen: stepCountIs(5),
      maxOutputTokens: 1024,
      onFinish: ({ usage, finishReason }) =>
        writeLlmFinishEvent({
          userId: ctx.userId,
          orgId: ctx.orgId,
          finishReason,
          usage,
        }),
      onError: ({ error }) => {
        console.error('[chat] stream error', { code: 'stream_error' });
        void error;
      },
    });

    return result.toUIMessageStreamResponse();
  },
);

The v5 stream-response shape the client’s useChat can parse. Its lookalike toTextStreamResponse() speaks a different protocol useChat can’t read — the stream still flows, but your chat box stays empty with no error to explain why.

1 / 1

A few things worth slowing down on.

The handler signature is (input, ctx). input is the parsed body (so input.messages is your array), and ctx is the auth context authedRoute hands you — and it’s flat: ctx.userId, ctx.orgId, ctx.role. There’s no ctx.user.id; reach for the nested shape and TypeScript will stop you. That ctx is what makes the route safe: every id you scope by comes from the verified session, never from the request body.

The context carries ids but not the org’s display name, and the prompt needs the name. So the route looks it up: ctx.db.query.organization.findFirst. That db is a thin store facade shaped to look exactly like a Drizzle db.query.* read — the same call you’d write against real Postgres, standing in for the in-memory store. The ?? 'your organization' fallback keeps the prompt sensible if the lookup ever misses.

stopWhen: stepCountIs(5) and maxOutputTokens: 1024 are the two cost caps, and they’re set here with nothing to spend them on. There are no tools, so the loop runs exactly one step and the five-step cap is never exercised this lesson. That’s the point of the cap-first reflex: the guardrail is in place before the capability that makes it necessary, so the next lesson adds the tool into an already-capped loop.

convertToModelMessages(input.messages as InvoiceUIMessage[]) is the v5 translation seam. The as InvoiceUIMessage[] cast is the other half of the validation trade — the schema let messages through as unknown[], so the cast asserts the shape the converter then validates structurally. InvoiceUIMessage imports cleanly from tools.ts even though its tool map is still an empty stub, so this typechecks today.

onFinish is where the single audit row gets written: when the stream completes, writeLlmFinishEvent pushes one llm.finish row scoped to ctx.orgId, carrying the finishReason and token usage. onError is the sanitization rule made concrete — it logs a flat, code-only line and deliberately drops the raw error on the floor with void error, so nothing about the model failure leaks to the client. That sanitization discipline is owned by streamText, generateText, and the route-handler seam.

The return is result.toUIMessageStreamResponse() — the response shape the client’s useChat knows how to parse. Reach for its lookalike toTextStreamResponse() and the stream still flows, but in a protocol useChat can’t read, and your chat box stays empty with no error to explain why. This is the single easiest v5 mistake to make and the hardest to debug, so it’s worth committing to memory.

The throwaway smoke-test client

You need a way to drive the route before the real chat UI exists. invoice-chat.tsx ships as a disabled shell with the textarea greyed out. Replace it with a minimal useChat client that renders whatever streams back as plain text:

'use client';

import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport } from 'ai';
import { type FormEvent, useState } from 'react';
import { Button } from '@/components/ui/button';
import type { InvoiceUIMessage } from '@/lib/llm/tools';

type InvoiceChatProps = {
  orgName: string;
};

// Throwaway smoke-test client — just enough to drive POST /api/chat and prove
// text streams back. It renders only text parts, as raw text. The real
// parts-rendering chat (text bubbles + the stats card) replaces this in the
// last lesson of the chapter.
export const InvoiceChat = ({ orgName }: InvoiceChatProps) => {
  const { messages, sendMessage } = useChat<InvoiceUIMessage>({
    transport: new DefaultChatTransport({ api: '/api/chat' }),
  });
  const [input, setInput] = useState('');

  const onSubmit = (event: FormEvent<HTMLFormElement>) => {
    event.preventDefault();
    if (input.trim() === '') {
      return;
    }
    sendMessage({ text: input });
    setInput('');
  };

  return (
    <div
      data-testid="invoice-chat"
      className="flex h-full flex-col gap-3 rounded-lg border p-4"
    >
      <div>
        <h2 className="text-sm font-medium">Ask your invoices</h2>
        <p className="text-xs text-muted-foreground">
          Smoke test — {orgName}&apos;s invoices.
        </p>
      </div>

      <div className="flex-1 space-y-3 overflow-y-auto text-sm">
        {messages.map((message) => (
          <div key={message.id} className="space-y-1">
            <span className="text-xs font-medium text-muted-foreground">
              {message.role === 'user' ? 'You' : 'Assistant'}
            </span>
            {message.parts.map((part, index) =>
              part.type === 'text' ? (
                <p key={`${message.id}-${index}`} className="whitespace-pre-wrap">
                  {part.text}
                </p>
              ) : null,
            )}
          </div>
        ))}
      </div>

      <form onSubmit={onSubmit} className="flex items-end gap-2">
        <textarea
          data-testid="chat-input"
          value={input}
          onChange={(event) => setInput(event.target.value)}
          rows={2}
          placeholder="Tell me a joke about invoices"
          className="flex-1 resize-none rounded-md border bg-background px-2 py-1.5 text-sm"
        />
        <Button type="submit" size="sm" data-testid="chat-send">
          Send
        </Button>
      </form>
    </div>
  );
};

This is scaffolding, and naming it scaffolding matters — the last lesson of this chapter throws it away and replaces it with the typed client that renders tool-part cards across their lifecycle states. So it does the bare minimum: send a message, render the text parts that come back, nothing more.

Three v5 details to register. The endpoint goes on the transport — new DefaultChatTransport({ api: '/api/chat' }) — not as a top-level api option on useChat, because @ai-sdk/react@2 removed that option. The input state is yours to manage: useState(''), because v5’s useChat no longer manages input for you (it did in v4). And sending is sendMessage({ text: input }), which appends a user message and kicks off the stream. The render walks message.parts and renders only text parts; every other part type (tool invocations, once they exist) is dropped with : null. The full parts protocol — why messages are arrays of parts rather than a single string — is owned by useChat, useObject, and the parts array.

streamText — API reference

ai-sdk.dev

Every option on the call you're writing: stopWhen, maxOutputTokens, onFinish, onError, and toUIMessageStreamResponse.

Tool calling — multi-step (stopWhen)

ai-sdk.dev

How the agentic loop steps, why the default is stepCountIs(20), and the cap you set first this lesson.

Chatbot — useChat and message.parts

ai-sdk.dev

The v5 client surface your smoke-test driver uses: DefaultChatTransport, sendMessage, and rendering text parts.

Moment of truth

This project ships no per-lesson test suite — the lesson-verification/ directory is a harness slot, not a green gate, so the project itself is the assessment. Your automated check is pnpm verify, which runs Biome’s CI lint, tsc --noEmit, and a next build with SKIP_ENV_VALIDATION=true:

pnpm verify

A clean run means the slice typechecks and the production build succeeds with no real key needed — the build never makes a live model call, so it’s green without AI_GATEWAY_API_KEY set. That confirms the shape is right; the behavior you confirm by hand.

The four behavioral checks below all stream from a real model, so they need AI_GATEWAY_API_KEY in your .env (copy .env.example, paste the key from the Vercel AI Gateway dashboard). Acting as member-A is the default identity, so you can start straight from /invoices; the inspector at /inspector is where you flip flags and read the audit tail. Work down the list:

Acting as member-A, typing “tell me a joke about invoices” streams a text answer into the smoke-test box.

untested

With the inspector’s BYPASS_AUTHED_ROUTE flag on, POST /api/chat returns 401 from authedRoute and the model never ran — then revert the flag.

untested

Asking for another organization’s data (for example, “how many invoices does Globex have?”) is refused in the answer text, because the system prompt is the controller.

untested

After one conversation, the inspector’s LLM audit-events tail shows exactly one llm.finish row with finishReason: 'stop', scoped to org-acme.

untested