Chapter 105, Lesson 4

Quiz - When AI features earn their weight

A teammate ships two “read a document and pull out structure” features in the same sprint: one sorts each expense into one of the four tax categories the accountant defined up front, the other pulls the renewal date and party names out of an arbitrary uploaded vendor contract. One earns an LLM, one doesn’t. What’s the deciding question that separates them?

Are the categories knowable at design time? The tax buckets were fixed in advance (a switch/classifier), but a contract is open-ended text you have to actually read (extraction — Trigger 3).

Does the output need to be structured? Both produce structured output, so both are extraction triggers and both should use a model.

Is a human in the loop? The contract is reviewed by a person, so it’s safe to use a model; the tax sort is automated, so it must stay deterministic.

Both look like “categorize a document,” but the cut is whether the answers were knowable at design time. A fixed, enumerable set of buckets is a switch or a tiny classifier — a general LLM is wildly overpowered. The contract is genuinely open-ended prose that requires understanding language, which is the extraction trigger. Structured output is true of both, so it can’t be the discriminator.
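As a rough illustration of why the fixed-bucket case needs no model, the sort can be sketched as a plain lookup. The category names and keyword rules below are hypothetical (the quiz only says the accountant defined four buckets up front), and `categorizeExpense` is an illustrative helper, not code from the lesson.

```typescript
// Hypothetical category names: the source only says there are four fixed buckets.
type TaxCategory = "travel" | "meals" | "equipment" | "other";

// Hypothetical keyword rules, standing in for whatever the accountant defined.
const RULES: ReadonlyArray<{ category: TaxCategory; keywords: string[] }> = [
  { category: "travel", keywords: ["flight", "hotel", "taxi"] },
  { category: "meals", keywords: ["restaurant", "lunch", "catering"] },
  { category: "equipment", keywords: ["laptop", "monitor", "keyboard"] },
];

// Deterministic sort: the first rule with a matching keyword wins,
// and anything unmatched falls into the catch-all bucket.
function categorizeExpense(description: string): TaxCategory {
  const text = description.toLowerCase();
  for (const rule of RULES) {
    if (rule.keywords.some((keyword) => text.includes(keyword))) {
      return rule.category;
    }
  }
  return "other";
}
```

Nothing comparable can be written for the contract, because the party names and the wording around the renewal date are not enumerable in advance.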

You’re scanning a tutorial to decide whether it’s current for your v5 stack. Which signs tell you it was written against the outdated AI SDK v4? Select all that apply.

It reads each message’s text off a flat .content string instead of a parts array on UIMessage.

It calls append and reload to send and retry messages.

It bounds an agent loop with the stopWhen: stepCountIs(n) option.

It manages the chat input state manually with useState.

Flat .content and append/reload are the v4 fingerprints — they were replaced by the parts array and sendMessage/regenerate. The other two (stopWhen and manually managed input) are the current v5 shapes, so seeing them is a sign the material is up to date. You don’t need to know the v5 APIs in depth yet, only to recognize the stale shapes so a v4 tutorial sets off the alarm.
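A minimal sketch of what reading text from the parts-based shape might look like. `Part` and `MessageLike` are simplified local stand-ins for illustration, not the SDK's real exported types, and `messageText` is a hypothetical helper.

```typescript
// Simplified stand-ins (assumption): the real SDK types carry more fields.
type Part = { type: string; text?: string };
type MessageLike = { role: string; parts: Part[] };

// Collects the text of a message by walking its parts array
// instead of reading a single flat .content string.
function messageText(message: MessageLike): string {
  return message.parts
    .filter((part) => part.type === "text" && typeof part.text === "string")
    .map((part) => part.text as string)
    .join("");
}
```

A tutorial that renders `message.content` directly has no equivalent of this step, which is one quick way to spot the older shape.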

Your LLM route already reads usage in onFinish, bumps the per-user counter, and caps maxOutputTokens. Why does the lesson still insist on a pre-call input estimate-and-reject on top of all that?

onFinish doesn’t fire on an aborted stream, and the input cap defends a different attack than the output cap — so the post-call ledger alone leaves a hole an adversary can drive through by aborting mid-stream.

The pre-call estimate is the accurate token count, so it replaces the usage read once it’s in place.

Pre-call rejection is the only thing that increments the daily quota counter; without it the quota never climbs.

The two points catch different things and neither does the other’s job. Pre-call rejects an oversized input (a stuffed context window) cheaply before you pay; post-call records what actually happened. Critically, onFinish never fires on an aborted stream, so a “start a huge generation, abort just before it finishes” loop leaks tokens your ledger misses — the pre-call estimate plus maxOutputTokens bound the worst case regardless of how the call ends.
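The pre-call half could be sketched roughly as follows. The four-characters-per-token ratio is a crude assumption rather than a real tokenizer, and `estimateTokens`, `checkInputBudget`, and the `maxInputTokens` parameter are hypothetical names, not the lesson's code.

```typescript
// Crude heuristic (assumption): roughly 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

type PreCallDecision =
  | { ok: true; estimatedInputTokens: number }
  | { ok: false; reason: string };

// Rejects an oversized input before any model call is made.
// maxInputTokens is a hypothetical per-surface limit.
function checkInputBudget(messages: string[], maxInputTokens: number): PreCallDecision {
  const estimated = messages.reduce((sum, message) => sum + estimateTokens(message), 0);
  if (estimated > maxInputTokens) {
    return { ok: false, reason: `input too large: ~${estimated} tokens > ${maxInputTokens}` };
  }
  return { ok: true, estimatedInputTokens: estimated };
}
```

The estimate only has to be good enough to reject obviously stuffed inputs; the usage read in onFinish remains the accurate record for calls that complete.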

A user fires 30 requests a second at the chat box; a different user paces one request every few minutes all day to stay under the radar. Your daily token quota is in place. What does the lesson say you need?

A rate limit too — the quota catches the slow drain eventually, but burst spend can run up before the day’s counter even registers it. Burst and sustained are different shapes, so both guards ship.

Nothing more — a daily token quota caps total spend, so by definition it already bounds the fast attacker.

Replace the quota with the rate limit — a sliding-window limiter subsumes the daily cap, so running both is redundant.

Quotas cap how much per day; rate limits cap how fast. The 30-per-second burst can run up enormous spend in the seconds before the daily counter matters, so the sliding-window limiter is what stops it; the slow pacer is exactly what the daily quota eventually catches. Neither guard catches the other’s case, so both run — on different keys (rl:llm vs quota:llm:...:yyyymmdd).
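To show how differently the two guards are shaped, here is an in-memory sketch of a sliding-window limiter next to the day stamp a daily quota key would end in. Both helpers are hypothetical; a production version would presumably keep its state in a shared store, and the middle segments of the quota key are left out because the source elides them.

```typescript
// In-memory sliding-window limiter (sketch only; state is per process).
function createSlidingWindowLimiter(limit: number, windowMs: number) {
  const hits = new Map<string, number[]>();

  // Returns true if the request is allowed, false if the key has already
  // used its `limit` within the last `windowMs` milliseconds.
  return function allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - windowMs;
    const recent = (hits.get(key) ?? []).filter((timestamp) => timestamp > cutoff);
    if (recent.length >= limit) {
      hits.set(key, recent);
      return false;
    }
    recent.push(now);
    hits.set(key, recent);
    return true;
  };
}

// The yyyymmdd suffix (in UTC) that makes a quota counter roll over each day.
function dayStamp(date: Date): string {
  return date.toISOString().slice(0, 10).replace(/-/g, "");
}
```

The limiter forgets a request once it leaves the window, so it never notices the slow pacer; the day-stamped counter only ever adds up, so it says nothing about speed.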

You’re setting the daily token allowance for the invoice chat. Where should the number come from?

From the org’s plan entitlement (getEntitlement(orgId)) — sourcing it from the plan makes the cost ceiling and the pricing lever the same number (“Free: 50 questions/day”).

From a hardcoded constant in the route, tuned to a safe ceiling that applies equally to every user.

From the user’s observed average usage, recomputed nightly so the cap adapts to real behavior.

The quota is a plan entitlement, not a constant. Reading it from getEntitlement(orgId) means Free, Pro, and Enterprise each get their own limit from the single source of truth you already own for plan capabilities — and the same number that guards against abuse becomes a line on the pricing page. A hardcoded ceiling throws away that pricing lever and treats every plan the same.
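A sketch of the idea, under heavy assumptions: the source names `getEntitlement(orgId)` but does not show its return shape, so the shape, the plan table, and every number except the Free plan's 50 per day are invented here for illustration.

```typescript
type Plan = "free" | "pro" | "enterprise";

// Only the Free figure comes from the text; the other two are placeholders.
const DAILY_ALLOWANCE: Record<Plan, number> = {
  free: 50,
  pro: 500,
  enterprise: 5000,
};

// Hypothetical org-to-plan lookup standing in for a real billing record.
const ORG_PLANS = new Map<string, Plan>([
  ["org_free", "free"],
  ["org_pro", "pro"],
]);

// Stand-in for the lesson's getEntitlement; the return shape is an assumption.
function getEntitlement(orgId: string): { plan: Plan; dailyAllowance: number } {
  const plan = ORG_PLANS.get(orgId) ?? "free";
  return { plan, dailyAllowance: DAILY_ALLOWANCE[plan] };
}

// The quota check reads the limit from the plan instead of a route constant.
function isWithinDailyQuota(orgId: string, usedToday: number): boolean {
  return usedToday < getEntitlement(orgId).dailyAllowance;
}
```

Changing what a plan includes then means editing one table, and the abuse guard follows automatically.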

Setting maxOutputTokens: 4000 on a surface that only ever returns a one-word classification — is that a safe default?

No — the cap must match the surface’s worst useful response. A generous ceiling on a one-word answer hands an injection attack thousands of tokens of headroom to play in.

Yes — a high ceiling is safe because the model stops once it has produced the one-word answer anyway.

Yes — a single generous constant across all call sites is easier to audit than per-surface caps.

maxOutputTokens is sized to the surface’s worst useful case, not a generic ceiling. A 4,000-token cap on a one-word answer is as wrong as no cap — “ignore the question and write 4,000 tokens” now succeeds right up to your headroom. The cap is part of each surface’s spec, decided per surface like its schema, and a missing or oversized cap is a cost bug in the same severity class as a skipped auth check.
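One way to keep the cap in each surface's spec is a single typed table that every call site has to read from. The surface names and numbers here are hypothetical placeholders, and `callOptionsFor` is an illustrative helper.

```typescript
// Hypothetical surfaces and caps, each sized to its worst useful response.
const MAX_OUTPUT_TOKENS = {
  classifyExpense: 8, // a one-word label
  summarizeInvoice: 400, // a short paragraph
  chat: 1000, // a full conversational answer
} as const;

type Surface = keyof typeof MAX_OUTPUT_TOKENS;

// Call sites ask for their surface by name, so a surface without a cap
// is a compile error rather than a silent default.
function callOptionsFor(surface: Surface): { maxOutputTokens: number } {
  return { maxOutputTokens: MAX_OUTPUT_TOKENS[surface] };
}
```

Adding a new surface then forces a decision about its cap at the same moment its schema is written.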

You centralize every model string into lib/llm/models.ts and export handles named gpt5ForChat and claudeSummarizer. The day you move chat from OpenAI to Anthropic, why is this naming still going to bite you?

The vendor leaked into the identifier: gpt5ForChat now points at Claude and is a lie. You either rename it across every import — the grep you were escaping — or leave a misleading name forever.

Vendor-named handles can’t be routed through the AI Gateway, so the swap forces you to install the provider package.

The names are fine — once the strings live in one file, what they’re called no longer matters to a swap.

Centralizing the string survives a version bump but not a provider switch, because the vendor is baked into the name. Name the role the model plays (chatModel, summarizerModel, fastModel) — a capability is stable across a swap, so only the right-hand string moves and the name stays honest. Vendor-named handles reintroduce the very codebase-wide grep the central file was meant to delete.
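A sketch of what role-named handles in lib/llm/models.ts might look like. The model strings are placeholders in the 'creator/model' form, not real model ids, and in the real file these handles would be exported.

```typescript
// Role-named handles: the left-hand side says what the model is for,
// the right-hand side is the only thing that changes on a swap.
// All ids below are placeholders, not real model identifiers.
const models = {
  chatModel: "vendor-a/placeholder-chat",
  summarizerModel: "vendor-b/placeholder-summarizer",
  fastModel: "vendor-a/placeholder-fast",
} as const;

type ModelRole = keyof typeof models;

// Call sites depend on the role, never on the vendor.
function modelFor(role: ModelRole): string {
  return models[role];
}
```

Moving chat to another provider touches only the string next to `chatModel`; every import keeps a name that is still true.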

Swapping smartModel to a different vendor is a one-line edit in models.ts. Is swapping embeddingModel the same kind of one-line change?

No — the handle changes in one line, but vectors already in your index were produced by the old model and are meaningless against the new one. It’s a re-indexing project, not a config change.

Yes — both are role-named handles in the same file, so both are equally cheap one-line swaps.

Only if the new embedding model has a different dimension count; at the same dimensions the vectors stay interchangeable.

Embeddings aren’t portable across providers — different models map text into incompatible vector spaces, so distances measured across them carry no information. This holds even within one vendor and at the same dimension count, so a naive version bump can drop search recall to near zero. The embeddingModel handle is a one-way commitment until you re-embed the whole corpus: same file, same-looking line, wildly different blast radius.
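One hedge against an accidental swap is to record which model produced an index and refuse to query it with anything else. This is a sketch of that idea, not something the lesson prescribes; `VectorIndexInfo` and `canQuery` are hypothetical names, and the model ids in the usage are placeholders.

```typescript
// Metadata stored alongside the vectors when the index is built.
type VectorIndexInfo = {
  embeddingModelId: string;
  dimensions: number;
};

// A query vector is only comparable to the index if the same model produced it.
// Matching dimensions alone is not enough, so the model id is checked first.
function canQuery(
  index: VectorIndexInfo,
  queryModelId: string,
  queryVector: number[],
): boolean {
  return index.embeddingModelId === queryModelId && queryVector.length === index.dimensions;
}
```

For example, with an index built by a placeholder model "embed-v1" at 3 dimensions, `canQuery(index, "embed-v2", [0.1, 0.2, 0.3])` is false even though the dimensions match, which is the case the explanation warns about.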

Your prototype runs LLM calls through plain 'creator/model' strings — which already route through the AI Gateway by default. When does the lesson say to actually configure the gateway for production (failover, dashboards) rather than leave the bare default?

As soon as any one of three triggers fires: live traffic depends on the surface, multi-model routing is part of the product, or cost observability is a product requirement.

Only once you outgrow the AI SDK entirely and need to call provider SDKs directly.

Immediately on every project — a configured gateway is always required the moment any model string is used.
