Chapter 105, Lesson 3

One-line model swaps and the AI Gateway

Structure your Vercel AI SDK integration so a model is a configuration value behind a role-named handle, with the AI Gateway as the production routing layer beneath it.

You shipped the invoice chat in the spring on one provider’s model. It worked, the demo landed, the feature is live. Then autumn arrives and a competitor releases a model that’s half the price and noticeably better at arithmetic, which is exactly the workload your chat box leans on. Your CFO sees the line item on the bill and asks the obvious question: can we move to the cheaper one?

That question should have a one-word answer and a one-line diff. Whether it does depends entirely on a decision you made back in spring, probably without thinking about it: where did you type the model’s name?

Suppose you typed it at the call site, inside the route handler, next to the prompt. Then “move to the cheaper model” becomes a grep across every route, every test, every fixture, and the one surface someone added last week that you’ll forget about until it breaks in production. That is a multi-day pull request with a real blast radius, for what should be a one-line change. And it isn’t a one-time tax. The provider landscape is accelerating rather than stabilizing: a major model from a different vendor lands almost every month. A SaaS that can’t swap models cheaply pays this tax on every release someone else ships.

So here is the promise of this lesson. By the end, a model swap in your codebase is a one-line edit in one file, and you’ll know exactly when to put the AI Gateway behind it for production. None of this is new machinery. It is the same discipline you already use for the Drizzle client in lib/db/index.ts: one file owns the configuration, and every other file reaches for a named export. You’ve done it four times already. This is the fifth.

One scoping note, the same one that has held all chapter. This is a lesson about where the model name lives and what sits behind it, not about how to write the call. Every generation line stays elided, the way it has since the first lesson, and the call-site mechanics arrive in the next chapter.

The model handle is the configuration boundary

Start with why the AI SDK exists at all, because that is what makes a clean swap possible.

Every generation call the SDK gives you, streamText, generateText, and generateObject, takes a model argument. The property that matters is that the body of the call is identical no matter which provider is behind that argument. Same function name, same Zod schema for generateObject, same tool definitions. The only thing that changes when you switch from one vendor to another is the value of model. That uniformity is the entire reason the SDK is worth using, and it is also why reaching for a provider’s own SDK locks you in: a provider’s SDK speaks that vendor’s call shape, so switching vendors rewrites the call, not just a string.

There are two ways to name a model in the AI SDK, and the difference is worth getting right the first time.

const result = streamText({
  model: 'openai/gpt-5-mini',
  // model call elided (Chapter 106)
});

A plain 'creator/model' string. No provider package installed and no import: the SDK routes this through the Vercel AI Gateway by default. This is the 2026 default and the form to reach for.

The string form, such as 'openai/gpt-5-mini', 'anthropic/claude-...', or 'google/gemini-...', leans on something the SDK calls its global provider. Hand it a 'creator/model' string and it routes the call through the AI Gateway for you, with no provider package and no import. The provider-object form is the escape hatch: an installed package that talks to one vendor directly, for the rare case you need a provider-specific option the gateway string doesn’t surface yet.

Notice what the default form means. The gateway isn’t something you bolt on later; the moment you write a plain model string, you’re already going through it. If you’ve read an older tutorial, anything written against v4 of the SDK, you’ll have seen a provider imported and called for every generation. That is a reliable sign the material is stale. In v5 the string-through-gateway path is the default, and the import is the exception.

One file owns the model: lib/llm/models.ts

The string form already buys you a one-line swap. The SDK doesn’t care which model string you pass, so editing 'openai/gpt-5-mini' to 'anthropic/claude-...' at a call site is genuinely one line. So why does this section exist?

Because one line, multiplied by every place you wrote it, is no longer one line.

Picture that model string typed inline at the call site. Now five surfaces use the model: the chat, a summarizer, an extractor, and two background jobs. Each one has the string typed into it. The swap is now five edits, plus the test fixtures, plus the route someone shipped last week that isn’t in your head. A short string doesn’t save you, because the cost is the number of places you have to touch, and the string’s length has nothing to do with that. This is the same reason you don’t write new Pool(...) at the top of every file that runs a query: the Drizzle client lives in one place, and everything imports it.

So the model gets the same treatment. A single module, lib/llm/models.ts, exports named handles bound to model identifiers. Every call site imports a handle instead of typing a string. Now a swap is a one-line edit in models.ts, and not a single call site changes. Five surfaces, one edit. The route you forgot about imported the handle too, so it moves with everything else for free.

Here is the file. This one isn’t elided, because it is the lesson, so it’s worth seeing in full.

import 'server-only';
export const fastModel = 'openai/gpt-5-mini';
export const smartModel = 'anthropic/claude-sonnet-4.5';
export const embeddingModel = 'openai/text-embedding-3-large';

This module is the seam to the model: it reads provider config and routes calls. Like every secret-touching adapter in lib/, it opens with import 'server-only';, which turns any accidental import from a Client Component into a build error rather than a leaked key in the browser bundle.

The handles are plain 'creator/model' strings, routed through the gateway. This right-hand side is the only thing a swap touches: change the string, and every call site that imports the handle now points at the new model, untouched.

camelCase, not SCREAMING_SNAKE_CASE. The instinct is to shout a module-level constant, but these are runtime configuration, not compile-time constants, and the conventions name model handles as an explicit camelCase carve-out. Resist the caps.

One export per role the model plays. The embedding handle lives here alongside the others, but it carries a caveat the chat handles don’t: its swap is a different kind of expensive, which we’ll get to.

This file has siblings you already trust. lib/db/index.ts owns the database client. lib/llm/pricing.ts, the price table you built in Bounding spend before the surface goes public, owns the per-model cost map, keyed by the model each handle points at, so the two files move together on a swap: you change the handle here and add the new model’s price there. lib/rate-limit.ts owns the limiter, and lib/auth.ts owns auth. One concept, one file, every time. The model is just the newest member of that set.
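
One way to keep the handle file and the price table honest with each other is a small check that every handle's model has a price entry. The sketch below is a minimal illustration, not the course's pricing.ts: the ModelPrice shape, the handlesMissingPrice helper, and the zeroed numbers are all assumptions made up for the example.

```typescript
// Handles as they appear in lib/llm/models.ts.
const handles: Record<string, string> = {
  fastModel: 'openai/gpt-5-mini',
  smartModel: 'anthropic/claude-sonnet-4.5',
};

// Hypothetical shape for one entry in lib/llm/pricing.ts.
interface ModelPrice {
  inputPerMillionTokens: number;
  outputPerMillionTokens: number;
}

// Placeholder numbers only, not real prices.
// The smart model has no entry yet, as it would right after a swap.
const pricing: Record<string, ModelPrice> = {
  'openai/gpt-5-mini': { inputPerMillionTokens: 0, outputPerMillionTokens: 0 },
};

// Returns the names of handles whose model id has no entry in the price table.
function handlesMissingPrice(
  handleMap: Record<string, string>,
  priceTable: Record<string, ModelPrice>
): string[] {
  return Object.keys(handleMap).filter((name) => {
    const modelId = handleMap[name];
    return (
      modelId === undefined ||
      !Object.prototype.hasOwnProperty.call(priceTable, modelId)
    );
  });
}

const unpriced = handlesMissingPrice(handles, pricing);
console.log(unpriced); // the handles that still need a price entry
```

Run as a unit test, a check like this turns "remember to update the price table" into a failure you see before the swap ships.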

Here is the cut that separates “I extracted it to a file” from “I extracted it well.” Centralizing in one file survives a version bump. On its own, it does not survive a provider switch, and that is the swap you actually fear.

Watch what goes wrong. Suppose your file exports handles named after their vendors: gpt5ForChat, openaiFast, claudeSummarizer. You’ve moved the strings into one file, which is good. But you’ve also baked the vendor into the identifier. The day you move chat from OpenAI to Anthropic, the variable gpt5ForChat is now a lie: it names OpenAI but points at Claude. You have two bad options. You can rename it along with every import across the codebase, which is the grep you were trying to escape wearing a different hat, or you can keep a misleading name forever and let the next engineer trip over it. The vendor has leaked up one level, out of the call site and into the name.

The fix is to name the role the model plays, never the provider behind it.

export const gpt5ForChat = 'openai/gpt-5.5';
export const claudeSummarizer = 'anthropic/claude-sonnet-4.5';

The name hardcodes the vendor. Move chat off OpenAI and gpt5ForChat becomes a lie, so now you rename the variable and every import, which is exactly the grep the central file was supposed to delete.

The right-hand side is identical in both. The only difference is whether the name describes a vendor or a job. Names like fastModel, smartModel, summarizerModel, extractorModel, chatModel, and embeddingModel each describe a capability, and a capability is stable across a swap. “The fast model” is still the fast model after you change which vendor provides it; only the string moves. The framing worth holding onto is that the call site asks for a capability (“give me the fast one”), not a vendor (“give me OpenAI”). Capabilities are stable, and vendors churn.
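
For contrast, a role-named version of the same two lines might look like the sketch below. The names chatModel and summarizerModel come from the role names listed in this lesson; pairing them with these particular strings is an assumption for illustration.

```typescript
// lib/llm/models.ts, role-named sketch.
// Same right-hand sides as the vendor-named version; only the identifiers changed.
export const chatModel = 'openai/gpt-5.5';
export const summarizerModel = 'anthropic/claude-sonnet-4.5';

// Moving chat to another vendor edits only the string on the right.
// The name stays true, so no import anywhere has to change.
```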

Here is a quick drill to lock in the distinction. Sort each handle name by whether it is safe across a provider swap.

  • Role-named (good), names the job and survives a swap: fastModel, summarizerModel, embeddingModel, chatModel
  • Vendor-named (leaks), names the vendor and breaks on a swap: gpt5ForChat, claudeFast, openaiSummarizer, geminiExtractor
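
If you want a mechanical version of that drill, a naive name check can flag the leaks. This is a sketch under assumptions: the VENDOR_TOKENS list and the isVendorNamed helper are hypothetical, and substring matching is deliberately crude, so treat it as a review aid rather than a lint rule.

```typescript
// Hypothetical list of vendor and model-family tokens; extend it for the providers you use.
const VENDOR_TOKENS = ['openai', 'gpt', 'anthropic', 'claude', 'google', 'gemini'];

// True when the handle name mentions a vendor or a vendor's model family.
function isVendorNamed(handleName: string): boolean {
  const lower = handleName.toLowerCase();
  return VENDOR_TOKENS.some((token) => lower.indexOf(token) !== -1);
}

const handleNames = [
  'fastModel',
  'summarizerModel',
  'embeddingModel',
  'chatModel',
  'gpt5ForChat',
  'claudeFast',
  'openaiSummarizer',
  'geminiExtractor',
];

// Names that would turn into a lie after a provider switch.
const leakyNames = handleNames.filter(isVendorNamed);
console.log(leakyNames);
```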

What the AI Gateway buys you in production

Now look at the layer underneath the handle. You might expect this section to open with “and here’s how to add the gateway.” It doesn’t, because of what you already saw: the plain model string already routes through the gateway. You are not adding it. The question is narrower and more honest. Should you configure it for production, or is the bare default enough?

The bare default, a string that is routed with nothing configured, is plenty for a prototype. Once you lean on it, the gateway offers four production features the bare SDK doesn’t ship on its own, one concern each:

  • Automatic failover. When the primary provider returns a 429, a 5xx, or times out, the gateway transparently retries the next provider in a fallback list, and your code never sees the error. This is the concrete close of last lesson’s provider-429 problem: the branch you were told you’d delete is deleted here.
  • Observability. You get latency, error rate, cost-per-request, and per-user attribution without instrumenting a single route. You already have an in-app view of spend: the operator dashboard from the previous lesson, which is a Drizzle query over your audit log. The gateway’s dashboard is the infrastructure view of that same spend. Both exist, and neither replaces the other: one sees your app’s accounting, the other sees the raw provider traffic.
  • Unified billing. One invoice across every provider, instead of N vendor relationships and N cards to reconcile.
  • BYOK key management. BYOK lets you supply your own provider keys, held at the gateway boundary, so they never sprawl across your app’s environment on every deploy target.

So when does the gateway flip from “nice, but the bare default is fine” to “configure it now”? The shape mirrors the trigger thinking in The four triggers that justify an LLM surface: defaults first, then the threshold that crosses them. Any one of these three firing is enough.

  1. Live traffic depends on the surface. A user-facing surface that earned its weight can’t absorb a provider outage gracefully without failover. The day the primary provider has a bad hour, your feature has a bad hour too, unless the gateway quietly routes around it.
  2. Multi-model routing is part of the product. A fast model for autocomplete and a smart model for the long draft. The gateway routes between them in one place, instead of N branches scattered across your call sites.
  3. Cost observability is a product requirement. The operator needs per-user, per-surface spend. The gateway exposes it without per-route instrumentation.

Until one of those fires, the bare string-through-gateway default is genuinely enough: it is the gateway, just unconfigured. The mental model is that a prototype is a string that routes through the gateway with no extra config, and production is the same string plus a fallback list plus someone reading the dashboard.

It helps to see all of this as a stack, because each layer absorbs a different kind of change. That is the whole lesson in one picture.

  • Application code (route handler / Server Action): imports a role handle, never a model string.
  • lib/llm/models.ts (the named handle): absorbs role changes; a new role is one new line here.
  • AI SDK call (streamText, generateObject, …): absorbs provider differences; one call shape, only the model value moves.
  • AI Gateway: absorbs availability and observability through failover, metrics, and unified billing.
  • Providers: do the work.

Four layers, four kinds of change: each layer absorbs exactly one axis of churn, so a change on one axis never ripples to the others.

Read the stack top to bottom and you can see where each kind of change stops. A new role (“we need a summarizer”) touches only models.ts. A vendor swap is absorbed by the SDK’s uniform call shape. A provider outage is absorbed by the gateway’s failover. The provider itself just does the work. Four axes of churn, four layers, and a change on one axis never ripples to the others. That separation is the payoff for every rule in this lesson.

The first trigger above, live traffic on the surface, is worth showing concretely. Failover is the feature people most often try to hand-roll, and the hand-rolled version is exactly the kind of duplicated code this chapter keeps warning you about.

There are two ways to get failover. Look at the shape of each.

try {
  // model call with smartModel (Chapter 106)
} catch (error) {
  // retry with smartFallback (Chapter 106)
}

That is the manual version: the route handler catches the provider error and re-issues the call against a different handle. It works, but the same block has to be copied into every surface that calls a model, and every new route is one more place to forget it, the same structural failure as a forgotten auth check or a missing output-token cap. A structural problem isn’t solved by a block you have to remember to paste. A discipline you have to remember isn’t a discipline; it is a future bug with a delay timer.

In the gateway version, failover is a providerOptions.gateway.models array sitting next to the primary model. You list the primary plus one or two fallbacks, and the gateway tries them in order when the primary fails. One declaration, no per-route copy. Those fallback handles, smartFallbackA and friends, are role-named entries that belong in models.ts too, right alongside the primaries, so keep the central-file discipline consistent.
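
A minimal sketch of that declaration, kept as plain data so the generation call itself stays elided. The withFallbacks helper and the ModelId type are hypothetical, and the fallback id is a placeholder rather than a recommendation. The sketch also assumes the primary travels in model while the array carries only the fallbacks, in order; confirm the exact semantics against the gateway documentation before relying on it.

```typescript
// Gateway model ids use the 'creator/model' string form.
type ModelId = `${string}/${string}`;

// These would live in lib/llm/models.ts alongside the other handles.
const smartModel: ModelId = 'anthropic/claude-sonnet-4.5';
const smartFallbackA: ModelId = 'openai/gpt-5-mini'; // placeholder fallback

// The slice of call options this sketch cares about.
interface GatewayCallOptions {
  model: ModelId;
  providerOptions: { gateway: { models: ModelId[] } };
}

// Hypothetical helper: pairs a primary handle with its ordered fallbacks.
function withFallbacks(primary: ModelId, fallbacks: ModelId[]): GatewayCallOptions {
  return {
    model: primary,
    providerOptions: { gateway: { models: fallbacks.slice() } },
  };
}

const smartWithFailover = withFallbacks(smartModel, [smartFallbackA]);

// At the call site the options would be spread into the elided generation call,
// for example streamText({ ...smartWithFailover, /* prompt etc. */ }).
console.log(JSON.stringify(smartWithFailover));
```

Declaring the pair once next to the handles keeps the fallback list out of the routes, which is the point of preferring the gateway over catch-and-retry.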

The course’s stance is plain: prefer the gateway. Route-level catch-and-retry is duplicate code the gateway deletes. It is the same lesson as everything else in this chapter: reach for the structural fix, not the one you have to remember at every call site.

Env discipline: provider keys are configuration, not deploy artifacts

The model name is configuration. So is the key that authorizes the call, and keys have sharper edges, so this seam is worth a moment.

Provider keys live in env, validated through the same @t3-oss/env-nextjs + Zod seam in env.ts you’ve used since the database chapter. The naming follows the obvious convention: OPENAI_API_KEY, ANTHROPIC_API_KEY, AI_GATEWAY_API_KEY. The convenience that makes the handle just a string is that the AI SDK auto-reads <PROVIDER>_API_KEY from process.env for its first-party providers. You never pass a key at the call site, which is exactly why a handle is a model string and nothing more. Once the gateway is in front, the gateway’s own key (AI_GATEWAY_API_KEY, or a deployment OIDC token) is what’s needed at the boundary, and the per-provider keys move into the gateway’s BYOK config rather than living in your app’s env on every deploy target.

Adding the keys to the env schema is the whole job:

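src/env.ts
import { createEnv } from '@t3-oss/env-nextjs';
import { z } from 'zod';

export const env = createEnv({
  server: {
    // Each key must be present and non-empty, or validation fails.
    OPENAI_API_KEY: z.string().min(1),
    ANTHROPIC_API_KEY: z.string().min(1),
    AI_GATEWAY_API_KEY: z.string().min(1),
  },
  // ...client, runtimeEnv
});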

There are three things you never do with a provider key. Frame them not as warnings to remember but as guarantees the SDK’s shape already hands you, the way the React hooks force the server seam for free.

  • Never read a key from a database row. Env is the canonical seam for configuration, and the database is for tenant-scoped data. A key in a table is a key in a backup, a key in a logged query, a key one bad join away from a response.
  • Never accept a key from a query parameter or request body. It would land in access logs, browser history, and Referer headers, three places you can’t fully scrub.
  • Never expose a key to the browser. You can’t do this by accident, because the SDK’s hook-based shape (useChat, useCompletion) forces the call onto the server. Only NEXT_PUBLIC_* reaches the client, and a key is never that.

There is a boot-time guarantee worth naming too. A missing OPENAI_API_KEY should fail pnpm build through the env validator, the same way a missing DATABASE_URL does, rather than returning a 5xx the first time real traffic hits the surface. The validator catches the gap at boot, and the alternative is discovering it in production at the worst possible moment.
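
To make the boot-time behaviour concrete, here is a dependency-free stand-in for what the validator does. It is not the @t3-oss/env-nextjs API: missingKeys and assertEnv are hypothetical helpers that mirror the z.string().min(1) rule, and the env object is fake data rather than process.env.

```typescript
type EnvSource = Record<string, string | undefined>;

// Lists required names that are absent or empty, mirroring z.string().min(1).
function missingKeys(required: string[], source: EnvSource): string[] {
  return required.filter((name) => {
    const value = source[name];
    return value === undefined || value.length === 0;
  });
}

// Throws once, at startup, naming every missing key.
function assertEnv(required: string[], source: EnvSource): void {
  const missing = missingKeys(required, source);
  if (missing.length > 0) {
    throw new Error('Missing required env vars: ' + missing.join(', '));
  }
}

// Fake environment for the example: one key unset, one key empty.
const fakeEnv: EnvSource = { OPENAI_API_KEY: 'sk-example', AI_GATEWAY_API_KEY: '' };
const requiredKeys = ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'AI_GATEWAY_API_KEY'];

let bootError = '';
try {
  assertEnv(requiredKeys, fakeEnv);
} catch (error) {
  bootError = error instanceof Error ? error.message : String(error);
}
console.log(bootError);
```

The failure names every missing key at once and happens before any request is served, which is the property the real validator gives you.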

The embedding asterisk: not everything swaps cleanly

Everything so far has pointed one direction: a model is a configuration value, and a swap is one line. There is exactly one place where that story breaks, it breaks hard, and it is expensive to learn the hard way. So learn it the easy way, here.

Embeddings are not portable across providers.

An embedding is a list of numbers a model assigns to a piece of text so that similar meanings land near each other. The catch is that those numbers only mean anything inside the exact model that produced them. A vector you indexed with one provider’s embedding model cannot be queried against another provider’s embeddings, because the two models map text into different vector spaces. Those geometries are incompatible, so the distances between vectors from different models are meaningless. This is sharper than it sounds: it holds even within the same vendor, and even at the same dimension count. Swap one embedding model for a newer version without re-embedding your corpus and your search recall can collapse to essentially zero, with every result a near-random miss. This is not a theoretical edge case. It is a documented way teams have broken their own search in a single deploy.

So the embeddingModel handle in models.ts is a one-way commitment until you re-embed everything. Treat it as a far more conservative swap target than the chat handles around it. Swapping smartModel to a different vendor is a one-line edit you ship today. Swapping embeddingModel is a re-indexing project: you re-embed every stored document, rebuild the vector index, plan a migration window, and pay the API cost of running your entire corpus back through a model. Same file, and visually the same kind of line, but a wildly different blast radius. The discipline still holds, since one place owns the handle, but the cost of exercising the swap is what differs, and conflating the two is the trap.
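
One cheap guard against that trap is to store the model id next to every vector and refuse to compare across models. The sketch below assumes a hypothetical StoredVector shape and compareVectors helper; 'example/other-embedding' is a made-up id standing in for whatever model you might swap to.

```typescript
interface StoredVector {
  model: string; // id of the embedding model that produced the values
  values: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Refuses cross-model comparisons, even when the dimensions happen to match.
function compareVectors(a: StoredVector, b: StoredVector): number {
  if (a.model !== b.model) {
    throw new Error('Cannot compare embeddings from ' + a.model + ' and ' + b.model);
  }
  return cosineSimilarity(a.values, b.values);
}

const indexed: StoredVector = { model: 'openai/text-embedding-3-large', values: [1, 0] };
const sameModelQuery: StoredVector = { model: 'openai/text-embedding-3-large', values: [1, 0] };
const otherModelQuery: StoredVector = { model: 'example/other-embedding', values: [1, 0] };

const sameModelScore = compareVectors(indexed, sameModelQuery);

let crossModelRejected = false;
try {
  compareVectors(indexed, otherModelQuery);
} catch {
  crossModelRejected = true;
}
console.log(sameModelScore, crossModelRejected);
```

The guard does not make the swap cheaper; it only makes a half-finished re-index fail loudly instead of returning near-random results.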

Each claim is about how cheaply a given model handle can be swapped. Mark each statement True or False.

  1. Swapping smartModel to a different vendor is a one-line edit in models.ts. True. Chat models share the SDK’s uniform call shape, and the call site asks for a capability, not a vendor, so only the string on the right-hand side moves. Nothing at the call sites changes.
  2. Swapping embeddingModel to a different vendor is a one-line edit. False. The handle changes in one line, but the vectors already in your index were produced by the old model. Querying them against the new model is meaningless until you re-embed the whole corpus, which is a re-indexing project, not a config change.
  3. A vector indexed with provider A’s embedding model can be queried against provider B’s embeddings. False. Different models map text into different vector spaces. The coordinates from one are meaningless in the other, so distances measured across them carry no information.
  4. Embeddings from two different versions of the same vendor’s embedding model are interchangeable. False. Non-portability holds even within one vendor and at the same dimension count. A version bump can drop search recall to near zero unless you re-embed the corpus.

This course doesn’t build embeddings here; indexing, similarity queries, and the vector column all land in the next chapter. The takeaway is narrow and durable: the clean abstraction story for chat models does not extend to embeddings. Price an embedding swap as a migration, not as a config change.

Structured output swaps cleaner than free-form prose

One last heuristic, because how you shape the call changes how cheaply it swaps, and you now have the vocabulary to see why.

A surface built on generateObject with a Zod schema returns typed data regardless of which provider is behind it. The schema is the contract, and the model is just the implementation. When you swap vendors, the Zod schema absorbs the small differences in how each provider shapes its output, so the same schema validates whatever model you point at. The swap is clean because the contract never moved.
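
To see why the contract is what makes the swap clean, here is a hand-written validator standing in for a Zod schema. It is a sketch, not the generateObject mechanics: InvoiceExtraction, parseInvoiceExtraction, and the sample payloads are all hypothetical.

```typescript
// The contract the rest of the app depends on.
interface InvoiceExtraction {
  invoiceNumber: string;
  totalCents: number;
}

// Accepts anything that satisfies the contract, whichever model produced it.
function parseInvoiceExtraction(raw: unknown): InvoiceExtraction {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('Expected an object');
  }
  const record = raw as Record<string, unknown>;
  const invoiceNumber = record['invoiceNumber'];
  const totalCents = record['totalCents'];
  if (typeof invoiceNumber !== 'string' || invoiceNumber.length === 0) {
    throw new Error('invoiceNumber must be a non-empty string');
  }
  if (typeof totalCents !== 'number' || totalCents % 1 !== 0) {
    throw new Error('totalCents must be an integer');
  }
  // Extra fields are dropped, so callers only ever see the contract.
  return { invoiceNumber, totalCents };
}

// Two made-up payloads, as if from two different models, plus a malformed one.
const fromModelA: unknown = { invoiceNumber: 'INV-1', totalCents: 1250 };
const fromModelB: unknown = { invoiceNumber: 'INV-1', totalCents: 1250, note: 'extra field' };
const malformed: unknown = { invoiceNumber: 'INV-1', totalCents: '12.50' };

const parsedA = parseInvoiceExtraction(fromModelA);
const parsedB = parseInvoiceExtraction(fromModelB);

let malformedRejected = false;
try {
  parseInvoiceExtraction(malformed);
} catch {
  malformedRejected = true;
}
console.log(parsedA, parsedB, malformedRejected);
```

Both well-formed payloads come out identical and the malformed one fails at the boundary, so a vendor swap either satisfies the contract or is caught immediately.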

A surface built on streamText with a carefully prompt-engineered free-form response is more fragile. That prompt was tuned to one model’s quirks: its phrasing, its formatting habits, its tone. Point it at a different model and the output can shift in ways the compiler never sees, such as subtly worse formatting, a different structure, or a quietly degraded answer. The code compiles, the swap “works,” and the quality drops on a dimension no type-check can catch.

So the heuristic is this. When the workload is structured (extraction, classification, or form-fill, which are the second and third triggers from the first lesson), reach for generateObject first. The abstraction wins are larger and the swap is cleaner. Free-form streamText is the right call for genuine prose, but know that you’re trading away some swap-portability to get it. This isn’t a rule; it is a trade-off you’re now equipped to make on purpose.

// generateObject({ schema, model: extractorModel }) — Chapter 106

The mechanics of that call are the next chapter’s job. Here it is named only as a portability property: a structured contract swaps cleaner than a tuned prompt.

If you’ve gone looking, you’ve seen other ways to integrate a model into a Next.js app. Here is where they sit, briefly, so the question stops nagging.

Raw provider SDKs

These lock the call site to one vendor’s shape, so a swap rewrites the call rather than a string, and you lose the SDK’s unified streaming model. The only reason to reach for one is a provider feature the AI SDK hasn’t surfaced yet, which is rare in 2026.

LangChain

A heavier programming model (chains, agents, retrievers) that fights React Server Components and the App Router’s streaming primitive. It is right for research-style multi-agent orchestration outside the user-request path, not for a Next.js SaaS surface.

The verdict the course committed to in the first lesson stands: the AI SDK is the canonical Next.js integration. These are the named alternatives and the narrow cases where they fit, not competitors for the surface you’re building.

One closing note before the chapter ends.

Picking the provider, choosing whether to lean on the gateway, and committing to an embedding model are each architectural decisions under the three-test inclusion rule from the documentation unit. Each one touches multiple files: the route, models.ts, env.ts, billing, and possibly the vector index. Each has reasonable alternatives, such as no LLM at all, a different provider, or a hosted RAG service. And each costs more than one pull request to reverse, the embedding commitment most of all, since reversing it is a re-indexing project. That is the full signature of a decision that earns an ADR. When this shape lands in your own codebase, write one per decision. The course’s running app doesn’t own a real provider commitment, so there is no ADR to hand in here, but the expectation is the durable part.

That closes the chapter’s arc. Across three lessons you learned the filter, which is when an LLM-backed surface earns its weight at all; then the guardrails, which is bounding spend before the surface goes public; and now the durability, which is that a swap is one line and the gateway is the production default behind it. What’s left is the call-site mechanics: streamText, generateObject, and the UI hooks. The next chapter installs all of it inside the structurally sound shape these three lessons drew, so the syntax lands on a foundation that is already correct.

Carry one sentence out: the model is a configuration value behind a role-named handle in one file. The provider churns; your call sites don’t.