Chapter 98 · Lesson 3

Region, runtime, and Fluid Compute

The three Vercel Function settings that govern how your deployed app performs: which one to change, and which two to leave alone.

Your code is on the internet now. The first deploy is green, the production URL resolves, and a git push ships a new version. Before you walk away from a fresh project, an experienced engineer asks one question: what did the platform decide for me, and which of those decisions do I need to override?

Three settings govern the three things you care about at deploy time: how slow the first request is, how slow every request is, and what the bill looks like. They are the region the function runs in, the runtime it ships on, and the Fluid Compute model that runs it. This lesson rests on one durable claim: of these three, exactly one needs a deliberate change on day one, and the other two are already correct. One change, and two settings you leave alone for reasons you’ll understand.

The one change is region, and it’s worth seeing why before anything else. Every server action that reads your database pays a network round trip on every call. If the function and the database sit on opposite coasts, that round trip is the difference between a 30 ms query and a 110 ms one, on every request, and it never shows up while you’re running pnpm dev, because locally everything sits right next to everything. This is the platform side of the rule from the first lesson of this chapter: the deploy ships your code, but where it runs, what it runs on, and how it’s scheduled are decisions that ride alongside the code, and they’re yours to get right.

This is the setting that carries the most weight, so it goes first. It’s the only one of the three that’s wrong by default for most projects, and the only one that produces a silent, measurable regression in production.

Start with the default. A Vercel Function runs in a single region, and unless you say otherwise that region is iad1, a datacenter in Washington, D.C., on the US East coast. That’s a perfectly good default. It becomes a problem in one specific case: when your database lives somewhere else.

Think about what your app does on a request. Every server action, every route handler, and every React Server Component that fetches data opens a connection to Postgres and waits for a reply. When the function and the database are in the same datacenter, that’s a sub-millisecond hop across the room. When they’re a continent apart, the network distance becomes a tax on the round trip, and you pay it not once but on every query of every request.
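To put a rough number on that tax, here is a back-of-the-envelope sketch. It assumes the queries in a request run one after another, so each pays a full round trip. The helper name `addedLatencyMs` and the figures (4 queries, about 1 ms co-located, about 80 ms cross-country, in line with the 30 ms versus 110 ms example earlier) are illustrative assumptions, not measurements.

```typescript
// Back-of-the-envelope model of the cross-region tax.
// Assumption: queries run sequentially, so each one pays a full round trip.
function addedLatencyMs(queriesPerRequest: number, roundTripMs: number): number {
  return queriesPerRequest * roundTripMs;
}

// Illustrative numbers only: a request that makes 4 sequential queries.
const coLocated = addedLatencyMs(4, 1); // same datacenter, ~1 ms per round trip
const crossCountry = addedLatencyMs(4, 80); // opposite coasts, ~80 ms per round trip

console.log({ coLocated, crossCountry }); // { coLocated: 4, crossCountry: 320 }
```

The point of the arithmetic is that the tax scales with the number of queries, not the number of requests, so a page that makes several dependent queries feels it several times over.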

So the rule for a single-database SaaS is as blunt as it sounds: the function region must match the database region. Back in the chapter where you provisioned the database on Neon, you picked a region for it. The function has to sit next to it.

This is dangerous rather than merely suboptimal because it stays invisible right up until it isn’t. In local development, the function and your database are effectively co-located, or the latency is small enough to disappear into everything else, so nothing in pnpm dev will ever surface a region mismatch. It only appears once you’re deployed, and even then it doesn’t announce itself. It shows up as an elevated p95: your average might look fine while your p95 quietly carries the cross-country trip. That’s why you set the region deliberately on day one instead of discovering it during an incident three weeks in.
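If p95 is unfamiliar, the sketch below shows one common way to compute it (the nearest-rank method) and why it can diverge from the average. The sample data is invented for illustration: it assumes most requests never touch the database and a few make several cross-region queries. `percentile` and `mean` are hypothetical helper names, not part of any Vercel API.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100)); // 1-based rank
  return sorted[rank - 1];
}

function mean(samples: number[]): number {
  return samples.reduce((sum, value) => sum + value, 0) / samples.length;
}

// Invented data: 18 requests at 40 ms, 2 database-heavy requests at 360 ms.
const latenciesMs = [...Array(18).fill(40), 360, 360] as number[];

console.log(mean(latenciesMs)); // 72
console.log(percentile(latenciesMs, 95)); // 360
```

An average of 72 ms looks unremarkable on a dashboard; the 360 ms p95 is where the cross-country trips show up.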

The diagram below makes the tax spatial. Flip between the two panels and watch what happens to the function-to-database arrow.

[Diagram: User Browser sends a request → Vercel Function in sfo1 (San Francisco, US West) → Neon Postgres in iad1 (Washington, D.C., US East). The function-to-database arrow is labeled “~80 ms each way, across the country.”]
The function in sfo1, the database in iad1. Every query crosses the country and back: a tax paid on every request, invisible in local dev.

Two practical notes close this out.

You don’t have to guess your database’s region: it’s whatever region the Neon project was created in, visible in the Neon console. Match the Vercel function region to that exact value. You set the function region in one of two places, and both are one-time: in the dashboard under Project Settings → Functions → Region, or in a vercel.json file at the repo root, where the property is regions, an array of region ids. You’ll see that file in the next section.
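One wrinkle when matching values: Vercel and Neon name the same place differently. The sketch below is one hedged way to compare them. It assumes Vercel region ids correspond to AWS region codes as in the small table shown, and that Neon region ids take the form `aws-<aws-region>`; verify both against the providers’ current region lists before relying on it. `VERCEL_REGION` is the system environment variable Vercel sets at runtime, while `NEON_REGION`, `regionsMatch`, and `checkDeployedRegion` are hypothetical names introduced here.

```typescript
// Assumption: these Vercel region ids correspond to these AWS region codes.
// Only a few are listed; check the current Vercel region list for others.
const VERCEL_TO_AWS: Record<string, string> = {
  iad1: 'us-east-1',
  cle1: 'us-east-2',
  pdx1: 'us-west-2',
  fra1: 'eu-central-1',
};

// Assumption: Neon region ids look like "aws-us-east-1".
function regionsMatch(vercelRegion: string, neonRegion: string): boolean {
  const aws = VERCEL_TO_AWS[vercelRegion];
  return aws !== undefined && neonRegion === `aws-${aws}`;
}

// VERCEL_REGION is set by Vercel at runtime; NEON_REGION is a hypothetical
// variable you would define yourself from the value in the Neon console.
function checkDeployedRegion(env: Record<string, string | undefined>): string {
  const fnRegion = env.VERCEL_REGION ?? 'unknown';
  const dbRegion = env.NEON_REGION ?? 'unknown';
  return regionsMatch(fnRegion, dbRegion)
    ? `ok: ${fnRegion} is co-located with ${dbRegion}`
    : `mismatch: function in ${fnRegion}, database in ${dbRegion}`;
}
```

Called as `checkDeployedRegion(process.env)` from a health-check route, this would turn the checklist row into something you can read off a response instead of comparing two dashboards by eye.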

There’s also one escape hatch, named once so you know it exists but don’t reach for it. Vercel can run a function in multiple regions at once, up to three on Pro and all of them on Enterprise. That’s for genuinely global SaaS that runs database replicas in several regions, so a function near the user can read from a database near the user. A single-database app doesn’t want this; it wants one region, matched. The pre-launch checklist later in this chapter re-verifies “function region matches the database, and the connection is pooled” as a single row, and the understanding you’re building here makes that row a five-second check.

Node.js is the runtime default, and you keep it


The second setting is the runtime: the engine your code runs on. Vercel offers two, and this whole section is about why you don’t have to think about which.

By default, every Vercel Function ships on the Node.js runtime. That gives you the full Node.js API surface, every npm package including ones with native bindings, streaming responses, and a writable /tmp scratch directory. It’s what every server action, route handler, and RSC data fetch in a SaaS app like yours runs on.

It’s the right default for a reason specific to what you’ve built. Your server code talks to Postgres through Drizzle, to Stripe, to Resend for email, and to R2 for file storage. Every one of those libraries assumes it’s running on Node.js with the full package ecosystem underneath it. The default isn’t just convenient; it removes a decision you would otherwise get wrong, because the second runtime can’t load most of those libraries at all. That second runtime, Edge, is the whole of the next section.

Most projects need no configuration file at all, but it’s worth seeing the shape of the one you’d reach for if you did, and where two of the three settings show up in it.

vercel.json
{
  "$schema": "https://openapi.vercel.sh/vercel.json",
  "regions": ["iad1"],
  "fluid": true
}

Most SaaS projects ship with no vercel.json at all, because the Node.js runtime, the iad1 region, and Fluid Compute are already the defaults. The file exists only to override a default, and the one override worth making is the line that sets the region.

That fluid: true line names the third setting, which the next sections unpack. First, the part of the Node.js story an experienced engineer reads differently from a newcomer: the timeout.

Under Fluid Compute, a function can run for up to 300 seconds (five minutes) on every plan, including the free Hobby tier, with higher ceilings on paid plans. The filesystem is read-only except for that /tmp directory, which holds up to 500 MB. Those are the numbers, and here’s the framing that keeps you from misreading them.

The timeout is a ceiling, not a budget. Five minutes is not permission to run a four-minute request. That ceiling exists to absorb the occasional spike, a request that’s usually fast but sometimes slow, not to host work that’s slow by nature. Anything plausibly slow on purpose, such as generating a large export, calling a third-party API in a batch, or processing an image, does not belong in a request at all. It belongs in a background job, the pattern you saw with Vercel’s after() and Trigger.dev. If you ever find yourself eyeing the timeout as a number to fit under, that’s the signal the work should have been backgrounded.

One related point closes a question you might be forming: what about large file uploads, don’t those run into a request size limit? They would, except they never go through the function. Uploads route directly from the browser to R2 through presigned URLs, the pattern from the object-storage chapter. The bytes go browser-to-R2, and the function only ever handles a small bit of metadata, so the function’s request-body limit simply isn’t on the upload path.

When Edge earns its weight (and when it doesn’t)


There is a second runtime, and a common reflex says “turn on Edge, it’s faster.” This section exists to dismantle that reflex. Edge is a real tool with a real, narrow use case, but it is not a speed button. What you’re after here isn’t a feature tour; it’s a decision rule you can apply without re-reading the docs.

Start with the genuine upside. The Edge runtime runs your code as a V8 isolate on servers physically close to the user, at a POP near them. The headline benefit is a much lower cold start than a Node.js container. That’s true, and for the right workload it’s worth having.

Now the costs, stated plainly, because they’re what the reflex ignores. On Edge you get no full Node.js API, no native modules, no general filesystem, and most npm packages simply won’t load. The part that matters most for a SaaS app: only HTTP-based database drivers work there. Neon happens to ship one of the few that fit, but your app’s pooled Postgres driver isn’t it. So for any route that touches Postgres through the pooled connection, calls Stripe, or pulls in any Node-only dependency, Edge isn’t an upgrade but a downgrade, because you’d lose more on capability than you’d gain on cold start.

That gives the decision rule its shape: stay on Node.js unless you have measured a latency problem on a specific path, and that path is stateless and shaped by physical proximity to the user, something tiny like a geolocation lookup or a redirect with no Node dependencies. When that rare case is real, the opt-in is a single line on a route handler or page segment:

app/api/geo/route.ts
export const runtime = 'edge';

That single line is still valid in Next.js 16, and it’s the per-route way to opt one specific handler into Edge. Treat it as the rare exception, not a setting you reach for by default.

Now the correction that matters most in this section, because a lot of writing from before 2026 will tell you the opposite.

You may have read that “middleware is Edge by default, and that’s where you do geolocation and A/B tests.” That is no longer true. In Next.js 16, the file that was middleware.ts is renamed proxy.ts, and it runs on the Node.js runtime only. Setting the runtime option in proxy.ts to Edge doesn’t just get ignored; it throws a build error, because the Edge runtime isn’t available there at all. The old mental model of “Edge is where middleware lives” no longer holds. This course handles request-edge concerns, such as auth gating, response headers, and CSP, in proxy.ts on Node.js, and the wiring for that lives in its own lessons. Here you just need the corrected fact: proxy.ts, Node.js, no Edge.

Walk the decision yourself in the tool below. Each step is the next question an experienced engineer asks, in the order they ask it, and notice that capability comes before performance and measurement comes before optimization every time.

Edge or Node.js for this route?
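The same decision can be sketched as a small function, with the questions in the order the rule asks them: capability first, then measurement, then the shape of the path. `RouteProfile` and `chooseRuntime` are hypothetical names used only to make the rule concrete; nothing here is a Next.js or Vercel API.

```typescript
type Runtime = 'nodejs' | 'edge';

interface RouteProfile {
  // Pooled Postgres driver, Stripe SDK, native modules, or any Node-only package.
  usesNodeOnlyDependencies: boolean;
  // Has a latency problem actually been measured on this specific path?
  latencyProblemMeasured: boolean;
  // Is the path tiny, stateless, and dominated by distance to the user?
  statelessAndProximityShaped: boolean;
}

function chooseRuntime(route: RouteProfile): Runtime {
  if (route.usesNodeOnlyDependencies) return 'nodejs'; // capability comes first
  if (!route.latencyProblemMeasured) return 'nodejs'; // no measurement, no optimization
  if (!route.statelessAndProximityShaped) return 'nodejs'; // Edge would not help this shape
  return 'edge'; // the rare, measured exception
}

// A Drizzle-backed summary route: stays on Node.js regardless of the other answers.
console.log(
  chooseRuntime({
    usesNodeOnlyDependencies: true,
    latencyProblemMeasured: true,
    statelessAndProximityShaped: false,
  }),
); // nodejs
```

Every branch but the last returns Node.js, which is the rule in miniature: Edge has to clear three questions in a row before it is the right answer.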

Fluid Compute: one instance, many requests


The third setting is different from the first two: you didn’t turn it on, and there’s nothing to turn. Fluid Compute is Vercel’s default execution model for Node.js functions, on by default for new projects since April 2025. This section makes a model you’ve been using all along visible, then hands you its one real consequence, which is the next section.

It’s the same Node.js runtime you just committed to. What changed is how an instance runs requests, and the shift comes down to one sentence.

Classic serverless ran one request per instance. A second request arriving while the first was still in flight got its own brand-new instance, and a fresh instance means a cold start, so traffic spikes meant a flurry of cold starts. Fluid changes this: one warm instance now handles multiple concurrent requests at once. While request A is sitting idle waiting on a slow database query, the same instance picks up request B and starts working on it, reusing the dead time instead of spinning up new hardware.

This is close to free money for a SaaS specifically, because your workload is I/O-bound. A typical request spends the bulk of its life waiting on Postgres or a third-party API. Under classic serverless that idle time was pure waste, since the instance held the request and did nothing useful. Fluid fills it with other requests. The same traffic needs fewer instances, which means fewer cold starts, which means lower latency and a lower bill, and you changed not a single line of code to get it.
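The idle-time reuse is ordinary Node.js behavior, and you can watch it in a single process. The sketch below is a simulation, not Vercel’s scheduler: `fakeDbQuery` stands in for a Postgres round trip with a timer, and `simulateInstance` is a hypothetical name. It counts how many requests are in flight at once inside one process.

```typescript
// Stand-in for a database round trip: the timer waits without using CPU.
const fakeDbQuery = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Simulates one instance serving several requests that arrive together.
async function simulateInstance(
  requestNames: string[],
): Promise<{ results: string[]; maxInFlight: number }> {
  let inFlight = 0;
  let maxInFlight = 0;

  const handleRequest = async (name: string): Promise<string> => {
    inFlight += 1;
    maxInFlight = Math.max(maxInFlight, inFlight);
    await fakeDbQuery(50); // while this request waits, the others can start
    inFlight -= 1;
    return `${name} done`;
  };

  const results = await Promise.all(requestNames.map((name) => handleRequest(name)));
  return { results, maxInFlight };
}

simulateInstance(['A', 'B', 'C']).then(({ results, maxInFlight }) => {
  console.log(results, maxInFlight); // [ 'A done', 'B done', 'C done' ] 3
});
```

All three requests overlap in one process because each spends its time waiting rather than computing. A CPU-bound handler would not overlap this way, which is why the benefit is specific to I/O-bound workloads.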

The diagram below makes the idle-time reuse watchable. Scrub through the three steps.

[Diagram, step 1: three instances, one per request. Instance 1 runs request A (exec, waiting on DB, exec). Instances 2 and 3 run requests B and C, each beginning with a cold start (cold start, waiting on DB, exec). Legend: exec (CPU busy), waiting on DB (idle). Horizontal axis: time.]
Classic serverless: one request per instance. Two of these three concurrent requests pay a cold start, and each instance sits idle while it waits on the database.

[Diagram, step 2: a single instance handling requests A, B, and C. After one cold start, the exec segments of A, B, and C interleave, each running while another request is waiting on the DB. Horizontal axis: time.]
Fluid Compute: one warm instance, the same three requests. While request A waits on the database, the instance runs request B in that gap, so the idle time is filled, not wasted.

[Diagram, step 3: summary comparison.]
  • Classic serverless: 3 instances, 2 cold starts
  • Fluid Compute: 1 instance, 0–1 cold starts
The payoff: fewer instances, fewer cold starts, lower latency, lower bill, with zero code changes.

Two more facts round out the model.

First, the one you might be reaching for and won’t find: there is no concurrency dial to set. In-function concurrency is automatic. Vercel manages it, preferring to fill an existing warm instance’s idle capacity before it allocates a new instance. You don’t pick a number, and you don’t tune a maxConcurrency setting in vercel.json (older guides describe one, but it isn’t part of the current model). Your job isn’t to tune concurrency. Your job is to write code that’s safe to run concurrently in one process, which is exactly the trap the next section is about.

Second, a reassuring fact: errors are isolated. If one request throws an unhandled error, it no longer takes down the other requests sharing that instance. Fluid logs the error and lets the in-flight requests finish. That’s the comforting half of the picture. The unsettling half, and the reason there’s a whole section coming, is that errors are isolated but memory is not.

This is the one place in the whole lesson where you can ship a real bug, not a cosmetic one. It gets its own section because the thing to watch out for is the concept, not a footnote to the previous one.

Here’s the shift in mental model. When an instance served exactly one request at a time, anything you stored at the top level of a module was per-request in practice, since the next request got a fresh instance with fresh module state. Under Fluid, that’s no longer true. Module-scope state is shared across every concurrent request running on that instance. Anything at the top level of a module is now a shared resource, and several requests can touch it at the same time.

Before that scares you off module-level code entirely, here’s the safe case, which is most of it. The Drizzle/Neon database client you create once at module scope is completely fine. It’s a connection pool, built to be used by many concurrent callers safely. Stateless module-level singletons, and ones that are internally safe for concurrent use, are correct, and they’re the overwhelming majority of what you put at module scope. Keep doing it.

The unsafe case is narrow and specific: a module-scope value that holds request-specific data and gets mutated per request. The canonical mistake looks like a hand-rolled in-memory cache, or a let currentOrgId sitting at the top of a module that a request handler writes to. Picture it under concurrency. Request A, acting for one tenant, writes its org id into that module-scope variable. Before A finishes, request B, a different tenant running concurrently on the same instance, reads that same variable and gets A’s value. That’s not a glitch. That’s one customer’s data served to another customer, a cross-tenant leak, the worst class of bug a multi-tenant SaaS can have. And it’s completely invisible when you test one request at a time locally, because the bug needs two concurrent requests to exist.

The two snippets below are the same logic: one shape leaks, one doesn’t. The only real difference is where the per-request value lives.

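app/api/summary/route.ts
import { db } from '@/db';
// Assumed helpers for illustration; this import path is hypothetical.
import { computeTotals, requireSession } from '@/lib/summary';

// Module scope: shared by every concurrent request on this instance.
let currentOrgId: string | null = null;

export const GET = async (request: Request) => {
  currentOrgId = request.headers.get('x-org-id'); // request A writes its tenant here
  await requireSession(request); // any await yields, so request B can run and overwrite
  const totals = await computeTotals(db, currentOrgId); // A may now read B's org id
  return Response.json(totals);
};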

This leaks across tenants. currentOrgId lives at module scope, so it’s shared by every concurrent request on the instance. Request B can overwrite it between the moment request A sets it and the moment A reads it back, and now A computes totals for B’s organization. The bug only appears under concurrency, which is why local single-request testing never catches it.
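You can reproduce the difference between the two shapes without a database or a deploy. The sketch below is a self-contained simulation: `tick` stands in for any awaited I/O, and `leakyHandler` and `safeHandler` are hypothetical stand-ins for a route handler. The only difference between them is where the per-request value lives.

```typescript
// Stand-in for any awaited I/O (auth check, database query, fetch).
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 10));

// Leaky shape: the per-request value is stored at module scope.
let currentOrgId: string | null = null;

async function leakyHandler(orgId: string): Promise<string | null> {
  currentOrgId = orgId; // write shared state
  await tick(); // another request can run here and overwrite it
  return currentOrgId; // may now hold a different tenant's id
}

// Safe shape: the per-request value lives in a local, private to this call.
async function safeHandler(orgId: string): Promise<string> {
  const localOrgId = orgId;
  await tick();
  return localOrgId;
}

async function demo(): Promise<{ leaky: (string | null)[]; safe: string[] }> {
  const leaky = await Promise.all([leakyHandler('org_A'), leakyHandler('org_B')]);
  const safe = await Promise.all([safeHandler('org_A'), safeHandler('org_B')]);
  return { leaky, safe };
}

demo().then(({ leaky, safe }) => {
  console.log(leaky); // [ 'org_B', 'org_B' ]  request A got B's tenant
  console.log(safe); // [ 'org_A', 'org_B' ]
});
```

Run either handler one request at a time and both return the right answer, which is exactly why this class of bug survives local testing.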

So what do you do when per-request state genuinely has to travel deep through a call stack without being passed as an argument to every function along the way? That’s the one case where function locals aren’t enough, and the answer is AsyncLocalStorage. You’ll see it carry things like a request id or the current org context in other parts of the course. Here, just know its name and the shape of the problem it solves. Module scope stays reserved for things that are genuinely safe to share.
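For orientation only, here is a minimal sketch of that shape using Node’s built-in `AsyncLocalStorage` from `node:async_hooks`. The `RequestContext` fields, `describeTotals`, and `handle` are hypothetical names, and how the course actually wires this up may differ.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface RequestContext {
  orgId: string;
  requestId: string;
}

// The storage object is safe at module scope: it holds no request data itself,
// only the mechanism for looking up the store bound to the current async call chain.
const requestContext = new AsyncLocalStorage<RequestContext>();

const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 10));

// Deep in the call stack: no orgId parameter, yet it reads the right one.
async function describeTotals(): Promise<string> {
  await tick(); // context survives across awaits
  const ctx = requestContext.getStore();
  if (!ctx) throw new Error('called outside a request context');
  return `totals for ${ctx.orgId} (${ctx.requestId})`;
}

// Entry point: bind the context for everything this request goes on to call.
function handle(orgId: string, requestId: string): Promise<string> {
  return requestContext.run({ orgId, requestId }, () => describeTotals());
}

Promise.all([handle('org_A', 'req_1'), handle('org_B', 'req_2')]).then((lines) => {
  console.log(lines); // [ 'totals for org_A (req_1)', 'totals for org_B (req_2)' ]
});
```

Two concurrent requests share one `requestContext` object but never see each other’s store, which is the property a plain module-scope variable cannot give you.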

This is the same principle Next.js states for proxy.ts: its own docs warn that proxy code should not rely on shared modules or globals. Same hazard, same reason, because concurrent execution and shared top-level state don’t mix.

Sort the following into the two kinds of state. The skill this section teaches is the judgment call itself, recognizing which kind of thing each one is, so practice it rather than restating the rule.

Decide where each of these belongs. Some things are safe to create once at module scope and share; others hold per-request data and must stay inside the request. Sort each item into the bucket it belongs to.

The two buckets:

  • Fine to share (module scope): stateless or built for concurrent use
  • Must stay per-request: holds data specific to one request

The items to sort:

  • The Drizzle / Neon pooled database client
  • A Stripe SDK client instance
  • A compiled regular expression used for validation
  • A frozen config object read at module top level
  • let currentUserId set from the incoming request
  • An in-memory Map caching the current request’s computed totals
  • A per-request requestId for log correlation

Here’s the spine of the lesson in one pass.

  • Region: change it once, to match the database. This is the only deliberate change of the three, and it’s the one that produces a silent latency tax if you skip it.
  • Runtime: leave it on Node.js. Edge is a measured, per-route exception for a tiny, stateless, proximity-shaped path, not a speed button you reach for by default. And middleware lives in proxy.ts on Node.js, not on Edge.
  • Compute: Fluid Compute is on automatically. There’s no concurrency dial, so your job is to write code that’s safe to run concurrently, which means keeping per-request state out of module scope.

The mistake each one invites is worth memorizing as a set: region mismatch is a silent latency tax, reaching for Edge by reflex is lost capability, and request-specific state at module scope is a cross-tenant leak.

Two of these come back later in the chapter. The launch checklist verifies “function region matches the database, connections pooled” as a single row, and you’ve built the understanding behind it here. And the next platform piece, per-PR Neon branches, is about giving each preview its own database, whereas today’s region rule is about where the production function and database sit.

Test the whole thing on a scenario.

A teammate opens a PR that adds export const runtime = 'edge' to a route handler. The handler runs a Drizzle query against your pooled Postgres connection, and the PR note reads “faster cold starts.” What’s the right review comment?

  • This route depends on the pooled Postgres driver, which Edge can’t load, so the change would break the route. Even on the HTTP driver, a faster cold start wouldn’t be worth giving up the pooled connection here. Keep it on Node.js.
  • Approve it. Edge runs closer to the user, so any route is faster there.
  • Approve, but ask them to move the Drizzle query into proxy.ts so the query itself runs at the edge.
  • Approve, and ask them to bump maxConcurrency in vercel.json so the edge instance can absorb the extra traffic.

One more, on the setting that fails quietly.

Right after launch, your p95 latency is much higher than it was in development, even though no code changed near any database query and every test still passes. What’s the first thing to check?

  • Whether the function and the database ended up in the same datacenter. If they’re a continent apart, every query pays the trip on every request.
  • Whether Fluid Compute got switched off, since that’s the setting that keeps requests fast.
  • Whether a heavy npm package slipped into the bundle and slowed the function down.
  • Whether you forgot to opt the slow routes into the Edge runtime to bring the latency down.