Chapter 99, Lesson 3

Rehearsing on a Neon preview branch

How a per-pull-request Neon preview branch carrying production-shaped data lets you rehearse each step of an expand-migrate-contract migration before it ever touches production.

In the last two lessons you learned to plan a safe migration and to read its SQL closely enough to know whether it crosses the trigger. But a migration you have planned and reviewed is still only a belief. You believe the ALTER TABLE applies cleanly. You believe the app keeps working while both schemas coexist. You believe the backfill finishes before anyone notices. None of that is a fact yet. It is a hypothesis, and a hypothesis that has never been tested against real data is exactly the kind of thing that passes review and then fails in production.

You already own the thing that turns those beliefs into facts. Back when you wired up the Native Vercel-Neon Integration, every pull request started getting its own throwaway Neon branch: a full copy of production’s data that nothing in production can ever see. You met that branch as a safety net, a place where a stray test insert can’t pollute the real database. This lesson asks you to look at the exact same branch and see something else: a rehearsal stage. The wiring is the same; the job is new. It is the most production-like environment you have that is completely free to break, which makes it the perfect place to run the risky migration first, on purpose, and watch what happens. By the end you will have a four-check loop that you run on every step of the cadence, so that a missed dual-write or a table-locking backfill shows up on a throwaway branch instead of in an incident channel.

The preview branch is your rehearsal stage


Recall what you already have, because the whole lesson stands on it. When a pull request opens, the integration creates a Neon branch off main, production’s branch, using copy-on-write. That branch carries production’s data: the same row counts, the same value distribution, the same indexes. And the preview deployment’s build command runs pnpm db:migrate && next build, so the pull request’s migration is applied to that branch before the app even boots.

That last detail is the one that matters here. It means the branch is not an empty sandbox you have to seed by hand. It is a fair copy of production with your migration already run against it. Production-shaped data plus full isolation makes it the ideal place to run a change you are nervous about. And because each step of expand-migrate-contract ships as its own pull request, each step gets its own branch and its own rehearsal.

That per-step point is the one people most often get wrong. “Rehearse the cadence” does not mean “rehearse it once before you start.” It means rehearse every step. Expand, migrate, and contract are three separate pull requests, and each one is its own hypothesis that earns its own test.

So how do you actually verify a step? Verification happens in two layers, and keeping them straight is the spine of everything that follows. There is an automatic ring that runs without you lifting a finger: the build applies your migration to the branch, and the CI gate type-checks and tests your code. And there is a manual ring that only happens because you do it by hand: you open the preview URL and walk the app, and you query the branch directly to see what the migration actually did to the data. The two rings prove different things. The automatic ring proves the migration applies and your typed code compiles. Only the manual ring proves the change is correct. A green build is necessary, but it is nowhere near sufficient.

[Figure: two concentric rings around "Your migration: ADD COLUMN customer_id". Automatic ring: build runs db:migrate; CI type-check + tests; proves it applies + compiles. Manual ring: walk the preview URL; query the branch directly; proves it is correct.]
Two concentric rings wrap every migration. The inner automatic ring runs on its own and proves the migration applies and the code compiles; the outer manual ring is the work you do by hand, and only it proves the change is correct.

Start with the automatic ring, because it is the cheapest verification you will ever get and the easiest one to overtrust.

The moment your pull request’s commit lands, Vercel’s build runs pnpm db:migrate against the preview branch. That is the db:migrate && you prepended to the build command earlier. If the migration is broken in a way Postgres can detect, such as invalid SQL, a type mismatch, or a lock timeout against the branch, the migration step fails, the build fails, the failure posts straight to the pull request, and none of it reaches production. That is your first verification, and it costs you nothing.

The other half of the automatic ring is the CI gate. Type-check and tests run against what your code expects, and here Drizzle hands you a property worth pausing on: its typed query builder catches every typed read of a dropped column at compile time. Consider what that means for the contract step. The instant you remove customer_name from the schema, the invoices.customerName field stops existing on Drizzle’s inferred row type. Every typed read of it across the entire codebase becomes a type error, surfaced by the build before a single human reviews the diff. This is why the contract step’s nagging question, “did I miss a read of the old column somewhere?”, is mostly answered for you by the type system.

That word “mostly” matters, so let me draw the boundary precisely. The automatic ring proves your migration applies and your typed code compiles. It does not prove the app behaves correctly. It does not prove the dual-write path is actually reached for every mutation. It does not prove the backfill finishes in time, and it certainly does not prove the backfilled values are right. Those four things are the manual ring’s job, and they are the subject of the next section. A green build is necessary but not sufficient because “the SQL ran and the code compiled” and “the change is correct” are different claims, and only the first one is something a machine can check for you.
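The compile-time catch, and its limit, can be sketched in plain TypeScript. This is only an analogy: the two row types below are hand-written, hypothetical stand-ins for the types Drizzle would infer from the schema before and after the contract step, and the helper names are invented for illustration.

```typescript
// Hypothetical row shapes standing in for Drizzle's inferred types.
type InvoiceBeforeContract = {
  id: string;
  customerName: string | null; // old column, still present
  customerId: string | null; // new column
};

// After contract, the old field is simply gone from the type.
type InvoiceAfterContract = Omit<InvoiceBeforeContract, "customerName">;

// A typed read of the old column compiles against the pre-contract shape.
function labelBefore(row: InvoiceBeforeContract): string {
  return row.customerName ?? "(unknown)";
}

// Against the post-contract shape the same read is a compile error.
// The directive below only suppresses it so this sketch still builds.
function labelAfter(row: InvoiceAfterContract): string {
  // @ts-expect-error customerName no longer exists on the row type
  return row.customerName ?? row.customerId ?? "(unknown)";
}

// What the types cannot see: a raw SQL string is just a string,
// so this still compiles after the column is dropped.
const rawFragment: string = "SELECT customer_name FROM invoices";

// A crude textual check, similar in spirit to grepping for the old name.
function mentionsOldColumn(sql: string): boolean {
  return sql.includes("customer_name");
}
```

The first half is the automatic ring doing its job. The raw fragment is the kind of read that only the manual ring, or a text search, will surface.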

Your contract pull request’s preview build is fully green: db:migrate ran clean against the branch, and CI type-check and tests both pass. What can you conclude from that — and only that?

The schema change landed and nothing in your typed code still references the dropped column — but whether the app actually behaves correctly is still untested.
Every reviewer concern is now covered, so you can merge straight to production without opening the preview URL.
The data the backfill wrote into the new column is correct.
The migration won’t stall once it’s competing with production’s live write traffic.

The build is green, so the SQL applies and the code compiles. Now you prove that the change is correct. This is the manual ring, and it is the core skill of the lesson: four checks, run in order, on every cadence step. They are always the same four checks. What shifts from step to step is which one carries the most weight, and we will draw that out shortly.

Check one: the migration applied, and it matches the pull request. The Vercel build log shows the db:migrate step succeeded. The new migration row has appeared in the branch’s __drizzle_migrations table. And the live schema actually matches what your pull request promised: open Drizzle Studio against the branch (or run a quick \d invoices) and confirm the column is there with the type and nullability you intended. A migration that ran but produced a slightly different shape than the SQL you reviewed is a real failure mode, and this is the check that catches it.

SELECT id, hash, created_at
FROM drizzle.__drizzle_migrations
ORDER BY id DESC
LIMIT 1;
-- in psql, inspect the live shape of the table:
-- \d invoices

Check two: it finished in a reasonable time. A migration that takes ten minutes against a production-shaped branch will take longer against production proper, because production is busier and its lock windows are narrower. So if the run feels slow on the branch, treat that as an early warning, not an inconvenience. The fix is almost always one of two things you already know: a CREATE INDEX that should have been CONCURRENTLY, or a backfill that should have batched. A slow run on the branch is the cheapest possible preview of a slow run on production.

Check three: the app still works against the new schema. Open the preview URL and walk the critical paths as a user would: list pages render, mutations succeed, the dashboard counts match what you expect. Because the branch is fully populated, a broken query breaks visibly: an empty table, a 500, or a count that is suddenly wrong. This is the check the type system simply cannot do for you. It exercises the things that never go through Drizzle’s typed builder: raw SQL fragments, external integrations, and plain runtime behavior.

Check four: the old shape still works where it is supposed to. This is the check that guards the invariant from the cadence: the schema has to satisfy both live versions of your code at once. What “works” means here depends on the step. For expand, query the old column directly in Studio. It should still be perfectly readable and writable, because an additive change broke nothing. For migrate, hit the real code paths through the preview URL. Every mutation should write both columns, and every read should return the new value when present and fall through to the old when not.
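The read rule for the migrate step (new value when present, old value otherwise) can be sketched as a small pure function. The row shape, the in-memory customer lookup, and the function name are all assumptions made for illustration; in the real app the lookup would be a join or a query rather than a Map.

```typescript
// Hypothetical invoice row during the migrate step: both columns exist.
type InvoiceRow = {
  id: string;
  customer_name: string | null; // old column
  customer_id: string | null; // new column
};

// Hypothetical lookup from customer id to display name.
type CustomerNames = ReadonlyMap<string, string>;

// Prefer the name reached through customer_id; fall through to the old
// customer_name when the new column is empty or the lookup finds nothing.
function readCustomerName(row: InvoiceRow, customers: CustomerNames): string | null {
  if (row.customer_id !== null) {
    const name = customers.get(row.customer_id);
    if (name !== undefined) {
      return name;
    }
  }
  return row.customer_name;
}

const customers: CustomerNames = new Map([["c-1", "Acme Corp"]]);

// Backfilled row: the new column wins.
console.log(readCustomerName({ id: "1", customer_name: "Acme", customer_id: "c-1" }, customers)); // "Acme Corp"
// Not yet backfilled: the old column is still served.
console.log(readCustomerName({ id: "2", customer_name: "Globex", customer_id: null }, customers)); // "Globex"
```

Walking the preview URL against a mix of backfilled and not-yet-backfilled rows is what exercises both branches of a function like this.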

The dual-write inspection (the bit everyone skips)


Check four hides the single most-skipped verification in the whole rehearsal, and it deserves its own treatment because it is the one the type system can’t help you with at all.

The procedure is short. Through the preview URL, perform a real mutation: create an invoice, or edit one. Then open Studio against the branch and look at that exact row. It should have both customer_name (the old column) and customer_id (the new one) populated. If both are there, your dual-write is doing its job.

SELECT customer_name, customer_id FROM invoices WHERE id = '...';

If only one column is populated, you have found a mutation site where the dual-write code path was never reached. Fix the missed site, push, and re-verify. The reason you have to look at an actual row rather than trust the build is that the type system never proves the dual-write path is reached for every mutation site. An update that sets only the old column type-checks perfectly fine: it is valid code that does the wrong thing. The only thing that proves coverage is a row-level look at data you wrote yourself.
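Why the type system stays silent here can be seen in a minimal sketch. Both update builders below are hypothetical, and both type-check; only looking at the resulting row tells them apart. The shapes and names are assumptions for illustration, not the course project's actual code.

```typescript
// Hypothetical column pair during the migrate step.
type InvoiceColumns = { customer_name: string | null; customer_id: string | null };
type InvoiceUpdate = Partial<InvoiceColumns>;

// A mutation site that dual-writes.
function updateViaDualWrite(name: string, customerId: string): InvoiceUpdate {
  return { customer_name: name, customer_id: customerId };
}

// A missed mutation site: valid, well-typed code that writes only the old column.
function updateViaMissedSite(name: string): InvoiceUpdate {
  return { customer_name: name };
}

// Apply an update the way an UPDATE ... SET would: untouched columns keep their value.
function applyUpdate(row: InvoiceColumns, update: InvoiceUpdate): InvoiceColumns {
  return {
    customer_name: update.customer_name !== undefined ? update.customer_name : row.customer_name,
    customer_id: update.customer_id !== undefined ? update.customer_id : row.customer_id,
  };
}

type DualWriteStatus = "both" | "old-only" | "new-only" | "neither";

// The row-level look: which of the two columns actually got populated?
function dualWriteStatus(row: InvoiceColumns): DualWriteStatus {
  const hasOld = row.customer_name !== null;
  const hasNew = row.customer_id !== null;
  if (hasOld && hasNew) return "both";
  if (hasOld) return "old-only";
  if (hasNew) return "new-only";
  return "neither";
}

const freshRow: InvoiceColumns = { customer_name: null, customer_id: null };
console.log(dualWriteStatus(applyUpdate(freshRow, updateViaDualWrite("Acme", "c-1")))); // "both"
console.log(dualWriteStatus(applyUpdate(freshRow, updateViaMissedSite("Acme")))); // "old-only"
```

The "old-only" result is exactly what a missed site looks like when you inspect the row in Studio.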

Run on each step:

| Check | Expand (additive, broke nothing) | Migrate (dual-write + backfill) | Contract (old shape removed) |
| --- | --- | --- | --- |
| 1. Migration applied + matches the PR | New column present, type + nullability match | Backfill ran, migration row present | Old column gone, type still matches |
| 2. Ran fast on the branch | Additive change, near-instant | Time it, extrapolate to prod rows | Drop is fast |
| 3. Preview URL critical paths work | Everything renders as before | Mutations succeed end to end | Nothing broke when the old shape vanished (heaviest here) |
| 4. Old shape still behaves | Old column still reads + writes (heaviest here) | A mutated row has BOTH columns populated (heaviest here) | No typed or raw read of the old column remains |

Three separate pull requests → three separate rehearsals. Rehearsing the cadence means rehearsing every step, not once.

The same four checks run on every cadence step — what shifts is which check carries the most weight. The highlighted cell in each column is that step's heavy check: Expand leans on the old shape still working, Migrate on a mutated row carrying both columns, Contract on nothing breaking once the old shape is gone.

The deeper rehearsals some migrations need


The four checks are your default, and for most migrations they are the whole job. The next three rehearsals are not defaults. They are escalations you reach for when a specific change earns them. Each one closes a gap the four checks leave open, and each one has a threshold that tells you when it is worth the effort.

Timing the backfill and extrapolating to production


The four checks tell you the backfill ran. They don’t tell you how long it will run when it has ten times the rows. So when the table is large enough that “how long will this take?” is a genuine question, time the backfill on the branch.

time pnpm tsx scripts/backfill_customer_ids.ts

Then extrapolate against production’s row counts. A backfill that works through 200K rows in 90 seconds on the branch will take roughly 900 seconds against a 2-million-row production table (90 s × 2,000,000 / 200,000). That is a linear first approximation. Warm caches can make the real run a little kinder than linear, while production’s live traffic and lock contention can make it slower, so treat the number as a planning estimate rather than a promise.
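The linear estimate above is simple enough to keep as a helper next to the backfill script. This is a minimal sketch: the function names and the idea of an "acceptable window" parameter are assumptions for illustration, and the result inherits all the caveats of a linear approximation.

```typescript
// Linear extrapolation: seconds scale with row count.
// Multiplying before dividing keeps round inputs exact.
function extrapolateBackfillSeconds(
  branchRows: number,
  branchSeconds: number,
  productionRows: number,
): number {
  if (branchRows <= 0) {
    throw new RangeError("branchRows must be positive");
  }
  return (branchSeconds * productionRows) / branchRows;
}

// Hypothetical decision helper: does the estimate fit the window you can tolerate?
function fitsWindow(estimatedSeconds: number, acceptableSeconds: number): boolean {
  return estimatedSeconds <= acceptableSeconds;
}

const estimate = extrapolateBackfillSeconds(200000, 90, 2000000);
console.log(estimate); // 900
console.log(fitsWindow(estimate, 1800)); // true
console.log(fitsWindow(estimate, 600)); // false
```

When the estimate does not fit, that is the signal to batch smaller or move the work to a background job.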

What the estimate buys you is a decision. If 900 seconds is fine, ship it. If it pushes past an acceptable window, you batch smaller, or you move the backfill to a background job on Trigger.dev, a tool you’ll meet later in the course, named here only as the escape hatch for backfills measured in millions of rows. Either way, discovering a multi-hour backfill on a throwaway branch is far better than discovering it halfway through the real run.

During a normal rehearsal the branch sees zero real traffic. That is usually a feature, but it hides one specific failure: a migration that is only slow under write contention will look instant on a branch nobody is writing to.

To rehearse the contention, you have to manufacture it. Load the branch with synthetic write traffic, a small script that runs the relevant mutation in a tight loop, and run the migration while that script is running. If the migration grabs a long lock, those writes stall, and Neon’s metrics show the spike plainly. That is the “looked fine in dev, stalled production” failure mode, caught on a branch.

This is the empirical companion to the lock reasoning from the previous lesson. There, you reasoned about whether a statement takes an ACCESS EXCLUSIVE lock. Here, you measure whether that lock actually stalls real writes. Reach for it only when the target table’s write traffic is measured in writes per second rather than per minute. For the overwhelming majority of tables, the plain four-check rehearsal is enough, and manufacturing synthetic load is wasted effort.
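One way to read the result of a synthetic-load run, besides watching Neon’s metrics, is to have the load script record when each write completed and then look for the longest silence. The sketch below covers only that analysis step; it assumes a hypothetical load script that collected completion timestamps in milliseconds, in ascending order, and the threshold is an arbitrary illustration.

```typescript
// Longest gap between consecutive completed writes, in milliseconds.
// Assumes the timestamps are sorted ascending; returns 0 for fewer than two samples.
function longestGapMs(completedAtMs: readonly number[]): number {
  let longest = 0;
  let previous: number | undefined;
  for (const t of completedAtMs) {
    if (previous !== undefined) {
      longest = Math.max(longest, t - previous);
    }
    previous = t;
  }
  return longest;
}

// Hypothetical verdict: a gap above the threshold suggests writes were stalled,
// for example behind a long lock taken by the migration.
function writesStalled(completedAtMs: readonly number[], thresholdMs: number): boolean {
  return longestGapMs(completedAtMs) > thresholdMs;
}

// Writes every 10 ms, then a 1.5 s silence while the migration held its lock.
const samples = [0, 10, 20, 1520, 1530];
console.log(longestGapMs(samples)); // 1500
console.log(writesStalled(samples, 500)); // true
```

A run with no gap above the threshold does not prove production is safe, but a run with one is a clear reason to revisit the migration before merging.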

The data-integrity diff: proving the values are right


Check one proves the column exists. It says nothing about what is in it. A backfill can populate a brand-new column with completely wrong values and still pass every schema check you have: the column is there, the type is right, nothing is null. That is silent corruption, and the only thing that catches it is a value-level audit after the backfill runs on the branch.

There are two queries, and you run them as a pair. The first counts nulls: after a complete backfill, no row should be missing the new value, so the count should be zero. That proves completeness. The second compares the backfilled value against the source of truth. For our rename, every invoice’s old customer_name should match the name on the customer its new customer_id points to. That count should be zero too, and it proves correctness.

SELECT count(*) FROM invoices WHERE customer_id IS NULL;
SELECT count(*)
FROM invoices i
WHERE i.customer_name IS DISTINCT FROM (
SELECT c.name FROM customers c WHERE c.id = i.customer_id
);

That second query leans on IS DISTINCT FROM rather than a plain <> so that rows with a null on either side still compare the way you expect instead of vanishing from the count.
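The difference is easier to see with SQL’s three-valued logic modeled explicitly. The sketch below is an in-memory illustration, not something you would run against the branch: the two comparison functions mimic how Postgres evaluates <> and IS DISTINCT FROM on nullable text, and the sample pairs are invented.

```typescript
type NullableText = string | null;

// SQL's <> yields NULL (modeled here as null) when either side is NULL.
function sqlNotEqual(a: NullableText, b: NullableText): boolean | null {
  if (a === null || b === null) {
    return null;
  }
  return a !== b;
}

// IS DISTINCT FROM always yields true or false:
// NULL vs NULL is not distinct, NULL vs a value is distinct.
function sqlIsDistinctFrom(a: NullableText, b: NullableText): boolean {
  if (a === null || b === null) {
    return a !== b;
  }
  return a !== b;
}

// Pairs of (invoice.customer_name, name reached through customer_id).
const pairs: Array<[NullableText, NullableText]> = [
  ["Acme", "Acme"], // match
  ["Acme", "Globex"], // wrong customer
  ["Acme", null], // customer_id missing or dangling
  [null, null], // both absent
];

// A WHERE clause keeps a row only when the predicate is exactly true.
const countNotEqual = pairs.filter(([a, b]) => sqlNotEqual(a, b) === true).length;
const countDistinct = pairs.filter(([a, b]) => sqlIsDistinctFrom(a, b)).length;

console.log(countNotEqual); // 1
console.log(countDistinct); // 2
```

The row with a missing lookup is the one a plain <> would silently drop from the audit, which is the case the value diff most needs to catch.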

The point is worth stating plainly: a schema that is structurally perfect can still hold wrong data. The value diff is the only check that catches a backfill that ran cleanly but mapped rows to the wrong place, so reach for it whenever the backfill derives a value through a join, a lookup, or a computation, rather than copying a column straight across.

You have now met every tool in the rehearsal. This section ties them together by showing what makes the per-step rule real: each step of the cadence has its own distinct way of going wrong, and each failure has a check that catches it. Walk the three steps with me.

Expand can fail when the new foreign key rejects a value the migrate step will need to write. Adding customer_id uuid REFERENCES customers(id) is a quiet promise that every value the backfill will eventually write points at a real customer row. If that promise is false, say an invoice whose derived customer doesn’t actually exist, the foreign key rejects it. The rehearsal catches this because the backfill runs against production-shaped data: the violation surfaces the instant the backfill runs on the branch, not weeks later in production. You fix it in the migrate pull request before it ever merges.

Migrate can fail in two ways. The first is the dual-write skipping a code path, caught by the direct Studio inspection you just learned: a row written through the good path has both columns, a row written through a missed path has only the old one. The type system can’t see this, but the row-level look can. The second is the backfill taking too long, caught by timing it on the branch and extrapolating, with batch-smaller or Trigger.dev as the fallback.

Contract can fail when something, somewhere, still reads the old column. This one is caught in two ways, and the combination is what makes it safe. Drizzle’s types catch every typed read at compile time, because the field is simply gone, so the build breaks. The preview URL catches the rest: the raw SQL fragments and external integrations the types can’t see, because dropping the column breaks those reads visibly on the page. The experienced move before merging contract is to grep the codebase for the old column name by hand, so any raw-SQL straggler the types missed turns up before the column is dropped.

And one failure mode spans the whole cadence: a CI job that still names the old column. A test fixture or a seed script that references customer_name after the contract step will fail the CI build, and the pull request stays blocked until the tests are updated. That is the CI gate working exactly as intended, a second line of defense sitting right behind the type system.

Match each cadence-step failure mode to the rehearsal check that catches it.

The dual-write skips a mutation site → Direct row inspection in Drizzle Studio
The backfill maps rows to the wrong customer → The value-match audit query
A destructive migration locks the table under write load → The synthetic-load lock rehearsal
A raw-SQL query still reads the dropped column → Walking the preview URL
The new foreign key rejects a backfill value → Running the backfill against production-shaped data

What the rehearsal can’t catch, and the production handoff


Everything so far has been about building confidence. This section is about keeping that confidence honest, because the most dangerous assumption in this whole topic is “the preview worked, so production can’t fail.” It almost always does work. But “almost always” is not “always,” and there are three specific things the rehearsal genuinely cannot see.

Production-scale concurrency. The branch sees no real traffic during the rehearsal, so a migration that is only slow under contention can look fast, unless you ran the synthetic-load rehearsal, and even that is an approximation of the real thing.

Post-branch data. The branch is a snapshot, frozen at the moment it was created. Every row added to production since then simply isn’t in it. A migration that is flawless on the snapshot can still trip over a value that only exists in the rows the snapshot never saw.

Long-running production transactions. A migration can be blocked by a transaction that exists only in production, such as a slow analytics report or a connection that got stuck open. The branch has no such transactions, so it can’t show you that stall.

Which leads to the conclusion this whole lesson has been building toward: the rehearsal is necessary but not sufficient. For a high-stakes migration, the discipline is the rehearsal plus a low-traffic deploy window plus a watchful eye on your error and database dashboards while it runs. You’ll wire up that kind of observability later in the course; for now, just know that “merge and look away” is never the move for a risky change.

So what does the handoff to production actually look like? Once the four checks pass, you merge the pull request. Vercel re-runs pnpm db:migrate against production’s main branch as part of the production build, using the same command and the same migration runner. And because the branch run already validated this exact migration, the production run is a repeat: same SQL, same runner, same shape of data, just the real rows this time. That is the entire payoff of the branch carrying production-shaped data. Your rehearsal was a fair copy, so the real run should hold few surprises beyond the three blind spots above.

For the busiest tables, the ones where post-branch data is a real risk, add one step to the cycle. Recall that neonctl is the manual escape hatch sitting underneath the automatic integration: the integration creates a branch when the pull request opens, but you can cut a fresh one at any moment.

neonctl branches create --parent main

Take that fresh branch off production at the exact moment the pull request is ready, run the cadence step against it under synthetic load, and watch. Reach for this only when the first rehearsal raised a question that only a current snapshot can answer. For the vast majority of migrations, the branch the integration already gave you is current enough.

Here is the whole lesson as a gate you can actually tick. Run it on each cadence step you ship, which means three times across a full expand-migrate-contract change, not once. The last row is always the same, and it is the one this lesson exists to install: you watched the migration run, you didn’t just review it.

[tested] The preview build’s db:migrate step is green and the new row is in __drizzle_migrations.
[untested] Studio confirms the live schema matches the PR’s SQL: column, type, nullability.
[untested] The migration finished fast on the branch, with no unexpectedly long run.
[untested] Every critical path on the preview URL still works.
[untested] The old shape behaves: it still reads/writes where it should (expand), or a mutation populates both columns (migrate), or nothing broke when it vanished (contract).
[untested] For a backfill step: the branch timing, extrapolated to production row counts, fits an acceptable window.
[untested] For a derived backfill: the null-count and value-match audit queries both return zero.
[tested] CI type-check and tests are green against the PR.