Rollback rehearsal and the schema caveat
The cadence is finished. Production sits on the target schema — subtotal and tax, no total — and across three reviewed PRs you never once let the running app and the live database disagree. There is one move left, and it builds nothing: you are going to break production on purpose, roll it back, watch the rollback fail to fix the real problem, then write down exactly what you saw so the engineer who inherits this app at 2 AM does not have to discover it the hard way.
That last clause is the whole lesson. An alias re-point swaps the running code in seconds, but the database stays on whatever schema your forward-only migrations already applied. Roll the code back against a schema it no longer fits and you have traded one outage for another. The only way to make that land — the only way it ever lands — is to do it once, with your own production, while nothing is at stake. So that is what you’ll do.
The only file that changes on disk is docs/runbooks/rollback.md. Everything else is a gesture on the Vercel dashboard and the verification that it did what you expected.
Your mission
You are not changing production this time; you are rehearsing how to recover it, so that the dashboard is not unfamiliar territory the first time an incident actually demands it. You will rehearse against the contract deployment deliberately, and the choice is the point: contract is the one move in the entire cadence whose schema change a rollback cannot reverse, which makes it the sharpest possible demonstration of the caveat. Promoting the previous deployment (the one that shipped right after PR 2) restores code that reads total through the dual-read coalesce fall-through. But total is gone. Until you re-promote, production raises a column total does not exist error, and that is not a mistake to avoid; it is the proof you came for. It doubles as a live test of your observability: Sentry must catch it, exactly as the launch checklist’s Sentry row promised it would.
This casualness is affordable only because the project has no live users. In a system that did, the same rehearsal would run inside a scheduled maintenance window against a throwaway deployment, never against the alias real traffic depends on — name that real-world version in your head, but you are not exercising it here. The runbook you write is not for you; it is for the on-call engineer who arrives at 2 AM with none of today’s context, and it must hand them the one distinction that decides everything: whether they are looking at an application bug — recovered by an alias re-point plus a code-only git revert, schema untouched — or a schema mistake, which an alias re-point cannot undo and which is recovered only by a forward-fix migration (for instance re-adding total as a GENERATED ALWAYS AS (subtotal + tax) STORED column — named in the runbook, never run here).
- docs/runbooks/rollback.md carries the four-step alias re-point gesture, the git revert follow-up section, the re-enable-auto-assignment section, and the bolded “an alias re-point does not undo a migration” caveat.
- The runbook names both recovery paths: the code-only git revert path for an application bug, and the forward-fix migration (the GENERATED ALWAYS re-add of total) for a schema mistake.
- The alias swap is verified, with curl -sI returning the PR-2 x-vercel-id and the inspector’s build-source panel showing the PR-2 commit SHA.
- With PR 2 live, /invoices raises the column total does not exist error and Sentry receives it.
- Auto-assignment is confirmed off after the manual promote, so a merge to main will not silently re-ship the contract code until it is re-enabled.

The test for this lesson can only reach the one artifact that lives on disk: it reads docs/runbooks/rollback.md as text and asserts its load-bearing structure. It cannot promote a deployment, read a Sentry event, or query Neon. Every live gesture is therefore yours to confirm by hand in Moment of truth.
Coding time
Run the rehearsal against the brief, then write the runbook and restore production to the target state.
Reference walkthrough
The rehearsal
1. Open the Vercel dashboard for the project and go to Deployments. The current production deployment is the PR-3 (contract) merge.
2. Find the previous production deployment, the one that shipped when PR 2 merged. Open its menu and choose Promote to Production.
3. Watch the alias swap. It completes in under thirty seconds, and no rebuild runs, because the deployment already exists. From a terminal, run curl -sI https://<APP_URL> and read the x-vercel-id header, then open the inspector’s deployment panel: both now point at the PR-2 deployment, and the inspector shows its commit SHA rather than PR-3’s.
4. Hit /invoices. The page errors: the PR-2 code reads through the dual-read coalesce(invoices.subtotal, invoices.total) fall-through, the Drizzle query names a total column that the contract migration dropped, and Postgres answers column total does not exist. The fall-through never gets a chance to help, because Postgres resolves every column a statement names before it reads a single row; the missing column fails the whole query even though subtotal is populated. Open Sentry: the error is there, captured within seconds.
5. Confirm auto-assignment flipped off. When you promote an older deployment by hand, Vercel stops auto-assigning the production alias to new builds so that a fresh merge cannot silently overwrite your manual choice. Check it under Settings → Domains.
6. Re-promote the PR-3 (contract) deployment the same way. The alias swaps back in seconds, /invoices recovers, the inspector’s schema-state panel shows the target shape again (subtotal/tax NOT NULL, no total), and Sentry goes quiet after a refresh window. Re-enable auto-assignment once you’ve smoke-tested the restored deployment.
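The failure in step 4 surprises anyone who expects coalesce to route around the missing column, so it is worth modeling. A minimal sketch, in TypeScript, of a pre-promote compatibility check built on that rule: findMissingColumns is a hypothetical helper, not part of the project, and the column lists are illustrative stand-ins (in practice you would read the live columns from Postgres's information_schema.columns and the referenced ones from the older deployment's queries).

```typescript
// Hypothetical pre-promote check. Postgres resolves every column a statement
// names before it reads any row, so coalesce() cannot route around a missing
// column: one absent column fails the entire query.

type ColumnRef = { table: string; column: string };

// Returns each column the older code references that the live schema lacks.
function findMissingColumns(
  liveSchema: Record<string, string[]>, // table -> columns currently in the database
  referencedByCode: ColumnRef[], // columns the older deployment's queries name
): ColumnRef[] {
  return referencedByCode.filter(
    ({ table, column }) => !(liveSchema[table] ?? []).includes(column),
  );
}

// Target schema after the contract migration; other columns omitted.
const liveSchema: Record<string, string[]> = {
  invoices: ["subtotal", "tax"],
};

// The PR-2 dual read, coalesce(invoices.subtotal, invoices.total), names both columns.
const pr2References: ColumnRef[] = [
  { table: "invoices", column: "subtotal" },
  { table: "invoices", column: "total" },
];

const missing = findMissingColumns(liveSchema, pr2References);
if (missing.length > 0) {
  const names = missing.map((c) => `${c.table}.${c.column}`).join(", ");
  console.log(`Promoting PR 2 against this schema will fail: missing ${names}`);
}
```

Run against the target schema, it reports invoices.total, which is exactly the error Sentry caught during the rehearsal.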
The command that makes the alias swap concrete is the response-header read:
```shell
curl -sI https://<APP_URL>  # read x-vercel-id: it identifies the deployment the alias currently serves
```

If you prefer the CLI to the dashboard, vercel ls --prod lists production deployments and vercel promote <deployment-url> flips the alias: the same gesture, scriptable.
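If you do script it, a dry-run plan keeps the rehearsal reviewable before anything touches the alias. A minimal sketch, assuming you have already copied the PR-2 and PR-3 deployment URLs from vercel ls --prod: rehearsalPlan is a hypothetical helper, the URLs are placeholders, and the only commands it names are the ones this lesson uses (vercel ls --prod, vercel promote <deployment-url>, curl -sI). It builds argument arrays rather than shell strings, so each step could later be handed to something like Node's execFileSync without quoting problems.

```typescript
// Hypothetical dry-run planner for the rehearsal. It only builds commands;
// nothing here executes them. Each step is [executable, ...args].
type Step = { why: string; argv: string[] };

function rehearsalPlan(appUrl: string, pr2Url: string, pr3Url: string): Step[] {
  for (const url of [appUrl, pr2Url, pr3Url]) {
    if (!url.startsWith("https://")) {
      throw new Error(`expected an https:// URL, got ${url}`);
    }
  }
  return [
    { why: "list production deployments", argv: ["vercel", "ls", "--prod"] },
    { why: "flip the alias back to PR 2", argv: ["vercel", "promote", pr2Url] },
    { why: "read x-vercel-id on the alias", argv: ["curl", "-sI", appUrl] },
    // Check /invoices and Sentry by hand between these two promotes.
    { why: "restore PR 3 (contract)", argv: ["vercel", "promote", pr3Url] },
    { why: "confirm the alias moved back", argv: ["curl", "-sI", appUrl] },
  ];
}

// Placeholder URLs: substitute your own from `vercel ls --prod`.
const plan = rehearsalPlan(
  "https://invoices.example.com",
  "https://pr2-deployment.vercel.app",
  "https://pr3-deployment.vercel.app",
);
for (const step of plan) {
  console.log(`${step.why}: ${step.argv.join(" ")}`);
}
```

The manual checks stay manual on purpose: the plan stops short of the parts (reading Sentry, eyeballing /invoices) that the lesson asks you to confirm by hand.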
The filled runbook
docs/runbooks/rollback.md ships as a stub: the bolded caveat is already written, and three section headers sit empty under it. Your job is to fill those three sections with the gesture you just rehearsed and append the discriminator that tells an application bug apart from a schema mistake. The completed file:
```markdown
# Rollback runbook

How to roll back a bad production deploy, and the one caveat that makes a schema migration different from a code deploy.

## The caveat

**An alias re-point does NOT undo a forward-only migration.** Pointing the production alias back at the previous deployment reverts the *code*, but the database schema has already moved forward: the dropped column is gone. Rolling back code without a compatible schema is its own outage.

## The four-step alias re-point

When a bad deploy reaches production, this restores the previous code in seconds.

1. **Identify the previous green production deployment.** Vercel dashboard → Deployments, or `vercel ls --prod` from a terminal. You want the last build that was healthy before the bad one.
2. **Promote it.** Open its menu → Promote to Production (UI), or `vercel promote <deployment-url>` (CLI). The alias flips; no rebuild runs.
3. **Verify the swap.** `curl -sI https://<APP_URL>` and confirm the `x-vercel-id` header matches the deployment you promoted; cross-check the inspector's build-source panel (the commit SHA) and watch Sentry's error rate fall.
4. **Remember the caveat.** This restores code, not schema. If the incident was a forward-only migration, the older code may now fail against the current schema; plan a forward-fix migration (below) as the durable resolution.

## The git revert follow-up

The alias re-point is a stopgap: it changes what production serves, not what `main` contains, so the bad commit is still the tip of `main`. Auto-assignment is off for now, but once it is back on, the next production build of `main` would re-ship the bug. Make the rollback durable by reverting the code:

1. Open a PR that reverts the bad commit (`git revert <bad-sha>`, adding `-m 1` if the bad commit is a merge commit) and let CI run.
2. Merge after green. The next production deploy ships the reverted code, and `main` once again matches what's live.

## Re-enabling auto-assignment

Promoting an older deployment by hand turns off automatic alias assignment so a later merge can't silently overwrite your manual choice. Once the restored deployment has passed a smoke test, re-enable auto-assignment (Settings → Domains) so the next merge to `main` resumes shipping to production normally.

## Application bug vs. schema mistake

Before you reach for any of the above, decide which problem you have:

- **An application bug:** the schema is fine, the new code is wrong. Alias re-point to roll back instantly, then a code-only `git revert` to make it durable. The database is never touched.
- **A schema mistake:** a forward-only migration changed the shape and the change itself was wrong. An alias re-point will NOT fix this; the old code fails against the new schema. The durable fix is a forward-fix migration, for example re-adding a dropped `total` as `numeric GENERATED ALWAYS AS (subtotal + tax) STORED`. It is expensive next to an alias re-point and cheap next to true data-loss recovery, and it is warranted only when the contract itself was wrong.
```

A few decisions are worth a sentence each. The four-step gesture stays deliberately tool-agnostic (dashboard, or vercel ls --prod and vercel promote) because the 2 AM engineer might be on a phone, not a laptop. The verification step leans on three independent signals (x-vercel-id, the inspector’s commit SHA, Sentry’s error rate) rather than one, because a rollback you can’t confirm is a rollback you can’t trust. The git revert follow-up exists because the alias re-point alone leaves the bad commit as the tip of main, so the next merge would re-ship it once auto-assignment is back on; the runbook closes that trap explicitly. And the discriminator is the section the engineer reads first under pressure: it routes them to the cheap instant fix, or warns them off it before they attempt a code-only rollback against a schema the old code no longer fits.
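The discriminator is a two-question decision, which makes it easy to state as code. A minimal sketch in TypeScript, assuming the on-call engineer can answer both questions from the incident itself (did the bad deploy ship a migration, and was that migration's shape the mistake?); chooseRecovery and its step strings are hypothetical, paraphrasing the runbook rather than replacing it.

```typescript
// Hypothetical encoding of the runbook's "Application bug vs. schema mistake"
// section. The step strings paraphrase the runbook; they do not replace it.
type Incident = {
  migrationShipped: boolean; // did the bad deploy apply a forward-only migration?
  migrationWasWrong: boolean; // was the new schema shape itself the mistake?
};

type Recovery = {
  kind: "application-bug" | "schema-mistake";
  steps: string[];
  warnings: string[];
};

function chooseRecovery(incident: Incident): Recovery {
  if (incident.migrationShipped && incident.migrationWasWrong) {
    // An alias re-point cannot undo the migration, so it is not a fix here.
    return {
      kind: "schema-mistake",
      steps: [
        "write a forward-fix migration, e.g. re-add total as GENERATED ALWAYS AS (subtotal + tax) STORED",
        "ship it through a reviewed PR like any other migration",
      ],
      warnings: ["do not promote older code that expects the old schema"],
    };
  }
  // Code-only problem: roll back instantly, then make it durable.
  const warnings = incident.migrationShipped
    ? ["a migration also shipped: confirm the older code fits the current schema before promoting"]
    : [];
  return {
    kind: "application-bug",
    steps: [
      "alias re-point to the last green production deployment",
      "verify: x-vercel-id, inspector commit SHA, Sentry error rate",
      "git revert the bad commit in a PR; merge after green",
      "re-enable auto-assignment after a smoke test",
    ],
    warnings,
  };
}

console.log(chooseRecovery({ migrationShipped: true, migrationWasWrong: false }));
```

The middle case (a code bug that shipped alongside a correct migration) is still an application bug, but it carries the warning this lesson's rehearsal demonstrated: the previous deployment may not fit the schema that is already live.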
The two-layer rollback model — the instant Vercel alias re-point paired with the durable git revert on main, plus why auto-assignment flips off and why a rollback can’t undo a migration — is taught canonically in Two-layer rollback when prod breaks; this runbook is that lesson made specific to this project. For the git revert mechanics — opening the revert PR, why a merged commit gets reverted rather than rewritten — see Reflog, bisect & rescue.
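The trap that the git revert follow-up closes is easier to see as two separate pieces of state: the commit the production alias serves, and the commit at the tip of main. A deliberately simplified model, assuming auto-assignment does nothing more than decide whether a new production build of main takes the alias; ProjectState, promote, merge, and enableAutoAssign are hypothetical names and the commit labels are placeholders, so treat this as a teaching model rather than a description of Vercel's internals.

```typescript
// Simplified teaching model of the two rollback layers. Not Vercel's API.
type ProjectState = {
  alias: string; // commit the production alias currently serves
  mainTip: string; // commit at the tip of main
  autoAssign: boolean; // does a new production build of main take the alias?
};

// Layer 1: a manual promote moves the alias and turns auto-assignment off.
function promote(s: ProjectState, sha: string): ProjectState {
  return { ...s, alias: sha, autoAssign: false };
}

// A merge always moves main; the alias follows only if auto-assignment is on.
function merge(s: ProjectState, sha: string): ProjectState {
  return { ...s, mainTip: sha, alias: s.autoAssign ? sha : s.alias };
}

function enableAutoAssign(s: ProjectState): ProjectState {
  return { ...s, autoAssign: true };
}

// "good" is the last healthy build; "bad" is the broken one now live.
const start: ProjectState = { alias: "bad", mainTip: "bad", autoAssign: true };
const afterRollback = promote(start, "good"); // alias -> good, main still at bad

// Trap: re-enable and merge anything built on top of the bad commit.
const trap = merge(enableAutoAssign(afterRollback), "bad+unrelated-fix");
console.log(trap.alias); // prints "bad+unrelated-fix": the bug ships again

// Layer 2: the durable fix is a revert on main.
const fixed = merge(enableAutoAssign(afterRollback), "revert-of-bad");
console.log(fixed.alias); // prints "revert-of-bad"
```

The alias re-point only ever touches the first field; only a commit on main changes the second, which is why the runbook treats the revert as the part that makes a rollback stick.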
Moment of truth
Run the lesson’s test suite:
```shell
pnpm test:lesson 6
```

The suite reads docs/runbooks/rollback.md and passes when the runbook carries its load-bearing structure: the four-step alias re-point, the git revert follow-up, and the re-enable-auto-assignment sections each filled with guidance rather than a bare header; the bolded “does not undo a migration” caveat still present and still bold; and both recovery paths named: the code-only git revert for an application bug and the forward-fix GENERATED ALWAYS migration for a schema mistake. Pass or fail; that is the entire surface.
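The suite's own source isn't shown here, but the checks it describes are plain text assertions. A minimal sketch of the same idea, assuming the section headers match the filled runbook above and that "filled" means at least some non-blank text under each header; checkRunbook, sectionBody, and their patterns are hypothetical, not the lesson's actual test.

```typescript
// Hypothetical re-creation of the lesson's structural checks; the real
// test's patterns may differ. Input is the runbook's raw markdown text.
const REQUIRED_SECTIONS = [
  "## The four-step alias re-point",
  "## The git revert follow-up",
  "## Re-enabling auto-assignment",
];
const TRIAGE_SECTION = "## Application bug vs. schema mistake";

// Returns the trimmed text under a "## " header, up to the next "## " header.
function sectionBody(markdown: string, header: string): string {
  const start = markdown.indexOf(header);
  if (start === -1) return "";
  const rest = markdown.slice(start + header.length);
  const next = rest.indexOf("\n## ");
  return (next === -1 ? rest : rest.slice(0, next)).trim();
}

function checkRunbook(markdown: string): string[] {
  const problems: string[] = [];
  for (const header of REQUIRED_SECTIONS) {
    if (sectionBody(markdown, header) === "") {
      problems.push(`empty or missing: ${header}`);
    }
  }
  // The caveat must still be bold: wrapped in ** on both sides.
  if (!/\*\*[^*]*does not undo[^*]*migration[^*]*\*\*/i.test(markdown)) {
    problems.push("bolded caveat missing or no longer bold");
  }
  // Both recovery paths must be named inside the triage section itself.
  const triage = sectionBody(markdown, TRIAGE_SECTION);
  if (!triage.includes("git revert")) problems.push("no code-only git revert path");
  if (!triage.includes("GENERATED ALWAYS")) problems.push("no forward-fix migration path");
  return problems;
}

// Usage: the shipped stub (caveat written, three empty headers) fails.
const stub = [
  "# Rollback runbook",
  "## The caveat",
  "**An alias re-point does NOT undo a forward-only migration.**",
  "## The four-step alias re-point",
  "## The git revert follow-up",
  "## Re-enabling auto-assignment",
].join("\n\n");
console.log(checkRunbook(stub)); // five problems: three empty sections, two missing paths
```

Scoping the git revert and GENERATED ALWAYS checks to the triage section matters: the phrase "git revert" already appears in a header, so a whole-file search would pass a runbook that never actually names the application-bug path.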
Everything else in this lesson happened on the Vercel dashboard and against live Sentry and Neon, which no Node test can reach. Confirm those by hand:
- curl -sI and the inspector’s build-source panel both confirm the PR-2 commit SHA is live.
- /invoices raises column total does not exist, and Sentry receives the error during the rehearsal window.
- After you re-promote PR 3, the inspector’s schema-state panel shows subtotal/tax NOT NULL and no total, and Sentry goes quiet after a refresh.

That last tick closes the project. You shipped a green repo to a live URL, evolved a destructive schema change across three reviewed PRs with the running app and the live database never once out of step, and rehearsed the recovery you hope never to need. The thing you carry out of here is not the invoices app; it’s the discipline: forward-only migrations in three deploys, every change behind a green PR, rollback as a recovery primitive you’ve already practiced, and a runbook waiting for whoever is on call when it counts.