Skip to content
Chapter 40Lesson 2

Production-safe migrations

Reading and hand-editing Drizzle Kit migrations so a schema change never locks tables or strands the old app version during a live deploy.

The loop from the last lesson had five beats: edit db/schema.ts, run generate, review the SQL, run migrate, commit the three artifacts together. That lesson told you to read the generated SQL before you commit it, but it never told you what to read for.

Here is what that gap costs. A teammate adds an index to speed up a slow query. The schema change is one line, and Drizzle Kit emits one line of SQL. The SQL is correct: it builds exactly the index that was asked for. The PR gets a thumbs-up because the diff looks trivial. Then it ships at 2pm, and CREATE INDEX takes a lock that blocks every write to invoices for the forty minutes it takes to scan two million rows. For forty minutes, nobody can create or update an invoice. The schema was right, but the deploy was an outage.

The idea to internalize early is that drizzle-kit generate produces SQL that is correct in form but not always safe under live traffic. The generator works by diffing two snapshots of your schema. It can see that you added a column or an index, but it cannot see that the table has two million rows, that it’s being written to right this second, or that an old version of your app is still serving requests mid-deploy. You are the only part of this pipeline that knows those things. So the habit you’re building this lesson is neither “trust the generated file” nor “hand-write all the SQL yourself.” It is to read every migration with locks and data loss in mind, and to hand-edit the handful of cases the generator can’t see.

By the end you’ll run every migration through a five-question checklist, and you’ll know the three hand-edits that keep a deploy from taking the site down.

Generate by default, push only for throwaway data

Section titled “Generate by default, push only for throwaway data”

Before any checklist comes a fork: do you even produce a reviewable file? Settle that first, because the rest of the lesson assumes a generated .sql exists to review.

You met two commands in the last lesson, and both speak DDL . drizzle-kit generate reads your schema, diffs it, and writes a numbered .sql file you commit. drizzle-kit push reads your schema and applies the diff directly to the database: no file, no snapshot, nothing to review, nothing in the PR.

generate is the default for anything real, and the case for it comes down to three properties of having a file:

  • It’s reviewable. A human reads the SQL in a PR and catches the destructive operations the rest of this lesson is about, which the schema diff alone hides.
  • It’s replayable. CI applies the exact same bytes to staging that production will later apply to itself, so the drift between environments is zero: they ran the identical file.
  • It’s editable. The hand-edits you’re about to learn live in that file and travel with it forever.

Any team larger than one defaults to generate, which leaves the real question: when is push ever right?

push has one failure mode, and it’s the reason to be careful with it. Picture renaming invoices.customerName to invoices.clientName. To a snapshot diff, a rename and a “drop the old column, add a new empty one” look identical: same before-state, same after-state. generate notices the ambiguity and pauses to ask you whether this is a rename or a drop-and-add. push has no such pause. It resolves the ambiguity by dropping the column and creating a fresh empty one, so every customer name is silently gone, with no SQL file, no review step, and no PR comment where anyone could have caught it.

So the rule, short enough to memorize:

“Whatever you’d happily wipe” means a solo prototype before the schema has stabilized, or an ephemeral branch you’ll delete when the feature merges. It does not mean staging, and it does not mean a dev branch you share with a teammate. A shared staging database still loses data when you push to it: that data is somebody’s afternoon of test setup, and push deletes it without asking.

Walk the decision yourself in the tree below, and notice the order of the two questions, because that order is the part worth keeping.

push or generate?

Reading the generated SQL: the five-question checklist

Section titled “Reading the generated SQL: the five-question checklist”

This checklist holds the rest of the lesson together. Everything after it is one of these questions expanded into its fix.

Every PR that touches db/schema.ts ships a generated .sql. The reviewer reads the SQL, not the schema diff, because the SQL is what actually runs against the database, and the schema diff can’t show you a lock or a deploy window. These five questions are what they read for:

  1. Does anything DROP, whether a column, table, constraint, or index? If so, is the drop safe in this deploy, or does it strand the old version of your app that’s still running?
  2. Is every rename a true rename, or is Drizzle Kit being asked to disambiguate a remove-and-add? Check the prompt when you generate, because the wrong answer is silent data loss.
  3. Does any new index sit on a table with write traffic? If so, it needs CONCURRENTLY and its own migration file, or it locks writes for the whole build.
  4. Does any column-type change rewrite the table? A rewrite takes the strongest lock Postgres has, and every read and write blocks until it finishes.
  5. Does any new NOT NULL column lack a default? It fails the instant it meets a table that already has rows.

There’s a pattern hiding in that list, and naming it is what makes the checklist stick. Questions 1 and 2 are about losing data; questions 3, 4, and 5 are about locks and downtime. That’s two failure families in one read. When you scan a migration, you’re really asking two things: could this delete something I can’t get back, and could this freeze the table while it runs. The figure below is the whole checklist split that way, so keep it within reach.

Data-loss risks

Could this delete something I can’t get back?

1

DROP — is it safe this deploy?

2

A rename vs. a silent remove-and-add

Lock / downtime risks

Could this freeze the table while it runs?

3

A new index on a hot table

4

A type change that rewrites the table

5

NOT NULL column with no default

Read every migration twice — once for data loss, once for locks.

The rest of this lesson walks the checklist. You’ll practice it for real on a multi-file PR at the very end, where the closing exercise is the checklist run against code with planted defects.

Start with question 3, because it’s the most common hand-edit and the easiest to get right.

In the chapter on indexes you learned which index to add and why. This section is the migration-file mechanic for adding one safely. Drizzle Kit emits a line like CREATE INDEX "idx_invoices_status" ON "invoices" ("status"). That’s correct, and on an empty dev table it finishes instantly. The problem is what plain CREATE INDEX does to a table with traffic: it takes a lock that blocks writes for the entire build. Reads keep flowing, but writes queue up behind the lock. On a small table that’s a blink; on a hot table with millions of rows, it’s the forty-minute outage from the start of this lesson.

The fix is one word: CREATE INDEX CONCURRENTLY. Postgres builds the index without the write lock by making several passes over the table instead of one locked pass. You pay for that with time, since it’s slower overall, and with one hard constraint: a concurrent index cannot run inside a transaction block.

That constraint is what turns this into a real hand-edit, because of how the migrate runner works: Drizzle wraps each migration file in a single transaction. So a CREATE INDEX CONCURRENTLY dropped into a file next to other statements fails at runtime, because it’s sitting inside that file’s transaction, exactly where it’s not allowed to be.

The way around this is a discipline, not a flag: put the concurrent index in its own migration file, with nothing else in it. A file that contains only the concurrent index has no other statements to wrap in a transaction with, so it runs cleanly. You generate that empty file with drizzle-kit generate --custom --name add_invoices_status_index. The --custom flag produces an empty migration for hand-written SQL, which is also the seam the next section uses. The rule to remember: a concurrent index is always its own migration, never bundled with table changes.

Compare the two side by side. The difference is one keyword and one file:

drizzle/0006_add_invoices_status.sql
ALTER TABLE "invoices" ADD COLUMN "status" text DEFAULT 'draft' NOT NULL;
--> statement-breakpoint
CREATE INDEX "idx_invoices_status" ON "invoices" ("status");

Correct, but it locks writes. It builds the right index, but plain CREATE INDEX takes a write lock for the whole build, and on a busy invoices table that’s a partial outage.

One last term you’ll see in generated files: --> statement-breakpoint. It’s a marker Drizzle Kit inserts to split a file into separate statements, for database engines that can’t run several DDL statements in one transaction. That’s all it is, the line that separates statements. It is not the trick that escapes the transaction for a concurrent index; the separate file is. When you see --> statement-breakpoint, read it as “next statement” and move on.

The course rule, stated plainly: every index on a table with write traffic uses CONCURRENTLY, in its own migration file.

Migrations Drizzle Kit can’t write: the custom SQL file

Section titled “Migrations Drizzle Kit can’t write: the custom SQL file”

Question 3 has a cousin: things the generator can’t emit at all, because your schema can’t express them. You already wrote one of these, the updatedAt trigger from the chapter on schema design. This section is how that trigger actually reaches the database, and it answers the obvious objection: if I hand-edit SQL, won’t the next generate clobber it?

Drizzle Kit reads db/schema.ts and emits DDL for everything the schema can describe: tables, columns, foreign keys, plain indexes. It does not emit anything the schema can’t model. The cases to recognize:

  • triggers (the updatedAt trigger that stamps a row on every update)
  • CREATE EXTENSION (turning on a Postgres extension)
  • generated columns with complex expressions
  • custom check constraints
  • partial unique indexes whose WHERE references a computed value

The updatedAt trigger is the worked example. Your schema can declare the updatedAt column, but it has no way to say “and run this function before every update.” That instruction has to be written by hand, and the place it goes is a custom migration.

That’s the same --custom flag from the last section. drizzle-kit generate --custom --name set_updated_at_trigger emits an empty migration file, numbered in sequence like any other, for you to fill with hand-written SQL. This isn’t a workaround; it’s the documented seam for exactly this. You write the schema for the parts Drizzle Kit handles, and you write the trigger SQL into a --custom migration.

This is where the objection answers itself, through a split worth saying out loud:

The migration files are the source of truth for the database state. db/schema.ts is the source of truth for the Drizzle types.

A trigger lives only in a migration, because the schema can’t model it. A column lives in both: the schema models its type, the migration applies it. They don’t conflict, because each owns a different surface. So will the next generate overwrite your custom trigger SQL? No. generate writes a new, numbered file for the next diff. It never rewrites a migration that’s already committed; that’s the immutability rule from the last lesson, the one that says an applied migration is frozen. Your custom file is frozen the moment it merges. The two sources stay in sync because the next time you change a column, generate re-reads the schema and emits the next file in the sequence. It has no reason to touch the trigger file, and it never does.

Step through the trigger migration below to see where each piece comes from:

CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS trigger AS $$
BEGIN
NEW.updated_at := now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
--> statement-breakpoint
CREATE TRIGGER invoices_set_updated_at
BEFORE UPDATE ON "invoices"
FOR EACH ROW
EXECUTE FUNCTION set_updated_at();

This started life as an empty file from drizzle-kit generate --custom --name set_updated_at_trigger. Drizzle Kit gave you the numbered, sequenced shell, and everything inside is hand-written.

CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS trigger AS $$
BEGIN
NEW.updated_at := now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
--> statement-breakpoint
CREATE TRIGGER invoices_set_updated_at
BEFORE UPDATE ON "invoices"
FOR EACH ROW
EXECUTE FUNCTION set_updated_at();

This is what the schema can’t express: “before every update, run this function.” It exists only here, in the migration, never in db/schema.ts.

CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS trigger AS $$
BEGIN
NEW.updated_at := now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
--> statement-breakpoint
CREATE TRIGGER invoices_set_updated_at
BEFORE UPDATE ON "invoices"
FOR EACH ROW
EXECUTE FUNCTION set_updated_at();

When you later change an unrelated column and run generate, it writes a new file for that change and leaves this one untouched. Immutability plus the source-of-truth split is why hand-edits are safe.

1 / 1

Three column changes that break a live deploy

Section titled “Three column changes that break a live deploy”

This is the judgment-heavy core: questions 4 and 5, plus the rename trap from questions 1 and 2. Three patterns share one root cause, so start with the cause.

The migration runs at a single instant, but the deploy is a window: two versions of your app coexist while traffic shifts from old to new, and the rows already in the table already have values. The generator sees neither the window nor the existing rows, and every problem below comes from one of those two blind spots.

Each pattern follows the same beat: the naive generated SQL, why it breaks, the fix, and the migration count.

Add NOT NULL to a new column and Drizzle Kit emits ALTER TABLE "invoices" ADD COLUMN "client_ref" text NOT NULL. It fails immediately. The instant that column exists, every row already in invoices violates NOT NULL, because none of them has a value for a column that didn’t exist a second ago.

Two fixes:

  • Add the column nullable, backfill the existing rows, then flip it to NOT NULL in a later migration once every row has a value.
  • Or ship it with a DEFAULT the existing rows can take, so they’re all valid the moment the column lands.

The generator emits the naive, failing form. Spotting it is question 5, and the nullable route costs you exactly one extra migration. This is the lowest-stakes of the three, which is why it goes first: it’s where the pattern is easiest to see.

The auto-generated ALTER TABLE "invoices" RENAME COLUMN "customer_name" TO "client_name" is correct SQL. The trap isn’t the SQL; it’s the deploy window. This is the canonical “I broke prod” mistake, so it earns a careful look.

Walk the timeline. The old version of your app is serving traffic, querying customer_name. The migration runs in one instant and renames the column to client_name. The new version of your app, which queries client_name, goes live a moment later. In the gap between the rename and the new version taking over, and for every old instance still draining its last requests, the old code is querying a column that no longer exists. Postgres throws, and the request 500s. A rename-in-place guarantees a broken window. It might last ninety seconds, but ninety seconds of every request failing is an outage.

Scrub through the cutover in the sequence below, and watch the moment the old version and the database disagree:

Before the migration

Old app version queries customer_name
Postgres invoices
id customer_name amount
200 every request OK
Steady state. The running app queries customer_name, and the invoices table has that column. Everything works.

The migration runs — rename in place

Old app version queries customer_name
Postgres invoices
id customer_name renamed away client_name amount
500 every request 500s
The column is renamed to client_name in one instant — but the old app version is still serving traffic and still asks for customer_name, which is now gone. Postgres throws; every request 500s. This is the broken window.

New version takes over

New app version queries client_name
Postgres invoices
id client_name amount
200 requests OK again
The new app version deploys and queries client_name, which the DB now has. Green again — but the gap between steps 2 and 3 was a live outage, however brief.

The fix: expand-contract

Old app version queries customer_name
New app version queries client_name
Postgres invoices
id customer_name client_name new amount
200 no red window opens
Expand-contract instead: client_name is added alongside customer_name, so both app versions find a column they can read while they coexist. The old column is dropped only later, once nothing reads it. No red window ever opens.

Here is the safe shape. Add the new column, dual-write from your app so both columns stay populated, backfill the existing rows, switch reads over to the new column, and drop the old column in a later deploy once nothing reads it. That’s three migrations and at least two deploys.

The full choreography of those dual-writes, when each one flips and how the app code coordinates the cutover , is a topic of its own that you’ll do hands-on later in the course. What you need now is smaller and more important: rename-in-place is unsafe in a live system, and the multi-step shape is the fix.

A naive, incompatible type change, say text to integer, makes Postgres rewrite the entire table. To do that it takes an ACCESS EXCLUSIVE lock and holds it for the whole rewrite. Every read and every write blocks until it’s done. On a large table, that’s not a slow query; it’s a full outage for as long as the rewrite runs.

One nuance, so you don’t overcorrect into fearing every type change: some changes are cheap. Widening certain types is metadata-only, where Postgres updates the column definition without touching a single row. So the rule isn’t “type changes are dangerous”; it’s to check whether this particular change rewrites the table. The lock and EXPLAIN tooling from the chapter on query plans is exactly how you reason about which bucket a change falls in.

When a change does rewrite, the fix is the same multi-step shape as the rename: add a new column with the new type, backfill it in batches, switch reads, drop the old column. Same three-deploy pattern.

All three share one shape, and it’s worth committing to memory: expand-backfill-contract (often just expand-contract).

  • Expand: add the new shape alongside the old one. Nothing breaks, because everything that worked before still finds what it’s looking for.
  • Backfill: populate the new shape for the rows that already exist, in batches when the table is large so you don’t lock it.
  • Contract: drop the old shape, but only once nothing reads it anymore.

Any destructive change takes three deploys at minimum. The reason it works is the one from the start of this section: it keeps both app versions able to read a column they understand for the entire window where they coexist. The three phases, as one picture:

1

Expand

App reads / writes customer_name
Postgres invoices
customer_name client_name new · empty

Add the new column. Nothing breaks yet.

2

Backfill

App dual-writes customer_nameclient_name
Postgres invoices
customer_name client_name now filled

Copy existing rows into the new column, in batches.

3

Contract

App reads / writes client_name
Postgres invoices
customer_name dropped client_name

Drop the old column once nothing reads it.

Expand-backfill-contract — three deploys, no broken window.

This lesson hands you the vocabulary and the recognition. When the checklist flags a drop, a rename, or a rewriting type change, the answer is “that’s an expand-contract, not a one-shot.” The full app-code choreography, the migration-by-migration walk with the dual-writes wired in, is a hands-on topic later in the course.

Before moving on, sort a handful of operations into the two responses you now have.

Each chip is a line Drizzle Kit generated, plus the one fact about the table it can't see. Sort by your two questions: could it lose data, or could it freeze the table? Drag each item into the bucket it belongs to, then press Check.

Ships as generated Correct SQL, safe under live traffic — let it through
Needs a hand-edit / multi-step Correct SQL, but locks or strands the deploy — rework it
ADD COLUMN "note" text  ·  nullable, no default
ADD COLUMN "archived_at" timestamptz  ·  nullable timestamp
CREATE INDEX "idx_audit_logs_created_at" …  ·  on a brand-new, empty table
CREATE INDEX "idx_invoices_status" …  ·  on a 2-million-row, write-heavy table
ADD COLUMN "client_ref" text NOT NULL  ·  no default, table already has rows
RENAME COLUMN "customer_name" TO "client_name"  ·  on a live, deployed table
ALTER COLUMN "amount" SET DATA TYPE integer  ·  text to integer, rewrites the table

A migration will fail on you eventually. The junior instinct is to roll it back. The 2026 reflex is the opposite, and it’s worth correcting head-on.

Start with a fact: Drizzle Kit emits up migrations only. There is no down file and no built-in undo, and that’s deliberate. The default is forward-only: you never roll back in production, and the next migration corrects the previous one. That correction is a forward-fix. A rollback would itself be a destructive operation, running against data that has changed since the migration ran, which makes the rollback riskier than just going forward. Reversing is the thing you don’t do.

There’s exactly one carve-out, and it’s local-only. drizzle-kit drop removes the most recent unapplied migration in dev, before you re-generate; it’s handy when you generated a file you didn’t mean to and haven’t run it anywhere. The production rule has no exceptions: never reverse, always forward-fix. And the footgun from the last lesson still stands: editing a migration that’s already been applied anywhere, even just on your laptop, drifts the snapshot out of sync with the database, and the next generate produces nonsense. If you need to undo something an applied migration did, you revert the schema edit and generate a new, corrective forward migration.

So what actually happens when a migrate run dies mid-flight? Rather than memorizing a procedure, hold the model.

  • Most failures happen inside the transaction. Each migration file runs in one transaction, so when a statement fails, Postgres rolls the whole file back, __drizzle_migrations gets no row for it, and the database is left exactly as it was. You fix the SQL or the bad data and re-run migrate. Nothing is left half-done.
  • The exception is anything that ran outside a transaction, which is precisely your CREATE INDEX CONCURRENTLY. If it builds halfway and then fails, Postgres can’t roll it back, because it was never in a transaction to begin with. It leaves an invalid index sitting on the table. The same thing that made the concurrent index safe, being outside a transaction, is what complicates its failure.

The habit that makes all of this rare: rehearse every migration’s failure mode against staging before it touches production. You should never first discover that a migration locks a table when the table it’s locking is the production one.

Shipping the migration: CI, deploy, and verifying it applied

Section titled “Shipping the migration: CI, deploy, and verifying it applied”

A reviewed file isn’t a shipped change. Here’s where the migration actually executes in a real pipeline, kept concrete but not exhaustive, because full deployment pipelines come later in the course.

There are two seams where migrate runs:

  • CI runs it against staging. On every merge to main, your CI applies pending migrations to the long-lived staging Neon branch, so staging gets the change first. If a migration is going to misbehave, it misbehaves on staging, where the blast radius is your test data, not your customers.
  • The production deploy pipeline runs it as a pre-deploy step, before the new app version swaps in. On Vercel that’s a build-and-deploy hook. The ordering is the whole point: the schema migrates, then the new app version that depends on it goes live. This is also why expand-contract has to exist: for a destructive change, even “migrate then deploy” isn’t enough, because the old version is still up and serving while the migration runs.

Two conventions carry over from the last lesson. Wrap the production run in a db:migrate:prod script in package.json, so the command is grep-able in your pipeline config instead of a bare CLI invocation buried in a YAML file. And point it at the unpooled DATABASE_URL_UNPOOLED, wired into the deploy environment separately from the pooled URL the app uses, because long DDL transactions die under transaction-mode pooling, so migrations always take the unpooled connection.

Then there’s the part juniors skip and experienced engineers never do: verify it applied. After a deploy, query the production branch:

select * from __drizzle_migrations order by created_at desc limit 5;

__drizzle_migrations tracks every applied migration by the content hash of its file, so the latest row’s hash should match the hash of the latest migration file you committed and reviewed. If they don’t match, the deploy applied something other than what you reviewed: a different version of the file, a migration from a branch you forgot about, anything. Stop and investigate before you serve traffic. Trust the pipeline, but verify the hash.

Putting it together: review this migration PR

Section titled “Putting it together: review this migration PR”

Now run the checklist for real. This is the skill the whole lesson was building toward, and the only way to know it stuck is to do it: read a migration diff and catch what’s unsafe before it ships.

This PR adds a status column and an index, renames customer_name, and re-adds archived_at. The schema diff got a thumbs-up, since it’s four small lines. Now read the generated SQL the way you’ve practiced. One fact you have that the generator doesn’t: invoices is live in production, with two million rows and constant write traffic. Click each line you’d block in review and say why. At least one line is genuinely safe, so don’t flag it just to be safe.

You're reviewing this generated migration for a teammate. `invoices` is a live, two-million-row table under constant write traffic. Click any line to leave a review comment, then press Submit review.

drizzle/0007_add_invoices_status.sql
ALTER TABLE "invoices" ADD COLUMN "archived_at" timestamptz;
--> statement-breakpoint
ALTER TABLE "invoices" ADD COLUMN "status" text NOT NULL;
--> statement-breakpoint
ALTER TABLE "invoices" RENAME COLUMN "customer_name" TO "client_name";
--> statement-breakpoint
CREATE INDEX "idx_invoices_status" ON "invoices" ("status");

Once the checklist runs automatically, every migration PR gets a second read, once for data loss and once for locks. The one idea to carry out of this lesson is the one it opened with: the generated SQL is correct, but correct is not the same as safe under load. You’re the part of the pipeline that knows the difference.

The authoritative sources behind this lesson, worth a read when you hit any of these cases for real.