Chapter 88Lesson 1

Transaction rollback against real Postgres

Integration tests that run real Drizzle queries against a real Postgres, each wrapped in a transaction that always rolls back so tests stay isolated and fast.

In the previous chapter you tested /lib with everything mocked, because /lib is pure: same input, same output, no database, no network. That was the right call there. The instinct it builds, mock the collaborator and assert the call, is the wrong call here, and carrying it across the line ships a whole class of bugs that pass green.

This chapter tests the seams: the Server Action that parses input and writes a row, the route handler that verifies a webhook and dedupes it, the query that leans on a column you forgot was NOT NULL. Most production regressions live at the seams, and they live there precisely because a mock at the boundary can’t see them. So we stop mocking the database and run the real query against a real Postgres, wrapped in a transaction that always rolls back, so the second test never sees the first test’s rows.

That is the whole lesson: real database, production schema, every test in a transaction that rolls back. Everything else is the discipline that keeps it affordable. We’ll start with why the mock fails.

Why a mocked database proves nothing at the seam

Here’s a Server Action you’ll recognize. It validates input and inserts an invoice.

export const createInvoice = async (input: CreateInvoiceInput) => {
  const [invoice] = await db.insert(invoices).values(input).returning();
  return ok(invoice);
};

The chapter-087 reflex is to mock the db client and assert the call shape.

Mocked db (proves nothing)
Real db (proves behavior)

vi.mock('@/db/client');

it('inserts an invoice', async () => {
  const input = buildInvoiceInput();
  await createInvoice(input);

  expect(vi.mocked(db).insert).toHaveBeenCalledWith(invoices);
});

This passes against a fictional database. It asserts the exact arguments the test itself just supplied, so it can never fail for a real reason. It stays green forever.

it('inserts an invoice', withRollback(async ({ tx }) => {
  const result = await createInvoice(buildInvoiceInput(), { db: tx });

  const [saved] = await tx
    .select()
    .from(invoices)
    .where(eq(invoices.id, result.data.id));
  expect(saved.status).toBe('draft');
}));

This passes only if Postgres actually accepts the row. The action runs the real SQL against a real test database, and the assertion reads the row back out. The schema, the constraints, and the defaults are all under test because they actually executed. You haven’t met withRollback or tx yet; we build both in a moment. For now read tx as “the database, but every write vanishes after this test.”

The mocked test is a tautology : you told the mock what to expect, then checked that it expected it. A green result here carries no information.

Consider what the mock hides. None of it exists in a fake db:

Column nullability: the insert silently omits a NOT NULL column.
Unique constraints: the (org_id, slug) unique that the data layer carries for tenant safety.
Default and generated values: the UUIDv7 id and the created_at timestamp Postgres fills in.
The actual SQL Drizzle emits: the query that either is or isn’t valid against this schema.
onDelete cascade behavior: what happens to invoice lines when the invoice goes.

Each of those is a real regression class. Walk through one. Say a migration adds a NOT NULL currency column, and createInvoice doesn’t set it. Under the mock, the insert is just a function call with whatever arguments, so it stays green. Against a real Postgres, the insert is rejected with error code 23502 , and the test goes red the instant the schema and the code disagree. That is the whole difference: the mock tests the shape of a call you wrote, while the real database tests whether your code and your schema still agree.

The fix is what you’d expect: run the test against a real test Postgres with the production schema. The SQL runs, the constraints fire, and the defaults fill in, all under test because they all actually execute.

That costs something, and it’s worth naming now so it isn’t a surprise later. A mocked unit test takes microseconds, since it’s no I/O, just function calls. A real insert is a real connection and a real round-trip, so it takes milliseconds rather than microseconds. The rest of this lesson is the discipline that keeps that price down to tens of milliseconds instead of hundreds.

Where the unit/integration line actually falls

You now have two layers and need a rule for which one a test belongs to. The split comes down to what the test would stop catching if you mocked the collaborator.

Unit tests own /lib’s pure logic: same input, same output, no I/O, no mocks needed. Integration tests own anything that crosses a process boundary the runtime cares about: Drizzle reaching Postgres, a Server Action running through its full wrapper, a route handler including request and response serialization, a webhook receiver including signature verification.

Here’s the litmus test, and it’s worth committing to memory:

The boundary can be subtle inside a single feature. A Zod schema is a unit test, because it’s pure validation with no database in sight. But the Server Action that calls that schema and then writes the parsed result is an integration test, because the write is the part that breaks when the schema and the table drift apart. Same feature, two layers, the line drawn at the I/O.

Apply the rule yourself before moving on.

Apply the litmus test: would mocking the collaborator stop the test from catching a column-rename, schema-drift, or constraint violation? If yes, it's an integration test. Drag each item into the bucket it belongs to, then press Check.

Unit test /lib, pure, mock-free

Integration test Real DB or full wrapper

formatInvoiceTotal — sums line items into a Money value

mapDatabaseError — turns a Postgres error code into an ErrorCode

the createInvoiceSchema Zod validator

createInvoice — parses input, then inserts a row

listInvoices — queries invoices scoped to an orgId

the Stripe webhook POST handler

How one test stays isolated: open, run, roll back

This is the conceptual core, and the place first-timers stumble, so slow down here.

Start with the problem. You have hundreds of integration tests, all hitting one database, each one writing rows. With no isolation, test B reads the invoice test A inserted and breaks; flip the order and it passes again. The suite becomes order-dependent, which is the worst kind of broken, because it depends on which file the runner happened to schedule first. Every test needs to start from the same clean state and leave nothing behind.

The obvious move is to clean up after each test: TRUNCATE every table and re-insert the baseline rows. It works, and it’s simple. It’s also slow, since a TRUNCATE plus a reseed is a full disk round-trip, roughly 500 ms per test. Multiply by a few hundred tests and your suite measures in minutes. Keep this option in mind as the baseline the better one has to beat.

The better approach inverts it. Instead of writing rows and cleaning them up, wrap the entire test in a transaction and never commit it:

await db.transaction(async (tx) => {
  // run the whole test body against tx
  // ...and force a rollback at the end
});

Run the test body against tx, and at the end force a rollback. Postgres discards every write the transaction made, in sub-millisecond time, with no truncate and no reseed. The baseline seed (one org, one admin user, wired up in the next lesson) lives outside the transaction, so every test sees it and no rollback ever touches it.

Now the non-obvious part: how you force the rollback. You need one fact about Drizzle transactions here, and the rest of the lesson rests on it: a transaction commits when its callback resolves, and rolls back when its callback throws. Throwing is the rollback primitive. So after the body runs, the wrapper deliberately throws a private sentinel error . The throw forces the rollback, and the wrapper then catches only that sentinel and swallows it, so the test still reports green. Any real error from the body isn’t the sentinel, so it propagates normally and fails the test as it should.

Watch the row appear and vanish:

BEGIN open tx

RUN BODY insert + assert

THROW RollbackSignal

ROLLBACK discard writes

SWALLOW catch sentinel

DB rows visible inside tx

baseline seed 1 invoice — uncommitted

BEGIN: the transaction opens and tx is live. Nothing has been written yet.

BEGIN open tx

RUN BODY insert + assert

THROW RollbackSignal

ROLLBACK discard writes

SWALLOW catch sentinel

DB rows visible inside tx

baseline seed 1 invoice — uncommitted

Run the test body against tx. The action inserts an invoice, and the assertions read it back through tx and pass. The row is visible, but only inside this transaction.

BEGIN open tx

RUN BODY insert + assert

THROW RollbackSignal

ROLLBACK discard writes

SWALLOW catch sentinel

DB rows visible inside tx

baseline seed 1 invoice — uncommitted

The wrapper throws its private RollbackSignal after the body finishes. The throw is the rollback primitive.

BEGIN open tx

RUN BODY insert + assert

THROW RollbackSignal

ROLLBACK discard writes

SWALLOW catch sentinel

DB rows visible inside tx

baseline seed 1 invoice — discarded

Postgres discards every write the transaction made. The invoice is gone, in sub-millisecond time, with no truncate and no reseed.

BEGIN open tx

RUN BODY insert + assert

THROW RollbackSignal

ROLLBACK discard writes

SWALLOW catch sentinel

DB rows visible inside tx

baseline seed 1 invoice — discarded

The wrapper catches this sentinel and stops it there, so the test reports green. Any other error would have propagated and failed the test.

The vanishing row is the point. The invoice was real enough to insert and to read back, and then it never existed. That is what “isolation comes from the rollback” means: not cleanup code that runs afterward, but a write that was never allowed to land.

Cost is why rollback wins over truncate-and-reseed. The gap is two orders of magnitude:

/lib unit test ~5 ms

Integration, rollback per test ~20–80 ms

Truncate and reseed ~500 ms

Per-test cost on a log scale. Rollback is two orders of magnitude cheaper than truncate-and-reseed, which is the whole reason to prefer it.

That middle bar, tens of milliseconds, is the price of catching everything the mock couldn’t, and it’s the right price at the seam.

The `withRollback` helper

You have the mechanism in your head; here is what it looks like in code. The whole thing lives in src/test/db/with-rollback.ts. It takes a test body that receives { tx } and returns the function Vitest’s it will actually run.

class RollbackSignal extends Error {}

export const withRollback = (body: (ctx: { tx: DbOrTx }) => Promise<void>) => {
  return async () => {
    try {
      await db.transaction(async (tx) => {
        await body({ tx });
        throw new RollbackSignal();
      });
    } catch (err) {
      // rethrow anything that isn't our rollback signal
      if (!(err instanceof RollbackSignal)) throw err;
    }
  };
};

The wrapper takes a body that receives { tx } and returns the zero-argument async function it runs. That returned function is what executes per test.

class RollbackSignal extends Error {}

export const withRollback = (body: (ctx: { tx: DbOrTx }) => Promise<void>) => {
  return async () => {
    try {
      await db.transaction(async (tx) => {
        await body({ tx });
        throw new RollbackSignal();
      });
    } catch (err) {
      // rethrow anything that isn't our rollback signal
      if (!(err instanceof RollbackSignal)) throw err;
    }
  };
};

Open a transaction, run the body against tx, then throw the sentinel. The throw is what forces Drizzle to roll back, so the body’s writes never commit.

class RollbackSignal extends Error {}

export const withRollback = (body: (ctx: { tx: DbOrTx }) => Promise<void>) => {
  return async () => {
    try {
      await db.transaction(async (tx) => {
        await body({ tx });
        throw new RollbackSignal();
      });
    } catch (err) {
      // rethrow anything that isn't our rollback signal
      if (!(err instanceof RollbackSignal)) throw err;
    }
  };
};

This is the load-bearing line. Swallow only the sentinel and rethrow everything else. A catch-all here would hide every real failure and make the suite lie, so the instanceof check is what keeps genuine errors failing the test.

1 / 1

At the call site, it reads cleanly.

it('creates an invoice', withRollback(async ({ tx }) => {
  // ...use tx as the database; every write rolls back
}));

Two details earn their place. First, RollbackSignal is a module-private class, so nothing outside this file can throw it or catch it, and it can never be confused with a real error the body might throw. Second, that catch must rethrow everything that isn’t the sentinel. A swallow-all catch would turn every genuine failure green, which is the one bug a test helper must never have: a test that lies. That single instanceof guard is the difference between a helper that isolates and a helper that hides.

This helper assumes a db whose .transaction opens against the per-worker test connection. Wiring that connection, the migrated and seeded database each worker gets, is the next lesson’s job; here, take it as given.

Threading `tx` through production code

There’s a catch in everything above, and it’s the second core idea of the lesson. The rollback only isolates the test if the production query actually runs on tx. If createInvoice reaches for the global db instead, its write commits for real, outside the transaction, where the rollback can’t reach it. So production code has to accept the database handle as an argument.

Here’s the pattern, and it’s a one-line change to the action you saw at the top of the lesson. Keep the real arguments, then add an options object with a defaulted db:

export const createInvoice = async (
  input: CreateInvoiceInput,
  { db = defaultDb }: { db?: DbOrTx } = {},
) => {
  const [invoice] = await db.insert(invoices).values(input).returning();
  return ok(invoice);
};

That second parameter is the entire change. The default is the singleton, so every production call site passes nothing and is completely unaffected. The test passes { db: tx }, which puts the write inside the transaction. It’s invisible in production and swappable in tests:

const result = await createInvoice(input);

The handle is invisible in production. No call site passes a db, so the default singleton fires every time. Adding the parameter changed nothing about how production calls the function.

it('creates an invoice', withRollback(async ({ tx }) => {
  const result = await createInvoice(input, { db: tx });

  const [saved] = await tx.select().from(invoices).where(eq(invoices.id, result.data.id));
  expect(saved).toBeDefined();
}));

The test passes tx. The same function and the same call, but now the write runs inside the test’s transaction, so it rolls back with everything else. The assertion reads back through the same tx.

await als.run(tx, () => createInvoice(input));

Ambient, zero signature change. AsyncLocalStorage carries tx through the call chain without touching the signature, but it pays for that with a context module, a .run() wrapper, and a getDb() indirection. Reach for it only when the signature is fixed and can’t take a handle, which is exactly the route-handler case in the next section.

That third tab names the alternative, because you’ll meet it and should know why it isn’t the default. AsyncLocalStorage can make tx ambient, so no signature changes at all. The price is added complexity: a context module to set up, a .run() wrapper around every entry point, and a getDb() indirection where you’d otherwise just read db. For a seam that’s already explicit and readable at the call site, that’s weight you don’t need. The course default is the explicit handle. AsyncLocalStorage earns its keep in exactly one situation: when the signature is fixed by the framework and physically can’t take a handle.

The route-handler escape hatch

That one situation is the route handler. A Next.js route handler’s signature is fixed: export async function POST(request: Request). There’s no second parameter to thread tx through, and you don’t control the call, Next.js does. The explicit-handle pattern has nowhere to go.

This is where AsyncLocalStorage earns its place. A module-scope AsyncLocalStorage<DbOrTx> (the DbOrTx type lands in the next section) lets the test wrap the handler call in a context that carries tx, while production reads its database through a small helper that falls back to the singleton when no context is set:

import { AsyncLocalStorage } from 'node:async_hooks';
import { db as defaultDb } from '@/db/client';
import type { DbOrTx } from '@/db/types';

export const testTxContext = new AsyncLocalStorage<DbOrTx>();

export const getDb = (): DbOrTx => testTxContext.getStore() ?? defaultDb;

The handler calls getDb() instead of importing db. In production nothing ever calls .run, so the store is empty and getDb() returns the singleton, with zero behavioral change. Under test, the test enters the handler inside testTxContext.run(tx, () => POST(request)), the store holds tx, and getDb() hands the handler the test’s transaction.

Hold the boundary firmly, because the whole value of AsyncLocalStorage here is that it stays contained.

You’ll put this to work in earnest when you test a webhook receiver later in this chapter. For now it’s enough to see the mechanism and the line around it.

What does not roll back

The rollback is powerful, and the risk is assuming it’s total. It isn’t. One principle keeps you from being surprised: the rollback undoes rows, and nothing else. Anything that isn’t transactional Postgres state escapes it. This is the most damaging silent-failure class in the chapter, so let’s be concrete about the consequences.

Sequences advance and stay advanced. A rolled-back insert still consumed its sequence value, and Postgres doesn’t hand the number back. So never assert on an exact auto-incrementing ID. Assert on shape instead (expect.stringMatching(/^inv_/), expect.any(String)), never id === 42. This is the single most common fragile-test bug in the chapter: an ID that’s correct today is off-by-one the moment another test runs first.
pg_notify and triggers fire before the rollback. A pg_notify message already went out, and a trigger already ran. Don’t assert on the notify directly; assert through whatever observable state it was supposed to produce.
External side effects aren’t transactional at all. A fetch, a queue enqueue, an email send: none of these live inside the Postgres transaction, so the rollback can’t touch them. These get isolated by a different tool (MSW, later in this chapter, or in-memory stubs), and the test asserts on intent, such as “the handler was called with body X”, not on a real side effect.

The unifying idea is that the transaction is a clean, cheap reset for your Postgres rows, and nothing more. For everything else, isolation is a separate problem with a separate tool, which is exactly why this chapter spends three more lessons on the network boundary.

Sort these to make the line tangible:

Sort each effect by whether the test's transaction rollback undoes it. Remember the principle: the rollback undoes rows, and nothing else. Drag each item into the bucket it belongs to, then press Check.

Rolled back automatically Transactional Postgres state

Survives the rollback Needs other isolation

An inserted invoices row

An UPDATE to an existing row

A nextval consumed by a serial column

A pg_notify to a listening channel

A Stripe API call the action made

An email enqueued to the send queue

Two type guards against the silent-commit bug

There’s one failure mode left, and it’s the worst because it’s quiet. Inside withRollback, suppose a production function reaches for the imported singleton db instead of the tx it was handed. That query runs outside the test’s transaction, so its writes commit for real. The rollback undoes nothing of theirs, and the test might still pass, because its assertions go through tx, which can’t even see the committed row. So the row leaks into the database and into the next test, and the test that should have caught it is green. This is the same class of failure as mocking the database: a test that looks like an integration test and isn’t.

You can’t catch this by being careful, because careful is exactly what fails under deadline. You catch it structurally, with two guards that make the bug impossible to ship quietly.

Guard one is the DbOrTx type. Define the transaction type once and union it with the singleton’s type:

import type { PgTransaction } from 'drizzle-orm/pg-core';
import type { NodePgQueryResultHKT } from 'drizzle-orm/node-postgres';
import type { ExtractTablesWithRelations } from 'drizzle-orm';
import { db } from '@/db/client';
import * as schema from '@/db/schema';

type Transaction = PgTransaction<
  NodePgQueryResultHKT,
  typeof schema,
  ExtractTablesWithRelations<typeof schema>
>;

export type DbOrTx = typeof db | Transaction;

Every query helper and every threaded function takes DbOrTx, not the concrete db type. Both the singleton and a live transaction satisfy the union, so production and tests both type-check, which is deliberate. What the union buys you is intent: it says “this function might run inside a transaction” out loud, making “called on tx” a first-class, expected shape instead of a happy coincidence. It’s also the type the lint rule keys off.

One detail worth knowing: the database side uses typeof db rather than re-deriving NodePgDatabase’s generics. It’s cleaner, and it sidesteps a known incompatibility where a bare PgTransaction won’t assign to a NodePgDatabase parameter. (Drizzle moves quickly here. Some recent releases rename PgTransaction to PgAsyncTransaction, so check the import against your pinned version before you write this.)

Guard two is a lint rule. A type union still lets a test import the singleton and call it with no handle, so forbid that import outright. ESLint’s no-restricted-imports blocks @/db/client from any *.int.test.ts file:

{
  "files": ["**/*.int.test.ts"],
  "rules": {
    "no-restricted-imports": ["error", {
      "paths": [{
        "name": "@/db/client",
        "message": "Integration tests must use the tx from withRollback, not the global db."
      }]
    }]
  }
}

Now an integration test physically cannot reach the global db; it has only the tx the wrapper handed it. Using the wrong handle stops being a silent runtime leak and becomes a red squiggle at author time. That’s the lesson’s thesis in tooling form: a test that passes despite not threading tx is the same bug as one that mocks the database, and both must be made loud.

You may have noticed the guard and the project glob both key off the same suffix. That .int.test.ts is the discriminator the whole tooling chain relies on: it’s what the integration Vitest project globs, what the lint rule targets, and what keeps a fast unit test from being accidentally promoted into the slow lane. invoice.test.ts is a unit test; invoice.int.test.ts is an integration test. Those two letters carry real weight.

The model, end to end

Step back and hold the whole shape in view. An integration test runs the real production query against a real test Postgres on the production schema, wrapped in a transaction that always rolls back, so isolation comes from the rollback rather than from cleanup. Production code accepts the database as an explicit handle, defaulted to the singleton, and the test passes tx where production passes db. Two guards, the DbOrTx type and the no-restricted-imports lint rule, make the silent-commit bug loud instead of letting it ship green.

The cost is 20 to 80 ms per test. That’s two orders of magnitude over a /lib unit test, and two orders under truncate-and-reseed. It buys you the column-rename, the constraint violation, and the schema drift a mock can never see, and it’s exactly the right price for the seam layer, where the bugs actually live. That’s the honeycomb argument from the start of this unit, now expressed in code.

One thing this lesson took on faith is that db.transaction has a real, migrated connection to open against. The next lesson wires the database this pattern assumes, one isolated database per worker, the migrations run once, the baseline seed inserted, so every piece you just built has real Postgres underneath it.

External resources

Drizzle — Transactions

orm.drizzle.team

The commit-on-resolve, roll-back-on-throw contract the whole pattern rests on.

Vitest — Test Context

vitest.dev

How a wrapper like withRollback hands per-test values to the test body.

Node.js — Asynchronous context tracking

nodejs.org

Official AsyncLocalStorage reference: run() and getStore(), the escape hatch for framework-fixed handler signatures.

From 4 Minutes to 3 Seconds

dev.to

A real suite trading truncate-and-reseed for rollback — the two-orders-of-magnitude speedup this lesson claims, measured.

Transaction rollback against real Postgres

Why a mocked database proves nothing at the seam

Where the unit/integration line actually falls

How one test stays isolated: open, run, roll back

The withRollback helper

Threading tx through production code

The route-handler escape hatch

What does not roll back

Two type guards against the silent-commit bug

The model, end to end

External resources

The `withRollback` helper

Threading `tx` through production code