Chapter 37Lesson 5

Primary keys: UUIDv7 and identity bigint

Choosing the right primary key strategy for each Drizzle table, time-sortable UUIDv7, identity bigint, or natural key.

Since your very first table, every row has carried id: ...primaryKey() with the value left blank, a placeholder this chapter promised to come back and finish. This is where you finish it, and the choice you make here is the most permanent one in the whole schema. A primary key is referenced by every foreign key that points at the row, it is baked into every URL and every API response that names the row, and it becomes effectively un-renameable the moment real data exists. Get a column type wrong and you change it next week; get a primary key wrong and you live with it.

This lesson is about a decision, not about syntax. The mechanics are three small shapes you already half-know. The harder part is the question an experienced engineer asks before writing any of them: which primary-key strategy does this table want, and what does choosing wrong cost later? By the end you will answer that for each table and write the Drizzle shape it calls for. Here is one concrete fork to hold onto: would you rather give customers a URL that reads /invoices/47, or one that reads /invoices/0193b5a1-…? Notice which one bothers you, and why. The whole lesson turns on that choice.

What a primary key buys you

You have written .primaryKey() since the start of the chapter without looking closely at it. Before we choose what fills it, pin down what it actually does, because it gives you more than it appears to.

A primary key is the column whose value uniquely names a row in the table. It is the address other tables point at: every foreign key in your schema resolves to exactly one primary-key value, which is one row. That is its job, and .primaryKey() does that job by bundling three constraints into one call:

id: uuid().primaryKey(),

That single method gives you three things for free. The column is implicitly NOT NULL, because a row with no name has no identity, so a null value is impossible. It is implicitly UNIQUE, because two rows sharing one address would be a contradiction. And Postgres builds an index on it, because the address has to be fast to look up: every join uses it. That index is a B-tree , and the fact that it is sorted will matter a great deal two sections from now.

One practical rule falls out of this, and it is worth stating outright because it shows up in code review constantly:

So the column’s guarantees are settled before we start. Everything from here is about choosing what value fills a column whose contract is already fixed.

Surrogate or natural: the first fork

An experienced engineer asks two questions to settle a primary key, and asks them in a fixed order. Get the order right and the answer usually falls out on its own. The first question is the universal one, the one that applies to every table before you think about anything else:

Does the database mint this value, or does the outside world already own it?

That splits every key into two kinds. A surrogate key is a value the database makes up. It carries no meaning: it is not the email, not the name, not the slug, just an opaque identifier whose only job is to be unique. A natural key is the opposite, a value that already identifies the thing out in the world: an email address, a URL slug, a country code, a currency code, an ISBN.

The natural key is tempting because it feels like less work. You already have an email, so why mint a second identifier? The reason is the rule to keep:

Walk through what that means. Suppose email is the primary key of your users table, and a hundred other rows across the schema (invoices, sessions, audit entries) point at users by their email. A user updates their email, a one-line edit they expect to be routine. But that email is the address those hundred rows hold. To keep the data consistent, every one of those rows has to be rewritten to the new value, in lockstep, or your foreign keys point at nothing. A routine profile edit has become a schema-wide cascade. The same problem waits on a slug that gets renamed, or a “permanent” external code that gets reissued. Anything the domain owns can change, and a changing primary key causes trouble every time.

So the default is blunt: use a surrogate everywhere a value is user-facing. The natural-key exception is real but narrow, reserved for genuinely immutable, externally defined identifiers. ISO country codes ('US'), currency codes ('USD'), and ISBNs qualify, because they are fixed by an authority outside your app and will never be reissued under you. Even then, use a natural key only when you are certain you will never want to rename the value or wrap it in a display layer.

One heuristic settles the call every time. Before you make any value a primary key, ask:

Would I be comfortable if this value changed, and every row pointing at it had to change too?

If the answer is anything short of a confident yes, use a surrogate. A column type is a constraint you can tighten later, but a primary key is a promise you can’t take back.

slug: text().primaryKey(),                     // bad — a slug gets renamed, and the rename cascades
id: uuid().primaryKey(),                        // good — surrogate id, with a unique on slug instead

One mistake is worth naming now, so you don’t reach for it later in this very lesson. In a few minutes you will meet a way to make a primary key span two columns. It is tempting to read that and decide an entity like a page should be keyed by (orgId, slug): a page is unique within its org, so why not make that pair the key? Resist it. A composite key on a first-class entity is a tenancy-modeling mistake. Give the entity a surrogate id and add a unique(orgId, slug) constraint instead, the pattern the lesson on unique and check constraints will cover. Composite primary keys have exactly one good home, and we will get there.

Why UUIDv4 is a trap and v7 isn’t

Once you have settled that user-facing entities want a surrogate, one question remains: which surrogate? The two candidates are a UUID and a bigint integer. Most of this lesson’s intuition lives on the UUID side, specifically in why the obvious UUID is a trap and a newer one isn’t.

This is the one place you should not take a rule on faith. Build the picture instead, because the difference between the two UUIDs stays invisible right up until it causes a serious problem, and the only defense is understanding why.

A UUID is a 16-byte value, written as those familiar 32 hex digits. There are two versions you care about, and at a glance they look identical: same length, same hex, same - grouping. The difference is not in how they look. It is in where a new one lands in the index. Remember that the primary-key index is a sorted B-tree.

UUIDv4 is fully random. Every byte is noise, so every new id you generate is a random number, and a random number lands at a random spot in a sorted tree. Insert a thousand rows and you have scattered a thousand writes across the whole breadth of the index, each one potentially splitting a page to wedge the new value into its sorted position. On a small table you will never notice: with a few hundred rows the whole index fits in memory, so there is no harm done. The problem is that this cost is write amplification that compounds with table size. The bigger the table, the more a random insert thrashes. It is free in development and brutal in a populated production table, which means it is expensive at the exact moment you can least afford to change it.

UUIDv7 fixes this by making the value time-sortable. It prefixes the 16 bytes with a 48-bit millisecond timestamp, then fills the rest with randomness. That randomness still gives you the global uniqueness and the non-guessability of a UUID. But the timestamp prefix means a value minted now sorts after every value minted before it. New ids no longer scatter: they all land at the tail of the index, appended in order, exactly the way an auto-incrementing integer would. You keep everything good about a UUID and recover sequence-like insert performance. The figure below makes the difference concrete, and the whole point is in the example values printed beneath each strip, so read those closely.

UUIDv4

random 16 bytes

f47ac10b -58cc-4e7a-… 9c3e6d22 -58cc-4e7a-… 2b6f04a1 -58cc-4e7a-… Leading bytes are random → every insert lands at a random spot → writes scatter across the whole index.

UUIDv7

timestamp (ms) ~6 bytes

random ~10 bytes

0193b5a1 -58cc-7e7a-… 0193b5a2 -58cc-7e7a-… 0193b5a4 -58cc-7e7a-… Shared timestamp prefix → new values sort after old ones → inserts append at the tail.

Same 16 bytes, same hex. The only difference is where the timestamp lives, and so where new inserts land.

Two facts ground this as real and current rather than a clever hack. UUIDv7 was standardized by RFC 9562 in May 2024. And Postgres 18, which you put your project on in the previous chapter via Neon, ships a native uuidv7() function: there is no extension to install, it is just there. That native function is what makes the shape in the next section so clean.

One last name to recognize and then move past. Before the standard landed, libraries like ULID and KSUID solved this exact problem of time-sortable, globally unique IDs, and you will still see them in older codebases. They are the pre-standard ancestors of UUIDv7, so you do not need to learn them. In 2026, on Postgres 18, uuidv7() is the answer.

The three canonical shapes

Now for the mechanics, which come down to exactly three shapes this course ships. Each tab below is one strategy, paired with the trigger that should make you reach for it. Read them as “when do I write this,” not “how do I write this,” because the how is one line in every case.

id: uuid().primaryKey().default(sql`uuidv7()`),

The default for any entity whose id is seen outside the database. If the id appears in a URL, an API response, or a client payload, use this. The default fires SQL-side through default(sql\uuidv7()`)rather than in app code, so the id is minted by Postgres itself. That means it also fires for migrations, seed scripts, and a rawpsqlinsert, exactly as the previous lesson argued SQL-side defaults should.sqlis the tagged template you met in the generated-columns work, imported fromdrizzle-orm`.

id: bigint({ mode: 'number' }).primaryKey().generatedAlwaysAsIdentity(),

The choice for high-volume internal tables. An auto-incrementing 8-byte integer, minted by Postgres from a sequence. Reach for it only when all of these hold: the table is high-volume and internal (an event log, analytics rows, a junction nobody fetches by id), the id never crosses a system boundary, and sharding is not on the roadmap. A bigint beats a UUID here for two reasons. First, 8 bytes instead of 16 gives a tighter index. Second, no outsider ever enumerates these rows, so the leak and locality arguments for UUIDs do not apply. mode: 'number' is required: it maps the column to a JS number, which is safe up to 2⁵³.

code: text().primaryKey(), // ISO-3166, e.g. 'US' — immutable, externally defined

The narrow exception. A real domain value carrying .primaryKey() directly, used only for immutable external identifiers. Notice the kind of table this belongs to: a small, static reference table (countries, currencies) whose rows are owned by some outside authority and will never be renamed. It is never a user-facing entity. If you find yourself reaching for this on anything a user creates or edits, use a UUID instead.

Two of those three are the same decision viewed from different sides. The choice between UUIDv7 and identity bigint is the second question: is the id exposed to the world, or internal only? The natural key is the escape hatch from the first question, the rare case where the outside world already owns an immutable value worth keying on.

Two footnotes, so the shapes don’t trip you later. The first is about the UUID. If your team is not on Postgres 18 and has no native uuidv7() to call, the portable fallback is to mint the value in app code: id: uuid().primaryKey().$defaultFn(() => uuidv7()), with uuidv7 from the npm package of the same name. The logical result is the same, but it runs app-side only, so it will not fire for a raw psql insert or a seed that bypasses Drizzle. That is exactly why the native SQL-side version is this course’s default. The second is about the bigint. generatedAlwaysAsIdentity() is the modern SQL-standard identity column . The legacy bigserial still works, but generatedAlwaysAsIdentity is the shape to write in new code.

The decision tree

You have the three shapes in hand. Now comes the part that actually transfers: the order an experienced engineer asks the questions in. The shapes are easy to look up, but the skill is knowing which question to ask first and what each answer rules out. Walk the tree below one click at a time. Commit to an answer at each step before you read the next, so you feel the questions narrow.

Which primary key does this table want?

That last leaf is deliberate, and it carries the main point of the whole tree. UUIDv7 is the default you land on unless you have a specific, justified reason to step off it. bigint identity is an opt-in for a case you can defend, not a coin flip you make per table. When you genuinely can’t decide, the tree decides for you, and it decides UUIDv7.

Composite primary keys, for junction tables only

One mechanic is left, the one you were told to resist reaching for earlier. A primary key can span more than one column. You will need it later in this chapter, for the table that links invoices to tags, so meet the syntax now, along with the boundary around it.

You declare a composite key in the third argument of pgTable, the callback that returns table-level constraints:

export const invoiceTags = pgTable('invoice_tags', {
  invoiceId: uuid().notNull(),
  tagId: uuid().notNull(),
}, (t) => [primaryKey({ columns: [t.invoiceId, t.tagId] })]);

Here primaryKey is the standalone import from drizzle-orm/pg-core, the table-level helper, distinct from the .primaryKey() method you chain on a single column.

primaryKey({ columns: [...] }) declares a key across the pair: the combination of invoiceId and tagId is what is unique and indexed, with the same three free guarantees as a single-column key. A given invoice can have many tags and a given tag can sit on many invoices, but each specific pairing appears at most once. That is exactly the identity a join row has.

Now the boundary, which is a firm rule:

The difference is whether the row has an identity of its own. A junction row exists only to connect two things, so “this invoice, that tag” is the whole of what it is. An entity like an invoice or an organization is a thing in its own right, and such a thing deserves its own opaque, stable id. The full two-foreign-key junction shape, and the moment you should promote a junction into a real entity that carries its own metadata, are covered in the lesson on many-to-many junction tables later in this chapter.

Practice: pick the key for each table

The decision is the heart of the lesson, so the practice drills the decision before it drills the syntax. There are two passes: first sort tables into strategies, then write the shapes.

Start with the quick one. Sort each table’s id column into the primary-key strategy it should use. Lean on the two questions in order: is the id exposed, and if not, is this a high-volume internal table?

Sort each table's id column into the primary-key strategy it should use. Drag each item into the bucket it belongs to, then press Check.

UUIDv7 Exposed, ours to mint

identity bigint Internal, high-volume

natural key Immutable, externally owned

An invoices row shown at /invoices/:id

A users row returned in an API response

An organizations row whose id is in every client payload

An append-only audit_logs row nobody fetches by id

A high-volume analytics_events row, internal-only

An ISO-3166 country code ('US')

An ISO-4217 currency code ('USD')

A US state abbreviation ('CA')

Now write the shapes. The starter has three tables whose id (or code) columns are stubbed. Finalize each with the right shape from the variants above: organizations is user-facing, auditLogs is internal and high-volume, and countries is a reference table the outside world owns. The sandbox passes explicit snake_case column-name strings, because there is no casing client in the editor, so the names are written out. This is the same convention the earlier exercises in this chapter used.

The checker reads the shape of each column: its type, its .primaryKey(), and whether it carries a database-side default. That is enough to catch the two real mistakes here: leaving an id without a generated default, or keying audit_logs on the wrong integer type. It can’t, however, see which default expression you wrote, because uuidv7() and the v4 gen_random_uuid() look identical to it. So treat the prose above as the grader for that part: organizations.id wants uuidv7(), never .defaultRandom(). The mini-Postgres running here predates the native uuidv7() function, so once your shape is right you’ll see a green checklist alongside a “function uuidv7() does not exist” note. That note comes from the sandbox, not your schema, and on Neon’s Postgres 18 it just works. Reveal the answer below to check the exact shapes.

Finalize each id with the primary-key strategy that table wants. organizations is user-facing — its id rides in every URL and API response. auditLogs is internal and high-volume — nobody ever fetches a row by its id. countries.code is an ISO-3166 value the outside world owns and never reissues. Write the canonical Drizzle shape for each; the column names are spelled out in snake_case because there's no casing client in scope.

Reveal the answer

export const organizations = pgTable('organizations', {
  // Exposed, ours to mint → UUIDv7. The SQL-side default fires for migrations,
  // seeds, and psql too — not just inserts that go through Drizzle.
  id: uuid('id').primaryKey().default(sql`uuidv7()`),
  name: text('name').notNull(),
});

export const auditLogs = pgTable('audit_logs', {
  // Internal, high-volume, never enumerated by an outsider → identity bigint.
  // 8 bytes, a tighter index, and the id never crosses a system boundary.
  id: bigint('id', { mode: 'number' }).primaryKey().generatedAlwaysAsIdentity(),
  action: text('action').notNull(),
});

export const countries = pgTable('countries', {
  // Immutable, externally owned → the natural key is correct as-is. Don't
  // "fix" it into a surrogate; the ISO code already names the row, forever.
  code: text('code').primaryKey(),
  name: text('name').notNull(),
});

organizations takes the day-one default, a UUIDv7 surrogate minted SQL-side. auditLogs earns the identity bigint because all three conditions hold: it is internal, it is high-volume, and no id ever leaves the database. countries was already right, since a small reference table the world owns is the one place a natural key belongs.

Where to read more

The two-version UUID story is recent enough that a current, primary source is worth a look. The links below cover the native Postgres 18 support and the locality argument in more depth than this lesson needs to.

PostgreSQL 18 UUIDv7 support

neon.com

Neon's explainer on the native uuidv7() function, with the v4-scatter-vs-v7-locality argument in depth.

PostgreSQL 18 release announcement

postgresql.org

The primary source: the official release note introducing native uuidv7() generation.

UUIDv7 generator & decoder

authgear.com

Mint a UUIDv7 and watch the tool pull the timestamp back out — the byte layout figure, made interactive.

RFC 9562 — UUID specification

datatracker.ietf.org

The 2024 IETF standard itself; Section 5.7 is the formal UUIDv7 layout this lesson summarized.

You have now closed the loop the chapter opened in its second lesson: every id column has a shape, and you can defend each one. From here the schema starts connecting tables to each other. The next lesson is foreign keys, where you write the .references() that points one row’s column at another row’s primary key, and decide what happens to the children when the parent is deleted.