Chapter 37Lesson 8

Many-to-many junction tables

Model many-to-many relationships in Drizzle with junction tables, and learn when a link grows into a first-class entity.

You have the invoices table and the tags table from the last lesson, with each tag carrying a slug that is unique across the table. Now wire them together the way the domain actually works: an invoice carries many tags (urgent, paid, q3), and each of those tags applies to many invoices. The column builders you’ve been chaining can’t express that “many on both sides” shape, because a column holds one value and this relationship has two open ends. This leaves you with a question of placement. An invoice can’t store a list of tag ids inside itself, and a tag can’t store a list of invoice ids inside itself, so where does the relationship live? The answer is a third table whose only job is to record the pairings. The harder half of the lesson is knowing when that small linking table should stay a plain link and when it has become a real entity in your domain, one that deserves a name of its own.

A relationship a column can’t hold

Before reaching for the answer, it helps to see exactly why the two obvious attempts fail. Both are things you’d genuinely try, and watching them break is what motivates the shape that works.

The first instinct is to put a tagId column on invoices. You already know that shape, though: it’s the one-to-many you built in the foreign-key lesson, and it reads “this invoice has one tag.” A column is a single slot, so it can hold one tag id, not a list of them. You could add tagId2 and tagId3, but that’s an arbitrary cap pretending to be a schema, and the moment an invoice needs a fourth tag it falls apart. The reverse, an invoiceId column on tags, has the mirror-image problem: it says a tag belongs to one invoice, which is even more wrong. A single foreign-key column expresses one-to-many, and many-to-many is precisely the shape it can’t reach.

The second instinct is sharper, and you saw it coming a few lessons back: store the tags as an array column, text().array(), right on the invoice. This looks like it works: one invoice, a list of tag slugs, done. But it runs into the exact trade-off the data-types lesson flagged when it introduced arrays. The strings in that array aren’t real database rows. They can’t carry a foreign key, so Postgres won’t enforce that each one points at a tag that actually exists. Delete the urgent tag, and every invoice that listed it is now holding a dangling string the database knows nothing about. You also can’t efficiently ask the question you’ll ask constantly, “which invoices have this tag?”, because there’s no row, no key, nothing to index or join against; you’d be scanning every invoice and pattern-matching inside its array. The data-types lesson named the moment arrays stop being enough: when the elements need their own attributes or need to participate in joins. That moment has arrived. An invoice’s tags need referential integrity, and they need to be queried from both directions, so the array has outgrown its job.

Both attempts get the same thing wrong: they try to cram the relationship into one of the two tables, and the relationship belongs to neither of them. It belongs between them, so you give it its own table. Each row in that table records exactly one pairing, this invoice linked to this tag. A row, unlike an array element, is a first-class citizen of the database: it can hold foreign keys, it can be indexed, and it can be joined. This is the junction table , the one correct shape for many-to-many.

The picture is worth studying before the code, because the structure is the whole insight. In the diagram below, the junction table invoice_tags sits in the middle, holding nothing but two foreign keys: one pointing back at invoices, one pointing back at tags. Trace the two arrows, and you’ll see each parent has a plain one-to-many relationship into the junction. That’s the idea to hold onto: a many-to-many relationship isn’t a new, exotic kind of link you have to learn. It’s two of the one-to-many relationships you already know, aimed inward at a table in the middle, and the junction is where they meet.

Many-to-many is two one-to-many relationships pointing inward at a junction table. Each row of invoice_tags is one link between one invoice and one tag, and the pair (invoice_id, tag_id) serves as its composite primary key.

The junction table: two foreign keys and a composite key

Here is the whole table. It is deliberately small: a pure junction is exactly two foreign keys and a primary key built from them, nothing more. Every line is a decision you’ve made before in this chapter, now assembled into the standard shape. Step through it.

import { pgTable, primaryKey, uuid } from 'drizzle-orm/pg-core';
import { invoices } from './invoices';
import { tags } from './tags';

export const invoiceTags = pgTable(
  'invoice_tags',
  {
    invoiceId: uuid()
      .notNull()
      .references(() => invoices.id, { onDelete: 'cascade' }),
    tagId: uuid()
      .notNull()
      .references(() => tags.id, { onDelete: 'cascade' }),
  },
  (t) => [primaryKey({ columns: [t.invoiceId, t.tagId] })],
);

The table is named invoice_tags, both parents’ names joined together. That naming rule is its own small decision, and the next section covers it.

import { pgTable, primaryKey, uuid } from 'drizzle-orm/pg-core';
import { invoices } from './invoices';
import { tags } from './tags';

export const invoiceTags = pgTable(
  'invoice_tags',
  {
    invoiceId: uuid()
      .notNull()
      .references(() => invoices.id, { onDelete: 'cascade' }),
    tagId: uuid()
      .notNull()
      .references(() => tags.id, { onDelete: 'cascade' }),
  },
  (t) => [primaryKey({ columns: [t.invoiceId, t.tagId] })],
);

The first foreign key. It points back at invoices, exactly the .references(() => invoices.id) shape from the foreign-key lesson. It’s .notNull() because a link with a missing endpoint isn’t a link.

import { pgTable, primaryKey, uuid } from 'drizzle-orm/pg-core';
import { invoices } from './invoices';
import { tags } from './tags';

export const invoiceTags = pgTable(
  'invoice_tags',
  {
    invoiceId: uuid()
      .notNull()
      .references(() => invoices.id, { onDelete: 'cascade' }),
    tagId: uuid()
      .notNull()
      .references(() => tags.id, { onDelete: 'cascade' }),
  },
  (t) => [primaryKey({ columns: [t.invoiceId, t.tagId] })],
);

The second foreign key, pointing at tags. A pure junction is exactly these two foreign-key columns and nothing else, which is what makes it “pure.” Both are green because they’re the same kind of decision: an endpoint of the link.

import { pgTable, primaryKey, uuid } from 'drizzle-orm/pg-core';
import { invoices } from './invoices';
import { tags } from './tags';

export const invoiceTags = pgTable(
  'invoice_tags',
  {
    invoiceId: uuid()
      .notNull()
      .references(() => invoices.id, { onDelete: 'cascade' }),
    tagId: uuid()
      .notNull()
      .references(() => tags.id, { onDelete: 'cascade' }),
  },
  (t) => [primaryKey({ columns: [t.invoiceId, t.tagId] })],
);

Reuse the foreign-key lesson’s question: if the parent is gone, is the child garbage? For a junction the answer is yes on both sides. A link to a deleted invoice is garbage, and so is a link to a deleted tag. Delete either endpoint and the link rows go with it. This is the textbook cascade case.

import { pgTable, primaryKey, uuid } from 'drizzle-orm/pg-core';
import { invoices } from './invoices';
import { tags } from './tags';

export const invoiceTags = pgTable(
  'invoice_tags',
  {
    invoiceId: uuid()
      .notNull()
      .references(() => invoices.id, { onDelete: 'cascade' }),
    tagId: uuid()
      .notNull()
      .references(() => tags.id, { onDelete: 'cascade' }),
  },
  (t) => [primaryKey({ columns: [t.invoiceId, t.tagId] })],
);

The composite primary key over (invoiceId, tagId), the mechanic introduced back in the primary-key lesson and now put to use. This one line buys three things at once, listed right after this block.

1 / 1

That last line, the composite primary key , is the keystone, so it’s worth pulling apart. Declaring primaryKey({ columns: [t.invoiceId, t.tagId] }) in the table callback (the same (t) => [...] third argument you used for composite uniques and checks) makes the pair the table’s identity, and that single declaration earns you three separate guarantees:

The pair is unique. Postgres will reject a second row with the same (invoiceId, tagId). You cannot tag the same invoice twice with the same tag, not because your application remembers to check, but because the database refuses the duplicate. That’s the rule the last lesson made: enforce with a constraint, not with discipline.
Both columns are NOT NULL, for free. A primary key can’t contain a null, so the moment these two columns are the key, neither can ever be null. (The explicit .notNull() on each is redundant insurance; the composite primary key already implies it.)
An index on (invoiceId, tagId), for free. Every primary key is backed by an index, so lookups by invoice (“give me the tag links for this invoice”) are fast out of the box.

Three free guarantees from one line is the deal a primary key always gives you; this is just the composite version of it.

Now a decision that’s easy to skip but worth making on purpose: why a composite primary key here, and not a surrogate id like every other table in the schema? Because a pure junction has no identity worth minting. The pair is the identity. “The link between invoice X and tag Y” is already a complete, unique description of the row, and the composite primary key turns that description into the key directly, handing you uniqueness and an index in one stroke. Bolting a surrogate id onto a pure junction is a mild smell: it costs you an extra column and an extra index, and it still leaves you needing a separate unique(invoiceId, tagId) to stop duplicate pairs, so you’ve paid more for the same guarantee. The primary-key lesson said composite primary keys belong on junction tables and almost nowhere else; this is the junction’s side of that rule. There is exactly one reason to add the id anyway, and it’s the entire subject of the next section.

So step back and look at what the database now enforces, with no help from application code:

A duplicate (invoiceId, tagId) pair is rejected.
An invoiceId that points at no real invoice is rejected.
A tagId that points at no real tag is rejected.
Delete an invoice and its links vanish; delete a tag and its links vanish.

Every one of those is a rule that used to be something a code path had to remember. You’ve pushed all four down into the schema, where no stray writer can break them, not a migration, not a seed script, not a late-night psql session.

Naming the junction

The table name was a real decision back in that first step, not a footnote, so it deserves its own discussion. There are two cases, with two conventions.

For a pure junction, the convention is {parent1}_{parent2}, alphabetized: invoice_tags, not tag_invoices. Alphabetical order isn’t meaningful in itself; it’s just a tiebreaker that settles the argument, so every junction in the codebase reads the same way and nobody debates it. Use plural snake_case for the SQL name like every table, and let the exported const mirror it in camelCase: invoiceTags.

For a junction that has become a real entity, the subject of the next section, you drop the mashed-together name and give it the noun the business actually uses: memberships, not users_organizations; subscriptions, not customers_plans; enrollments, not students_courses. The name is a tell. If you’d naturally say the noun out loud in a product conversation, as in “add them to the membership,” it’s probably not a pure link anymore; it’s an entity.

// Pure junction → both parents, alphabetized
pgTable('invoice_tags', /* ... */);

// Entity the business names → the domain noun
pgTable('memberships', /* ... */);

When a junction becomes an entity

The mechanics above are routine once you’ve done the foreign-key and primary-key lessons, but the judgment in this section is the durable skill. It’s common to reach for a junction table and stop there. The move worth learning is recognizing the moment that small link has stopped being a link and become a first-class entity in your domain, because at that moment the shape has to change.

Two questions decide it. Ask them in order.

Does the relationship carry its own data? A role, a joinedAt, a quantity, a status: data that belongs to the link itself, not to either endpoint. A tag on an invoice carries nothing, so the pairing is the whole story. But a person’s place in an organization carries a role, owner, admin, or member, and that role isn’t a property of the person (they could be an admin of one org and a plain member of another), nor is it a property of the org. It’s a property of the membership. Once you have a column like that, the link has data of its own.
Would anything else need to point a foreign key at the relationship? This is the sharper test. Imagine an invitations table, where each invitation eventually becomes a specific membership and so needs to reference that membership row. Or an audit log that records “this membership’s role was changed.” The moment some other table needs a foreign key to the link, the link has to be a row that another row can address, and a pairing can’t play that part cleanly.

Answer “yes” to either and you promote it to an entity. Answer “no” to both and it stays a pure junction. Put another way: a pure junction is a link, and once the link has properties of its own or something needs to reference it, it’s an entity.

It’s worth watching what promotion does to the schema. The two tables below are the same relationship in both states: a pure junction on the left, the entity it becomes under pressure on the right. Read the diff. Four things change, and each one is forced by the two questions above.

Pure junction
Promoted to entity

export const invoiceTags = pgTable(
  'invoice_tags',
  {
    invoiceId: uuid()
      .notNull()
      .references(() => invoices.id, { onDelete: 'cascade' }),
    tagId: uuid()
      .notNull()
      .references(() => tags.id, { onDelete: 'cascade' }),
  },
  (t) => [primaryKey({ columns: [t.invoiceId, t.tagId] })],
);

A pure link. Two foreign keys and a composite primary key. The pair is the row’s identity, and the table holds no data of its own. There’s nothing here for another table to point a foreign key at, so it never needs a surrogate id.

export const memberRole = pgEnum('member_role', ['owner', 'admin', 'member']);

export const memberships = pgTable(
  'memberships',
  {
    id: uuid().primaryKey().default(sql`uuidv7()`),
    userId: uuid()
      .notNull()
      .references(() => users.id, { onDelete: 'cascade' }),
    organizationId: uuid()
      .notNull()
      .references(() => organizations.id, { onDelete: 'cascade' }),
    role: memberRole().notNull().default('member'),
    ...timestamps,
  },
  (t) => [
    primaryKey({ columns: [t.userId, t.organizationId] }),
    unique('memberships_user_org_unique').on(t.userId, t.organizationId),
  ],
);

A first-class entity. The same two foreign keys, but now there’s a surrogate id so other rows can point at a membership, a role the link carries, lifecycle ...timestamps, and the old composite primary key demoted to a named unique so a user still can’t join the same org twice.

Four changes, and none is arbitrary. Each one follows directly from answering “yes” to one of the two questions. Take them in the order that explains the why, not the order they appear in the file.

The surrogate id exists so other rows can point at this membership. This is the deep reason promotion adds an id, and it’s the second test from above made concrete. You can’t cleanly aim a foreign key at a composite key, so a relationship that needs to be referenced, by an invitation or by an audit row, needs a single-column identity to be referenced by. The pair (userId, organizationId) describes the membership perfectly well for a human, but a foreign key wants one column to hold onto. So the membership gets an id, the same uuid().primaryKey().default(sql\uuidv7()`)` shape from the primary-key lesson.

The composite key didn’t disappear; it was demoted. The primary-key slot now belongs to id, but the rule the composite key enforced, no user joins the same org twice, is still true and still matters. So it survives as a named unique('memberships_user_org_unique').on(t.userId, t.organizationId), the composite-unique shape from the last lesson. The uniqueness rule didn’t change at all; only which column owns the row’s identity changed. Same guarantee, different slot.

The role column is the data that made it an entity in the first place, the literal answer to question one. The timestamps are the subtler tell. A pure link doesn’t have a lifecycle; it just exists or it doesn’t. An entity is created at a moment and changes over time, which is exactly what ...timestamps records. Here is the signal to watch for: the moment you find yourself wanting to stamp when a link was made, you’ve already decided it’s an entity. createdAt on a membership is the “joined at,” so the urge to record it is the promotion signal firing.

One last note on this table, then we’ll move on. This memberships table isn’t a throwaway example; it’s the foundation the course builds organizations and multi-tenancy on, much later. Who belongs to which organization, and at what role, is exactly this table, and the machinery that scopes every query to the current org is built on top of it. You’re meeting it here purely as a data-modeling decision, “this link grew a role, so it’s an entity,” with none of the authentication and access-control concerns that come with it later. That’s the right order to meet it in: the shape first, the security on top of the shape afterward.

What the junction can’t do yet, and one index you still owe

Two honest forward-pointers before you go. Each one heads off a real bug, and the first is the hook into the next lesson.

The junction is data only; it doesn’t give you traversal yet. Right now invoice_tags is a set of inert foreign-key columns that Postgres enforces. What it does not give you is the ability to write db.query.invoices.findFirst({ with: { tags: true } }) and get an invoice with its tags hanging off it. Reading “an invoice and its tags” today still means a manual join through the junction, which a query lesson in the next chapter covers. The foreign-key lesson already drew this line, and it holds here: .references() declares a database constraint, not a query path, and the two are different layers. To make Drizzle walk the relationship for you, you declare a second thing entirely, the Drizzle Relations layer, and that’s the whole job of the next lesson. So you’ve done half the work: you’ve modeled the relationship and made the database guarantee its integrity. Next lesson, you teach Drizzle to traverse it.

The composite primary key only half-indexes the junction, and this is a bug you’d genuinely ship. Remember the free index that came with the composite primary key: it’s an index on (invoiceId, tagId), in that order. A composite B-tree index can serve a query that filters on a left prefix of its columns, so WHERE invoiceId = … (“the tags on this invoice”) is fast, because invoiceId is the leading column. But WHERE tagId = … (“the invoices with this tag”) gets no help from that index, because tagId isn’t a prefix; it sits in second position, and Postgres falls back to scanning the whole table. So one direction of your many-to-many is indexed and the other is a silent full scan. The fix is a second index leading with tagId, which the indexing lesson a chapter from now covers, not this one. It’s worth flagging now so the problem is familiar later: when “find all invoices with tag X” crawls in production, you’ll recognize the half-indexed junction instead of guessing.

scan enters here →

invoice_id leading column

tag_id second column

index sorted left → right — one way in

WHERE invoice_id = … ✓ hits the left prefix — uses the index

WHERE tag_id = … ✗ not a prefix — full table scan

A composite index serves a left prefix. invoice_id is indexed; tag_id alone is not — the second-direction index (leading with tag_id) is still owed.

And one guardrail to recognize: a junction with three or more foreign keys is almost always two relationships disguised as one table, so split it. A table that mashes user, invoice, and tag together (“this user tagged this invoice”) is really an invoice_tags junction plus an actor column on an entity, and the third foreign key is the sign that a second relationship is hiding inside it.

Practice: model the link, then promote it

You’ll practice both skills back to back. First build the pure junction: two foreign keys and a composite primary key. Then perform the promotion yourself, modeling the user-to-organization relationship as an entity because it carries a role. Building the two shapes one after the other is what fixes the difference in your memory.

As in this chapter’s other exercises, the starter passes explicit snake_case column-name strings to every builder, because the sandbox has no casing client and the probe SQL has to line up on the exact column names. The probes are the real test. A grader can read the shape of your composite primary key and your unique, but only an actual INSERT can prove a duplicate is rejected, and only a DELETE can prove the cascade fires. So the probes insert and delete rows that must either succeed or be rejected, and your schema passes only when the database agrees.

Two declarations, back to back. First model the invoice↔tag link as a pure junction `invoice_tags`: two .notNull() foreign keys with onDelete: 'cascade' and a composite primary key on (invoice_id, tag_id). Then perform the promotion — model the user↔organization relationship as an entity `memberships`, because it carries a role: a surrogate id primary key, the two foreign keys, a role column, and a named composite unique on (user_id, organization_id) so a user can't join the same org twice. The requirements read the shape of your PK and your unique; the probes insert and delete rows to prove the constraints actually fire. The column names are spelled out in snake_case because there's no casing client in scope.

Reveal the answer

// 1. The pure junction — two cascading FKs, identity IS the pair.
export const invoiceTags = pgTable('invoice_tags', {
  invoice_id: uuid('invoice_id')
    .notNull()
    .references(() => invoices.id, { onDelete: 'cascade' }),
  tag_id: uuid('tag_id')
    .notNull()
    .references(() => tags.id, { onDelete: 'cascade' }),
}, (t) => [
  // The pair is the identity: unique + NOT NULL on both + an index, from one line.
  primaryKey({ columns: [t.invoice_id, t.tag_id] }),
]);

// 2. The promotion — the link grew a role, so it became a thing.
export const memberships = pgTable('memberships', {
  id: uuid('id').primaryKey(),
  user_id: uuid('user_id')
    .notNull()
    .references(() => users.id, { onDelete: 'cascade' }),
  organization_id: uuid('organization_id')
    .notNull()
    .references(() => organizations.id, { onDelete: 'cascade' }),
  role: text('role').notNull().default('member'),
}, (t) => [
  // The composite key didn't disappear — it was demoted from PK to a named unique.
  unique('memberships_user_org_unique').on(t.user_id, t.organization_id),
]);

The pure junction’s primaryKey({ columns: [t.invoice_id, t.tag_id] }) makes the pair the identity, so probe 1’s doubled (a1, b1) row is rejected as a duplicate. The two cascade foreign keys reject probe 2’s orphan tag_id and, in probe 3, sweep the link row away when its invoice is deleted — the DO block raises only if the row survived, so the probe passing is the cascade firing. In memberships the surrogate id takes the primary-key slot because something could need to point a foreign key at a membership later, and the no-double-join rule survives the demotion as the named unique('memberships_user_org_unique') — which is exactly what rejects probe 4’s second (c1, d1) pair. (role is plain text here; a pgEnum('member_role', [...]) would be the production choice, and grades the same way.)

External resources

Drizzle’s “Indexes & Constraints” page is the one official reference behind every builder in this lesson: the composite primaryKey that made the pair the identity, the named unique it demoted to when the link became an entity, and the references(... { onDelete }) foreign keys that both junction columns ride on. The two guides below go the other way and teach the modeling judgment the schema encodes, including the exact “when does a junction grow into an entity?” call that the middle of this lesson turns on. The Beekeeper Studio guide lets you run the SQL in the browser as you read.

Drizzle ORM — Indexes & Constraints

orm.drizzle.team

Official reference for the composite primaryKey, the named unique, and the references onDelete foreign keys this lesson uses.

Many-to-Many Relationships: An Interactive SQL Guide

beekeeperstudio.io

Walks the junction pattern with browser-runnable SQL, including the pure-link vs. junction-with-attributes split this lesson centers on.

Many-to-Many Relationships: A Guide to Database Design

datacamp.com

Composite keys, associative entities, and the trade-offs behind promoting a junction — the modeling judgment, written up at length.