Chapter 38Lesson 4

Aggregations and grouping

Collapsing rows into dashboard numbers with Drizzle aggregates, groupBy, and having.

The organization dashboard needs a header that reads “23 invoices, $48,200 outstanding,” with a breakdown by status underneath. Reach for the tools you already have and you write db.select().from(invoices), which hands you back 23 rows. The dashboard doesn’t want the rows, though: it wants the number of them, and the sum of one of their columns. Aggregation closes that gap between the rows a table holds and the numbers a UI reports.

By the end of this lesson you’ll write the queries behind a dashboard: invoices per organization, revenue per month, top tags by usage. None of it requires a new tool. It’s the same db.select you met in CRUD and the four chain methods, with a different projection and one new clause, groupBy. The operators sit right next to the ones you already import: count, sum, avg, and the rest come from drizzle-orm, alongside eq and and from earlier lessons.

One mental model carries the whole lesson, so it’s worth stating before any code. An aggregate query collapses many rows into a single number. groupBy decides which rows collapse together. where filters the rows before the collapse; having filters the collapsed groups after. Every clause you’ll meet is a variation on that one picture.

Collapsing rows into one number

The figure below draws that picture, and the rest of the lesson keeps referring back to it.

rows

draft $200

draft $150

sent $900

sent $300

sent $400

paid $1,200

groupBy(status)

one row per group

draft count 2 sum $350

sent count 3 sum $1,600

paid count 1 sum $1,200

Rows fold into one row per group. where thins the left stack before the fold; having thins the right stack after it.

The simplest aggregate doesn’t group at all: it collapses an entire table, scoped to one org, down to a single number. “How many invoices does this org have?” is count():

const [{ total }] = await db
  .select({ total: count() })
  .from(invoices)
  .where(eq(invoices.organizationId, orgId));

Three things are happening here, and you’ll return to each one throughout the lesson.

count() with no argument is SQL’s COUNT(*) , which counts rows. It’s an aggregate function . It arrives in TypeScript as a number: Postgres’ own count returns a bigint, but Drizzle casts it to a JS number for you, so you never see a string here. Hold on to that detail, because sum and avg behave differently, and that difference causes more wrong numbers than anything else in the lesson.

There’s no groupBy, so the whole filtered set collapses to exactly one row. That’s why the result is destructured with const [{ total }], the single-row idiom from the first lesson, pulling the one row out of the array.

The projection still drives the type. You wrote { total: count() }, so the inferred result is { total: number }[]. The type follows the projection, exactly as it did for plain column selects: the aggregate just rides in the projection where a column used to.

count() is one of a small family. You’ll meet each one as the lesson needs it, but here’s the whole set in one place for reference:

Helper

SQL

Returns in TS

Note

count()

count(*)

number

every row

count(col)

number

rows where col is non-null

countDistinct(col)

count(distinct col)

number

distinct non-null values

sum(col)

string | null

numeric precision preserved

sumDistinct(col)

sum(distinct col)

string | null

avg(col)

string | null

min(col) / max(col)

min / max

the column's own type

preserves type

The drizzle-orm aggregate helpers, all imported from drizzle-orm. The amber return cells hand you a string and a null over an empty set — the trap the lesson pays off.

All of them import from drizzle-orm. One detail in that table matters more than the rest, for reasons that come up repeatedly below: sum and avg hand you a string, and they hand you null over an empty set.

There’s also a distinction worth surfacing now, because it quietly produces wrong numbers: count() and count(col) are not the same. count() counts rows; count(col) counts only the rows where col is non-null. On a column that’s always set they agree, but on a nullable column they diverge. count(invoices.assignedToId) gives you the number of assigned invoices, which is smaller than count() whenever some invoices have no assignee. That gap looks like a quirk now, but once a join enters the picture, it’s exactly what makes the count come out right.

One number per group

A single number over the whole table is rarely what a dashboard wants. “How many invoices?” is less useful than “how many invoices of each status?” That’s where groupBy comes in: it turns one number into one number per group.

const byStatus = await db
  .select({ status: invoices.status, total: count() })
  .from(invoices)
  .where(eq(invoices.organizationId, orgId))
  .groupBy(invoices.status);
// → { status: 'draft' | 'sent' | 'paid' | 'void'; total: number }[]

This is the first query with two parts that have to agree, so it’s worth stepping through slowly.

const byStatus = await db
  .select({ status: invoices.status, total: count() })
  .from(invoices)
  .where(eq(invoices.organizationId, orgId))
  .groupBy(invoices.status);
// → { status: 'draft' | 'sent' | 'paid' | 'void'; total: number }[]

The grouped column rides in the projection like any normal column. But because the query is grouped, the result is no longer one row total: it’s one row per distinct status. Each row carries that status value plus whatever aggregates you computed for it.

const byStatus = await db
  .select({ status: invoices.status, total: count() })
  .from(invoices)
  .where(eq(invoices.organizationId, orgId))
  .groupBy(invoices.status);
// → { status: 'draft' | 'sent' | 'paid' | 'void'; total: number }[]

The aggregate is computed within each group, not across the whole table. So total is the invoice count for draft, then for sent, then for paid, and so on: one count per bucket.

const byStatus = await db
  .select({ status: invoices.status, total: count() })
  .from(invoices)
  .where(eq(invoices.organizationId, orgId))
  .groupBy(invoices.status);
// → { status: 'draft' | 'sent' | 'paid' | 'void'; total: number }[]

This names the buckets. Every distinct value of status becomes one output row. It also carries the rule that trips most people up: every column in the projection that isn’t wrapped in an aggregate, here status, must appear in groupBy.

1 / 1

That last rule deserves its own look, since it’s the most common aggregate error and Postgres rejects any query that breaks it.

The rule: every selected column that isn’t wrapped in an aggregate must appear in groupBy. The reasoning makes sense once you see it. If you’re collapsing many rows into one, Postgres has to know what to do with a column that isn’t being aggregated. Which of a five-row group’s ids should invoices.id become? There’s no sensible answer, so Postgres refuses to guess. You either group by the column, which gives one value per group with no ambiguity, or aggregate it, which gives one computed value per group. There’s no third option.

Break the rule and you get a runtime error, not a compile-time one. Drizzle builds the SQL and Postgres rejects it when it runs, naming the offending column:

Rejected
Fixed

await db
  .select({ id: invoices.id, status: invoices.status, total: count() })
  .from(invoices)
  .groupBy(invoices.status);
// ✗ runtime error from Postgres:
// column "invoices.id" must appear in the GROUP BY
// clause or be used in an aggregate function

invoices.id is selected but never grouped or aggregated. Postgres can’t collapse a group of rows into a single id, so it rejects the query at runtime.

await db
  .select({ status: invoices.status, total: count() })
  .from(invoices)
  .groupBy(invoices.status);

Drop id from the projection. A per-status count has no room for a per-row id anyway. If you wanted per-id rows, you didn’t want an aggregate at all.

Notice where the org filter sat in that first byStatus query: in where, before the groupBy. In the fold figure’s terms, where thins the left stack. It runs before the collapse, so it reduces the rows that ever reach a group. That placement is essentially free, because it shrinks the input rather than the output. Keep it in mind, since it’s exactly what sets having apart shortly.

Grouping across a join

“Invoices per organization” sounds like the last query, with one added wrinkle: a dashboard wants the org’s name, and the name lives on organizations, not invoices. Pulling a column from another table means a join, and a join is where group counts quietly go wrong, so this section is mostly about getting it right.

Here’s the correct shape: one row per org, carrying its name, its invoice count, and its outstanding total.

const perOrg = await db
  .select({
    orgId: organizations.id,
    orgName: organizations.name,
    invoiceCount: count(invoices.id),
    outstanding: sum(invoices.amountDue),
  })
  .from(organizations)
  .leftJoin(invoices, eq(invoices.organizationId, organizations.id))
  .groupBy(organizations.id, organizations.name);

Three decisions in that query are deliberate, and each one prevents a specific bug.

const perOrg = await db
  .select({
    orgId: organizations.id,
    orgName: organizations.name,
    invoiceCount: count(invoices.id),
    outstanding: sum(invoices.amountDue),
  })
  .from(organizations)
  .leftJoin(invoices, eq(invoices.organizationId, organizations.id))
  .groupBy(organizations.id, organizations.name);

You’re grouping per org, so the query is driven from the organizations side and joins invoices in, not the other way around. Choosing leftJoin over innerJoin keeps orgs with zero invoices in the result; an inner join would silently drop them, and “0 invoices” is exactly the number a dashboard must show.

const perOrg = await db
  .select({
    orgId: organizations.id,
    orgName: organizations.name,
    invoiceCount: count(invoices.id),
    outstanding: sum(invoices.amountDue),
  })
  .from(organizations)
  .leftJoin(invoices, eq(invoices.organizationId, organizations.id))
  .groupBy(organizations.id, organizations.name);

Group by the org side, never by anything on invoices, since one row per org is the goal. Both id and name appear because both are selected and neither is aggregated; the completeness rule from the last section still holds across the join.

const perOrg = await db
  .select({
    orgId: organizations.id,
    orgName: organizations.name,
    invoiceCount: count(invoices.id),
    outstanding: sum(invoices.amountDue),
  })
  .from(organizations)
  .leftJoin(invoices, eq(invoices.organizationId, organizations.id))
  .groupBy(organizations.id, organizations.name);

Count invoices.id, not count(). This is the choice that makes the count come out right, and the next paragraph is entirely about why.

1 / 1

Here is the trap, and it’s the subtlest one in the lesson. For an org with no invoices, a left join still produces one row: a row where every invoices.* column is null, padding out the missing match. Now compare the two ways you might count:

count(invoices.id) returns 0 for that org. It counts non-null id values, and there are none, which is correct.
count() returns 1 for that org. A row exists (the all-null padding row), and count() counts rows regardless of their contents, which is wrong.

That single-row, all-null padding is invisible until you count it. This is the concrete reason the helper table flagged count() versus count(col): when you’re counting the right side of a left join, you must count a column from that side, count(invoices.id), so the padding rows score zero instead of one.

The same empty org exposes a second gotcha. Its outstanding comes back as null, not 0, because sum over zero real rows is null in SQL. That null is the empty-state bug that reaches your very first customer: the one org with no invoices yet, whose dashboard tile then reads “$null.” A couple of sections from now you’ll see the one-line fix. For now, just register that sum over nothing is null, the same way an empty set has no average.

Filtering groups with `having`

Sometimes you don’t want every group, only the groups that clear a bar. “Which orgs owe us more than $1,000?” is a filter, but it’s a filter on a sum, and a sum doesn’t exist until the rows have already collapsed. That’s the job where can’t do and having can.

const bigDebtors = await db
  .select({ orgId: invoices.organizationId, outstanding: sum(invoices.amountDue) })
  .from(invoices)
  .where(eq(invoices.status, 'sent'))
  .groupBy(invoices.organizationId)
  .having(({ outstanding }) => gt(outstanding, '1000'));

The whole distinction fits in one line, and it’s worth memorizing:

where reduces the rows going in; having reduces the groups coming out.

Back to the fold figure: where thins the left stack before the collapse, having thins the right stack after it. That ordering is the reason having exists at all. The aggregate, the outstanding sum, doesn’t exist until the fold has happened, so the only clause that can filter on it is one that runs after the fold. where runs before, so it never sees an aggregate.

Look at how having references the sum. It’s a callback that receives your projected fields by the keys you named in select, so you write outstanding, the name you already chose, instead of restating sum(invoices.amountDue). Name the aggregate once in the projection, then reference the name everywhere else, the same single-source-of-truth instinct that runs through every Drizzle query. This is also why the aggregate rides in the select even when the dashboard only wants the filtered groups: the projection is where having reads it from.

That leaves a clean rule for deciding where any predicate goes:

A predicate on a raw column (status = 'sent') belongs in where. It’s cheaper, since it filters before grouping and shrinks the work, and post-group the raw column may not even be available to filter on.
A predicate on an aggregate (the outstanding sum) must go in having, because where can’t see aggregates.

One more detail in that query, easy to miss: the threshold is the string '1000', not the number 1000. amountDue is numeric, so sum returns a string, so the comparison happens in numeric space, and you stay in that space by passing a string literal. It’s the same money reflex from the schema chapter: money is numeric end to end, strings all the way, never a float.

Now practice the distinction this section turns on. For each predicate below, decide whether it filters rows, so it belongs in where, or filters groups, so it belongs in having.

Each predicate filters either individual rows or whole groups. Drop it into the clause that can express it. Drag each item into the bucket it belongs to, then press Check.

where Filters rows before grouping

having Filters groups after the count

Only paid invoices

Groups with more than 5 invoices

Invoices created this year

Orgs whose outstanding total exceeds $10k

Assigned invoices only (assignedToId is not null)

Tags used more than 3 times

Several numbers in one pass: filtered aggregates

The status-breakdown tile from the introduction wants three numbers for one org at once: the paid count, the sent count, and an overall total. You could fire three separate queries. The experienced move is one query that computes all three in a single pass, and Postgres has a clause built exactly for it.

FILTER (WHERE …) attaches a predicate to an individual aggregate, so each aggregate counts only the rows that match its own condition. Drizzle has no dedicated builder method for it, so you drop into the sql template, and this is a legitimate, parameterized use of it, not a code smell. The raw-SQL escape hatch gets its full treatment in a later lesson of this chapter; here you’re just borrowing a clause the builder doesn’t wrap.

const [breakdown] = await db
  .select({
    total: count(),
    paid: sql<number>`count(*) filter (where ${invoices.status} = 'paid')`,
    outstanding: sql<string>`coalesce(sum(${invoices.amountDue}) filter (where ${invoices.status} = 'sent'), 0)`,
  })
  .from(invoices)
  .where(eq(invoices.organizationId, orgId));

There’s new syntax in those template strings, so the annotations below walk through it inline.

const [breakdown] = await db
  .select({
    total: count(),
    paid: sql<number>`count(*) filter (where ${invoices.status} = 'paid')`,
    outstanding: sql<string>`coalesce(sum(${invoices.amountDue}) filter (where ${invoices.status} = 'sent'), 0)`,
  })
  .from(invoices)
  .where(eq(invoices.organizationId, orgId));

Two of those tooltips pay off debts from earlier.

The sql<number> is you making a promise to TypeScript. Drizzle can read your schema for real columns, but a raw expression is opaque to it, so the <number> is how you say “trust me, this comes back as a number.” It’s a claim, like a type assertion, and a wrong claim surfaces as a runtime surprise rather than a compile error, so keep it honest.

And coalesce(sum(...), 0) is the fix for the empty-sum bug from the join section. coalesce returns its first non-null argument, so coalesce(<null>, 0) is 0. Wrapping the sum turns the null into a 0 at the database, before the value ever reaches your code. That’s the right place to fix it, because no caller downstream can then forget to. This is the canonical empty-state fix, and you’ll reach for it on essentially every dashboard sum.

The “first row per group” shape: `distinctOn`

One more grouping-adjacent need rounds out the toolkit: not a number per group, but a whole row per group. “The latest invoice for each org” doesn’t fold rows into a count; it picks one representative row out of each org’s invoices. Postgres has a tidy idiom for exactly this, and it’s worth naming so you recognize it.

First, separate two things that sound alike, so you pick the right one:

db.selectDistinct(...) is row-level dedupe, the plain SQL SELECT DISTINCT. Identical rows collapse to one. That’s the easy case: one method, nothing more to it.
db.selectDistinctOn([col]) compiles to Postgres’ DISTINCT ON : keep the first row per distinct col, where “first” is whatever orderBy says.

The second is the idiom for “latest invoice per org” without reaching for a window function:

const latestPerOrg = await db
  .selectDistinctOn([invoices.organizationId])
  .from(invoices)
  .orderBy(invoices.organizationId, desc(invoices.createdAt));

One rule makes it work, and getting it wrong fails silently:

The distinctOn column must be the leading orderBy key; the next orderBy key decides which row wins within each group.

Here organizationId leads (it has to, since it’s the distinctOn column), and desc(createdAt) breaks the tie, so the newest invoice wins per org. Flip that order, or drop createdAt from orderBy, and Postgres either errors or hands you an arbitrary row per group, a bug that passes every test until the data grows enough to expose it.

Think of distinctOn as the light cousin of row_number() over (partition by …). For “one row per group” it’s shorter and clearer. For “the top three per group,” or any real ranking, window functions are the tool, and those land later, in the subqueries and CTEs lesson. For now, distinctOn is enough.

Common dashboard queries

Two queries show up on real dashboards often enough to be worth writing out in full. The first is monthly revenue, and it leans on a Postgres function to build the buckets.

A month isn’t a column you have. createdAt is a precise timestamp, and grouping by it raw would make a separate group per invoice (the high-cardinality trap from earlier). date_trunc('month', …) solves it: it flattens every timestamp down to the first instant of its month, so all of January’s invoices collapse to the same group key.

const monthExpr = sql<string>`date_trunc('month', ${invoices.createdAt})`;

const monthlyRevenue = await db
  .select({
    month: monthExpr,
    revenue: sql<string>`coalesce(sum(${invoices.amountDue}), '0')`,
  })
  .from(invoices)
  .where(and(eq(invoices.organizationId, orgId), eq(invoices.status, 'paid')))
  .groupBy(monthExpr)
  .orderBy(monthExpr);

Notice the monthExpr variable. The same date_trunc expression has to appear in select, groupBy, and orderBy, and Drizzle’s builder has no positional shortcut like SQL’s GROUP BY 1. Without the variable you’d write the expression out three times and risk the copies drifting apart. Extracting it to a const is the cleanup worth making every time an expression repeats across clauses: write it once, reference it three times.

That leaves the boundary this whole lesson has been circling, and it’s worth settling clearly.

sum and avg over a numeric column arrive in TypeScript as a string, exactly like every numeric column has since the schema chapter.

This is not Drizzle being awkward. The database holds full decimal precision, while JavaScript’s number is an IEEE-754 float that cannot represent every cent exactly. Money math on floats corrupts totals, and the corruption surfaces as off-by-a-penny bugs in invoices. So the database keeps money as an exact decimal and hands it to you as a string to protect it. The rule that falls out: format the string for display, never parseFloat it for money math.

Here’s the wrong way and the right way, side by side.

The money bug
Exact in SQL, format in JS

const withTax = parseFloat(row.revenue) * 1.2;

This is the float corruption the numeric-to-string design exists to prevent. parseFloat drags the exact decimal into an IEEE-754 float, and the multiply compounds the error, so the result is wrong money.

const formatted = new Intl.NumberFormat('en-US', {
  style: 'currency',
  currency: 'USD',
}).format(Number(row.revenue));

Do money math in Postgres, where numeric stays exact. The * 1.2 for tax belongs in the query, where the result is still numeric and still arrives as a string. Then do money formatting in JS: the string flows straight to a currency formatter, and Number() appears only at that final display step, never to compute with.

If you genuinely must compute on money in JavaScript, which a dashboard you’re only displaying rarely needs to, that’s the moment to reach for a decimal library that does exact arithmetic. The senior default, though, is simpler: do the math in Postgres where numeric stays exact, and do nothing but formatting in JS. Those string | null cells back in the helper table are the reason the wrong column above misbehaves, and now you know exactly why they’re flagged.

Practice: build a dashboard query

Time to write one end to end. This is the real shape of a dashboard read, a grouped aggregate across two tables with the empty-state case handled, and the grader will check your result against seeded data.

Return each organization's name, its total invoice count, and its outstanding balance — the sum of amountDue for status = 'sent' invoices only — highest outstanding first. invoiceCount counts every invoice, but outstanding sums only the sent ones, so the status filter rides on the sum alone via FILTER (WHERE …). Keep orgs with no sent invoices, showing '0' for their outstanding (not null).

View schema & seed rows

Schema (Drizzle)

export const organizations = pgTable('organizations', {
  id: integer('id').primaryKey(),
  name: text('name').notNull(),
});

export const invoices = pgTable('invoices', {
  id: integer('id').primaryKey(),
  organizationId: integer('organization_id')
    .notNull()
    .references(() => organizations.id),
  amountDue: numeric('amount_due', { precision: 12, scale: 2 }).notNull(),
  status: text('status').notNull(),
});

Seed rows (SQL)

INSERT INTO organizations (id, name) VALUES
  (1, 'Acme'),
  (2, 'Globex'),
  (3, 'Initech');

INSERT INTO invoices (id, organization_id, amount_due, status) VALUES
  (1, 1, '900.00',  'sent'),
  (2, 1, '600.00',  'sent'),
  (3, 1, '1200.00', 'paid'),
  (4, 2, '2000.00', 'sent'),
  (5, 3, '500.00',  'paid');

Step back and notice what every query in this lesson had in common: it was db.select, the SQL builder, never the relational query API. That’s the boundary the earlier lessons kept pointing at. The relational API is the default for reading a tree of nested objects. Aggregates aren’t a tree; they compute over rows, and computing over rows is what the SQL builder is for. There’s no first-class way to aggregate through the relational API, and the answer isn’t to look for a workaround: it’s to drop to db.select, exactly as you did here. When the question is “give me the rows, nested,” reach for db.query; when it’s “give me a number about the rows,” reach for db.select.

External resources

Drizzle ORM — Aggregations, groupBy & having

orm.drizzle.team

The official select reference: every aggregate helper plus the .groupBy() / .having() methods used throughout this lesson.

PostgreSQL — Aggregate functions, GROUP BY & HAVING

postgresql.org

The authoritative source on the WHERE-before / HAVING-after distinction the whole lesson turns on.

The FILTER clause

modern-sql.com

A focused explainer on FILTER (WHERE …) for conditional aggregates, with the older CASE form side by side.

PostgreSQL DISTINCT ON tutorial

neon.com

A worked one-row-per-group example, including why DISTINCT ON must lead the ORDER BY.

Aggregations and grouping

Collapsing rows into one number

One number per group

Grouping across a join

Filtering groups with having

Several numbers in one pass: filtered aggregates

The “first row per group” shape: distinctOn

Common dashboard queries

Practice: build a dashboard query

External resources

Filtering groups with `having`

The “first row per group” shape: `distinctOn`