Chapter 39Lesson 3

Reading EXPLAIN ANALYZE

How to read a Postgres query plan with EXPLAIN ANALYZE, so you can prove why a query is slow and confirm an index actually fixed it.

The invoices list page you built shipped fast. Fifty seeded rows, the page renders instantly, everyone moves on. Then a customer grows, the table fills to two hundred thousand invoices, and the same page takes three seconds to load.

You already know the candidate fixes. Earlier in this chapter you declared a composite index for exactly this query, and you ruled out N+1, since this is one query, not two hundred round-trips. But how do you know the index is the fix? And once you ship it, how do you confirm it actually worked, rather than sitting unused while the database keeps scanning the whole table?

“Feels slow” is not evidence, and “feels fast now” is not proof. There is one reliable way to answer “why is this query slow, and did my change help”: you ask Postgres directly. By the end of this lesson you can take any suspect query, ask the planner how it ran, read the answer top to bottom, name the expensive part, and run the loop that turns a guess into a fix.

One idea frames everything that follows. A query says what you want, the planner decides how to get it, and EXPLAIN ANALYZE shows you the how with real numbers attached. Every tool in this lesson is a different view of that one idea.

The planner decides how, EXPLAIN shows the decision

A SQL query is declarative . When you write select * from invoices where status = 'pending' order by created_at desc limit 20, you have described the result: twenty pending invoices, newest first. You have said nothing about how to produce it. Should the engine read every row and discard the non-pending ones? Walk an index straight to the pending rows? Sort the whole table, or use an index that already keeps it sorted?

That decision belongs to the query planner . For any query, the planner lists the access paths available (scanning the table, walking this index or that one, choosing a join algorithm), estimates the cost of each, and picks the cheapest. This is the same machinery you met when you declared indexes: an index doesn’t change what comes back, only how the engine reaches it. Two plans for the same query return byte-identical results at wildly different speeds.

EXPLAIN is how you read that decision. Prefix any query with it and Postgres prints the plan it would run, without running it:

explain select * from invoices where status = 'pending';

You get the chosen plan annotated with cost estimates , the planner’s guess at how expensive each step is. It’s fast to run, because the query never executes. But a guess is only half the story.

EXPLAIN ANALYZE is the other half. It actually runs the query and prints the same plan, now annotated with the real timings and row counts measured during execution, sitting right beside the estimates:

explain analyze select * from invoices where status = 'pending';

This is the one to reach for. The estimate tells you what the planner thought, the actuals tell you what happened, and the gap between them is, more often than you’d guess, the bug itself. A plan that estimates ten rows and actually returns a million made its decisions on bad information.

One field trips up everyone the first time. Those cost estimates look like cost=0.00..25.40, and the instinct is to read them as milliseconds. They are not. Cost is an arbitrary planner unit, useful only for comparing two nodes or two plans against each other, never as an absolute measure. The real time lives in a separate field, actual time=, which ANALYZE adds and EXPLAIN alone does not. When you want milliseconds, you want actual time, which means you want ANALYZE.

The following three panels show the same query at three levels of evidence: the SQL you wrote, the plan the planner picked, and the plan with reality measured into it.

select * from invoices where status = 'pending';

Declarative: it names the result, not how to reach it.

Seq Scan on invoices  (cost=0.00..25.40 rows=6 width=72)
  Filter: (status = 'pending')

The plan the planner chose, with cost estimates. The query did not run.

Seq Scan on invoices  (cost=0.00..25.40 rows=6 width=72)
                      (actual time=0.018..0.124 rows=6 loops=1)
  Filter: (status = 'pending')
  Rows Removed by Filter: 194
Planning Time: 0.071 ms
Execution Time: 0.146 ms

The same plan, executed. Estimates and actuals side by side, and the query really ran.

Running it: copy the SQL, prefix it, read the output

Getting a plan is the easy part. The skill is in reading it, which is the rest of this lesson, so the mechanics here stay short.

The everyday path is close to something you can already do. Turn on logger: true on your Drizzle client, run the page, and copy the exact SQL Drizzle logged. Paste it into a SQL console, such as Neon’s SQL editor, psql, or Drizzle Studio, and prefix it with explain (analyze, buffers). Drizzle builds the query, and you diagnose the raw SQL it emits. That separation is the point: you don’t need the ORM in the loop while you’re reading plans.

You can also run a plan from inside your application code when you want a one-off probe in a script:

const plan = await db.execute(
  sql`explain (analyze, buffers) select * from invoices where status = 'pending'`,
);

This path has real friction. You’re hand-writing the SQL (or interpolating it), and what comes back is rows of plan text, not a typed result shape you can work with. It’s fine for a quick probe, but for interactive diagnosis the SQL console is cleaner. You’ll read plans far more often than you’ll script them.

Notice the (analyze, buffers) in both. BUFFERS adds, to every node, a count of how many pages came from RAM versus from disk. That’s the single best signal for whether a query is slow because it’s missing the cache. The catch is a version split. Postgres 18, your production target on Neon, turns BUFFERS on automatically whenever ANALYZE runs. Older Postgres, including the one running inside the exercises on this page, does not, so there you have to ask for it explicitly. Rather than remember which version does which, adopt one habit: always write (analyze, buffers). It’s a no-op where buffers are already on, and necessary everywhere else, so the same line is correct on every version.

EXPLAIN ANALYZE executes the query. On a SELECT that’s harmless. On an UPDATE, DELETE, or INSERT it performs the write, so the rows really change. To get a plan for a destructive statement without committing it, wrap it in a transaction and roll the transaction back:

begin;
explain (analyze, buffers) delete from invoices where status = 'draft';
rollback;

Or drop ANALYZE and use plain EXPLAIN for an estimate-only plan that never touches the data. The hazard is real: run EXPLAIN ANALYZE DELETE in the wrong window and it deletes for real.

Reading the plan tree top to bottom

Every later section depends on reading a plan correctly, so this is the part to slow down on.

A plan is a tree, printed as indented text. Each line is a node, an operation like “scan this table” or “join these two inputs”, followed by its estimates and, with ANALYZE, its actuals. Indentation is depth. A node indented under another is its child.

The part beginners get backwards is the direction of execution. The deepest, most-indented node runs first. Its output feeds its parent one level up. Results flow upward until they reach the top line, the root, which is the last thing that happens and the result you actually receive. The natural instinct is to read the top line first and stop there, but the top line is the finish line, not the start.

To read a plan, follow this recipe:

Find the deepest node. That’s where data first enters the query, through a table scan or an index walk.
Walk outward and up, one level at a time, watching how actual rows changes at each node. As rows climb, they get filtered out, joined, sorted, and aggregated.
Find the node where the time concentrates, or where the row count explodes or collapses unexpectedly. That node is your suspect.

The diagram below takes one real four-node plan and walks it in true execution order, deepest first, then up. Scrub through it and watch the highlight climb from the scan at the bottom to the result at the top. Spend a minute with it: seeing the order play out helps more than reading about it.

↑ results flow upward — deepest node runs first

Sort (actual time=2.31..2.34 rows=20 loops=1)

Sort Key: i.created_at DESC

-> Hash Join (actual time=0.42..2.10 rows=180 loops=1)

Hash Cond: (i.customer_id = c.id)

-> Seq Scan on invoices i (actual time=0.01..1.40 rows=180 loops=1)

-> Hash (actual time=0.30..0.30 rows=40 loops=1)

-> Seq Scan on customers c (actual time=0.01..0.18 rows=40 loops=1) runs first

Deepest node, so it runs first. Postgres reads all 40 customer rows. Nothing above has happened yet.

↑ results flow upward — deepest node runs first

Sort (actual time=2.31..2.34 rows=20 loops=1)

Sort Key: i.created_at DESC

-> Hash Join (actual time=0.42..2.10 rows=180 loops=1)

Hash Cond: (i.customer_id = c.id)

-> Seq Scan on invoices i (actual time=0.01..1.40 rows=180 loops=1)

-> Hash (actual time=0.30..0.30 rows=40 loops=1) now running

-> Seq Scan on customers c (actual time=0.01..0.18 rows=40 loops=1) ✓ done

Its parent, the Hash node, takes those 40 rows and builds a hash table keyed by customer id — ready to be probed.

↑ results flow upward — deepest node runs first

Sort (actual time=2.31..2.34 rows=20 loops=1)

Sort Key: i.created_at DESC

-> Hash Join (actual time=0.42..2.10 rows=180 loops=1) now running

Hash Cond: (i.customer_id = c.id)

-> Seq Scan on invoices i (actual time=0.01..1.40 rows=180 loops=1) now running

-> Hash (actual time=0.30..0.30 rows=40 loops=1) ✓ done

-> Seq Scan on customers c (actual time=0.01..0.18 rows=40 loops=1) ✓ done

Now the join runs: Postgres scans the 180 invoices and probes the hash for each one’s customer. Output: 180 joined rows, flowing up.

↑ results flow upward — deepest node runs first

Sort (actual time=2.31..2.34 rows=20 loops=1) runs last

Sort Key: i.created_at DESC

-> Hash Join (actual time=0.42..2.10 rows=180 loops=1) ✓ done

Hash Cond: (i.customer_id = c.id)

-> Seq Scan on invoices i (actual time=0.01..1.40 rows=180 loops=1) ✓ done

-> Hash (actual time=0.30..0.30 rows=40 loops=1) ✓ done

-> Seq Scan on customers c (actual time=0.01..0.18 rows=40 loops=1) ✓ done

Last, the root sorts those rows by created_at descending and returns the final 20. The top line is where execution ends, not where it begins.

Now run a real plan yourself. The sandbox below has a small table seeded for you. Run it as-is, then read the output top to bottom and answer the question in the instructions: which line ran first?

Run this as-is to see a real query plan, then read it top to bottom. Which line ran first, the top line or the most-indented bottom one? (Remember: the deepest node executes first, and its output flows up to the root.)

View schema & data

CREATE TABLE invoices (
  id int PRIMARY KEY,
  organization_id int NOT NULL,
  status text NOT NULL,
  amount_cents int NOT NULL
);
INSERT INTO invoices (id, organization_id, status, amount_cents) VALUES
  (1, 1, 'paid', 12000),
  (2, 1, 'pending', 4500),
  (3, 1, 'paid', 9900),
  (4, 1, 'pending', 30000),
  (5, 1, 'draft', 1500),
  (6, 1, 'paid', 7200);

The four numbers that decide

A plan node carries a dozen fields. An experienced engineer reads four of them and skims the rest. Learn these four by the position they occupy in a real node line and you’ll know where to look in any plan.

Index Scan using idx_invoices_org_created on invoices
  (cost=0.42..18.30 rows=12 width=72) (actual time=0.015..0.040 rows=180 loops=200)
  Buffers: shared hit=120 read=8

Estimated rows is the planner’s guess from table statistics, and actual ... rows is what really came back. When the actual count is off from the estimate by more than about 10×, the planner chose its plan on bad information: the stats are stale, or the predicate is too hard to estimate. To fix it, run ANALYZE <table>; and re-check. Here 12 versus 180 is a 15× miss, a red flag.

Index Scan using idx_invoices_org_created on invoices
  (cost=0.42..18.30 rows=12 width=72) (actual time=0.015..0.040 rows=180 loops=200)
  Buffers: shared hit=120 read=8

This node executed 200 times. The displayed time and rows are per loop, so the real cost is loops × time and the real row total is loops × rows. A node that looks cheap at 0.040ms is expensive at loops=200: about 8ms and 36,000 rows touched. This is the most-missed number on the page, and the database-side cousin of N+1.

Index Scan using idx_invoices_org_created on invoices
  (cost=0.42..18.30 rows=12 width=72) (actual time=0.015..0.040 rows=180 loops=200)
  Buffers: shared hit=120 read=8

Two numbers: startup time before the first row (0.015), then total time to the last row (0.040), both per loop. Read it together with loops= to get the true cost, because on its own it understates a looped node.

Index Scan using idx_invoices_org_created on invoices
  (cost=0.42..18.30 rows=12 width=72) (actual time=0.015..0.040 rows=180 loops=200)
  Buffers: shared hit=120 read=8

120 pages served from RAM (hit), 8 fetched from disk (read). On a query that should be hot, a lot of read means the data doesn’t fit in cache or the index isn’t covering. These counts are per loop too, so multiply by loops= for the real I/O.

1 / 1

Two of those four numbers deserve a closer look, because they hide the most bugs.

The estimate-versus-actual gap is the planner grading its own homework. It chose this plan because it predicted these row counts, drawn from statistics , a sampled picture of how your columns are distributed. When the prediction is wildly off, that picture is out of date. The fix is to refresh it: ANALYZE the table and run the plan again.

And loops= is the silent multiplier. A line reading (actual time=0.01..0.04 rows=5 loops=200) did not touch 5 rows in 0.04ms. It ran two hundred times, so the honest cost is 200 × 0.04ms and 200 × 5 rows. Postgres reports per-loop and trusts you to do the multiplication. Skip it and a node that looks free turns out to dominate the query.

The node types you’ll meet

You don’t need the full catalogue of plan nodes. Postgres has dozens, and most never appear in a SaaS app. You need the handful that map to the query shapes you already write. Here they are, grouped by what they do, each tied to the query that produces it.

Scans are how rows first enter a plan.

Seq Scan reads every row in the table, top to bottom. It’s fine on a small table, but a warning sign on a large one when a WHERE clause should have let an index skip most of it.
Index Scan walks a B-tree index to find matching keys, then fetches those rows from the table. This is the shape a selective predicate produces once the right index exists, the payoff of the indexes you declared earlier.
Index Only Scan answers the whole query from the index alone, never touching the table, because the index already contains every column the query needs. That’s a covering index (built with INCLUDE, named here and drilled later).
Bitmap Index Scan paired with Bitmap Heap Scan combines one or more indexes through a bitmap before fetching rows. The planner chooses it for medium-selectivity or multi-predicate queries, where a plain index scan would jump around the table too much.

Joins combine two inputs. The planner picks the algorithm, and you read which one it picked.

Nested Loop takes each row from the outer input and probes the inner input for matches. It wins when the outer set is small. It turns costly when the inner side is an unindexed scan run many times over, so watch the inner node’s loops=.
Hash Join builds a hash table from one input, then probes it with the other. It’s the workhorse for medium-to-large equality joins.
Merge Join zips together two inputs that are already sorted on the join key.

Post-processing nodes shape rows after they’re gathered.

Sort orders rows. Watch for Sort Method: external merge Disk: …, which means the sort didn’t fit in memory and spilled to disk, and a disk sort is slow.
Aggregate, HashAggregate, and GroupAggregate are the fold behind count, sum, and group by.
CTE Scan and Subquery Scan read the results of a CTE or a subquery.

Match each node to the plain-language thing it does. This locks the vocabulary in before the next section leans on it.

Match each plan node to what it does. Click an item on the left, then its match on the right. Press Check when done.

Seq Scan

Reads every row in the table, in order.

Index Scan

Walks an index to find keys, then fetches those rows from the table.

Hash Join

Builds a hash table from one input and probes it with the other.

Nested Loop

For each row on the outer side, probes the inner side for matches.

Sort

Orders rows; spills to disk if they don’t fit in memory.

Aggregate

Folds many rows into a summary, like count or sum.

From plan to diagnosis

This is where the reading pays off. Reading a plan is a means to an end, and the end is recognition: seeing a pattern and knowing the fix. Each pattern below reads the same way: what you see in the plan → what it means → the fix you already learned.

A Seq Scan with a Filter on a large table, where actual rows is far smaller than the rows scanned. The engine read the whole table and threw most of it away. The tell is a large Rows Removed by Filter line. The fix is the missing or unused index on the filter column.
An Index Scan returning far more rows than the final result needs. The index is used but isn’t selective enough, or it’s the wrong composite for this query. A better composite, leading with the column the query filters on, may help.
A Sort consuming most of the query’s time, especially Sort Method: external merge. No index produces the order the query asks for, so Postgres sorts from scratch, and spills to disk if the sort is big. The fix is a composite index in the query’s exact sort direction, so the rows arrive already ordered and the Sort node disappears entirely.
A Nested Loop with a high loops= count on its inner side, scanning an unindexed table. This is the database-side shape that looks like N+1, and the distinction matters. Application N+1, from earlier in this chapter, is two hundred separate round-trips, invisible to any single plan, found only by counting queries in the log. This is one query whose join re-probes an unindexed table once per outer row. Because it’s a single statement, the plan does show it. The fix is the foreign-key index on the join column.
Estimates off from actuals by orders of magnitude. This is stale statistics. Run ANALYZE <table>; and re-check before changing anything else.

The funnel below sets the order to ask these questions in. Start at the top, pick the symptom your plan shows, and follow it to the fix.

Your query is slow — what does the plan show?

Now see a plan flip for yourself. Earlier in this lesson you ran EXPLAIN on where status = 'pending' against a tiny table and read a Seq Scan. The sandbox below runs the same query against a table of five thousand rows that already carries an index on status, declared in the seed exactly as you’d declare it in lesson 1. Run it, read the plan top to bottom, and watch the access path change. When the plan names an index scan instead of a sequential scan, you’ve seen the index earn its place.

Run this as-is. It's the same selective query you ran earlier, but now against 5,000 rows, with an index on status declared in the seed (open 'View schema & data' to see it). Read the plan top to bottom: which node sits at the root now, and what happened to the Rows Removed by Filter line you saw before?

View schema & data

CREATE TABLE invoices (
  id int PRIMARY KEY,
  organization_id int NOT NULL,
  status text NOT NULL,
  amount_cents int NOT NULL
);
INSERT INTO invoices (id, organization_id, status, amount_cents)
SELECT
  g,
  (g % 50) + 1,
  CASE WHEN g % 97 = 0 THEN 'pending' ELSE 'paid' END,
  (g % 1000) * 100
FROM generate_series(1, 5000) AS g;
CREATE INDEX idx_invoices_status ON invoices (status);
ANALYZE invoices;

The plan you just read started with Index Scan using idx_invoices_status, and the Rows Removed by Filter line is gone, because the engine walked the B-tree straight to the pending rows instead of reading all five thousand and discarding the rest. Hold it next to the very first plan you ran in this lesson: same where status = 'pending', but with no index it was a Seq Scan whose Rows Removed by Filter counted every row it threw away. Same query, same shape of data, different access path. That swap is the whole point of declaring the index.

You run explain (analyze, buffers) on a selective where status = 'pending' and the root node is a Seq Scan with Rows Removed by Filter: 4949. You add the index on status and re-run the same query. Which node now sits at the root of the plan?

Index Scan

Seq Scan

Sort

Hash Join

The index lets the planner walk a B-tree straight to the matching keys instead of reading every row, so the access path becomes an Index Scan and Rows Removed by Filter disappears. A Seq Scan is what you started with; Sort and Hash Join come from ORDER BY and joins, neither of which this single-table, no-order query has.

One change, one re-run: the diagnostic loop

Everything in this lesson collapses into a single discipline, worth naming on its own so it becomes a habit rather than a footnote.

Measure. Run EXPLAIN (ANALYZE, BUFFERS) on the slow query. Read it top to bottom.
Hypothesize. Name which node is expensive and why, as in “the Sort dominates because no index produces this order.”
Change exactly one thing. Add one index, rewrite one clause, tighten one WHERE. One.
Re-run and confirm. Run EXPLAIN (ANALYZE, BUFFERS) again and check that the node you predicted changed the way you predicted. Then repeat if it’s still slow.

The rule that matters most is the “exactly one” in step 3. Change two things at once and you’ve lost the signal: when the query gets faster, you can’t say which change did it, and you may have shipped a useless second index that taxes every write to that table for no benefit. That discipline is the whole difference between a fix and cargo-cult tuning, where indexes get sprayed at a problem and nobody can later say which ones matter.

One trap can undermine the loop before you even start it: the data you measure against. A plan on a hundred dev rows tells you almost nothing about the plan on ten million production rows, because the planner changes strategy as the table grows. A Seq Scan that’s cheapest at a hundred rows loses to an Index Scan at a hundred thousand. That’s the same Seq Scan-to-Index Scan swap you just toggled in the exercise, except here it’s driven by row volume rather than by adding the index. So measure against production-shaped data: a realistically seeded volume, or a production-shaped branch on Neon. Diagnose against a toy table and you’ll “fix” problems that don’t exist and miss the ones that do.

Worked example: the cursor-pagination query, before and after

This section makes good on an earlier promise. When you declared a composite index for cursor pagination earlier in this chapter, the lesson said you’d be able to prove it earns its weight. This is that proof.

The query is the cursor-pagination one you wrote: a page of invoices for an organization, ordered newest-first, fetched with a compound cursor and a limit. Since EXPLAIN runs on SQL, not on the Drizzle builder, the SQL comes out roughly like this:

select * from invoices
where organization_id = $1
  and (created_at < $2 or (created_at = $2 and id < $3))
order by created_at desc, id desc
limit 21;

The index you declared for it is idx_invoices_org_created_at_id, on (organization_id, created_at desc, id desc): the tenant column first, then the sort key and its tiebreaker in the exact direction the query orders. The two tabs below show this query’s plan against a populated table, before that index exists and after.

Before the index
After the index

Limit  (actual time=88.2..88.3 rows=21 loops=1)
  ->  Sort  (actual time=88.2..88.2 rows=21 loops=1)
        Sort Key: created_at DESC, id DESC
        Sort Method: external merge  Disk: 9512kB
        ->  Seq Scan on invoices  (actual time=0.02..71.4 rows=41204 loops=1)
              Filter: (organization_id = 1)
              Rows Removed by Filter: 158796
              Buffers: shared hit=180 read=2304
Execution Time: 90.1 ms

Two suspects, reading bottom-up. The Seq Scan reads all 200,000 rows and throws away 158,796, as Rows Removed by Filter confirms. Then the Sort orders 41,204 rows from scratch and spills to disk (external merge). That’s 2,304 pages read from disk, and 90ms total.

Limit  (actual time=0.05..0.34 rows=21 loops=1)
  ->  Index Scan using idx_invoices_org_created_at_id on invoices
        (actual time=0.04..0.31 rows=21 loops=1)
        Index Cond: (organization_id = 1)
        Filter: ((created_at < $2) OR ((created_at = $2) AND (id < $3)))
        Rows Removed by Filter: 3
        Buffers: shared hit=24 read=0
Execution Time: 0.39 ms

One node. The Index Scan walks idx_invoices_org_created_at_id for organization_id = 1, and because the index already holds the rows in created_at desc, id desc order, the Limit halts it after 21 rows and the engine never reads the rest. The cursor’s OR predicate can’t narrow a B-tree, so it rides along as a Filter that skips the few rows newer than the cursor (Rows Removed by Filter: 3). No Sort node at all. Zero disk reads, and 0.39ms, over two hundred times faster.

Read the two plans side by side and two things stand out. The headline metric is the disk reads: read=2304 collapsed to read=0. Before the index, Postgres dragged thousands of pages off disk to find one page of results; after, it walks straight to them in cache. The structural win is the missing node: the Sort is simply gone. The index already keeps the rows in created_at desc, id desc order, so there is nothing left to sort. That’s the whole reason cursor pagination and its composite index belong together: the index doesn’t just speed the lookup, it deletes an entire stage of work.

Notice that this was the loop, run once. One change, the index. One re-run. The predicted node, the Sort, was gone exactly as predicted. That’s the discipline, and the proof the earlier lessons deferred to this one.

Beyond the console

You won’t hand-run EXPLAIN on production all day. A few production-grade tools do this for you, and it’s worth recognizing them by name now so you reach for them later.

auto_explain is a Postgres extension that logs the plan of any query slower than a threshold you set. Set auto_explain.log_min_duration = '500ms', and every slow query records its own plan to the log without you lifting a finger. It’s the production version of this lesson’s manual loop: instead of guessing which query is slow and running EXPLAIN on it, you let the slow ones announce themselves. It’s a server-config concern, so a later lesson covers the setup, but it’s the answer to “how do I do this at scale.”

One tier up, hosted tools like pganalyze and Datadog database monitoring collect plans and trend them over time, so you can see a query degrade across a deploy. And the consoles where you’ll actually run the commands from this lesson are Drizzle Studio and the Neon SQL editor, both covered when you set them up.

External resources

Postgres docs — Using EXPLAIN

postgresql.org

The canonical reference for every plan node and EXPLAIN option.

pev2 — Postgres EXPLAIN visualizer

explain.dalibo.com

Paste a plan, get a navigable visual tree — invaluable for large plans.

Drizzle docs — Magic sql operator

orm.drizzle.team

How to run raw EXPLAIN with db.execute and the sql template.