Chapter 89, Lesson 1

When component tests earn their weight

A decision gate for React Testing Library component tests, which stay off by default on a 2026 Next.js SaaS and earn their place only when a named trigger fires.

A feature lands, and three new components ship with it: a settings panel, a confirmation dialog, a status badge. The team’s reflex fires the way it was trained to: a new component means a new test. Nobody decides to do this. It’s muscle memory, the test pyramid absorbed from a hundred blog posts.

Two months later, the suite has two hundred component tests. Roughly eighty percent of them re-test presentational primitives: one test confirms that <Card> renders its children, another that <Button> forwards an onClick. The watch loop that used to finish before you looked up now takes long enough that you tab away. And the component suite has caught zero production bugs in those two months. The real bugs shipped through the data layer and the auth checks, where no component test was looking. Worse, the team has quietly learned to rubber-stamp every component-test diff in review, because reading them has never once paid off. The suite is now training people to skip review.

That is the failure mode this lesson exists to prevent, and the fix is to invert the reflex. For a Next.js SaaS in 2026, React Testing Library, or RTL, is off by default. You don’t write a component test because a component exists. You write one when the component crosses a specific, named threshold, and outside those thresholds the experienced move is to delete the test, not add it.

This framing should feel familiar. It’s the same shape you met with TanStack Query and Zustand: the platform gives you a default, and you reach for the heavier tool only when the default’s limit is crossed. You’ll meet it again with Playwright in the next chapter. It’s “trigger before tool” applied to your test suite. It’s also a zoom-in on something you already have a map for, the honeycomb suite shape from earlier in this unit, with its wide unit base, its integration center of gravity at the seams, and its thin component and end-to-end bands gated by trigger. This lesson lives entirely inside that thin component band.

You’ll write no RTL code here. The next three lessons cover the wiring, the query ladder, and the catalog. What you’ll leave with is a five-second mental gate you run before you start typing a test file, one that tells you whether this component is worth its maintenance cost or whether the experienced move is to walk away.

Start with the architecture, because the default falls straight out of it. A Next.js 16 SaaS renders most of its tree on the server. That single fact reshapes where your bugs can possibly be, and a test is only worth writing where bugs actually land. The picture comes together one layer at a time.

Take a Server Component that fetches a page of invoices and renders them as a list. Where can it break? The query could be wrong. The auth check could let the wrong tenant through. The currency formatter could mangle a total. Every one of those bugs lives inside the data layer, at the seam, and the seam is exactly what your integration tests cover. You built those last chapter. The rendered output, the actual <li> elements, is orchestrated by the framework. If you point RTL at that markup and assert the list has ten rows, you’re not testing your code, you’re testing that Next.js can render an array. That test passes forever and tells you nothing.

Now look at what’s left, the Client Components. On this stack they’re mostly thin. A form wired to a Server Action is already covered by the action’s integration test, which checks what happens when it runs. A modal, a tooltip, a navigation menu: their behavior is either trivial or it’s the kind of thing you’d notice the first time you clicked it by hand. There’s not much bug density there to justify a standing automated test.

Put those two together and the default writes itself: write no component test. The rendered surface is mostly not yours to test, and the interactive surface is mostly too thin to be worth it. This is the honeycomb’s “bug density follows the architecture” rule, specialized to UI: write the test where the bugs land, and for this stack, that’s rarely the DOM.

One more point belongs right here, because it runs through this entire chapter: the thing you test with RTL is always a Client Component or a pure presentational component, never a Server Component. Async Server Components are not an RTL surface in 2026. The framework’s support for rendering them in a test is fragile, so testing them this way means fighting the tool the whole way down. Server Components get their coverage at the seam, and where they sit on a money path, in the end-to-end tests of the next chapter. Keep that in mind; we’ll keep coming back to it.

Figure (three bands, left to right):

  • Server Components: fetch + render, framework-orchestrated. Bugs land at the seam (covered last chapter).
  • Thin Client Components: forms, modals, tooltips, nav. Mostly trivial, or covered by the action test.
  • Stateful Client Components: state is the behavior. RTL earns its weight here.

Across a typical page, the surface where component tests earn their weight is a thin slice: most bugs land at the seam, and most of what's left is too thin to test.

The diagram above is the whole argument in one strip. The wide band on the left is where most of your page lives and where you don’t point RTL. The narrow accented band on the right is the exception this lesson is about, the slice where state is the behavior and a component test finally earns its keep. The rest of the chapter is about recognizing that slice.

So what crosses the threshold? A component earns a test when its bug density justifies the maintenance cost: when the chance of a real bug, times the damage that bug does, outweighs the running cost of keeping a test green forever. There are three places that math reliably tips, plus a fourth that hides inside the other three. Learn them as a set, because they’re the entire reason a component test ever gets written.

First, hold the cost shape in your head, because it’s the cost side of that math. A unit test over your /lib helpers runs in about five milliseconds. An integration test with a real database and a rollback runs in twenty to eighty milliseconds. A component test runs in a hundred to three hundred milliseconds: it boots jsdom, renders the tree, and runs queries against it. End-to-end tests run in seconds. Component tests sit in the awkward middle. Twenty well-chosen ones cost you nothing you’ll notice; two hundred double your watch loop. The discipline isn’t in any single test, it’s in the count, and the count is governed entirely by the three triggers below.
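The arithmetic behind the count is quick to run. The sketch below is a back-of-the-envelope estimate, not a benchmark: it takes the hundred-to-three-hundred-millisecond range quoted above and multiplies by the number of tests. The function name and the assumption that tests run serially, with no caching, are illustrative.

```typescript
// Per-test cost range for a component test, in milliseconds (the range quoted in the text).
const COMPONENT_TEST_MS = { low: 100, high: 300 };

// Rough serial runtime of a component suite, in seconds.
// Assumption: no parallelism and no caching; a real runner will usually do better.
function componentSuiteSeconds(testCount: number): { low: number; high: number } {
  return {
    low: (testCount * COMPONENT_TEST_MS.low) / 1000,
    high: (testCount * COMPONENT_TEST_MS.high) / 1000,
  };
}

console.log(componentSuiteSeconds(20));  // { low: 2, high: 6 }
console.log(componentSuiteSeconds(200)); // { low: 20, high: 60 }
```

Twenty tests stay within a few seconds; two hundred push the same loop toward a minute, which is the point where people tab away.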

Trigger 1: a shared library component

A <DataTable>, a <Combobox>, a <DatePicker>: a primitive consumed in thirty different places across the app. Here the cost math doesn’t just tip, it flips. One bug in a shared primitive doesn’t ship one regression, it ships thirty, scattered across every screen that imports it, and you find them one angry bug report at a time. A single test that catches that bug before it merges pays for itself many times over, because it’s standing guard over thirty call sites at once.
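To see why the call-site count flips the math, it helps to write the inequality down. This is a sketch of the reasoning, not a formula to compute in practice: the field names are hypothetical, and the numbers are invented purely so the two cases can be compared in the same unit.

```typescript
interface TestCandidate {
  bugProbability: number;    // chance of a real regression over the test's lifetime, 0..1
  damagePerCallSite: number; // cost of one shipped regression, in any consistent unit
  callSites: number;         // how many places consume the component
  maintenanceCost: number;   // cost of keeping the test green, in the same unit
}

// Chance of a bug, times the damage it does, weighed against the running cost of the test.
function earnsItsWeight(c: TestCandidate): boolean {
  const expectedDamage = c.bugProbability * c.damagePerCallSite * c.callSites;
  return expectedDamage > c.maintenanceCost;
}

// Invented numbers: identical risk per call site, only the number of consumers differs.
const oneOffPanel: TestCandidate = { bugProbability: 0.1, damagePerCallSite: 2, callSites: 1, maintenanceCost: 3 };
const sharedDataTable: TestCandidate = { ...oneOffPanel, callSites: 30 };

console.log(earnsItsWeight(oneOffPanel));     // false
console.log(earnsItsWeight(sharedDataTable)); // true
```

Nothing about the component got riskier between the two cases. The same bug simply lands in thirty places instead of one.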

You’ve built components exactly like this already: the form field set from the forms unit, used on every form in the app; the toast surface from the notifications unit; the date-range picker from the internationalization unit, leaned on by reports, invoices, and filters alike. These are shared infrastructure, and a bug in one of them is a bad day across the whole product. Reach for RTL here.

Trigger 2: a complex stateful interactive component

Some components are their state transitions. A multi-step form that reveals and hides fields depending on earlier answers, like the subscribe form from the forms unit, where choosing a plan changes which fields appear. A virtualized list with multi-select and keyboard navigation. A command palette with async search. In all of these, the interesting behavior isn’t “does it render,” it’s “click here, then type there, then press Enter, and does the right branch fire?”

Two things make this a trigger. First, walking the state graph by hand is unreliable: you’ll click the happy path and forget the branch where the user goes back a step and changes an answer. Second, the integration test at the action’s seam never sees this logic. The action receives whatever the form finally submits; it has no idea how many branches the form took to assemble that payload. The branch logic lives in the component or nowhere.

You want a sharper line than “complex,” because “complex” is how every component looks to the person who just wrote it. Here’s the rule of thumb: reach when the component’s state graph has more than three distinct states, or when a single user flow spans more than two interactions. Under that, a manual click and an integration test at the seam are enough. Over it, the branches multiply faster than you can hold them in your head, and a test starts earning its weight.
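Written as a predicate, the rule of thumb is two comparisons. The sketch below is plain TypeScript with hypothetical field names; how you count states and interactions for a given component is still a judgment call the code can't make for you.

```typescript
interface ComponentShape {
  distinctStates: number;          // states in the component's state graph
  longestFlowInteractions: number; // interactions in the longest single user flow
}

// More than three distinct states, or a single flow spanning more than two interactions.
function crossesComplexityTrigger(shape: ComponentShape): boolean {
  return shape.distinctStates > 3 || shape.longestFlowInteractions > 2;
}

// A tooltip: open or closed, one hover.
console.log(crossesComplexityTrigger({ distinctStates: 2, longestFlowInteractions: 1 })); // false
// A multi-step form with conditional fields (counts are illustrative).
console.log(crossesComplexityTrigger({ distinctStates: 5, longestFlowInteractions: 4 })); // true
// Exactly at the boundary: three states, two interactions. Still under.
console.log(crossesComplexityTrigger({ distinctStates: 3, longestFlowInteractions: 2 })); // false
```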

Trigger 3: a critical UX path

Some behaviors are too granular for an end-to-end test to bother with, yet too consequential to leave to a human remembering to check them. The stakes here are real, so consider two examples you’ve built or will build, and what’s on the line in each.

The cookie consent gate from the security unit. Its whole job is to make sure analytics don’t fire until the user has consented. A one-line bug, a flipped boolean or a default that leans the wrong way, means you’re shipping tracking without consent, which is direct GDPR exposure. “I clicked Accept and it stopped showing” is not a check you want riding on whoever last touched the file.
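The "default that leans the wrong way" bug is easy to picture once it's reduced to one line. The sketch below is not the consent gate from the security unit; the type and function names are hypothetical, and it only models the decision of whether analytics may load.

```typescript
type ConsentState = "granted" | "denied" | "unset";

// Fail closed: analytics may load only on an explicit "granted".
function analyticsAllowed(consent: ConsentState): boolean {
  return consent === "granted";
}

// The one-line bug: the check leans the wrong way for a user who hasn't answered yet.
function analyticsAllowedBuggy(consent: ConsentState): boolean {
  return consent !== "denied"; // "unset" slips through, so tracking fires before consent
}

console.log(analyticsAllowed("unset"));      // false
console.log(analyticsAllowedBuggy("unset")); // true
```

Both versions behave identically once the user clicks Accept or Decline, which is why a manual click never catches the difference. Only a check against the unanswered state does.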

The checkout summary line from the billing unit. This is the row the user reads right before they commit money: the total, the tax, the trial-end date. An end-to-end test will walk the happy path of pick a plan, see a total, reach Stripe. What it won’t do is exercise the seven content variants this one component has to get right: coupon applied versus not, trial versus no trial, one seat versus five, a tax-exclusive locale where the total has to show tax separately. Each variant is a different sentence in front of the user’s wallet, and each is exactly an RTL assertion: given these props, the user reads this total. Reach here.
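"Given these props, the user reads this total" has a natural table-driven shape. The sketch below stands in for the component with a plain function from props to text, so it runs without RTL. Everything in it is an assumption made for illustration: the prop names, the wording of the sentence, and the hard-coded dollar formatting. It does not reproduce the real component or its seven variants.

```typescript
interface SummaryProps {
  unitPriceCents: number;
  seats: number;
  couponPercentOff?: number;   // e.g. 20 for 20% off
  trialEndsOn?: string;        // already formatted for display
  taxCents: number;
  taxShownSeparately: boolean; // true for a tax-exclusive locale
}

const usd = (cents: number): string => `$${(cents / 100).toFixed(2)}`;

// Hypothetical stand-in for what the summary line renders.
function summaryLine(p: SummaryProps): string {
  const subtotal = p.unitPriceCents * p.seats;
  const discount = Math.round((subtotal * (p.couponPercentOff ?? 0)) / 100);
  const net = subtotal - discount;
  const total = p.taxShownSeparately
    ? `${usd(net)} + ${usd(p.taxCents)} tax`
    : `${usd(net + p.taxCents)} incl. tax`;
  return p.trialEndsOn ? `${total}, first charged when your trial ends on ${p.trialEndsOn}` : total;
}

// Each row is one variant: the props in, the sentence the user should read.
const cases: Array<[SummaryProps, string]> = [
  [{ unitPriceCents: 1000, seats: 1, taxCents: 200, taxShownSeparately: false }, "$12.00 incl. tax"],
  [
    { unitPriceCents: 1000, seats: 5, couponPercentOff: 20, trialEndsOn: "1 March", taxCents: 800, taxShownSeparately: true },
    "$40.00 + $8.00 tax, first charged when your trial ends on 1 March",
  ],
];

for (const [props, expected] of cases) {
  console.log(summaryLine(props) === expected); // true for both rows
}
```

An end-to-end test walks one of these rows. The table is where the other variants get checked, one cheap row each.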

The implicit fourth: an accessibility-sensitive surface

There’s a fourth trigger, lighter than the others, and it has one property worth pausing on. Take a high-traffic surface: a login form, a primary call-to-action, the navigation header. Now imagine you write a test that finds the submit button by its role and its accessible name, the way a screen reader finds it. That test fails the instant the accessible name disappears.

Here’s the connection: a primary control losing its accessible name is a real bug, because a screen-reader user can no longer find the submit button. So the accessibility regression and the test failure are the same event, and the test catches the bug for free, as a side effect of how it was written. That’s why the query ladder you’ll learn in a couple of lessons doubles as an accessibility audit. Keep that link in mind; the ladder comes later. On a surface where an accessibility regression is a genuine cost, that’s a trigger.

Now apply the set. Each item below is a component you’d plausibly meet in this codebase. Drop it into the bucket that fits: does it cross a trigger, or is the experienced move to leave it alone?

Each item is a component in a typical 2026 SaaS. Decide whether it crosses one of the four triggers, or whether the experienced move is to skip the test.

Buckets:

  • Reach for RTL: a trigger is met
  • Don't write the test: no trigger, let the seam cover it

Items:

  • A <Card> that renders whatever children you pass it, with no state
  • The cookie consent banner that gates analytics
  • A Server Component that lists this org’s invoices
  • The shared <DateRangePicker> used in Reports, Invoices, and Filters
  • A multi-step subscribe form that shows a seat-count field only on the Pro plan
  • A <Section> wrapper that only adds vertical spacing
  • A button whose only job is to call a Server Action that already has its own seam test
  • The checkout summary line that shows the total, tax, and trial-end date

When the experienced move is to delete the test

The triggers are only half the skill. The other half, which carries equal weight because it’s what keeps the suite honest, is knowing when a component test is a liability and the right move is to delete it. An experienced engineer deletes more component tests than they write. Each line below is a deletion you should feel comfortable making, with the reason it’s a deletion.

  • A purely presentational component, such as <Card>, <Section>, or <PageHeader>, with no state and no branching content. There’s nothing to get wrong that a glance wouldn’t catch, so the bug density never justifies the maintenance cost.
  • A Server Component, async or otherwise. This is the wrong surface: the framework owns the rendering, RTL’s async-component support is fragile in 2026, and its bugs are at the seam.
  • Anything already fully covered by a Server Action’s integration test. Mocking the action and asserting it had some effect just re-runs the seam test from a worse vantage point, at higher cost.
  • Anything on the end-to-end money path. If Playwright already walks it in the next chapter, a component test over the same flow is duplicate coverage at higher maintenance.
  • The framework-owned surface, like <Link>, <Image>, and route-segment behavior. You don’t test Next.js. That’s the framework team’s job, and they’re better at it.
  • Library internals, like a rich-text editor’s commands or a virtualization library’s windowing math. The library owns those. If they break, that’s an upstream bug, not yours to assert on.

These share a shape: in every case, someone else’s test or someone else’s code already owns the risk. A component test there isn’t extra safety, it’s a second lock on a door that’s already locked, and you’re the one who has to keep oiling it.

What lives in the component vs. what lives at the seam

That last point deserves its own precise picture, because getting it wrong is the most common beginner mistake in this whole area: mocking a Server Action and asserting it wrote a row to the database. To avoid that mistake, you need a crisp line between what a component test can see and what only the seam test can see. The two layers catch genuinely different bugs, and they’re meant to compose.

The tabs below put them side by side. The left tab is what a component test catches that an integration test can’t reach. The right tab is what the integration test catches that a component test can’t reach. Read them as two non-overlapping jobs.

A component test catches these; an integration test never sees them:

  • A render branch that only appears because of client state
  • The keyboard navigation order through a widget
  • Focus returning to the trigger after a modal closes
  • An error message rendered from a useActionState reducer
  • An optimistic update reconciling against the server’s real answer
  • The accessible name on a dynamic button, such as "Delete invoice INV-001", computed from props

All observable from outside the component: they live in the rendered DOM, where only a component test is looking.

Look at the contrast for a second. Everything in the left tab is something a user could observe: it’s on screen, or it’s announced to assistive tech, or it’s the focus moving. Everything in the right tab is invisible in the DOM; you could render the component a thousand times and never see whether a row was written or a tenant filter held. That’s the dividing line, and it’s not a style preference. It’s a hard boundary about what each kind of test can physically reach.

This gives the rule that prevents the beginner mistake: a component test that asserts on a Server Action’s database effect is mocking too deep. It has reached past its own layer and into the integration test’s job, and it does that job worse. Instead, the two tests compose. The component test trusts the action’s contract: “I called it with the right arguments, and I reacted correctly to the result it gave me.” The action test trusts no client: it checks that the row was really written, the tenant was really scoped, and the audit entry really fired. Each owns one side, and neither reaches across.
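The component side of that contract can be sketched without React at all. Below, a hypothetical client-side handler calls an action and maps its result to what the UI shows, and a hand-rolled stub stands in for the Server Action. All of the names are invented for illustration, and the stub is deliberately dumb: it records calls and returns a canned result, and it never pretends to be a database.

```typescript
type ActionResult = { ok: true } | { ok: false; error: string };
type RenameInput = { invoiceId: string; name: string };
type RenameInvoiceAction = (input: RenameInput) => Promise<ActionResult>;
type ViewState = { status: "saved" } | { status: "error"; message: string };

// Hypothetical client-side logic: call the action, then react to whatever it returns.
async function submitRename(action: RenameInvoiceAction, invoiceId: string, rawName: string): Promise<ViewState> {
  const result = await action({ invoiceId, name: rawName.trim() });
  return result.ok ? { status: "saved" } : { status: "error", message: result.error };
}

// Stub for the action: records the arguments it was called with, returns a canned result.
function stubAction(result: ActionResult) {
  const calls: RenameInput[] = [];
  const action: RenameInvoiceAction = async (input) => {
    calls.push(input);
    return result;
  };
  return { action, calls };
}

const { action, calls } = stubAction({ ok: false, error: "Name already in use" });
submitRename(action, "INV-001", "  Q3 retainer ").then((view) => {
  // The two things the component side owns: the arguments it sent, and how it reacted.
  console.log(calls[0]); // { invoiceId: 'INV-001', name: 'Q3 retainer' }
  console.log(view);     // { status: 'error', message: 'Name already in use' }
});
```

Notice what's absent: no row count, no tenant check, no audit entry. Those assertions have nothing to run against here, which is the boundary doing its job.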

The five-second gate before you write the test

The triggers and anti-triggers combine into one runnable procedure: a short, ordered checklist you run before your fingers hit the keyboard. The order matters in a specific way: you ask the cheap disqualifiers first. Most of the time one of the first four questions stops you before you ever reach the judgment call, which is exactly why this takes five seconds and not five minutes.

  1. Is this a Server Component? Stop, wrong surface.
  2. Is this already covered by a Server Action integration test? Stop, duplicate.
  3. Is this a one-off, non-shared presentational component? Stop, it isn’t earning its weight.
  4. Is this on the money path Playwright covers? Stop, duplicate at higher cost.
  5. Is this a shared library, a complex state machine, a critical UX path, or an accessibility-sensitive surface? Reach.

Reading the list isn’t the same as running it, and the whole point is that you run it. Walk the gate below for a real candidate. Answer each question and watch where it lands you. Most paths stop you cold long before the last question, and that’s the lesson: ask the cheap disqualifiers first.
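The same walk can be written down as a small function, which makes the ordering visible: four early returns, then one judgment call. This is a sketch in plain TypeScript with hypothetical field names. The final fallthrough, where no trigger fires and no test gets written, is this sketch's reading of "off by default."

```typescript
type Trigger = "shared-library" | "complex-state" | "critical-ux" | "a11y-sensitive";

interface Candidate {
  isServerComponent: boolean;
  coveredByActionIntegrationTest: boolean;
  isOneOffPresentational: boolean;
  coveredByPlaywrightMoneyPath: boolean;
  triggers: Trigger[];
}

interface Verdict {
  write: boolean;
  reason: string;
}

// Cheap disqualifiers first; the judgment call only runs if none of them stops you.
function gate(c: Candidate): Verdict {
  if (c.isServerComponent) return { write: false, reason: "wrong surface" };
  if (c.coveredByActionIntegrationTest) return { write: false, reason: "duplicate of the seam test" };
  if (c.isOneOffPresentational) return { write: false, reason: "not earning its weight" };
  if (c.coveredByPlaywrightMoneyPath) return { write: false, reason: "duplicate at higher cost" };
  if (c.triggers.length > 0) return { write: true, reason: `trigger: ${c.triggers.join(", ")}` };
  return { write: false, reason: "no trigger fired" };
}

const none: Candidate = {
  isServerComponent: false,
  coveredByActionIntegrationTest: false,
  isOneOffPresentational: false,
  coveredByPlaywrightMoneyPath: false,
  triggers: [],
};

console.log(gate({ ...none, isServerComponent: true }));      // { write: false, reason: 'wrong surface' }
console.log(gate({ ...none, triggers: ["shared-library"] })); // { write: true, reason: 'trigger: shared-library' }
```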


Notice how rarely you reached the last question. That’s the gate working as intended. And it really is a five-second mental gate, not a process document you fill out: it runs in the time it takes to start typing the test file’s name. Once the order is second nature, you’ll find yourself skipping past the disqualifiers without consciously listing them.

Here’s the honest closer, and it’s permission as much as advice. A small team shipping fast, on a stack where most of the UI is server-rendered and most of the behavior lives at the seam, is correct to write zero RTL tests in year one. That’s not laziness; it’s the right call.

This is the same stance the next chapter takes on end-to-end tests: zero or a handful for the money paths, nothing in the awkward middle. The integration suite catches the seam bugs, and a manual click catches the obvious ones. The trigger language you just learned isn’t a backlog to burn down, it’s what tells the team when to start. The day a <DataTable> gets its thirtieth consumer, or the checkout summary grows its seventh content variant, a trigger fires and you reach. Until then, “we don’t have component tests” is a defensible sentence, not a confession.

The rest of this chapter covers the jsdom setup, the query ladder, and the catalog of components that actually earned their tests. That’s what you reach for when a trigger fires, not an obligation that’s been sitting there unmet. Five well-chosen tests beat fifty box-ticking ones, every time.

To make the box-ticking shape concrete, here’s the exact test the gate is designed to prevent. It’s the kind that pads a coverage number and protects nothing:

coverage theatre
it('renders', () => {
  render(<Card />);
});

Look at what it actually asserts: nothing. There’s no expect. The only way this test fails is if rendering <Card /> throws, which makes it a smoke test for a syntax error dressed up as a behavior test. It adds a line to the coverage report and a hundred-odd milliseconds to every watch loop, in exchange for catching a class of bug your type-checker already caught. That’s coverage theatre, and the gate’s entire job is to stop you from writing it.

Run the judgments one more time before you go.

Each claim is about when a component test earns its weight on a 2026 Next.js SaaS. Mark each statement True or False.

  1. For a 2026 Next.js SaaS, component tests are on by default, and you delete the ones that don’t earn their weight.
     False. Inverted. They’re off by default: you add one only when a named trigger fires, not subtract from a default-on suite.

  2. An async Server Component that lists invoices is a good RTL target.
     False. Wrong surface. Server Components aren’t an RTL surface in 2026; their bugs are caught at the seam and, on a money path, end to end.

  3. A <DateRangePicker> shared across Reports, Invoices, and Filters earns a component test.
     True. Shared-library trigger. One bug ships a regression to every consumer, so the test pays for itself across all of them.

  4. A component test that asserts the Server Action wrote a row to the database is correctly scoped.
     False. Mocking too deep. The database effect is the seam test’s job; the component test should only assert it called the action and reacted to the result.

  5. A small SaaS shipping zero component tests in year one is making a defensible call.
     True. Year-one zero is the honest default. The seam suite and manual clicks cover the risk; the triggers tell the team when to start, not that they’re already behind.

The thesis under everything in this chapter, and the through-line of the query ladder you’ll learn next, is one sentence from the Testing Library docs. It’s worth reading before the mechanics arrive. The other two below sharpen this lesson’s two load-bearing claims: write few, well-chosen tests, and keep Server Components off the RTL surface.