Chapter 31, Lesson 2

Streaming a page in chunks

How the App Router streams an HTML response in chunks over Suspense boundaries, and the parallel-fetch patterns that keep the first byte fast.

Picture the invoices dashboard from the last lesson. It renders three independent widgets stacked down the page: a ProfileCard showing who’s signed in, an analytics chart summarising the month, and an ActivityFeed of recent events. Each one reads from the database, and the reads are wildly uneven. The profile is a single indexed row, so call it ~10ms. The chart aggregates a few hundred rows, around 300ms. The activity feed joins across three tables and sorts, around 800ms.

When the request comes in, what does the user see, and when?

With the classic server-rendering model, the answer is costly. The server has to finish rendering the whole page before it can send a single byte, and rendering the page means resolving every await on it. So the server waits for the profile, the chart, and the feed, and only once the slowest of those (the ~800ms feed) comes back can it send anything at all. The user stares at a blank tab for 800ms to see a profile card that was ready in 10. The slowest query on the page sets the time-to-first-byte for everything on it.

That coupling is the problem this lesson solves. You already know from the previous lesson where to draw your <Suspense> boundaries: around the smallest piece of UI that loads as one concept. This lesson shows you the machinery that makes those boundaries pay off, which turns boundary placement from a UX decision into a performance decision as well. By the end you’ll be able to read why a page is slow, name the two code shapes that quietly defeat streaming, and confirm from the network panel that your page actually streamed.

What streaming sends down the wire

Here is the whole idea in one sentence: instead of building a complete HTML document and sending it at the end, the server opens the response and writes it in pieces over time, sending the parts that are ready while it’s still working on the rest.

That is streaming. Here is the sequence the server actually runs, in order.

First, it renders everything that is not behind a Suspense boundary: the static shell. That’s the <html> and <body>, the Header, the layout chrome, anything that doesn’t await slow data. This part is fast because nothing holds it up.

Second, it writes the first chunk. That chunk is the shell, plus, in the exact spot where each suspended boundary lives, that boundary’s fallback. So the activity feed isn’t in this chunk; its skeleton is, sitting in the feed’s place. The browser receives this chunk and paints it immediately. The user sees the full page layout, with skeletons standing in for the slow regions, within milliseconds.

Third, as each suspended boundary finishes resolving on the server, the server writes a follow-up chunk for it. That chunk carries two things: the boundary’s now-resolved HTML, and a tiny inline <script> whose only job is to find the fallback’s placeholder in the DOM and swap the real content into its slot. The browser runs the script, the skeleton disappears, and the real widget appears in place. The rest of the page doesn’t flicker; only that one region updates.

Fourth, the connection stays open the entire time. The server holds the response open until every boundary has settled and its chunk has been written, then it closes.

A few things about this sequence are worth stating directly, because they are where intuition tends to go wrong.

There is one response, not many. The browser made a single request for the document and is receiving a single response; it just arrives in installments. This is not the browser polling, not separate fetches per widget, not client-side data loading. It’s one HTTP response written over time.

The fallbacks ship with the shell. The user doesn’t wait to see something: the page looks complete, with loading states, from the first paint. The slow data fills in, and the layout never appears empty.

Boundaries stream in whatever order they finish, not source order. If the chart resolves before the feed, the chart’s chunk goes down first, regardless of which appears higher in your JSX. Each boundary is independent.

And the swap is a DOM patch, not a re-fetch. The follow-up chunk already contains the rendered HTML, and the inline script just moves it into place. The browser never asks the server for anything a second time.
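
The four steps can be modelled without React or Next.js at all. The sketch below is a simplified stand-in, not the framework's real renderer or wire format: Boundary, streamPage, the slot- and done- ids, and the swap(...) call are all hypothetical names chosen for illustration. It writes the shell with its fallbacks first, then one follow-up chunk per boundary in completion order.

```typescript
// Hypothetical model of one suspended boundary (not a React or Next.js type).
type Boundary = {
  id: string;
  fallback: string; // skeleton HTML that ships with the shell
  content: Promise<string>; // resolved HTML, available later
};

// Writes every chunk of ONE response through `write`, in wire order.
async function streamPage(
  shell: string,
  boundaries: Boundary[],
  write: (chunk: string) => void,
): Promise<void> {
  // Steps 1 and 2: the shell, with each boundary's fallback in its slot.
  const slots = boundaries
    .map((b) => `<div id="slot-${b.id}">${b.fallback}</div>`)
    .join("");
  write(shell + slots);

  // Step 3: one follow-up chunk per boundary, in completion order.
  const pending = new Map<string, Promise<{ id: string; html: string }>>();
  for (const b of boundaries) {
    pending.set(b.id, b.content.then((html) => ({ id: b.id, html })));
  }
  while (pending.size > 0) {
    const done = await Promise.race(Array.from(pending.values()));
    pending.delete(done.id);
    // Resolved HTML plus a tiny script that swaps it into the slot.
    // `swap` is an illustrative placeholder, not React's actual helper.
    write(
      `<template id="done-${done.id}">${done.html}</template>` +
        `<script>swap("${done.id}")</script>`,
    );
  }
  // Step 4: returning here models closing the response.
}
```

If the feed is listed first in the array but resolves after the profile, the profile's chunk is still written first: the order follows completion, not position.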

The following trace animates exactly this for the dashboard. Drag the scrubber through the three phases and watch the order of events: the shell and skeletons appear together, then each widget streams into its own slot.

Streaming the invoices dashboard

Watch the stream phase a second time: the two widgets don’t arrive together. Each one’s chunk is written the instant its query comes back, independent of the other. That independence is the entire payoff, and the next section is about not throwing it away.

Before moving on, here is a quick drill to lock in the order.

A request hits the dashboard below. The profile read takes ~10ms and the activity read ~800ms. Drag the events into the order they happen on the wire, then press Check.

import { Suspense } from "react";

export default function DashboardPage() {
  return (
    <main>
      <Header />
      <Suspense fallback={<ProfileSkeleton />}>
        <ProfileCard />
      </Suspense>
      <Suspense fallback={<ActivitySkeleton />}>
        <ActivityFeed />
      </Suspense>
    </main>
  );
}

The server flushes the first chunk: the shell plus both skeletons. The browser paints it.

The profile query resolves; its chunk streams in and patches over the profile skeleton.

The activity query resolves; its chunk streams in and patches over the activity skeleton.

Every boundary has settled, so the server closes the response.

What’s actually moving down that connection isn’t only plain HTML. Alongside the markup, the server writes the RSC payload. You met that wire format when you learned the server/client boundary; here it’s the thing being flushed in chunks. You’ll see it in DevTools later in this lesson, so it’s worth being able to name.

Streaming is the App Router default

There is a question lurking under all of this: how do you turn on streaming?

You don’t. There is no export, no config flag, no opt-in.

A page.tsx in the App Router does not return a finished HTML document; it returns a stream. As the renderer walks your tree, every time it hits a suspended boundary it writes the fallback into the output and keeps going rather than stopping to wait. That behaviour is the renderer’s default. The moment you wrap an async child in <Suspense>, that route streams.

A route with zero suspended boundaries still “streams” in the trivial sense: it writes one chunk, because nothing was ever held back, and the response is complete on the first flush. Streaming isn’t a mode you enter; it’s simply what happens when there’s a boundary to flush around.

So the streaming behaviour of a page is decided entirely by where its boundaries are. There’s no separate streaming config to get right, because the boundaries are the config. This is why the previous lesson’s placement decision matters so much: when you chose where to draw a boundary, you were also choosing what streams independently and what waits.

export default function DashboardPage() {
  return (
    <main>
      <Header />
      <Suspense fallback={<ProfileSkeleton />}>
        <ProfileCard />
      </Suspense>
      <Suspense fallback={<ActivitySkeleton />}>
        <ActivityFeed />
      </Suspense>
    </main>
  );
}

No export, no flag: these two boundaries are the entire streaming configuration.

Two independent reads, two boundaries: fetching in parallel

How you arrange your fetches is what decides whether streaming actually buys you anything.

The rule is the one the previous lesson set up, extended one step: each unit of UX owns its own data fetch and its own Suspense boundary. Concretely, you write two boundaries, and inside each one you put an async Server Component that kicks off its own query in its body:

const ProfileCard = async () => {
  const profile = await getProfile();
  return <ProfileWidget profile={profile} />;
};

const ActivityFeed = async () => {
  const activity = await listActivity();
  return <ActivityList items={activity} />;
};

When the server renders the dashboard tree, it reaches ProfileCard, which starts getProfile(), and reaches ActivityFeed, which starts listActivity(). Both queries are now in flight, running concurrently, because nothing made the server wait for the first before it got to the second. Each boundary streams the instant its own query resolves. The profile shows up at ~10ms, the feed at ~800ms, and the user never waited on the feed to see the profile.

Here is the trap, and it is one of the most common latency bugs you will ship. It looks like a perfectly reasonable Server Component, but it is wrong:

const Dashboard = async () => {
  const chart = await getChartTotals();
  const activity = await listActivity();
  return (
    <main>
      <ChartWidget chart={chart} />
      <ActivityList items={activity} />
    </main>
  );
};

Two serial awaits in one component: the second never starts until the first returns.

Read the two awaits carefully. The second one, await listActivity(), does not even begin until the first has resolved, because await pauses the function until its promise settles. So the chart read finishes at 300ms, and only then does the activity read start, finishing at 300 + 800 = 1100ms. And because there’s only one component and one render, nothing streams: the whole thing waits for both reads and ships at the end.

Neither read depends on the other. The activity feed doesn’t need the chart. You paid 1100ms for work that, run together, takes 800. So the habit worth building is this: before you write a second await, ask whether it needs the result of the first. If it doesn’t, the two reads must not run serially.
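
The serial-versus-concurrent difference comes from plain JavaScript await semantics, so it can be checked outside React. In this minimal sketch, fakeRead is a hypothetical stand-in for a database read, the durations are scaled-down placeholders, and chartCard and activityFeed only mimic async Server Components. The log records when each read starts and ends.

```typescript
// Hypothetical stand-in for a database read: logs its start and end.
function fakeRead(log: string[], name: string, ms: number): Promise<string> {
  log.push(`start ${name}`);
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(`end ${name}`);
      resolve(name);
    }, ms);
  });
}

// One function, two serial awaits: the activity read starts only after
// the chart read has settled.
async function serialDashboard(log: string[]): Promise<void> {
  await fakeRead(log, "chart", 30);
  await fakeRead(log, "activity", 80);
}

// Two independent async functions, each owning its own read.
async function chartCard(log: string[]): Promise<string> {
  return fakeRead(log, "chart", 30);
}
async function activityFeed(log: string[]): Promise<string> {
  return fakeRead(log, "activity", 80);
}

// Both functions are called before either result is awaited, so both
// reads are in flight together. Promise.all is used here only so the
// sketch can wait for both to finish.
async function parallelDashboard(log: string[]): Promise<void> {
  await Promise.all([chartCard(log), activityFeed(log)]);
}
```

With an empty log passed in, serialDashboard records start chart, end chart, start activity, end activity. parallelDashboard records start chart, start activity, end chart, end activity: the second read no longer waits for the first.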

The cost is clearer in a picture. The following timeline puts both shapes on the same clock.

Figure: two timelines on a shared axis running from 0 to 1100ms. Top, "Sequential awaits (one component)", latency = sum: the chart read (300ms) is followed by the activity read (800ms), so the reads stack and the first byte lands at 1100ms. Bottom, "Parallel boundaries (two components)", latency = max: the shell and skeletons flush at ~0ms, both reads start together, the chart streams in at 300ms, and the activity feed streams in at 800ms.

The top section is a sum: 300 + 800, and the first byte doesn’t land until the very end. The bottom section is an overlap: the bars start together, the shell flushes at near-zero, and each widget appears when its own bar ends. The reads and the data are identical in both; only the shape of the code changed.

Splitting into two components isn’t the only way to run reads in parallel. If the data is consumed together, Promise.all runs them in parallel inside one component, which is the next section. But splitting into separate boundaries is the right move when each piece can render on its own.

Sequential await (serialized)

const Dashboard = async () => {
  const chart = await getChartTotals();
  const activity = await listActivity();
  return (
    <main>
      <ChartWidget chart={chart} />
      <ActivityList items={activity} />
    </main>
  );
};

One component, two serial awaits: total latency is the sum, and nothing streams. The second read can’t start until the first settles, so 300ms + 800ms = 1100ms, and the single render ships only once both are done.

Two async children, two boundaries (parallel)

const Dashboard = () => {
  return (
    <main>
      <Suspense fallback={<ChartSkeleton />}>
        <ChartCard />
      </Suspense>
      <Suspense fallback={<ActivitySkeleton />}>
        <ActivityFeed />
      </Suspense>
    </main>
  );
};

Each child starts its own query and owns its own boundary: the reads run concurrently, the shell ships now, and each widget streams when it’s ready. ChartCard and ActivityFeed are async children that each await their own read, exactly as the trace played out.

When the data is consumed together: Promise.all

Splitting into separate boundaries is right when each piece renders on its own. But sometimes a single piece of UI needs all of several reads before it can render anything meaningful.

Which one to reach for is decided by how the data is consumed, not by habit.

Consumed together. Picture a summary card that aggregates across the profile, the chart, and the feed: a header that prints “12 invoices, $48k, last activity 3 minutes ago.” This card can’t render half of itself; it needs every read before it shows anything. Here, splitting into three boundaries would be pointless, because there’s only one thing to reveal. So you fetch all three reads inside one component, with one boundary, using Promise.all:

const DashboardSummary = async () => {
  const [profile, chart, activity] = await Promise.all([
    getProfile(),
    getChartTotals(),
    listActivity(),
  ]);

  return <SummaryHeader profile={profile} chart={chart} activity={activity} />;
};

Concurrent reads, one fallback: correct when the summary can’t render until all three are in.

Promise.all starts all three reads at once and waits for the group, so the reads still run concurrently and you get the max(...) cost, not the sum. The difference from the previous section isn’t parallelism, since both shapes are parallel. The difference is the reveal granularity: Promise.all gives you one fallback covering one combined unit, because there’s one thing to reveal.

Consumed adjacently. Picture the three dashboard widgets sitting side by side, each rendering independently. Here you want three boundaries, one per widget, so each reveals on its own schedule and the fast profile doesn’t wait on the slow feed.

The decision comes down to one question: do these reads feed one rendered thing, or several? One thing means Promise.all plus one boundary. Several things means several boundaries. Both shapes run the reads in parallel; what changes between them is how many fallbacks the user sees and when each region fills in. That follows the UX, exactly as the unit-of-UX rule from the previous lesson predicts.
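
The reveal-granularity point reduces to a small calculation. As a rough model (an assumption for illustration: every read starts at time zero and the reads inside one boundary run concurrently), a boundary reveals when the slowest read inside it finishes. revealTimes and the region names are hypothetical; the durations are the ones used throughout this lesson.

```typescript
// Maps each boundary name to the durations (ms) of the reads inside it.
type Boundaries = Record<string, number[]>;

// Under the stated assumptions, a boundary reveals at the max of its reads.
function revealTimes(boundaries: Boundaries): Record<string, number> {
  const result: Record<string, number> = {};
  for (const name of Object.keys(boundaries)) {
    const reads = boundaries[name] ?? [];
    result[name] = reads.reduce((max, ms) => Math.max(max, ms), 0);
  }
  return result;
}

// One combined unit: one fallback, revealed when all three reads are in.
const summary = revealTimes({ summary: [10, 300, 800] });
// summary is { summary: 800 }

// Three independent widgets: three fallbacks, each on its own schedule.
const widgets = revealTimes({ profile: [10], chart: [300], activity: [800] });
// widgets is { profile: 10, chart: 300, activity: 800 }
```

Both shapes finish at 800ms. The second simply lets the profile and chart fill in at 10ms and 300ms instead of waiting for the feed.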

The decision walker below drills this choice. Work through it by asking how the data is consumed, not how slow it is: the question is whether the UI needs all of these reads before it can render anything.

One boundary or several?

What blocks the first byte

There’s a subtler consequence of how the shell is assembled, and it’s where a lot of real-world TTFB is lost.

Everything above every Suspense boundary runs to completion before any chunk flushes. The server cannot send the shell until it has finished rendering the shell, and the shell is, by definition, everything that isn’t behind a boundary. So a slow await in the root layout, or in the page body before the first boundary, isn’t streamed. It’s pure blank-screen time, paid up front, on every request that renders that layout.

Concretely, a 50ms global query in the root layout adds 50ms to TTFB for every single page under it. The same 50ms query inside a Suspense-wrapped widget is invisible to TTFB: it streams in after the shell, while the user is already looking at the page.

This is why a layout is a risky place to put data: its cost is paid by every child route, all the time. So the habit is to keep above-boundary work cheap and fast, such as the auth check and the layout chrome, and push every slow read below a boundary where it can stream.

import type { ReactNode } from "react";

export default async function DashboardLayout({
  children,
}: {
  children: ReactNode;
}) {
  const user = await requireUser();
  const stats = await getDashboardStats();

  return (
    <Shell user={user} stats={stats}>
      {children}
    </Shell>
  );
}

requireUser() gates the whole subtree and must run before the shell, so keep it cheap. The dashboard stats belong below a boundary, not here: getDashboardStats() is a slow read that blocks the first byte for every route under this layout.
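
The cost of that layout can be put in numbers. This is a deliberately simplified model, not a measurement: it assumes awaits above the boundaries run serially, that a boundary's read starts once the shell has rendered, and that network time is zero. RouteCost, firstByteMs, revealMs, and the 5ms auth check are hypothetical; the 50ms query and the 800ms feed are the figures used earlier in this lesson.

```typescript
// Hypothetical cost model for one route, all durations in milliseconds.
type RouteCost = {
  aboveBoundary: number[]; // awaits in the layout or page body, run serially
  boundaries: Record<string, number>; // one read per Suspense boundary
};

// The shell cannot flush until every above-boundary await has settled.
function firstByteMs(route: RouteCost): number {
  return route.aboveBoundary.reduce((total, ms) => total + ms, 0);
}

// Each boundary streams in after the shell, when its own read finishes.
function revealMs(route: RouteCost, name: string): number {
  return firstByteMs(route) + (route.boundaries[name] ?? 0);
}

// A 5ms auth check plus a 50ms stats query, both in the layout.
const statsInLayout: RouteCost = {
  aboveBoundary: [5, 50],
  boundaries: { activity: 800 },
};

// Same queries, with the stats read moved below its own boundary.
const statsInBoundary: RouteCost = {
  aboveBoundary: [5],
  boundaries: { stats: 50, activity: 800 },
};
```

In this model the first byte moves from 55ms to 5ms, while the stats still appear at 55ms. The work is the same, but the user is already looking at the shell while it happens.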

Look back at the trace from the start of the lesson: the server-render phase is the “before first byte” window. Anything awaiting in that phase that isn’t behind a boundary stalls the shell flush. That’s exactly why the trace refuses to render an awaiting node outside a <Suspense> and instead shows a “needs <Suspense>” error. In the App Router, a slow await above every boundary always costs you blank-screen time.

Confirming it streamed: reading the network panel

Your render order tells you what you intended to stream, not what actually went out on the wire. To know for sure, read the response in the browser rather than inferring it from the code.

Open DevTools, go to the Network panel, and select the document request, the one for the page URL itself. Watch the response arrive over time. The initial chunk carries the shell markup and the fallback HTML; subsequent chunks carry each boundary’s resolved HTML and its swap script. The response timing shows the connection held open across the boundary resolutions rather than completing in one shot.

The concrete tells to look for:

The document response shows a growing, streamed body rather than a single atomic payload that lands all at once.
The “waiting” versus “content download” split on the document request is unusually long on the download side, because the download genuinely spans the time the boundaries take to settle, not just the network transfer.

The simplest test is one question: did the first chunk arrive while later chunks were still in flight? If the whole body lands at once at the very end, streaming didn’t happen, and the rest of this section is about the most common reason why, even when your code is perfect.
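
The same question can be asked from a script instead of DevTools. The sketch below is one possible approach, not an official diagnostic: timeChunks and looksStreamed are hypothetical helpers, and the 100ms threshold is an arbitrary assumption. It records when each chunk of a response body arrives. Chunk boundaries seen by a client do not map one-to-one onto server flushes, so treat the result as a hint rather than proof.

```typescript
// Structural type matching the reader returned by `response.body.getReader()`.
type ChunkReader = {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
};

// Records the arrival time (ms since reading began) of each chunk.
// `now` is injectable so the timing can be controlled in tests.
async function timeChunks(
  reader: ChunkReader,
  now: () => number = Date.now,
): Promise<number[]> {
  const start = now();
  const arrivals: number[] = [];
  for (;;) {
    const { done } = await reader.read();
    if (done) break;
    arrivals.push(now() - start);
  }
  return arrivals;
}

// Heuristic: more than one chunk, with a real gap between first and last.
function looksStreamed(arrivals: number[], minGapMs = 100): boolean {
  if (arrivals.length < 2) return false;
  return Math.max(...arrivals) - Math.min(...arrivals) >= minGapMs;
}
```

Where a global fetch is available, a reader for the page can be obtained with fetch(url) followed by response.body.getReader(), with url standing in for your own page address. A single arrival, or several arrivals bunched together at the end, points to buffering somewhere in the path.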

What’s actually traveling here is ordinary HTTP: chunked transfer encoding on HTTP/1.1, or DATA frames on HTTP/2. There is no special infrastructure to set up, and Vercel and a plain Node server both stream out of the box.

But here is the production gotcha that costs people an afternoon: something in the network path can buffer the whole response before forwarding it, and that silently collapses streaming back into one-shot delivery. Your code is correct. The boundaries are right. The user still waits for everything, because a piece of infrastructure between your server and the browser held the chunks until the response was complete and then sent them as one.

Two culprits worth naming:

A reverse proxy or CDN tier that buffers by default, such as Nginx, Traefik, or an application load balancer. The fix is proxy-side (for example, Nginx’s proxy_buffering off, or the X-Accel-Buffering: no response header that tells it not to buffer this response).
Response compression that buffers content before it flushes. A compression layer that waits to see more of the body before it emits anything defeats the chunk-by-chunk write.

You aren’t going to configure Nginx in this lesson, since deployment and self-hosting config is out of scope, so the goal is just to know where to look. When someone says “streaming works locally but not in production,” this is the first thing to check, and the diagnostic is the same sniff test: if the Network panel shows one large response landing at the end instead of incremental chunks, something in the path is buffering.

Streaming is HTTP, not a socket

There’s one conflation to clear up before it takes root, because it’s a natural one.

Streaming the RSC payload is HTTP response streaming: one response, written in chunks. It is not Server-Sent Events, and it is not WebSockets. It flows server→browser exactly once, for one render, and the connection closes when the page is done. It is not a live channel, so it cannot push you an update after the page has settled.

So if your product needs real-time push, such as notifications appearing, a chat message arriving, or presence dots going green, page streaming is the wrong tool. The SaaS answer for live push is a dedicated channel: a hosted service like Pusher or Ably, or your own Server-Sent Events route handler. That’s a different concern entirely and out of scope here. The point to leave with is that page streaming and real-time push are different tools.

Diagnose what defeats streaming

The durable skill from this lesson isn’t writing a <Suspense>, which you can do already. It’s spotting the shapes that quietly defeat streaming in code that looks completely fine on the page. Let’s do that on a real dashboard.

The file below renders a dashboard with a summary header and an activity feed. It compiles, it runs, and it’s slower than it should be. Review it like a teammate’s PR: click the lines where streaming is being defeated and leave a comment explaining the problem and the fix. Two defects are hiding here, and they don’t have the same fix.

app/dashboard/page.tsx

const Summary = async () => {
  const profile = await getProfile();
  const totals = await getChartTotals();
  return <SummaryHeader profile={profile} totals={totals} />;
};

export default function DashboardPage() {
  return (
    <main>
      <Header />
      <Suspense fallback={<DashboardSkeleton />}>
        <Summary />
        <ActivityFeed />
      </Suspense>
    </main>
  );
}

getChartTotals() doesn’t even start until getProfile() has resolved, because await pauses the function until its promise settles. Neither read depends on the other, so you’re paying profile + totals for work that, run together, costs max(...).

Summary genuinely aggregates both reads into one view, so the fix is not two boundaries — it’s one concurrent fetch:

const [profile, totals] = await Promise.all([getProfile(), getChartTotals()]);

Both reads start at once, the component awaits them as a group, and the single summary still reveals as one unit.

One boundary means one fallback for the whole region, so the fast Summary is held hostage by the slow ActivityFeed — nothing reveals until the slowest child inside the boundary resolves. That’s the opposite of progressive reveal.

Give each widget its own boundary so each streams in on its own schedule:

<Suspense fallback={<SummarySkeleton />}>
  <Summary />
</Suspense>
<Suspense fallback={<ActivitySkeleton />}>
  <ActivityFeed />
</Suspense>

Now the summary appears the moment its reads finish, without waiting on the feed.

Recall check

Here are three quick checks on the points that are easiest to get wrong.

A streamed dashboard page has a fast ProfileCard and a slow ActivityFeed, each wrapped in its own <Suspense>. What lands in the browser in the very first chunk of the response?

The page shell, with a skeleton sitting in each widget’s slot — both real widgets arrive in later chunks.

An empty response: the browser sees nothing until the slow ActivityFeed query has resolved.

The finished page, profile and activity feed already rendered with their data.

Just the ProfileCard, since it’s fastest — the shell and the feed both stream in afterwards.

The first chunk is everything not behind a boundary — the shell — plus each boundary’s fallback in the place its widget will go. So both skeletons ship immediately alongside the layout, and ProfileCard and ActivityFeed each patch into their slot in a later chunk when their own query resolves. Waiting for all the data before sending anything is the classic-SSR model streaming replaces, and the shell never streams after its boundaries — it’s always first.

A Server Component reads three independent things as three back-to-back awaits, and the page is slow. Which change actually fixes the latency?

Get the three reads in flight at once — give each its own boundary, or group them in a single Promise.all if they render as one unit.

Add an export to the page that switches streaming on.

Wrap the entire page in one outer <Suspense> boundary.

Hoist the three awaits into the root layout so they kick off sooner.

The reads are serial because each await pauses the function until its promise settles, so the total is their sum. The cure is concurrency: separate boundaries when the pieces reveal independently, or Promise.all when they feed one view — either way you pay max(...). There is no streaming flag to flip; one big boundary just trades a sum of reads for a single all-or-nothing screen; and moving the reads into the layout puts them above every boundary, where they block the first byte for every route — strictly worse.

Which of these is what RSC page streaming actually is?

One HTTP response that the server writes in chunks for a single render, then closes once every boundary has settled.

A WebSocket the server keeps open so it can push fresh UI down to the page whenever the underlying data changes.

A Server-Sent Events feed that goes on delivering new content to the page long after it has finished loading.

The browser re-requesting the server on a timer, once per Suspense boundary, until each one comes back resolved.

Here is one more round, on three points that are easy to half-remember. Mark each true or false; the review at the end explains every one.

Each claim is about how streaming is configured and where it can quietly fail. Mark each statement True or False.

You have to add an export to a page.tsx to turn streaming on.

There is no opt-in export and no config flag. The renderer streams by default — a route streams the moment an async child sits inside a <Suspense>, and a route with no boundaries trivially “streams” one chunk. Where you draw the boundaries is the entire streaming configuration.

A slow await in the root layout is effectively free, since it streams in along with everything else.

The opposite. Everything above every boundary must finish rendering before the first chunk can flush, so an await in the layout is pure blank-screen time — paid up front, on every route under that layout. Only reads below a boundary stream. Keep above-boundary work cheap (the auth check, the chrome) and push slow reads under a boundary.

A reverse proxy or CDN that buffers the whole response can silently break streaming in production even when your code is correct.

Streaming is ordinary chunked HTTP, so anything in the path that holds the body until it is complete — a proxy buffering by default, or a compression layer that waits for more output — collapses it back into one-shot delivery. The boundaries are right and the user still waits for everything. The sniff test is the same: if the Network panel shows one large response landing at the end instead of incremental chunks, something is buffering.

External resources

Loading UI and Streaming

nextjs.org

The official App Router reference for how the shell flushes and Suspense boundaries stream — including TTFB, the network panel, and proxy buffering.

<Suspense> reference

react.dev

React's own account of how a Suspense boundary reveals its content during streamed server rendering.

Next.js Learn — Streaming