Chapter 10, Lesson 2

First byte to pixels

How the browser turns HTML bytes into an interactive page through six rendering stages, and where server rendering and React hydration fit on top.

The first byte has arrived, but the user still sees nothing on screen and has nothing to click. Between the bytes on the wire and a page the user can actually use, the browser runs a second pipeline of six stages. Each stage has its own job, its own cost, and its own ways to go wrong. The last lesson named the four network stages from URL commit to first byte; this one names the six browser stages from first byte to interactive page. The framing carries forward: a slow page load is always slow at one specific stage, and the engineer who can point to that stage is the one who leads the debugging conversation.

This lesson gives you the map. When the React chapter talks about renders that trigger layout, when the App Router streams HTML inside a <Suspense> boundary, when Core Web Vitals come up later in the course, every one of those conversations lands on a stage of the pipeline you’re about to learn. This time you won’t open DevTools mid-lesson. The Performance panel that visualises these stages comes later in the course; here the pipeline itself is the focus.

There are six stages: two input lanes that merge in the middle, then a single chain that ends in pixels on screen. Look at the diagram once before reading the prose, because every section below comes back to it.

[Diagram: two input lanes, built in parallel, feed one sequential chain. HTML bytes (from network) → Parse (tokenize tags) → DOM tree (element nodes). CSS bytes (from network) → Parse (match rules) → CSSOM (style tree). The two merge into: Render tree (DOM ∩ style) → Layout (geometry) → Paint (into layers) → Composite (to GPU frame) → Pixels on screen (frame pushed).]
Six stages from bytes to pixels. The DOM and CSSOM are built in parallel; they merge at the render tree, and the rest of the pipeline is sequential.

Read the diagram left to right. HTML and CSS bytes arrive in parallel, since the browser doesn’t wait for one to finish before starting on the other, and each gets parsed into its own in-memory tree. The two trees merge into a single render tree, and from there the pipeline is sequential: layout figures out where each element sits, paint fills pixels into layers, and composite combines those layers into the final frame the GPU pushes to the display.

The next sections walk those six stages one at a time. The diagram below is the same pipeline as above, but only one stage lights up per step. Scrub through it as a reference while you read.

[Stepper diagram: the same pipeline as above, with one stage highlighted per step.]
Parse: the HTML parser streams tokens into a tree of typed DOM nodes as bytes arrive.
Style: stylesheets parse in parallel with HTML into the CSSOM, a tree of computed style rules.
Render tree: DOM and CSSOM merge. Nodes with display: none drop out; pseudo-elements get added in.
Layout: the browser computes geometry (position, width, height, line breaks) for every render-tree node.
Paint: pixels get filled into bitmap layers for text, backgrounds, borders, shadows, and images.
Composite: layers are combined on the GPU into the final frame and pushed to the screen.

The HTML parser is a streaming state machine. It doesn’t wait for the body to finish downloading; as bytes arrive on the wire, it tokenises tags and emits typed element nodes. Every <div> becomes an HTMLDivElement, every <a> an HTMLAnchorElement, every <input> an HTMLInputElement. The result is the DOM: an in-memory tree of objects with properties, methods, and event listeners that JavaScript can read and mutate.
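To make "streaming" concrete, here is a minimal sketch of a tag scanner that emits node types before the input has finished arriving. It is a toy, not the real HTML tokenizer: ToyTagStream, INTERFACE_BY_TAG, and the chunk boundaries are all made up for illustration, and the regular expression only recognises simple opening tags.

```typescript
// Toy streaming tag scanner. Real HTML parsing follows a far larger state
// machine; this only shows that nodes can be emitted before the input ends.
const INTERFACE_BY_TAG: Record<string, string> = {
  div: "HTMLDivElement",
  a: "HTMLAnchorElement",
  input: "HTMLInputElement",
};

class ToyTagStream {
  private buffer = "";
  readonly emitted: string[] = [];

  // Feed one network chunk; emit a node type for every complete opening tag.
  write(chunk: string): void {
    this.buffer += chunk;
    const openTag = /<([a-z][a-z0-9]*)\b[^>]*>/gi;
    let consumed = 0;
    let match: RegExpExecArray | null;
    while ((match = openTag.exec(this.buffer)) !== null) {
      const tag = (match[1] ?? "").toLowerCase();
      // "HTMLElement" is just this toy's fallback label for unmapped tags.
      this.emitted.push(INTERFACE_BY_TAG[tag] ?? "HTMLElement");
      consumed = openTag.lastIndex;
    }
    // Keep any unfinished tail (for example "<inp") for the next chunk.
    this.buffer = this.buffer.slice(consumed);
  }
}

// Hypothetical chunks: two of the tags are split across chunk boundaries.
const tagStream = new ToyTagStream();
tagStream.write("<div><a hr");
const afterFirstChunk = tagStream.emitted.slice();
tagStream.write("ef='/x'>link</a><inp");
tagStream.write("ut>");

console.log(afterFirstChunk);   // [ 'HTMLDivElement' ]
console.log(tagStream.emitted); // [ 'HTMLDivElement', 'HTMLAnchorElement', 'HTMLInputElement' ]
```

The first node exists after the first chunk, long before the last tag has arrived. That is the property the rest of the pipeline relies on.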

The DOM is the input for the rest of the pipeline. Layout reads it, paint reads it, and React mutates it once your JavaScript loads. Everything downstream depends on the DOM being built.

One rule about the parser catches every web developer at some point: a <script> without async or defer blocks the parser the moment it’s encountered. The browser must download the script, execute it, and then keep parsing. Three behaviours sit side by side on the same element, and the difference between them is what makes a page fast or slow:

index.html
<script src="/blocking.js"></script> <!-- blocks the parser here -->
<script defer src="/deferred.js"></script> <!-- queued for after parsing, in order -->
<script async src="/async.js"></script> <!-- runs whenever it lands -->

defer queues the script for execution after the parser has finished, with multiple deferred scripts running in source order. async lets the script execute the moment its download finishes, with no ordering guarantees relative to other scripts; if the parser is still working at that moment, it pauses while the script runs. The experienced habit is to put scripts in <head> with defer (or at the end of <body> without it, where there is almost nothing left to block) so the parser is never held up in front of your content.
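The three behaviours can be compared with a small timing model. This is a sketch under loud assumptions, not browser code: simulateLoad and its types are hypothetical, every download is assumed to start at time zero, script execution is treated as instantaneous, and the millisecond values are invented.

```typescript
// Toy model of classic-script scheduling. Assumptions: all downloads start at
// t = 0, executing a script takes no time, and each tag costs the parser a
// fixed amount of work to reach.
type ScriptMode = "blocking" | "defer" | "async";

interface ScriptTag {
  src: string;
  mode: ScriptMode;
  downloadMs: number; // time at which the download completes
}

interface LoadResult {
  executionOrder: string[];
  parseDoneMs: number;
}

function simulateLoad(tags: ScriptTag[], parseMsPerTag = 10): LoadResult {
  let clock = 0; // the parser's position in time
  const runs: { src: string; atMs: number }[] = [];
  const deferred: ScriptTag[] = [];

  for (const tag of tags) {
    clock += parseMsPerTag;
    if (tag.mode === "blocking") {
      clock = Math.max(clock, tag.downloadMs); // parser waits for the download
      runs.push({ src: tag.src, atMs: clock });
    } else if (tag.mode === "async") {
      // Runs when the download lands, but never before the tag is parsed.
      runs.push({ src: tag.src, atMs: Math.max(clock, tag.downloadMs) });
    } else {
      deferred.push(tag); // held until parsing is finished
    }
  }

  const parseDoneMs = clock;
  let deferClock = parseDoneMs;
  for (const tag of deferred) {
    deferClock = Math.max(deferClock, tag.downloadMs); // source order preserved
    runs.push({ src: tag.src, atMs: deferClock });
  }

  runs.sort((a, b) => a.atMs - b.atMs); // stable sort keeps ties in insertion order
  return { executionOrder: runs.map((run) => run.src), parseDoneMs };
}

const mixed = simulateLoad([
  { src: "/blocking.js", mode: "blocking", downloadMs: 300 },
  { src: "/deferred.js", mode: "defer", downloadMs: 50 },
  { src: "/async.js", mode: "async", downloadMs: 500 },
]);
const firstOneDeferred = simulateLoad([
  { src: "/blocking.js", mode: "defer", downloadMs: 300 },
  { src: "/deferred.js", mode: "defer", downloadMs: 50 },
  { src: "/async.js", mode: "async", downloadMs: 500 },
]);

console.log(mixed.parseDoneMs);            // 320
console.log(firstOneDeferred.parseDoneMs); // 30
```

In this model the scripts execute in the same order either way. What changes is when the parser finishes: 320ms with one blocking script in the way, 30ms once that script is deferred.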

The React chapters later in the course assume the parser has done its job before any of your component code runs. On a Next.js page that’s almost always true: the document HTML is delivered fully formed, and the framework handles script attributes so the parser stays unblocked. The rule is still worth knowing, because it’s what your framework is protecting you from.

Every <link rel="stylesheet"> and every inline <style> block is parsed in parallel with HTML. The result is the CSSOM: a parallel tree of rules and computed properties, ready to be matched against DOM nodes downstream.

CSS is render-blocking by default, and for good reason. The browser cannot construct the render tree without the CSSOM, because it would have no idea what each node should look like, what’s hidden, or what’s positioned where. So the first paint waits for the stylesheet. This is why <link rel="stylesheet"> lives in <head>: putting it at the top of the document announces the style dependency as early as possible, so the browser can fetch and parse it while the HTML body is still streaming in.
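The cost of announcing a stylesheet late can be put into numbers with a rough sketch. The function and the timings below are hypothetical, and the model assumes every listed stylesheet is render-blocking and that a sheet is ready as soon as its fetch completes.

```typescript
// Toy model: first paint cannot happen before the first content is parsed
// AND every render-blocking stylesheet has been fetched. Timings are invented.
interface StylesheetLink {
  href: string;
  discoveredAtMs: number; // when the browser first sees the <link>
  fetchMs: number;        // how long the fetch takes once started
}

function earliestFirstPaintMs(firstContentParsedMs: number, sheets: StylesheetLink[]): number {
  const cssReadyTimes = sheets.map((sheet) => sheet.discoveredAtMs + sheet.fetchMs);
  return Math.max(firstContentParsedMs, ...cssReadyTimes);
}

// The same stylesheet, discovered early versus discovered late.
const discoveredEarly = earliestFirstPaintMs(120, [
  { href: "/app.css", discoveredAtMs: 20, fetchMs: 180 },
]);
const discoveredLate = earliestFirstPaintMs(120, [
  { href: "/app.css", discoveredAtMs: 400, fetchMs: 180 },
]);

console.log(discoveredEarly); // 200
console.log(discoveredLate);  // 580
```

The fetch takes 180ms in both cases. The only thing that moved is the moment the browser learned the stylesheet existed.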

DOM and CSSOM merge into the render tree: every node that will visually appear, each with its computed styles attached.

Two membership rules are worth remembering. Nodes with display: none are excluded from the render tree: they exist in the DOM and JavaScript can find them, but the rest of the pipeline skips them. Pseudo-elements like ::before and ::after are included in the render tree even though they don’t exist in the DOM. The render tree isn’t the DOM; it’s the set of things that will paint.

That difference is the reason display: none is cheaper than visibility: hidden. display: none removes the node from the render tree entirely, so layout, paint, and composite all skip it. visibility: hidden keeps the node in the render tree, so the browser still computes its geometry and reserves its space, and only the paint step makes it invisible. The two declarations look similar but carry very different costs.
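The membership rules fit in a few lines of code. What follows is a toy builder over made-up types (ToyDomNode, ToyRenderNode), not how any browser engine is implemented. It only models display: none, visibility, and a ::before pseudo-element.

```typescript
// Toy render-tree builder. All types here are hypothetical illustrations.
interface ToyStyle {
  display?: "none" | "block";
  visibility?: "hidden" | "visible";
  before?: string; // content of ::before, if any
}

interface ToyDomNode {
  tag: string;
  style?: ToyStyle;
  children?: ToyDomNode[];
}

interface ToyRenderNode {
  label: string;
  paints: boolean; // false means: takes part in layout, but paint skips it
  children: ToyRenderNode[];
}

function buildRenderTree(node: ToyDomNode, inheritedVisible = true): ToyRenderNode | null {
  const style = node.style ?? {};
  if (style.display === "none") return null; // the whole subtree drops out

  // visibility is inherited unless the node sets it explicitly.
  const visible = style.visibility === undefined ? inheritedVisible : style.visibility === "visible";

  const children: ToyRenderNode[] = [];
  if (style.before !== undefined) {
    // Pseudo-elements enter the render tree without existing in the DOM.
    children.push({ label: `${node.tag}::before`, paints: visible, children: [] });
  }
  for (const child of node.children ?? []) {
    const rendered = buildRenderTree(child, visible);
    if (rendered !== null) children.push(rendered);
  }
  return { label: node.tag, paints: visible, children };
}

function listRenderTree(node: ToyRenderNode): string[] {
  const own = node.paints ? node.label : `${node.label} (laid out, not painted)`;
  return node.children.reduce<string[]>((acc, child) => acc.concat(listRenderTree(child)), [own]);
}

const samplePage: ToyDomNode = {
  tag: "body",
  children: [
    { tag: "nav", style: { display: "none" }, children: [{ tag: "a" }] },
    { tag: "p", style: { before: "»" } },
    { tag: "div", style: { visibility: "hidden" } },
  ],
};

const renderTree = buildRenderTree(samplePage);
const renderTreeLines = renderTree === null ? [] : listRenderTree(renderTree);
console.log(renderTreeLines);
// [ 'body', 'p', 'p::before', 'div (laid out, not painted)' ]
```

The nav and its link never reach layout at all, while the hidden div is still present and still has geometry to compute.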

The browser walks the render tree and computes geometry for every node: where each box sits, how wide it is, how tall, where the lines break inside paragraphs, and how the inline content flows around floats and images. The output is a geometry tree of positions and sizes, not pixels yet. This stage is sometimes called layout (or reflow in older docs and DevTools labels, which is the same thing).

Layout is the expensive stage. Its cost grows with the number of nodes in the tree and with the depth of dependencies, since a change to a parent’s width can cascade through every descendant. Anything that affects geometry invalidates layout: inserting or removing a node, changing a width or height, or resizing the viewport. Reading a layout property like offsetHeight doesn’t invalidate anything by itself, but if the browser has pending style or DOM changes queued, the read forces it to run layout immediately so the answer is accurate.

That last one has a name: layout thrashing. You read offsetHeight, write a width, read another layout property, then write another width, and each read forces the browser to flush the queued writes synchronously to give you an accurate answer. In a tight loop, that cost multiplies fast. The React render model in a later unit covers why batching reads and writes matters, and the performance unit covers how to spot the pattern in DevTools. For now it’s worth knowing the name so you recognise it when you see it.
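The read-write-read-write pattern is easy to count in a simulation. The sketch below uses a made-up ToyLayoutEngine with a single dirty flag standing in for the browser's pending changes; the geometry rule (height is half the width) is arbitrary and exists only so the reads return something.

```typescript
// Toy layout engine: a write marks geometry stale; a read while stale forces
// a synchronous layout. Everything here is a hypothetical stand-in.
class ToyLayoutEngine {
  forcedLayouts = 0;
  private dirty = false;
  private widths = new Map<string, number>();

  setWidth(id: string, px: number): void {
    this.widths.set(id, px);
    this.dirty = true; // geometry is now stale
  }

  readHeight(id: string): number {
    if (this.dirty) {
      this.forcedLayouts += 1; // the read has to flush pending writes first
      this.dirty = false;
    }
    return (this.widths.get(id) ?? 100) / 2; // arbitrary geometry rule
  }
}

const boxIds = ["a", "b", "c", "d"];

// Thrashing: read, write, read, write...
function resizeInterleaved(engine: ToyLayoutEngine): void {
  for (const id of boxIds) {
    engine.setWidth(id, engine.readHeight(id) * 3);
  }
}

// Batched: all reads first, then all writes.
function resizeBatched(engine: ToyLayoutEngine): void {
  const heights = boxIds.map((id) => engine.readHeight(id));
  boxIds.forEach((id, index) => engine.setWidth(id, (heights[index] ?? 0) * 3));
}

const thrashingEngine = new ToyLayoutEngine();
resizeInterleaved(thrashingEngine);
const batchedEngine = new ToyLayoutEngine();
resizeBatched(batchedEngine);

console.log(thrashingEngine.forcedLayouts); // 3
console.log(batchedEngine.forcedLayouts);   // 0
```

Both versions leave the boxes at the same widths. The interleaved loop forces a layout on every read after the first, and the batched version leaves a single pending layout for the browser to run at its own pace.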

Once the geometry is settled, the browser fills pixels into layers: text glyphs, background colors, borders, shadows, images, and gradients. Paint runs per layer, and the browser tries to repaint as little as possible, so when only one region of the page changes, only the layers covering that region get repainted. The output is a set of bitmap textures, ready to be combined.

Some elements get promoted to their own layer because they’re likely to be manipulated independently: position: fixed elements, anything with will-change set, certain transforms, and, in some engines, elements with opacity below 1 that have descendants. The exact promotion heuristics vary by browser. Layer promotion is what makes the next stage’s optimizations possible. The full details belong to the performance unit; for now the name is enough.

Stage 6: Composite, and the properties that hit 60fps


The compositor takes the painted layers and combines them into the final frame, which the GPU pushes to the screen. The compositor runs on its own thread, separate from the main thread that handled parsing, layout, and paint. That separation matters: as long as the work stays on the compositor, the screen can keep showing new frames at 60fps, roughly one every 16.7 milliseconds, even when the main thread is busy with something else.

Two CSS properties can be applied by the compositor directly to a layer without re-running layout or paint: transform (translate, scale, rotate, skew) and opacity. These are the compositor-only properties. Animating transform: translateX(...) shifts a layer’s position on the GPU and produces 60fps with no main-thread work. Animating left: ... to do the same visual thing invalidates layout, which forces the main thread to recompute geometry every frame. Whenever the main thread misses the 16.7ms budget for even one frame, the animation visibly stutters.
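The frame budget is plain arithmetic, so it can be sketched directly. The model below is deliberately crude and its numbers are invented: it assumes a transform animation never touches the main thread, and that a left animation adds a fixed 4ms of layout work to whatever else the main thread is doing in each frame.

```typescript
// Frame budget at 60fps, and a toy count of frames that blow it.
// Assumptions: layoutCostMs is made up; compositor animations are treated as
// fully independent of main-thread load.
const FRAME_BUDGET_MS = 1000 / 60;

function missedFrames(
  animatedProperty: "transform" | "left",
  mainThreadLoadPerFrameMs: number[],
  layoutCostMs = 4,
): number {
  if (animatedProperty === "transform") {
    return 0; // handled on the compositor thread in this model
  }
  // A layout-driven animation shares the main thread with everything else.
  return mainThreadLoadPerFrameMs.filter((load) => load + layoutCostMs > FRAME_BUDGET_MS).length;
}

// Hypothetical main-thread load during four consecutive frames.
const loadPerFrameMs = [2, 20, 5, 14];

const budgetLabel = FRAME_BUDGET_MS.toFixed(1);
const missedWithLeft = missedFrames("left", loadPerFrameMs);
const missedWithTransform = missedFrames("transform", loadPerFrameMs);

console.log(budgetLabel);         // "16.7"
console.log(missedWithLeft);      // 2
console.log(missedWithTransform); // 0
```

The fourth frame is the instructive one: 14ms of load would have fit in the budget by itself, but the extra layout pass pushes it over.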

You don’t have to take this on faith. The widget below runs two animations side by side at the same duration. One uses transform, the other uses left. Toggle the main-thread load on and watch only one of them stutter.

[Interactive demo: two boxes animate for the same 800ms duration, one with transform: translateX (compositor) and one with left (layout every frame), with a toggle for main-thread load.]
Two animations, same duration. Toggle the main-thread load on, and only the layout-driven animation stutters; the compositor keeps producing frames at 60fps regardless.

If the compositor box stayed smooth while the left box stuttered, you’ve now seen the difference firsthand. Choose transform and opacity for animations that need to feel smooth: slide-ins, fades, hover scales, and drag handles. Reach for width, top, margin, or height only when you actually need the layout side effect, such as collapsing a panel that pushes its siblings down. Those properties can’t share the compositor’s smoothness, so use them only where you have to.

First Contentful Paint, and the Critical Rendering Path


The chain you just walked has a name. HTML parsed into the DOM, CSS parsed into the CSSOM in parallel, the two merged into the render tree, layout computed, and the first paint fired: that dependency chain is the Critical Rendering Path. Every browser-rendered page runs through it on the way to showing anything on screen, and the moment the first piece of content (text, an image, or other non-blank content) appears is FCP: First Contentful Paint.

A related metric you’ll see in the performance unit is LCP, Largest Contentful Paint. LCP fires when the largest content element in the initial viewport (typically a hero image or a big headline) finishes painting, which is the moment the page is meaningfully visible rather than just visually started. LCP is one of the Core Web Vitals; FCP is a supporting metric rather than a Core Web Vital. Both are labels for moments on the pipeline you just learned. The targets, the thresholds, and how to move them are the performance unit’s territory.

The pipeline above is the same on every page. What the 2026 default stack (Next.js 16 with React 19) changes is what the bytes contain when they arrive and what runs after first paint.

The HTML the browser receives is already populated. The server ran the Server Component tree, serialised the output to HTML, and streamed it as the response body. The pipeline above runs unchanged; the difference is that the bytes coming down the wire already represent the rendered page, so first paint happens before any JavaScript has run. That’s the headline win of server rendering: a user (and a search-engine crawler) sees content the moment the first paint fires, not after a JavaScript bundle has downloaded and executed.

The browser starts downloading the JavaScript bundle for any interactive components as soon as it discovers the script tags, so the download usually overlaps with first paint rather than waiting for it. When the bundle lands and runs, React performs hydration: it walks the DOM the server already produced and attaches event listeners and component state to the existing nodes. The page is visible during this entire process, so the user can scroll, read, and look around. But they can’t click anything that requires React state until hydration finishes for that component.

The page is visible before it’s interactive. That one fact captures the whole SSR-plus-hydration story, and keeping it in mind prevents a common class of bug reports. When a user says the button doesn’t work for the first second after the page loads, the click handler usually isn’t broken: the component simply hasn’t hydrated yet.
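The visible-but-inert gap can be shown with a deliberately small model. This is not React's API: ToyButton, clickButton, and hydrateButton are hypothetical names, and real hydration does far more than attach one handler.

```typescript
// Toy model of the gap between "visible" and "interactive".
interface ToyButton {
  label: string;
  onClick?: () => void; // absent until hydration attaches it
}

// What the server sent: markup only, no behaviour.
const saveButton: ToyButton = { label: "Save" };
let saveCount = 0;

function clickButton(button: ToyButton): boolean {
  if (button.onClick === undefined) {
    return false; // the button is on screen, but nothing is listening yet
  }
  button.onClick();
  return true;
}

function hydrateButton(button: ToyButton, handler: () => void): void {
  button.onClick = handler; // existing node, new listener
}

const handledBeforeHydration = clickButton(saveButton);
hydrateButton(saveButton, () => {
  saveCount += 1;
});
const handledAfterHydration = clickButton(saveButton);

console.log(handledBeforeHydration, handledAfterHydration, saveCount); // false true 1
```

The handler was never broken. The first click simply arrived before there was a handler to run, which is what that bug report is usually describing.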

React 18 introduced two features that make the gap less painful, and React 19 carries both forward. Selective hydration lets React hydrate <Suspense> boundaries independently and prioritise the one the user interacts with, instead of hydrating the whole page in one pass. Suspense streaming in Next.js 16 lets the server send HTML for components inside a <Suspense> boundary as their data resolves, so the user sees skeleton placeholders fill in instead of waiting for the whole page. Both are layered on top of the base SSR-plus-hydration model, and the App Router unit later in the course teaches the mechanics.
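The shape of a streamed response can be sketched as an ordered list of chunks. The chunk format below is invented for illustration and is not the wire format Next.js or React actually emit; SuspenseSlot, streamChunks, and the readyAtMs values are all assumptions.

```typescript
// Toy model of Suspense streaming: the shell goes out first with fallbacks,
// then one chunk per boundary in the order its data becomes ready.
interface SuspenseSlot {
  id: string;
  fallback: string;  // skeleton shown in the shell
  html: string;      // final content for the boundary
  readyAtMs: number; // when the boundary's data resolves (invented)
}

function streamChunks(slots: SuspenseSlot[]): string[] {
  const shell = slots
    .map((slot) => `<div id="${slot.id}">${slot.fallback}</div>`)
    .join("");
  const fills = slots
    .slice() // do not reorder the caller's array
    .sort((a, b) => a.readyAtMs - b.readyAtMs)
    .map((slot) => `<template data-fills="${slot.id}">${slot.html}</template>`);
  return [shell].concat(fills);
}

const streamedChunks = streamChunks([
  { id: "reviews", fallback: "Loading reviews", html: "<ul>...</ul>", readyAtMs: 300 },
  { id: "price", fallback: "Loading price", html: "<b>$12</b>", readyAtMs: 80 },
]);

console.log(streamedChunks);
// [
//   '<div id="reviews">Loading reviews</div><div id="price">Loading price</div>',
//   '<template data-fills="price"><b>$12</b></template>',
//   '<template data-fills="reviews"><ul>...</ul></template>'
// ]
```

The price arrives before the reviews even though it sits later in the document, because chunks go out in the order the data resolves rather than in source order.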

The two views of the pipeline, base and with the SSR overlay, sit side by side below. Switch tabs to see how the same six stages stay in place, just with new inputs and a new downstream branch.

[Tabbed diagram, base view: HTML bytes and CSS bytes (inputs, parallel) → DOM tree and CSSOM → Render tree → Layout → Paint → Composite → Pixels on screen (first paint).]
The base browser pipeline. Bytes arrive from the network; the six stages run; pixels land on screen.

One more thing about hydration before moving on. If the HTML the server rendered doesn’t match what React would render on the client, for instance a component that calls Date.now() in its render or branches on typeof window, React reports a hydration mismatch and falls back to client-rendering the affected subtree, up to the nearest <Suspense> boundary. The App Router unit owns the full debugging surface; for now, knowing the failure mode exists is enough.
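The Date.now() case comes down to one function producing two different strings. Here is a minimal sketch with a hypothetical renderClock function standing in for a component, and fixed numbers standing in for the two clock readings so the result is deterministic.

```typescript
// Toy model of a hydration mismatch: the same "component" rendered at two
// different moments produces different markup.
function renderClock(nowMs: number): string {
  return `<time>${new Date(nowMs).toISOString()}</time>`;
}

// Unstable: server and client each read their own clock (values invented).
const serverHtml = renderClock(1000);
const clientHtml = renderClock(2500);
const unstableMatches = serverHtml === clientHtml;

// Stable: the server's value travels with the page and the client reuses it.
const serializedNowMs = 1000;
const stableMatches = renderClock(serializedNowMs) === renderClock(serializedNowMs);

console.log(unstableMatches, stableMatches); // false true
```

Passing the value down is one way out. Another is to render a time-dependent value only after the component has mounted on the client, so the server and the first client render agree by construction.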

The DevTools Performance panel is where every stage of this pipeline becomes visible on a frame timeline: parse events, layout events, paint events, and composite events, all rendered as labelled bars. The performance unit later in the course gives the full tour. The map you just learned is the vocabulary you’ll bring to it.

Here’s a short drill to lock the ordering in. The pipeline is only useful if the steps stick in the right order, so put these six events on a timeline.

Order the six events of a Next.js 16 page load from earliest to latest. Drag the items into the correct order, then press Check.

HTML bytes arrive at the browser.
The DOM and CSSOM are both built.
First Contentful Paint — the user sees the page.
The hydration JavaScript bundle finishes downloading.
React hydration runs — event listeners attach.
A click handler fires for the first time.

Once that ordering is solid, the rest of the course’s React and Next.js conversations land on the same map. A state update that re-renders a component happens after hydration, typically in response to an event such as that first click. A <Suspense> boundary streams additional HTML into the parse stage after the first frame has already painted. A slow LCP usually points at a stage before paint, while a slow Interaction to Next Paint usually points at hydration or main-thread work after it.

This lesson is structural: a map for the rest of the course. The terrain itself has owners:

  • React’s render phases, hooks, reconciliation, and JSX → the React unit later in the course.
  • Server vs. Client Component boundary mechanics, <Suspense>, streaming, and loading.tsx → the App Router unit later in the course.
  • Tailwind, the CSS cascade, and OKLCH design tokens → the styling unit later in the course.
  • Core Web Vitals (LCP, INP, CLS) and FCP thresholds and tuning, the Performance panel tour, and the bundle analyzer → the performance unit later in the course.
  • HTML element semantics (<header>, <main>, heading hierarchy, the accessibility tree) → the HTML semantics chapter later in the course.

The next lesson returns to DevTools, with a tour of the four panels that earn their keep in SaaS work. Together, the four-stage network map from lesson 1 of this chapter and the six-stage browser map from this lesson are the two halves of the pipeline you’ll keep coming back to.