Chapter 12, Lesson 1

Parse, don't concatenate

The browser platform's built-in URL and URLSearchParams APIs for parsing and assembling web addresses without escaping bugs.

You learned backticks as the clean way to interpolate a string, and they are, with one exception. The moment that string is a URL, `${base}/users?id=${id}` stops being clean and becomes a hidden bug. It looks fine in the editor and it works on the happy path, which is exactly why it survives review and then breaks in production. The problem is the premise that a URL is a string. It only looks like one. A URL is a structured value with its own grammar and escaping rules, and the browser already ships a parser that knows all of them, the one the WHATWG URL standard defines. So instead of treating the URL as a fill-in-the-blank template, hand the string to that parser. You’ll assemble an API URL this way in every unit from 3 through 7, so it’s worth learning properly once.

By the end of this lesson you’ll parse any URL into its fields, build one with new URL() and URLSearchParams without escaping bugs, and know which of three encoding tools belongs in which position. The origin field you’ll meet in a moment is the same one that becomes the unit of browser trust in the next lesson, so the vocabulary here carries the rest of the chapter.

A URL is six fields, not one string

Before you can hand a URL to the parser, you need the parser’s vocabulary. Every URL breaks down into a fixed set of named fields, and once you can see them, the URL stops being one opaque string. Here is the full anatomy on a single example.

[Figure: The WHATWG URL model: one string, six named fields, with origin as the read-only shorthand spanning protocol and host.]

Each label in that diagram is a property you can read off a URL instance: url.protocol, url.hostname, url.pathname, and so on. Two of them carry a subtlety the diagram can’t show. The first is that url.searchParams isn’t a plain string like url.search. It’s a live URLSearchParams view of the query, so mutating it rewrites url.search (and therefore url.href) for you. That live link is what the next section builds on. The second is that a URL technically has username and password fields too, and this course ignores them on purpose. They’re deprecated for navigation, and shipping credentials inside a URL is a mistake, so treat the six fields above as the whole surface.

Parsing takes one constructor call. Hand new URL() a string and read the fields straight off the instance.

const url = new URL('https://api.acme.com:8443/v1/invoices?status=paid#row-7');
url.hostname; // 'api.acme.com'
url.pathname; // '/v1/invoices'
url.search; // '?status=paid'
url.origin; // 'https://api.acme.com:8443'

No import made that work. URL is a global in every runtime this course ships on, which we’ll come back to at the end. For now, just notice that you reached for a structured value rather than a regex.
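As a hedged sketch of the full set, here is the same example URL read field by field, followed by the live searchParams link described earlier. The limit parameter is made up for illustration.

```javascript
const invoiceUrl = new URL('https://api.acme.com:8443/v1/invoices?status=paid#row-7');

// The six fields, read straight off the instance.
const fields = {
  protocol: invoiceUrl.protocol, // 'https:'
  hostname: invoiceUrl.hostname, // 'api.acme.com'
  port: invoiceUrl.port,         // '8443'
  pathname: invoiceUrl.pathname, // '/v1/invoices'
  search: invoiceUrl.search,     // '?status=paid'
  hash: invoiceUrl.hash,         // '#row-7'
};
console.log(fields);

// searchParams is a live view: mutating it rewrites search and href.
invoiceUrl.searchParams.set('limit', '20'); // 'limit' is a hypothetical parameter
console.log(invoiceUrl.search); // '?status=paid&limit=20'
console.log(invoiceUrl.href);
// 'https://api.acme.com:8443/v1/invoices?status=paid&limit=20#row-7'
```

Note that protocol keeps its trailing colon and hash keeps its leading #, which is easy to trip over when comparing against bare strings.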

Building URLs with `new URL(input, base)`

Reading a URL is half the job. The other half is building one, and that’s where concatenation does the most damage. The constructor takes a second argument for exactly this: new URL(input, base) resolves a relative path against a base URL. That’s the standard shape for assembling an API URL.

const base = 'https://api.acme.com'; // from a validated env var
const url = new URL('/v1/invoices', base);
url.href; // 'https://api.acme.com/v1/invoices'

In real code that base comes from a validated environment variable rather than a literal (you’ll see that env story in a later unit), but the shape is the same: a relative path resolved against a base. Notice what you didn’t do. With an origin-only base like this one, you didn’t worry about whether base ended in a slash. The parser resolves the path against the base the way a browser resolves a link on a page, and it does so the same way every time. Keep in mind that this is link resolution, not string joining: if the base carries a path of its own, a leading slash on the input replaces that path, exactly as it would for a link on a page.

That consistency is the first concrete win, so it’s worth seeing the failure it replaces. The classic concatenation bug is trailing-slash drift, where the same logical URL, assembled from two slightly different inputs, produces two different strings. Your cache, your logs, and your access checks then see two URLs where you meant one.

String concat:

const a = 'https://api.acme.com' + '/v1'; // 'https://api.acme.com/v1'
const b = 'https://api.acme.com/' + '/v1'; // 'https://api.acme.com//v1'

A stray slash on either side gives you a double slash. The two strings look equal at a glance, but they compare unequal everywhere it counts.

new URL:

const a = new URL('/v1', 'https://api.acme.com').href; // 'https://api.acme.com/v1'
const b = new URL('/v1', 'https://api.acme.com/').href; // 'https://api.acme.com/v1'

Both inputs land on the identical href. The parser normalizes the slashes for you, so you stop thinking about them.
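One caveat to the comparison above: it uses an origin-only base. When the base carries a path of its own, the parser still behaves consistently, but it follows link-resolution rules rather than joining strings. The sketch below assumes a hypothetical /api prefix to show the three cases.

```javascript
// Origin-only base: the trailing slash on the base makes no difference.
const fromBare = new URL('/v1/invoices', 'https://api.acme.com').href;
const fromSlashed = new URL('/v1/invoices', 'https://api.acme.com/').href;
// both: 'https://api.acme.com/v1/invoices'

// Base with its own path (the '/api' prefix is hypothetical).
const leadingSlash = new URL('/v1/invoices', 'https://api.acme.com/api/').href;
// 'https://api.acme.com/v1/invoices'      (a leading slash replaces '/api/')

const relativeKept = new URL('v1/invoices', 'https://api.acme.com/api/').href;
// 'https://api.acme.com/api/v1/invoices'  (resolved inside '/api/')

const relativeDropped = new URL('v1/invoices', 'https://api.acme.com/api').href;
// 'https://api.acme.com/v1/invoices'      (the last segment 'api' is replaced)

console.log({ fromBare, fromSlashed, leadingSlash, relativeKept, relativeDropped });
```

If your base includes a path prefix, one workable convention is to end the base with a slash and pass inputs without a leading one.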

While the parser is normalizing slashes, it normalizes two more things you’ll rely on in the next lesson. It lowercases the hostname, and it drops the default port, so :80 for http and :443 for https collapse to no port at all. That is why origin comparisons in the next lesson are reliable: two URLs for the same server normalize to the same origin string instead of differing on capitalization or a redundant port.
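A small sketch of that normalization, using deliberately messy spellings of the same made-up server:

```javascript
// Upper-case scheme and host, plus a redundant default port.
const shouty = new URL('HTTPS://API.Acme.COM:443/v1/invoices');
const plain = new URL('https://api.acme.com/v1/invoices');

console.log(shouty.hostname); // 'api.acme.com'
console.log(shouty.port);     // ''  (443 is the default for https, so it is dropped)
console.log(shouty.origin);   // 'https://api.acme.com'
console.log(shouty.origin === plain.origin); // true

// A non-default port is part of the origin and is kept.
const custom = new URL('https://api.acme.com:8443/v1/invoices');
console.log(custom.origin); // 'https://api.acme.com:8443'
```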

The constructor makes one decision worth understanding. new URL() throws when the input can’t be parsed. That’s helpful rather than dangerous, once you know where each kind of input comes from. A base URL almost always comes from configuration, an env var or a constant, so a malformed one is a deploy bug, not a user error. You want it to fail fast and loud at boot, rather than wrapping every construction in a try/catch that swallows a misconfiguration into a vague 500 later. Let it throw.
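As a rough sketch of what failing fast at boot can look like, assume a hypothetical loadApiBase helper and a hypothetical API_BASE_URL variable, neither of which comes from this course’s codebase:

```javascript
// API_BASE_URL is a hypothetical variable name used only for this sketch.
function loadApiBase(env) {
  // new URL() throws a TypeError on malformed or missing input,
  // which is what stops a bad deploy at boot instead of at request time.
  return new URL(env.API_BASE_URL);
}

const apiBase = loadApiBase({ API_BASE_URL: 'https://api.acme.com' });
console.log(apiBase.origin); // 'https://api.acme.com'

// Caught here only to show what the failure looks like.
let bootError = null;
try {
  loadApiBase({ API_BASE_URL: 'api.acme.com' }); // the scheme is missing
} catch (error) {
  bootError = error;
}
console.log(bootError instanceof TypeError); // true
```

In an app you would call the helper once at startup and leave the exception uncaught.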

The exception is untrusted input. When the string you’re parsing came from a user, such as a query param, a pasted link, or a redirect target, a malformed value is data you fully expected to receive, so throwing is the wrong response to it. For that case, reach for URL.canParse(input, base), a static method that returns a boolean instead of throwing.

const next = searchParams.get('next') ?? '';
if (URL.canParse(next)) {
  const target = new URL(next);
  // target is well-formed here; whether it is trusted is a separate check
}

So the rule splits cleanly by trust. Use new URL() for config you control, and let it throw. Use URL.canParse() to guard before parsing anything a user could send. There’s also URL.parse(), which takes the same arguments as the constructor but returns null instead of throwing when the input can’t be parsed, and a URL instance otherwise. Reach for canParse when you only need a yes-or-no check, and parse when you want the parsed result in the same step. URL.parse() reached runtimes later than URL.canParse(), so confirm your targets support it; where both exist, either is fine.
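The untrusted-input path can be wrapped in one small helper. This is a sketch, not a prescribed pattern: parseUntrusted is a hypothetical name, and the feature check is there only because URL.parse() may be missing on older runtimes.

```javascript
// Returns a URL instance for parseable input, or null otherwise. Never throws.
function parseUntrusted(input) {
  if (typeof URL.parse === 'function') {
    return URL.parse(input); // a URL instance, or null
  }
  // Fallback for runtimes that have canParse but not parse.
  return URL.canParse(input) ? new URL(input) : null;
}

const good = parseUntrusted('https://acme.com/dashboard?tab=billing');
console.log(good.pathname); // '/dashboard'

const bad = parseUntrusted('not a url');
console.log(bad); // null
```

A non-null result only means the string parsed. Deciding whether the target is acceptable, for example as a redirect, is a separate check.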

`URLSearchParams`: the end of escaping query strings

The payoff comes first, before the mechanics. You write params.set('q', userInput) and you never think about escaping a query value again. The whole point of URLSearchParams is that it owns the encoding so you don’t, which spares you the afternoons you’d otherwise spend debugging a & that ate the rest of a query string.

It builds from four kinds of input, and knowing which is which saves you from converting between them by hand. Here are all four side by side.

const fromUrl = url.searchParams;
const fromRecord = new URLSearchParams({ status: 'paid', limit: '20' });
const fromString = new URLSearchParams('status=paid&limit=20');
const fromEntries = new URLSearchParams([['tag', 'a'], ['tag', 'b']]);

fromUrl is the live view off a URL instance. This is the one you reach for most in app code: grab it, mutate it, and url.href updates with it. There’s no separate object to keep in sync.

fromRecord is built from a plain record. This is the quickest shape when every key has a single value. Note that the values are strings, since a query string has no notion of numbers.

fromString is built from a raw query string, with or without the leading ?. This is how you parse an incoming query: take the search off a URL someone handed you and read it as structured fields.

fromEntries is built from an array of entries. Unlike a record, this shape can express a repeated key, like tag=a&tag=b. A record can’t, because an object can’t hold two tag keys.
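To make the equivalence concrete, here is a sketch that serializes each shape. The source URL is made up for the example.

```javascript
const source = new URL('https://api.acme.com/v1/invoices?status=paid&limit=20');

const viaUrl = source.searchParams.toString();
const viaRecord = new URLSearchParams({ status: 'paid', limit: '20' }).toString();
const viaString = new URLSearchParams('?status=paid&limit=20').toString(); // leading ? is ignored
// all three: 'status=paid&limit=20'

const viaEntries = new URLSearchParams([['tag', 'a'], ['tag', 'b']]).toString();
// 'tag=a&tag=b'

console.log(viaUrl === viaRecord && viaRecord === viaString); // true
console.log(viaEntries);
```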

That last shape hints at something query strings do that records can’t: a key can repeat. ?tag=a&tag=b is a perfectly valid filter for “tagged a or b”, a list rather than a single value. The API reflects that with two methods per operation. append adds a value and set replaces every value for a key, while get returns the first value and getAll returns all of them. Insertion order is preserved when you iterate, so the string comes back out in the order you built it.

const params = new URLSearchParams();
params.append('tag', 'a');
params.append('tag', 'b');
params.getAll('tag'); // ['a', 'b']
params.get('tag'); // 'a'  (just the first)
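The set side of that pairing is worth a quick sketch too, since it collapses a repeated key back to one value. The page key is made up for the example.

```javascript
const filters = new URLSearchParams();
filters.append('tag', 'a');
filters.append('tag', 'b');
filters.append('page', '1');
console.log(filters.toString()); // 'tag=a&tag=b&page=1'

// set overwrites the first 'tag' in place and removes the others.
filters.set('tag', 'c');
console.log(filters.toString()); // 'tag=c&page=1'

// Missing keys: get returns null, getAll returns an empty array.
console.log(filters.get('missing'));    // null
console.log(filters.getAll('missing')); // []
```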

When you’re done building, params.toString() hands you the finished query string, correctly percent-encoded, with every special character escaped to the right form. That sounds like the end of the story, but the way it encodes is one detail you do have to understand, because it isn’t the same encoding you’d reach for elsewhere in a URL. The next section works through it.

The `%20`-vs-`+` split

Here’s the part that catches experienced developers too: the WHATWG URL standard uses different percent-encode sets for different positions in a URL. Intuition says encoding is encoding, and a space is a space. It isn’t, and the reason is that the same character gets a different escape depending on where in the URL it sits.

It comes down to two cases that percent-encoding handles differently:

Path segments and the fragment encode a space as %20. encodeURIComponent matches this set.
application/x-www-form-urlencoded query strings, which is what URLSearchParams produces and what an HTML form submission sends, encode a space as +. And because + means space on the way back, a literal plus sign has to be sent as %2B.

That second rule is where most of the trouble starts. If you encode with one model and decode with the other, the round trip silently corrupts your data. The clearest way to see it is to push a value that genuinely contains a plus, a search for "a+b", through both paths.

Same model (URLSearchParams):

const encoded = new URLSearchParams({ q: 'a+b' }).toString(); // 'q=a%2Bb'
const back = new URLSearchParams(encoded).get('q'); // 'a+b'

URLSearchParams escapes the literal plus to %2B on the way out and reads it back as a plus on the way in. Encode and decode with the same model, and the value survives the round trip exactly.

Mixed models (the bug):

const encoded = new URLSearchParams({ q: 'a b' }).toString(); // 'q=a+b'
const back = decodeURIComponent('a+b'); // 'a+b'  ← a space went in, a plus came out

The space became + on the wire, but decodeURIComponent reads + as a literal plus rather than a space, so the value is corrupted. The two tools disagree about what + means, and the disagreement stays invisible until a value contains one.

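The fix isn’t to memorize which character maps to what. It’s a habit: decode with the same model that encoded. If URLSearchParams wrote the query, URLSearchParams reads it.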
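A minimal sketch of the two encodings side by side, assuming a made-up search value that contains both spaces and a literal plus:

```javascript
const value = 'rock + roll';

// Path-style encoding: space becomes %20, plus becomes %2B.
const asPathSegment = encodeURIComponent(value);
console.log(asPathSegment); // 'rock%20%2B%20roll'

// Form-style encoding: space becomes +, plus becomes %2B.
const asQuery = new URLSearchParams({ q: value }).toString();
console.log(asQuery); // 'q=rock+%2B+roll'

// Each model round-trips its own output.
console.log(decodeURIComponent(asPathSegment) === value);       // true
console.log(new URLSearchParams(asQuery).get('q') === value);   // true

// Crossing the models corrupts the value: + is not a space to decodeURIComponent.
const crossed = decodeURIComponent('rock+%2B+roll');
console.log(crossed); // 'rock+++roll'
```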

Which encoder, which position

The %20-vs-+ split is the subtle case, but it sits inside a simpler, more common confusion. There are three encoding tools, each correct in exactly one place, and most bugs come from picking the wrong one. People reach for encodeURIComponent everywhere and double-encode values that URLSearchParams already handled, or they reach for encodeURI thinking it escapes a value, when it’s actually meant for whole URLs. This table settles it.

| Tool | Structural chars (: / ? # & =) | Use for |
| --- | --- | --- |
| encodeURI | preserved | a whole, already-trusted, well-formed URL string (rarely the right tool) |
| encodeURIComponent | escaped | a single value going into a path segment |
| URLSearchParams | handled for you | any query-string construction |

The rule underneath the table is short enough to hold in your head: a value going into a path segment gets encodeURIComponent, anything in the query goes through URLSearchParams, and whole-URL escaping is almost never what you want, so reach for new URL instead.
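Put together, the rule might look like the sketch below. customerUrl is a hypothetical helper, and the /customers path and q parameter are illustrative rather than a real API.

```javascript
// Hypothetical helper: one value goes into the path, one into the query.
function customerUrl(base, name, query) {
  // Path position: encodeURIComponent escapes '/', '&', and spaces,
  // so the name stays a single segment.
  const url = new URL(`/customers/${encodeURIComponent(name)}`, base);
  // Query position: URLSearchParams owns the escaping.
  url.searchParams.set('q', query);
  return url.href;
}

const href = customerUrl('https://api.acme.com', 'Smith & Sons/EU', 'late fees');
console.log(href);
// 'https://api.acme.com/customers/Smith%20%26%20Sons%2FEU?q=late+fees'
```

The same space ends up as %20 in the path and + in the query, which is the split from the previous section showing up in a single URL.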

Match each position in a URL to the tool that belongs there.

To drop a customer name into the path /customers/<name>, reach for ____. To attach ?q= search terms to a URL, build them with ____. And to escape an entire URL string you already trust as well-formed (the rare case), you’d use ____.

The bugs concatenation hands you for free

This is the payoff. Every one of these is a bug that string concatenation produces silently and that new URL plus URLSearchParams removes. They’re worth seeing as a list, because the point isn’t any single one. It’s that you stop writing the whole category at once.

Parameter injection. `?q=${q}` with q = 'a&admin=true' smuggles a second parameter into your URL, while params.set('q', q) escapes the & to %26 so it stays part of the value. This is the most security-relevant of the five, since letting a user pick your query keys is a real attack surface.
Double-encoding. Passing a value you already ran through encodeURIComponent into URLSearchParams gives you %2520 where you meant a single escaped space, because the % itself got encoded.
Path-traversal-shaped values. A raw '../admin' interpolated into a path is read as path syntax by the server, not as a literal segment.
Unicode hostnames. The parser normalizes an IDN via Punycode (münchen becomes xn--mnchen-3ya), whereas hand-templating ships a hostname the DNS layer can’t resolve.
Trailing-slash drift. The double-slash bug from the constructor section, now one entry in a longer pattern.

Read that list as five symptoms of one underlying problem. None of them is a one-off you patch when it surfaces. Each is a category of bug, and reaching for the parser removes the category rather than the single instance. The lesson comes down to this: the goal isn’t to fix URL bugs but to stop writing them in the first place.
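Two entries from the list can be checked in a few lines. This is a sketch with a made-up /search endpoint, showing parameter injection and the Unicode hostname case.

```javascript
const q = 'a&admin=true'; // hostile input

// Concatenation: the & splits the value and creates a second parameter.
const concatenated = new URL(`https://api.acme.com/search?q=${q}`);
console.log(concatenated.searchParams.get('q'));     // 'a'
console.log(concatenated.searchParams.get('admin')); // 'true'

// URLSearchParams: the & and = are escaped and stay inside the value.
const built = new URL('/search', 'https://api.acme.com');
built.searchParams.set('q', q);
console.log(built.search);                    // '?q=a%26admin%3Dtrue'
console.log(built.searchParams.has('admin')); // false
console.log(built.searchParams.get('q'));     // 'a&admin=true'

// Unicode hostname: the parser converts it to its Punycode form.
const idn = new URL('https://münchen.de/');
console.log(idn.hostname); // 'xn--mnchen-3ya.de'
```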

One reflex, every runtime

One practical convenience is worth stating plainly, because it saves you from wondering where to import these from. URL and URLSearchParams are globals in every runtime this course ships on: the browser, the Node.js server, and the Vercel Edge runtime. There is no import, ever. The same two lines work unchanged whether they run in a Server Component, a Route Handler, or a click handler in the browser.

The vocabulary travels into React intact, too. Next.js’s useSearchParams() hook hands you a read-only URLSearchParams instance, so get, getAll, and the multi-value model you just learned carry straight from URL strings into the component tree. You’ll see the full treatment in Unit 4. For now, just know it’s the same object, not a new API to learn.

One more name to file away, with no code attached: URLPattern is the standardized URL-matching primitive, now a global on browsers, Node, and Edge. You won’t write it in this course, because Next.js’s file-system router owns route matching, but it’s worth recognizing when you see it, since Unit 4’s middleware uses pattern objects under the hood.

Practice: build the URL

Now put the round trip into practice. You’re handed a base URL and a record, and your job is to return a fully-built, correctly-encoded URL string using URLSearchParams, with no manual ? and no hand-escaping. The grading hinges on the round trip: the test parses your result back out and checks that the + in 'a+b' survived. If you build the query by hand, interpolating the raw value or escaping it with encodeURI (which leaves + untouched), the + decodes to a space and the test fails. It’s the %20-vs-+ rule turned into a test you can run.

Given a base URL and a record { q, tags }, return the fully-built URL string. Put q in as a single search param, and add each entry of tags as a repeated tag param (so tags: ['x', 'y'] becomes tag=x&tag=y). Use URLSearchParams — no manual escaping, no string concatenation.

The reference solution is short, so try it before you open it.

Show a solution

function buildSearchUrl(base, { q, tags }) {
  const url = new URL(base);
  url.searchParams.set('q', q);
  for (const tag of tags) {
    url.searchParams.append('tag', tag);
  }
  return url.href;
}

Use set for the single value, append in the loop for the repeated one, and url.href to read the finished string back out, with encoding handled the whole way.
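To see the round trip the exercise describes, the solution can be run against a sample input. The function is repeated so the snippet runs on its own. The base URL and inputs are made up, and the checks only approximate what the grader does.

```javascript
function buildSearchUrl(base, { q, tags }) {
  const url = new URL(base);
  url.searchParams.set('q', q);
  for (const tag of tags) {
    url.searchParams.append('tag', tag);
  }
  return url.href;
}

const result = buildSearchUrl('https://api.acme.com/search', { q: 'a+b', tags: ['x', 'y'] });
console.log(result); // 'https://api.acme.com/search?q=a%2Bb&tag=x&tag=y'

// Parse it back: the plus survives and both tags come out in order.
const roundTrip = new URL(result).searchParams;
console.log(roundTrip.get('q'));      // 'a+b'
console.log(roundTrip.getAll('tag')); // ['x', 'y']

// A hand-built query with the raw value loses the plus on the way back.
const handBuilt = new URL('https://api.acme.com/search?q=a+b').searchParams;
console.log(handBuilt.get('q')); // 'a b'
```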

External resources

Two references worth keeping open while this becomes second nature, plus a parser you can paste any URL into to see the six fields for yourself.

URL — MDN

developer.mozilla.org

The full property and method surface of the URL interface, including the fields this lesson skipped.

URLSearchParams — MDN

developer.mozilla.org

Every construction shape and the get / getAll / append / set methods, with runnable examples.

Parts of a URL — interactive explorer

partsofaurl.com

Paste any URL and watch it decompose into scheme, host, path, query, and fragment — the six fields made tangible.