Chapter 42, Lesson 2

Formats over regexes

Validate SaaS input strings with Zod 4's named format builders, like email, uuid, and ISO datetime, instead of hand-rolled regexes.

In the last lesson you assembled an invoice-creation schema, and you wrote one field as a deliberate placeholder: email: z.string(). Consider what that actually promises. z.string() accepts any string, so "ada@example.com", "not-an-email", and "banana" all parse cleanly, because all three are strings. The schema knows the field is a string. It has no idea whether it’s an email.

That gap widens the moment you flesh out a real sign-up, because a sign-up carries more than an email. It carries an invitation token, which is a UUID a teammate was sent. It carries a requested start date, an ISO 8601 timestamp off a date picker. It carries the visitor’s IP, read off the request headers, to key a rate limiter. Type each of those as z.string() and you’ve validated almost nothing: a forged token, a date typed as "tomorrow", and an IP that reads "localhost" all sail through, because every one of them is technically a string.

A string of a particular kind needs more than “is a string.” If you’ve worked in other languages, your instinct is probably to reach for a regular expression: match the email against /^[^@]+@[^@]+$/ and move on. Set that instinct aside. Zod 4 ships a named builder for every common kind of string, including z.email(), z.uuid(), z.iso.datetime(), and z.ipv4(), and a named builder beats a hand-rolled regex on every axis you’ll come to care about. By the end of this lesson you’ll write the format half of any SaaS input schema, reaching for the right builder for each field instead of a regex.

The format is the type, not a postfix modifier

Start with the email, because the way it’s written changed between Zod 3 and Zod 4, and that one change captures the whole idea of this lesson in a single line.

In Zod 3 you wrote z.string().email(). Read that left to right: it builds a ZodString, a string schema, and then chains a format check onto it. The email-ness is a postfix modifier, bolted on after the fact. The type the field really is, “an email,” is expressed as “a string, with a check.”

Zod 4 promotes the format to its own top-level builder: z.email(). There’s no z.string() underneath and no chained method. It infers as string exactly as before, because an email is a string, but the format is no longer a modifier. It’s the builder. Both forms parse the same valid emails; what differs is where the format lives, and that difference is the point.

lib/signup.ts
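import * as z from 'zod';

// Zod 3 style: the email check is chained onto a string schema.
const signupSchema = z.object({
  name: z.string(),
  email: z.string().email(), // deprecated in Zod 4; prefer z.email()
});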

What you’ll meet in older codebases, and what an AI may still emit for you. It’s deprecated in v4. The format is chained onto a string schema as a postfix check. It still parses, with a deprecation warning, but it’s no longer the form you write.
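For comparison, here is a minimal sketch of the same schema in the v4 form. It assumes Zod 4 is installed and importable as `zod`; the sample inputs are made up for illustration.

```typescript
import * as z from "zod";

// v4 form: the format is the builder, not a check chained onto z.string().
const signupSchema = z.object({
  name: z.string(),
  email: z.email(),
});

// The inferred type is unchanged: { name: string; email: string }
type Signup = z.infer<typeof signupSchema>;

const goodInput: Signup = { name: "Ada", email: "ada@example.com" };
const valid = signupSchema.safeParse(goodInput);
const invalid = signupSchema.safeParse({ name: "Ada", email: "not-an-email" });

console.log(valid.success); // true
console.log(invalid.success); // false
```

Only the runtime check tightens; anything typed against `Signup` keeps compiling as before.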

This change is not just cosmetic, and it’s worth understanding why the top-level form is better, because the same reasons make it the right reflex everywhere else in this lesson.

The first reason is bundle size. z.string().email() has to pull in the entire ZodString class, with every string method, .startsWith, .includes, and the rest, even when all your code does with that field is check it’s an email. z.email() pulls in the email validator and nothing else. When a build tool strips out unused code, the top-level builders tree-shake cleanly, while the chain does not.

The second reason is the error message. z.email() carries its own default message, so a failed parse says something like "Invalid email address" rather than a generic string complaint. Because the format knows what it is, it knows what to say when the input isn’t it. That message is part of the contract the form layer will eventually render, and getting it for free is worth something.
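One way to see the message for yourself is to read it off a failed parse. This is a small sketch assuming Zod 4 imported as `zod`; the exact wording depends on the installed version and locale, so the code prints it rather than asserting it.

```typescript
import * as z from "zod";

const result = z.email().safeParse("not-an-email");

// On failure, each issue carries a message supplied by the format itself.
const message = result.success ? null : result.error.issues[0]?.message ?? null;
console.log(message);
```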

The third reason you’ll appreciate later: each top-level builder emits a cleaner JSON Schema. That matters once an API in a later unit generates an OpenAPI document from these schemas. Note it and move on, since it’s not today’s lesson.

So here is the rule, the spine the rest of this lesson hangs off: name the format before you reach for a regex.

Now put it to work. The exercise below is the sign-up schema as the last lesson left it, where every field that’s really a format is still a bare z.string(). Your job is to swap each one for its top-level builder, so the inputs that should be rejected start failing. Watch the ^? query as you go: the inferred type stays string the whole time. Naming the format doesn’t change the type; only the runtime check tightens.

You’ve now stated the rule: named format before regex. The end of the lesson works through exactly why it holds, on every axis. First, here is the catalog the rule points you at.

A SaaS surface reaches for about a dozen builders, and the whole catalog is one rule applied over and over: a string of a known kind has a builder, so name it. What follows is the working set, grouped by the kind of field. Read it as a reference you’ll come back to, not a list to memorize. The skill is knowing a builder exists so you reach for it instead of a regex, not reciting it cold.

z.email() is an RFC-aligned email. Its default is production-appropriate, so you don’t need to tighten it for the common case. The one knob is a pattern option, for the rare platform whose deliverability rules demand a stricter shape: z.email({ pattern: z.regexes.html5Email }), or your own regex. Reach for the bare z.email() until something concrete tells you otherwise.

z.uuid() versus z.guid() is the one distinction in the catalog worth slowing down for, because it’s easy to get wrong and it changed in v4. Both validate the same 8-4-4-4-12 hex shape, but they’re strict to different degrees.

z.uuid(); // strict: RFC 9562/4122, versions 1–8, variant bits correct
z.guid(); // permissive: any 8-4-4-4-12 hex string
z.uuid({ version: 'v7' }); // pin to exactly UUIDv7

z.uuid() is strict in v4. It enforces the real spec: a recognized version (1 through 8) and correct variant bits. This is what you reach for with IDs your own app generates. The UUIDv7 primary keys from earlier in the course pass through it cleanly, because your app produces spec-correct UUIDs.

z.guid() is the permissive one: any string in the 8-4-4-4-12 hex shape passes, with version and variant bits unchecked. Reach for it only when an upstream system hands you identifiers that may not be RFC-strict and you can’t change that. It’s the loose escape hatch, not the default.

When you want exactly one version, pin it. Since your app standardized on UUIDv7, z.uuid({ version: 'v7' }) rejects a v4 UUID that slipped in from somewhere else. That’s a tighter contract for when the version itself is part of what you’re asserting.

The migration trap here is worth care. In Zod 3, z.string().uuid() was loose: it accepted any 8-4-4-4-12 hex, with the version unchecked. That loose v3 validator maps to v4’s z.guid(), not to z.uuid(). So if you mechanically rewrite an old z.string().uuid() to z.uuid(), you’ve quietly tightened the check, and code that worked in v3 can start rejecting identifiers in v4 if the source was ever producing non-strict UUIDs. Rewrite to z.uuid() when you control the IDs, and to z.guid() when you don’t.
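The strict, permissive, and pinned validators can be compared side by side. The sketch below assumes Zod 4 imported as `zod`; the three identifiers are invented for illustration and were not generated by any real system.

```typescript
import * as z from "zod";

// Illustrative identifiers, made up for this sketch.
const v7Id = "018f4d2e-3b6a-7c4d-9e1f-0a2b3c4d5e6f"; // version nibble 7, variant nibble 9
const v4Id = "123e4567-e89b-42d3-a456-426614174000"; // version nibble 4, variant nibble a
const looseId = "a1b2c3d4-e5f6-9789-0abc-def012345678"; // right shape, version nibble 9

const strict = z.uuid();
const loose = z.guid();
const onlyV7 = z.uuid({ version: "v7" });

console.log(strict.safeParse(v7Id).success); // true
console.log(strict.safeParse(looseId).success); // false
console.log(loose.safeParse(looseId).success); // true
console.log(onlyV7.safeParse(v7Id).success); // true
console.log(onlyV7.safeParse(v4Id).success); // false
```

The `looseId` case is the migration trap in miniature: a value a loose check accepts is turned away once the check becomes strict.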

z.url() is a URL-constructor-compatible URL. By itself it accepts far more than you usually want, including javascript: and data: URLs. For any URL your app will render as a link or redirect a user to, that’s not acceptable: a javascript: URL that round-trips through your app and into an href is the open-redirect and XSS class of bug. So the production reflex is not bare z.url(), but the protocol allowlist.

lib/url-schemas.ts
z.url({ protocol: /^https?$/ }); // must be http or https
z.url({ protocol: /^https?$/, hostname: /\.example\.com$/ });
z.httpUrl(); // shorthand for the http/https case

z.url({ protocol: /^https?$/ }) is the canonical “must be http or https”: the protocol regex constrains the scheme, and an optional hostname regex pins it to domains you trust. z.httpUrl() is a shorthand for exactly the http/https case. Treat the protocol allowlist as non-negotiable for any user-supplied URL the app acts on. The full open-redirect rule comes later in the course, but the allowlist is the part to internalize now.
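A short sketch of the allowlist at work, assuming Zod 4 imported as `zod`. The hostnames are placeholders, and the bare `z.url()` result is printed for comparison rather than asserted.

```typescript
import * as z from "zod";

const anyUrl = z.url();
const webUrl = z.url({ protocol: /^https?$/ });
// Hypothetical trusted domain, for illustration only.
const ownUrl = z.url({ protocol: /^https?$/, hostname: /\.example\.com$/ });

const scriptUrl = "javascript:alert(1)";

console.log(anyUrl.safeParse(scriptUrl).success); // the bare builder, for comparison
console.log(webUrl.safeParse(scriptUrl).success); // false
console.log(webUrl.safeParse("https://app.example.com/billing").success); // true
console.log(ownUrl.safeParse("https://evil.test/phish").success); // false
console.log(ownUrl.safeParse("https://app.example.com/billing").success); // true
```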

ID encodings: z.cuid(), z.cuid2(), z.ulid(), z.nanoid(). These are alternative ID string formats, each produced by a particular family of generators. There’s no “best” one to argue over. The rule is mechanical: use the format the upstream system produces. If a third-party service hands you CUID2 IDs, validate with z.cuid2(); if it emits nanoids, use z.nanoid(). Name the builder, match it to the source, and you’re done.

z.ipv4(), z.ipv6() are IP address validators. You reach for them when typing a client IP you’ve read off the request headers to key a rate limiter. Reading that IP safely, meaning which header to trust and how proxies rewrite it, is a request-surface concern covered later in the course; here, z.ipv4() is just the validator for the string once you have it. If you’re validating a CIDR block rather than a single address, z.cidrv4() and z.cidrv6() exist for that.
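As a quick illustration, assuming Zod 4 imported as `zod`, the validators can be tried against a few hand-picked strings. The addresses are documentation-range examples, not real clients.

```typescript
import * as z from "zod";

const clientIp = z.ipv4();

console.log(clientIp.safeParse("203.0.113.7").success); // true
console.log(clientIp.safeParse("localhost").success); // false
console.log(clientIp.safeParse("256.1.1.1").success); // false: octet out of range
console.log(z.cidrv4().safeParse("10.0.0.0/8").success); // true
```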

z.iso.date(), z.iso.time(), z.iso.datetime(), z.iso.duration() are ISO 8601 string validators. There’s one point here that’s easy to miss, so read it twice: these validate the string format and infer as string, not Date. z.iso.datetime() confirms a value looks like "2026-09-01T10:00:00Z"; it does not hand you a Date object. The in-memory date type is still Date, and turning the validated string into one is a separate step, covered later in this chapter. Reach for z.iso.datetime() for any instant that crosses the wire as a string, such as a URL param or a JSON body field. One detail to know: it rejects timezone offsets by default and takes options for them ({ offset: true }) and for sub-second precision ({ precision }). Note that these exist; you don’t need to drill them.
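The string-in, string-out behavior and the offset option can be sketched as follows, assuming Zod 4 imported as `zod`; the timestamps are arbitrary examples.

```typescript
import * as z from "zod";

const utcOnly = z.iso.datetime();
const withOffset = z.iso.datetime({ offset: true });

const parsed = utcOnly.parse("2026-09-01T10:00:00Z");
console.log(typeof parsed); // "string", not a Date

console.log(utcOnly.safeParse("tomorrow").success); // false
console.log(utcOnly.safeParse("2026-09-01T10:00:00+02:00").success); // false
console.log(withOffset.safeParse("2026-09-01T10:00:00+02:00").success); // true
```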

z.jwt() validates that a string has the shape of a JWT. You’ll see it once and rarely write it, because auth flows in a later unit verify tokens through the auth library, which checks the signature and claims, not just the shape. Shape-validating a JWT in a schema is almost never what you want.

z.e164() is the phone-number format. E.164 is the + and up-to-fifteen-digits shape. z.e164() validates that and stops there. Full phone parsing, with national formats, carrier lookups, and the libphonenumber library, is a parser, not a schema concern, and it’s out of scope. Validate the format with z.e164(); if you need to parse, that’s a different tool.

That’s the catalog. Notice that you didn’t write a single regex: every one of those is a string of a known kind, and every kind had a builder. That’s the rule doing its job.

Formats are one shape of validation: a builder that knows a kind of string. Numbers and dates work differently. There’s no format to name, because a number is just a number, so instead of a named builder you constrain the range of allowed values with a chain of methods. The idea is the same, but the ergonomics differ.

Start with numbers, and start with some good news. In Zod 4, z.number() rejects NaN and Infinity by default.

lib/number-demo.ts
z.number().parse(3); // ✓ 3
z.number().parse(NaN); // ✗ rejected by default in Zod 4
z.number().parse(Infinity); // ✗ rejected by default in Zod 4

That’s worth flagging because part of it is a change from Zod 3: NaN was already rejected there, but Infinity slipped through unless you added .finite() to keep it out. In v4 this is handled for you: z.number() already excludes NaN and Infinity, so .finite() is redundant. If you see .finite() in older code or an AI suggests it, that’s a v3 habit you don’t need.

On top of z.number(), the constraints chain inline. .min(0) and .max(100) bound the range; .gt(), .lt(), and their inclusive cousins .gte() (which .min aliases) and .lte() (which .max aliases) give you the open and closed comparisons; .int() demands a whole number; .positive() and .nonnegative() cover the sign; and .multipleOf(0.01) requires the value to be a multiple of a step. You compose the ones the field needs.

Two of those combinations come up constantly, so learn them as whole shapes rather than as individual methods.

lib/number-shapes.ts
z.number().int().positive(); // a quantity: 1, 2, 3 — never 0, never 2.5
z.number().positive().multipleOf(0.01); // a money amount: 19.99, never -5 or 19.999
z.int().positive(); // same as z.number().int(), clearer at the call site
z.date().min(new Date('2020-01-01')); // a Date instance on or after that day

A quantity is z.number().int().positive(): a whole number, one or more. That’s the exact shape the last lesson’s invoice schema used; now you know what each link buys you. A money amount is z.number().positive().multipleOf(0.01): positive, and snapped to two decimal places so 19.999 is rejected. Be clear-eyed about that money shape, though, because it’s only the schema-level approximation. Real money has a deeper story: databases hand back monetary columns as strings to avoid floating-point drift, and production code reaches for a decimal library. That’s covered later in this chapter. Here, multipleOf(0.01) is the right amount of correctness for a validation schema, and no more.

One shortcut worth a line: z.int() is the top-level form of z.number().int(). When a field is conceptually an integer, like a count, an age, or a page number, z.int() says so more directly at the schema site than chaining .int() onto a generic number. Prefer it when the integer-ness is the point.
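To make the money and integer shapes concrete, here is a small sketch assuming Zod 4 imported as `zod`. The schema names `amount` and `pageNumber` are illustrative, not part of the lesson's codebase.

```typescript
import * as z from "zod";

const amount = z.number().positive().multipleOf(0.01);
const pageNumber = z.int().positive();

console.log(amount.safeParse(19.99).success); // true
console.log(amount.safeParse(19.999).success); // false: not a multiple of 0.01
console.log(amount.safeParse(-5).success); // false: not positive

console.log(pageNumber.safeParse(3).success); // true
console.log(pageNumber.safeParse(2.5).success); // false: not an integer
console.log(pageNumber.safeParse(0).success); // false: not positive
```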

Now dates, kept brief, because the full timezone-aware story lands much later in the course. z.date() validates an actual Date instance, not a string, and takes ranges with .min(new Date('2020-01-01')) and .max(...). Hold it next to z.iso.datetime() from the catalog, because the two are easy to confuse, and telling them apart is exactly what you need to choose between them.

const schema = z.iso.datetime();
schema.parse('2026-09-01T10:00:00Z');
// ✓ → '2026-09-01T10:00:00Z' (a string; infers as string)

The value arrives as a string, such as a URL param or a JSON body field, and stays one. Reach for this when the date crosses the wire as text and the next thing that happens to it is not date math.

So the choice is mechanical: if the value arrives as a string off the wire, use z.iso.datetime(); if it’s already a Date in memory, use z.date(). The bridge between them, taking a validated ISO string and turning it into a Date, is its own step, covered later in this chapter. For now, pick the one that matches the value you actually have.
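The two validators can be held next to each other in a few lines. This sketch assumes Zod 4 imported as `zod`; `wireInstant` and `memoryInstant` are names invented here.

```typescript
import * as z from "zod";

const wireInstant = z.iso.datetime(); // validates a string
const memoryInstant = z.date().min(new Date("2020-01-01")); // validates a Date

const iso = "2026-09-01T10:00:00Z";

console.log(wireInstant.safeParse(iso).success); // true
console.log(memoryInstant.safeParse(iso).success); // false: a string is not a Date
console.log(memoryInstant.safeParse(new Date(iso)).success); // true
console.log(memoryInstant.safeParse(new Date("2019-12-31")).success); // false: before the minimum
console.log(wireInstant.safeParse(new Date(iso)).success); // false: a Date is not a string
```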

Last in this group, z.bigint() accepts a bigint value and takes the same range constraints. One caution you met earlier in the course: a bigint doesn’t JSON.stringify cleanly, so it needs care the moment it crosses a JSON boundary. This is a reminder, not a re-teach, so just don’t be surprised by it.
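The JSON caution is easy to reproduce. The sketch below assumes Zod 4 imported as `zod`; sending the value as a decimal string is one common workaround, shown here as an assumption rather than the course's prescribed approach.

```typescript
import * as z from "zod";

const total = z.bigint().positive().parse(BigInt(42));

let stringifyError: unknown = null;
try {
  JSON.stringify({ total });
} catch (err) {
  stringifyError = err; // JSON has no bigint representation
}
console.log(stringifyError instanceof TypeError); // true

// One common workaround: send the value as a decimal string.
const payload = JSON.stringify({ total: total.toString() });
console.log(payload); // {"total":"42"}
```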

Seeing those number rejections asserted in a comment is one thing; watching them happen live is another. The callout below is prefilled with a quantity schema. Run it against the inputs and watch NaN, 3.5, and -1 all bounce off z.number().int().positive() while 3 passes. That’s the v4 default and the constraint chain, working against the real runtime.
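Outside the interactive page, the same check can be run as a short script. This is a minimal sketch assuming Zod 4 imported as `zod`.

```typescript
import * as z from "zod";

const quantity = z.number().int().positive();

const inputs: number[] = [3, NaN, 3.5, -1];
const outcomes = inputs.map((value) => quantity.safeParse(value).success);

console.log(outcomes); // [ true, false, false, false ]
```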

When the format runs out: string constraints and the regex of last resort

The format builders name a kind of string. But plenty of useful checks aren’t about the kind at all; they’re about length, prefix, or casing. Those layer on as a constraint chain, much like the number constraints, and they’re where the format story finishes.

The string constraints are .min(n), .max(n), and .length(n) for length; .startsWith(), .endsWith(), and .includes() for substrings; .regex(/.../) for a custom pattern; and the normalizers .trim(), .toLowerCase(), and .toUpperCase(). Two of these carry real weight.

The first is that formats compose with constraints. A format builder is still a string schema underneath, so you can chain a length check onto it. A real sign-up email isn’t bare z.email(); it’s z.email().max(254). The format validates the kind, and the .max(254) defends against a pathological input: someone pasting a megabyte into the email field to see what breaks. Make that a reflex: a max length is cheap insurance on any free-text field a stranger can submit. It costs you nothing and closes off a class of abuse.

lib/signup-fields.ts
z.email().max(254); // the kind, plus a defense against pathological length
z.string().min(1).max(80); // a display name: present, and bounded
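A quick sketch of both fields under hostile and ordinary input, assuming Zod 4 imported as `zod`. The oversized value is synthetic.

```typescript
import * as z from "zod";

const email = z.email().max(254);
const displayName = z.string().min(1).max(80);

// A synthetic oversized input: 300 characters before the @.
const oversized = "a".repeat(300) + "@example.com";

console.log(email.safeParse("ada@example.com").success); // true
console.log(email.safeParse(oversized).success); // false
console.log(displayName.safeParse("").success); // false
console.log(displayName.safeParse("Ada Lovelace").success); // true
```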

The second carries a warning. The normalizers, .trim(), .toLowerCase(), and .toUpperCase(), don’t just check the value; they change it. .trim() returns the input with surrounding whitespace stripped, and .toLowerCase() returns it lowercased. The inferred type stays string, but the value you get back from a parse is not the value you put in. Parse "ADA@EXAMPLE.COM" through z.email().toLowerCase() and you get back "ada@example.com". Order matters as well: checks run in the order they’re declared, and a format builder’s own check comes first, so don’t count on a .trim() chained after z.email() to rescue input with stray whitespace; the format check sees the untrimmed value. Note this now, so you never expect a parsed value to equal its input when a normalizer is in the chain. The general machinery for transforming values, including transforms that change the type and not just the value, is the next lesson’s subject; here, just know that these three change what comes out.
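A sketch of a normalizer changing the parsed value, assuming Zod 4 imported as `zod`. The second schema is an assumption of this sketch, not something the lesson prescribes: if surrounding whitespace has to be tolerated, one option is to normalize a plain string first and then pipe the cleaned value into the format builder.

```typescript
import * as z from "zod";

// The normalizer changes what comes out of the parse.
const lowered = z.email().toLowerCase();
console.log(lowered.parse("ADA@EXAMPLE.COM")); // "ada@example.com"

// Normalize first, then validate the cleaned value as an email.
const normalizedEmail = z.string().trim().toLowerCase().pipe(z.email());
console.log(normalizedEmail.parse("  ADA@EXAMPLE.COM  ")); // "ada@example.com"
```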

That brings the main rule back, now ready to be proven and bounded: reach for .regex() only when no built-in format fits. You’ve felt the pull of a clever regex this whole lesson, and the chapter’s stance is the opposite of that pull. Here’s why the named builder wins, on axes you might not have weighed:

  • It’s tested against thousands of real-world inputs, including the unusual-but-valid emails and the edge-case URLs that a regex you write in five minutes will miss.
  • It’s internationalized: the email and URL validators handle Unicode and international cases that a naive ASCII regex silently rejects.
  • It’s kept current: when a spec evolves, the builder updates with it, while your regex is frozen the day you wrote it.
  • It produces a better error message: "Invalid email" reads to a user, while "does not match /^[^@]+@..../" reads to no one.

A regex you hand-roll loses on all four. So “last resort” is the literal rule: exhaust the catalog first.

But “last resort” isn’t “never.” There are genuine cases with no built-in, and for those .regex() is exactly right.

lib/sku-schema.ts
z.string().regex(/^SKU-\d{6}$/); // an internal SKU — no built-in format for this

An internal SKU like SKU-000123 is a shape your business invented. No spec defines it, so no builder validates it, and z.string().regex(/^SKU-\d{6}$/) is the correct tool. That’s the test: a regex earns its place only where the format is genuinely yours and nobody has standardized it.
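A brief sketch of that last-resort regex against a few inputs, assuming Zod 4 imported as `zod`; the SKU values are invented.

```typescript
import * as z from "zod";

const sku = z.string().regex(/^SKU-\d{6}$/);

console.log(sku.safeParse("SKU-000123").success); // true
console.log(sku.safeParse("SKU-123").success); // false: too few digits
console.log(sku.safeParse("sku-000123").success); // false: wrong case
```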

Now make that judgment yourself. Below is a set of fields. Sort each one by whether Zod 4 ships a named builder for it, or whether it’s a genuine last-resort regex: a domain-specific shape with no standard behind it.

Buckets:

  • Named format builder: a built-in covers this
  • Reach for .regex(): no built-in fits, last resort

Fields:

  • A user’s email address
  • A v7 UUID primary key
  • An ISO 8601 timestamp
  • A redirect URL that must be https
  • An IPv4 rate-limit key
  • An internal SKU like SKU-000123
  • A US ZIP+4 like 94105-1234
  • A semantic-version string like 1.4.2

Where format validation stops: shape here, cross-resource rules at the action

You can now write the format half of a sign-up schema, and you should be clear-eyed about that word half. A schema validates shape and format: everything provable from the value alone. Is this a well-formed email? Is it under 254 characters? Is it lowercased? Those questions need nothing but the value in front of you, and the schema answers all of them. Layer on a .max(254) and a .toLowerCase(), and the schema’s job is genuinely done.

But an experienced engineer knows a sign-up needs more than that, and the rest is a different kind of question entirely. Is this email already registered? Is it on a suppression list of addresses that bounced or complained? Does the requested org slug collide with one that already exists? None of those is a format rule. Every one of them requires a database lookup: you can’t answer “is this email taken” from the email’s text, so you have to go ask the database.

And that’s the boundary. A schema that needs a database connection to parse is the failure mode: it can’t run in a test without a database, it can’t run at the edge, and it tangles pure validation with live infrastructure. So those rules don’t go in the schema. They live in the Server Action’s body, after the parse. Server Actions are the very next chapter, and they’re exactly the layer where reaching into the database is legitimate.
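The split can be sketched in a few lines. Everything below other than the Zod calls is hypothetical: `isEmailTaken`, `registeredEmails`, and `signUp` are stand-ins invented for illustration, with an in-memory set playing the role of the database. Zod 4 is assumed to be imported as `zod`.

```typescript
import * as z from "zod";

// Pure shape-and-format validation: needs nothing but the value.
const signupSchema = z.object({ email: z.email().max(254) });

// Hypothetical stand-in for a database lookup; a real action would query its data store.
const registeredEmails = new Set(["ada@example.com"]);
async function isEmailTaken(email: string): Promise<boolean> {
  return registeredEmails.has(email);
}

type SignupResult = { ok: true } | { ok: false; reason: "invalid" | "taken" };

// Hypothetical action body: pure parse first, database question second.
async function signUp(input: unknown): Promise<SignupResult> {
  const parsed = signupSchema.safeParse(input);
  if (!parsed.success) return { ok: false, reason: "invalid" };
  if (await isEmailTaken(parsed.data.email)) return { ok: false, reason: "taken" };
  return { ok: true };
}
```

Because the schema never touches `isEmailTaken`, it can still be parsed in a unit test or at the edge with no database in reach.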

Carry that split with you. It’s this lesson’s corner of a larger boundary you’ll see again: the next lesson draws the same line for custom logic rules, and the rest of the course keeps it. Validate what the value can prove about itself in the schema, and ask the database its questions at the action. Keep them apart, and every schema you write stays something you can parse anywhere.

The Zod documentation is the full catalog this lesson selected from. Its schema-definition reference lists every format builder, including the handful this lesson skipped as out of scope. The v4 changelog is worth a skim for the string-format migration in Zod’s own words, since it’s where the deprecation of z.string().email() and the move to top-level builders is spelled out.