Chapter 88, Lesson 4

Mock the wire, not the SDK

Why integration tests for third-party services like Stripe should mock the network request itself, not the SDK call above it.

The last lesson left you with signedInAs and a habit: when a test needs to stub something, stub the boundary your code actually calls, auth.api.getSession, not the library’s internals underneath it. This lesson takes that habit one boundary further out, to the place where your code stops talking to a library and starts talking to someone else’s server.

That boundary is where the most expensive test bug in this chapter lives. It isn’t a test that fails. It’s a test that passes while production is broken. You’ll learn to recognize that bug on sight, name exactly why it lies, and move your mock down to the one line where it can no longer hide.

The test that passes while production breaks

Here is a Server Action that puts a customer on a paid plan. It’s the subscription sibling of the createInvoice you’ve been carrying through this chapter: same wrapper, same shape, except that instead of writing a row it reaches out to Stripe to start a checkout.

export const createSubscription = authedAction(
  'member',
  createSubscriptionSchema,
  async (input, { user, orgId }) => {
    const session = await stripeClient.checkout.sessions.create({
      mode: 'subscription',
      line_items: [{ price: input.priceId, quantity: 1 }],
      metadata: { userId: user.id, orgId },
    });
    return ok({ checkoutUrl: session.url });
  },
);

Now you want a test for it. You reach for the tool you already know, SDK mocking from the unit-testing lessons, and you write the obvious thing:

src/server/actions/createSubscription.int.test.ts
vi.mock('stripe');

it('starts a subscription checkout', async () => {
  await signedInAs({ plan: 'free' }, tx);
  await createSubscription({ priceId: 'price_pro_monthly' });
  expect(stripeClient.checkout.sessions.create).toHaveBeenCalledWith({
    mode: 'subscription',
    line_items: [{ price: 'price_pro_monthly', quantity: 1 }],
    metadata: expect.any(Object),
  });
});

It passes. It’s fast, and it reads cleanly. So you ship it.

Three weeks later, production starts throwing 400s on checkout. Maybe Stripe bumped a required header on the API version your account is pinned to. Maybe the SDK started sending a field under a renamed key. Maybe that metadata object had a shape Stripe rejects. Whatever the cause, real Stripe said no, and your test stayed green through every commit.

It’s worth pinning down exactly why, because the reason is the whole lesson. Your assertion checked that a function was called with certain arguments. It never checked what bytes reached Stripe. And the bytes (the method, the URL, the headers, and the body, everything that goes on the wire) are the only contract Stripe enforces. Stripe has never seen your create call. It sees a request. You tested the call and shipped the request untested.

A mock is a stand-in you wrote, so it can only confirm your own assumptions back to you. It can’t tell you anything about a system you don’t control, because it doesn’t contain that system. It contains your guess about it. So a passing mock-based test against a third party proves exactly one thing: your code called your guess the way you expected. If the guess is wrong, the test is wrong in the same direction, and it stays green.

So here is the question this lesson answers: if mocking the SDK can’t catch the bug, where do you put the mock so it can? To answer that, you first have to see what the SDK actually is.

An SDK is a tower on top of one network call

It’s tempting to picture stripeClient as a magic object whose methods “talk to Stripe.” Drop that picture. An SDK is a stack of plain code, and every method you call is a trip down that stack to a single rung at the bottom: one outbound network call. Everything above that rung is ordinary JavaScript running in your process, doing work on your behalf before a single byte leaves the machine.

Here’s the tower for that one checkout.sessions.create call. Read it top to bottom, the direction your data flows.

Every checkout.sessions.create call is a trip down this tower. The only line you wrote is the top one; the SDK generates everything below it.

Walk down the rungs and name what each one does, because each one is something a function-mock silently throws away:

  • Serialize. Your tidy JS object becomes an actual request body. One detail makes this concrete: Stripe’s v1 API doesn’t take JSON. It takes application/x-www-form-urlencoded, the same encoding an HTML form posts. So line_items: [{ price: 'price_x', quantity: 1 }] doesn’t go out as JSON. The SDK flattens it into line_items[0][price]=price_x&line_items[0][quantity]=1. Nothing at your call site wrote that string; the SDK manufactured it. A mock of create never produces it, so a mock of create can never catch a bug in it.
  • Sign and set headers. The SDK attaches Authorization: Bearer sk_..., a Stripe-Version that pins the API shape, and an Idempotency-Key it generated for you so a retried request can’t double-charge a customer. You never see these headers; they are pure SDK output.
  • Retry. On a 429 or a 5xx, the SDK backs off and tries again. That behavior is real and it matters, and it’s invisible to any test that intercepts above it.
  • The network call. The one rung that leaves your process. This is the only thing Stripe observes. Everything above it was your code, and the SDK’s code, preparing this single request.
  • Parse. The response comes back as bytes. The SDK decodes them, and on an error it maps Stripe’s error JSON into a typed StripeError you can catch. This is real too, and a high mock discards it as well.
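
The Serialize rung is the easiest one to see in code. Here is a minimal sketch of bracket-notation flattening, assuming nothing about how the Stripe SDK actually implements it: flattenForm and toFormBody are hypothetical names invented for this illustration, and the sketch leaves the brackets in keys unescaped so the output reads like the string in the text.

```typescript
// Hypothetical helpers for illustration only; this is not the Stripe SDK's serializer.
type FormValue = string | number | boolean | FormValue[] | { [key: string]: FormValue };

// Turn a nested value into flat [key, value] pairs using bracket notation.
function flattenForm(value: FormValue, prefix = ''): Array<[string, string]> {
  if (Array.isArray(value)) {
    // Array items are addressed by index: line_items[0], line_items[1], ...
    return value.flatMap((item, index) => flattenForm(item, `${prefix}[${index}]`));
  }
  if (typeof value === 'object' && value !== null) {
    // Object keys are appended in brackets, except at the top level.
    return Object.entries(value).flatMap(([key, child]) =>
      flattenForm(child, prefix ? `${prefix}[${key}]` : key),
    );
  }
  return [[prefix, String(value)]];
}

// Join the pairs into a form-encoded body, percent-encoding the values only.
function toFormBody(params: { [key: string]: FormValue }): string {
  return flattenForm(params)
    .map(([key, val]) => `${key}=${encodeURIComponent(val)}`)
    .join('&');
}

const formBody = toFormBody({
  mode: 'subscription',
  line_items: [{ price: 'price_pro_monthly', quantity: 1 }],
});
// formBody is now:
// mode=subscription&line_items[0][price]=price_pro_monthly&line_items[0][quantity]=1
```

None of that string exists at the call site. A test that stops at the object you passed in never exercises a single line of this transformation.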

So “the wire” isn’t an abstraction. It’s a specific string of bytes the SDK builds for you, most of it generated below your call site. Let’s look at it as the actual request, so it stops being a diagram label and becomes something you could read in a network tab.

POST /v1/checkout/sessions HTTP/1.1
Host: api.stripe.com
Authorization: Bearer sk_live_***
Stripe-Version: 2025-04-30
Idempotency-Key: 6f1c... (auto-generated)
Content-Type: application/x-www-form-urlencoded

mode=subscription&line_items[0][price]=price_pro_monthly&line_items[0][quantity]=1&metadata[userId]=usr_123&metadata[orgId]=org_42

The method, path, and host. Your call site named none of these. sessions.create resolved to POST /v1/checkout/sessions on api.stripe.com. A function-mock asserts on your arguments; it never asserts that the request even went to the right place.

Three headers the SDK added, not you. The Idempotency-Key is the one that matters most: it’s what stops a retried checkout from charging twice. Nothing in your code wrote it, so no mock of your code can verify it’s there.

The body, form-encoded rather than JSON. Your { price, quantity } object became line_items[0][price]=.... This exact transformation is what real Stripe parses, and it’s exactly what a mock of create() skips, because the mock receives your object and never runs the serializer that produces this.

The payoff is bigger than Stripe. Every outbound HTTP client in your stack is this same tower over one network call. The Resend SDK for sending email is a tower over one request to Resend’s API. An AI provider’s SDK is a tower over one request to theirs. An internal RPC client calling another one of your services has the same shape. They differ in what they serialize and which host they hit, but they all bottom out at the same rung, a request leaving your process, because the network is the one exit they all share.

That shared exit is why a single tool covers all of them. The mechanics are the next lesson, but the tool is MSW (Mock Service Worker). One library can mock Stripe, Resend, your AI provider, and your internal RPC with the same handler shape because it intercepts at the network rung, the one place every tower meets the ground. It doesn’t care which SDK built the request, or whether that SDK used the Fetch API or Node’s own HTTP module underneath. It watches the network boundary, and that boundary is universal.
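
To make the shared exit concrete, here is a small sketch in plain TypeScript. Everything in it is an assumption made for illustration: WireRequest, Transport, createCheckout, and sendEmail are hypothetical stand-ins, the hosts are placeholders, and the transport is synchronous for brevity. It is not MSW’s API or any real SDK.

```typescript
// Hypothetical types and clients; not MSW's API and not any real SDK.
interface WireRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
  body: string;
}
interface WireResponse {
  status: number;
  body: string;
}
// The one exit every client shares (synchronous here to keep the sketch short).
type Transport = (request: WireRequest) => WireResponse;

// Tower 1: a payments-style client that form-encodes its body.
function createCheckout(transport: Transport, priceId: string): WireResponse {
  return transport({
    method: 'POST',
    url: 'https://payments.example.test/v1/checkout/sessions',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: `mode=subscription&line_items[0][price]=${encodeURIComponent(priceId)}`,
  });
}

// Tower 2: an email-style client that sends JSON.
function sendEmail(transport: Transport, to: string): WireResponse {
  return transport({
    method: 'POST',
    url: 'https://email.example.test/emails',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ to }),
  });
}

// One interceptor at the exit records every request, whichever tower built it.
const seenRequests: WireRequest[] = [];
const intercept: Transport = (request) => {
  seenRequests.push(request);
  return { status: 200, body: '{}' };
};

createCheckout(intercept, 'price_pro_monthly');
sendEmail(intercept, 'member@example.test');
```

The two clients disagree about encoding and host, yet the interceptor needed no knowledge of either. That is the property the network rung gives you.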

You now have the vocabulary to name the bug precisely: the naive test mocked too high up the tower. There are two flavors of “too high,” one rung apart. Look at all three options side by side, the two mistakes and the fix, and read what each one actually tests.

const createSession = vi.spyOn(
  stripeClient.checkout.sessions,
  'create',
);
await createSubscription({ priceId: 'price_pro_monthly' });
expect(createSession).toHaveBeenCalledWith(
  expect.objectContaining({ mode: 'subscription' }),
);

Asserts your inputs, nothing else. You’re checking the object you handed create, but the serializer, the signer, the retry, and the parser never run. If the SDK changes how it turns that object into a request, this assertion still passes against the old, imagined shape. You tested that you called your guess.

Each anti-pattern ships a specific class of bug. Here is one for each, so they’re not abstract.

“Mock the function” misses an SDK-added header. An SDK upgrade bumps the Stripe-Version header your client sends, the new version rejects a field your code passes, and real requests start 400ing. But your spy only ever saw your arguments, so it never noticed the header existed, let alone changed. The test that should have caught a production outage was blind to it by construction.

“Mock the SDK class” misses idempotency. Suppose a refactor breaks the SDK’s idempotency-key generation, so retried checkouts no longer carry a key, and a customer who double-clicks “Subscribe” gets charged twice. No test built on a hand-written FakeStripe class could catch this, because the fake never generated a key in the first place. The behavior you most need to protect is the one the mock erased.
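
The idempotency case can be reproduced in miniature. The sketch below uses a hypothetical client (buildRequest and create are invented names, not Stripe’s code) in which a refactor has already dropped the Idempotency-Key header. It then runs a function-level check and a wire-level check against the same call.

```typescript
// Hypothetical miniature client for illustration; not the Stripe SDK.
interface WireRequest {
  method: string;
  path: string;
  headers: Record<string, string>;
  body: string;
}
interface CheckoutParams {
  mode: string;
  priceId: string;
}

// The "regressed" request builder: the Idempotency-Key header is gone.
function buildRequest(params: CheckoutParams): WireRequest {
  return {
    method: 'POST',
    path: '/v1/checkout/sessions',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: `mode=${params.mode}&line_items[0][price]=${params.priceId}`,
  };
}

const spyCalls: CheckoutParams[] = []; // what a function-level spy observes
const sentRequests: WireRequest[] = []; // what actually reaches the wire

function create(params: CheckoutParams): void {
  spyCalls.push(params);
  sentRequests.push(buildRequest(params));
}

create({ mode: 'subscription', priceId: 'price_pro_monthly' });

// Function-level check: true, because it only ever sees the arguments.
const spyCheckPasses = spyCalls.some((call) => call.mode === 'subscription');

// Wire-level check: false, because the header really is missing.
const wireCheckPasses = sentRequests.every(
  (request) => 'Idempotency-Key' in request.headers,
);
```

Same call, same regression, opposite verdicts. Only the check that looks at the request can see what the refactor removed.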

Both mistakes are the same mistake in different clothes, and you’ve seen this shape before. Last lesson, mocking the JWT verifier instead of auth.api.getSession was “too deep”: you reached past the boundary your code calls, into the library’s internals. Mocking create here is “too shallow”: you stopped short of the boundary that matters, the network. Too deep and too shallow are the same error: the wrong boundary. It’s the same skill, pointed at a different seam.

So carry one diagnostic out of this section, a question to ask of any test that touches a third party: does it assert on the request that would reach their server, or only on the call your code made?

Mock what you don’t own, roll back what you do

Step back, because this lesson and the first lesson of this chapter are two halves of one idea, and they now fit together.

The first lesson taught you to test against a real Postgres and contain it with transaction rollback, never to mock your database. This lesson teaches you to mock Stripe and never let a real request leave the machine. At a glance those look like contradictory advice: run the real thing here, fake the real thing there. They’re not. They’re the same rule, decided by a single question: do you own this boundary?

Both directions follow from that one question.

You don’t own Stripe, so mock the wire. You can’t run Stripe in CI, and even if you could, you shouldn’t: real network flake would make your tests flaky, rate limits would throttle them, and a misfire would put real charges on real cards. So you freeze the wire and assert on the request your code produced. The thing under test is your code building the right request, which is precisely the part of the contract you’re responsible for. Stripe’s job is to honor a correct request, and your job is to send one. The mock lets you test your job without depending on theirs.

You own your Postgres, so roll it back, never mock it. You can run your database; the first two lessons of this chapter set that up. Mocking it would reintroduce the exact bug the first lesson opened on: a stubbed Drizzle can’t catch a column rename, a violated constraint, or a mistake in the real SQL, because the stub is your assumption about the schema, and the schema is the thing most likely to have drifted out from under you. So for the boundary you control, you do the opposite of mocking: run the real thing, and contain it with a tx rollback so the test leaves no trace.

One picture holds both. This is the image to keep from the entire chapter: your action sits in the middle with two boundaries, and each boundary gets the opposite treatment.

The same action, two boundaries, opposite rules. You own the database, so you run it and roll back. You don't own Stripe, so you mock the wire.

Now check that the rule transferred. The trap to avoid is the one the first lesson warned about: sliding “your Postgres” into the mock column. Sort each of these the way a reviewer would.

Each of these is something an integration test has to deal with. Decide whether to mock it at the network boundary, or run it for real and roll it back.

The two buckets:

  • Mock the wire (MSW): you don’t own it, so freeze the request.
  • Run it, roll back (tx): you own it, so run the real thing.

The items to sort:

  • stripeClient.checkout.sessions.create
  • An insert into your own invoices table
  • A resend.emails.send call
  • A Drizzle query against your own schema
  • An OpenAI completion call
  • Reading your own session row

If every item that hits your own database landed in “run it, roll back,” the rule has transferred. That’s the whole decision, and it’s now one question, not two techniques.

You know where to mock. The last piece is knowing what to assert once you’re there, because the entire reason to mock at the network is so you can assert on the one thing that matters: the request your code sent.

The mechanics of pulling that request apart, capturing it in the handler, cloning the body, and reading the form fields back out, are the next lesson. What you need first is the target. Say the contract in plain words: “when a member subscribes, the action must POST to Stripe with metadata.userId, metadata.orgId, and an Idempotency-Key header.” A correct boundary test follows the arrange-act-assert shape you already know, and the only new idea is that last line:

  • Arrange: await signedInAs({ plan: 'free' }, tx). The auth fixture from last lesson is already yours, and this is the same first line as every other action test.
  • Act: call the real createSubscription. There’s no mock of the SDK, so the whole tower runs.
  • Assert: check the intercepted request MSW caught (its URL, its method, its headers, and its decoded body), not any SDK method call.

That last bullet is the entire shift, so make it concrete one more time:

// ❌ asserts the call you made — blind to everything the SDK did with it
expect(stripeClient.checkout.sessions.create).toHaveBeenCalledWith(/* ... */);
// ✓ asserts the request Stripe would have received
expect(capturedRequest.headers.get('Idempotency-Key')).toBeTruthy();
expect(decodedBody).toMatchObject({ metadata: { userId: 'usr_123' } });

The bottom two lines test the contract you’re actually on the hook for; the top line tests your own arguments back to yourself. How you get capturedRequest and decodedBody is the next lesson. For now, hold onto the target.
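
To show what a decodedBody could look like, here is a rough sketch of turning a form-encoded body back into a nested object. decodeFormBody is a hypothetical helper written for this illustration, not an MSW or Stripe function, and it makes one simplifying assumption: array indices such as line_items[0] become object keys ('0') rather than real arrays.

```typescript
// Hypothetical decoder for illustration; array indices become plain object keys.
type Decoded = { [key: string]: string | Decoded };

function decodeFormBody(body: string): Decoded {
  const result: Decoded = {};
  for (const pair of body.split('&')) {
    if (pair === '') continue;
    const separator = pair.indexOf('=');
    const rawKey = separator === -1 ? pair : pair.slice(0, separator);
    const rawValue = separator === -1 ? '' : pair.slice(separator + 1);
    // "metadata[userId]" becomes the path ["metadata", "userId"].
    const path = decodeURIComponent(rawKey).replace(/\]/g, '').split('[');
    let node = result;
    path.forEach((segment, index) => {
      if (index === path.length - 1) {
        // Last segment: store the value ("+" means a space in form encoding).
        node[segment] = decodeURIComponent(rawValue.replace(/\+/g, ' '));
        return;
      }
      // Intermediate segment: descend, creating the child object if needed.
      const existing = node[segment];
      if (typeof existing === 'object') {
        node = existing;
      } else {
        const child: Decoded = {};
        node[segment] = child;
        node = child;
      }
    });
  }
  return result;
}

const decodedBody = decodeFormBody(
  'mode=subscription&metadata[userId]=usr_123&metadata[orgId]=org_42',
);
// decodedBody.metadata is now { userId: 'usr_123', orgId: 'org_42' }
```

With the body in this shape, an assertion like the toMatchObject line above reads the same way the contract does: metadata.userId, metadata.orgId.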

Two boundary policies travel with “assert on the request.” They aren’t mechanics; they’re the conditions under which a boundary test is trustworthy at all, so they belong right here next to the assertion.

The first is that an unhandled request must fail loud. Configure your mock server with onUnhandledRequest: 'error', so that any outbound call your test didn’t explicitly set up throws instead of slipping through. (MSW’s default only logs a warning and lets the request proceed.) Consider what a request that slips through does to you: your code calls an endpoint you forgot to handle, nothing in the test objects, and the test goes green. That’s the green-but-wrong failure, rebuilt from scratch. A boundary test is only honest if every request it doesn’t recognize blows up. (Where this setting lives is the next lesson; that it must be on is non-negotiable.)
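
The policy itself fits in a few lines. The sketch below is a hypothetical interceptor, not MSW’s implementation: createInterceptor, the 'lenient' mode, and the placeholder URLs are all assumptions made to contrast the two behaviors side by side.

```typescript
// Hypothetical interceptor illustrating the policy only; not MSW's API.
interface WireRequest {
  method: string;
  url: string;
}
interface WireResponse {
  status: number;
  body: string;
}
type Handler = (request: WireRequest) => WireResponse;

function createInterceptor(
  handlers: Record<string, Handler>,
  onUnhandled: 'error' | 'lenient',
): Handler {
  return (request) => {
    const handler = handlers[`${request.method} ${request.url}`];
    if (handler) return handler(request);
    if (onUnhandled === 'error') {
      // Fail loud: an unknown request is a test bug, so stop here.
      throw new Error(`Unhandled request: ${request.method} ${request.url}`);
    }
    // Lenient mode answers anyway, which is how a forgotten endpoint turns green.
    return { status: 200, body: '{}' };
  };
}

const handlers: Record<string, Handler> = {
  'POST https://payments.example.test/v1/checkout/sessions': () => ({
    status: 200,
    body: '{"id":"cs_test_123"}',
  }),
};

// A request the test never set up a handler for.
const forgotten: WireRequest = {
  method: 'POST',
  url: 'https://payments.example.test/v1/refunds',
};

const lenientStatus = createInterceptor(handlers, 'lenient')(forgotten).status;

let strictError = '';
try {
  createInterceptor(handlers, 'error')(forgotten);
} catch (error) {
  strictError = error instanceof Error ? error.message : String(error);
}
```

The lenient interceptor hands the forgotten request a success it never earned, while the strict one names the exact method and URL that nobody handled.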

The second is no hand-rolled fetch outside your client layer. This one is about making the boundary mockable in the first place. Production HTTP to a third party goes through a typed client, the kind you’ll build later, where stripeClient and the Resend client live. A raw fetch buried inside a route handler isn’t just a style smell. It resists clean interception and assertion, because there’s no single seam to point a handler at. When you find one, it’s a refactor target before it’s a test target: lift it into the client layer, then test it at the boundary like everything else.

Boundary mocks are assumptions; contract tests catch drift

There’s an honest objection to everything above, and you may already have raised it: if you hand-write Stripe’s response yourself, what stops your canned response from drifting away from what real Stripe actually returns?

Nothing does, and that’s the real tradeoff, so name it plainly. A boundary mock encodes what you assume the third party does. The mock that says “Stripe returns { id, url }” is only as true as your assumption, and assumptions rot: Stripe ships changes on its own schedule, and your frozen handler won’t hear about them.

The containment for that is a different kind of test, named here so you know it exists: a contract test. It runs against Stripe’s live sandbox on a schedule, nightly rather than on every commit, and its whole job is to notice when reality has drifted from your assumption. It’s far too slow and too flaky to live in your per-test loop, which is exactly why it lives outside it. This isn’t a gap in the strategy; it’s the strategy. Fast mocks for the inner loop, where you run hundreds of tests a minute, plus one slow real-network check on a schedule to catch drift. Two layers, each doing the job the other can’t.
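
The core of such a drift check can be very small. In the sketch below, findMissingFields is a hypothetical helper, and the “sandbox” payload is a hard-coded stand-in that simulates a renamed field. A real contract test would fetch that payload from the provider’s sandbox instead.

```typescript
// Hypothetical drift check. The payload below is a hard-coded stand-in,
// not a real Stripe response.
function findMissingFields(
  assumed: readonly string[],
  live: Record<string, unknown>,
): string[] {
  // Report every field the mocks rely on that the live payload no longer has.
  return assumed.filter((field) => !(field in live));
}

// The fields your hand-written mock response promises.
const assumedFields = ['id', 'url'] as const;

// In a scheduled contract test, this object would come from the sandbox.
// Here it simulates drift: "url" has been renamed.
const sandboxResponse: Record<string, unknown> = {
  id: 'cs_test_123',
  checkout_url: 'https://checkout.example.test/cs_test_123',
};

const drift = findMissingFields(assumedFields, sandboxResponse);
// drift is ['url']: the mock still promises a field the live payload lacks.
```

A non-empty result is the signal to update the mock, which is the one thing the fast inner-loop tests can’t tell you on their own.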

One habit follows directly, and it’s worth stating because the temptation runs the other way: hand-write your fixtures, don’t record them. It’s tempting to capture a real Stripe response once and replay the whole fat JSON blob forever. Don’t. A recorded fixture arrives bloated with dozens of fields you never assert on, and when one of those drifts, nothing tells you. Five explicit lines of HttpResponse.json({ id: 'cs_test_123', url: '...' }) that name only the fields your test actually reads is the course default: it’s smaller, it’s legible, and it makes your assumption about Stripe’s response visible in the test instead of buried in a recording.
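
One way to keep a hand-written fixture honest is to give it a type that lists only the fields the test reads. This is a sketch under that assumption: CheckoutSessionFixture and checkoutSessionFixture are hypothetical names, and the URL is a placeholder.

```typescript
// Hypothetical fixture builder. It models only the fields the test reads.
interface CheckoutSessionFixture {
  id: string;
  url: string;
}

function checkoutSessionFixture(
  overrides: Partial<CheckoutSessionFixture> = {},
): CheckoutSessionFixture {
  return {
    id: 'cs_test_123',
    url: 'https://checkout.example.test/cs_test_123',
    ...overrides, // a test can change one field without restating the rest
  };
}

// The JSON a handler would send back as the canned response body.
const fixtureBody = JSON.stringify(checkoutSessionFixture());

// A variant for a test that cares about a different session id.
const otherSession = checkoutSessionFixture({ id: 'cs_test_456' });
```

Two fields, both visible in the test file. If the action ever starts reading a third, the type forces you to write that assumption down too.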

That’s the decision. The next lesson is the mechanics: how MSW actually intercepts that request, how you write the handler, and how you pull the bytes back out to assert on them. You’re carrying the only thing that lasts, the judgment of where the mock goes and why.

Before you go, here are two quick checks that you’ve internalized the boundary.

A test does vi.mock('stripe') and asserts checkout.sessions.create ran with the expected line_items. It’s green. Which production failures is this test structurally unable to catch? Select all that apply.

  • You passed the wrong priceId into line_items from the action.
  • A regression drops the auto-generated Idempotency-Key, so a double-clicked subscribe charges twice.
  • Stripe starts 400ing because the SDK negotiates a Stripe-Version the account no longer accepts.
  • The action calls create even though the member is on a plan that shouldn’t reach checkout.

Your action writes an audit-log row to your own Postgres and fires an HTTP call to your internal notifications microservice — a separate service your team also runs. In one integration test, how should each collaborator be handled?

  • Mock both: each is a collaborator the action depends on, and mocking keeps the test fast and isolated.
  • Let the audit-log insert hit the database inside tx and roll it back; intercept the notifications request at the network and assert on it.
  • Run both against the real services: only an end-to-end path proves the action actually works.
  • Stub the audit-log write, but let the notifications call go through for real, since it’s the side effect that matters most here.