Chapter 66, Lesson 6

Waitpoints for callbacks and approvals

Trigger.dev waitpoints: durable pause tokens that let a background run park on a third-party callback, a human approval, or a batch of sub-jobs instead of polling.

The waits you learned in the last lesson all share one property: you know the timing up front. wait.for({ seconds: 2 }) waits two seconds. wait.until(periodEnd) waits until a date you already hold. You set the clock, the clock goes off, the run continues. That covers the workflows that wait on time.

Real workflows often wait on something else. They wait on a partner’s render farm to finish a job that takes anywhere from two minutes to two hours. They wait on an admin to click “Approve” on a refund, which might happen in thirty seconds or after lunch. They wait on a batch of twelve sub-jobs to all report back. In none of these do you know when the wait ends. You only know what will signal you, and that signal comes from outside your run entirely.

With only the tools so far, you’d have to poll. Trigger the partner, then loop: wait.for({ seconds: 10 }), hit a status endpoint, check if it flipped, wait again. It works, sort of, and it costs you three things. It burns run-minutes waking a worker every ten seconds to ask a question that’s almost always “not yet.” It opens a race window: the job finishes one second after a poll and you don’t notice for another ten. And it assumes the partner even exposes a status endpoint to poll, which plenty don’t. This lesson gives you the primitive that removes all three costs: the waitpoint, a durable pause that the outside world completes rather than a clock you set.

Back in the lesson on when Trigger.dev earns its weight, you saw five trigger conditions. The fifth was event-driven, human-in-the-loop pauses, and we deferred its mechanism. This lesson is that mechanism, and it reuses everything from the last lesson, because a waitpoint is just another checkpoint boundary. The run parks, the worker is freed, and the run survives a crash while parked, exactly as with wait.for. The only new thing is who wakes it back up. By the end of this lesson you’ll pause a run on a third-party callback, on a human approval, and on a batch of sub-jobs, each with a mandatory timeout and zero polling.

A token a run can park on

Before the three real-world shapes, it helps to install the model, because all three are the same primitive seen from different angles. A waitpoint has a four-beat lifecycle:

Create a token. You call wait.createToken(...) and get back a handle.
Hand it out. You give whoever will complete the token a way to do so: a URL, an id, or a Bearer token. Which one you give depends on the completer, which the next two sections cover.
Park the run. You await wait.forToken(...). This is a checkpoint, so the worker is released and the run consumes nothing while it waits.
Resume. The token gets completed, by an HTTP callback, an SDK call, or its timeout firing, and the run wakes up, possibly on a different worker, with the completion payload in hand.

The third beat is the one worth sitting with. Recall from the retries lesson that a checkpoint is a saved snapshot the runtime can resume from after a crash. Parking on a token is a checkpoint, so a parked run isn’t a held thread or a busy worker. It’s a row in Trigger.dev’s database. It can park for six hours, survive a redeploy of your app, survive the worker that started it being recycled, and still resume the instant its token completes. Polling holds a worker hostage; parking lets it go.
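The four beats can be made concrete with a toy in-memory model. This is only a sketch for building intuition: `ToyWaitpoints` and its methods are invented for this example and are not part of the Trigger.dev SDK, and the real runtime stores this state durably rather than in a `Map`.

```typescript
// Toy in-memory model of the waitpoint lifecycle. All names are hypothetical;
// the real runtime persists this state and resumes runs across workers.
type TokenState<T> =
  | { status: "pending" }
  | { status: "completed"; output: T }
  | { status: "timed_out" };

class ToyWaitpoints<T> {
  private tokens = new Map<string, TokenState<T>>();
  private nextId = 1;

  // Beat 1: create a token and return its handle (here, just an id).
  createToken(): string {
    const id = `waitpoint_${this.nextId++}`;
    this.tokens.set(id, { status: "pending" });
    return id;
  }

  // Beat 4, success path: an outside completer supplies the payload.
  complete(id: string, output: T): void {
    if (this.tokens.get(id)?.status === "pending") {
      this.tokens.set(id, { status: "completed", output });
    }
  }

  // Beat 4, timeout path: the deadline passed before anyone completed it.
  expire(id: string): void {
    if (this.tokens.get(id)?.status === "pending") {
      this.tokens.set(id, { status: "timed_out" });
    }
  }

  // Beat 3 seen from the outside: a parked run is only stored state.
  stateOf(id: string): TokenState<T> | undefined {
    return this.tokens.get(id);
  }
}

const waitpoints = new ToyWaitpoints<{ approved: boolean }>();
const tokenId = waitpoints.createToken(); // beat 1
// Beat 2: hand tokenId to whoever will complete it.
const parked = waitpoints.stateOf(tokenId); // beat 3: still pending
waitpoints.complete(tokenId, { approved: true }); // beat 4
const resumed = waitpoints.stateOf(tokenId);
```

Nothing in the model holds a thread while the token is pending, which is the property the parked run relies on: the wait is a record, and completing it is a state change on that record.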

Here’s the smallest possible version: create a token, then park on it.

const token = await wait.createToken({ timeout: '1h' });
// hand token.url or token.publicAccessToken to whoever completes it
const result = await wait.forToken<{ approved: boolean }>(token.id);
if (!result.ok) {
  throw new AbortTaskRunError('approval timed out');
}
const { approved } = result.output;

wait.createToken returns a handle with four fields: id, url, publicAccessToken, and isCached. The id (which starts with waitpoint_) is how you refer to the token; url and publicAccessToken are the two ways someone else completes it. Which one you hand out depends on the completer, covered in the next two sections. For now, focus on the create-then-park shape.

The timeout is the one option you must never skip. It defaults to '10m', ten minutes, which is almost never what a real wait wants. A human approval or a partner job that runs for hours would hit that default and die. So you set the timeout explicitly, every single time, sized to the slowest completion you’re willing to accept.

wait.forToken<T>(token.id) parks the run. The generic <T> is the shape of the payload you expect back: type it here and result.output is typed for free. This line is the checkpoint, and the worker is freed the moment you hit it.

The result is { ok, output, error }. There’s exactly one way a forToken fails, which is that it timed out because nobody completed it in time, so ok: false always means “timed out.” That gives the resume path exactly two branches: completed in time, where result.output holds your payload, or timed out, where AbortTaskRunError fails the run cleanly, the way you learned to mark a permanent failure last lesson. There’s no third case to forget.

If branching on ok feels like ceremony when a timeout should just fail the run, the result has an .unwrap() helper that collapses it: const { approved } = (await wait.forToken<{ approved: boolean }>(token.id)).unwrap();. On success it hands you the output directly; on timeout it throws a timeout error instead of returning ok: false. Use .unwrap() when a timeout is genuinely fatal and there’s nothing special to do about it; branch on result.ok when a timeout means something, such as auto-reject, escalate, or notify.
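The difference between branching and unwrapping can be sketched with a local stand-in for the result shape. `WaitResult`, `unwrap`, and `decide` below are illustrative names defined for this example, not SDK exports, and the error type the real SDK throws on timeout may differ.

```typescript
// Local stand-in for the { ok, output, error } result described above.
// The type and function names are illustrative, not SDK exports.
type WaitResult<T> =
  | { ok: true; output: T }
  | { ok: false; error: Error };

// Collapse the result: return the output, or throw when the wait timed out.
function unwrap<T>(result: WaitResult<T>): T {
  if (result.ok === false) throw result.error;
  return result.output;
}

// Branch on ok when a timeout has a meaning of its own (here: auto-reject).
function decide(result: WaitResult<{ approved: boolean }>): "approve" | "reject" {
  if (result.ok === false) return "reject";
  return result.output.approved ? "approve" : "reject";
}

const completed: WaitResult<{ approved: boolean }> = {
  ok: true,
  output: { approved: true },
};
const timedOut: WaitResult<{ approved: boolean }> = {
  ok: false,
  error: new Error("timed out"),
};
```

Here `decide(timedOut)` quietly turns the timeout into a rejection, while `unwrap(timedOut)` throws. Which one fits depends on whether a timeout is a business outcome or a failure.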

There’s one subtlety about creating tokens that the durable model forces you to think about. A task can retry, and if it retries after it created a token but before it parked, a naive createToken would mint a brand-new token on the retry, orphaning the first one and handing out the wrong URL. That’s what the isCached field and an idempotency key on createToken are for: pass wait.createToken({ timeout: '1h', idempotencyKey: ctx.run.id }) and a retry of that run returns the same token with isCached: true instead of a fresh one. Two things to remember: token idempotency lives on createToken, not on forToken, and the key’s TTL defaults to '1h', much shorter than the 30-day default you saw on tasks.trigger, because most tokens are short-lived. For waits that comfortably finish inside an hour the default is fine; for longer waits, raise it with idempotencyKeyTTL so a late retry still finds the original token.
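The "same key, same token" behaviour is easy to see in a toy model. The sketch below is an assumption-laden illustration: `ToyTokenStore` is invented for this example, it ignores the key's TTL entirely, and it says nothing about how the runtime stores keys.

```typescript
// Toy model of keyed token creation. Names and storage are hypothetical;
// it only illustrates the "same key, same token" behaviour described above.
interface ToyToken {
  id: string;
  isCached: boolean;
}

class ToyTokenStore {
  private byKey = new Map<string, string>();
  private nextId = 1;

  createToken(options: { idempotencyKey?: string } = {}): ToyToken {
    const key = options.idempotencyKey;
    const existing = key === undefined ? undefined : this.byKey.get(key);
    if (existing !== undefined) {
      // A retry with the same key gets the original token back.
      return { id: existing, isCached: true };
    }
    const id = `waitpoint_${this.nextId++}`;
    if (key !== undefined) this.byKey.set(key, id);
    return { id, isCached: false };
  }
}

const store = new ToyTokenStore();
const runId = "run_123"; // stands in for ctx.run.id

const firstAttempt = store.createToken({ idempotencyKey: runId });
const retry = store.createToken({ idempotencyKey: runId });
const unkeyed = store.createToken();
```

`retry` carries the same id as `firstAttempt`, so a URL already handed out stays valid, while `unkeyed` shows what a retry would get without a key: a different token that nobody outside knows about.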

Which wait do you actually want

This is the decision that trips people up most, so make it deliberately. You now have three families of wait, and they differ on exactly one axis: who completes the wait. That is the whole choice.

| Wait | Who completes it | Reach for it when… |
| --- | --- | --- |
| wait.for / wait.until | the clock | you know the delay or the deadline up front |
| triggerAndWait / batchTriggerAndWait | a child task | the runtime owns the wait and hands you the child’s typed result |
| wait.forToken | an external system or a human | you create the token and hand it out; something outside your run signals back |

Read that table top to bottom and the rule falls out. If a clock can tell you when to continue, you don’t need a token. If your own task produces the value you’re waiting on, you don’t need a token either, because triggerAndWait already parks you on that child and the runtime completes the wait when the child finishes. You reach for wait.forToken specifically when the thing that will signal you lives outside your run: a third party, a person, or another part of your system that has the SDK. The moment you find yourself building a completer by hand for something the runtime would have completed for you, you’ve picked the wrong row.
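The same rule can be written as a small lookup, which makes the single axis explicit. This is just the table restated as code; `Completer` and `chooseWait` are names made up for the sketch.

```typescript
// The decision table as a function: the only input is who completes the wait.
// Completer and chooseWait are hypothetical names for this sketch.
type Completer = "clock" | "child-task" | "external";

function chooseWait(completer: Completer): string {
  switch (completer) {
    case "clock":
      return "wait.for / wait.until";
    case "child-task":
      return "triggerAndWait / batchTriggerAndWait";
    case "external":
      return "wait.forToken";
  }
}

const forPartnerCallback = chooseWait("external");
const forOwnSubJobs = chooseWait("child-task");
```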

One more structural note before the applications, because it rounds out the model. The base case is one run, one token, one completer:

[Diagram: “Run parks on token” leads to “Waitpoint token (durable pause)”, which is “Completed by…” one of three completers: Third party (HTTP), Human (SDK call), or Timeout.]

One run parks on one token, and any one of three completers (a third party's HTTP callback, a human's SDK-backed click, or the timeout) wakes it. The topology generalizes both ways: many runs could listen on a single token, and one run could wait on several tokens at once. That second direction, waiting for all of N, is built later in this lesson with batchTriggerAndWait.

The topology generalizes in two directions, both worth naming so the model is complete. A single token can unblock more than one run. And, more useful in practice, a single run can depend on several completions at once: spawn twelve sub-jobs, resume when all twelve report back. We’ll build that second direction at the end of the lesson. Note one thing now so a stale AI suggestion doesn’t lead you astray: there is no wait.forWaitpoint([...], { all }) API in Trigger.dev. The fan-in is built from tools you already have. More on that when we get there.

Handing work to a third party and waiting for the callback

Start with the cleanest application: you call out to a third party, and they call back when they’re done. A task kicks off a long external job, such as a video transcode, a document render, or a partner data import. That job runs for minutes to hours on someone else’s infrastructure, and it reports completion by hitting a URL you give it.

So where does your run live in the meantime? Without waitpoints, the answer is a whole apparatus. You’d expose a public webhook route to receive the callback. You’d persist a correlation row tying the partner’s job id to your run id, so you know which run a given callback belongs to. You’d dedup the callback in case it arrives twice. And then you’d have to somehow resume the right run from inside that route handler. That’s the machinery you built for Stripe webhooks, rebuilt for every partner you integrate.

The waitpoint collapses all of it. The token’s url field is the callback URL, and completing it is the resume:

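import { AbortTaskRunError, schemaTask, wait } from '@trigger.dev/sdk';
import { z } from 'zod';
// tenantDb and renders come from the app's own database modules.

export const renderVideo = schemaTask({
  id: 'render-video',
  schema: z.object({ organizationId: z.uuid(), sourceUrl: z.url() }),
  run: async ({ organizationId, sourceUrl }) => {
    // Size the timeout to the partner's worst-case SLA.
    const token = await wait.createToken({ timeout: '6h' });

    // Hand the partner token.url as its completion callback.
    const response = await fetch('https://api.partner.example/render', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ source: sourceUrl, callbackUrl: token.url }),
    });
    if (!response.ok) {
      // Fail now rather than park for six hours on a job that never started.
      throw new Error(`render request failed with status ${response.status}`);
    }

    // Checkpoint: the worker is released until the callback or the timeout.
    const result = await wait.forToken<{ renderUrl: string }>(token.id);
    if (!result.ok) {
      throw new AbortTaskRunError('render callback timed out after 6h');
    }

    // Re-derive tenancy from the payload; a task inherits no auth context.
    const db = tenantDb(organizationId);
    await db.insert(renders).values({ url: result.output.renderUrl });
  },
});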

Size the timeout to the partner’s worst-case SLA, not a guess. If their render can take up to four hours on a bad day, '6h' leaves headroom, where '10m' would kill the run while the partner is still working. The rule is to size the timeout to the slowest completion you’re willing to accept.

You hand the partner token.url. This field is the server-to-server completion webhook, and it carries no CORS headers, which is exactly right for a backend partner calling from its own servers. Do not hand over token.id, which is an identifier rather than a URL, or token.publicAccessToken, which is for browsers and is covered in a moment. The wrong handle here means the partner can’t complete the token, and your run dies on the timeout with no obvious reason.

Park on the token. The worker is freed for the entire wait, six hours if it comes to that, at zero cost. Compare that to the poll loop, which would wake a worker every ten seconds for six hours to ask “done yet?” and get “no” almost every time.
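The comparison is worth working through with the numbers already in play: a six-hour worst case and a ten-second poll interval. The snippet is plain arithmetic, with variable names chosen for this example.

```typescript
// Cost of polling through a worst-case six-hour wait at a ten-second interval.
const waitSeconds = 6 * 60 * 60; // 21,600 seconds
const pollIntervalSeconds = 10;

// Each poll wakes a worker once.
const pollWakeUps = waitSeconds / pollIntervalSeconds; // 2,160 wake-ups

// Parking on a token wakes the run once: on completion or on timeout.
const tokenWakeUps = 1;

// A poll can also miss the completion by up to one full interval.
const worstCaseNoticeDelaySeconds = pollIntervalSeconds;
```

That is 2,160 wake-ups against one, and the polling version can still notice the result up to ten seconds late.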

The timeout branch is mandatory, never a TODO. A partner that goes silent must not park your run forever. AbortTaskRunError fails the run cleanly so your onFailure handler and alerting fire, which means you find out the partner went silent instead of discovering a run stuck in “Waiting” weeks later.

Tenancy is re-derived from the payload with tenantDb(organizationId) inside the body. A task inherits no auth context, so the org id rides in the payload and you scope from it, the rule you’ve followed since you started passing org context into background work.

Notice what you didn’t write. There’s no public route, no signature verification, no processed_events table, no correlation row mapping a partner job id to a run id, and no transactional callback handler. The runtime owns the URL, owns the authentication on it, owns the dedup, and owns the resume. You didn’t build a webhook receiver; you handed the partner a one-shot resume button and parked on it.

There’s an observability payoff too, the same free-dashboard benefit that’s been paying off since you first triggered a task. A parked run shows up as “Waiting” in the Trigger.dev dashboard, with the token id and a live countdown to its timeout. When a third-party integration goes quiet, that’s the first place you look, and it tells you instantly which side broke. If the run shows “Waiting,” you did your part and the partner never called back. If it never reached “Waiting,” the bug is on your side, before the handoff. You diagnose a silent integration at a glance, without adding a single log line.

Pausing for a human approval

This is the most product-shaped use of waitpoints, and the one that introduces a new hard rule. Some operations have to wait for a person. A refund above a threshold, a destructive admin action, or a plan downgrade that takes effect immediately must be approved by a human before they proceed.

Consider why neither of the easy options works. You can’t run the work synchronously in the request, because the approver might take hours and no request lives that long. And you can’t make it fire-and-forget, because the decision has to actually gate the action: “approve or reject” must be able to stop it, not just annotate it after the fact. The waitpoint is the join between those. The task parks on a token, a human’s click completes it, and the task resumes carrying the decision.

What makes this pattern click is seeing both ends of the same token, which live in two different files. The task creates the token and parks. A Server Action, triggered by the admin’s click, completes it.

Task side first, then the Server Action side.

export const processRefund = schemaTask({
  id: 'process-refund',
  schema: z.object({ organizationId: z.uuid(), refundId: z.uuid() }),
  run: async ({ organizationId, refundId }) => {
    const token = await wait.createToken({ timeout: '48h' });

    const db = tenantDb(organizationId);
    await db.insert(pendingApprovals).values({
      refundId,
      waitpointTokenId: token.id,
    });
    await notify(`Refund ${refundId} needs approval`);

    const result = await wait.forToken<{ decision: 'approve' | 'reject' }>(
      token.id,
    );
    if (!result.ok || result.output.decision === 'reject') {
      await markRefundRejected(db, refundId);
      return;
    }
    await issueRefund(db, refundId);
  },
});

The run parks for up to 48 hours, consuming nothing. It writes a pending_approvals row carrying token.id so the admin UI can map an approval back to its token, notifies the approver, then parks. No worker is held and nothing polls for those 48 hours. The branch handles a timeout (!result.ok) and an explicit reject the same way; only an approve issues the refund.

'use server';

export async function approveRefund(
  approvalId: string,
  decision: 'approve' | 'reject',
) {
  const { orgId } = await requireOrgUser();
  const db = tenantDb(orgId);

  const approval = await getPendingApproval(db, approvalId);
  await wait.completeToken(approval.waitpointTokenId, { decision });

  return { ok: true as const };
}

Completing the token is the resume. requireOrgUser() enforces that only an authorized member of this org can decide. The action looks up the pending_approvals row to get the token id, then wait.completeToken(tokenId, { decision }) completes it. The parked task wakes, possibly on a different worker, and continues from the line right after wait.forToken, carrying the decision. The action itself returns the Result shape your Server Actions have used since you first wrote one.

The new call here is wait.completeToken(tokenId, payload), the programmatic completion path, the counterpart to token.url. Where token.url is for an external system you hand a URL to, completeToken is for your own code completing the token from anywhere it has the SDK and an auth context: a Server Action, a Route Handler, or another task. This is why it’s the right tool for human approvals specifically. The click doesn’t go to Trigger.dev directly; it goes through your authenticated Server Action, where you check requireOrgUser() first. The person never touches the token. Your code does, on their behalf, after you’ve decided they’re allowed to.

Never complete a token inside a transaction that can roll back

The new hard rule is the kind that fails silently in production, so it’s worth a moment. Completing a token is an external side effect: once it’s done, the resume is out of your database’s control. The runtime has already woken the parked run, and a rollback cannot undo that.

So picture completing the token inside a db.transaction, before the transaction commits:

The wrong version completes the token inside the transaction; the right version commits, then completes.

await db.transaction(async (tx) => {
  await markRefunded(tx, refundId);
  await wait.completeToken(tokenId, { decision: 'approve' });
  if (await balanceTooLow(tx)) {
    throw new Error('insufficient balance');
  }
});

The token is already completed when the transaction rolls back. If balanceTooLow returns true, the callback throws and the update is undone, but the token completion is not, because it left your database entirely. The parked task has already woken and is now acting on a refund the database says never happened. It’s silent, the kind of bug you find through a confused support ticket rather than a stack trace.

await db.transaction(async (tx) => {
  await markRefunded(tx, refundId);
  if (await balanceTooLow(tx)) {
    throw new Error('insufficient balance');
  }
});
await wait.completeToken(tokenId, { decision: 'approve' });

Commit first, complete last. The transaction does its DB work and either commits or rolls back and throws, in which case completion never runs. Only after the commit succeeds do you complete the token, so the run resumes on state that’s guaranteed to exist. The token completion is the final step, outside the transaction.

This is one specific instance of a rule you met when you first learned transactions: external side effects go after the commit, never inside it. You’ve applied it to Resend sends and Stripe calls already. Completing a token is just another external side effect, and it obeys the same rule for the same reason. If the work it depends on might roll back, complete the token only once that work has actually landed.
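The two orderings can be compared in a toy model that runs without a database or the SDK. Everything here is a stand-in invented for the sketch: `ToyDb.transaction` applies staged writes only if its callback does not throw, and `completeToken` simply records a side effect that nothing can undo.

```typescript
// Toy stand-ins: a "transaction" that applies staged writes only if the
// callback does not throw, and a token completion that cannot be undone.
class ToyDb {
  committed: string[] = [];

  transaction(work: (stage: (row: string) => void) => void): void {
    const staged: string[] = [];
    work((row) => staged.push(row)); // a throw here discards the staged rows
    this.committed.push(...staged);
  }
}

const completedTokens: string[] = [];
function completeToken(tokenId: string): void {
  completedTokens.push(tokenId); // external side effect: no rollback
}

function approveWrong(db: ToyDb, tokenId: string, balanceTooLow: boolean): void {
  db.transaction((stage) => {
    stage(`refunded:${tokenId}`);
    completeToken(tokenId); // fires before we know the commit will happen
    if (balanceTooLow) throw new Error("insufficient balance");
  });
}

function approveRight(db: ToyDb, tokenId: string, balanceTooLow: boolean): void {
  db.transaction((stage) => {
    stage(`refunded:${tokenId}`);
    if (balanceTooLow) throw new Error("insufficient balance");
  });
  completeToken(tokenId); // only reached after a successful commit
}

const db = new ToyDb();
try { approveWrong(db, "waitpoint_a", true); } catch { /* rolled back */ }
try { approveRight(db, "waitpoint_b", true); } catch { /* rolled back */ }
```

After both calls nothing has been committed, yet `completedTokens` contains `waitpoint_a`: the wrong ordering woke a run for a refund that never landed, while the right ordering left `waitpoint_b` untouched.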

The hard rule has a reassuring counterpart. A token completes exactly once. A second wait.completeToken for the same token, from a double-click, a retried Server Action, or an admin who got impatient, is a no-op and returns without effect. So the callback itself needs no dedup table on your side; the runtime guarantees one resume per token. Be precise about what this covers, because it’s easy to over-read. The completion is idempotent for free. The work the resume kicks off is not: if the resumed task triggers downstream jobs or sends an email, those still carry the idempotency keys you learned last lesson. The token’s one-shot completion is the runtime’s job; the resumed work’s idempotency is still yours.

Now order the whole flow end to end. It crosses two files and a human, and getting the causality straight is the point of the exercise.

Order the steps of the refund-approval flow, from the task spawning to the run resuming. The two calls involved come first, followed by the steps in their correct order.

const token = await wait.createToken({ timeout: '48h' });
const result = await wait.forToken<{ decision: 'approve' | 'reject' }>(token.id);

// app/refunds/actions.ts ('use server')
await wait.completeToken(approval.waitpointTokenId, { decision });

The task creates the waitpoint token with a 48-hour timeout.

The task writes the pending_approvals row carrying the token id.

The task notifies the approver and parks on wait.forToken.

An admin clicks “Approve” in the dashboard.

The Server Action looks up the token id and calls wait.completeToken.

The task resumes and applies the decision after committing.

Waiting for many sub-jobs to finish

The last shape is fan-in: spawn N units of work, resume only when all N are done. A task fans out (export each of twelve report sections, resize each of two hundred uploaded images, process each of N imported rows) and a final step must run only after every child finishes. This is the “one run waits on many completions” direction the topology diagram showed.

One correction comes first, because this is the one place a stale AI completion or an older tutorial will hand you an API that doesn’t exist: there is no wait.forWaitpoint([t1, t2, t3], { all }) in Trigger.dev. Don’t reach for it; it isn’t real. The tool for “parallel children, one wait” is batchTriggerAndWait, which you met when you learned to trigger tasks. The runtime creates and manages a waitpoint for every child internally, so you never touch the tokens, and it hands you back a typed array of results once all of them settle.

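// Build every item first: the key helper is async, so the map callback
// must be async too and its promises resolved with Promise.all.
const results = await sectionTask.batchTriggerAndWait(
  await Promise.all(
    sections.map(async (s) => ({
      payload: { organizationId, sectionId: s.id },
      options: {
        idempotencyKey: await idempotencyKeys.create([s.id, 'section'], {
          scope: 'run',
        }),
      },
    })),
  ),
);

// The batch resolves when every child settles, so check each result.
const failures = results.runs.filter((r) => !r.ok);
metadata.set('failedSections', failures.length);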

batchTriggerAndWait parks the parent on all the children at once. One checkpoint, the worker freed, and the parent resumes when the last child settles, rather than one wait per child.

Each child carries a per-child idempotency key built with idempotencyKeys.create([s.id, 'section'], { scope: 'run' }). The section id makes it unique, and scope: 'run' namespaces it against the parent run id for you. This is the cross-step key pattern from last lesson: if the parent retries, it re-issues the same keys, so children that already finished return their cached result instead of running twice.

The batch resolving does not mean every child succeeded; it means every child settled. Each entry in results.runs is { ok, output } (with error when !ok), so you inspect failures explicitly. Here, filter the ones where ok is false.
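Splitting a settled batch into successes and failures is a short, mechanical step. The sketch below uses a local `ChildResult` type that mirrors only the `{ ok, output }` and `{ ok, error }` shape described here; the type and the `splitSettled` helper are assumptions for illustration, and real result entries may carry more fields.

```typescript
// Local stand-in for one settled child result, mirroring the shape above.
// ChildResult and splitSettled are hypothetical names for this sketch.
type ChildResult<T> =
  | { ok: true; output: T }
  | { ok: false; error: unknown };

// Separate the outputs of successful children from the errors of failed ones.
function splitSettled<T>(runs: ChildResult<T>[]): { outputs: T[]; errors: unknown[] } {
  const outputs: T[] = [];
  const errors: unknown[] = [];
  for (const run of runs) {
    if (run.ok === true) {
      outputs.push(run.output);
    } else {
      errors.push(run.error);
    }
  }
  return { outputs, errors };
}

const settled: ChildResult<string>[] = [
  { ok: true, output: "section-1.pdf" },
  { ok: false, error: new Error("render failed") },
  { ok: true, output: "section-3.pdf" },
];
const { outputs, errors } = splitSettled(settled);
```

All three children settled, so the batch would resolve, but only two produced output. Whether one failure should fail the parent or just be recorded is a decision for the parent task.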

metadata.set(...) writes live progress that the dashboard and the in-app inspector render, the “47 of 200” channel. You met it when you first triggered tasks, and the next chapter builds a real export’s progress on it.

When does a raw token still beat batchTriggerAndWait? The deciding factor is the same axis as the whole lesson: who completes the work. Use batchTriggerAndWait when the parallel work is your own tasks, because the runtime owns those waits and there’s no reason to manage tokens by hand. Reach for raw wait.createToken when each unit of work is completed by an external system or human, say twelve partner jobs each calling back, or three separate approvers. Then you create N tokens, hand each one out, and wait for all of them:

const tokens = await Promise.all(
  partners.map(() => wait.createToken({ timeout: '6h' })),
);
// hand each token.url to its partner…
const results = await Promise.all(tokens.map((t) => wait.forToken(t.id)));

That Promise.all over forToken is the documented way to wait on many external completions; it’s just parking on several checkpoints at once. It is not wait.forWaitpoint, which still doesn’t exist. The batch path is the common one and the one you’ll write most days; this Promise.all shape is the escape hatch for when the completers live outside your system.

A task fans out 200 image-resize sub-jobs — every one of them your own Trigger.dev task — and must run its final step only after all 200 have settled. Which one is correct in Trigger.dev v4?

await wait.forWaitpoint(tokenIds, { all: true });

await resizeTask.batchTriggerAndWait(
  await Promise.all(
    jobs.map(async (j) => ({
      payload: { organizationId, imageId: j.id },
      options: {
        idempotencyKey: await idempotencyKeys.create([j.id, 'resize'], {
          scope: 'run',
        }),
      },
    })),
  ),
);

while ((await getDoneCount(db)) < 200) {
  await wait.for({ seconds: 10 });
}

const tokens = await Promise.all(
  jobs.map(() => wait.createToken({ timeout: '1h' })),
);
await Promise.all(tokens.map((t) => wait.forToken(t.id)));

The correct answer is batchTriggerAndWait over the 200 payloads, each with its own idempotency key. It’s the idiomatic shape for parallel children with one wait: the runtime creates and owns a waitpoint for every child internally, parks the parent on all of them at once, and hands you a typed result array once they settle. The first option calls a wait.forWaitpoint multi-token API that does not exist in v4. The wait.for loop is the exact polling waste waitpoints exist to delete — a held-open worker waking every ten seconds to re-ask a question. The 200-token Promise.all over wait.forToken runs, but raw tokens are the tool for external completers (a partner or a human signalling back); pointing them at your own tasks makes you hand-manage 200 tokens the runtime would have managed for free — the right tool aimed at the wrong problem.

Where waitpoints fail: timeouts, leaks, and rollbacks

These are the failure modes that turn into real incidents. Each one is a symptom, a cause, and a fix. Read them as a set, because four of the five trace back to a decision you’ve already made in this lesson, and naming the failure is how the decision sticks.

The forever-parked run. Symptom: runs pile up in “Waiting” and never leave, and concurrency seats and run-minutes leak quietly. Cause: a timeout set so long that the run outlives any legitimate completion. The mirror-image mistake is a missing timeout: you picture the token waiting forever, but it defaults to ten minutes and the run dies while the work is still in flight. Fix: every token gets an explicit timeout sized to the slowest acceptable completion, and the !result.ok branch is real code, never a TODO. The rule, stated plainly: an indefinite wait is a leak.

The wrong handle to a third party. Symptom: the partner can’t complete the token, and your run dies on its timeout with no obvious reason in the logs. Cause: you handed over token.id (an identifier, not a URL) or token.publicAccessToken (a Bearer token meant for browsers) where the partner needed a server-to-server callback URL. Fix: match the handle to the completer.

| Completer | Hand it… |
| --- | --- |
| A server (a backend partner) | token.url, no CORS, server-to-server |
| A browser / client | token.publicAccessToken, a Bearer token on the CORS-enabled completion endpoint |

Completion inside a transaction that can roll back. Symptom: the task resumes and acts on state that the database has since rolled back. Cause: wait.completeToken called inside a db.transaction that later throws. Fix: commit first, complete last. This is the canonical case of “external side effect after the commit, never inside it.”
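The ordering can be sketched with the database and the token completion injected as dependencies, so the sequence is explicit. Everything named here is hypothetical: Deps, approveRefund, and writeApproval are illustrative, and completeToken stands in for a thin wrapper around wait.completeToken rather than the SDK call itself.

```typescript
// Hypothetical dependencies, injected so the ordering is visible and testable.
interface Deps {
  // Runs `work` in a database transaction; resolves only after the commit.
  transaction: <T>(work: () => Promise<T>) => Promise<T>;
  // Completes the waitpoint token (for example, a wrapper over wait.completeToken).
  completeToken: (tokenId: string, payload: { approved: boolean }) => Promise<void>;
}

async function approveRefund(
  deps: Deps,
  tokenId: string,
  writeApproval: () => Promise<void>,
): Promise<void> {
  // 1. Commit the state change. If this throws, step 2 never runs,
  //    so the parked run is not resumed on rolled-back state.
  await deps.transaction(writeApproval);

  // 2. Only after the commit, fire the external side effect.
  await deps.completeToken(tokenId, { approved: true });
}
```

If the completion call itself fails after the commit, the database is still correct and the completion can be retried, which is the safer of the two possible failure orders.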

Reaching for a token when the clock or a child task was the answer. Symptom: you hand-build a completer for something the runtime would have completed for free. Cause: wait.forToken for a known delay (that’s wait.for) or for your own child task’s result (that’s triggerAndWait / batchTriggerAndWait). Fix: the table above, where “who completes it” picks the row.

Assuming a token is many-shot, or that a batch resolving means success. Symptom: you expect to complete one token repeatedly, or you trust a resolved batch to mean every child succeeded. Cause: over-reading the guarantees. Fix: treat a token as one-shot (a second completion is a silent no-op), and treat a resolved batch as meaning every child settled, not that every child succeeded. Inspect per-child ok after a batch, every time.
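Inspecting per-child ok usually comes down to splitting the settled results into two lists before doing anything else. The sketch below uses a local ChildResult type modelled on the { ok, output } / error shape this lesson describes; the id field and the partitionSettled helper are assumptions for illustration, not the SDK's own types.

```typescript
// Local stand-in for one settled child run. The `id` field is an assumption
// added so failures can be reported; it is not taken from the SDK.
type ChildResult<T> =
  | { ok: true; id: string; output: T }
  | { ok: false; id: string; error: unknown };

// Split settled children into outputs and failures instead of assuming success.
function partitionSettled<T>(runs: ChildResult<T>[]) {
  const succeeded: T[] = [];
  const failed: { id: string; error: unknown }[] = [];
  for (const run of runs) {
    if (run.ok) {
      succeeded.push(run.output);
    } else {
      failed.push({ id: run.id, error: run.error });
    }
  }
  return { succeeded, failed };
}
```

A parent task can then decide explicitly whether a non-empty failed list means retrying those children, reporting a partial result, or failing the whole run.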

A quick check on the two guarantees that trip people up most:

Each claim is about a waitpoint guarantee you just read. One wrong assumption here is a silent production bug. Mark each statement True or False.

Completing the same token twice resumes the parked run twice — once per wait.completeToken call.

False. A token is one-shot: it completes exactly once and resumes the run once. A second wait.completeToken for the same token — from a double-click, a retried Server Action, an impatient admin — is a silent no-op. That’s why the callback itself needs no dedup table on your side.

A token created with no explicit timeout waits indefinitely until something completes it.

False. With no explicit timeout it defaults to '10m' — and ten minutes is almost never what a real human approval or partner job needs, so the run dies on the default while the work is still legitimately in flight. Set the timeout explicitly on every token, sized to the slowest completion you’re willing to accept.

After batchTriggerAndWait resolves you must still inspect each entry’s ok, because resolving means every child settled, not that every child succeeded.

True. The batch resolving means all children settled — some may have failed. Each results.runs entry is { ok, output } (with error when !ok), so you filter the failures explicitly rather than assuming every child succeeded.

token.url is the right handle to hand a browser client so it can complete the token directly from JavaScript.

False. token.url is the server-to-server completion webhook and carries no CORS headers, so a browser can’t call it. A browser client gets token.publicAccessToken as a Bearer token on the CORS-enabled completion endpoint instead. Hand a server partner token.url; hand a browser publicAccessToken.

That’s the whole primitive. A waitpoint is a durable token that something outside the run completes: an HTTP callback, an SDK call, or, failing both, its own timeout. It always carries an explicit timeout, it completes exactly once, and it is never completed inside a transaction that can roll back. It removes the poll loop, the glue webhook handler, and the held worker in one move, and it shows up as three workflow shapes that are really one primitive: call out and wait for a callback, park on a human’s decision, and fan in across many children. In the next lesson you’ll see which of the app’s actual workloads reach for this. In the chapter project that follows, the CSV export uses metadata.set for live progress, while the approval and callback shapes are the patterns a real product layers on top.

Going deeper

The waitpoint API surface moves faster than most of the stack, so when in doubt, trust the official docs over any tutorial, including this one, for the exact current shapes.

Trigger.dev — Wait for token

trigger.dev

The canonical createToken / forToken / completeToken API reference — the source of truth for this version-volatile surface.

Trigger.dev — Wait for HTTP callback

trigger.dev

The third-party callback pattern end to end — handing token.url to a partner and resuming on their POST back.

What is durable execution?

restate.dev

The vendor-neutral primitive under a parked run — why suspend-and-resume survives crashes without holding a worker.