Chapter 66, Lesson 3

When Trigger.dev earns its weight

The decision test for when a workload earns a durable background-job platform like Trigger.dev, and when the cheap tiers still win.

You can already keep work off the request path three ways. Inline await handles work the user is waiting on. after() handles cleanup that has to run on the same invocation but the user never needs to see. Vercel Cron handles anything on a schedule. Just as important, you can defend not reaching for anything heavier: most of the work a SaaS does fits comfortably in those three tiers, and knowing that is what stops you from over-building.

This lesson draws the line those three tiers still can’t cross. On the far side of it sits a durable background-job platform, and in this course that platform is Trigger.dev. The next three lessons teach its SDK in detail, but before you learn how to write a task you need to know whether you should be reaching for the SDK at all. That is the real skill here. By the end you’ll be able to stand at the fork and argue both directions: “no, Vercel Cron is enough, here’s why” and “yes, Trigger.dev, because of this exact property.” More often than you’d guess the answer is “no, you don’t need it yet,” and being able to say why is what separates an experienced engineer from one who reaches for the heaviest tool first.

So this isn’t an API lesson, and it doesn’t teach the SDK. It’s a decision lesson, and what you walk away with is a test you can run against any workload.

Here’s the ladder you already own, drawn as one picture so you can see what we’re adding to it.

flowchart LR
  inline["<b>inline await</b><br/><i>user waits on it</i>"]
  after["<b>after()</b><br/><i>same invocation,<br/>after the response</i>"]
  cron["<b>Vercel Cron</b><br/><i>on a schedule</i>"]
  next(["<b>?</b>"])

  inline --> after --> cron --> next

  class inline,after,cron tier
  class next unknown
  classDef tier fill:#dbeafe,stroke:#1d4ed8,color:#111,stroke-width:2px
  classDef unknown fill:transparent,stroke:#94a3b8,color:#94a3b8,stroke-width:2px,stroke-dasharray:5 5

Three tiers cover same-invocation and scheduled work. This lesson is about what lives in the fourth.

The whole lesson is about filling in that fourth slot. You’re extending a ladder you already climbed, not starting over: every tier below the question mark is still the right default until something specific forces you up to the next one.

The senior question: what the cheap tiers still can’t do

Between them, inline await and after() cover same-invocation work: anything that can finish inside one function call. Vercel Cron covers schedules. Put those together and you’ve handled a large fraction of what a SaaS needs to do in the background.

So the question to sit with is precise. What workloads do those tiers, combined, still fail at, and what specifically does a durable job platform provide that’s worth the cost of running a second platform?

That second clause matters as much as the first, because the cost is real and permanent. A job platform is not a library you add to package.json and forget; it’s a second system. That means a second deploy step in CI, a second dashboard to check when something’s wrong, a second secret to rotate, and a second place a 3 a.m. page can come from. None of that is free, and none of it goes away once you’ve added it. So the capability you’re buying has to be worth that standing cost: not merely nice to have, but worth a second platform.

That gives us the rule that governs the rest of this lesson, the same rule that has run through this whole chapter. You escalate only when a named, testable condition crosses, never on a hunch. “This feels like it should be a background job” is not a reason. “This trips condition 3” is. Next we’ll name those conditions, and it turns out there are exactly five.

Five conditions that justify a job platform

Each of these five is a property of a workload that the cheap tiers cannot provide. They’re named so you can hold a real workload up against each one and get a yes-or-no answer. If a workload trips even one of them, the cheap tiers are out and you’ve earned the second platform. If it trips none, you stay cheap.

We’ll walk them in escalation order, roughly from the condition that breaks the cheap tiers most obviously to the one that’s most subtle. For each, hold onto two things: the concrete workload that trips it, and exactly how the cheap tier fails.

Condition 1 of 5

Past the function time wall

Work that needs more wall-clock time than a single function invocation gets, which caps out at roughly 13 minutes on Pro and 5 minutes on Hobby. The textbook case is a 50,000-row CSV export: reading, formatting, and writing that many rows can’t reliably finish before the wall.

Condition 2 of 5

Multi-step orchestration with intermediate state

Step A, then a pause or a wait for something external, then step B, where re-running step A after a failure in step B would be wrong or expensive. Think: charge a card, then provision the account, then send the welcome email. If provisioning fails, the retry must not charge the card a second time.
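
One way to picture "the retry must not charge the card a second time" is a run record that remembers which steps already finished. The sketch below is plain TypeScript, not the Trigger.dev API: `RunState`, `runStep`, `onboard`, and the step names are hypothetical, and the in-memory `Map` stands in for the durable storage a real platform would use.

```typescript
// Hypothetical sketch: a run remembers the output of each finished step,
// so a retry replays the stored result instead of redoing the side effect.
type RunState = { completed: Map<string, unknown> };

async function runStep<T>(
  state: RunState,
  stepName: string,
  work: () => Promise<T>,
): Promise<T> {
  if (state.completed.has(stepName)) {
    // The step already succeeded on an earlier attempt: skip the side effect.
    return state.completed.get(stepName) as T;
  }
  const result = await work();
  // A real platform would persist this durably, not keep it in memory.
  state.completed.set(stepName, result);
  return result;
}

// Illustrative workflow: charge, then provision, then email.
async function onboard(
  state: RunState,
  deps: {
    charge: () => Promise<string>;
    provision: () => Promise<void>;
    email: () => Promise<void>;
  },
): Promise<void> {
  await runStep(state, "charge-card", deps.charge);
  await runStep(state, "provision-account", deps.provision);
  await runStep(state, "send-welcome-email", deps.email);
}
```

If provisioning throws on the first attempt and `onboard` is called again with the same `state`, the charge step is skipped because its result is already recorded. The cheap tiers give you no place to keep that record between attempts, which is the gap this condition names.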

Condition 3 of 5

Automatic retries with backoff

Work that must survive a transient downstream outage, on its own schedule rather than the user’s. A partner API or Resend returns a 503, and the right behavior is to try again in 2 seconds, then 6, then 20, until it comes back.
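
The "2 seconds, then 6, then 20" pattern is roughly exponential backoff with a factor of 3. As a minimal sketch of the arithmetic, assuming a hypothetical `BackoffPolicy` shape (the field names are illustrative, not a platform's configuration format) and leaving out the random jitter a real platform adds:

```typescript
// Hypothetical retry policy; field names are illustrative only.
type BackoffPolicy = {
  minDelayMs: number; // wait before the first retry
  factor: number; // multiplier applied for each further retry
  maxDelayMs: number; // ceiling so the waits stop growing
  maxAttempts: number; // total attempts, including the first try
};

// Delay before retry number `retry` (1-based), without jitter.
function backoffDelayMs(policy: BackoffPolicy, retry: number): number {
  const raw = policy.minDelayMs * Math.pow(policy.factor, retry - 1);
  return Math.min(raw, policy.maxDelayMs);
}

// The full list of waits: one entry per retry after the first attempt.
function backoffSchedule(policy: BackoffPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < policy.maxAttempts; retry++) {
    delays.push(backoffDelayMs(policy, retry));
  }
  return delays;
}

const examplePolicy: BackoffPolicy = {
  minDelayMs: 2000,
  factor: 3,
  maxDelayMs: 60000,
  maxAttempts: 5,
};
// Waits of 2000, 6000, 18000, and 54000 ms between the five attempts.
console.log(backoffSchedule(examplePolicy));
```

The arithmetic is the easy part. The schedule spans more than a minute of waiting, and something has to stay alive across those waits and remember which attempt it is on, which a single request cannot do.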

Condition 4 of 5

Fan-out with concurrency control

One trigger that spawns many child runs, hundreds or thousands or tens of thousands, with a cap on how many run at once. This shape is called fan-out. The canonical case is a weekly digest that has to email 50,000 users without tripping Resend’s rate limit.
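
To make "a cap on how many run at once" concrete, here is a minimal in-process sketch of a concurrency-limited fan-out. `fanOut` is a hypothetical helper written for this illustration, not a Trigger.dev API, and it deliberately has no retries and no durability.

```typescript
// Runs `worker` once per item, never more than `limit` at a time.
// A limit below 1 is treated as 1.
async function fanOut<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane claims the next item as soon as its previous one finishes.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // same order as `items`
}
```

This works for a few hundred items inside one invocation. It stops working when the list is large enough that the invocation ends before the lanes drain, or when one failed item should be retried without restarting the rest. A queue with a concurrency limit keeps the same shape but holds the pending items outside the process.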

Condition 5 of 5

Event-driven / human-in-the-loop pauses

Work that blocks on something outside your system: a third-party callback, a human clicking “approve,” or a wall-clock delay measured in hours or days. Kick off a partner video render and resume only when the partner calls back, or hold a refund until an admin approves it.
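
A durable pause can be pictured as a run reduced to a stored record plus a token, so that no worker is held while the partner or the admin takes their time. The sketch below is a hypothetical illustration in plain TypeScript: `WaitStore` and `ParkedRun` are invented names, the `Map` stands in for a database table, and the counter-based token is for readability only.

```typescript
// Hypothetical record of a run that is waiting on the outside world.
type ParkedRun = { runId: string; waitingFor: string; savedState: unknown };

class WaitStore {
  // Stand-in for a database table; an in-memory Map is NOT durable.
  private parked = new Map<string, ParkedRun>();
  private counter = 0;

  // Park a run and return the token the callback or approver must present.
  park(run: ParkedRun): string {
    this.counter += 1;
    const token = `wait_${this.counter}`; // real tokens should be unguessable
    this.parked.set(token, run);
    return token;
  }

  // Resume at most once: the record is removed, so a duplicate callback
  // finds nothing and becomes a no-op.
  resume(token: string): ParkedRun | undefined {
    const run = this.parked.get(token);
    if (run) this.parked.delete(token);
    return run;
  }
}
```

Between `park` and `resume`, nothing is executing. That gap can last hours or days, which is why neither a single invocation nor a fixed schedule can model it cleanly.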

Read those five back and notice what they have in common: each one is about time or durability in a way a single request can’t honor. A request is short, it runs once, and when it’s over it’s over. The five conditions are all the shapes of work that outlive a single request: work that has to take longer than one, survive the failure of one, multiply into many, or wait across many.

Now practice the test, because being able to run it is the actual skill. The exercise below gives you a handful of real workloads. For each, decide which condition forces it off the cheap tiers, or drop it in the cheap-tier bucket if none of them do. Watch for the trap: some of these belong on the cheap tier, and reaching for a job platform anyway is the most common real-world mistake.

Sort each workload into the condition that forces it off the cheap tiers, or into the cheap-tier bucket if none do.

Buckets:

Past the time wall: exceeds the per-invocation cap
Multi-step with state: steps that must not redo each other on retry
Retries with backoff: survives a transient outage on its own schedule
Fan-out: one trigger, many capped child runs
Event / human pause: blocks on a callback, approval, or long delay
Stays on the cheap tier: inline, after(), or Vercel Cron

Workloads:

Send one invitation email (~200 ms)
Export 80,000 invoices to a CSV file
Email a weekly digest to every user without tripping the rate limit
Wait for a partner’s render webhook before saving the result
Retry a flaky payment-provider call until it comes back
Charge the card, then provision, then email, with no step repeating on retry
Nightly four-minute trial-expiry sweep
Hold a refund until an admin approves it

Conditions that do not justify a job platform

The exercise had cheap-tier chips in it for a reason. The most common mistake engineers make with background jobs isn’t missing a real trigger; it’s reaching for the platform when none of the five have actually crossed. So this section gets equal billing with the last one.

Here are the non-triggers, each with the answer you should give instead.

A slow API call that’s still under the time wall. Slowness on its own is not a trigger. If the user doesn’t need the result, push it to after(). If they do need it, you’re stuck with the latency: moving it to a job platform doesn’t make it faster, it just adds a “where did my result go” problem on top. A 3-second call is annoying, but it is not a reason to run a second platform.

A nightly job that fits the function budget. A four-minute sweep that runs once a day on a schedule is exactly what Vercel Cron is for. Having a schedule is not a trigger by itself. You only climb past Cron when the scheduled work also trips one of the five, because it’s too big for one invocation, or needs retries, or fans out. Schedule alone stays on Cron.

“I want a separate worker for cleanliness.” This one is seductive because it sounds like good architecture, but a job platform is not an aesthetic choice. Pulling a Server Action’s body out into a “clean” separate worker, when the work finishes fine inside the action, buys you a second deploy, a second dashboard, and a network hop, all in exchange for a feeling. Separation you can’t tie to one of the five conditions is cost with no return.

Here’s the line to keep in your head, the one you can quote in a code review when a teammate proposes a job for a workload that doesn’t need one: escalate on a condition, never on a vibe.

A teammate opens a pull request that moves the body of inviteMember — a DB insert plus a single ~200 ms Resend call that already finishes well inside the function budget — out into a Trigger.dev task. Their PR description reads: “Keeps the Server Action thin and puts the email logic in its own file.” You’re the reviewer. What’s the right call?

Request changes — nothing here trips one of the five, so the move pays a second platform’s standing cost to buy a tidier file. If the action feels crowded, lift the email into a plain helper and keep the work inline.

Approve — pushing side effects into background tasks is the cleaner long-term architecture, and a dedicated file makes the email logic easier to locate.

Approve — a 200 ms outbound call sitting on the request path is precisely the kind of latency a job platform is meant to take off the user’s hands.

Request changes, but only over the missing idempotency key — once the trigger is deduplicated, moving this to a task is the correct shape.

The decision tree, from request to durable job

Now assemble the whole thing. The five conditions don’t live in isolation. They sit at the bottom of a funnel that starts with much cheaper questions, and an experienced engineer runs that funnel top to bottom for every new piece of work. The decision is in the order the questions get asked, not in any single answer.

Walk the tree below. Each step is a question; pick the branch that matches your workload and it advances. The point isn’t to memorize the leaves but to internalize the sequence, so that when you meet a workload this lesson never mentioned, you run the same funnel on it automatically. The schedule branch is the same one from the previous lesson; this tree wraps that decision inside the larger one.

[Interactive decision tree: “Where does this work run?”]

The shape to take away is the funnel itself: Is the user waiting? → Can it finish on this invocation? → Is it a schedule that fits? → Which of the five forced it up? Four questions, asked in that order, and most workloads get an answer before they ever reach the last one. The job platform only wins at the very bottom, which is exactly why it’s the last tier rather than the first reach.
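
The funnel can also be written down as a small function. This is one reading of the four questions, not a transcription of the course's tree: the `Workload` fields and the `chooseTier` name are hypothetical, and "fits one invocation" is modeled simply as "trips none of the five conditions."

```typescript
type Tier = "inline await" | "after()" | "Vercel Cron" | "job platform";

type Condition =
  | "past the time wall"
  | "multi-step with state"
  | "retries with backoff"
  | "fan-out"
  | "event or human pause";

// Hypothetical description of a workload; field names are illustrative.
type Workload = {
  userWaitsOnResult: boolean;
  onSchedule: boolean;
  tripped: Condition[]; // which of the five conditions it trips, if any
};

function chooseTier(w: Workload): Tier {
  const staysCheap = w.tripped.length === 0;
  // 1. Is the user waiting on the result?
  if (w.userWaitsOnResult && staysCheap) return "inline await";
  // 2. Can it finish on this invocation, after the response?
  if (!w.onSchedule && staysCheap) return "after()";
  // 3. Is it a schedule that fits one invocation?
  if (w.onSchedule && staysCheap) return "Vercel Cron";
  // 4. Otherwise one of the five conditions forced it up.
  return "job platform";
}

// A nightly four-minute sweep trips nothing, so it stays on Cron.
console.log(chooseTier({ userWaitsOnResult: false, onSchedule: true, tripped: [] }));
// A large export trips at least the time wall, so it escalates.
console.log(chooseTier({ userWaitsOnResult: true, onSchedule: false, tripped: ["past the time wall"] }));
```

The job platform is only reachable through the last `return`, and only when `tripped` is non-empty. No argument about cleanliness or architecture appears anywhere in the function, because none of them is an input to the decision.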

Why Trigger.dev, and what else is out there

You now know when to escalate. The remaining question is which tool, and that deserves an honest answer rather than a dogmatic one. The field has several good options in 2026, and the course picks one on purpose.

Here’s the landscape, one line each, with the niche each one wins:

Inngest: a serverless-native event system with step functions. Similar shape to Trigger.dev, and particularly strong for teams whose architecture is already event-driven.
Vercel Queues: Vercel-native durable pub/sub, where you publish to topics and consumer groups process in the background with retries and sharding. It’s lighter than a full orchestration runtime, which also makes it a weaker fit for multi-step jobs that carry intermediate state. As of early 2026 it’s in public beta with at-least-once delivery, worth flagging because architecting on a beta’s delivery semantics is a risk you’d want to take with eyes open.
BullMQ + Redis: self-managed and fully under your control, but you run the Redis instance and the worker process yourself. Wins on hosts with persistent infrastructure, like Render or Railway.
AWS SQS + Lambda: enterprise scale with a heavy operational surface. Wins when you’re already deep inside an AWS footprint and the job system should live there too.

The course picks Trigger.dev v4, which went GA in 2025 on a rebuilt run engine, for one reason above the rest: it’s the best developer experience for a small team in 2026. You get typed payloads, durable runs, visible run timelines you can scrub through, durable pauses, and a local-CLI loop that lets you kill a run mid-flight and watch it recover. For someone shipping a SaaS solo, that built-in observability and typed surface reduce the hard-won judgment you have to supply yourself, more than any alternative does. And if cost or data residency ever forces your hand, there’s an Apache-2.0 self-host off-ramp: the full platform runs on your own Docker and Postgres, with no run limits and no features held back behind a paywall. You’re not locked in.

Match each background-job tool to the situation where it’s the strongest fit.

Trigger.dev v4: a small team that wants durable runs and a run dashboard with near-zero ops
Inngest: an architecture that’s already built around events and step functions
Vercel Queues: lightweight durable pub/sub that stays inside the Vercel platform
BullMQ + Redis: full control on a host with persistent infrastructure you already run
AWS SQS + Lambda: a workload that should live inside an existing AWS footprint

Now close the loop on the senior question from earlier: what exactly does the platform buy that’s worth a second system? Map Trigger.dev’s capabilities straight back onto the five conditions:

Durable runs that survive worker crashes and redeploys answer conditions 1 and 2 (past the time wall, multi-step with state). The run checkpoints between steps and resumes on a new worker.
Declared retries with exponential backoff and jitter answer condition 3. You configure the policy; the platform runs it.
Code-defined queues with concurrency limits answer condition 4. The queue holds the fan-out and meters how many run at once.
Waitpoints, with wait.for and wait.until for durable pauses, answer condition 5. The run parks and the worker goes free.
Typed payloads and a run dashboard sit across all five: every run, with its input and every step, is visible without you building any of it.

Notice those are named as capabilities, not as code: waitpoints, queues, wait.for. That’s deliberate, because writing them is the job of the next three lessons. Right now you only need to know they exist and which condition each one answers.

Where the run lives: Trigger.dev’s architecture

This lesson keeps calling Trigger.dev a “second platform,” so let’s make that concrete, because the topology has a direct consequence for the code you’ll write.

Trigger.dev runs as a separate service: either Trigger.dev’s cloud or your own self-hosted instance. Your app doesn’t run the task, it triggers it. The app makes a call over HTTPS that says “run this task with this payload,” and the task then executes on Trigger.dev’s workers, not inside your Vercel function. This is the part to get right in your head: a task does not run inside the Server Action that triggered it. The action fires the trigger and returns; the work happens somewhere else.

The diagram below shows the three pieces and how they connect.

The app triggers tasks over HTTPS; the work runs on Trigger.dev's workers. Both talk to the same Postgres, which is where the next section's catch lives.

Two practical facts fall out of that picture. First, the tasks live in your codebase, in a src/trigger/ folder, and ship via the Trigger.dev CLI. So it’s two deploys from one codebase: vercel deploy for the app and npx trigger.dev@latest deploy for the tasks, with types flowing between them through the shared SDK. (There’s an ordering rule, deploy Trigger.dev first, but that’s a detail for the wiring lesson at the end of this chapter; don’t worry about it yet.) It is not a separate repo or a separate language; it’s the same code, run by a second runtime.

Second, the cost is billed on a different unit, and this trips people up. Vercel bills you per invocation. Trigger.dev bills per run, per run-minute, and per concurrency seat. So the experienced reflex is to watch your per-task run count weekly, and to know that a sudden spike almost always means a missing idempotency key or a retry storm, not real growth. The trap to avoid is comparing “Trigger.dev’s cost per run” against “Vercel’s cost per invocation” as if they were the same number. They aren’t the same unit, so that comparison is a category error.

Tasks run outside your app’s context

This is the one genuinely new mental model in the lesson, and it’s the bridge to writing tasks in the next lesson. It follows directly from that diagram: if the task runs on Trigger.dev’s workers and not in your Vercel function, then it has none of the request-scoped context your app code has leaned on since the auth and multi-tenancy units.

No Better Auth session. No tenantDb middleware deriving the current org for you. No cookies, no headers, no requireOrgUser(). A task is its own world: it boots cold, with nothing but the payload you handed it. Every helper you’ve written that quietly reads “the current user” or “the current org” from the request is simply unavailable in there.

That gives you a rule you’ll apply in every task you ever write: every task payload carries the org context explicitly, as { organizationId, ... }, and every database call inside the task re-derives its tenant scope from that payload, via tenantDb(organizationId). The org id isn’t ambient anymore; it’s cargo, handed across the boundary in the payload and read back out on the other side.

The two panels below show the seam. The sketch is illustrative: the SDK shapes here (tasks.trigger, the task body) are taught properly in the next lesson. Read it for the boundary, not the syntax.

From the app (has context)

// A Server Action: it runs inside the request, so the session is available.
import { tasks } from '@trigger.dev/sdk';

export const exportInvoices = async (formData: FormData) => {
  // requireOrgUser() and parseSince() are the app's own helpers.
  const { orgId } = await requireOrgUser();
  const since = parseSince(formData);

  // Hand the org id across the boundary explicitly, inside the payload.
  await tasks.trigger('export-csv', { organizationId: orgId, since });
};

The org id is handed across the boundary. The action already has orgId from requireOrgUser(). It puts that id into the payload, because the task can’t reach back and ask who the user is, so the caller has to tell it.

Inside the task (no context)

// The run function inside the 'export-csv' task definition.
run: async ({ organizationId, since }) => {
  // No session, no cookies, no requireOrgUser(): this runs on a worker.
  const db = tenantDb(organizationId);
  const invoices = await listInvoicesSince(db, since);
  // ...write the CSV
},

Scope is re-derived from the payload, never assumed. organizationId comes straight out of the payload and goes into tenantDb(...). The task never assumes a tenant; it reads the one it was handed.

Two failure modes are worth guarding against, because both are common the first time someone writes a task. The first is assuming the task shares the caller’s request context. It doesn’t, so if you forget to pass the org id, the task has no way to scope its queries and you’ve got a tenancy bug or a crash. The second is subtler: the task hits the same Postgres as your request path, as you saw in the diagram, so a flood of concurrent tasks can contend for the same connection pool as live user traffic. The fix for that, connection pooling with PgBouncer, is something you already met when you set up Postgres. Just keep in mind that “tasks and requests share a database” is a thing to size for, not a thing to ignore.
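
The first failure mode can be turned from a silent tenancy bug into a loud error with a guard at the top of the task. The sketch below is a hand-rolled stand-in for the payload validation the next lesson covers: `TenantPayload`, `assertTenantPayload`, and `runExport` are hypothetical names, not SDK exports.

```typescript
// Hypothetical minimum shape every task payload must satisfy.
type TenantPayload = { organizationId: string };

// Narrow an unknown payload, failing loudly if the org id was not passed along.
function assertTenantPayload(payload: unknown): asserts payload is TenantPayload {
  const candidate = payload as { organizationId?: unknown } | null;
  if (
    typeof candidate !== "object" ||
    candidate === null ||
    typeof candidate.organizationId !== "string" ||
    candidate.organizationId.length === 0
  ) {
    throw new Error("Task payload is missing organizationId; refusing to run unscoped.");
  }
}

// Illustrative task body: the guard runs before any database access.
function runExport(payload: unknown): string {
  assertTenantPayload(payload);
  // From here on, payload.organizationId is a non-empty string.
  return `scoped to ${payload.organizationId}`;
}
```

A missing org id now fails on the first line of the run, where it shows up in the run's error output, instead of surfacing later as a query that returned the wrong tenant's rows or none at all.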

The reflex to leave with is short: a task is its own world, and org context is cargo, not ambient.

The course’s jobs, and the ones that stay cheap

Let’s make all of this concrete by applying the test to the actual app you’re building, in both directions, because that’s the skill.

This one goes on Trigger.dev: the CSV export. The export job you’ll build in the next chapter’s project is the cleanest possible “yes,” because it trips all five conditions at once. It’s multi-step. It’s paginated past the time wall. It has to resume if a worker crashes mid-export. It fans out a unit of work per page. And it emails the finished file at the end. When a workload lights up every condition like that, the decision makes itself. This is the canonical target the rest of the chapter builds toward.

These stay cheap, on purpose. As the table below shows, the rest of the app’s background work deliberately doesn’t touch Trigger.dev, because none of it trips a condition.

| Workload | Where it runs (and why) |
| --- | --- |
| CSV export of an org’s invoices | Trigger.dev, trips all five conditions |
| Single invitation email | Inline await, one ~200 ms call the user waits on |
| Direct file upload (the R2 upload flow a couple of chapters from now) | Inline presigned PUT, no task; the browser uploads straight to storage |
| Hourly trial-expiry sweep | Vercel Cron, a schedule that fits one invocation |
| Analytics event after checkout | after(), same invocation, fire-and-forget |

That’s the whole point of the lesson in one table. The export earns the second platform because it has to; everything else stays on the tier that already does the job. Not every job is a Trigger.dev job, and you now have the test to tell which ones are.

With the whether settled, the next lesson covers the how. It teaches the SDK, task, schemaTask, payload validation, queues, and triggering, so you can write that export task.

External resources

Trigger.dev — Tasks: Overview

trigger.dev

The official starting point: what a task is and how runs work.

Trigger.dev — How it works

trigger.dev

The run engine, workers, and the trigger-from-your-app model from this lesson.

Trigger.dev — Cloud Pricing

trigger.dev

The per-run, compute, and concurrency units — confirm current numbers yourself.