Chapter 93 · Lesson 5

Flags, rollouts, and experiments on one primitive

How PostHog feature flags decouple the deploy from the release, serving as one primitive behind kill switches, percentage rollouts, and A/B experiments.

A new onboarding flow is ready to ship. The product team doesn’t want it live for everyone at once. They want it live for 10% of new organizations first, then 50% if the numbers hold, then 100%. If conversion drops, they want it gone with one click, not a frantic revert. And once it’s at 100%, they want proof the new flow actually beat the old one, not a hunch.

Every part of that request fights the way you’ve shipped code so far. Until now, shipping a feature and turning it on were the same act: you merge, the deploy goes out, the feature is live for everyone. Rolling it back means another git revert, another deploy, another wait for CI. That loop is too slow and too blunt for the request above.

The fix is one idea, the senior reframe that motivates this entire lesson: the deploy is not the release. You ship the new code switched off, present in the bundle but inactive. The release is a separate act, flipping a toggle in a dashboard. Rollback stops being a code change and becomes one click.

The primitive that buys you this is the feature flag. The surprising part is that the same flag does three different jobs depending on how you configure it: a kill switch for shipping safely, a percentage rollout for releasing gradually, and an A/B experiment for measuring. You’ll learn the primitive once, then see those three as configuration and discipline rather than three separate tools. There’s exactly one hard technical part, making sure the user never sees the wrong variant flash on screen, and you’ll spend the most time there.

Flags ride on everything you already built. They hang off the same distinctId identity from the previous lesson, and once a flag is evaluated, every event you fire already carries which variant the user was on, at no extra cost. That last part is what makes the whole thing measurable.

A flag is a decision you can change without a deploy

A feature flag is a named decision. Your code asks PostHog “what’s the value of new_onboarding for this user?” and PostHog answers with true, or 'variant_a', or a config object. The targeting rules that decide the answer live in the PostHog dashboard. Your code never encodes the rule; it only reads the value.

That split is the whole point, so it’s worth sitting with. Because the rule lives in the dashboard and the code path is already deployed, changing who sees the feature is a dashboard edit that propagates in seconds. Compare the two rollback stories:

Without a flag: revert the commit, push, wait for CI, wait for the deploy, hope nothing else rode along on that revert. Minutes at best, and it’s a code change under incident pressure.
With a flag: open PostHog, set the flag to off. Seconds. No deploy, no CI, no revert.

That’s decoupling the deploy from the release, made concrete. The code shipped once; the decision moves freely on top of it.

A flag can return three shapes of value, and you reach for them in order:

Boolean: on or off, per user. This is the default and what you’ll use the vast majority of the time. Kill switches and rollouts are booleans.
Multivariate: a named string variant like control, variant_a, or variant_b. This is the substrate for A/B experiments, where every arm needs a name and you may want more than two.
JSON payload: the flag returns a structured config object, shipped per group of users. Reach for this when the flag governs configuration rather than a single branch.

Reach for booleans first. Move to multivariate only when there’s an experiment to run. Reach for a payload only when the thing that varies between groups is config, not a yes/no.

Here’s the code-side read for each shape. Don’t worry yet about the imports or where these hooks come from; a later section covers that. For now, just notice that the code only ever receives a value:

const showNewOnboarding = useFeatureFlagEnabled('new_onboarding'); // boolean
const variant = useFeatureFlagVariantKey('paywall_copy_test'); // 'control' | 'variant_a' | …
const pricing = useFeatureFlagPayload('eu_pricing'); // { price_cents: 2900, currency: 'EUR' }

A JSON-payload flag is the one worth a concrete picture. Say you want to ship different pricing to a cohort of EU users. Instead of three booleans (is_eu, is_discounted, is_annual) tangled together in code, the flag returns the whole config at once:

{ "price_cents": 2900, "currency": "EUR" }

The shape your code reads doesn’t change based on who is asking. Every user runs the same useFeatureFlagPayload('eu_pricing') call, and PostHog decides which payload each one gets. That’s the boundary that makes flags safe to change: the code reads a value, never a rule.
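A payload arrives from the dashboard as untyped JSON, so it's worth validating it at the read before trusting it. The following is a minimal sketch that assumes the eu_pricing shape shown above. PricingConfig and parsePricing are hypothetical names, not part of PostHog's API.

```typescript
// Hypothetical shape of the eu_pricing payload shown above.
export interface PricingConfig {
  price_cents: number;
  currency: string;
}

// Narrow whatever the payload read returned (unknown JSON) to PricingConfig.
// Anything that doesn't match returns null, so the caller can keep its
// existing pricing instead of rendering a half-valid config.
export function parsePricing(payload: unknown): PricingConfig | null {
  if (typeof payload !== 'object' || payload === null) return null;
  const { price_cents, currency } = payload as Record<string, unknown>;
  if (typeof price_cents !== 'number' || !Number.isInteger(price_cents) || price_cents < 0) {
    return null;
  }
  // ISO 4217 codes are three uppercase letters, e.g. "EUR".
  if (typeof currency !== 'string' || !/^[A-Z]{3}$/.test(currency)) return null;
  return { price_cents, currency };
}
```

At the call site, this looks like `parsePricing(useFeatureFlagPayload('eu_pricing')) ?? currentPricing`, where currentPricing is whatever your app already shows. A bad edit in the dashboard then degrades to the old price instead of a broken page.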

Targeting: who sees what, decided in PostHog not in code

If the rule lives in the dashboard, what does a rule actually look like? PostHog calls these release conditions, and you can target on:

Everyone: the flag is on for all users. The simplest case.
A user property: plan = 'pro'. Only users whose property matches see the feature.
A group property: organization.seats > 10, using the org-level group analytics from the previous lesson. The whole org gets the feature or none of it does.
A percentage: a deterministic 10% of users. This is how a rollout ramps.
A cohort: any PostHog cohort, which is itself a saved property predicate.

You combine these with AND / OR to build precise audiences (“pro plan and more than 10 seats”, or “EU country or in the beta cohort”).
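The following local model makes the AND/OR semantics concrete. It is a minimal sketch that assumes PostHog-style condition sets: every filter inside a set must match (AND), and a user who matches any set qualifies (OR). The types and function names are hypothetical. Percentages are left out because bucketing is a separate step, and a cohort is simplified to a boolean property. The real evaluation happens inside PostHog, not in your code.

```typescript
export type PropertyValue = string | number | boolean;
export type Person = Record<string, PropertyValue | undefined>;

// One property filter, e.g. { key: 'plan', op: 'eq', value: 'pro' }.
export type Filter =
  | { key: string; op: 'eq'; value: PropertyValue }
  | { key: string; op: 'gt'; value: number }
  | { key: string; op: 'in'; value: PropertyValue[] };

// A condition set is a list of filters that must ALL match.
// An empty set matches everyone (the "Everyone" case).
export type ConditionSet = Filter[];

function matchesFilter(person: Person, filter: Filter): boolean {
  const actual = person[filter.key];
  if (actual === undefined) return false; // a missing property never matches
  switch (filter.op) {
    case 'eq':
      return actual === filter.value;
    case 'gt':
      return typeof actual === 'number' && actual > filter.value;
    case 'in':
      return filter.value.includes(actual);
  }
}

// Sets are OR'd: the person qualifies if any one set fully matches.
export function matchesAnySet(person: Person, sets: ConditionSet[]): boolean {
  return sets.some((set) => set.every((filter) => matchesFilter(person, filter)));
}
```

"Pro plan and more than 10 seats" becomes one set with two filters. "EU country or in the beta cohort" becomes two sets with one filter each.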

The percentage option hides a correctness trap worth stopping on. When you say “10% of users,” PostHog doesn’t flip a coin per request. It hashes the user’s distinctId, together with the flag key so each flag buckets independently, and checks whether that hash falls in the bottom 10% of the range. Because the hash is deterministic, the same user lands in the same bucket on every single visit. A user who’s in the 10% sees the feature today, tomorrow, and next week. A user who’s out stays out.
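The following is a minimal sketch of deterministic percentage bucketing. It follows the same idea: hash a stable input, map it to a number in [0, 1), and compare that number against the percentage. The function names are hypothetical. The exact hash input used here (SHA-1 of "flagKey.distinctId") is an assumption made for illustration, not PostHog's wire-level algorithm.

```typescript
import { createHash } from 'node:crypto';

// Map (flagKey, distinctId) to a stable number in [0, 1).
// The same inputs always produce the same number, on any server, on any visit.
export function bucketOf(flagKey: string, distinctId: string): number {
  const hex = createHash('sha1').update(`${flagKey}.${distinctId}`).digest('hex');
  // 13 hex digits = 52 bits, small enough to stay exact in a JS number.
  return parseInt(hex.slice(0, 13), 16) / 2 ** 52;
}

// A user is in an N% rollout when their bucket falls in the bottom N% of [0, 1).
export function inRollout(flagKey: string, distinctId: string, percent: number): boolean {
  return bucketOf(flagKey, distinctId) < percent / 100;
}
```

Because a user's bucket never changes, ramping from 10% to 50% only adds users. Everyone who was in at 10% has a bucket below 0.1, which is also below 0.5, so nobody who already saw the new flow gets flipped back.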

This is why you must never roll your own percentage logic.

Broken: Math.random()

const showNew = Math.random() < 0.1;

Re-buckets on every render and every visit. The user sees the new flow, refreshes, and sees the old one. It ruins rollouts, where content jumps, and breaks experiments, where the same user lands in both arms. Never roll your own gate.

Deterministic: a flag read

const showNew = useFeatureFlagEnabled('new_onboarding');

Stable per user. PostHog hashes the distinctId, so the same user gets the same answer every visit. The 10% holds steady, which is exactly what a rollout and an experiment require.

The targeting UI lives entirely in PostHog. The following screenshot shows the release-conditions panel for a flag set to a 25% rollout, all in the dashboard, with no code involved.

[Screenshot: Feature flags / new_onboarding (Boolean), “Roll out the redesigned onboarding flow.”, Enabled. Release conditions, Set 1 (users in this set get the feature): match users where plan = pro; roll out to 25% of matching users, bucketed by a deterministic hash of their distinctId.]

The release-conditions panel for the `new_onboarding` flag. This is all dashboard, no code: bumping the rollout from 25% to 50% is an edit here, not a deploy. The read in your code never changes.

And the code that reads this flag? It’s identical no matter how the targeting is configured. Whether the flag is on for everyone, 10% of pro orgs, or a single cohort, the read is the same line:

const showNew = useFeatureFlagEnabled('new_onboarding');

That contrast is the lesson of this section: a rich rules panel on one side, an unchanging one-liner on the other. Rules move; the read stays put.

One discipline note while you’re here. Target on bounded, low-cardinality, non-sensitive facts like plan, seats, created_at, and country. Never target by matching on something like email. Targeting predicates are not the place for personal data, and high-cardinality matching makes the rule slow and brittle.

The flash of the default variant

This is the one genuinely hard part. Everything else in this lesson is discipline; this is a technical bug that beginners ship without noticing.

Build the simple, broken version first. Imagine you read the flag purely on the client: the PostHog SDK loads in the browser, asks PostHog for the variant, and renders accordingly. Walk the timeline of a single page load:

The page paints. The SDK hasn’t loaded yet, so the flag read returns the default (or undefined). The user sees the control variant.
The SDK initializes and fires a network request to PostHog asking for this user’s flags.
The response arrives. The component re-renders to the assigned variant.

For one paint, the user saw the wrong thing, and then it flickered to the right thing. That flicker is the flash of the default variant, and what it costs you depends on the job the flag is doing:

For a rollout, it’s a visible UX bug. Content jumps, layout shifts, and the page looks broken for a beat.
For an experiment, it’s far worse: it poisons your data. The user was exposed to control, then to the variant, so the first events can fire while they’re still bucketed under the wrong arm. Your experiment’s day-one numbers are garbage, and nothing in the dashboard warns you. This is the failure that quietly invalidates results.

A static diagram can’t show “default, then variant,” because the whole problem is that it happens over time. So step through the following sequence for app.acme.com/onboarding. The first three steps are the broken client-only path, and the last step is the fix you’ll build next.

Step 1, client-only, first paint (wrong). posthog-js hasn't loaded yet, so the flag read returns the default, and the user sees the OLD onboarding card (new_onboarding · control: “Get started”, the classic single-screen form).

Step 2, client-only, SDK initialized (wrong). The SDK fires a network request to PostHog for this user's flags. The old card stays on screen while the request is in flight.

Step 3, client-only, response arrives. The re-render flips control → variant_a, and the user sees the card jump from old to new (new_onboarding · variant_a: the guided, 3-step onboarding). That visible jump is the flash, and for an experiment, the first events already fired under the wrong variant.

Step 4, bootstrapped (correct). The first paint already has the assigned variant_a card baked in: no network request on the critical path, no jump. This single correct paint collapses steps 1–3.

Notice what the last step did: it collapsed three steps into one. The fix isn’t to make the network request faster. It’s to move the flag evaluation before the first paint, so the client’s first render already knows the answer. That’s server-side bootstrap, and it gets its own section.

A word on the term before we get there. The moment the variant flips in step 3 is closely tied to hydration, the handoff where the client takes over HTML the server rendered. Any value the two sides disagree on can flip at exactly that handoff. Keep that in mind, because it comes back as a subtle trap in the next section.

Server-side bootstrap kills the flash

The fix is to evaluate the flag on the server, while the page is being rendered, and hand the answer to the client so the browser’s first render already has it. There’s no network round-trip on the critical path, and no flicker.

PostHog gives the client SDK a bootstrap option exactly for this. You evaluate flags on the server with posthog-node, then pass the resulting { flagKey: value } map into posthog.init() as bootstrap data. The client SDK starts up with the flag values already populated, so useFeatureFlagVariantKey('new_onboarding') returns the real answer on the very first render.

Let’s build this in three layers: first get the value to the client, then make sure the identity matches, then make sure it’s fast.

Layer one: get the value to the client

The evaluation happens once per request, high up in the App Router, in the root layout or a server boundary that wraps your client provider. You read the user’s distinctId on the server, evaluate flags against it, and thread the resolved map down into the 'use client' PostHog provider, which passes it to posthog.init.

Walk through the assembled flow one step at a time:

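// app/layout.tsx (Server Component)
import type { ReactNode } from 'react';
import { posthog } from '@/lib/posthog';
// getDistinctId (the identity-cookie helper) and PostHogProvider come from earlier in the chapter.

export default async function RootLayout({ children }: { children: ReactNode }) {
  const distinctId = await getDistinctId();
  const flags = await posthog.evaluateFlags(distinctId);
  return (
    <PostHogProvider distinctId={distinctId} bootstrapFlags={flags.featureFlags}>
      {children}
    </PostHogProvider>
  );
}

// app/_components/posthog-provider.tsx
'use client';
import posthog from 'posthog-js';
import { env } from '@/env';

// Inside the provider: distinctId and bootstrapFlags arrive as props from the layout.
// userId is the signed-in user's id, if any; how it reaches the provider isn't shown here.
posthog.init(env.NEXT_PUBLIC_POSTHOG_KEY, {
  api_host: '/ingest',
  bootstrap: {
    distinctID: distinctId,
    isIdentifiedID: Boolean(userId),
    featureFlags: bootstrapFlags,
  },
});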

Read the same identity cookie the rest of the app uses, on the server, for this request. This is the distinctId that both evaluations must agree on.


Evaluate every flag for this user once, through the lib/posthog.ts server adapter you built earlier in the chapter. It returns a snapshot whose .featureFlags is the resolved { 'flag-key': true | 'variant' } map.


Thread the resolved map and the distinctId down as props into the 'use client' provider. Nothing has touched the network from the browser yet.


The client init receives the map as bootstrap. From here the very first client render already has the real values, with no round-trip and no flicker.


That bootstrap shape has a few exact keys worth pinning down, because they’re easy to get subtly wrong: distinctID (note the capital ID), isIdentifiedID (a boolean, true once the user has been identified), and featureFlags (the { 'flag-key': true | 'variant' } map). There is no featureFlagPayloads key. JSON payloads are not bootstrapped through this option, so don’t go looking for it.
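Because those keys are easy to misspell, one option is to pin them in a small type on your side of the boundary. A misspelling like distinctId then fails to compile instead of silently bootstrapping nothing. The following is a sketch that assumes only the three keys described above. BootstrapData and toBootstrap are hypothetical names, so if your SDK version exports an equivalent type, prefer that one.

```typescript
// Resolved flag values: true/false for boolean flags, a variant key for multivariate ones.
export type FlagMap = Record<string, boolean | string>;

// The three bootstrap keys described above. Note the capital "ID" in both names.
export interface BootstrapData {
  distinctID: string;
  isIdentifiedID: boolean;
  featureFlags: FlagMap;
}

// Build the object the client passes as posthog.init's bootstrap option.
// userId is the signed-in user's id, or null for an anonymous visitor.
// The explicit return type makes a typo such as `distinctId:` a compile error.
export function toBootstrap(distinctId: string, userId: string | null, flags: FlagMap): BootstrapData {
  return {
    distinctID: distinctId,
    isIdentifiedID: userId !== null,
    featureFlags: flags,
  };
}
```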

One framing note carried over from earlier in the chapter. PostHog ships a @posthog/next package whose bootstrapFlags helper will fold this wiring into one line once it’s stable. It isn’t the default yet, so you’re wiring posthog-node by hand here. When the wrapper lands, the shape stays the same, and you’ll just write less of it.

Layer two: one identity, evaluated twice, same answer

Here’s the subtle trap the hydration note warned about. The flag is now evaluated in two places: once on the server (to bootstrap) and once on the client (when the SDK runs). If those two evaluations use different distinctIds, they can produce different variants. The UI then flips at hydration, which is exactly the flash you were trying to kill, plus a split bucket where one user counts as two people.

The fix is that both sides read the same identity. The distinctId cookie seeded at the request boundary (the one from earlier in this chapter) is read by the server evaluation and by the client SDK. That’s one source of identity, fanning out to two evaluators that must agree.

One identity feeds both evaluators: posthog-node evaluates on the server to bootstrap before first paint, and posthog-js evaluates in the browser on hydration. Because both read the same distinctId, they compute the same variant. Same distinctId in, same variant out, no flip at hydration.

For an authenticated route, there’s one more wrinkle, which you already handled in the previous lesson. Once the user signs in and you call identify(), their known user id supersedes the anonymous cookie id, and PostHog stitches the prior anonymous events to the now-identified person. The principle still holds: both sides resolve to the same identity. For a signed-in user, that identity is their real id, not the anonymous cookie.
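One way to keep the two evaluators honest is to route both through a single identity rule. That leaves exactly one answer to “which id do we evaluate against?” The following is a minimal sketch that assumes the signed-in user id (if any) and the anonymous cookie id are both available on the server and in the provider. resolveIdentity is a hypothetical helper, not a PostHog API.

```typescript
export interface Identity {
  distinctId: string;
  isIdentified: boolean;
}

// One rule, shared by the server evaluation AND the client init:
// a known user id wins; otherwise fall back to the anonymous cookie id.
// An empty string counts as "no user" so a blank session can't become an identity.
export function resolveIdentity(userId: string | null | undefined, anonymousId: string): Identity {
  return userId
    ? { distinctId: userId, isIdentified: true }
    : { distinctId: anonymousId, isIdentified: false };
}
```

Suppose the server evaluated against the user id but the client initialized with the anonymous cookie, or the other way around. The two hashes would differ, and the variant could flip at hydration. Calling one shared function on both sides removes that whole class of bug.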

Layer three: why server evaluation doesn’t tank latency

A reasonable worry at this point: if you evaluate flags on the server for every request, isn’t that a network hop to PostHog on every page render? It would be, if you did it naively. The escape is local evaluation.

posthog-node can download the full flag configuration once, meaning all the targeting rules, and then evaluate flags in memory, refreshing the rules on an interval. A flag read on the server becomes an in-process hash computation, not a round-trip. To turn it on, you give the server client a key with permission to read flag definitions and set the refresh cadence:

import 'server-only';
import { PostHog } from 'posthog-node';

import { env } from '@/env';

export const posthog = new PostHog(env.NEXT_PUBLIC_POSTHOG_KEY, {
  host: env.NEXT_PUBLIC_POSTHOG_HOST,
  personalApiKey: env.POSTHOG_PERSONAL_API_KEY,
  featureFlagsPollingInterval: 30_000,
});

personalApiKey is what unlocks local evaluation: it lets the server fetch flag definitions, not just send events. (PostHog now recommends a dedicated feature-flags secure key for this; the personal API key you wired earlier in the chapter still works.) featureFlagsPollingInterval controls how often the rules refresh. The default is 30 seconds, which is what 30_000 milliseconds sets here. With this in place, server-side flag reads are cheap, and you can read a flag in a server component without fear of a per-render network call.
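Local evaluation is a familiar caching pattern: fetch the rules occasionally, and answer every read from memory. The following sketch shows that shape in isolation. It assumes a hypothetical fetchDefinitions function and a simplified definition format. posthog-node does all of this internally, so you'd never write it yourself; the sketch only shows why a server-side read costs no network hop.

```typescript
// Simplified, hypothetical flag definition: an on/off switch plus a rollout percentage.
export interface FlagDefinition {
  active: boolean;
  rolloutPercent: number;
}

export type Definitions = Record<string, FlagDefinition>;

// Decides one flag for one user, purely in memory (e.g. a hash-based bucket check).
type Evaluator = (key: string, definition: FlagDefinition, distinctId: string) => boolean;

export class LocalFlagCache {
  private definitions: Definitions = {};
  private readonly evaluate: Evaluator;

  constructor(evaluate: Evaluator) {
    this.evaluate = evaluate;
  }

  // Swap in a fresh set of rules; this is where every refresh ends up.
  load(definitions: Definitions): void {
    this.definitions = definitions;
  }

  // The only network-bound step; an SDK would run this on its polling interval.
  // If the fetch rejects, load() never runs and the previous rules stay in place.
  async refresh(fetchDefinitions: () => Promise<Definitions>): Promise<void> {
    this.load(await fetchDefinitions());
  }

  // Every read is answered from memory. Unknown or inactive flags read as false.
  isEnabled(key: string, distinctId: string): boolean {
    const definition = this.definitions[key];
    if (definition === undefined || !definition.active) return false;
    return this.evaluate(key, definition, distinctId);
  }
}
```

The trade-off is staleness. A dashboard edit reaches the server only on the next refresh, which is what the polling interval bounds.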

That’s the full picture: the value reaches the client pre-resolved (no flash), both sides share one identity (no flip at hydration), and local evaluation keeps it fast (no latency hit). Three layers, each closing one gap.

Reading flags: hooks on the client, one call on the server

With bootstrap guaranteeing correct values, reading a flag is the easy part. There are two boundaries you read from.

A small package note first. The React hooks now live in their own companion package, @posthog/react, which you’ll add to the install. This is a slight wrinkle from earlier in the chapter, where you wired raw posthog-js through the provider: the flag reads use @posthog/react hooks, while the provider, identify, and track plumbing stay exactly as you built them. The hooks just talk to the same client instance the provider already holds.

On the client, you have one hook per value shape:

import {
  useFeatureFlagEnabled,
  useFeatureFlagVariantKey,
  useFeatureFlagPayload,
} from '@posthog/react';

const showNewOnboarding = useFeatureFlagEnabled('new_onboarding');
const variant = useFeatureFlagVariantKey('paywall_copy_test');
const pricing = useFeatureFlagPayload('eu_pricing');

Two things make these pleasant to use. First, with bootstrap in place they return the real value on the first render, never undefined mid-flicker. Second, they re-render when PostHog updates the flag remotely. Bump a rollout from 10% to 50% in the dashboard and live clients pick it up without a redeploy. (These are ordinary hook reads at the top of a component, so the React Compiler handles memoization for you, with no useMemo.)

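There’s one footgun here that silently breaks experiments, so it’s worth a hard stop: a payload-only read doesn’t count as an exposure. The boolean and variant hooks record that the user saw the flag by firing a $feature_flag_called event, which is what lets PostHog attribute that user to an arm. useFeatureFlagPayload on its own doesn’t fire it, so users who only ever hit a payload read can’t be attributed and silently vanish from the experiment’s results. If an arm renders from a payload, read the variant key as well.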

On the server, you read from a snapshot. Call posthog-node’s evaluateFlags(distinctId) to get the snapshot, then pull individual flags off it with .getFlag('new_onboarding') or .isEnabled('new_onboarding'). Use this in a server component when the fork is rendered server-side, or in proxy.ts when the flag controls a redirect or a layout swap.

The two reads return the same decision at different boundaries. Compare them:

Client, in a Client Component:

'use client';

import { useFeatureFlagEnabled } from '@posthog/react';

export const OnboardingCard = () => {
  const showNew = useFeatureFlagEnabled('new_onboarding');
  return showNew ? <NewOnboarding /> : <LegacyOnboarding />;
};

A hook read at the component top level. Because bootstrap populated the value, showNew is correct on the first render, with no flicker.

Server, in a Server Component:

import { posthog } from '@/lib/posthog';

export const Dashboard = async () => {
  const distinctId = await getDistinctId(); // the same identity helper the root layout uses
  const flags = await posthog.evaluateFlags(distinctId);
  const showNew = flags.isEnabled('new_onboarding');
  return showNew ? <NewOnboarding /> : <LegacyOnboarding />;
};

The same decision, read in-process on the server via the adapter from earlier in the chapter. There’s no client round-trip, and the HTML ships already forked.

Now the payoff you’ve been owed since the previous lesson. Once a flag is evaluated for a user, PostHog automatically attaches a super-property to every event they fire afterward, named $feature/<flag>:

{
  "event": "plan_upgraded",
  "properties": {
    "from_plan": "free",
    "to_plan": "pro",
    "$feature/paywall_copy_test": "variant_a"
  }
}

That $feature/paywall_copy_test: 'variant_a' is the mechanism that makes experiments and funnels work. The event store now knows which variant the user was on when they upgraded. You don’t do anything extra for it: evaluate the flag, and every downstream event self-tags. This is the bridge from the super-property mechanism you learned in the previous lesson to flags actually driving your analytics.
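Because the variant rides on every event, joining a metric to an arm amounts to a group-by on that property. The following is a minimal sketch over an in-memory list of events, assuming the event shape shown above plus a distinct_id field so each user is counted once. conversionByVariant is a hypothetical helper that ignores time windows like “within 14 days”; PostHog’s experiment readout does this, plus the statistics, for you. The sketch also shows why an event without the $feature/ tag is useless to an experiment: there’s nothing to group it by.

```typescript
// Illustrative event shape; distinct_id is an assumption so users are counted once.
export interface CapturedEvent {
  event: string;
  distinct_id: string;
  properties: Record<string, unknown>;
}

export interface ArmStats {
  users: number;
  converted: number;
  rate: number;
}

// Group users by the $feature/<flagKey> tag on their events, then count how many
// users in each arm fired the metric event. Untagged events are skipped.
export function conversionByVariant(
  events: CapturedEvent[],
  flagKey: string,
  metricEvent: string,
): Record<string, ArmStats> {
  const tag = `$feature/${flagKey}`;
  const exposed = new Map<string, Set<string>>();
  const converted = new Map<string, Set<string>>();

  for (const e of events) {
    const value = e.properties[tag];
    // Accept either a variant key or a boolean flag value; anything else has no arm.
    if (typeof value !== 'string' && typeof value !== 'boolean') continue;
    const arm = String(value);
    if (!exposed.has(arm)) {
      exposed.set(arm, new Set());
      converted.set(arm, new Set());
    }
    exposed.get(arm)!.add(e.distinct_id);
    if (e.event === metricEvent) converted.get(arm)!.add(e.distinct_id);
  }

  const result: Record<string, ArmStats> = {};
  for (const [arm, users] of exposed) {
    const conversions = converted.get(arm)!.size;
    result[arm] = { users: users.size, converted: conversions, rate: conversions / users.size };
  }
  return result;
}
```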

Now write one. In the following exercise, you’ll build a component that reads a flag and forks the render. The harness gives you a mocked useFlag() hook that stands in for the real PostHog hook, so you can focus on the conditional render, which is all a flag read ever is.

useFlag('new_onboarding') returns the assigned variant. Complete OnboardingFork so it returns the right card for the value it's handed: 'control' → the element with data-testid="legacy-onboarding", 'variant_a' → data-testid="new-onboarding", and undefined (still loading) → data-testid="onboarding-loading". App reads the flag and passes it down — don't change the mock; the tests render OnboardingFork with each value directly.


Reference solution

A flat switch (or an if/else chain) on the variant value, one branch per shape, with the undefined case as the neutral default. There’s nothing flag-specific here: a flag read is just a value, and forking on it is a plain conditional.

// The values described in the exercise: a variant key, or undefined while loading.
type Variant = 'control' | 'variant_a' | undefined;

export function OnboardingFork({ variant }: { variant: Variant }) {
  switch (variant) {
    case 'variant_a':
      return <div data-testid="new-onboarding">Welcome — let's get you set up</div>;
    case 'control':
      return <div data-testid="legacy-onboarding">Getting started</div>;
    default:
      return <div data-testid="onboarding-loading">Loading…</div>;
  }
}

The default branch covers undefined, which is what the hook returns while flags are still loading and nothing was bootstrapped. With server-side bootstrap in place that window closes, since the very first render already has the assigned variant. Handling it still keeps the component honest if a flag ever arrives un-bootstrapped.

The mystique of “feature flags” should be gone now: a flag is a value, and reading it is a normal React conditional. Everything fancy, the targeting, the rollout percentages, and the experiment, happens in the dashboard. Your code just branches on a value.

Three jobs, one flag: kill switch, rollout, experiment

The kill switch, the rollout, and the experiment are not three tools. They’re the same flag primitive wearing three hats, with different configuration, different discipline, and a different lifespan.

Kill switch: a boolean, default off. The move is to gate every newly shipped non-trivial feature behind one for its first week. If something breaks, flip it off instantly, with no deploy, no rollback, and no incident scramble. It’s structural insurance for a feature release. Lifespan: weeks. Delete it once the feature has proven stable.
Rollout: a boolean, ramped by percentage: 10%, then 50%, then 100%, on the deterministic distinctId hash. Each bump is a dashboard edit, and you watch the metrics between bumps. Lifespan: weeks. Delete it at 100%.
Experiment: a multivariate flag, typically a 50/50 split, with a metric attached. This is the only one of the three that exists to measure rather than to release. Lifespan: two to four weeks. When you have significance, convert the winner to a rollout and delete the losing branch.

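The skill isn’t memorizing three APIs, because they’re the same API. It’s picking the right framing from the situation. The order in which a senior asks the questions is what does the work, so walk them in order:

Pick the flag pattern. First: are you measuring which version wins? If yes, it’s an experiment. If not: are you releasing all at once, or ramping? All at once, with an instant off-toggle, is a kill switch. A gradual percentage ramp is a rollout.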
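The question order translates directly into code. The following is a minimal sketch that assumes the situation reduces to two yes/no questions. FlagPattern and pickFlagPattern are hypothetical names, used only to make the ordering explicit.

```typescript
export type FlagPattern = 'experiment' | 'rollout' | 'kill_switch';

export interface Situation {
  // Do you need to prove which version wins on a metric?
  measuring: boolean;
  // If not measuring: will you ramp exposure gradually (10% → 50% → 100%)?
  ramp: boolean;
}

// Order matters: "measuring?" is asked first, so an experiment wins even when
// the team also plans to ramp later (the winner becomes a rollout afterwards).
export function pickFlagPattern({ measuring, ramp }: Situation): FlagPattern {
  if (measuring) return 'experiment';
  return ramp ? 'rollout' : 'kill_switch';
}
```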

Read that again as a sequence of questions, not three boxes: measuring? first, then all-at-once or ramp? The decision lives in the order. Now try sorting real situations into the three patterns:

Sort each scenario into the flag pattern it calls for:

Kill switch: boolean, default off, instant off-toggle
Rollout: boolean, percentage ramp
Experiment: multivariate, metric attached

Shipping a risky payments rewrite and wanting to switch it off the second it misbehaves

Wrapping a fragile new export feature so on-call can disable it without a deploy

Releasing a redesigned dashboard to 20% of orgs before everyone

Gradually exposing a new search backend, watching error rates as you ramp

Testing two paywall headlines to see which lifts trial-to-paid

Comparing a one-step vs three-step signup to prove which converts better

Experiments: declare the metric before you peek

The experiment is where juniors lose money, not through bad code but through bad process. The code is trivial: an experiment is just a multivariate flag read, the same useFeatureFlagVariantKey('paywall_copy_test') you already know. PostHog runs the statistics for you, and this lesson doesn’t teach the math. What it teaches is the discipline that makes those statistics trustworthy.

Under the hood, an experiment is a multivariate flag with a metric attached. You set a primary metric, say plan_upgraded firing within 14 days of paywall_viewed, built from the events you defined in the previous lesson. You can add some secondary metrics too, and PostHog gives you a statistical readout. All of that scaffolding lives in the dashboard, and your code is unchanged.

Three rules separate a trustworthy result from theater:

Pre-declare the primary metric and the hypothesis before you launch. Write the hypothesis into the experiment description: “the three-step onboarding will lift trial-to-paid by at least 2 points.” A metric you pick after seeing the data is unfalsifiable: you’ll always find some number that moved, and you’ll have proven nothing.

The primary metric must be a PostHog event. It has to be one of the events in your dictionary, because that’s what carries the $feature/<flag> tag that lets PostHog join the metric to the variant. An external metric, like a number from your billing dashboard or a count from a spreadsheet, breaks the join. PostHog can’t attribute it to an arm, so the experiment is dead on arrival.

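Don’t stop on the first green day. This is the expensive one. Early in a run, noise routinely pushes the readout across the significance line and then drifts back. If you check daily and stop the first time it turns green, you’ll regularly crown a winner that doesn’t exist, and the lift evaporates once the variant ships to 100%. Decide the run length up front and wait it out.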
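A small simulation shows why peeking is so expensive. The following sketch runs A/A tests, where both arms share the same true conversion rate, so any “winner” is a false positive. It compares two stopping rules: run a two-proportion z-test once at the planned end, or run it every day and stop at the first significant reading. The parameters (14 days, 200 users per arm per day, a 10% base rate, 2,000 simulated experiments) are arbitrary illustration values. The z-test is a textbook stand-in, not the statistics PostHog uses.

```typescript
// Small seeded PRNG (mulberry32) so every run produces the same numbers.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two-sided two-proportion z-test at alpha = 0.05, with n users in each arm.
function isSignificant(convA: number, convB: number, n: number): boolean {
  const pooled = (convA + convB) / (2 * n);
  const se = Math.sqrt(pooled * (1 - pooled) * (2 / n));
  if (se === 0) return false;
  return Math.abs((convB - convA) / n / se) > 1.96;
}

interface SimOptions {
  experiments: number;
  days: number;
  usersPerArmPerDay: number;
  baseRate: number;
  seed: number;
}

const DEFAULTS: SimOptions = { experiments: 2000, days: 14, usersPerArmPerDay: 200, baseRate: 0.1, seed: 42 };

export function falsePositiveRates(opts: SimOptions = DEFAULTS): { peeking: number; fixedHorizon: number } {
  const rng = mulberry32(opts.seed);
  let peekHits = 0;
  let fixedHits = 0;
  for (let e = 0; e < opts.experiments; e++) {
    let convA = 0;
    let convB = 0;
    let n = 0;
    let calledEarly = false;
    for (let day = 0; day < opts.days; day++) {
      for (let i = 0; i < opts.usersPerArmPerDay; i++) {
        if (rng() < opts.baseRate) convA++; // both arms share the SAME true rate
        if (rng() < opts.baseRate) convB++;
      }
      n += opts.usersPerArmPerDay;
      // Peeking rule: any significant day counts as "we found a winner".
      if (isSignificant(convA, convB, n)) calledEarly = true;
    }
    if (calledEarly) peekHits++;
    // Fixed-horizon rule: look once, at the planned end.
    if (isSignificant(convA, convB, n)) fixedHits++;
  }
  return { peeking: peekHits / opts.experiments, fixedHorizon: fixedHits / opts.experiments };
}
```

The fixed-horizon rule lands near the 5% you signed up for. The daily-peeking rule declares a “winner” several times as often, even though there is nothing to find.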

And remember the silent failure from the flash-of-the-default-variant section: if the flash isn’t fixed by bootstrap, your day-one buckets are poisoned before the discipline above even gets a chance to matter. Bootstrap first, then run clean.

Put the failure modes together. The following question describes a broken experiment; pick everything that could have caused it:

An experiment “proved” a winner, but the lift evaporated the moment the variant shipped to 100% — and a whole slice of exposed users never showed up in the results at all. Which of these would explain that? Select all that apply.

The team called it the first afternoon the dashboard flashed “significant,” instead of waiting out the run.

Nobody wrote down what they were testing until after the data was in, and then they picked whichever number had moved.

The variant was decided in the browser, so the first paint rendered control before the SDK answered.

The card only pulled the flag’s payload object and rendered from that — it never touched the boolean or variant hook.

The two arms each got half the traffic instead of a lopsided 90/10 split.

“Conversion” was read off the billing provider’s revenue export rather than tracked as an event.

Five of these are genuine experiment-breakers. Stopping on the first green day turns noise into a fake winner; a metric chosen after seeing the data can’t be falsified; a client-only read flashes control before the variant arrives, so day-one events fire under the wrong arm; a payload-only read never fires the $feature_flag_called exposure, so those users can’t be attributed and simply vanish from the results; and a metric that isn’t a PostHog event has no $feature/<flag> tag to join on. The odd one out is the split — a 50/50 allocation is exactly what you want for an experiment, not a cause of failure.

Flags have a lifespan: the stale-flag deletion ritual

There’s a cost to flags that beginners forget to pay. Every flag is a fork in your code: if (flag) { ... } else { ... }. The moment a flag reaches 100%, that else branch becomes dead code. No one is ever on it, no one tests it, and it sits there rotting. A codebase full of shipped-and-forgotten flags is a codebase full of dead branches and confusing forks.

So flags are not write-once. Deletion is the last step of a rollout, not optional housekeeping. PostHog helps you find the candidates: its “last evaluated” and feature-flag activity views surface flags whose branches have collapsed, meaning everyone’s on one variant.

The deletion order matters, and getting it backwards causes an outage. If you delete the flag in PostHog while a deployed version of your code is still reading it, that live code now asks for a flag that doesn’t exist. So you remove the read first, ship that, and only then delete the flag.

Follow this order exactly:

Grep the flag name across the codebase. Remove the if (flag) fork, keeping the winning branch as the now-unconditional code. Open the PR.
Merge and deploy. Now no running version of the code reads the flag.
Only now, delete the flag in PostHog.
Confirm zero references remain: search the repo, and check PostHog shows no recent evaluations.

Never reverse steps 2 and 3. Delete-then-deploy leaves a window where live code reads a flag that’s already gone.
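Steps 1 and 4 both come down to one question: is anything still reading this key? A plain recursive grep for the key answers it. If you’d rather wire the check into CI, here’s a minimal sketch over an in-memory map of file paths to contents. It assumes flag keys always appear in code as quoted string literals. findFlagReferences and safeToDeleteInPostHog are hypothetical helpers; in a real repo you’d populate the map from a directory walk.

```typescript
// Return "path:line" for every line that mentions the flag key as a quoted literal.
// Matching the quotes too means 'new_onboarding' won't match 'new_onboarding_v2'.
export function findFlagReferences(files: Record<string, string>, flagKey: string): string[] {
  const quoted = [`'${flagKey}'`, `"${flagKey}"`, `\`${flagKey}\``];
  const hits: string[] = [];
  for (const [path, text] of Object.entries(files)) {
    text.split('\n').forEach((line, index) => {
      if (quoted.some((q) => line.includes(q))) hits.push(`${path}:${index + 1}`);
    });
  }
  return hits;
}

// Gate for step 3: only delete the flag in PostHog once no deployed code reads it.
export function safeToDeleteInPostHog(deployedFiles: Record<string, string>, flagKey: string): boolean {
  return findFlagReferences(deployedFiles, flagKey).length === 0;
}
```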

Make this a habit, not a one-off. A quarterly stale-flag audit, where you find the collapsed flags and run them through the ritual above, keeps both your code and your analytics clean. “Add a flag” becomes a complete loop with an exit, instead of a one-way accumulation of dead forks.
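A quarterly audit is easier to run when “stale” has a written definition. Here’s a minimal sketch that assumes you’ve exported each flag’s rollout percentage and last-evaluated time, for example by hand from the dashboard. The FlagRecord shape, the 30-day staleness window, and auditFlags are all hypothetical choices, not PostHog features.

```typescript
export interface FlagRecord {
  key: string;
  rolloutPercent: number; // 0 to 100, for a flag with a single condition set (simplified)
  lastEvaluatedAt: Date | null;
}

export type StaleReason = 'fully_rolled_out' | 'fully_off' | 'not_evaluated_recently';

const DAY_MS = 24 * 60 * 60 * 1000;

// Return each flag that looks ready for the deletion ritual, with the reason.
export function auditFlags(
  flags: FlagRecord[],
  now: Date,
  staleAfterDays = 30,
): Array<{ key: string; reason: StaleReason }> {
  const candidates: Array<{ key: string; reason: StaleReason }> = [];
  for (const flag of flags) {
    if (flag.rolloutPercent >= 100) {
      candidates.push({ key: flag.key, reason: 'fully_rolled_out' }); // the else branch is dead
    } else if (flag.rolloutPercent <= 0) {
      candidates.push({ key: flag.key, reason: 'fully_off' }); // the if branch is dead
    } else if (
      flag.lastEvaluatedAt === null ||
      now.getTime() - flag.lastEvaluatedAt.getTime() > staleAfterDays * DAY_MS
    ) {
      candidates.push({ key: flag.key, reason: 'not_evaluated_recently' });
    }
  }
  return candidates;
}
```

Treat everything it returns as a candidate for human review, not an automatic deletion. A kill switch flipped off mid-incident also reads as 0%.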

That’s the full mental model to leave with: a flag is a server-evaluated decision, keyed on a stable identity, that the client receives pre-resolved. The value lives in PostHog so you can change it without a deploy; the read lives in code as one hook or one server call; and the variant travels on every event at no extra cost. Kill switch, rollout, and experiment are three release strategies layered on that one decision, and each one ends with deletion.

External resources

The PostHog docs are the place to confirm exact method names and to dig into the experiment statistics this lesson deliberately skipped. The conceptual piece below is the canonical map of the territory.

PostHog — Feature flags

posthog.com

Targeting, rollouts, payloads, and the bootstrap option in full.

PostHog — Experiments

posthog.com

Primary and secondary metrics, significance, and how PostHog computes the readout.

Feature Toggles (aka Feature Flags)

martinfowler.com

Pete Hodgson's definitive essay — release vs. experiment vs. ops toggles, and their carrying cost.

PostHog — Client-side bootstrapping

posthog.com

The exact bootstrap shape that kills the flash of the default variant.