Release Engineering

What is the job to be done?

"Help me ship faster without breaking things, control who sees what, and validate that changes actually work."

Safely roll out features to specific users or groups before a full release
Instantly kill a bad deploy without a rollback or hotfix
Measure the actual impact of a release on key metrics, not just "it didn't crash"
Reproduce user-reported bugs from the user's actual perspective during a rollout
Run A/B tests tied to releases so every ship is a learning opportunity
Detect quality regressions in AI features after prompt or model changes

What PostHog products are relevant?

Feature Flags (core) — Controlled rollouts, percentage-based releases, targeted delivery to specific users/groups, kill switches. The foundation of safe shipping. Engineering teams use flags to decouple deployment from release: code ships to production but features are gated behind flags. (Getting started · Multivariate flags)
Experiments — A/B testing tied directly to releases. "We shipped a new checkout flow behind a flag. Did it actually improve conversion, or just look better in the demo?" Experiments are billed with Feature Flags, so customers with flags already have access. (Creating experiments)
Session Replay — Reproduce bugs from the user's actual perspective during rollout. When a user reports "the new feature is broken," you don't need to guess. Filter replays by feature flag variant and watch exactly what happened. Also useful for rollout validation: watch how real users interact with the new feature before expanding the rollout.
AI Evals — For products with AI features: detect quality regressions after prompt or model changes. Traditional error tracking won't catch a model that starts producing lower-quality output. Evals compare output quality before and after a change, catching regressions that look "fine" from an error rate perspective but degrade user experience.

Adoption path and expansion path

Entry point

Usually Feature Flags. Engineering team wants controlled rollouts. Common entry scenarios:

Progressive rollout: Team wants to ship a risky change to 5% of users, monitor, then expand gradually. Feature flags give them the gate; they quickly want metrics to know when it's safe to expand (Experiments).
Kill switch: After a bad deploy that took hours to roll back, engineering wants instant off-switches for new features. Feature flags are the answer.
Growth team bridge: The growth team wants to run an A/B test on the signup flow. Experiments requires Feature Flags, which requires engineering to implement. Engineering gets pulled into PostHog through the growth team's request. (See the Growth & Marketing playbook for this entry path.)

Primary expansion path

Feature Flags → + Experiments → + Session Replay (for debugging and rollout validation)

The logic of each step:

Feature Flags → Experiments: They're rolling out features behind flags but only monitoring for crashes, not measuring business impact. Experiments lets them answer "did this change actually improve the metric we care about?" Since Experiments is billed with Feature Flags, the barrier is adoption, not cost.
Experiments → Session Replay: They're measuring impact quantitatively but can't debug issues qualitatively. When an experiment shows the control is outperforming the variant, they need to see why. Filter replays by flag variant, watch what's going wrong.

Alternate expansion paths

Starting from Experiments (growth-driven): The growth team wants to A/B test, which requires engineering to implement flags. Engineering discovers they can use the same flag infrastructure for all their releases. This is the reverse entry: growth team is the catalyst, engineering becomes the power user. The growth team stays in Growth & Marketing; engineering lands in Release Engineering.

AI product teams: After a prompt or model change, engineering wants to verify quality hasn't regressed. AI Evals catches regressions that traditional error tracking misses. This bridges into AI/LLM Observability.

Business impact of solving the problem

This is a different buyer than Product Intelligence. Release Engineering targets engineering managers, platform teams, and individual developers. In most organizations, these are separate from the product analytics buyer (PM). Selling to engineering unlocks a parallel revenue stream from the same account. Two budget holders, two champions, much stickier account.

Feature Flags in the codebase are sticky. Once feature flags are integrated into the release workflow and embedded in production code, they're very hard to rip out. This isn't a dashboard someone stops logging into. It's infrastructure that engineering depends on for every deploy. This makes Release Engineering accounts among the most defensible in our book.

The tight integration between flags and experiments is genuinely differentiated. LaunchDarkly has flags but weak experimentation. Standalone experimentation tools (Statsig, Eppo) have experiments but aren't integrated with the broader analytics platform. PostHog connects flags → experiments → product analytics → session replay in one tool.

Experiments + Feature Flags create the multithreading bridge. When growth wants to experiment and engineering implements the flags, both teams are in PostHog. This is one of the best ways to get multithreaded in an account if you aren't already.

Personas to target

Persona	Role Examples	What They Care About	How They Evaluate
Engineering Manager	EM, VP Eng, Director of Eng	Release velocity, incident rate, rollback time, team productivity	"Will this make my team ship faster with fewer incidents?"
Platform Engineer	Platform Eng, DevEx, Infrastructure	Developer experience, flag management at scale, API reliability	"How does this scale to thousands of flags? What's the API latency?"
Individual Developer	Senior Eng, Staff Eng, Product Engineer	Fast to implement, doesn't slow down CI/CD, good SDK quality	"How many lines of code to add a flag? Does the SDK suck?"
Founding Engineer	CTO, first engineers at early-stage startup	Speed, simplicity, not paying for LaunchDarkly's enterprise pricing	"How fast can I set this up and how much does it cost?"

Signals in Vitally & PostHog

Vitally indicators this use case is relevant

Signal	Where to Find It	What It Means
Feature Flags is the primary or only paid product	Product spend breakdown	Engineering-first account. Full Release Engineering expansion path available.
High flag evaluation volume, low experiment count	Product usage data	They're using flags for rollouts but not measuring impact. Experiments is the next conversation.
Customer mentions LaunchDarkly in notes	Vitally notes / conversations	Competitive displacement opportunity. They may be paying LaunchDarkly prices for flags alone.
Engineering-only users (no PMs or marketing)	User list in Vitally	Engineering-first adoption. Release Engineering is the primary use case. Product Intelligence is the cross-sell.

PostHog usage signals

Signal	How to Check	What It Means
Feature flags created frequently but no experiments	Flag list vs. experiments list	They're using flags for rollouts but not measuring impact. Low-hanging Experiments adoption.
Flags with high evaluation volume	Flag usage metrics	Flags are in production, integrated into the codebase. High stickiness.
Session Replay enabled but not filtered by flag variant	Replay usage	They're recording sessions but not connecting them to rollout debugging. Onboarding opportunity.
Multiple flags per user/team	Flag list + creators	Multiple engineers are using flags. Good health signal and potential for team-wide adoption.

Command of the Message

Discovery questions

How do you currently roll out new features? All at once, or gradually?
When a deploy goes wrong, how long does it take to roll back? What's that process look like?
After you ship a feature, how do you know it's working? What metrics do you check?
Do you run A/B tests on product changes? How is that connected to your release process?
When a user reports a bug in a new feature, how do you reproduce it?
How many deploys per day/week does your team ship? What slows that down?
Are you using a feature flag tool today? What do you like and dislike about it?
How does your growth team run experiments? Does engineering implement those, or is it separate?

Negative consequences (of not solving this)

Risky deploys require full rollbacks, costing hours of engineering time and user trust
No way to gradually roll out to a subset of users, so every release is all-or-nothing
Features ship without measuring impact, so the team doesn't know if changes actually helped
Bug reproduction is guesswork because there's no way to see the user's actual experience during a rollout
Engineering and growth/product teams use separate tools, so experiment results don't connect to release decisions
High LaunchDarkly costs for feature flagging alone, without experiments or analytics integration

Desired state

Every feature ships behind a flag with gradual rollout and instant kill switch capability
Every release is measured against real business metrics, not just error rates
When a user reports a bug in a new feature, engineers can watch their exact session filtered by flag variant
Growth team experiments and engineering rollouts use the same infrastructure
Flag, experiment, and analytics data live in one platform, so the full picture is visible without switching tools

Positive outcomes

Faster release cycles: engineers ship with confidence because they can roll back instantly
Fewer incidents: gradual rollouts catch issues at 5% instead of 100%
Better product decisions: every release is also a measurement opportunity
Reduced tooling cost: replace LaunchDarkly + separate experimentation tool with one platform
Multithreaded account: growth and engineering share the same platform for experiments and rollouts

Success metrics

Customer-facing:

Release velocity increases (more deploys per week)
Mean time to recovery from bad deploys decreases
Percentage of releases measured with experiments increases
Bug reproduction time decreases (engineers can watch filtered replays)

TAM-facing:

Feature Flag evaluation volume grows (flags are being used more broadly)
Experiment count increases (moving from "just flags" to "flags + measurement")
Session Replay adoption grows alongside flag usage (debugging workflow)
Non-engineering users (growth, PM) start creating experiments (multithreading indicator)

Competitive positioning

Our positioning

Flags + experiments + analytics in one platform. The only tool where you can create a flag, run an experiment, measure the result in Product Analytics, and watch user sessions filtered by variant. No stitching together LaunchDarkly + Statsig + a replay tool.
Experiments included with Feature Flags. Experiments are billed as part of Feature Flags. Customers using flags already have experimentation. The barrier is awareness and adoption, not cost.
Session Replay filtered by flag variant. When an experiment shows the control winning, filter replays by the losing variant and watch what went wrong. No other flag tool offers this.
Better pricing than LaunchDarkly. LaunchDarkly is expensive and charges separately for experimentation. PostHog bundles it and prices on requests, not seats.

Competitor quick reference

Competitor	What They Do	Our Advantage	Their Advantage
LaunchDarkly	Feature flags, targeting, enterprise flag management	Experiments included; analytics integration; session replay; far better pricing	More mature enterprise flag management; larger feature set for complex targeting rules; bigger enterprise install base
Statsig	Feature flags + experimentation + analytics	Broader platform (replay, surveys, workflows); open source	Purpose-built for experimentation; strong warehouse-native story; more advanced statistical methods
Eppo	Warehouse-native experimentation	Broader platform; doesn't require a data warehouse; integrated replay	Warehouse-native means they use your existing data; more advanced statistical methodology
Split.io	Feature flags + experimentation	Broader platform; better pricing; integrated analytics	More mature enterprise integrations

Honest assessment: Our strongest position is against teams paying LaunchDarkly prices for flags alone and not getting experiments included. The "flags + experiments + analytics in one platform" pitch is genuine and saves money. We're weaker against teams that need very complex flag management at enterprise scale (LaunchDarkly's core strength) or teams that want warehouse-native experimentation (Eppo's pitch). Our sweet spot is engineering teams that want the full loop: flag a feature, measure its impact, debug issues with replay, all in one tool.

Pain points & known limitations

Pain Point	Impact	Workaround / Solution
Flag management UX is simpler than LaunchDarkly's	Enterprise teams with hundreds of flags may want more organizational features	PostHog flags work well at scale. For very complex targeting, review the multivariate flags and payloads documentation.
No built-in flag approval workflows	Some enterprise teams want PR-style review before a flag goes live	Use existing code review processes (flags are in code). PostHog audit logs track changes.
Statistical methodology is Bayesian	Teams preferring frequentist methods may push back	Bayesian is faster to reach conclusions and easier to interpret. For teams that insist on frequentist, this is a real limitation.

Getting a customer started

What does an evaluation look like?

Scope: Implement feature flags on one upcoming release. Ship behind a flag with gradual rollout. Optionally set up an experiment to measure impact.
Timeline: 1 to 2 days to implement first flag. 1 to 2 weeks to see experiment results (depends on traffic).
Success criteria: Can you gate a feature behind a flag and roll it out gradually? Can you instantly kill a flag if something goes wrong? Can you measure the impact of the change with an experiment?
PostHog investment: Feature Flags free tier covers 1M requests. Experiments are included.
Key requirement: Engineering needs to integrate the PostHog SDK into their codebase. This is the implementation step. Once the SDK is in, adding new flags is trivial.

Onboarding checklist

Install PostHog SDK in the application (Feature Flags getting started)
Create first feature flag for an upcoming release
Set up gradual rollout (start at 5-10%, monitor, expand)
Test kill switch: turn flag off and verify the feature is immediately disabled
Set up first Experiment tied to a flagged feature, measuring a real business metric
Enable Session Replay and filter replays by flag variant to debug an issue
Review experiment results and use them to make a ship/no-ship decision
Plan second experiment to establish the workflow as a team habit

Cross-sell pathways from this use case

If Using...	They Might Need...	Why	Conversation Starter
Feature Flags only	Experiments	They're gating features but not measuring impact	"You're rolling out features safely. But do you know if they're actually working? Experiments are included with your flags."
Feature Flags + Experiments	Session Replay	They're measuring impact but can't debug qualitative issues	"Your experiment shows the control winning. Want to watch what users in the losing variant are actually experiencing?"
Feature Flags (engineering-driven)	Product Intelligence (for the product team)	Engineering is in PostHog. Product team should be too.	"Your engineers use PostHog for releases. Has your product team seen the analytics? They could track feature adoption and retention without a separate tool."
Feature Flags (for growth experiments)	Growth & Marketing (for the growth team)	Growth team initiated the experiments, engineering implemented the flags. Expand the growth side.	"Your growth team started the experiments. Have they explored Web Analytics and Marketing Analytics for attribution?"
Feature Flags + Experiments	Error Tracking / Observability	They're catching issues via experiments but want proactive error detection	"You're catching regressions through experiments. Error Tracking would catch exceptions before they show up in your metrics."
AI product releasing prompt/model changes	AI/LLM Observability	They need to detect quality regressions that error tracking won't catch	"After your last prompt change, did output quality hold up? AI Evals would tell you automatically."

Internal resources

Feature Flags docs: Getting started · Feature Flags · Multivariate flags · Payloads
Experiments docs: Experiments · Creating experiments · Exposures
Session Replay docs: Session Replay
Competitive battlecard: To be added: LaunchDarkly competitive positioning

Appendix: Company archetype considerations

Archetype + Stage	Framing	Key Products	Buyer
AI Native — Early	"Ship fast, break nothing. Feature flags let you deploy AI features to a subset of users and measure quality before going wide." AI Evals is especially relevant here.	Feature Flags, Experiments, AI Evals	CTO, founding engineer
AI Native — Scaled	"Your engineering team is growing and releases are getting riskier. Feature flags give everyone a safety net, and experiments make sure every change is measured."	Feature Flags, Experiments, Session Replay	VP Eng, Platform Lead
Cloud Native — Early	"Stop doing all-or-nothing deploys. Ship behind a flag, measure the impact, roll back in one click if something breaks." Speed and simplicity matter.	Feature Flags, Experiments	CTO, founding engineer
Cloud Native — Scaled	"Multiple teams shipping to the same product. Feature flags give each team independent release control. Experiments ensure changes are measured, not just shipped."	Feature Flags, Experiments, Session Replay	VP Eng, EM, Platform team
Cloud Native — Enterprise	"Standardize your release process across teams and BUs. Feature flags + experiments give you a consistent framework for safe, measured releases at scale." Governance (audit logs, RBAC) matters here.	Feature Flags, Experiments, Session Replay + Enterprise package	VP Eng, Director of Platform, DevEx Lead