Feature Success Team

Annika and Neil in Aruba

What we're building

  • Feature Success Analysis

    Bringing together different parts of PostHog (flags, replay, surveys) to allow users to better analyse the success of a new feature.

  • Users & recordings linked to feature flags

    We want to make it easier for teams using feature flags to see which users are attached to a particular flag, and to understand those users' experience through session recordings.

Roadmap

Recently shipped

Cohort creation improved

The Feature Success team has made two small but mighty quality-of-life improvements to creating new cohorts.

The first is that we now tell you clearly at the top of the page when your cohort was last calculated, so you can judge how up to date it is. Cohorts are still normally recalculated every 24 hours, so we surface that information too.

The second is that, when you specify an event, you can now add filters and custom date ranges directly within cohort creation. It's another small tweak that should greatly speed up building cohorts.

Goals

As always, reliability is the #1 unwritten goal: making sure feature flags are reliable trumps every other objective.

Objective: Make sure feature flags can handle 10x current scale

Last quarter we hit scaling limits on flags: serving flags from Django has become very expensive, and we expect to hit hard limits at 5x our current scale.

To get ahead of this problem, we'll rewrite our flags service to be more performant and reliable.
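
To make that concrete, here's a minimal sketch of the heart of a flags service: deterministically bucketing a user into a percentage rollout. The hashing scheme below is an illustrative assumption, not necessarily what the rewritten service will use.

    // Minimal sketch of deterministic percentage rollout (illustrative only).
    import { createHash } from "crypto";

    // Hash (flagKey, distinctId) to a stable number in [0, 1).
    function bucket(flagKey: string, distinctId: string): number {
      const hex = createHash("sha1").update(`${flagKey}.${distinctId}`).digest("hex");
      // First 15 hex chars = 60 bits; the precision loss past 2^53 doesn't matter here.
      return parseInt(hex.slice(0, 15), 16) / 0xfffffffffffffff;
    }

    // A user sees the flag if their bucket falls below the rollout percentage.
    function isFlagEnabled(flagKey: string, distinctId: string, rolloutPercent: number): boolean {
      return bucket(flagKey, distinctId) < rolloutPercent / 100;
    }

    console.log(isFlagEnabled("new-onboarding", "user-42", 25)); // stable per user

Because the bucket is stable, raising the rollout from 25% to 50% only adds users; nobody who already has the feature loses it.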

Objective: Polish new experiments UI & collect feedback

Last quarter we finished a basic version of our new experiment UI. This quarter, we want to polish this up, collect feedback from users, and address any issues that come up.

Broadly, we should:

  1. Make it easy for people to set up experiments and understand the results, with clear guidance on when to end an experiment and what to do after ending it.
  2. At every stage of the experiment, tell people what to do next.
  3. Add common-sense boundaries, better running-time predictions, and more resilient significance calculations to avoid the flip-flopping problem (see the sketch after this list).
  4. Make sure support requests for experiments go down as a result of the above.
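
To make the flip-flopping problem concrete: if you re-check a fixed-horizon significance test every time new data arrives, the p-value can dip below 0.05 and climb back above it purely by chance. A toy illustration with made-up counts, using a standard two-proportion z-test (not our production statistics):

    // Toy demo of significance "flip-flopping" under repeated peeking.
    // All conversion counts below are made up.

    // Standard normal CDF, Abramowitz & Stegun approximation (valid for z >= 0).
    function normCdf(z: number): number {
      const t = 1 / (1 + 0.2316419 * z);
      const poly = t * (0.31938153 + t * (-0.356563782 + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
      return 1 - 0.3989422804 * Math.exp((-z * z) / 2) * poly;
    }

    // Two-sided p-value for a two-proportion z-test.
    function pValue(convA: number, nA: number, convB: number, nB: number): number {
      const pooled = (convA + convB) / (nA + nB);
      const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
      const z = Math.abs(convA / nA - convB / nB) / se;
      return 2 * (1 - normCdf(z));
    }

    // Peeking at the same experiment after every 100 users per variant:
    console.log(pValue(12, 100, 22, 100).toFixed(3)); // ~0.060 - not significant
    console.log(pValue(25, 200, 40, 200).toFixed(3)); // ~0.042 - "significant!"
    console.log(pValue(40, 300, 52, 300).toFixed(3)); // ~0.174 - flipped back

Sequential testing methods and common-sense boundaries (a minimum sample size, a minimum running time) are the usual ways to stop results toggling like this.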

Objective: Add most requested surveys functionality

We will build, in order of priority:

  • Branching logic with multiple questions (a sketch of what this could look like follows the list)
  • Ability to duplicate surveys
  • Customise how many questions show up at a time
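
As a sketch of what the first item could look like, here's one way to model branching across multiple questions - the field names are hypothetical, not our actual survey schema:

    // Hypothetical shape for survey branching - not PostHog's actual schema.
    type SurveyQuestion = {
      id: string;
      text: string;
      choices?: string[];
      // Maps a chosen answer to the next question's id, or ends the survey.
      branch?: Record<string, string | "end">;
    };

    const survey: SurveyQuestion[] = [
      {
        id: "q1",
        text: "How was your experience with feature flags?",
        choices: ["Good", "Bad"],
        branch: { Good: "q2", Bad: "q3" },
      },
      { id: "q2", text: "What do you like most?" },
      { id: "q3", text: "What should we improve?" },
    ];

    // Resolve which question to show next from the current answer.
    function nextQuestion(current: SurveyQuestion, answer: string): string | "end" {
      return current.branch?.[answer] ?? "end";
    }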

Handbook

Values

  • Fast, iterative, and high-output rather than slow and thoughtful - achieving this
  • Feedback-driven not spec-driven - we do a decent job at this
  • Missionary (we have a clear problem definition and are aligned on how impactful a solution would be) not mercenary - glimpses of this
  • Collaborative not lone wolf - glimpses of this

Personas

Company Persona

  • Primary
    • Size:
      • 20-75 employees
    • Stage:
      • Post-PMF
      • Series A-D
    • Customer type:
      • B2B/B2C/(B2B2C)
    • High expectation traits:
      • Use the modern data stack
      • Frontend uses typescript and react
      • High-growth
  • Not:
    • API companies
    • Shopify stores/no-code companies

User Persona

  • Primary
    • Role
      • Product-minded front-end engineer
      • Growth engineer
    • Seniority
      • Decision-making seat on product
      • Senior engineer
      • IC
    • High expectation traits
      • Reads HackerNews
      • Educated about the other feature flagging/experimentation tools in the space
      • Needs high reliability and high performance
      • Uses best-in-class tools such as Linear/Figma
  • Secondary:
    • Role:
      • Product Manager
  • Not:
    • Role:
      • Backend engineer
      • Marketing

Jobs to be done

Feature flags

  • Primary
    • Safely roll out frontend features with the least risk (see the sketch after this list)
  • Secondary
    • Persistent feature flags e.g. country/pay gate
    • Build/test in production
    • Enable beta users to try out experimental features ahead of time
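
As an example of that primary job, gating a frontend feature with posthog-js looks roughly like this - the flag key and render functions are made up for illustration:

    // Gating a frontend feature behind a flag with posthog-js.
    // "new-nav" and the render functions are hypothetical examples.
    import posthog from "posthog-js";

    posthog.init("<project-api-key>", { api_host: "https://app.posthog.com" });

    const renderNewNav = () => console.log("new navigation");
    const renderOldNav = () => console.log("old navigation");

    // Flags load asynchronously, so branch once they're available.
    posthog.onFeatureFlags(() => {
      if (posthog.isFeatureEnabled("new-nav")) {
        renderNewNav(); // gated experience for rolled-out users
      } else {
        renderOldNav(); // everyone else keeps the safe default
      }
    });

Because the old path stays in place behind the else branch, rolling back is just a matter of turning the flag off - no redeploy needed.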

Experimentation

  • Primary
    • Test whether a particular feature achieves the desired change in user behavior

Feature ownership

You can find out more about the features we own here.

Long term vision

Imagine Bob is a product manager, and Alice is an engineer, both of whom love using PostHog.

During their weekly growth review, PostHog shows them that one of their workflows is performing 50% worse than at other SaaS companies with a similar flow. They decide to build a new feature together, but they're unsure of the impact, so Bob & Alice decide to gate the feature behind a feature flag.

Alice builds the feature and runs the PostHog CLI, automatically converting her feature branch to a feature-flagged version. During creation, she selects the team template they normally use, called "Autorollout based on conversion metric", using the conversion metric that PostHog suggests. The feature progressively rolls out to internal users, then to beta users, then to the remaining users. If their conversion metric falls by more than 20%, the feature automatically rolls back and alerts their team. Alice then requests a feature flag review from Bob.

Bob checks the PostHog UI and, because it's such an important feature, adds a safety condition for Sentry errors increasing by 30%, along with a few counter metrics. These should trigger an automatic rollback as well. Bob starts the experiment.

Thankfully, nothing goes wrong when the feature is rolled out. However, the team is disappointed that the feature doesn't seem to move any of the core company metrics. This doesn't fit either Alice's or Bob's mental model, so they dig deeper into why this is the case.

Before they even start, PostHog automatically runs some impact analysis on their core metrics and generates insights into which properties are highly correlated with conversion and which aren't.

As it turns out, people in the USA and India love the new feature and show a 40% increase in conversion. Other countries, especially the UK, seem to dislike it so much that it negatively affects their conversion. In the end, these forces balance out, leading to a similar total conversion rate.

They suspect it might have something to do with their positioning in other countries, so they run a marketing experiment using PostHog, which automatically generates recommended copy to try out. It generates five variants, and they test these in all countries.

As it turns out, copy wasn't the issue, and there's no significant change here. They watch a few recordings from the experiment to confirm nothing is off.

Since it's not a positioning issue, Bob & Alice decide it makes sense to introduce some personalisation: let people opt in to the new feature, with it on by default for the USA and India. They can customise this right from the feature flag, and set it up so that any users who opt in via their UI automatically get the flag.

PostHog keeps analysing metrics for this flag over time, and notifies Bob and Alice when their customers' behaviour changes - for example, if conversion for users in the UK has taken a turn for the better, or if enterprise customers have taken a turn for the worse.

Our long term vision is to make all of this possible.