An introductory guide to identifying users in PostHog
Nov 21, 2022
To understand your product’s usage, you must know who did what. Many of the most valuable insights require an accurate understanding of the user using your product. To make sure user data and events are as accurate as possible, it is critical to identify users properly.
PostHog relies on your implementation of identification to connect event data to specific users. We require events to have a related user ID (even if it is anonymous). After all, it is users who create events.
This tutorial goes over the different ways to identify users and recommendations on how to do it better.
Automatic anonymous IDs
If you haven’t set up any identification and are using the PostHog snippet or posthog-js library, events are captured with an anonymous user ID. We create the anonymous ID using the user’s device ID, and it is automatically tied to all the events a user sends in that session (and future sessions if the cookie is set). In the “Live events” tab of PostHog, anonymous ID user events look like this:
Anonymous IDs are a basic way to understand the user behind events. They lack precision because there is no way of ensuring consistency between sessions (that consistency depends on cookies, which users often block), and they lack the depth of custom user properties or groups. Explicitly identifying users using the other methods detailed below creates more accurate user data.
If you installed posthog-js, you can use it to better identify users. Calling posthog.identify() with a distinct user ID connects all events from that user with that ID. You can choose what distinct user ID you want to identify users with. Email is the most popular (it’s what we use), but it could be anything from a username to a random string you generate.
The identify call links the anonymous user ID PostHog generates with the new ID you choose. This means all of the events a user generates when they weren’t identified connect to their new ID (if they happen within the “buffer,” which we explain in our Identify docs).
For example, if a user browses your marketing site, then signs up and you call posthog.identify() on signup, their events from browsing the marketing site can connect with the new user ID.
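As a rough sketch of that signup flow, the handler below calls posthog.identify() once the user's account exists. The function name handleSignup, the shape of the user object, and the plan property are hypothetical; the posthog client is passed in as an argument so the sketch does not depend on how you load posthog-js.

```javascript
// Hypothetical signup handler. `posthog` is an initialized posthog-js client;
// `user` is whatever your signup flow returns (assumed shape: { email, plan }).
function handleSignup(posthog, user) {
  if (!user || !user.email) {
    // Without a stable ID, leave the visitor on their anonymous ID.
    return false;
  }
  // Link the anonymous ID to the chosen distinct ID (email here) and
  // set a person property at the same time.
  posthog.identify(user.email, { plan: user.plan });
  return true;
}
```

Skipping the call when no stable ID is available avoids identifying several different visitors under one empty or placeholder ID.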
Be sure to call posthog.reset() on logout (or when users change) so that captured events stop attaching to the old user and can connect to the new (correct) one.
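A minimal sketch of the logout side, assuming your app has its own session teardown. The names handleLogout and endSession are hypothetical; only posthog.reset() comes from posthog-js.

```javascript
// Hypothetical logout handler. `posthog` is an initialized posthog-js client;
// `endSession` stands in for your own logout logic (clearing tokens, etc.).
function handleLogout(posthog, endSession) {
  endSession();
  // Drop the current distinct ID so later events get a fresh anonymous ID
  // instead of being attributed to the user who just logged out.
  posthog.reset();
}
```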
You can find more details about the posthog.identify() call in the docs.
Identifying and setting user IDs for every other library
If you don’t have a unique ID like an email, you can always generate a UUID or use some other piece of information (like a device or request ID). Ideally, try to find a way to connect these IDs across sessions. Some ideas:
- Linking whatever ID you choose to an API key or authorization method if they are using an API
- Storing the ID on the frontend and passing the value to the backend when they make a request
- Using an ID based on the resources they are accessing on the backend
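The ideas above can be sketched as a small fallback chain on the backend. Everything here is illustrative: the X-Distinct-Id header name, the apiKeyOwners map, the request shape, and the generateId callback (for example, a UUID generator) are assumptions, not PostHog conventions. The capture call assumes a posthog-node style client whose capture() takes an object with distinctId and event.

```javascript
// Frontend (hypothetical): forward the browser's current distinct ID.
// "X-Distinct-Id" is an assumed header name, not a PostHog convention.
function buildTrackingHeaders(posthog) {
  return { "X-Distinct-Id": posthog.get_distinct_id() };
}

// Backend (hypothetical): pick the most stable ID available for a request.
// `apiKeyOwners` is an assumed Map from API key to a stable ID.
function resolveDistinctId(request, apiKeyOwners, generateId) {
  const owner = apiKeyOwners.get(request.apiKey);
  if (owner) return owner; // 1. ID linked to the API key

  // Node.js lowercases incoming header names.
  const forwarded = (request.headers || {})["x-distinct-id"];
  if (forwarded) return forwarded; // 2. ID passed from the frontend

  if (request.resourceId) return `resource:${request.resourceId}`; // 3. resource-based ID

  return generateId(); // 4. last resort: a brand new ID
}

// Capture an event with a posthog-node style client.
function captureRequest(client, request, apiKeyOwners, generateId) {
  const distinctId = resolveDistinctId(request, apiKeyOwners, generateId);
  client.capture({ distinctId, event: "api request" });
  return distinctId;
}
```

The order of the checks is a design choice: IDs that survive across sessions are tried first, and a freshly generated ID is only used when nothing more stable exists.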
The importance of setting accurate distinct user IDs
The goal of setting distinct user IDs is accurately representing unique users and their behavior. Having multiple IDs for the same user will cause insights such as unique users, active users (daily, weekly), funnels, and more to be inaccurate.
Here’s a ranking of user identification options (you want to aim for #1):
- unique distinct user ID, such as one set by identify or another library’s event capture calls
- same ID across many sessions, such as automatic anonymous IDs (ideally)
- same ID for single sessions, such as automatic anonymous IDs (with cookies blocked)
- new ID for every request, such as poorly configured event capture calls
You want to work up this list because it creates more accurate user stats. For example, even if you want your users to be anonymous, better identifying them across a single session (rather than every request) provides more accurate stats. This creates better insights to build a better product, which we want to enable you to do.
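To make the effect on stats concrete, here is a small illustration with made-up data: one real person sends four events across two sessions, and the unique user count depends entirely on how IDs were assigned. The function name and event shape are hypothetical and unrelated to how PostHog computes insights internally.

```javascript
// Count the unique "users" an analytics tool would see: one per distinct ID.
function countUniqueUsers(events) {
  return new Set(events.map((event) => event.distinctId)).size;
}

// One real person, four events across two sessions (made-up data).
const perRequest = [
  { distinctId: "req-1" }, { distinctId: "req-2" },
  { distinctId: "req-3" }, { distinctId: "req-4" },
];
const perSession = [
  { distinctId: "session-a" }, { distinctId: "session-a" },
  { distinctId: "session-b" }, { distinctId: "session-b" },
];
const identified = [
  { distinctId: "user@example.com" }, { distinctId: "user@example.com" },
  { distinctId: "user@example.com" }, { distinctId: "user@example.com" },
];

console.log(countUniqueUsers(perRequest)); // 4
console.log(countUniqueUsers(perSession)); // 2
console.log(countUniqueUsers(identified)); // 1
```

Only the last case reports the true number of users; the other two overcount by a factor of four and two respectively.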
- See our docs on identifying users for more details
- Understanding group analytics: frontend vs backend implementations