Migrate from Postgres to ClickHouse

If you're attempting this migration, feel free to ask questions and provide feedback via the PostHog Community Slack workspace or a GitHub issue. Be aware that some of the steps in this document are potentially destructive! Proceed with caution.

PostHog backed by PostgreSQL is now deprecated. We will continue to provide support for Postgres-backed installs, but we strongly encourage you to migrate to ClickHouse-backed PostHog for vastly superior performance, as well as to receive new features and continued support in the future.

Requirements

  • You should have a clean ClickHouse-backed PostHog instance up and running. Your new PostHog instance should have no ingested events. We recommend using a fresh and unused installation.
  • Your old and new instances should both be running the exact same version of PostHog, with a minimum version of 1.29.0 (note: if you do not follow release tags and instead update PostHog by pulling from master, you should make sure both versions are on the same commit.)

Note: PostHog users on version 1.29.0 will be able to migrate events to a new instance, but autocaptured events will not be migrated correctly. This bug will be fixed in version 1.30.0. More information can be found in the Exporting events section.

Having difficulties upgrading your Postgres-backed instance due to long-running migrations?

Check out this guide.

Approach

ClickHouse-backed PostHog instances still use a Postgres database to store data that is not used in analytical queries, such as user information, feature flags, dashboard configurations, etc.

However, the tables that contain data used to run analytical queries, such as events, persons, and person distinct IDs, are now stored in ClickHouse.

With this migration, we will copy the data that remains in Postgres over to the new instance using tools from the Postgres ecosystem, and then migrate your events using a PostHog plugin. The events migration will in turn create the necessary person, person distinct ID, and related records in ClickHouse.

Exporting Postgres data

Please note that this guide assumes your PostHog pods are running in the posthog namespace, thus appending -n posthog to kubectl commands. Substitute posthog for the namespace applicable to you, or omit the -n flag if you're using the default namespace.
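
If you're not sure which namespace your PostHog pods are running in, one quick way to check (assuming kubectl is already pointing at the right cluster) is to list pods across all namespaces and filter for PostHog:

# The NAMESPACE column tells you which value to pass to -n
kubectl get pods --all-namespaces | grep posthog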

1. Creating a data dump from your old instance

Access your old Postgres-backed PostHog instance and run the following command:

# tip: use the flag -U to specify a username if necessary
pg_dump -d posthog -f /tmp/export.sql --no-owner --data-only \
-t posthog_action \
-t posthog_actionstep \
-t posthog_annotation \
-t posthog_cohort \
-t posthog_dashboard \
-t posthog_dashboarditem \
-t posthog_featureflag \
-t posthog_featureflagoverride \
-t posthog_messagingrecord \
-t posthog_organization \
-t posthog_organizationinvite \
-t posthog_organizationmembership \
-t posthog_personalapikey \
-t posthog_plugin \
-t posthog_pluginattachment \
-t posthog_pluginconfig \
-t posthog_pluginstorage \
-t posthog_team \
-t posthog_user \
-t posthog_user_groups \
-t posthog_user_user_permissions

Note: If your old PostHog instance is hosted on Heroku, you should follow this guide to export all of your data, load it into a separate database (we recommend a local Postgres instance), and then run the command above.
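
For reference, a minimal sketch of that Heroku export using the Heroku CLI might look like this (the app name and local database name are placeholders; adapt them to your setup):

# Capture and download a backup of your Heroku Postgres database (produces latest.dump)
heroku pg:backups:capture --app your-posthog-app
heroku pg:backups:download --app your-posthog-app

# Load the backup into a local Postgres database...
createdb posthog_export
pg_restore --no-owner --no-acl -d posthog_export latest.dump

# ...and then run the pg_dump command above against it (i.e. with -d posthog_export)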

If you do not access your Postgres database via a port-forwarding mechanism to your local machine, you will then need to copy the file from the server to your local system (or to the remote server from which you access your new instance). You can do this with scp:

scp <your_username>@<database_hostname>:/tmp/export.sql .

If your old instance was deployed using the Helm chart, you can use:

kubectl cp <postgres-pod-name>:/tmp/export.sql ./export.sql # -n posthog

2. Accessing Postgres on your new instance

To access Postgres on your new PostHog instance (Kubernetes cluster), you should do the following:

Tip: Find out your pod names with kubectl get pods -n posthog

  1. Find out your Postgres password from the web pod:

    kubectl exec -n posthog -it your-posthog-web-pod \
    -- sh -c 'echo password:$POSTHOG_DB_PASSWORD'
  2. Connect to your Postgres pod:

    # Replace posthog-posthog-postgresql-0 with your pod's name if different
    kubectl exec -n posthog -it posthog-posthog-postgresql-0 -- sh
  3. Connect to the posthog DB:

    psql -d posthog -U postgres

    Postgres will ask you for the password. Use the value you found from step 1.

3. Deleting data in the new instance

In order to correctly migrate your data over, we need to make sure the tables we're importing into are completely empty. This will ensure that the foreign key mappings are set up correctly.

To do so, connect to your new Postgres instance using the command from the previous step.

Refresher: From your Postgres pod, connect to the posthog database with the command psql -d posthog -U postgres.

Once connected, you should have access to the psql shell. To erase all data in the tables we'll be importing into, run the following:

Warning: Only run this in your new instance.

-- This command is a destructive action that leads to data deletion!!!
-- Uncomment the next line (remove '--') if you're certain you want to run it
-- TRUNCATE TABLE posthog_action,
posthog_actionstep,
posthog_annotation,
posthog_cohort,
posthog_dashboard,
posthog_dashboarditem,
posthog_featureflag,
posthog_featureflagoverride,
posthog_messagingrecord,
posthog_organization,
posthog_organizationinvite,
posthog_organizationmembership,
posthog_personalapikey,
posthog_plugin,
posthog_pluginattachment,
posthog_pluginconfig,
posthog_pluginstorage,
posthog_team,
posthog_user,
posthog_user_groups,
posthog_user_user_permissions
CASCADE;

The command output may indicate that some other tables were also truncated. This is normal and expected.

4. Copying your data file to the Postgres pod

In order to load the data into the new Postgres instance, we must first make the export file accessible inside the Postgres pod, from which we can access the database.

To do so, run the following in the directory of your export (data dump) file:

# Replace posthog-posthog-postgresql-0 with your pod's name if different
kubectl cp export.sql posthog-posthog-postgresql-0:/tmp/export.sql -n posthog

This will copy the export file to the /tmp/ directory in the Postgres pod.
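
To double-check that the file arrived, you can list it from inside the pod:

# Replace posthog-posthog-postgresql-0 with your pod's name if different
kubectl exec -n posthog -it posthog-posthog-postgresql-0 -- ls -lh /tmp/export.sql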

5. Loading the data into the new instance

With the export file now in the right place, we can finally import the data!

First, access your Postgres pod:

# Replace posthog-posthog-postgresql-0 with your pod's name if different
kubectl exec -n posthog -it posthog-posthog-postgresql-0 -- sh

Now, run the following to import the data:

psql -d posthog -U postgres < /tmp/export.sql

I'm seeing some errors, what should I do?

If you see some errors on this final step, you can try to use pg_restore instead. You should follow all steps from the beginning, with two modifications:

  1. When running the pg_dump command, add the flag -Fc. Your command will end up looking something like this:
pg_dump -Fc -d posthog -f /tmp/export.sql --no-owner --data-only # ...tables
  2. On this final step, instead of psql, use pg_restore, like so:
pg_restore -U postgres -d posthog < /tmp/export.sql

If errors persist, reach out to us on Slack.

And that's it! If you see no errors, your data should now have been migrated over.
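
Optionally, as a rough sanity check, you can compare row counts for a few of the imported tables between the old and new instances from the psql shell (the exact numbers depend on your data; they should simply match across the two databases):

-- Run in psql on both the old and the new instance; the results should match
SELECT count(*) FROM posthog_organization;
SELECT count(*) FROM posthog_team;
SELECT count(*) FROM posthog_user;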

6. Upgrade

As a final step, we should update the cluster with the following command:

helm upgrade -f values.yaml --timeout 20m posthog posthog/posthog --atomic -n posthog

Once the command completes, you can check that all pods are healthy (status = Running) with the following command:

kubectl get pods -n posthog

The output should look something like this:

NAME READY STATUS RESTARTS AGE
chi-posthog-posthog-0-0-0 1/1 Running 0 11d
clickhouse-operator-6b5438eh5fb-bt5fk 2/2 Running 0 11d
posthog-beat-7782927b778-wxvhl 1/1 Running 0 11d
posthog-cert-manager-69fahs7b57-c48dn 1/1 Running 0 11d
posthog-cert-manager-cainjector-6d95d93mn8-6tz6k 1/1 Running 0 11d
posthog-cert-manager-webhook-6469930mdfc-6l958 1/1 Running 0 11d
posthog-events-55283995cc-rpjdm 1/1 Running 0 11d
posthog-ingress-nginx-controller-648bdn892f-w7qhp 1/1 Running 0 11d
posthog-pgbouncer-77fb4djs85c-2d24t 1/1 Running 0 11d
posthog-plugins-54fjd8292649-66gsm 1/1 Running 0 18m
posthog-posthog-kafka-0 1/1 Running 0 11d
posthog-posthog-postgresql-0 1/1 Running 0 11d
posthog-posthog-redis-master-0 1/1 Running 0 11d
posthog-posthog-zookeeper-0 1/1 Running 0 11d
posthog-posthog-zookeeper-1 1/1 Running 0 11d
posthog-posthog-zookeeper-2 1/1 Running 0 11d
posthog-web-78dns2f5d7-6zdkc 1/1 Running 0 11d
posthog-worker-7857nd8268-j8c4f 1/1 Running 0 11d

Exporting events

Once you have completed the steps above, you can now move on to migrating your events.

For this, we will use the PostHog Migrator 3000 plugin.

Important considerations

This plugin works on PostHog instances with version 1.29.0 or above.

Version 1.30.0 fixes a bug with exporting autocaptured events, so unless you don't use autocapture at all, we recommend upgrading both instances to 1.30.0 before performing the events migration.

Instructions

  1. Get your project API key (token) for your new instance from the 'Project Settings' page. If your migration completed successfully, this will actually be the same token as the one in the old instance.

  2. Visit the 'Plugins' page on the old instance and click on the 'Advanced' tab.

  3. Under 'Install from GitHub, GitLab or npm', add the URL https://github.com/PostHog/posthog-plugin-migrator3000 and click 'Fetch and install'.

  4. A plugin configuration drawer should open. Here, add the hostname for your new instance, as well as the project API key from step 1. Also, make sure the toggle at the top is 'Enabled'.

  5. Specify a date to start exporting events in the format YYYY-MM-DD (e.g. 2021-10-26) and click 'Save'.

  6. That's it! The plugin will now start exporting your events to the new instance. To keep track of its progress, click on the 'Logs' icon for the plugin. Ignore the progress bar, focusing instead on the timestamps in the logs. Once the plugin is done exporting all of your historical events, it will then continue to export the events that are coming in live, but with a delay. Our recommendation is to change the event sources to point to the new instance once you notice all historical events have been exported.