Working with data warehouse
This is an internal guide to setting up and working with the data warehouse for PostHog engineers. If you're a PostHog user, check out our data warehouse docs instead.
Adding a new source
Looking to add a new source to data warehouse? We have a detailed guide in the codebase.
If you're a PostHog Cloud customer looking to import data into your project, you're likely looking for this section of the docs instead.
Importing your local Postgres instance
- Head to the new source flow in your local app and hit the link button next to Postgres.
- Use the following settings:
  - host = 127.0.0.1
  - port = 5432
  - database = posthog
  - user = posthog
  - password = posthog
  - schema = public
- Hit next, then select which tables you'd like to import. More info on the sync types can be found here.
- Hit next and finish the import. temporal-worker-data-warehouse will then import the data into your local object storage.
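Before starting the import, it can help to confirm that the settings above actually reach your local database. A minimal sketch, assuming the psql client is installed and on your PATH (this guide doesn't say so); the variable names are only illustrative:

```shell
# Connection settings copied from the list above.
PG_HOST="127.0.0.1"
PG_PORT="5432"
PG_DATABASE="posthog"
PG_USER="posthog"
PG_PASSWORD="posthog"

# Build a libpq connection URI from those settings.
PG_URI="postgresql://${PG_USER}:${PG_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DATABASE}"

# List the tables in the public schema, which is the schema the source is configured to read.
psql "$PG_URI" -c '\dt public.*'
```

If the command lists tables, the same values should work in the new source flow.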
Accessing object storage
All your data warehouse data is stored in your local object storage (SeaweedFS, S3-compatible, running at http://localhost:19000). Unlike MinIO, SeaweedFS has no web console, so inspect it with any S3 client. For example, with the AWS CLI:
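A minimal sketch of that, assuming the AWS CLI is installed and that S3 credentials accepted by your local SeaweedFS are already exported in the environment (this guide doesn't state which credentials local development expects):

```shell
# Endpoint and bucket name come from this guide. Credentials are assumed to be
# exported already as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
S3_ENDPOINT="http://localhost:19000"
DW_BUCKET="data-warehouse"

# List the top-level folders in the bucket.
aws --endpoint-url "$S3_ENDPOINT" s3 ls "s3://${DW_BUCKET}/"

# List every object in the bucket, with human-readable sizes.
aws --endpoint-url "$S3_ENDPOINT" s3 ls "s3://${DW_BUCKET}/" --recursive --human-readable
```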
There's a separate folder under the data-warehouse bucket for each table you sync.
Setting up a MySQL source
If you want to set up a local MySQL database as a source for the data warehouse, there are a few extra setup steps you'll need to complete:
First, install MySQL:
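For example, on macOS with Homebrew (an assumption; use your platform's package manager otherwise):

```shell
# Homebrew formula name for the MySQL server and client.
MYSQL_FORMULA="mysql"

# Install MySQL.
brew install "$MYSQL_FORMULA"

# Start MySQL as a background service.
brew services start "$MYSQL_FORMULA"
```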
Once MySQL is installed, create a database and table, insert a row, and create a user who can connect to it:
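A minimal sketch of those steps, assuming a fresh local install where root can connect without a password. The database name posthog_dw_test and the payments table are hypothetical placeholders chosen for illustration; only the posthog username and password come from this guide.

```shell
# Hypothetical names for illustration; pick your own.
DB_NAME="posthog_dw_test"
TABLE_NAME="payments"

# Assumes root has no password on a fresh local install; add -p if yours does.
mysql -u root <<SQL
CREATE DATABASE IF NOT EXISTS ${DB_NAME};
USE ${DB_NAME};
CREATE TABLE IF NOT EXISTS ${TABLE_NAME} (
    id INT AUTO_INCREMENT PRIMARY KEY,
    amount DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO ${TABLE_NAME} (amount) VALUES (9.99);
-- User that PostHog will connect as (username and password both posthog).
CREATE USER IF NOT EXISTS 'posthog'@'%' IDENTIFIED BY 'posthog';
GRANT ALL PRIVILEGES ON ${DB_NAME}.* TO 'posthog'@'%';
SQL

# Check that the new user can connect and read the row.
mysql -u posthog -pposthog "$DB_NAME" -e "SELECT * FROM ${TABLE_NAME};"
```

The '%' host pattern lets the user connect from any host, which is convenient locally but too permissive for anything shared.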
To verify everything is working as expected:
- Navigate to "Data pipeline" in the PostHog application.
- Create a new MySQL source using the settings above (username and password both being posthog).
- Once the source is created, click on the "MySQL" item. In the schemas table, click on the triple dot menu and select the "Reload" option.
After the job runs, clicking on the synced table name should take you to your data.
Working with an MS SQL source
You'll need to install MS SQL drivers for the PostHog app to connect to an MS SQL database. The entire process is described in posthog/warehouse/README.md. Without the drivers, you'll get the following error when connecting a SQL database to data warehouse: