BigQuery destination for batch exports
Contents
Batch exports can be used to export data to a BigQuery table.
Setting up BigQuery access
To set up the right permissions for a batch export targeting BigQuery, you need:
- A Service Account.
- A dataset which has permissions allowing the service account to access it.
Here's how to set these up so that the destination has access only to the dataset it needs:
- Create a Service Account.

- Create a key for the Service Account you created in the previous step.
- Save the key file as JSON to upload it when configuring a batch export.

- Create a role which has only the specific permissions the batch export requires:- bigquery.datasets.get
- bigquery.jobs.create
- bigquery.tables.create
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.list
- bigquery.tables.updateData
- (Optional, for mutable models) bigquery.tables.delete
 

This step can be skipped if using the built-in roles BigQuery Data Editor and BigQuery Job User in the steps that follow.
- Grant the Service Account access to run jobs in your Google Cloud project. This can be done by granting the BigQuery Jobs Userrole or the role we created in the previous step on your project.
Navigate to IAM and click on Grant Access to arrive at this screen:

In the screenshot above, we have used a custom role named
Testing PostHog BatchExportswith the permissions listed in the previous step.
- Create a dataset within a BigQuery project (ours is called BatchExports, but any name will do).
- Use the Sharing and Add Principal buttons to grant access to your dataset with your Service Account created in step 1. Next, assign either the BigQuery Data Editorrole or your custom role created in step 4 to provide permissions for the dataset access. Read the full instructions on granting access to the dataset in BigQuery if unclear.
 

In the screenshot below, we grant our Service Account access to the
BatchExportsdata set and assign theTesting PostHog BatchExportsrole permissions for it.

- All done! After completing these steps you can create a BigQuery batch export in PostHog and your data will start flowing from PostHog to BigQuery.
Models
This section describes the models that can be exported to BigQuery.
Note: New fields may be added to these models over time. To maintain consistency, these fields are not automatically added to the destination tables. If a particular field is missing in your BigQuery tables, you can manually add the field, and it will be populated in future exports.
Events model
This is the default model for BigQuery batch exports. The schema of the model as created in BigQuery is:
| Field | Type | Description | 
|---|---|---|
| uuid | STRING | The unique ID of the event within PostHog | 
| event | STRING | The name of the event that was sent | 
| properties | STRINGorJSON | A JSON object with all the properties sent along with an event | 
| elements | STRING | This field is present for backwards compatibility but has been deprecated | 
| set | STRINGorJSON | A JSON object with any person properties sent with the $setfield | 
| set_once | STRINGorJSON | A JSON object with any person properties sent with the $set_oncefield | 
| distinct_id | STRING | The distinct_idof the user who sent the event | 
| team_id | INT64 | The team_idfor the event | 
| ip | STRING | The IP address that was sent with the event | 
| site_url | STRING | This field is present for backwards compatibility but has been deprecated | 
| timestamp | TIMESTAMP | The timestamp associated with an event | 
| bq_ingested_timestamp | TIMESTAMP | The timestamp when the event was sent to BigQuery | 
Some fields can be either STRING or JSON type depending on whether the corresponding checkbox is marked or not when creating the batch export.
Persons model
The schema of the model as created in BigQuery is:
| Field | Type | Description | 
|---|---|---|
| team_id | INT64 | The id of the project (team) the person belongs to | 
| distinct_id | STRING | A distinct_idassociated with the person | 
| person_id | STRING | The id of the person associated to this ( team_id,distinct_id) pair | 
| properties | STRINGorJSON | A JSON object with all the latest properties of the person | 
| person_version | INT64 | Internal version of the person properties associated with a ( team_id,distinct_id) pair, used by batch export in merge operation | 
| person_distinct_id_version | INT64 | Internal version of the person to distinct_idmapping associated with a (team_id,distinct_id) pair, used by batch export in merge operation | 
| created_at | TIMESTAMP | The timestamp when the person was created | 
The BigQuery table will contain one row per (team_id, distinct_id) pair, and each pair is mapped to their corresponding person_id and latest properties. The properties field can be either STRING or JSON, depending on whether the corresponding checkbox is marked or not when creating the batch export.
Creating the batch export
- Click Data pipelines in the navigation and go to the Destinations tab.
- Click + New destination in the top-right corner.
- Select BigQuery as the batch export destination.
- Fill in the necessary configuration details.
- Finalize the creation by clicking on Create.
- Done! The batch export will schedule its first run on the start of the next period.
BigQuery configuration
Configuring a batch export targeting BigQuery requires the following BigQuery-specific configuration values:
- Table ID: The ID of the destination BigQuery table. This is not the fully-qualified name of a table, so omit the dataset and project IDs. For example for the fully-qualified table name project-123:dataset:MyExportTable, use onlyMyExportTableas the table ID.
- Dataset ID: The ID of the BigQuery dataset which contains the destination table. Only the dataset ID is required, so omit the project ID if present. For example for the fully-qualified dataset project-123:my-dataset, use onlymy-datasetas the dataset ID.
- Google Cloud JSON key file: The JSON key file for your BigQuery Service Account to access your instance. Generated on Service Account creation. See here for more information.
Examples
These examples illustrate how to use the data from batch exports in BigQuery.
Requirements
Two batch exports need to be created:
- An events model batch export.
- A persons model batch export.
For the purposes of these examples, assume that these two batch exports have already been created and have exported some data to BigQuery in tables example.events and example.persons.
Example: Count unique persons that have triggered an event
The following query can be used to count the number of unique persons that have triggered events:
FAQ
How does PostHog keep the persons model (or any mutable model) up to date?
Exporting a mutable model can be divided into new rows that have to be inserted, and existing rows that have to be updated. When a PostHog batch export exports mutable data (like the persons model) to BigQuery, it executes a merge operation to apply new updates to existing rows.
The operation the PostHog batch export executes in BigQuery roughly involves the following steps:
- Creating a stage table.
- Inserting new data into stage table.
- Execute a merge operation between existing table and stage table. a. Any rows that match in the final table and for which any of the stage table's version fields is higher are updated. b. Any new rows not found in the final table are inserted.
- Drop the stage table.
Why are additional permissions required to export the persons model?
The merge operation described above explains why a mutable export requires additional permissions beyond the permissions required for exporting the events model: Since we need to clean-up a stage table, bigquery.tables.delete is required.
Which jobs does the batch export run in BigQuery?
If you check your BigQuery JOBS view or the Google Cloud console for job details, you may notice the PostHog batch export running jobs in your BigQuery warehouse.
Regardless of model, PostHog batch exports run a load job to batch load the data for the current period into BigQuery. Moreover, you will see additional query jobs in your logs when exporting a mutable model as the merge operation the batch export executes requires running additional queries in your BigQuery warehouse.
If you are noticing an issue with your BigQuery batch export, it may be useful to check the aforementioned JOBS view and the Google Cloud console. The error logs in them could be valuable to either diagnose the issue by yourself, or when creating a support request for us to look into.