Link a source

PostHog enables you to link your most important data from sources like your CRM, payment processor, or database. Once linked, you can combine this data with the product analytics data already in PostHog and query across all of it.

To link a source, go to the Integrations tab, and click the Sources tab. On that page, you have an option to create a managed source such as:

The other option is a self-managed source, which includes:

You can find the setup instructions in-app or in the source-specific docs.

Inbound IP addresses

We use a fixed set of IP addresses to access your instance. To ensure the connection works, add the IPs for your PostHog region to your inbound security rules:

US	EU
44.205.89.55	3.75.65.221
44.208.188.173	18.197.246.42
52.4.194.122	3.120.223.253
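
As a rough illustration of working with the addresses above, the following sketch turns them into /32 CIDR blocks (the form many firewall and security group rules expect) and checks whether a connecting address is one of them. The names `POSTHOG_INBOUND_IPS`, `allowlist_cidrs`, and `is_posthog_ip` are hypothetical helpers, not part of any PostHog API, and the sketch assumes your firewall accepts CIDR notation.

```python
import ipaddress

# Addresses copied from the table above, keyed by region.
POSTHOG_INBOUND_IPS = {
    "US": ["44.205.89.55", "44.208.188.173", "52.4.194.122"],
    "EU": ["3.75.65.221", "18.197.246.42", "3.120.223.253"],
}


def allowlist_cidrs(region: str) -> list[str]:
    """Return one /32 CIDR block per listed IP for the given region."""
    return [str(ipaddress.ip_network(f"{ip}/32")) for ip in POSTHOG_INBOUND_IPS[region]]


def is_posthog_ip(address: str, region: str) -> bool:
    """Check whether an address matches one of the listed IPs for the region."""
    candidate = ipaddress.ip_address(address)
    return any(candidate == ipaddress.ip_address(ip) for ip in POSTHOG_INBOUND_IPS[region])


if __name__ == "__main__":
    for cidr in allowlist_cidrs("EU"):
        print(cidr)
```

Only the addresses for the region your PostHog project runs in need to be allowed.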

Linking a custom source

The data warehouse can link to data in your object storage system. To do this, you'll need to:

Create a bucket in your object storage system like S3, GCS, or Cloudflare R2
Set up an access key and secret
Add data to the bucket (potentially using a tool like Airbyte, Fivetran, Stitch, or others)
Link the table in PostHog

See an example in our S3 setup docs.

Incremental vs append only vs full table

On some sources and tables, you can choose the sync method. The options are incremental, append only, or full table replication.

Incremental

With incremental replication, you only sync new or updated data. This reduces the total number of rows synced and how long it takes to sync.

When choosing incremental replication, you must select a field to identify new and updated data. This is often something like an updated_at timestamp or an auto-incrementing ID. Not all fields are suitable for this, so we only support the following types as replication keys:

integer (including bigint and smallint)
datetime
date
timestamp
numeric (for Snowflake)

The one downside to incremental syncing is that deletions of data won't be synced to your PostHog data warehouse. You need to use full table refreshes for this.
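
To make the behavior concrete, here is a minimal sketch of incremental replication in general, not PostHog's actual implementation. It assumes each row has an `id` primary key and an `updated_at` replication key, and it models the warehouse table as a plain dictionary. The function name `incremental_sync` is hypothetical.

```python
from datetime import datetime


def incremental_sync(source_rows, warehouse, cursor):
    """Upsert rows whose updated_at is after the cursor; return the new cursor."""
    changed = [r for r in source_rows if cursor is None or r["updated_at"] > cursor]
    for row in changed:
        # Merge by primary key: insert a new row or overwrite the existing one.
        warehouse[row["id"]] = dict(row)
    if changed:
        cursor = max(r["updated_at"] for r in changed)
    return cursor


# First sync: both rows are new, so both are copied.
source = [
    {"id": 1, "plan": "free", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "plan": "free", "updated_at": datetime(2024, 1, 2)},
]
warehouse = {}
cursor = incremental_sync(source, warehouse, None)

# Later, row 1 is updated and row 2 is deleted at the source.
source = [{"id": 1, "plan": "paid", "updated_at": datetime(2024, 1, 3)}]
cursor = incremental_sync(source, warehouse, cursor)

print(warehouse[1]["plan"])  # the update to row 1 was picked up
print(2 in warehouse)        # the deleted row 2 is still in the warehouse
```

The second sync only reads the one changed row, which is where the savings come from. The deleted row never shows up in the query for changed rows, so the sync has nothing to act on, which is why deletions require a full table refresh.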

Append only

Append only does what it says on the tin: append rows to your table.

We use a cursor in the form of an incremental field (similar to incremental syncing above) to query the source for new data. This replication method never merges rows, so it can produce duplicate rows if the incremental field you choose can change. To avoid this, we recommend using a timestamp field that is set once, like created_at.

Append only is ideal for tables where rows are never updated, only added. A common example is an event log table, where each new event is recorded as a new row and existing records remain unchanged.
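
The duplicate-row risk can be seen in a small sketch. This is an illustration of append-only replication in general under simplifying assumptions (integer timestamps, an in-memory list as the table), not PostHog's implementation. The names `append_only_sync` and `run` are hypothetical.

```python
def append_only_sync(source_rows, table, cursor, field):
    """Append rows whose cursor field is past the cursor; never merge or overwrite."""
    new_rows = [r for r in source_rows if cursor is None or r[field] > cursor]
    table.extend(dict(r) for r in new_rows)
    if new_rows:
        cursor = max(r[field] for r in new_rows)
    return cursor


def run(field):
    """Sync once, edit the source row, sync again, and return the resulting table."""
    source = [{"id": 1, "status": "open", "created_at": 1, "updated_at": 1}]
    table = []
    cursor = append_only_sync(source, table, None, field)
    # The row is edited at the source, which bumps updated_at but not created_at.
    source[0].update(status="closed", updated_at=2)
    append_only_sync(source, table, cursor, field)
    return table


print(len(run("updated_at")))  # the edited row is appended again as a second copy
print(len(run("created_at")))  # the edited row is not re-read, so there is one copy
```

With a mutable field as the cursor, the same `id` ends up in the table twice. With `created_at`, the table stays free of duplicates, but the later edit is not captured, which is why this method suits tables whose rows never change.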

Full table

This reloads the whole table on every sync. This is great for tables where rows are frequently deleted or ones without a suitable incremental field (such as an updated_at timestamp).

Syncing

Once you add a source, you can see its status, sync frequency, and last successful run on the sources page. You can also reload or delete sources here.

When you expand each source, you can see: