Linking Postgres as a source
Contents
The Postgres connector can link your database tables to PostHog. You can use it for regular scheduled syncs, or enable change data capture (CDC) for near real-time syncs that capture inserts, updates, and deletes.
To link Postgres:
- Go to the Data pipeline page and the sources tab in PostHog
- Click New source and select Postgres
- Enter your database connection details:
- Connection string: An optional PostgreSQL connection string.
- Host: The hostname or IP your database server like
db.example.comor123.132.1.100. - Port: The port your database server is listening to. The default is
5432. - Database: The name of the database you want like
analytics_db. - User: The username with the necessary permissions to access the database.
- Password: The password for the user.
- Schema: (Optional) The schema for your database. Leave blank to browse and sync tables from all schemas, or specify one to limit the selection. The default is
public.
- If you need to connect through an SSH tunnel, enable and configure it (optional):
- Tunnel host: The hostname of your SSH server.
- Tunnel port: The port your SSH server is listening on.
- Authentication type:
- For password authentication, enter your SSH username and password.
- For key-based authentication, enter your SSH username, private key, and optional passphrase.
- Require TLS through tunnel? - Whether to require TLS encryption through the SSH tunnel. Enabled by default. Disable if your database does not support TLS.
- Optional: enable Change data capture (CDC) if your database is prepared for logical replication.
- Click Next, select the tables to sync, choose the sync method for each table, then click Link
The data warehouse then starts syncing your Postgres data. You can see details and progress in the sources tab.
Configuration
| Option | Type | Required |
|---|---|---|
Connection string (optional) | text | No |
Host | text | Yes |
Port | number | Yes |
Database | text | Yes |
User | text | Yes |
Password | password | Yes |
Schema | text | No |
Use SSH tunnel? | ssh-tunnel | No |
Selecting columns
By default, PostHog syncs all columns from each table. To sync only specific columns:
- During source setup, click Columns next to any table in the table picker.
- Uncheck columns you don't want to sync.
- Primary key columns and the incremental sync field (if configured) are always synced and cannot be disabled.
You can also change column selection after setup:
- Go to the sources tab and click your Postgres source.
- Click Configure next to any schema.
- Under Columns, select which columns to sync.
When you add columns to a schema using incremental, append, or CDC sync, PostHog prompts you to choose:
- Sync forward only - New columns are populated only for future data. Existing rows show
NULLfor the new columns. - Full resync - Triggers a complete resync to backfill the new columns for all rows.
PostgreSQL sources created after February 18, 2026 require SSL/TLS encryption for database connections. Ensure your PostgreSQL database supports SSL/TLS connections before linking. Existing sources created before this date are not affected. If you're connecting through an SSH tunnel, you can optionally disable TLS using the Require TLS through tunnel? toggle since the tunnel already encrypts the connection.
Inbound IP addresses
We use a set of IP addresses to access your instance. To ensure this connector works, add these IPs to your inbound security rules:
| US | EU |
|---|---|
| 44.205.89.55 | 3.75.65.221 |
| 44.208.188.173 | 18.197.246.42 |
| 52.4.194.122 | 3.120.223.253 |
Note: We currently don't support connections using IPv6, therefore, you may need to enable IPv4 connections to your database. This is required for Supabase, for example.
Sync methods
Postgres tables support full table, incremental, append-only, and CDC sync methods.
Use CDC when you need PostHog to stay close to your source database and you care about deletes. It is especially useful when:
- Your table has inserts, updates, and deletes.
- You don't have a reliable
updated_ator other incremental cursor field. - Full table refreshes are too expensive for a frequently changing table.
- You want an optional change-history table for audit or lifecycle analysis.
Use incremental or append-only sync instead when your table is append-heavy, has a reliable non-null cursor field, or you don't want to manage logical replication.
CDC relies on PostgreSQL logical replication and a replication slot. If PostHog cannot consume the slot fast enough, PostgreSQL retains WAL files and your source database can run out of disk. Only enable CDC if you can monitor and operate replication slots, or if your database administrator is involved.
How PostHog CDC works
PostHog's Postgres CDC implementation uses PostgreSQL logical replication with the built-in pgoutput plugin.
When you create a CDC source:
- PostHog creates a replication slot for the source database.
- In PostHog-managed mode, PostHog creates the publication and adds selected CDC tables to it.
- In self-managed mode, you or your DBA create and update the publication. PostHog verifies that publication exists, then creates and manages the replication slot itself.
- Each CDC table first runs an initial snapshot through the normal Postgres import pipeline.
- After the snapshot completes, the table switches to streaming mode.
- A source-level CDC extraction schedule reads pending WAL changes, writes them to PostHog's data warehouse storage, then advances the replication slot.
PostHog reads changes with pg_logical_slot_peek_binary_changes(...) and confirms progress with pg_replication_slot_advance(...) only after the batch is safely written. CDC extraction is scheduled from the most frequent active CDC table interval. For newly created CDC schemas, PostHog uses a 5 minute interval by default.
The decoder currently handles Postgres logical replication messages for:
- Inserts
- Updates
- Deletes
- Truncates
Truncates are not replayed as row-by-row deletes. If PostHog sees a TRUNCATE, it marks the table for a new snapshot so the destination can recover to the new source state.
Output table modes
For each CDC table, choose how PostHog exposes the captured data.
Consolidated table only
PostHog keeps one destination table with the latest row state per primary key. CDC batches are deduplicated by primary key before being merged. Delete events are represented with CDC metadata columns such as _ph_deleted and _ph_deleted_at.
Use this when you mainly want a current-state table for analytics.
CDC history table only
PostHog creates a _cdc-suffixed table, for example users_cdc. It stores change history using SCD type 2-style columns:
_ph_cdc_op:I,U, orD_ph_cdc_timestamp: the Postgres commit timestamp_ph_deleted: whether the event is a delete_ph_deleted_at: delete timestamp for delete eventsvalid_from: when this version became validvalid_to: when this version stopped being current
The initial snapshot seeds the _cdc table as the starting point. History before the initial snapshot is not available.
Both
PostHog creates both the consolidated current-state table and the _cdc history table. This costs more storage and processing, but gives you both easy current-state analysis and a row-level history of changes from the CDC starting point.
CDC requirements
PostHog includes a Check CDC compatibility step in the source configuration screen. It validates the main prerequisites against your database before the source is created, including PostgreSQL version, logical replication, selected table primary keys, SELECT permissions, replication permissions, replication slot capacity, and publication existence for self-managed CDC.
You should still prepare the database before enabling CDC, especially if logical replication is not already enabled.
PostgreSQL version
Postgres CDC requires PostgreSQL 13 or later. PostHog creates publications with publish_via_partition_root = true, which requires Postgres 13+.
Check your version:
Logical replication
Your database must have logical replication enabled:
The value must be logical.
On self-managed Postgres, this usually means setting the following in postgresql.conf, then restarting PostgreSQL:
On AWS RDS or Aurora PostgreSQL, enable logical replication in the DB parameter group:
Changing this setting requires a database restart. Other managed Postgres providers have their own equivalent logical replication setting.
Replication slot capacity
PostHog creates one logical replication slot per Postgres CDC source. The source database must have a free replication slot:
Don't reuse an existing replication slot. A slot should have exactly one consumer.
Table primary keys
Every table you sync with CDC must have a primary key. PostHog uses the primary key to merge updates, represent deletes, and build the optional history table.
Tables without a primary key won't be offered as CDC-capable tables in the setup flow.
Database user permissions
The PostHog database user needs schema USAGE and table SELECT permissions for every schema and table you want to sync. You don't need to grant access to public unless you're syncing tables from the public schema.
If all selected tables are in one schema, grant access to that schema:
If you're syncing tables across multiple schemas, repeat the grants for each selected schema, or grant SELECT on only the specific tables you selected:
It also needs permission to create and read logical replication slots.
For self-managed Postgres or most managed providers:
For AWS RDS:
On RDS, membership in rds_superuser also satisfies PostHog's prerequisite check.
Slot and publication management
PostHog supports two CDC management modes.
PostHog-managed
PostHog creates and manages both the replication slot and publication. It generates names like:
- Slot:
posthog_<source id prefix> - Publication:
posthog_pub_<source id prefix>
PostHog-managed mode is the easiest path if the database user has enough privileges. The user needs REPLICATION, SELECT, and ownership of the synced tables or superuser-equivalent permissions so PostHog can add and remove tables from the publication.
When you disable a CDC table, PostHog removes it from the publication. Changes made while the table is disabled are permanently lost from the CDC stream, so re-enabling the table requires a full resync.
When you delete the source, PostHog soft-deletes the warehouse tables and attempts to drop the PostHog-managed replication slot and publication. Cleanup is best-effort, so you should still verify the slot is gone if the source was failing or unreachable.
Self-managed
Use self-managed mode when you don't want to grant table ownership or superuser-equivalent permissions to the PostHog user.
In this mode:
- You or your DBA create the publication as the table owner.
- PostHog verifies the publication exists.
- PostHog still creates and manages the replication slot itself.
- PostHog still needs
REPLICATION, schemaUSAGE, and tableSELECT.
After you select CDC tables, PostHog shows a setup SQL dialog. It looks like this:
Do not manually create the replication slot for normal self-managed setup. PostHog creates the slot with pg_create_logical_replication_slot(..., 'pgoutput').
If you later add a CDC table in self-managed mode, add it to the publication yourself:
Avoid putting unrelated tables in the publication. PostHog filters out changes for tables that are not active CDC schemas, but the database still has to decode and send those changes.
WAL lag protection
Replication slots retain WAL until the consumer advances the slot. If PostHog stops consuming changes, retained WAL can grow.
PostHog stores two lag thresholds on CDC sources:
- Warning threshold:
1024 MBby default - Critical threshold:
10240 MBby default
For PostHog-managed sources, automatic slot protection is enabled by default. If lag exceeds the critical threshold, PostHog attempts to drop the replication slot and publication, then marks the source as errored. This protects your source database from unbounded WAL growth, but it means you must recreate or resync the CDC source after fixing the issue.
For self-managed sources, PostHog does not drop the slot or publication. It marks the source as errored at the critical threshold, but you remain responsible for cleanup.
You can monitor a Postgres slot yourself:
Operational risks and limitations
CDC is powerful, but it changes how your production database retains WAL. Keep these constraints in mind:
- Enabling logical replication often requires a database restart.
- A stuck replication slot can fill your database disk with retained WAL.
- PostHog CDC only supports tables with primary keys.
- CDC captures row changes, not a complete history before the initial snapshot.
- PostgreSQL logical replication doesn't send DDL as normal row changes. Schema changes may require a full resync, especially destructive changes like dropping or changing column types.
- Large
TOASTed values can be omitted from some update messages when they are unchanged. Validate CDC output carefully if your tables include very largetext,jsonb, orbyteacolumns. - Disabling a table removes it from the publication in PostHog-managed mode. Changes made while disabled are not recoverable from the slot.
- If a publication does not include a selected table, PostHog won't receive changes for that table.
- If a table is truncated, PostHog triggers a new snapshot instead of replaying the truncate as individual deletes.
For production databases, we recommend enabling CDC on one or two representative tables first so your team can validate database permissions, WAL settings, monitoring, and internal rollout steps before adding more tables.
Troubleshooting
wal_level must be set to logical
Enable logical replication on the database and restart PostgreSQL. On RDS or Aurora PostgreSQL, set rds.logical_replication = 1 in the parameter group and restart the instance.
No replication slot capacity available
Increase max_replication_slots or remove unused slots. Check existing slots with:
Publication does not exist
This happens in self-managed mode when PostHog can't find the publication name you entered. Create the publication as the table owner, then click Verify & create source again.
Source is errored because of WAL lag
Check the slot lag query above. If PostHog-managed slot protection dropped the slot, create a new CDC source or run a full resync after fixing the underlying issue. If you're in self-managed mode, your DBA must decide whether to keep, advance, or drop the slot.
Looking for an example of the Postgres source? Check out our tutorial where we connect and query Supabase data.