Linking ClickHouse as a source
The ClickHouse connector can link your ClickHouse database tables to PostHog. ClickHouse databases are often very large, so we stream the data in Arrow batches to keep memory bounded.
To link ClickHouse:
- Go to the Data pipeline page and the sources tab in PostHog
- Click New source and select ClickHouse
- Enter your database connection details:
- Host: The hostname or IP of your ClickHouse server like
play.clickhouse.comor123.132.1.100. - Port: The HTTP(S) port your ClickHouse server is listening on. The default is
8443for HTTPS and8123for HTTP. - Database: The name of the database you want to sync. The default is
default. - User: The username with read permissions on the database.
- Password: The password for the user (optional).
- Use HTTPS?: Whether to connect over HTTPS. Default is enabled.
- Verify SSL certificate?: Whether to verify the server's SSL certificate. Default is enabled. Disable if your server uses a self-signed certificate.
- Host: The hostname or IP of your ClickHouse server like
- If you need to connect through an SSH tunnel, enable and configure it (optional):
- Tunnel host: The hostname of your SSH server.
- Tunnel port: The port your SSH server is listening on.
- Authentication type:
- For password authentication, enter your SSH username and password.
- For key-based authentication, enter your SSH username, private key, and optional passphrase.
- Click Next
The data warehouse then starts syncing your ClickHouse data. You can see details and progress in the sources tab.
Permissions: The ClickHouse source only requires read permissions on the database and tables you intend to sync, plus read access to
system.tablesandsystem.columnsfor schema discovery.
Configuration
| Option | Type | Required |
|---|---|---|
Connection string (optional) | text | No |
Host | text | Yes |
Port | number | Yes |
Database | text | Yes |
User | text | Yes |
Password | password | No |
Use HTTPS? | select | Yes |
Verify SSL certificate? | select | Yes |
Use SSH tunnel? | ssh-tunnel | No |
Selecting columns
By default, PostHog syncs all columns from each table. To sync only specific columns:
- During source setup, click the columns button at the end of the row (it says "All 7" if the table has 7 columns)
- Uncheck columns you don't want to sync.
- Primary key columns and the incremental sync field (if configured) are always synced and cannot be disabled.
You can also change column selection after setup:
- Go to the sources tab and click your ClickHouse source.
- Click Configure next to any schema.
- Under Columns and filters, select which columns to sync.
When you add columns to a schema using incremental or append sync, PostHog prompts you to choose:
- Sync forward only - New columns are populated only for future data. Existing rows show
NULLfor the new columns. - Full resync - Triggers a complete resync to backfill the new columns for all rows.
Row filters
Row filters let you sync only specific rows from a table. During source setup, you can define predicates (column, operator, value) that PostHog applies to the extraction query.
To add row filters:
- During source setup, click the columns button at the end of the row (it says "All 7" if the table has 7 columns)
- Under Row filters, click Add filter.
- Select a column, operator (such as
equals,greater than, orin), and enter a value. - Add additional filters as needed. Multiple filters are combined with AND logic.
Row filters compose with incremental syncs — the incremental cursor predicate is applied first, then your filters.
Row filters are especially useful for large ClickHouse tables. For example, filtering by a timestamp column can prune partitions at the source, significantly reducing the amount of data transferred during each sync.
Supported table engines
PostHog can sync data from any ClickHouse table engine, but row counts are only available for engines that track them:
- MergeTree family (including
ReplacingMergeTree,SummingMergeTree, etc.) — full support including accurate row counts fromsystem.tables.total_rows. - Distributed tables — row counts come from a distributed
SELECT count(). - MaterializedView — resolves to the underlying
TOtarget table or.inner_id.<uuid>inner table for row counts. - View — synced on demand. Row count shown as "Skipped" because counting would require a full scan.
- Memory, Buffer, Log, Kafka, URL, and other no-counter engines — synced on demand. Row count shown as "Skipped".
Incremental sync
Incremental syncs are supported on integer (Int8–Int256, UInt8–UInt256) and temporal (Date, Date32, DateTime, DateTime64) cursor fields.
PostHog uses the sorting key from system.columns as the detected primary key. Because ClickHouse sorting keys are not guaranteed to be unique, every incremental sync runs a bounded duplicate-key probe first and will fail the sync if duplicates are detected on the chosen primary key.
Type handling
ClickHouse's Arrow output does not support every type, so PostHog serializes the following to strings on the server side to keep the stream reliable: UUID, IPv4/IPv6, wide ints (Int128/Int256/UInt128/UInt256), Enum8/Enum16, FixedString, Array, Map, Tuple, Nested, Variant, Dynamic, JSON, and Object.
Nullable and LowCardinality wrappers, DateTime/DateTime64 precision and timezones, and Decimal[32–256] are all preserved natively.
Troubleshooting
Table not found or has no columns
If you see an error like "table not found or has no columns," the source table no longer exists in your ClickHouse database. This can happen when:
- The table was dropped or renamed.
- You're syncing a materialized view's internal
.inner_id.<uuid>table, which changes whenever the view is recreated.
Solution: Remove the missing table from your source configuration, or re-point it to the correct table. If you were syncing a materialized view, sync it by its own name rather than its internal .inner_id.<uuid> table.
import InboundIpAddresses from '../_snippets/inbound-ip-addresses.mdx'