Linking ClickHouse as a source

The ClickHouse connector can link your ClickHouse database tables to PostHog. ClickHouse databases are often very large, so we stream the data in Arrow batches to keep memory bounded.

To link ClickHouse:

  1. Go to the Data pipeline page and the sources tab in PostHog
  2. Click New source and select ClickHouse
  3. Enter your database connection details:
    • Host: The hostname or IP of your ClickHouse server like play.clickhouse.com or 123.132.1.100.
    • Port: The HTTP(S) port your ClickHouse server is listening on. The default is 8443 for HTTPS and 8123 for HTTP.
    • Database: The name of the database you want to sync. The default is default.
    • User: The username with read permissions on the database.
    • Password: The password for the user (optional).
    • Use HTTPS?: Whether to connect over HTTPS. Default is enabled.
    • Verify SSL certificate?: Whether to verify the server's SSL certificate. Default is enabled. Disable if your server uses a self-signed certificate.
  4. If you need to connect through an SSH tunnel, enable and configure it (optional):
    • Tunnel host: The hostname of your SSH server.
    • Tunnel port: The port your SSH server is listening on.
    • Authentication type:
      • For password authentication, enter your SSH username and password.
      • For key-based authentication, enter your SSH username, private key, and optional passphrase.
  5. Click Next

The data warehouse then starts syncing your ClickHouse data. You can see details and progress in the sources tab.

Permissions: The ClickHouse source only requires read permissions on the database and tables you intend to sync, plus read access to system.tables and system.columns for schema discovery.

Configuration

OptionTypeRequired
Connection string (optional)textNo
HosttextYes
PortnumberYes
DatabasetextYes
UsertextYes
PasswordpasswordNo
Use HTTPS?selectYes
Verify SSL certificate?selectYes
Use SSH tunnel?ssh-tunnelNo

Selecting columns

By default, PostHog syncs all columns from each table. To sync only specific columns:

  1. During source setup, click the columns button at the end of the row (it says "All 7" if the table has 7 columns)
  2. Uncheck columns you don't want to sync.
  3. Primary key columns and the incremental sync field (if configured) are always synced and cannot be disabled.

You can also change column selection after setup:

  1. Go to the sources tab and click your ClickHouse source.
  2. Click Configure next to any schema.
  3. Under Columns and filters, select which columns to sync.
Adding columns to existing syncs

When you add columns to a schema using incremental or append sync, PostHog prompts you to choose:

  • Sync forward only - New columns are populated only for future data. Existing rows show NULL for the new columns.
  • Full resync - Triggers a complete resync to backfill the new columns for all rows.

Row filters

Row filters let you sync only specific rows from a table. During source setup, you can define predicates (column, operator, value) that PostHog applies to the extraction query.

To add row filters:

  1. During source setup, click the columns button at the end of the row (it says "All 7" if the table has 7 columns)
  2. Under Row filters, click Add filter.
  3. Select a column, operator (such as equals, greater than, or in), and enter a value.
  4. Add additional filters as needed. Multiple filters are combined with AND logic.

Row filters compose with incremental syncs — the incremental cursor predicate is applied first, then your filters.

Row filters are especially useful for large ClickHouse tables. For example, filtering by a timestamp column can prune partitions at the source, significantly reducing the amount of data transferred during each sync.

Supported table engines

PostHog can sync data from any ClickHouse table engine, but row counts are only available for engines that track them:

  • MergeTree family (including ReplacingMergeTree, SummingMergeTree, etc.) — full support including accurate row counts from system.tables.total_rows.
  • Distributed tables — row counts come from a distributed SELECT count().
  • MaterializedView — resolves to the underlying TO target table or .inner_id.<uuid> inner table for row counts.
  • View — synced on demand. Row count shown as "Skipped" because counting would require a full scan.
  • Memory, Buffer, Log, Kafka, URL, and other no-counter engines — synced on demand. Row count shown as "Skipped".

Incremental sync

Incremental syncs are supported on integer (Int8Int256, UInt8UInt256) and temporal (Date, Date32, DateTime, DateTime64) cursor fields.

PostHog uses the sorting key from system.columns as the detected primary key. Because ClickHouse sorting keys are not guaranteed to be unique, every incremental sync runs a bounded duplicate-key probe first and will fail the sync if duplicates are detected on the chosen primary key.

Type handling

ClickHouse's Arrow output does not support every type, so PostHog serializes the following to strings on the server side to keep the stream reliable: UUID, IPv4/IPv6, wide ints (Int128/Int256/UInt128/UInt256), Enum8/Enum16, FixedString, Array, Map, Tuple, Nested, Variant, Dynamic, JSON, and Object.

Nullable and LowCardinality wrappers, DateTime/DateTime64 precision and timezones, and Decimal[32–256] are all preserved natively.

Troubleshooting

Table not found or has no columns

If you see an error like "table not found or has no columns," the source table no longer exists in your ClickHouse database. This can happen when:

  • The table was dropped or renamed.
  • You're syncing a materialized view's internal .inner_id.<uuid> table, which changes whenever the view is recreated.

Solution: Remove the missing table from your source configuration, or re-point it to the correct table. If you were syncing a materialized view, sync it by its own name rather than its internal .inner_id.<uuid> table. import InboundIpAddresses from '../_snippets/inbound-ip-addresses.mdx'

Community questions

Was this page useful?

Questions about this page? or post a community question.