Linking ClickHouse as a source

The ClickHouse connector can link your ClickHouse database tables to PostHog. ClickHouse databases are often very large, so we stream the data in Arrow batches to keep memory bounded.
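The bounded-memory idea behind batch streaming can be sketched in plain Python (a stand-in for Arrow RecordBatch iteration; `batch_size` and the generator shape are illustrative, not PostHog's internal code):

```python
def stream_batches(rows, batch_size=10_000):
    """Yield fixed-size chunks so only one batch is held in memory at a time."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch
```

Each batch is released before the next is materialized, so memory usage stays proportional to the batch size rather than the table size.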

To link ClickHouse:

  1. Go to the Data pipeline page in PostHog and open the sources tab
  2. Click New source and select ClickHouse
  3. Enter your database connection details:
    • Host: The hostname or IP address of your ClickHouse server, such as play.clickhouse.com or 123.132.1.100.
    • Port: The HTTP(S) port your ClickHouse server is listening on. The default is 8443 for HTTPS and 8123 for HTTP.
    • Database: The name of the database you want to sync. The default is default.
    • User: The username with read permissions on the database.
    • Password: The password for the user (optional).
    • Use HTTPS?: Whether to connect over HTTPS. Default is enabled.
    • Verify SSL certificate?: Whether to verify the server's SSL certificate. Default is enabled. Disable if your server uses a self-signed certificate.
  4. If you need to connect through an SSH tunnel, enable and configure it (optional):
    • Tunnel host: The hostname of your SSH server.
    • Tunnel port: The port your SSH server is listening on.
    • Authentication type:
      • For password authentication, enter your SSH username and password.
      • For key-based authentication, enter your SSH username, private key, and optional passphrase.
  5. Click Next
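As a sanity check, the host, port, and HTTPS settings above combine into a ClickHouse HTTP interface URL. A minimal sketch (the `clickhouse_http_url` helper is hypothetical, not part of PostHog):

```python
def clickhouse_http_url(host, port=None, use_https=True):
    """Build the HTTP(S) endpoint from the connection details above.

    Defaults mirror the form: 8443 for HTTPS, 8123 for plain HTTP.
    """
    if port is None:
        port = 8443 if use_https else 8123
    scheme = "https" if use_https else "http"
    return f"{scheme}://{host}:{port}"
```

If you can reach this URL (for example with `curl`), the host, port, and HTTPS settings are likely correct.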

The data warehouse then starts syncing your ClickHouse data. You can see details and progress in the sources tab.

Permissions: The ClickHouse source only requires read permissions on the database and tables you intend to sync, plus read access to system.tables and system.columns for schema discovery.
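Those grants can be expressed in ClickHouse SQL. A sketch that generates them for a given database and user (the database and user names are placeholders):

```python
def read_grants(database, user):
    """GRANT statements covering exactly the access the connector needs:
    the target database plus system.tables / system.columns for schema
    discovery."""
    return [
        f"GRANT SELECT ON {database}.* TO {user}",
        f"GRANT SELECT ON system.tables TO {user}",
        f"GRANT SELECT ON system.columns TO {user}",
    ]
```

Run the resulting statements as an admin user; no write or DDL privileges are required.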

Supported table engines

PostHog can sync data from any ClickHouse table engine, but row counts are only available for engines that track them:

  • MergeTree family (including ReplacingMergeTree, SummingMergeTree, etc.) — full support including accurate row counts from system.tables.total_rows.
  • Distributed tables — row counts come from a distributed SELECT count().
  • MaterializedView — resolves to the underlying TO target table or .inner_id.<uuid> inner table for row counts.
  • View — synced on demand. Row count shown as "Skipped" because counting would require a full scan.
  • Memory, Buffer, Log, Kafka, URL, and other no-counter engines — synced on demand. Row count shown as "Skipped".
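The engine-to-row-count mapping above can be sketched as a small dispatch function (the function name and return labels are illustrative, not PostHog's internal API):

```python
def row_count_source(engine):
    """Where a row count comes from, given a ClickHouse engine name."""
    if engine.endswith("MergeTree"):  # MergeTree, ReplacingMergeTree, SummingMergeTree, ...
        return "system.tables.total_rows"
    if engine == "Distributed":
        return "SELECT count()"       # distributed count across shards
    if engine == "MaterializedView":
        return "target table"         # TO target or .inner_id.<uuid> inner table
    return "Skipped"                  # View, Memory, Buffer, Log, Kafka, URL, ...
```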

Incremental sync

Incremental syncs are supported on integer (Int8–Int256, UInt8–UInt256) and temporal (Date, Date32, DateTime, DateTime64) cursor fields.

PostHog uses the sorting key from system.columns as the detected primary key. Because ClickHouse sorting keys are not guaranteed to be unique, every incremental sync runs a bounded duplicate-key probe first and will fail the sync if duplicates are detected on the chosen primary key.
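The two query shapes involved can be sketched as follows; the exact SQL PostHog generates may differ, and identifiers are assumed to be pre-validated:

```python
def incremental_queries(table, cursor_field, pk, last_value):
    """Sketch of an incremental sync's queries: a duplicate-key probe on
    the detected primary key, then the cursor-bounded fetch."""
    probe = (
        f"SELECT {pk} FROM {table} "
        f"GROUP BY {pk} HAVING count() > 1 LIMIT 1"
    )
    fetch = (
        f"SELECT * FROM {table} "
        f"WHERE {cursor_field} > {last_value!r} "
        f"ORDER BY {cursor_field}"
    )
    return probe, fetch
```

If the probe returns any row, the chosen primary key is not unique and the sync fails rather than silently deduplicating.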

Type handling

ClickHouse's Arrow output does not support every type, so PostHog serializes the following to strings on the server side to keep the stream reliable: UUID, IPv4/IPv6, wide ints (Int128/Int256/UInt128/UInt256), Enum8/Enum16, FixedString, Array, Map, Tuple, Nested, Variant, Dynamic, JSON, and Object.

Nullable and LowCardinality wrappers, DateTime/DateTime64 precision and timezones, and Decimal[32–256] are all preserved natively.
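The native-versus-string decision described above can be sketched like this (the set contents follow the lists in the text; the function name and wrapper handling are illustrative):

```python
# Types serialized to strings because ClickHouse's Arrow output
# cannot represent them natively.
STRINGIFIED = {
    "UUID", "IPv4", "IPv6",
    "Int128", "Int256", "UInt128", "UInt256",
    "Enum8", "Enum16", "FixedString",
    "Array", "Map", "Tuple", "Nested",
    "Variant", "Dynamic", "JSON", "Object",
}

def arrow_strategy(ch_type):
    """Return 'string' if the type is serialized to a string, else 'native'."""
    t = ch_type
    # Nullable(...) and LowCardinality(...) wrappers are preserved natively,
    # so strip them (possibly nested) and inspect the inner type.
    changed = True
    while changed:
        changed = False
        for wrapper in ("Nullable", "LowCardinality"):
            if t.startswith(wrapper + "(") and t.endswith(")"):
                t = t[len(wrapper) + 1 : -1]
                changed = True
    base = t.split("(", 1)[0]  # drop parameters, e.g. DateTime64(3, 'UTC')
    return "string" if base in STRINGIFIED else "native"
```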

Inbound IP addresses

PostHog connects to your instance from a fixed set of IP addresses. To ensure this connector works, allow these IPs in your inbound security rules:

US
  • 44.205.89.55
  • 44.208.188.173
  • 52.4.194.122

EU
  • 3.75.65.221
  • 18.197.246.42
  • 3.120.223.253
