It's important to make sure you understand how to recovery from scenarios before they happen.
Data is stored in PostgreSQL, ClickHouse and Kafka. At a minimum you should have backups of PostgreSQL and ClickHouse, these are crucial to PostHog. Kafka stores the incoming analytics events which are then persisted into ClickHouse.
Note: in order to work, ClickHouse and Kafka use Zookeeper as distributed key value store and locking service. While Zookeeper doesn't host any data related to PostHog it's a core dependency of the overall stack.
PostgreSQL stores your PostHog instance configuration and authentication/authorization.
We recommend that you use a provider that supports backups. For example, AWS RDS will handle backups and restores for you. Other cloud providers offer similar services.
For ClickHouse, see our ClickHouse backup runbook.
ClickHouse consumes from Kafka using a static consumer group. Make sure that you only ever have one cluster consuming at a time. When bringing up a cluster from backup it is a good idea to ensure that the consumer cursors are reset such that the cluster consumes any events in Kafka that may be missing from your ClickHouse the backup.
Kafka is the first place events are persisted before being consumed by other services, and ultimately ending up in ClickHouse. Solutions exist for streaming this data to long term storage. For example, with AWS you can use either the Lenses.io S3 Kafka Connector or AWS Kinesis Firehose.
If PostHog is consuming events to ClickHouse in good time, you should not lose much in case of a Kafka failure, so it may be a risk that you want to accept.
ZooKeeper is a centralized service providing distributed synchronization. It is a core dependency of both ClickHouse and Zookeeper.