PostHog is very good at handling large volumes of data, and, for most products or websites, you'll never have to worry about scaling. We've seen volumes of 5 million events/day and hundreds of requests a second on Heroku's cheapest tier with no problems.
In that example, the database grew by about 500 MB a day. That means Heroku's cheapest standard database can hold around a year's worth of data.
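To relate that growth rate to your own plan, a rough estimate like the one below may help. It is only a sketch: it assumes steady growth and decimal units (1 GB = 1000 MB), and the function names are made up for illustration. Check your database plan's actual storage limit before relying on the result.

```python
def storage_needed_gb(growth_mb_per_day: float, days: int) -> float:
    """Return the storage consumed after `days` of steady growth, in GB."""
    return growth_mb_per_day * days / 1000


def days_until_full(capacity_gb: float, growth_mb_per_day: float) -> float:
    """Return how many days of data fit into `capacity_gb` at a steady growth rate."""
    if growth_mb_per_day <= 0:
        raise ValueError("growth_mb_per_day must be positive")
    return capacity_gb * 1000 / growth_mb_per_day


# At 500 MB/day, a full year of events needs about 182.5 GB.
print(storage_needed_gb(500, 365))
```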
However, if you do start going beyond these numbers, there are things you can do to scale up.
We recommend partitioning tables when queries in PostHog become noticeably slow. The partition function applies event/timestamp partitioning to the event table: the table is partitioned per event, and each per-event table is then partitioned per week.
Partitioning events doesn't change how you use PostHog. It'll only speed things up.
The command allows you to specify which events to explicitly create partitions for. If none are specified, the table will only be partitioned by timestamp per week.
If any single event occurs more than a million times a week, it's worth partitioning on that event. Any events that aren't partitioned are put together in a default bucket for that week. For instance, if you're using posthog-js or the snippet, it's worth partitioning on $autocapture, since autocapture events tend to be high volume.
Note: It's possible to create partitions while the site is running, but you might run into locking or memory issues. It's recommended to take your app temporarily offline (maintenance mode in Heroku) while creating the partitions.
Note: Partitions were introduced in PostHog 1.6.0.
Partitioning only by week (not events):
python manage.py partition
Partitioning by '$pageview' events:
python manage.py partition --element '$pageview'
Partitioning by multiple events:
python manage.py partition --element '$pageview' --element '$autocapture'
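If you already know roughly how many times each event occurs per week, the million-a-week rule of thumb can be turned into a command automatically. The sketch below is illustrative only: the weekly counts are invented, the helper names are hypothetical, and it uses nothing beyond the `--element` flag shown above.

```python
import shlex

WEEKLY_THRESHOLD = 1_000_000  # rule of thumb from the text above


def events_to_partition(weekly_counts, threshold=WEEKLY_THRESHOLD):
    """Return event names whose weekly volume exceeds the threshold, busiest first."""
    busy = [(count, name) for name, count in weekly_counts.items() if count > threshold]
    return [name for count, name in sorted(busy, key=lambda pair: (-pair[0], pair[1]))]


def partition_command(events):
    """Build the partition command; with no events it partitions by week only."""
    parts = ["python", "manage.py", "partition"]
    for name in events:
        parts += ["--element", name]
    # shlex.quote wraps names such as $pageview in single quotes for the shell.
    return " ".join(shlex.quote(part) for part in parts)


# Hypothetical weekly counts, for illustration only.
counts = {"$pageview": 3_200_000, "$autocapture": 9_500_000, "signup": 40_000}
print(partition_command(events_to_partition(counts)))
```

Events below the threshold are left out on purpose, so they end up in the default weekly bucket.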
Should any problem arise with partitions, we provide a reversal function that will return the partitioned table to its original configuration:
python manage.py partition --reverse
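On Heroku, the advice to take the app offline while partitioning could be scripted roughly as follows. This is a sketch under assumptions: it expects the Heroku CLI to be installed and logged in, `my-posthog-app` is a placeholder app name, and `partition_in_maintenance` is a hypothetical helper rather than anything PostHog ships. Try it on a staging app first.

```python
import subprocess


def partition_in_maintenance(app, partition_command="python manage.py partition",
                             run=subprocess.check_call):
    """Run the partition command on a Heroku app while maintenance mode is on.

    `run` is injectable so the sequence can be dry-run without the Heroku CLI.
    """
    run(["heroku", "maintenance:on", "--app", app])
    try:
        # Sent as a single string so any quoting inside partition_command reaches
        # the dyno's shell intact. This is an assumption about how `heroku run`
        # forwards arguments; verify it before using event names such as $pageview.
        run(["heroku", "run", "--app", app, "--", partition_command])
    finally:
        # Always bring the site back, even if partitioning fails.
        run(["heroku", "maintenance:off", "--app", app])


# Example (not executed here):
# partition_in_maintenance("my-posthog-app")
```

The `try`/`finally` is the main design point: if partitioning fails partway through, the app still leaves maintenance mode, and you can then decide whether to run the reversal command.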
One easy way of scaling is to add more workers or web servers (dynos on Heroku). It's worth having a look at load and RAM metrics for your specific instance to see which one is struggling. PostHog can handle many web servers and workers working concurrently.
If you're running lots of queries over large numbers of events, it might make sense to scale up your database machine, specifically the RAM. This is probably the cheapest and most effective way of speeding up your workload.
If the volume of data starts to become a problem to the point where you can't scale Postgres any further, we offer integrations with various databases designed to hold huge volumes of data. This is part of our enterprise offering, and we'd love to work with you to set this up.
You can contact us at [email protected] if you have any questions.