How to optimize queries

Last updated: Mar 15, 2022

On this page

  • PostgreSQL
  • Coding best practices
  • Indexing
  • How to find slow queries
  • How to fix slow queries
  • How to reduce IO
  • Removing unused indices on foreign key fields
  • ClickHouse
  • How to find slow queries
  • Grafana
  • PostHog instance/status dashboard
  • Metabase
  • How to fix slow queries

Making sure PostHog operates fast at scale is key to our success.

This document outlines some best practices for achieving good query performance at scale, and describes tools and procedures for discovering and fixing performance issues.

PostHog uses two different datastores:

  • PostgreSQL: a row-oriented OLTP database, mainly used to store and query datasets with predictable access patterns. It is likely your best choice if:

    • the query pattern for accessing your dataset is predictable
    • the dataset will likely not grow over time (stays <= 1 TB)
    • the dataset needs to mutate often (DELETE/UPDATE)
    • the query pattern requires joins across multiple tables
  • ClickHouse: a column-oriented OLAP database, used to store large datasets and run analytical queries on them. It is likely your best choice if:

    • the query pattern for accessing your dataset is unpredictable
    • the dataset will likely grow over time (> 1 TB)
    • the dataset doesn't need to mutate often (DELETE/UPDATE)
    • the query pattern doesn't require joins across multiple tables

PostgreSQL

Coding best practices

  1. only ask for the field(s) you need: SELECT name, surname is better than SELECT * (the latter is only helpful in a few edge cases); see the sketch after this list

  2. only ask for the row(s) you need: add a LIMIT clause at the end of your query

  3. (if possible) avoid explicit transactions: if you can't, keep them small, since transactions lock data in the tables they touch and may result in deadlocks (using them in application hot paths is strongly discouraged)

  4. (if possible) avoid JOIN

  5. avoid subqueries: a subquery is a SELECT statement embedded in a clause of another SQL statement. It's easier to write, but JOINs are usually better optimized by database engines.

  6. use appropriate data type(s): not all types occupy the same space, and when we pick a concrete data type we can also limit its size to what we actually store. For example, VARCHAR(4000) is not the same as VARCHAR(40). We should always size fields according to what we will store so we don't occupy unnecessary space in the database (and we should enforce this limit in the application code to avoid query errors).

  7. use the LIKE operator only if necessary: if you know exactly what you are looking for, use the = operator
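
Putting a few of these practices together, here is a minimal sketch, assuming a hypothetical person table with name, surname, and email columns:

-- Fetches every column of every matching row, and uses LIKE where = would do
SELECT * FROM person WHERE email LIKE 'alice@example.com';

-- Better: only the needed columns, an exact match, and a row limit
SELECT name, surname
FROM person
WHERE email = 'alice@example.com'
LIMIT 1;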

Note: for the Django app we currently rely on the Django ORM as the interface between our data and the relational database. While we don't write SQL queries directly in this case, the best practices above should still be kept in mind.

If you want to print executed queries (while running with DEBUG) you can run:

from django.db import connection
print(connection.queries)

while for an individual query you can run:

print(Model.objects.filter(name='test').query)

Indexing

If you are programmatically sorting, filtering, or grouping by a column, you should probably have an index on it. The caveat is that indexing slows down writes to the table and takes disk space, so please drop unused indexes.

Composite indices are useful when you want to optimize queries that filter on several columns in combination. For more info on indices and multi-column indices, see the official docs.
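
As an illustration, a minimal Django sketch of a composite index (the model and field names are hypothetical, not actual PostHog schema):

from django.db import models

class PersonLookup(models.Model):
    team_id = models.IntegerField()
    distinct_id = models.CharField(max_length=400)

    class Meta:
        # One composite index serves queries filtering on team_id alone
        # as well as on (team_id, distinct_id) together
        indexes = [models.Index(fields=["team_id", "distinct_id"])]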

How to find slow queries

To find and debug slow queries in production you have a few options available:

  • Browse to the Diagnose tab in Heroku Data's dashboard. You can break queries down by:
    • Most time consuming
    • Most frequently invoked
    • Slowest execution time
    • Slowest I/O
  • You can also use Heroku's Diagnose feature by running heroku pg:diagnose to get a breakdown of long-running queries and long transactions, among other diagnostics.
  • For a more raw approach, you can access real-time logs from Heroku by executing heroku logs --app posthog --ps postgres
  • With any logs pulled from PostgreSQL, you can use pgbadger to find exactly which queries are consuming the most time and resources.
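
If the pg_stat_statements extension happens to be enabled (an assumption, not a requirement of the tools above), you can also rank queries by cumulative execution time directly in SQL:

-- Top 10 queries by total execution time
-- (the columns are named total_time/mean_time on PostgreSQL <= 12)
SELECT calls,
       total_exec_time,
       mean_exec_time,
       query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;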

How to fix slow queries

Fixing a slow query is usually a three-step process:

  1. identify which part of the codebase is generating it (adding the stack trace as a query comment is usually helpful to map query <-> code).

  2. re-run the query prefixed with EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) to get the query plan. Query plans aren't the easiest thing to read: they're packed with information and closer to machine-parsable than human-readable. Postgres Explain Viewer 2 (aka pev2, see the GitHub repository) is a tool that simplifies reading query plans. It renders a horizontal tree with one node per query-plan node, including timing information, the error between planned and actual times, and badges for interesting nodes like "costliest" or "bad estimate".

  3. fix the query, which should now produce a less costly EXPLAIN plan.
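
For example, to produce a plan you can paste into pev2 (the table and filter here are hypothetical):

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT name, surname
FROM person
WHERE email = 'alice@example.com';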

How to reduce IO

Some operations we perform can cause a large amount of IO. For writes specifically, this can come from writing table data but also from writing to indices. We use the Django ORM to generate table schemas, which is convenient but also tends to add indices where we either don't need them or where a single composite index would better serve our querying patterns.

TODO: fill in other sources and mitigations to high IO

Removing unused indices on foreign key fields

When we add a ForeignKey to a model, Django adds a couple of things that we may or may not need: an index and a constraint. Sometimes these are superfluous and we'd do better without them. For example, consider the changes in this PR.

Here we had a table with two ForeignKeys, team and person. The query pattern for this table always filters by team and the distinct_id CharField column (it's essentially a lookup table from (team_id, distinct_id) -> person_id). As a result, we have an index on (team_id, distinct_id). The team_id index that Django adds for the ForeignKey is not used at all, as observed by running:

SELECT s.schemaname,
       s.relname AS tablename,
       s.indexrelname AS indexname,
       pg_relation_size(s.indexrelid) AS index_size
FROM pg_catalog.pg_stat_user_indexes s
JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0 -- has never been scanned
ORDER BY pg_relation_size(s.indexrelid) DESC;

To remove this index, we perform the following:

  1. add db_index=False to the team ForeignKey. This updates Django's understanding of the state of the database schema.
  2. run ./manage.py makemigrations posthog to generate a new database migration. One issue with the generated migration is that it does not DROP the index CONCURRENTLY. See the PostgreSQL docs for more details, but essentially it means that while the generated migration is running the table will be locked, and we don't know for how long. Queries on this table will pile up and could cause an outage. To get around this we:
  3. manually update the generated migration to drop the index CONCURRENTLY. The updated command returns instantly, the table is not locked, and the index is dropped in the background. Note that we need to wrap it in SeparateDatabaseAndState so that Django will not create new migrations on subsequent runs (a sketch follows below).

Then, once we've run this migration, the index should disappear from the results of the unused-index query above.
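
Here is a minimal sketch of such a migration, with hypothetical dependency, model, and index names (atomic = False is required because DROP INDEX CONCURRENTLY cannot run inside a transaction):

from django.db import migrations, models

class Migration(migrations.Migration):
    atomic = False  # CONCURRENTLY cannot run inside a transaction block

    dependencies = [("posthog", "0001_initial")]  # hypothetical

    operations = [
        migrations.SeparateDatabaseAndState(
            # Update Django's state: the ForeignKey no longer has its own index
            state_operations=[
                migrations.AlterField(
                    model_name="personlookup",  # hypothetical model
                    name="team",
                    field=models.ForeignKey(
                        db_index=False,
                        on_delete=models.CASCADE,
                        to="posthog.Team",
                    ),
                ),
            ],
            # Drop the index in the background without locking the table
            database_operations=[
                migrations.RunSQL(
                    # hypothetical index name
                    'DROP INDEX CONCURRENTLY IF EXISTS "posthog_personlookup_team_id_idx"',
                    reverse_sql=migrations.RunSQL.noop,
                ),
            ],
        ),
    ]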

ClickHouse

How to find slow queries

To find and debug slow queries in production, you have several options available:

Grafana

The Clickhouse queries - by endpoint dashboard gives a breakdown of how things are looking reliability- and performance-wise. Heavily used and slow or unreliable endpoints often indicate issues with queries.

PostHog instance/status dashboard

Under https://app.posthog.com/instance/status/internal_metrics you will find various metrics and query logs. Note: if you are a staff user, you can also analyze queries by clicking on them (or by copying in your own queries).

This analysis will output:

  • Query runtime
  • Number of rows read / Bytes read
  • Memory used
  • Flamegraphs for CPU, time and memory

These can be useful for figuring out why certain queries are performing slowly.

Metabase

Need more granular access to queries than these dashboards provide? Take a look at this Metabase query. The ClickHouse system tables (e.g. system.query_log) provide a lot of useful information for identifying and diagnosing slow queries.
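
For instance, here is a sketch of a query against system.query_log that surfaces the slowest recent queries (the one-day window is an arbitrary choice):

-- Slowest completed queries over the last day
SELECT query_duration_ms,
       read_rows,
       formatReadableSize(read_bytes) AS read_size,
       formatReadableSize(memory_usage) AS memory,
       query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 DAY
ORDER BY query_duration_ms DESC
LIMIT 10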

How to fix slow queries

TODO

If you have a fix, you need to make sure it works and doesn't regress. In the main repo you can find a query performance benchmark suite under ee/benchmarks. These run nightly on master. You can also trigger them manually by running the Benchmark workflow, which can be handy for chained PRs.

After fixing an issue:

  1. Make sure there's an appropriate benchmark covering the fix
  2. Add the performance label to your PR to get information about the performance changes

Only changes > 20% are reported. ClickHouse queries are run on a private ClickHouse installation with pre-populated data.
