• Product
  • Pricing
  • Docs
  • Using PostHog
  • Community
  • Company
  • Login
  • Table of contents

  • Handbook
    • Start here
    • Meetings
    • Story
    • Team
    • Investors
    • Strategy overview
    • Business model
    • Objectives
    • Roadmap
    • Brand
    • Culture
    • Values
    • Goal setting
    • Diversity and inclusion
    • Communication
    • Management
    • Offsites
    • Security
    • Brand assets
      • Team structure
      • Why Small Teams
      • Team App East
      • Team App West
      • Team Platform
      • Team Ingestion
      • Team Infrastructure
      • Team Marketing
      • Team Website and Docs
      • Team People and Ops
      • Team Customer Success
    • Compensation
    • Share options
    • Benefits
    • Time off
    • Spending money
    • Progression
    • Training
    • Feedback
    • Onboarding
    • Offboarding
      • Product Manager ramp up
    • Merch store
      • Overview
      • Engineering hiring
      • Marketing hiring
      • Operations hiring
      • Design hiring
      • Exec hiring
      • Developing locally
      • Tech stack
      • Project structure
      • How we review PRs
      • Frontend coding
      • Backend coding
      • Support hero
      • Feature ownership
      • Releasing a new version
      • Bug prioritization
      • Event ingestion explained
      • Making schema changes safely
      • How to optimize queries
      • How to write an async migration
      • How to run migrations on PostHog Cloud
      • Working with ClickHouse materialized columns
      • Deployments support
      • Working with cloud providers
      • Breaking glass to debug PostHog Cloud
      • Developing the website
      • MDX setup
    • Shipping things, step by step
    • Feature flags specification
    • Setting up SSL locally
    • Tech talks
    • Overview
    • Product metrics
    • User feedback
    • Scale features prioritization
    • Paid features
    • Releasing as beta
    • Overview
    • Overview
    • Personas
    • Testimonials
    • Value propositions
      • Content & SEO
      • Sponsorship
      • Paid ads
      • Email
      • Press
    • Growth strategy
    • Customer support
    • Inbound sales model
    • Sales operations
      • Managing our CRM
      • YC onboarding
      • Demos
      • Billing
      • Who we do business with
  • Table of contents

  • Handbook
    • Start here
    • Meetings
    • Story
    • Team
    • Investors
    • Strategy overview
    • Business model
    • Objectives
    • Roadmap
    • Brand
    • Culture
    • Values
    • Goal setting
    • Diversity and inclusion
    • Communication
    • Management
    • Offsites
    • Security
    • Brand assets
      • Team structure
      • Why Small Teams
      • Team App East
      • Team App West
      • Team Platform
      • Team Ingestion
      • Team Infrastructure
      • Team Marketing
      • Team Website and Docs
      • Team People and Ops
      • Team Customer Success
    • Compensation
    • Share options
    • Benefits
    • Time off
    • Spending money
    • Progression
    • Training
    • Feedback
    • Onboarding
    • Offboarding
      • Product Manager ramp up
    • Merch store
      • Overview
      • Engineering hiring
      • Marketing hiring
      • Operations hiring
      • Design hiring
      • Exec hiring
      • Developing locally
      • Tech stack
      • Project structure
      • How we review PRs
      • Frontend coding
      • Backend coding
      • Support hero
      • Feature ownership
      • Releasing a new version
      • Bug prioritization
      • Event ingestion explained
      • Making schema changes safely
      • How to optimize queries
      • How to write an async migration
      • How to run migrations on PostHog Cloud
      • Working with ClickHouse materialized columns
      • Deployments support
      • Working with cloud providers
      • Breaking glass to debug PostHog Cloud
      • Developing the website
      • MDX setup
    • Shipping things, step by step
    • Feature flags specification
    • Setting up SSL locally
    • Tech talks
    • Overview
    • Product metrics
    • User feedback
    • Scale features prioritization
    • Paid features
    • Releasing as beta
    • Overview
    • Overview
    • Personas
    • Testimonials
    • Value propositions
      • Content & SEO
      • Sponsorship
      • Paid ads
      • Email
      • Press
    • Growth strategy
    • Customer support
    • Inbound sales model
    • Sales operations
      • Managing our CRM
      • YC onboarding
      • Demos
      • Billing
      • Who we do business with
  • Handbook
  • Engineering
  • Working with data
  • Working with ClickHouse materialized columns

Working with ClickHouse materialized columns

Last updated: Mar 24, 2022

On this page

  • Background
  • Materialized columns in practice
  • Automatic materialization
  • Manual materialization

This document outlines what materialized columns in ClickHouse are, how we're making use of them and how to manage them on cloud.

Background

We currently store JSON data in string columns in clickhouse, reading and parsing that data at query-time. This can be slow due to how "fat" these columns are.

Materialized columns allow us to "store" specific properties stored in JSON as separate columns that are there on disk, making reading these columns up to 25x faster than normal properties.

Also check out our blog post on materialized columns.

Note that ClickHouse 22.3 added experimental support for Object data type which will eventually replace how we store JSON data and remove the need for materialized columns

Materialized columns in practice

Materialized columns play a huge role in optimizing performance for large clients having difficulties with performance.

This is why we automatically materialize columns and have tooling for creating them manually as well.

Note that materialized columns also require backfilling the materialized columns to be effective - an operation best done on a weekend due to extra load it adds to the cluster.

Automatic materialization

We have a cron-job which analyzes slow queries ran last week and tries to find properties that are used in these slow queries, materializing some of these. Code for this can be found in ee/clickhouse/materialized_columns/analyze.py

Note that this cron can often be disabled due to cluster issues or ongoing data migrations.

See environment variables documentation + instance settings for toggles which control this.

Manual materialization

python manage.py materialize_columns command can be used to manually materialize one or more properties.

Alternatively this can be done over python manage.py shell_plus. One example of materializing all properties used by a team can be found here:

from ee.clickhouse.materialized_columns.columns import *
pd = PropertyDefinition.objects.filter(team_id=2635)
used_props = set(p.name for p in pd if "distinct_id" not in p.name and "$" not in p.name)
event_props_to_materialize = used_props - set(get_materialized_columns("events", use_cache=False))
from ee.clickhouse.sql.person import GET_PERSON_PROPERTIES_COUNT
rows = sync_execute(GET_PERSON_PROPERTIES_COUNT, {"team_id": 2635})
person_props_to_materialize = set(name for name, _ in rows if "$" not in name) - set(get_materialized_columns("person", use_cache=False))
from ee.clickhouse.materialized_columns.analyze import logger, materialize_properties_task
columns_to_materialize = []
columns_to_materialize += [("events", prop, 0) for prop in event_props_to_materialize]
columns_to_materialize += [("person", prop, 0) for prop in person_props_to_materialize]
materialize_properties_task(
columns_to_materialize=columns_to_materialize,
backfill_period_days=90,
dry_run=False,
maximum=len(columns_to_materialize)
)

Note that this snippet might need modification depending on the usecase.

Questions?

Was this page useful?

Next article

Deployments support

If you're the week's support hero or you are providing support for a customer and they have questions about their self-hosted deployment, follow this guide to provide initial support before looping in someone from the Platform team. Gather basic information Here's a sample message that should help gather the relevant information up-front (appropriate for #community-support , but if working in a private channel with a paid customer, remove some of the obvious questions). 👋 Are you self-hosting…

Read next article

Share

Jump to:

  • Background
  • Materialized columns in practice
  • Automatic materialization
  • Manual materialization
  • Questions?
  • Edit this page
  • Raise an issue
  • Toggle content width
  • Toggle dark mode
  • About
  • Blog
  • Newsletter
  • Careers
  • Support
  • Contact sales

Product OS suite

Product overview

Analytics
  • Funnels
  • Trends
  • Paths

Pricing

Features
  • Session recording
  • Feature flags
  • Experimentation
  • Heatmaps

Customers

Platform
  • Correlation analysis
  • Collaboration
  • Apps

Community

Discussion
  • Questions?
  • Slack
  • Issues
  • Contact sales
Get involved
  • Roadmap
  • Contributors
  • Merch
  • PostHog FM
  • Marketplace

Docs

Getting started
  • PostHog Cloud
  • Self-hosted
  • Compare options
  • Tutorials
  • PostHog on GitHub
Install & integrate
  • Installation
  • Docs
  • API
  • Apps
User guides
  • Cohorts
  • Funnels
  • Sessions
  • Data
  • Events

Company

About
  • Our story
  • Team
  • Handbook
  • Investors
  • Careers
Resources
  • FAQ
  • Ask a question
  • Blog
  • Press
  • Merch
  • YouTube
© 2022 PostHog, Inc.
  • Code of conduct
  • Privacy
  • Terms