How to write an async migration
Also see: user-facing documentation in the runbook
Writing an async migration
To write an async migration, you should create a migration file inside
posthog/async_migrations/migrations. The name should follow the convention we use for Django and EE migrations (e.g.
0005_update_events_schema). Check out the existing migrations or examples.
Workflow and architecture
When the Django server boots up, a setup step for async migrations runs, which does the following:
- Imports all the migration definitions
- Populates a dependencies map and in-memory record of migration definitions
- Creates a database record for each
- Checks whether all async migrations necessary for this PostHog version have completed (if not, the server does not start)
- Triggers migrations to run (in order) if AUTO_START_ASYNC_MIGRATIONS is set and there are uncompleted migrations for this version
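The setup steps above can be sketched roughly as follows. This is a minimal illustration, not the actual setup code: `MigrationDefinition`, `build_dependency_map`, and `migrations_to_trigger` are hypothetical names, and ordering by name is an assumption based on the numeric prefix convention.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class MigrationDefinition:
    """Hypothetical stand-in for an imported migration definition."""
    name: str
    depends_on: Optional[str] = None


def build_dependency_map(definitions: List[MigrationDefinition]) -> Dict[str, Optional[str]]:
    """Map each migration name to the migration it depends on (or None)."""
    return {d.name: d.depends_on for d in definitions}


def migrations_to_trigger(
    definitions: List[MigrationDefinition],
    completed: Set[str],
    auto_start: bool,
) -> List[str]:
    """Return uncompleted migrations in order, but only if auto start is enabled.

    Assumption: sorting by name gives the run order, because names carry
    a zero-padded numeric prefix (e.g. 0005_update_events_schema).
    """
    if not auto_start:
        return []
    dependencies = build_dependency_map(definitions)
    return [name for name in sorted(dependencies) if name not in completed]
```

With two definitions where only the first has completed, `migrations_to_trigger` returns just the second one, and returns nothing when `auto_start` is false.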
Running a migration
When a migration is triggered, the following happens:
- A task is dispatched to Celery to run this migration in the background
- The following basic checks are performed to ensure the migration can indeed run:
  - We're not over the concurrent migrations limit
  - The migration can be run with the current PostHog version
  - The migration is not already running
  - The service version requirements are met (e.g. X < ClickHouse version < Y)
  - The migration's
  - The migration's
  - The migration's dependency (if any) has been completed
- We run through each of the operations in order
- Every 30 minutes, a Celery task performs a healthcheck, to ensure that:
  - The Celery process running the migration didn't crash
  - The migration's healthcheck still passes
Note: Async migrations can also be run synchronously (i.e. not in Celery) using the async migrations CLI (WIP) or the Django shell.
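To make the pre-run checks concrete, here is a rough sketch of how they might be evaluated in sequence. The `Migration` and `InstanceState` classes, their field names, and the `can_run` function are all hypothetical. The sketch covers only the concurrency, PostHog version, already-running, and dependency checks, and leaves out the service version check.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

Version = Tuple[int, int, int]


@dataclass
class Migration:
    """Hypothetical view of a migration definition."""
    name: str
    posthog_min_version: Version
    posthog_max_version: Version
    depends_on: Optional[str] = None


@dataclass
class InstanceState:
    """Hypothetical snapshot of the instance at trigger time."""
    posthog_version: Version
    running: Set[str]
    completed: Set[str]
    max_concurrent: int = 1


def can_run(migration: Migration, state: InstanceState) -> Tuple[bool, str]:
    """Run the checks in order and report the first one that fails."""
    if len(state.running) >= state.max_concurrent:
        return False, "concurrent migrations limit reached"
    if not (migration.posthog_min_version <= state.posthog_version <= migration.posthog_max_version):
        return False, "not runnable on this PostHog version"
    if migration.name in state.running:
        return False, "already running"
    if migration.depends_on is not None and migration.depends_on not in state.completed:
        return False, "dependency not completed"
    return True, "ok"
```

Returning a reason alongside the boolean makes it easy to surface why a migration was refused, though the real checks may report failures differently.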
Stopping a migration
A migration can be stopped from the async migrations management page or by issuing a command via Celery's app control to terminate the process running the task.
If a migration is stopped for any reason (manual trigger or error), we will attempt to roll back the migration following the operations specified in reverse order from the last started operation.
If a rollback succeeds, the migration status will be updated to reflect this.
If a migration errors, the error message is added to the migration's database record and we automatically trigger a rollback.
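The rollback behaviour described above could look something like the following sketch. `Operation`, `run_with_rollback`, and the status strings are hypothetical, and the sketch assumes that rollbacks themselves do not fail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Operation:
    """Hypothetical operation: a forward step plus its rollback."""
    fn: Callable[[], None]
    rollback_fn: Callable[[], None]


def run_with_rollback(operations: List[Operation]) -> Dict[str, Optional[str]]:
    """Run operations in order; on error, roll back in reverse order
    starting from the last started operation."""
    last_started = -1
    try:
        for index, operation in enumerate(operations):
            last_started = index
            operation.fn()
    except Exception as exc:
        # Record the error, then undo everything that was started.
        error = str(exc)
        for operation in reversed(operations[: last_started + 1]):
            operation.rollback_fn()
        return {"status": "rolled_back", "last_error": error}
    return {"status": "completed", "last_error": None}
```

Note that the operation that raised is also rolled back, since it was started and may have partially applied.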
Scope and limitations
The initial implementation of async migrations targets only data migrations, and assumes that the migration is used as a mechanism to help users move into a new default state.
For example, when we moved our ClickHouse person_distinct_id table to a CollapsingMergeTree, we updated the SQL for creating the table, and wrote a migration to help users on the old schema migrate to the new schema.
However, users that did a fresh deploy of PostHog after this change already had the table with the new schema created by default.
This is the only type of operation that async migrations currently support, to prevent a complex web of dependencies between migration types.
As such, those writing an async migration should write a sensible is_required function that determines if the migration should run or not.
Thus, when a user deploys a new PostHog instance, we will first run all EE migrations in order, and then all of the async migrations in order. At this step, async migrations should be skipped if the codebase already contains updated default schemas.
For instance, here's a good is_required function, which ensures the migration will only run if the table does not already exist.
    def is_required(self):
        # sync_execute is assumed to return a list of row tuples, so the count sits at [0][0]
        result = sync_execute("SELECT count(*) FROM system.tables WHERE database='posthog' AND name='table_x_new'")
        return result[0][0] == 0
is_required functions can also take table schemas into consideration, for example by checking the output of SHOW CREATE TABLE in ClickHouse.
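As a sketch of the schema-based approach, the function below inspects the CREATE statement to decide whether the migration is needed. It takes the query function as a parameter so the example stays self-contained. The `execute` parameter and the standalone `is_required` signature are assumptions made for illustration, as is the premise that the query returns a single row whose first column holds the statement text.

```python
from typing import Callable, List, Tuple

Rows = List[Tuple[str, ...]]


def is_required(execute: Callable[[str], Rows], table: str = "person_distinct_id") -> bool:
    """Return True if the table is still on the old engine.

    Assumption: execute() returns rows as tuples and SHOW CREATE TABLE
    yields one row with the CREATE statement in its first column.
    """
    rows = execute(f"SHOW CREATE TABLE {table}")
    create_statement = rows[0][0]
    # The migration is only needed if the new engine is not in place yet.
    return "CollapsingMergeTree" not in create_statement
```

In a real migration, the query function would be the project's ClickHouse client helper, and a plain substring match may need tightening if the engine name can appear elsewhere in the statement.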
The codebase is structured as follows:
- The Django ORM (Postgres) model for storing metadata about async migrations.
- API for requesting data about async migrations as well as triggering starts, stops, and rollbacks.
- Celery tasks for dealing with async migrations. These are:
  - run_async_migration: Explicitly triggered to run a migration
  - check_async_migration_health: Runs every 30 minutes to perform a healthcheck
- Classes to be used when writing an async migration, outlining the necessary components of a migration.
- Code that runs when the Django server boots to set up the necessary scaffolding for async migrations.
- Code related to running an async migration, from executing operations in sequence to attempting rollbacks.
- Code to support the runner in tasks that do not depend on the availability of the migration definition (module).