`0002_fill_distinct_id2` is an async migration that copies data from the old `person_distinct_id` table to the new `person_distinct_id2` table. This is needed for faster `person_distinct_id` queries: the old schema worked off of (`distinct_id`, `person_id`) pairs, which made it expensive for our analytics queries, which need to map from `distinct_id` to the latest `person_id`. The new schema works off of `distinct_id` columns, leveraging ReplacingMergeTrees with a version column we store in Postgres.
We migrate teams one-by-one to avoid running out of memory.
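
To illustrate why a version column makes the lookup cheaper, here is a minimal in-memory sketch of the "highest version wins" rule that a ReplacingMergeTree applies to rows sharing a sorting key. It is not PostHog code: the `Row` type, the `latest_rows` helper, and the choice of (`team_id`, `distinct_id`) as the key are assumptions made for the example.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Row(NamedTuple):
    # Hypothetical row shape, used only for this illustration.
    team_id: int
    distinct_id: str
    person_id: str
    version: int


def latest_rows(rows: Iterable[Row]) -> Dict[Tuple[int, str], Row]:
    """Keep only the highest-version row for each (team_id, distinct_id) key."""
    latest: Dict[Tuple[int, str], Row] = {}
    for row in rows:
        key = (row.team_id, row.distinct_id)
        if key not in latest or row.version > latest[key].version:
            latest[key] = row
    return latest


rows = [
    Row(1, "anon-1", "person-a", 0),
    Row(1, "anon-1", "person-b", 1),  # same key, re-mapped later with a higher version
    Row(2, "anon-1", "person-c", 0),  # same distinct_id, but a different team
]
resolved = latest_rows(rows)
```

With one surviving row per key, mapping a `distinct_id` to its latest `person_id` becomes a single lookup rather than a scan over every historical pair.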
The migration strategy:
1. Write any new updates to both tables.
2. Insert all non-deleted (`team_id`, `distinct_id`, `person_id`) rows from `person_distinct_id` into `person_distinct_id2` (this migration).
3. Once the migration has run, we only read from and write to `person_distinct_id2`.
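
The copy step of this strategy can be sketched roughly as below, using plain Python lists in place of the real tables. This is an illustration under stated assumptions, not the migration's actual code: the `is_deleted` flag and the `copy_team` and `migrate` helpers are hypothetical names introduced for the example.

```python
from typing import Dict, List

Record = Dict[str, object]


def copy_team(old_rows: List[Record], new_table: List[Record], team_id: int) -> int:
    """Copy one team's non-deleted rows into the new table; return how many were copied."""
    copied = 0
    for row in old_rows:
        # Skip other teams' rows and anything marked deleted (hypothetical flag).
        if row["team_id"] != team_id or row["is_deleted"]:
            continue
        new_table.append(
            {
                "team_id": row["team_id"],
                "distinct_id": row["distinct_id"],
                "person_id": row["person_id"],
            }
        )
        copied += 1
    return copied


def migrate(old_rows: List[Record], new_table: List[Record]) -> Dict[int, int]:
    """Process teams one at a time so each batch stays small."""
    team_ids = sorted({row["team_id"] for row in old_rows})
    return {team_id: copy_team(old_rows, new_table, team_id) for team_id in team_ids}


old_table: List[Record] = [
    {"team_id": 1, "distinct_id": "a", "person_id": "p1", "is_deleted": False},
    {"team_id": 1, "distinct_id": "b", "person_id": "p2", "is_deleted": True},
    {"team_id": 2, "distinct_id": "a", "person_id": "p3", "is_deleted": False},
]
new_table: List[Record] = []
copied_per_team = migrate(old_table, new_table)
```

Because the old table is never modified here, a failed or partial run leaves the existing data intact, which matches the reasoning that the new table is not used until the migration completes.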
Is it dangerous for this migration to be in an errored state?
No, the migration copies data to the new table, but that new table is not used until the migration has successfully completed.