Historical migrations overview

Prior to starting a historical data migration, ensure you do the following:

  1. Create a project on our US or EU Cloud.
  2. Sign up for a paid product analytics plan on the billing page (historical imports are free, but a paid plan unlocks the necessary features).
  3. Raise an in-app support request (Target Area: Data Management) detailing where you are sending events from, how, the total volume, and the speed. For example, "we are migrating 30M events from a self-hosted instance to EU Cloud using the migration scripts at 10k events per minute."
  4. Wait for the OK from our team before starting the migration process to ensure that it completes successfully and is not rate limited.
  5. Set the historical_migration option to true when capturing events in the migration.

A historical migration refers to ingesting and importing past data into PostHog for analysis. This includes:

  • Migrating from a different tool or platform like Mixpanel or Amplitude
  • Migrating from a self-hosted PostHog instance to PostHog Cloud
  • Migrating from one PostHog Cloud instance to another, for example US to EU
  • Adding past data from a third-party source into PostHog

Migrating historical data is free, but you must contact sales@posthog.com if you plan to migrate more than 20M events (or 10k events per minute) to avoid being rate-limited.

What about exporting data from PostHog? To export data from PostHog to external services like S3 or BigQuery, use our batch export feature.

The basics of migrating data into PostHog

Start your migration by formatting your data correctly. There is no way to selectively delete event data in PostHog, so getting this right is critical. This means:

  • Using the correct event names. For example, to capture a pageview event in PostHog, you capture a $pageview event. This might be different from the "name" other services use.

  • Including the timestamp field. This ensures your events are ingested with the correct time in PostHog. It needs to be in the ISO 8601 format.

  • Using the correct distinct_id. This is the unique identifier for your user in PostHog. Every event needs one. For example, posthog-js automatically generates a uuidv7 value for anonymous users.
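
For example, a correctly formatted pageview event in the batch format used later on this page might look like the following; the distinct ID, URL, and timestamp values are illustrative:

Python
event = {
    "event": "$pageview",  # PostHog's event name for pageviews
    "properties": {
        "distinct_id": "user_123",  # the user's unique identifier in PostHog
        "$current_url": "https://example.com/pricing"
    },
    "timestamp": "2024-04-02T12:00:00+00:00"  # ISO 8601 with an explicit offset
}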

To capture events, you must use the PostHog Python SDK or the PostHog API batch endpoint with the historical_migration option set to true. This ensures we handle this data correctly and you aren't charged standard ingestion fees for it.

An example Python implementation looks like this:

Python
from posthog import Posthog
from datetime import datetime

# historical_migration=True routes these events through the free historical import pipeline
posthog = Posthog(
    '<ph_project_api_key>',
    host='https://us.i.posthog.com',
    debug=True,
    historical_migration=True
)

events = [
    {
        "event": "batched_event_name",
        "properties": {
            "distinct_id": "user_id",
            "timestamp": datetime.fromisoformat("2024-04-02T12:00:00")
        }
    },
    {
        "event": "batched_event_name",
        "properties": {
            "distinct_id": "user_id",
            "timestamp": datetime.fromisoformat("2024-04-02T12:00:00")
        }
    }
]

for event in events:
    posthog.capture(
        distinct_id=event["properties"]["distinct_id"],
        event=event["event"],
        properties=event["properties"],
        timestamp=event["properties"]["timestamp"],
    )

# Flush any queued events before the script exits
posthog.shutdown()

An example cURL implementation using the batch API endpoint looks like this:

Terminal
curl -v -L --header "Content-Type: application/json" -d '{
    "api_key": "<ph_project_api_key>",
    "historical_migration": true,
    "batch": [
        {
            "event": "batched_event_name",
            "properties": {
                "distinct_id": "user_id"
            },
            "timestamp": "2024-04-03T12:00:00Z"
        },
        {
            "event": "batched_event_name",
            "properties": {
                "distinct_id": "user_id"
            },
            "timestamp": "2024-04-03T12:00:00Z"
        }
    ]
}' https://app.posthog.com/batch/
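
If you are scripting the import yourself rather than using the SDK, the same request can be sent from Python. This is a minimal sketch that assumes the requests library and reuses the payload shape from the cURL example above:

Python
import requests

# Same payload shape as the cURL example above
payload = {
    "api_key": "<ph_project_api_key>",
    "historical_migration": True,
    "batch": [
        {
            "event": "batched_event_name",
            "properties": {"distinct_id": "user_id"},
            "timestamp": "2024-04-03T12:00:00Z"
        }
    ]
}

response = requests.post("https://app.posthog.com/batch/", json=payload, timeout=30)
response.raise_for_status()  # a non-2xx response means the batch was not accepted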

Best practices for migrations

  • Separate exporting your data out of your existing service from importing it into PostHog. Store the exported data in a storage service like S3 or GCS in between, so you can confirm the export is complete before importing it.

  • Build resumability into your exports and imports, so you can pick up from the last successful point if anything fails partway through.

  • Create a new PostHog project and test the migration against it before running the migration against your production project.

  • To batch user property updates, use the same request format with the $identify event. The same goes for groups and the $group_identify event (see the sketch after this list).

  • If you're running a migration that is more than 20M events (or 10k events per minute), talk to us at sales@posthog.com to avoid being rate-limited.
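
As a minimal sketch of the $identify and $group_identify pattern mentioned above, reusing the posthog client configured in the Python example earlier (the property names and values below are illustrative):

Python
# Batch person property updates by capturing $identify events with a $set payload
posthog.capture(
    distinct_id="user_id",
    event="$identify",
    properties={"$set": {"email": "user@example.com", "plan": "scale"}},
    timestamp=datetime.fromisoformat("2024-04-02T12:00:00"),
)

# Group property updates follow the same pattern with $group_identify
posthog.capture(
    distinct_id="user_id",
    event="$group_identify",
    properties={
        "$group_type": "company",
        "$group_key": "company_id_123",
        "$group_set": {"name": "Example Inc."},
    },
    timestamp=datetime.fromisoformat("2024-04-02T12:00:00"),
)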
