Session replay architecture
Session recording architecture: ingestion → processing → serving
1. Capture (client-side)
PostHog-JS uses rrweb (record and replay the web) to:
- Serialize DOM into JSON snapshots
- Capture full snapshots (complete DOM state) + incremental snapshots (mutations/interactions)
- Track clicks, keypresses, mouse activity, console logs, network requests
- Batch events into $snapshot_items arrays with a $session_id (UUIDv7)
- Send to /s/ (the replay capture endpoint) via $snapshot events
Events include metadata: $window_id, $session_id, $snapshot_source (Web/Mobile), timestamps, distinct_id
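A rough TypeScript sketch of that payload shape, pieced together from the fields above (the exact posthog-js types differ; rrweb type codes shown for orientation only):

```typescript
// Illustrative only: approximate shape of a batched $snapshot capture event
interface SnapshotEvent {
    event: '$snapshot'
    properties: {
        $session_id: string // UUIDv7
        $window_id: string
        $snapshot_source: string // "web" or "mobile"
        $snapshot_items: RRWebEvent[] // serialized full + incremental rrweb snapshots
        distinct_id: string
    }
}

// Minimal view of an rrweb event: numeric type, payload, timestamp
interface RRWebEvent {
    type: number // e.g. 2 = FullSnapshot, 3 = IncrementalSnapshot
    data: unknown
    timestamp: number // epoch ms
}
```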
2. Ingestion pipeline
Phase 1: Rust capture service (recordings mode)
rust/capture/src/router.rs:235 and rust/capture/src/v0_endpoint.rs:342
- Separate capture service instance running with CAPTURE_MODE=recordings
- Receives POSTs to the /s/ endpoint (routed to the recording handler)
- Validates session_id (rejects IDs longer than 70 chars or containing non-alphanumeric characters other than hyphens)
- Calls process_replay_events to transform events into $snapshot_items format
- Publishes to the session_recording_snapshot_item_events Kafka topic
- Or to session_recording_snapshot_item_overflow if the session is billing-limited (checked via Redis) or if the session id is present in the Redis key @posthog/capture-overflow/replay (operational/load management) - see the sketch below
Kafka sink (rust/capture/src/sinks/kafka.rs):
- Primary topic: KAFKA_SESSION_RECORDING_SNAPSHOT_ITEM_EVENTS
- Overflow topic: KAFKA_SESSION_RECORDING_SNAPSHOT_ITEM_OVERFLOW
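The handler itself is Rust; the sketch below just re-expresses the validation and topic-routing rules above in TypeScript, with the Redis checks stubbed out as hypothetical helpers:

```typescript
const MAIN_TOPIC = 'session_recording_snapshot_item_events'
const OVERFLOW_TOPIC = 'session_recording_snapshot_item_overflow'

// Mirrors the validation rule above: at most 70 characters, alphanumerics and hyphens only
function isValidSessionId(sessionId: string): boolean {
    return sessionId.length <= 70 && /^[A-Za-z0-9-]+$/.test(sessionId)
}

// Hypothetical helpers standing in for the Redis lookups described above
declare function isBillingLimited(teamId: number): Promise<boolean>
declare function isFlaggedForOverflow(sessionId: string): Promise<boolean> // membership in @posthog/capture-overflow/replay

// Route to the overflow topic when the team is billing-limited or the session is flagged
async function resolveTopic(teamId: number, sessionId: string): Promise<string> {
    const overflow = (await isBillingLimited(teamId)) || (await isFlaggedForOverflow(sessionId))
    return overflow ? OVERFLOW_TOPIC : MAIN_TOPIC
}
```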
Phase 2: Blob ingestion consumer (Node.js/TypeScript)
plugin-server/src/main/ingestion-queues/session-recording-v2/
SessionRecordingIngester consumes from Kafka and:
- Parses gzipped/JSON messages (kafka/message-parser.ts)
- Batches by session via SessionBatchRecorder
- Buffers events in memory per session using SnappySessionRecorder (sketched below):
  - Accumulates events as newline-delimited JSON: [windowId, event]\n[windowId, event]\n...
  - Tracks metadata: click_count, keypress_count, URLs, console logs, active_milliseconds
  - Compresses each session block with Snappy
- Flushes periodically (max 10 seconds buffer age or 100 MB buffer size)
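A minimal sketch of that buffering scheme, assuming the snappy npm package; the class and method names are illustrative, not the actual SnappySessionRecorder API:

```typescript
import { compressSync } from 'snappy'

// Accumulates one session's events as newline-delimited JSON, then emits a Snappy-compressed block
class SessionBuffer {
    private lines: Buffer[] = []
    private bytes = 0
    readonly createdAt = Date.now()

    add(windowId: string, event: unknown): void {
        const line = Buffer.from(JSON.stringify([windowId, event]) + '\n')
        this.lines.push(line)
        this.bytes += line.length
    }

    // Flush when the buffer is too old or too large (thresholds from the text above)
    shouldFlush(maxAgeMs = 10_000, maxBytes = 100 * 1024 * 1024): boolean {
        return Date.now() - this.createdAt >= maxAgeMs || this.bytes >= maxBytes
    }

    toCompressedBlock(): Buffer {
        return compressSync(Buffer.concat(this.lines))
    }
}
```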
Persistence (sessions/s3-session-batch-writer.ts):
- Writes to S3 as multipart uploads
- File structure: {prefix}/{timestamp}-{suffix}
- Each batch file contains multiple compressed session blocks
- Uses byte-range URLs: s3://bucket/key?range=bytes=start-end (see the sketch below)
- Retention-aware: writes to different S3 paths based on team retention period
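A sketch of how block locations inside a batch file can be tracked so each session block stays addressable by byte range (helper names are assumptions; only the URL format comes from the text above):

```typescript
interface SessionBlockRef {
    sessionId: string
    url: string // s3://bucket/key?range=bytes=start-end
}

// Concatenate compressed session blocks into one batch file and record each block's byte range
function buildBatchFile(
    bucket: string,
    key: string,
    blocks: Array<{ sessionId: string; data: Buffer }>
): { body: Buffer; refs: SessionBlockRef[] } {
    const refs: SessionBlockRef[] = []
    let offset = 0
    for (const block of blocks) {
        const start = offset
        const end = offset + block.data.length - 1 // byte ranges are inclusive
        refs.push({ sessionId: block.sessionId, url: `s3://${bucket}/${key}?range=bytes=${start}-${end}` })
        offset = end + 1
    }
    return { body: Buffer.concat(blocks.map((b) => b.data)), refs }
}
```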
Metadata written to ClickHouse via Kafka:
- Produces to the clickhouse_session_replay_events topic
- Table: session_replay_events (AggregatingMergeTree, sharded)
- Stores: session_id, team_id, distinct_id, timestamps, URLs, counts (clicks/keypresses/console), block locations, retention_period_days
- Old format also used: session_recording_events (deprecated, contains raw snapshot_data)
3. Storage schema
ClickHouse tables
session_replay_events (primary, v2):
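The DDL is not reproduced here; approximately, each row aggregates one session's metadata, sketched below as a TypeScript interface (column names and types are approximations based on the fields listed in the ingestion section):

```typescript
// Approximate per-session row shape (not the exact ClickHouse schema)
interface SessionReplayEventRow {
    session_id: string
    team_id: number
    distinct_id: string
    first_timestamp: string
    last_timestamp: string
    urls: string[]
    click_count: number
    keypress_count: number
    console_log_count: number
    block_urls: string[] // byte-range S3 locations of the session's compressed blocks
    retention_period_days: number
}
```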
session_recording_events (legacy):
- Stored raw snapshot_data directly in ClickHouse
- Deprecated and effectively unused (except perhaps very old, long-lived hobby installs, which are unlikely and unsupported)
PostgreSQL
PostgreSQL writes happen when:
- User pins to playlist → Immediate write
- User requests persistence → Immediate write + background LTS copy task
- Auto-trigger on save → Background LTS copy task (via post_save signal)
- Periodic sweep → Finds recordings 24hrs-90days old without LTS path, queues background tasks
Note: Regular session recordings (not pinned or persisted) do NOT write to PostgreSQL; they exist only in the ClickHouse session_replay_events table until explicitly pinned or persisted as LTS.
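As an illustration of the periodic sweep's selection rule mentioned above (the real task is a Django/Celery background job; field names here are assumptions):

```typescript
// Hypothetical predicate: which recordings does the sweep queue for an LTS copy?
const DAY_MS = 24 * 60 * 60 * 1000

interface RecordingMeta {
    sessionId: string
    startTime: number // epoch ms
    ltsPath: string | null // object storage path if already persisted
}

function needsLtsCopy(rec: RecordingMeta, now: number = Date.now()): boolean {
    const ageMs = now - rec.startTime
    // 24 hours to 90 days old, and not yet copied to long-term storage
    return rec.ltsPath === null && ageMs >= DAY_MS && ageMs <= 90 * DAY_MS
}
```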
posthog_sessionrecording model:
- session_id (unique), team_id
- object_storage_path (for LTS recordings)
- full_recording_v2_path
- Metadata: duration, active_seconds, click_count, start_time, end_time, distinct_id
- Used for persisted/LTS recordings
S3 object storage
- Main storage: session_recordings/{team_id}/{session_id}/...
- Blob ingestion: organized by retention period + timestamp
- Files are byte-addressable compressed session blocks
4. Playback/Retrieval
API Flow (posthog/session_recordings/session_recording_api.py)
GET /api/projects/:id/session_recordings/:session_id/:
- Loads metadata from the ClickHouse session_replay_events table or Postgres
- Returns: duration, start_time, person info, viewed status
GET /api/projects/:id/session_recordings/:session_id/snapshots:
Two-phase fetch:
- Phase 1: Returns available sources: ["blob"] or ["blob", "realtime"] (note: the "realtime" source is no longer used for blob-ingested recordings and is possibly only used on Hobby)
- Phase 2: Client requests ?source=blob
Source resolution:
- Blob (primary): Queries ClickHouse for block metadata
  - Gets S3 URLs with byte ranges for each session block
  - Generates pre-signed URLs (60s expiry)
  - Client fetches compressed blocks directly from S3
- Legacy: Recordings were originally stored in the ClickHouse session_recording_events table (migrated away in 2024)
Query (queries/session_replay_events.py):
Returns block listing:
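The listing itself isn't reproduced here; roughly, it maps each stored block to its byte-range location and time span, e.g. (field names are an assumption):

```typescript
// Approximate shape of the block listing for one session
interface SessionBlockListing {
    session_id: string
    blocks: Array<{
        url: string // S3 location with ?range=bytes=start-end, pre-signed before returning to the client
        first_timestamp: string
        last_timestamp: string
    }>
}
```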
Frontend playback
frontend/src/scenes/session-recordings/player/
- sessionRecordingPlayerLogic fetches the available snapshot sources (only blob_v2 now, except on Hobby)
- Fetches snapshots for each snapshot source
- Decompresses Snappy blocks
- Parses JSONL: one [windowId, event] pair per line
- Feeds events to rrweb-player for DOM reconstruction (sketched below)
- Renders in an iframe with timeline controls
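A condensed sketch of that flow, assuming the snappyjs decoder and the rrweb-player package; the endpoint paths follow the API section above, while the response field names (sources, blocks, url) are assumptions:

```typescript
import { uncompress } from 'snappyjs' // assumption: any Snappy decoder with this shape works
import rrwebPlayer from 'rrweb-player'

// Hypothetical helper wrapping the two-phase snapshot API described above
async function loadEvents(projectId: string, sessionId: string): Promise<any[]> {
    const base = `/api/projects/${projectId}/session_recordings/${sessionId}/snapshots`

    // Phase 1: discover available sources, e.g. ["blob"]
    const { sources } = await (await fetch(base)).json()

    // Phase 2: fetch the chosen source; the response points at pre-signed, byte-ranged block URLs
    const { blocks } = await (await fetch(`${base}?source=${sources[0]}`)).json()

    const events: any[] = []
    for (const block of blocks) {
        const compressed = await (await fetch(block.url)).arrayBuffer()
        const jsonl = new TextDecoder().decode(uncompress(new Uint8Array(compressed)))
        for (const line of jsonl.split('\n')) {
            if (!line) continue
            const [, event] = JSON.parse(line) // each line is a [windowId, event] pair
            events.push(event)
        }
    }
    return events
}

// Feed the reconstructed rrweb events into rrweb-player for playback
async function play(projectId: string, sessionId: string, target: HTMLElement): Promise<void> {
    const events = await loadEvents(projectId, sessionId)
    new rrwebPlayer({ target, props: { events } })
}
```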
Metadata (playerMetaLogic.tsx):
- Shows person info, properties, events, console logs
- Queries events from the events table filtered by session_id
Key optimizations
- Compression: Snappy for session blocks
- Byte-range fetching: Only fetch needed time ranges from S3
- Pre-signed URLs: Direct client→S3 download, no proxying
- Buffering: 10 second batches reduce S3 write ops
- Sharding: ClickHouse sharded by distinct_id
- TTL: Automatic expiry based on retention_period_days
- Overflow handling: Separate Kafka topic + limiter for billing control