Azure Blob Storage destination for batch exports
With batch exports, data can be exported to Azure Blob Storage.
The Azure Blob Storage destination is currently in beta. This means the configuration and features are subject to change.
Setting up Azure Blob Storage access
To set up a batch export to Azure Blob Storage, you'll need:
- An Azure Storage account with a blob storage container where PostHog can export data.
- A connection string to authenticate PostHog with your Azure Storage account.
Getting your connection string
To retrieve your connection string from the Azure Portal:
- Navigate to your Storage account.
- Go to Security + networking > Access keys.
- Copy the connection string from either the primary or secondary key.
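A connection string is a semicolon-separated list of key=value pairs (for example `DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net`). As a minimal sketch of what PostHog receives, here is how such a string breaks down into its parts; the account name and key below are placeholders, not real credentials:

```python
# Sketch: split an Azure Storage connection string into its key=value parts.
def parse_connection_string(conn_str: str) -> dict:
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue
        # partition (not split) keeps '=' padding inside the AccountKey intact
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts

example = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=mystorageaccount;"   # placeholder account name
    "AccountKey=abc123==;"            # placeholder key
    "EndpointSuffix=core.windows.net"
)
parsed = parse_connection_string(example)
print(parsed["AccountName"])  # mystorageaccount
```

Either the primary or secondary connection string works; rotating one key does not invalidate the other, which allows zero-downtime key rotation.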
Inbound IP addresses
We use a fixed set of IP addresses to access your storage account. To ensure this destination works, add these IPs to your inbound security rules:
| US | EU |
|---|---|
| 44.205.89.55 | 3.75.65.221 |
| 44.208.188.173 | 18.197.246.42 |
| 52.4.194.122 | 3.120.223.253 |
If your Azure Storage account has firewall rules enabled, you'll need to add these IP addresses to your allowlist. For more information on configuring Azure Storage firewall rules, see the Azure Storage network security documentation.
Models
Azure Blob Storage supports all the models mentioned in the batch export models reference.
You can view the schema for each model inside the batch export configuration in the UI.
Note: New fields may be added to these models over time. Therefore, it is recommended that any downstream processes are able to handle additional fields being added to the exported files.
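One way to keep a downstream process resilient to new fields is to select only the fields it needs and ignore everything else. A minimal sketch for a JSON lines export, where the field names are a hypothetical downstream schema rather than the full model:

```python
import io
import json

# Hypothetical set of fields this pipeline cares about; any extra
# fields in the exported records are simply ignored.
EXPECTED_FIELDS = ("uuid", "event", "timestamp")

def rows_from_jsonl(fileobj):
    for line in fileobj:
        record = json.loads(line)
        # Pick out only the known fields; unknown keys in `record` are dropped.
        yield {field: record.get(field) for field in EXPECTED_FIELDS}

# Stand-in for an exported file, including a field added after the
# pipeline was written.
data = io.StringIO(
    '{"uuid": "1", "event": "pageview", '
    '"timestamp": "2023-07-28T00:00:00Z", "new_field": 42}\n'
)
rows = list(rows_from_jsonl(data))
print(rows[0]["event"])  # pageview
```

The same idea applies to Parquet readers: select columns by name rather than by position, so newly appended columns don't shift your data.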
Creating the batch export
- Click Data pipelines in the navigation and go to the Destinations tab.
- Click + New destination in the top-right corner.
- Search for Azure Blob Storage.
- Click the + Create button.
- Fill in the necessary configuration details.
- Finalize the creation by clicking on "Create".
- Done! The batch export will schedule its first run at the start of the next period.
Azure Blob Storage configuration
Configuring a batch export targeting Azure Blob Storage requires the following Azure-specific configuration values:
- Azure connection: Select or configure your Azure Blob Storage connection using your connection string.
- Container name: The name of the Azure Blob Storage container where the data is to be exported. The container must already exist and follow Azure's naming rules.
- Blob prefix (optional): A prefix to use for each blob created. This prefix can include template variables to organize your data.
- Format: Select a file format to use in the export. See the file formats section for details on which file formats are supported.
- Max file size (MiB) (optional): If the size of the exported data exceeds this value, the data is split into multiple files. Note that this is approximate; the actual file size may be slightly larger. If this value is not set, or is set to 0, the data is exported as a single file.
- Compression: Select a compression method (like gzip, brotli, or zstd) to use for exported files or no compression. See the compression section for details on which compression methods are supported.
- Events to exclude: A list of events to omit from the exported data.
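Container names that don't follow Azure's naming rules will be rejected by Azure. Those rules (per Azure's documentation): 3-63 characters, lowercase letters, numbers, and hyphens only, starting and ending with a letter or number, with no consecutive hyphens. A minimal sketch of a pre-flight check:

```python
import re

# 3-63 chars, lowercase letters/numbers/hyphens, must start and end with a
# letter or number; the lookahead rejects consecutive hyphens.
CONTAINER_NAME = re.compile(r"^(?!.*--)[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")

def is_valid_container_name(name: str) -> bool:
    return bool(CONTAINER_NAME.match(name))

print(is_valid_container_name("posthog-exports"))  # True
print(is_valid_container_name("Bad_Name"))         # False (uppercase, underscore)
print(is_valid_container_name("ab"))               # False (too short)
```

Checking the name before creating the batch export avoids a failed first run caused by a container Azure won't accept.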
Blob prefix template variables
The blob prefix can include template variables that are resolved at runtime. Template variables are enclosed in curly brackets (for example, {day}). This allows you to partition files in your Azure Blob Storage container, for example by date.
Template variables include:
- Date and time variables: {year}, {month}, {day}, {hour}, {minute}, {second}
- Name of the table exported (for example, 'events' or 'persons'): {table}
- Batch export data bounds: {data_interval_start}, {data_interval_end}
For example, setting {year}-{month}-{day}_{table}/ as a blob prefix will produce files with keys prefixed like 2023-07-28_events/.
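The substitution happens inside PostHog at export time, but its effect can be sketched with plain string formatting (the variable values below are illustrative):

```python
# Sketch: how a templated blob prefix resolves at runtime. PostHog performs
# the real substitution; this only mirrors the documented behavior.
def render_prefix(template: str, **variables: str) -> str:
    return template.format(**variables)

prefix = render_prefix(
    "{year}-{month}-{day}_{table}/",
    year="2023", month="07", day="28", table="events",
)
print(prefix)  # 2023-07-28_events/
```

Because the prefix ends with a slash, tools that treat blob names as paths (including the Azure Portal) will display each day's export as its own folder.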
File formats
PostHog Azure Blob Storage batch exports support two file formats for exporting data:
- JSON lines
- Apache Parquet (only the latest version of the format specification is supported)
The batch export format is selected via a drop-down menu when creating or editing an export.
Compression
Each file format supports a variety of compression methods. The compression method you choose can have a significant effect on the exported file size and the overall time taken to export the data. Based on our internal testing, we recommend Parquet with zstd compression for the best combination of speed and file size.
The following compression methods are supported:
- Parquet: zstd, gzip, brotli, lz4, snappy
- JSONLines: gzip, brotli
Note on Parquet compression: The compression type is included in the file extension, even for Parquet files. For example, files compressed with zstd will have the extension .parquet.zst, lz4 files .parquet.lz4, and snappy files .parquet.sz. Since compression is embedded in the Parquet format itself, the file should be read directly as a Parquet file, not decompressed first.
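For JSON lines exports, by contrast, the compression is applied to the whole file, so a gzip-compressed export is read by decompressing it first. A minimal sketch using only the standard library, with an in-memory sample standing in for a downloaded blob:

```python
import gzip
import io
import json

# Stand-in for a downloaded .jsonl.gz blob: compress a small sample in memory.
sample = b'{"event": "pageview"}\n{"event": "click"}\n'
blob_bytes = gzip.compress(sample)

# Read the compressed export line by line, decoding each JSON record.
events = []
with gzip.open(io.BytesIO(blob_bytes), mode="rt") as f:
    for line in f:
        events.append(json.loads(line))

print(len(events))  # 2
```

Streaming through gzip.open line by line keeps memory usage flat even for large exports, since only one record is decoded at a time.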
Manifest file
If you specify a max file size in your configuration, the export may produce several files. To signal that the export is complete, we write a manifest.json file (with the same prefix as the data files) once all the data files have been exported. This manifest contains the key names of all the files exported.
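A downstream consumer can therefore wait for manifest.json to appear, then process exactly the files it lists. The exact JSON layout of the manifest isn't specified here, so the `{"files": [...]}` shape below is an assumption for illustration only:

```python
import json

# Stand-in for the downloaded manifest.json content. The "files" key is a
# hypothetical layout; the docs only say the manifest lists the exported keys.
manifest_blob = (
    '{"files": ["2023-07-28_events/0001.parquet",'
    ' "2023-07-28_events/0002.parquet"]}'
)

def files_from_manifest(raw: str) -> list[str]:
    manifest = json.loads(raw)
    return manifest["files"]  # hypothetical key name

keys = files_from_manifest(manifest_blob)
print(len(keys))  # 2
```

Processing only the keys listed in the manifest, rather than listing the container by prefix, avoids reading data files from an export that is still in progress.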