Use incremental or append only sync
Full table syncs are the most expensive way to sync data into the data warehouse, but aren't always necessary.
Both incremental and append only syncs reduce the number of rows synced as both account for existing data.
Sync less frequently
Not all synced data needs to be up to date to be useful. Think about what sort of data freshness you need and change the sync frequency to match.
PostHog offers sync options from every 5 minutes to every 12 hours to monthly. For incremental and full table syncs, decreasing the sync frequency can reduce the number of rows synced.
Audit and disable unused tables
Sources and tables continue to sync even if you aren't actively using them. This can cause unnecessary costs. You should regularly audit your sources and tables to see if you can disable ones you aren't using.
Beyond just looking at the source synced, you can use PostHog's query_log
table. This contains details on all queries run in your project. You can use this to see what tables are (or are not) being queried and how often.
For example, you can use the following query to identify potential unused or underused tables:
WITH cleaned AS (SELECTquery,-- Normalize SQL for easier parsingreplaceRegexpAll(replaceRegexpAll(replaceRegexpAll(lowerUTF8(query), '--[^\\n]*', ' '), '/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/', ' '), '\\s+', ' ') AS qFROM query_log),tail AS (SELECTquery,-- Take everything after the *last* " from "arrayElement(splitByString(' from ', q),length(splitByString(' from ', q))) AS after_fromFROM cleaned),tokens AS (SELECTquery,-- Extract the first token after FROMtrim(BOTH ' ' FROM replaceAll(replaceAll(arrayElement(splitByChar(',', arrayElement(splitByChar('(', arrayElement(splitByChar(' ', after_from), 1)), 1)), 1), '`',''), '"','')) AS tokFROM tail)SELECTtok AS table_after_from,count() AS hitsFROM tokensWHERE tok != ''GROUP BY table_after_fromORDER BY hits ASC
This query fetches your most queried tables by parsing the FROM
clause from every query. It's a bit complex, we know.


It effectively lists how often each table is referenced. This means it also includes views, subqueries, and other queries that reference tables, so it's not a perfect one-to-one-map, but it might be able to help you identify unused or underused tables and disable syncs for them.