You create SQL queries out of a list of commands that shape what data, filters, aggregations, and ordering we want to see.
SQL queries in PostHog don’t require the trailing semi-colon (
;) of traditional SQL queries.
SELECT to select data (usually columns, transformations, or aggregations) from one or more tables in the database.
SELECT *FROM events
DISTINCT clause to
SELECT commands to keep only unique rows in query results.
SELECT DISTINCT person_idFROM events
FROM to select the database table to run the queries on. In PostHog, examples include
raw_session_replay_events, and more listed in the data management database tab in PostHog.
SELECT session_id, min_first_timestamp, click_countFROM raw_session_replay_events
You can join multiple tables together using the
LEFT JOIN command which takes one table before the command and another after the command and combines them based on the join condition using the
SELECT events.event, persons.is_identifiedFROM eventsLEFT JOIN persons ON events.person_id = persons.id
WHERE to filter rows based on specified conditions. These conditions can be:
- Comparison operators like
- Logical operators like
NOT. These are often used to combine multiple conditions.
- Functions like
- Clauses like
SELECT *FROM eventsWHERE event = '$pageview'AND toDate(timestamp) = today()AND properties.$current_url LIKE '%/blog%'
GROUP BY to group rows that have the same values in specified columns into summary rows.
selectproperties.$os,count()from eventsgroup byproperties.$os
ORDER BY to sort the query results by one or more columns. You can specify order by ascending with
ASC or descending with
SELECTproperties.$browser,count()FROM eventsGROUP BY properties.$browserORDER BY count() DESC
AS to alias columns or tables with different names. This makes the query and results more readable.
SELECTproperties.$current_url as current_url,count() as current_url_countFROM eventsGROUP BY current_urlORDER BY current_url_count DESC
LIMIT to restrict the number of rows returned by the query. It specifies the maximum number of rows the query should retrieve. By default, PostHog sets it at
SELECTproperties.$lib as library,count() as library_countFROM eventsWHERE properties.$lib != ''GROUP BY libraryORDER BY library_count DESCLIMIT 1
HAVING with the
GROUP BY command to filter the results based on aggregate function values. While
WHERE filters rows before grouping,
HAVING filters grouped results after aggregation.
SELECTproperties.$os,count()FROM eventsGROUP BYproperties.$osHAVING count() > 100
WITH to define a temporary result set that you can reference within a larger query. It helps break down complex queries into smaller parts. You can think of it as a function that returns a temporary table similar to using a subquery in a
FROM command. The difference is that we query
WITH subqueries each time they are used, potentially leading to slower queries.
with first_query as (selectcount() as first_countfrom events)selectfirst_countfrom first_query
HogQL includes many functions to aggregate and manipulate queried data. Below are some examples of some of the most popular SQL functions you can use in your insights.
These aggregate results for columns across all rows. They include:
count(): Counts the number of rows that match a certain condition.
avg(): Calculates the average numeric value of a column.
sum(): Calculates the total (sum) numeric value of a column.
min(): Finds the maximum or minimum value of a column.
SELECTavg(properties.$screen_height),sum(properties.$screen_height),max(properties.$screen_height),min(properties.$screen_height)FROM eventsWHERE event = '$pageview' AND properties.$screen_height IS NOT NULL
You can find a full list of these in supported aggregations.
HogQL provides many functions for accessing, modifying, and calculating data from queries. Along with the one’s listed below, many basics include calculation operators (
*), type conversions (
toString), conditional statements (
multiIf), and rounding (
You can find a full list of these in supported ClickHouse functions.
You can access nested data in JSON and objects directly.
select properties.$set.$geoip_country_namefrom events
You can parse JSON with
JSONExtractRaw() to return a value.
SELECTJSONExtractRaw(properties.$set) as set_propertiesFROM eventsWHERE properties.$set IS NOT NULL
JSONExtract functions exist for different data types including:
arrayElement(arr, n): Retrieves the element with the index of n from the array
arrayJoin(arr): Takes a row and generates multiple rows for the number of elements in the array. It copies all the column values, except the column where this function is applied. It replaces the applied column with the corresponding array value.
SELECT flag, count()FROM (SELECT arrayJoin(JSONExtractArrayRaw(assumeNotNull(properties.$active_feature_flags))) as flagFROM eventsWHERE event = '$pageview' and timestamp > '2023-08-01')GROUP BY flagORDER BY count() desc
Date and time
yesterday(): Returns the current time, date, or yesterday’s date respectively.
interval: A length of time for use in arithmetic operations with other dates and times.
toMinute: Converts date number of day of week (1-7), hour in 24-hour time (0-23), and minute in hour (0-59).
toStartOfMinute: rounds date down to the nearest year, month, Monday, day, hour, or minute respectively
dateDiff('unit', startdate, enddate): Returns the count in
formatDateTime: Formats a time according to a MySQL datetime format string.
SELECTformatDateTime(now(), '%a %b %T') AS current_time,toDayOfWeek(now()) AS current_day_of_week,dateDiff('day', timestamp, now()) AS days_since_eventFROM eventsWHERE timestamp > now() - interval 1 day
Read more examples in How to do time-based breakdowns (hour, minute, real time) and Using HogQL for advanced time and date filters.
extract: Extracts a fragment of a string using a regular expression.
concat: Concatenates strings listed without separator.
splitByWhitespace: splits a string into substring separated by a specified character, string, regular expression, or whitespace character respectively.
match: Return whether the string matches a regular expression pattern.
replaceRegexpOne: Replace the first occurrence of matching a substring or regular expression pattern respectively with a replacement string.
trim: Remove specified characters (or whitespace) from the start or end of a string.
selectextract(elements_chain, '[:|"]attr__class="(.*?)"') as class_name,concat(properties.$os, ' version: ', properties.$os_version),replaceRegexpOne(properties.$current_url, '^/', 'site/') AS modified_current_urlfrom eventswhere event = '$autocapture'
A sparkline is a tiny graph contained in one cell of your query result. As an argument, it takes an array of integers.
SELECT sparkline(range(1, 10)) FROM (SELECT 1)
You can use it to visualize queries, such as a 24-hour
$pageview count for different
SELECTpageview,sparkline(arrayMap(h -> countEqual(groupArray(hour), h), range(0,23))),count() as pageview_countFROM(SELECTproperties.$current_url as pageview,toHour(timestamp) AS hourFROMeventsWHEREtimestamp > now () - interval 1 dayand event = '$pageview') subqueryGROUP BYpageviewORDER BYpageview_count desc
You can also use it for art.
selectsparkline(arrayMap(a -> cos(toSecond(timestamp) + a/4), range(100 + 5 * toSecond(timestamp))))from events