Hog

Last updated:

|Edit this page

Hog is the coolest programming language in the world (we're biased).

It is being used to build our CDP product, which you can follow along with in GitHub.

Note: Hog shouldn't be confused with HogQL, our SQL-like query language used inside PostHog. If you're looking to query data in PostHog, see those docs.

Quickstart

Hog was designed to be syntax-compatible with SQL / HogQL. This means all HogQL expressions work as Hog code.

This compatibility imposes a few key differences comparted to other programming languages:

  • Variable assignment in Hog is done with the := operator, as = and == are both used for equality comparisons in SQL
  • You must type out and, or and not. Currently && and ! raise syntax errors, whereas || is used as the string concatenation operator.
  • All arrays in Hog start from index 1. Yes, for real. Trust us, we know. However that's how SQL has always worked, so we adopted it.
  • The easiest way to debug your code is to print() the variables in question, and then check the logs.
  • Strings must always be written with 'single quotes'. You may use f-string templates like f'Hello {name}'.

Read more about the story behind Hog in Why hog? below.

Syntax

Comments

Hog comments start with //. You can also use SQL style comments with -- or C++ style multi line blocks with /*.

// Hog comments start with //
-- You can also use SQL style comments with --
/* or C++ style multi line
blocks */

Variables

Use := to assign a value to a variable because = is just equals in SQL and HogQL.

Rust
// assign 12 to myVar
let myVar := 12
myVar := 13
myVar := myVar + 1

Comparisons

On top of standard comparisons, like, ilike, not like, and not ilike work.

Rust
let myVar := 12
print(myVar = 12 or myVar < 10) // prints true
print(myVar < 12 and myVar > 12) // prints false
let string := 'mystring'
print(string ilike '%str%') // prints true

Regex

Compares strings against regex patterns. =~ matches exactly, =~* matches case insensitively, !~ does not match, and !~* does not match case insensitively.

Rust
print('string' =~ 'i.g$') // true
print('string' !~ 'i.g$') // false
print('string' =~* 'I.G$') // true, case insensitive
print('string' !~* 'I.G$') // false, case insensitive

Arrays

Supports both dot notation and bracket notation.

Arrays in Hog (and HogQL) are 1-indexed!

Rust
let myArray := [1,2,3]
print(myArray.2) // prints 2
print(myArray[2]) // prints 2

Tuples

Supports both dot notation and bracket notation.

Tuples in Hog (and HogQL) are 1-indexed!

Rust
let myTuple := (1,2,3)
print(myTuple.2) // prints 2
print(myTuple[2]) // prints 2

Objects

You must use single quotes for object keys and values.

Rust
let myObject := {'key': 'value'}
print(myObject.key) // prints 'value'
print(myObject['key']) // prints 'value'
print(myObject?.this?.is?.not?.found) // prints 'null'
print(myObject?.['this']?.['is']?.not?.found) // prints 'null'

Strings

Strings must always start end end with a single quote. Includes f-string support.

Rust
let str := 'string'
print(str || ' world') // prints 'string world', SQL concat
print(f'hello {str}') // prints 'hello string'
print(f'hello {f'{str} world'}') // prints 'hello string world'

Functions and lambdas

Functions are first class variables, just like in JavaScript. You can define them with fun, or inline as lambdas:

Rust
fun addNumbers(num1, num2) {
let newNum := num1 + num2
return newNum
}
print(addNumbers(1, 2))
let square := (a) -> a * a
print(square(4))

See Hog's standard library for a list of built-in functions.

Logic

Rust
let a := 3
if (a > 0) {
print('yes')
}

Ternary operations

Rust
print(a < 2 ? 'small' : 'big')

Nulls

Rust
let a := null
print(a ?? 'is null') // prints 'is null'

While loop

Rust
let i := 0
while(i < 3) {
print(i) // prints 0, 1, 2
i := i + 1
}

For loop

Rust
for(let i := 0; i < 3; i := i + 1) {
print(i) // prints 0, 1, 2
}

For-in loop

Rust
let arr = ['banana', 'tomato', 'potato']
for (let food in arr) {
print(food)
}
let obj = {'banana': 3, 'tomato': 5, 'potato': 6}
for (let food, value in arr) {
print(food, value)
}

Hog's standard library

Hog's standard library includes the following functions and will expand. To see the the most update-to-date list, check the Python VM's stl/__init__.py file.

Type conversion

  • toString(arg: any): string
  • toUUID(arg: any): UUID
  • toInt(arg: any): integer
  • toFloat(arg: any): float
  • toDate(arg: string | integer): Date
  • toDateTime(arg: string | integer): DateTime
  • tuple(...args: any[]): tuple
  • typeof(arg: any): string

Comparisons

  • ifNull(value: any, alternative: any)

String functions

  • print(...args: any[])
  • concat(...args: string[]): string
  • match(arg: string, regex: string): boolean
  • length(arg: string): integer
  • empty(arg: string): boolean
  • notEmpty(arg: string): boolean
  • lower(arg: string): string
  • upper(arg: string): string
  • reverse(arg: string): string
  • trim(arg: string, char?: string): string
  • trimLeft(arg: string, char?: string): string
  • trimRight(arg: string, char?: string): string
  • splitByString(separator: string, str: string, maxParts?: integer): string[]
  • jsonParse(arg: string): any
  • jsonStringify(arg: object, indent = 0): string
  • base64Encode(arg: string): string
  • base64Decode(arg: string): string
  • tryBase64Decode(arg: string): string
  • encodeURLComponent(arg: string): string
  • decodeURLComponent(arg: string): string
  • replaceOne(arg: string, needle: string, replacement: string): string
  • replaceAll(arg: string, needle: string, replacement: string): string
  • generateUUIDv4(): string
  • position(haystack: string, needle: string): integer
  • positionCaseInsensitive(haystack: string, needle: string): integer

Objects and arrays

  • length(arg: any[] | object): integer
  • empty(arg: any[] | object): boolean
  • notEmpty(arg: any[] | object): boolean
  • keys(arg: any[] | object): string[]
  • vaues(arg: any[] | object): string[]
  • indexOf(array: any[], elem: any): integer
  • has(array: any[], element: any)
  • arrayPushBack(arr: any[], value: any): any[]
  • arrayPushFront(arr: any[], value: any): any[]
  • arrayPopBack(arr: any[]): any[]
  • arrayPopFront(arr: any[]): any[]
  • arraySort(arr: any[]): any[]
  • arrayReverse(arr: any[]): any[]
  • arrayReverseSort(arr: any[]): any[]
  • arrayStringConcat(arr: any[], separator?: string): string
  • arrayMap(callback: (arg: any): any, array: any[]): any[]
  • arrayFilter(callback: (arg: any): boolean, array: any[]): any[]
  • arrayExists(callback: (arg: any): boolean, array: any[]): boolean
  • arrayCount(callback: (arg: any): boolean, array: any[]): integer

Date functions

  • now(): DateTime
  • toUnixTimestamp(input: DateTime | Date | string, zone?: string): float
  • fromUnixTimestamp(input: number): DateTime
  • toUnixTimestampMilli(input: DateTime | Date | string, zone?: string): float
  • fromUnixTimestampMilli(input: integer | float): DateTime
  • toTimeZone(input: DateTime, zone: string): DateTime | Date
  • toDate(input: string | integer | float): Date
  • toDateTime(input: string | integer | float, zone?: string): DateTime
  • formatDateTime(input: DateTime, format: string, zone?: string): string - we use use the ClickHouse formatDateTime syntax.
  • toInt(arg: any): integer - Converts arg to a 64-bit integer. Converts Dates into days from epoch, and DateTimes into seconds from epoch
  • toFloat(arg: any): float - Converts arg to a 64-bit float. Converts Dates into days from epoch, and DateTimes into seconds from epoch
  • toDate(arg: string | integer): Date - arg must be a string YYYY-MM-DD or a Unix timestamp in seconds
  • toDateTime(arg: string | integer): DateTime - arg must be an ISO timestamp string or a Unix timestamp in seconds

Cryptographic functions

  • md5Hex(arg: string): string
  • sha256Hex(arg: string): string
  • sha256HmacChainHex(arg: string[]): string

Why Hog?

Truth be told, we didn't set out to build our own programming language. We actually wanted to build the HogVM: a way to run HogQL expressions synchronously within our NodeJS ingestion pipeline. However, things escalated quickly.

HogVM

The HogVM is a stack-based bytecode virtual machine (VM). It was built to power event filtering in our real-time delivery system.

It's paired with a compiler that converts HogQL expressions like properties.$screen_width < properties.$screen_height into bytecode like ["_H", 1, 32, "$screen_height", 32, "properties", 1, 2, 32, "$screen_width", 32, "properties", 1, 2, 15, 35]. The HogVM then executes that bytecode.

The HogVM had a few design constraints:

  • Virtually no startup time (thousands of different expressions must run against millions of events every minute).
  • Fast enough to filter the firehose of our event stream without drowning us in hosting costs.
  • Runtime flexibility: the ability to embed Hog in many target languages like Python, NodeJS and the browser.
  • Small surface area that makes it easy maintain multiple implementations.
  • Secure enough to run user provided code.
  • Syntax compatible with HogQL.

Effectively, we wanted to run HogQL synchronously within a NodeJS process, at scale. So that's what we built, twice.

We now have two identical HogVMs: one in Python (to run alongside queries, unreleased), and one in TypeScript (to run in the pipeline, and in the browser). We might even build a third version in Rust soon. The surface area for the VM is really that small.

Async HogVM

We chose NodeJS for our old ingestion pipeline because we wanted to give users the ability to write their own custom JavaScript functions. That was a mistake. We never released this on Cloud because it was too risky:

  1. There's no way to run untrusted JavaScript without heavy sandboxing (our approach with VM2 fell short).
  2. User provided JavaScript is unpredictable and leaky (unfinished promises, data in globals, etc).
  3. Functions must run until the end. We can't pause or resume and must set short global runtime limits.

The "aha" moment came when we realized we could build a real programming language on top of HogVM to get around these limitations.

The key insight was this: we can pause and resume our own VM at any point, including right before fetch() calls. When a script calls fetch, we serialize the entire VM state (stack, bytecode, metadata), pass all of that onto the "fetch service" (via a queue like Kafka), push the response back onto the VM's stack, and resume execution on what might be a completely different machine. It doesn't matter if the fetch was called in a for loop or in an extremely recursive call stack, we can pause and resume the language at will. It's magic.

During the MyKonos hackathon in May we built the first version of Hog, and then productized it over the next 5 months as part of our new CDP product.

So that's why we built Hog: to run multi-hop user-provided async scripts in a safe, predictable, and scalable way.

Future Hog

These are the early days for Hog and the HogVM. We will be evolving the language and the experience based on user feedback, so please let us know what you think. We're excited to see what you're going to build with it.

Running Hog locally

To run Hog, first, you need to clone and set up PostHog locally. The repo has VMs to run the source code and complied bytecode as well as example files. The default VM relies on PostHog's Python dependencies, but we also have a Typescript VM that relies on those dependencies.

Once you have PostHog set up, go into the repo and run bin/hog with a .hog file.

Terminal
cd posthog
bin/hog hogvm/__tests__/mandelbrot.hog

You can add the --debug flag to step through and see the stack trace.

Compiling Hog

You can compile a .hog file to a .hoge executable with bin/hoge.

Terminal
bin/hoge hogvm/__tests__/mandelbrot.hog

You can then run the complied .hoge file automatically with bin/hog.

Terminal
bin/hog hogvm/__tests__/mandelbrot.hoge

Questions?

Was this page useful?

Next article

Notebooks

Notebooks enable you to combine the tools of PostHog into a single space for analysis. Rather than jumping between dashboards, insights, replays, experiment results, and more, your chain of analysis exists on a single page. It provides a flexible interface for organizing ad-hoc analysis, bug investigations, internal documentation, and feature releases by yourself or with your team. You can get started with notebooks by going to the notebook tab in your PostHog instance and clicking "New…

Read next article