Building an open source data stack
Oct 19, 2022
At PostHog, we believe an open source approach doesn’t just lead to greater growth; it also leads to better products. That’s what inspired us to make our platform open source and to adopt a transparent company culture, and it’s why we try to use open source software wherever we can in our stack.
We think the open source approach is best because it forces teams to be transparent, both in their decision making and also in their implementation. Then, because other teams have full visibility of the code, they can build on it and make it even better, faster and stronger. Often, they can do this for free.
Of course, PostHog isn’t the only open-source analytics platform out there. In fact, there’s such variety that it’s possible to build an entirely open-source stack — and here are some of our favorite open source alternatives for engineers.
Not an engineer? Find other open source alternatives on the PostHog blog!
PostHog
- Useful for: Product analytics, session recording, feature flags
- Alternative to: Amplitude, Mixpanel, Matomo
- License: MIT
Of course, PostHog may not be the only open source software useful to engineers, but it’s our (completely biased) favorite.
PostHog’s all-in-one product analytics platform is a direct alternative to expensive, proprietary tools such as Amplitude, Mixpanel or Heap. It’s entirely self-serve, can be self-hosted or deployed in the cloud, and offers everything from funnel analytics and path analysis to cohort creation and user tracking. Best of all, because PostHog can be deployed on-prem, it’s more suitable for teams worried about GDPR compliance and HIPAA compliance.
Unlike proprietary platforms such as Amplitude, however, PostHog offers far more than just core analytics. Features such as session recording, feature flags and experimentation mean it can also act as an open-source alternative to LaunchDarkly, Hotjar, VWO and more. This means PostHog isn’t just useful to engineers, but also to product managers and other teams that work closely with engineering.
- All-in-one product analytics
- Individual and group analytics
- Deploy to the cloud, or on-prem
- Unlimited feature flags
- Built-in session recording
- Multivariate experimentation suite
- Third-party apps to enrich and move data
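To make the feature flag and multivariate experimentation ideas concrete, here is a minimal sketch of deterministic bucketing, a common way to roll a flag out to a fixed share of users. This is not PostHog's implementation or API; the function names `bucket`, `is_enabled` and `variant`, and the flag key used in the usage note, are hypothetical and chosen only for illustration.

```python
import hashlib
from typing import Dict


def bucket(flag_key: str, user_id: str) -> float:
    """Map a (flag, user) pair to a stable number in [0, 1)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    # Use the first 32 bits of the hash as a fraction of 2**32.
    return int(digest[:8], 16) / 0x100000000


def is_enabled(flag_key: str, user_id: str, rollout: float) -> bool:
    """True if the user falls inside the rollout share (0.0 to 1.0)."""
    return bucket(flag_key, user_id) < rollout


def variant(flag_key: str, user_id: str, variants: Dict[str, float]) -> str:
    """Pick a variant from a non-empty {name: share} mapping (shares sum to 1)."""
    point = bucket(flag_key, user_id)
    cumulative = 0.0
    for name, share in variants.items():
        cumulative += share
        if point < cumulative:
            return name
    return name  # fall back to the last variant if rounding leaves a gap
```

Calling `is_enabled("new-onboarding", "user-42", 0.2)` always returns the same answer for the same user, so a person does not flip between experiences from one visit to the next. Hashing the flag key together with the user ID keeps different flags independent of each other.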
Metabase
- Useful for: Data visualization, business intelligence
- Alternative to: Looker, Tableau, Power BI
- License: AGPL
We use Metabase for visualizing data in different ways and running complex business intelligence. It’s a powerful alternative to Looker, Tableau and Power BI.
What makes Metabase so essential is that, like PostHog, you can accomplish a huge amount without needing to resort to SQL. Instead, you can create BI dashboards in just a few minutes without ever needing to write a line of code. Afterwards, these dashboards can be shared anywhere — internally, or externally.
- Easy to use, no SQL required
- Interactive, drag and drop dashboards
- Deploy to the cloud, or on-prem
- Integrate with 20+ data sources
RudderStack
- Useful for: CDP, Data pipeline
- Alternative to: Segment, Tealium
- License: AGPLv3
RudderStack has two things in common with PostHog. Firstly, it can act as a data pipeline to funnel ingested events to other destinations, such as a data warehouse or third-party platform. Secondly, you need two capital letters to spell it correctly.
Unlike PostHog, however, RudderStack is completely focused on acting as a data pipeline. It can ingest data from more than 20 different sources, perform transformations in real time and even perform version control operations via GitHub Actions.
RudderStack is perfect for teams which are concerned with regulatory compliance, as it can mask PII and filter out sensitive events with ease. This has helped it become wildly popular with users such as Stripe, Hinge and Allbirds to name a few.
- Transport data to anywhere
- Transform data in real-time
- Pre-defined schemas for data warehouses
- SDKs track anonymous users and update downstream tools
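As a rough illustration of what masking PII and filtering out sensitive events can look like in a pipeline, here is a minimal sketch of a per-event transformation. It is not RudderStack's transformation API; the field names, event names and the `transform` function are assumptions made up for this example.

```python
import hashlib
from typing import Optional

# Hypothetical field and event names, chosen only for illustration.
PII_FIELDS = {"email", "phone"}
BLOCKED_EVENTS = {"password_changed"}


def mask(value: str) -> str:
    """Replace a PII value with a short, stable SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def transform(event: dict) -> Optional[dict]:
    """Return a cleaned copy of the event, or None to drop it."""
    if event.get("event") in BLOCKED_EVENTS:
        return None  # sensitive event: never forwarded downstream
    props = {
        key: mask(str(val)) if key in PII_FIELDS else val
        for key, val in event.get("properties", {}).items()
    }
    return {**event, "properties": props}


events = [
    {"event": "signed_up", "properties": {"email": "ada@example.com", "plan": "free"}},
    {"event": "password_changed", "properties": {"email": "ada@example.com"}},
]
# Apply the transformation and keep only the events that were not dropped.
cleaned = [e for e in (transform(ev) for ev in events) if e is not None]
```

Here `cleaned` holds only the `signed_up` event, with the email replaced by a digest and the `plan` property untouched. Hashing keeps values joinable across events, but an unsalted hash is not full anonymization, so treat it as a starting point only.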
NocoDB
- Useful for: Database collaboration
- Alternative to: Airtable, Postgres
- License: AGPLv3
NocoDB isn’t just an alternative to database tools like Airtable — it can also act as a tool which sits on top of platforms like Airtable, converting complex databases into ‘smart spreadsheets’.
Why would you want databases transformed into spreadsheets? Because that makes it easier to collaborate with others, especially if you’re collaborating with non-engineers. That’s why NocoDB is a no-code platform, with tools which make it easy to share spreadsheets with others (or not, via password protection).
If you are comfortable with code, however, NocoDB has a few advanced features up its sleeve. Like other open-source solutions, NocoDB can be self-hosted easily, or extended further through REST APIs or an SDK.
- No-code, built for collaboration
- Strong permissioning options
- Free and self-hostable
n8n
- Useful for: Workflow automation, connecting services
- Alternative to: Zapier, Node-RED
- License: Sustainable Use License
When you need to automate work, move data between platforms, or create basic bots, there are generally two options: you can create a custom solution which you’ll then need to maintain and manage… or you can use n8n.
With over 200 different integrations — or ‘nodes’, as n8n calls them — to choose from, n8n offers the best of both worlds. It’s simple enough that stakeholders can maintain workflows on a casual basis using a drag and drop UI, but powerful enough that it lets you delve into the code for more complex work.
As with most open-source solutions, n8n can be self-hosted if you need to keep sensitive workflows or data off the cloud. Best of all, n8n even integrates directly with PostHog!
- 200+ integrations with other platforms
- Deploy in the cloud, or on your own infrastructure
- Integrates with PostHog
- Simple UI for casual users; code editor for engineers
Netdata
- Useful for: System monitoring
- Alternative to: Datadog, New Relic
- License: GPL 3.0
Compatible with almost any physical or virtual server, Netdata’s open-source agent enables you to collect and visualize any available metric in real time. This means you can effortlessly track over 2,000 metrics in graphs, as well as configure 200+ alerts to notify you when something goes awry.
Going beyond this, Netdata also offers more advanced anomaly detection for when you really need to get into the weeds, all while maintaining a minimal resource footprint. It’s just one of the reasons why we’re big fans of Netdata, and not just because they use PostHog themselves!
- 1ms collection-to-visualization latency
- Single-line auto-deployment
- Anomaly detection powered by machine learning
ClickHouse
- Useful for: Database management, powering PostHog
- Alternative to: Redshift, BigQuery, Snowflake
- License: Apache 2.0
ClickHouse is a column-oriented database that can be an order of magnitude faster than row-oriented systems, such as Postgres, for analytical queries. Its columnar structure also offers easy scaling (as long as you’re managing your tables correctly). It’s also popular with early-stage projects because of how efficient it is in terms of system resources, meaning that you can use less costly hardware and avoid more expensive software options.
- Much faster than most other databases
- Efficient use of hardware; scales horizontally
- Columnar database structure; easily scales
- Deployed with a single binary; no need for multiple layers
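To show why a columnar structure helps with analytics, here is a toy sketch comparing a row layout with a column layout in plain Python. It only illustrates the general idea; it says nothing about how ClickHouse stores data internally, and the table and field names are made up for the example.

```python
# Row-oriented layout: all fields of a record are stored together.
rows = [
    {"user_id": 1, "event": "pageview", "duration_ms": 120},
    {"user_id": 2, "event": "click", "duration_ms": 45},
    {"user_id": 1, "event": "pageview", "duration_ms": 300},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited to reach one field.
total_from_rows = sum(record["duration_ms"] for record in rows)

# Column layout: the same aggregate reads one list and ignores the rest.
total_from_columns = sum(columns["duration_ms"])
```

Both totals come to 465, but the column version never looks at `user_id` or `event`. On disk, that difference means an aggregate over one column can skip most of the table, which is where much of the speed-up for analytical queries comes from.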