Observability

What is the job to be done?

"Help me know when things break, understand why, and fix them fast."

  • Catch exceptions and regressions before users report them
  • See the user's actual experience when an error occurred, not just a stack trace
  • Understand the business impact of incidents (which users were affected, what revenue was at risk), not just the technical impact
  • Centralize log collection and search alongside error data
  • Triage incidents faster with natural language queries

This is where our roadmap is heading and where significant market opportunity exists. The long-term vision is a full observability stack that competes with Datadog and Sentry on their home turf, but with the massive advantage that our observability data is connected to product analytics data. No other vendor can tell you "this API endpoint is slow AND here's the business impact in terms of user drop-off and revenue loss."

Separating this from Release Engineering is important because the buyer is often different (SRE/platform team vs. product engineering), the competitive landscape is different (Datadog/Sentry vs. LaunchDarkly), and the expansion path is different.

What PostHog products are relevant?

  • Error Tracking (core) — Track exceptions, get alerts, resolve issues. Automatic grouping of similar errors, stack traces, affected user counts. The starting point for most Observability adoption.
  • Session Replay — User impact of errors, visual reproduction. When an error fires, click through to the user's session and see exactly what happened. This is the killer differentiation vs. Sentry: you don't just see the stack trace, you see the user's actual experience.
  • Product Analytics — Error correlation with user behavior and business impact. Answer "how many users hit this error?" and "did this error cause drop-off in our conversion funnel?" Connect technical incidents to business outcomes.
  • PostHog AI — Natural language incident triage. "What errors spiked in the last hour and which users were affected?" without writing a query. Faster mean time to understanding during incidents.
  • Logging (beta) — Centralized log collection and search. Logs are table stakes for any observability stack. Having logs alongside errors, replays, and analytics means the full debugging context lives in one place.
  • Roadmap: APM, API tracing — Not shipped yet. When these arrive, the observability story becomes complete: errors + logs + traces + replay + analytics in one platform.
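The "automatic grouping of similar errors" mentioned above can be illustrated with a toy fingerprinting sketch. This is only an illustration of the general idea (group by exception type plus raising frame), not PostHog's actual grouping algorithm, and the event data is made up:

```python
from collections import Counter

def fingerprint(exc_type: str, top_frame: str) -> str:
    """Group errors by exception type plus the frame that raised them,
    so thousands of occurrences of one bug roll up into a single issue."""
    return f"{exc_type}@{top_frame}"

# Raw exception events: (exception type, top stack frame) — illustrative only
events = [
    ("TypeError", "checkout.js:apply_discount"),
    ("TypeError", "checkout.js:apply_discount"),
    ("NetworkError", "api.js:fetch_cart"),
]

# Three raw events collapse into two distinct issues with occurrence counts
issues = Counter(fingerprint(t, f) for t, f in events)
print(issues)
```

The payoff is the same one Error Tracking gives you out of the box: a triage list of distinct issues with counts, rather than a firehose of individual exceptions.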

Adoption path and expansion path

Entry point

Usually Error Tracking. Team wants to catch exceptions and regressions. Common entry scenarios:

  1. Sentry replacement: They're paying for Sentry and want to consolidate into PostHog (which they're already using for analytics or flags). Error Tracking is the direct replacement.
  2. First observability tool: Early-stage company that hasn't invested in error tracking yet. PostHog's free tier (100K exceptions/month) lets them start without a new vendor relationship.
  3. Session Replay → Error Tracking: They're already using Session Replay for debugging and discover that errors surfaced in replays could be tracked systematically with Error Tracking.

Primary expansion path

Error Tracking → + Session Replay (error context) → + Logging → + Product Analytics (impact analysis)

The logic of each step:

  • Error Tracking → Session Replay: They can see the error and the stack trace. But they can't see what the user was doing when it happened. Session Replay lets them click from an error event directly to the user's session and watch the full context. This is the single most differentiated feature in our Observability story.
  • Session Replay → Logging: They're seeing errors and user sessions. They need the backend context: what was the server doing when this error fired? Centralized logs complete the debugging picture.
  • Logging → Product Analytics: They're debugging individual errors. Now they want to understand the aggregate impact: how many users are hitting this error? Is it correlated with a specific funnel step? Did this bug cause a revenue drop? Product Analytics connects technical incidents to business outcomes.
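Mechanically, the final step above — connecting errors to business outcomes — is an aggregation over error-tagged sessions. A toy sketch of that aggregation with made-up field names and numbers (in practice this would be a Product Analytics insight, not hand-written code):

```python
# Each record: (user_id, hit_error, was_in_checkout, cart_value) — fabricated data
sessions = [
    ("u1", True,  True,  120.0),
    ("u2", True,  False,   0.0),
    ("u3", False, True,   80.0),
    ("u4", True,  True,   45.0),
]

# Users who hit the error at all
affected = [s for s in sessions if s[1]]
# The subset who were mid-checkout when it fired
in_checkout = [s for s in affected if s[2]]
# Revenue sitting in carts that the error may have cost
revenue_at_risk = sum(s[3] for s in in_checkout)

print(len(affected), len(in_checkout), revenue_at_risk)
# 3 users affected, 2 of them mid-checkout, $165.00 at risk
```

That last line is the "different conversation with engineering leadership": impact stated in users and dollars, not error counts.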

Future expansion (roadmap dependent)

As APM and tracing ship, the path extends: Logging → APM → Tracing, completing the full observability stack. Position this honestly: name the vision, be transparent about what's available today vs. what's coming.

Business impact of solving the problem

Observability data connected to product analytics is a moat. Every other observability tool (Datadog, Sentry, New Relic) can tell you "this endpoint threw an error." Only PostHog can tell you "this error affected 500 users, 30 of whom were in the middle of checkout, resulting in an estimated $15k in lost revenue this week." That's a fundamentally different conversation with engineering leadership.

Session Replay as error context is a killer feature. Sentry shows you a stack trace. PostHog shows you the user's actual experience. For frontend and full-stack debugging, this is dramatically faster for reproduction and resolution.

Consolidation play for accounts already using PostHog. If they're already on PostHog for analytics or flags, adding Error Tracking and Logging means one fewer vendor (Sentry, Datadog) to manage. The consolidation saves money and reduces context-switching.

This use case has the highest growth ceiling. The observability market is enormous (Datadog alone is $25B+). Our story gets stronger with every product we ship in this space.

Personas to target

| Persona | Role Examples | What They Care About | How They Evaluate |
|---|---|---|---|
| SRE / Platform Engineer | SRE, Platform Eng, Infrastructure Eng | Reliability, alerting, mean time to resolution, not getting paged at 3am | "Will this catch issues before users report them? How fast can I triage?" |
| Backend Engineer | Backend Eng, API Engineer, Server-side Eng | Stack traces, log correlation, reproducing bugs efficiently | "Can I see what happened on the server when this error fired?" |
| Product Engineer | Full-stack Eng, Frontend Eng | User-facing bugs, reproduction, understanding the user impact of errors | "Can I see the user's session when this error happened?" |
| Engineering Manager | EM, VP Eng, Director of Eng | Team velocity, incident metrics (MTTR, error rates), cost of observability tooling | "How does this reduce our incident response time? What does it cost vs. Sentry/Datadog?" |
| Founder (early stage) | CTO, first engineer | Catching bugs before users complain, not paying Datadog prices | "Does this work out of the box and is it affordable?" |

Signals in Vitally & PostHog

Vitally indicators this use case is relevant

| Signal | Where to Find It | What It Means |
|---|---|---|
| Error Tracking is active but low product count | Product spend breakdown | They've started with errors. Full Observability expansion path available. |
| Customer mentions Sentry or Datadog in notes | Vitally notes / conversations | Competitive displacement opportunity. Consolidation pitch. |
| High Session Replay usage with error-related viewing patterns | Product usage data | They're using replay for debugging already. Error Tracking formalizes this. |
| Engineering-heavy user base, no PM users | User list in Vitally | Engineering-first account. Observability and Release Engineering are the primary use cases. |

PostHog usage signals

| Signal | How to Check | What It Means |
|---|---|---|
| Error tracking exceptions growing week over week | Product usage metrics | They're instrumenting more of their stack. Good adoption signal. |
| Session Replay filtered by error events | Replay usage patterns | They're connecting replay to error debugging. The integration is clicking. |
| High error volume but no alerting configured | Error tracking settings | They're collecting errors but not acting on them. Help them set up alerts. |
| Product Analytics queries referencing error events | Saved insights | They're starting to connect errors to business impact. Encourage this. |

Command of the Message

Discovery questions

  • When something breaks in production, how long does it take your team to find out? From users? From monitoring?
  • How do you currently track and prioritize errors? Do you have a tool for this, or is it ad hoc?
  • When you see an error, how do you reproduce it? How long does reproduction typically take?
  • Can you tell which users were affected by a specific error? Do you know the business impact?
  • What does your current observability stack look like? (Sentry? Datadog? New Relic? How many tools?)
  • How much are you spending on observability tooling today?
  • When an incident happens, how many tools does your team switch between to understand what happened?
  • Do you have centralized logging? Where do your logs live today?

Negative consequences (of not solving this)

  • Errors are discovered through user complaints, not proactive monitoring
  • Stack traces exist but reproduction is guesswork because there's no user session context
  • Engineering can quantify "how many errors" but not "how many users affected" or "what revenue was lost"
  • Multiple observability tools (Sentry for errors, Datadog for APM, separate logging) with no connection between them
  • Incident triage is slow because context is spread across 3+ tools
  • Observability costs are high and growing (Datadog pricing, Sentry pricing) without clear ROI

Desired state

  • Errors are caught proactively with alerts before users report them
  • Every error links to the user's actual session, so reproduction takes seconds, not hours
  • Error impact is measured in business terms: affected users, affected revenue, affected funnels
  • Errors, logs, and user sessions live in one platform alongside product analytics
  • Incident triage is faster because all context is in one place

Positive outcomes

  • Reduced MTTR (mean time to resolution) from instant session replay access on errors
  • Fewer user-reported bugs (proactive error detection + alerting)
  • Better incident prioritization based on business impact, not just error count
  • Observability cost reduction through consolidation (replace Sentry + separate logging)
  • Engineering leadership gets business-impact reporting on reliability, not just technical metrics

Success metrics

Customer-facing:

  • Mean time to resolution for user-facing bugs decreases
  • Percentage of errors caught proactively (via alerting) vs. user-reported increases
  • Error rate trends downward as bugs are prioritized by impact and fixed systematically

TAM-facing:

  • Error Tracking exception volume grows (instrumenting more of their stack)
  • Session Replay usage increases with error-filtered viewing (integration is working)
  • Logging adoption starts (filling out the observability stack)
  • Product Analytics queries reference error data (connecting to business impact)

Competitive positioning

Our positioning

  • Errors + replay in one platform. See the stack trace and the user's actual session. No other error tracking tool offers this depth of user context. Sentry shows you the error. PostHog shows you the experience.
  • Business impact, not just error count. Connect errors to user behavior and revenue with Product Analytics. "This error caused 200 users to abandon checkout" is a different conversation than "this error fired 500 times."
  • Consolidation for existing PostHog users. If they're already using PostHog for analytics or flags, adding Error Tracking means one fewer vendor. Same data platform, one less tool to manage.
  • Logging completes the picture. Errors, user sessions, and backend logs in one place. No switching between Sentry, Papertrail, and Amplitude to understand an incident.

Competitor quick reference

| Competitor | What They Do | Our Advantage | Their Advantage |
|---|---|---|---|
| Sentry | Error tracking, performance monitoring, session replay | Deeper product analytics integration; business impact context; flag/experiment connection; better pricing | More mature error tracking features; broader language support; larger install base; dedicated performance monitoring |
| Datadog | Full observability: APM, logs, metrics, errors | Product analytics integration; session replay depth; much cheaper | Complete observability stack (APM, traces, metrics); enterprise-grade; massive ecosystem |
| New Relic | Full observability: APM, logs, errors, distributed tracing | Product analytics integration; session replay; simpler pricing | Complete observability stack; mature enterprise features |

Honest assessment: Our Observability story is credible but incomplete. Error Tracking + Session Replay + Logging is a meaningful starting point, and the connection to product analytics is genuinely differentiated. But we don't have APM or tracing yet. We can't position PostHog as a full Datadog replacement today. The honest pitch is: "For error tracking, we're better than Sentry because of the user context. For full observability, we're building toward it, and in the meantime, the product analytics connection gives you something no other observability tool offers." Be transparent about what's available today vs. what's on the roadmap.

Pain points & known limitations

| Pain Point | Impact | Workaround / Solution |
|---|---|---|
| No APM or tracing yet | Can't replace Datadog for teams that need full backend observability | Be honest about the roadmap. Position PostHog as complementary for now: errors + replay + analytics in PostHog, APM in their existing tool. The consolidation play gets stronger as we ship more. |
| Logging is beta | Teams expecting production-grade centralized logging may find gaps | Set expectations on maturity. For teams with existing logging (ELK, Papertrail), PostHog logging complements rather than replaces initially. |
| Error Tracking language/framework support may lag Sentry | Sentry supports a very wide range of languages and frameworks | Check Error Tracking docs for current support. For unsupported frameworks, generic exception capture via the API may work. |
| No built-in on-call/incident management | Teams wanting PagerDuty-style incident workflows won't find it here | PostHog alerts can trigger webhooks to PagerDuty, Slack, etc. Error Tracking is about detection and context, not incident management workflows. |

Getting a customer started

What does an evaluation look like?

  • Scope: Enable Error Tracking on their primary application. Connect Session Replay to error events. Set up alerts for critical error spikes.
  • Timeline: 1 to 3 days to start capturing errors. 1 week to have meaningful error data and session replay context.
  • Success criteria: Can you see errors grouped by type with affected user counts? Can you click from an error to the user's session replay? Can you get alerted when a new error type spikes?
  • PostHog investment: Error Tracking free tier covers 100K exceptions/month. Session Replay free tier covers 5K recordings.
  • Key requirement: They need to integrate the PostHog SDK or connect their existing error capture to PostHog's Error Tracking. If they're already using PostHog's SDK, Error Tracking may just need to be enabled.
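For server-side capture, the integration is typically a few lines. The sketch below assumes the posthog Python library and its exception-capture helpers; the project key, host, and charge_customer call are placeholders, and exact option and method names should be verified against current PostHog docs:

```python
# Integration sketch — verify option/method names against current PostHog docs.
from posthog import Posthog

posthog = Posthog(
    "phc_YOUR_PROJECT_KEY",            # placeholder project API key
    host="https://us.i.posthog.com",   # placeholder; use your region's host
)

try:
    charge_customer(order)  # hypothetical application code
except Exception as e:
    # Send the exception to Error Tracking, tied to the same distinct_id
    # used for this user's analytics events
    posthog.capture_exception(e, distinct_id="user_123")
    raise
```

If they're already initializing the PostHog SDK for analytics or flags, this is the sense in which "Error Tracking may just need to be enabled": the client is already there, and exceptions become one more event stream.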

Onboarding checklist

  • Enable Error Tracking in the PostHog SDK configuration
  • Verify errors are being captured and grouped correctly
  • Connect Session Replay to error events (verify you can click from error → replay)
  • Set up alerts for critical error types or spike detection
  • Build an "Error Health" dashboard: error trends, top errors by affected users, error rate by release
  • Review the top 5 errors with the team, using session replay context to prioritize fixes
  • If applicable, enable Logging (beta) for backend log context
  • Create a Product Analytics query that correlates errors with funnel drop-off or business metrics
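The spike-detection step in the checklist above boils down to comparing the current window against a trailing baseline. PostHog's actual alerting is configured in the UI; this toy sketch just shows the underlying logic, with made-up thresholds:

```python
def is_spike(current_count: int, baseline_counts: list[int],
             factor: float = 3.0, min_count: int = 10) -> bool:
    """Flag a spike when the current window's error count exceeds both
    `factor` times the trailing average and a minimum absolute floor
    (the floor keeps tiny baselines from triggering noisy alerts)."""
    if not baseline_counts:
        return current_count >= min_count
    baseline = sum(baseline_counts) / len(baseline_counts)
    return current_count >= min_count and current_count > factor * baseline

# Last 6 hourly error counts vs. the current hour (fabricated numbers)
history = [4, 5, 3, 6, 4, 5]   # trailing average = 4.5
print(is_spike(40, history))   # True: 40 >= 10 and 40 > 3 x 4.5
print(is_spike(8, history))    # False: below the minimum floor
```

Walking a customer through this logic is also a useful discovery move: it surfaces what "normal" error volume looks like for them and which error types deserve a dedicated alert.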

Cross-sell pathways from this use case

| If Using... | They Might Need... | Why | Conversation Starter |
|---|---|---|---|
| Error Tracking only | Session Replay | They see stack traces but can't reproduce the user experience | "You can see the error. Want to see exactly what the user was doing when it happened?" |
| Error Tracking + Session Replay | Logging | They have frontend error context but need backend logs | "You can see the user's session. But what was happening on the server at the same time?" |
| Error Tracking + analytics correlation | Product Intelligence (for the product team) | They're connecting errors to user impact. The product team would benefit from the same analytics. | "You're measuring error impact on users. Has your product team seen what they can do with funnels and retention in the same platform?" |
| Error Tracking (engineering in PostHog) | Release Engineering (same engineering team) | Engineering is in PostHog for errors. Feature flags for safe releases is a natural add. | "You're tracking errors after releases. What if you could gate features behind flags and roll back without a deploy?" |
| Error Tracking for AI features | AI/LLM Observability | Traditional error tracking misses AI quality regressions | "You're catching exceptions, but are you catching when your model starts giving worse answers? That's a different kind of 'error.'" |

Internal resources

Appendix: Company archetype considerations

| Archetype + Stage | Framing | Key Products | Buyer |
|---|---|---|---|
| AI Native — Early | "You're shipping fast and breaking things. PostHog catches errors and shows you the user's experience when they hit a bug. No Sentry bill required." Error Tracking + Session Replay is the sweet spot. | Error Tracking, Session Replay | CTO, founding engineer |
| AI Native — Scaled | "Your AI features have failure modes that traditional error tracking misses: hallucinations, slow responses, quality regressions. PostHog catches the technical errors AND lets you evaluate output quality." Bridge to AI/LLM Observability. | Error Tracking, Session Replay, Logging, AI Evals | VP Eng, Platform Lead, SRE |
| Cloud Native — Early | "Stop finding bugs from user complaints. Error Tracking catches exceptions automatically, and Session Replay lets you see exactly what happened. 100K exceptions/month free." | Error Tracking, Session Replay | CTO, founding engineer |
| Cloud Native — Scaled | "Your team is juggling Sentry, Papertrail, and Datadog. PostHog consolidates error tracking, logging, and user context into the platform you already use for analytics." Consolidation pitch. | Error Tracking, Session Replay, Logging, Product Analytics | VP Eng, SRE Lead, Platform team |
| Cloud Native — Enterprise | "Multiple teams, multiple services, and incident context spread across 5 tools. PostHog gives you errors + logs + user sessions + business impact in one platform. No more switching between Sentry, Datadog, and Amplitude during an incident." | Full Observability stack + Enterprise package | VP Eng, Director of SRE, Platform leadership |
