Infrastructure Team

Roadmap

Here’s what we’re considering building next. Vote for your favorites or share a new idea on GitHub.

Recently shipped

In-app status indicator added

We try our best to make PostHog as stable and reliable as possible but, to quote our handbook, "Incidents are going to happen."

When they do, we want to be as transparent and communicative as possible. That's why we've added an in-app status indicator to the overflow menu in the sidebar. You can click it to go to our full status page for more information.

We hope you'll never need to check on the system status, and that the icon will never be anything other than green, but...well, the handbook is usually right.

Goals

💪 Deploy with confidence (follow up from Q1)

Our deploy speed keeps us moving fast, but bigger changes would benefit from better tooling to gradually roll out, validate, and roll back if necessary.

  • Support the new Rust capture through to full release using our new ingress system
  • Finalize our canary deploy process

🚨 Improved alerting and monitoring

We have a pretty solid alerting and monitoring solution, but there is always room for improvement. The work here is as much about scaling to our number of products and teams as it is about technical scaling.

  • Improve process around planning and detecting gaps in our alerting
  • Improve capacity planning (process as well as implementation)
  • Alerting on reverse proxy solutions
  • Make the internal tooling for creating alerts more opinionated
  • Swap to a more scalable solution for log aggregation

🔒 Deeper Security

Security is a never-ending journey. We want to do some work to make sure we are ahead of the curve.

  • Extend secret management tooling to more areas
  • Improved logging and auditing

💰 Continued cost control

  • Focus on our biggest cost centers where we can make the biggest impact

Handbook

Slack channel

#team-infrastructure

How we work

Guidelines

  • We work as teams on each goal or project, so no one works on a goal alone
  • The board should be our source of truth
  • We document what we do to share context internally
  • We finish what we start, or we don't start it at all
  • We continually prioritize
  • We prioritize unblocking others
  • We have an agenda and follow up on actions from our meetings
  • Be frugal

Standups

We have a Platform-wide standup every Monday, Wednesday, and Friday. Standups are an opportunity for us to discuss what we are working on, share feedback, and raise topics we may want other people's opinions on. They are also an important forum to announce that you are blocked or to ask for help. Everyone should try to make standups, but feel free to drop off if what is being talked about isn't relevant or valuable to you.

Engineering Planning

We plan our work in two-week sprints, with sprint planning and retro meetings on the Wednesday before the start of the next sprint. We primarily use the Platform project board to communicate what we are working on for the sprint, what is blocked, in review, or done, and what we are planning to do next.

Sprint Planning

Sprint planning happens every other week, on the Wednesday before the start of the new sprint, and is PostHog engineering-wide. We first split into breakout rooms for Infrastructure and Ingestion, and in each we determine what the goals will be for the upcoming sprint. After that, we rejoin the engineering-wide sprint planning and pitch the goals to the entire engineering team, looking for feedback. This is a great opportunity for other teams within the company to raise concerns about things they may be blocked on.

Sub-teams are fluid so members may change from sprint to sprint.

Retro & Team Planning

After the Engineering Sprint Planning meeting, the Infrastructure and Ingestion teams meet to retro the previous sprint. This is also where we game plan the next sprint in terms of what tasks need to be done to accomplish the goals set in the Engineering Sprint Planning.

While planning, we make sure that the teams we have settled on for Ingestion and Infrastructure have more than one person in the same timezone working on each goal or project. We want to reduce the number of lone wolves and encourage people to work together and spread context. There are a few benefits to this, including shared context, quicker PR approvals, easy rubber ducking, and more trust and camaraderie on the team. 🌞

Project Boards

We use GitHub Project boards to organize what work needs to get done for a certain project. During a sprint we may not get an entire project done, but we should set our goals relative to milestones that are measurable in the project boards.

For example, if a project is to re-partition 100 tables, the goals set for a sprint could look like:

  • Migrate 50 of the 100 tables
  • Migration framework is production ready
  • We have migrated 10% of customers with all 100 tables

The projects can be viewed as epics if that is what you are used to.

Team Kanban Board

We use a Kanban-style board for each team (Infrastructure and Ingestion) to show what we are working on, planning to work on, and blocked on, as well as what is in review and what is done. This provides context on the operational priorities for the sprint. It also shows what work people can pick up if they have a few extra cycles, or if they are looking for the most impactful task for them and their sub-team at any one moment. It's up to the sub-teams to decide what tasks are on deck for each sprint, based on the goals that were set for the sprint. We try to keep the board as up to date as possible and assign ourselves as owners, so that work on a task isn't duplicated and, if there is a question about a task, we know who to ask.

We also tag the tasks that we set aside for the sprint with the sprint number/name, so that we can filter the board for a quick view of how we are progressing against the sprint goals.

The board also acts as a source of truth for other teams to quickly check in on the progress of tasks for the sprint, especially if they will be the primary consumers of a task's output. This works within the team as well.
