Pain in the PaaS

Platform teams are born out of perceived necessity. A start-up enters hyper-growth mode and the company’s technical leadership wants a mechanism to assure technical coherence and developer velocity. An established company, aware of the burden of its existing technical debt, decides a platform team will provide a future-proofed foundation for a shift to the Brave New World of Microservices and DDD. The exact reasons will vary, but there are a few common themes in platform teams' raisons d’être. Developer velocity – the notion that developers are slowed down when they concern themselves with how or where their code runs. Guardrails – determining and implementing constraints on what development teams must and must not do when running their code. Platform teams are also seen as a “force multiplier”/mechanism to help scale teams in size and quantity while minimising the usual friction of doing so.

You don’t need a platform team when you’ve already got a platform. If you’re building your software in AWS / GCP / Azure / similar, then you already have a world-class platform team (or, in GCP’s case, theoretically world-class). When you introduce a platform on top of a cloud platform, you’re introducing an abstraction layer on top of an abstraction layer.

Engineering teams' velocity is a function of autonomy and competence. With an in-house platform, velocity becomes a local maximum as you shrink the autonomy-space in response to your perceived lack of current or future competence (“I cannot trust this engineering team to make good use of AWS without an abstraction layer”).

If this site implemented a comments section, then I’m sure the more cogent hate mail would tell me that platform teams are effective at finding good ways of doing a certain thing - and once that is discovered then product teams no longer have to “reinvent the wheel”. This is partially true, but it’s short-sighted. Platform engineering suffers from a dynamic that is analogous to an economic doom loop. First, a platform team develops a solution for a problem experienced by multiple teams. The team delivers a stable release and those teams now, by design, depend on the solution. Like the consumers of any piece of software, those teams will invent new requirements and request them. Until the platform team implements those changes, the product team is limited or blocked – their velocity has been limited by an artificial dependency on a central team. The mounting requirements throttle the capacity of the platform team – which results in an increasing backlog of requirements. Velocity in multiple areas of the organisation becomes degraded.

Not every platform team works this way. Some platforms are “top-down”, where there is a weak/non-existent feedback loop. This can be observed in highly regulated environments, but should generally be considered a giant antipattern – why do we regard rapid feedback loops as critical for some software, but not required for other software?

Small, autonomous software teams move the quickest. To facilitate this, teams must be free to choose the technologies they need and the implementations that make sense for them. They should be freed of arbitrary complexity and artificial dependencies. Autonomous teams should adopt an open source culture, co-authoring libraries where duplicated requirements exist. If a team has a specific requirement from the “platform”, then they should self-author a patch and this should be democratised by merging it into the shared codebase. Technical leadership should seek to impose quality and standardise through linting and shared standards, not by a central team of guardians. Platform engineers (or others of their ilk, such as Site Reliability Engineers) should consult closely with, or embed into, engineering teams rather than seek to enable from a distance.

To be clear - I’ve been a platform engineer at multiple companies. None of the above should read as a criticism of the engineers themselves, nor those who look to implement the idea of a platform team. Platform engineers can be incredibly skillful, multi-disciplinary technologists. Team topologies that don’t implement a central platform team can unlock these engineers to be even more effective.