Building a Platform Team

December 28, 2017

Every growing engineering organization faces a choice: either every team builds its own deployment pipelines, monitoring, databases, and tooling, or a dedicated team builds shared infrastructure that everyone uses.

The first approach duplicates effort. The second creates a platform team.

Platform teams are force multipliers. Done well, they enable product teams to move faster by providing self-service capabilities. Done poorly, they become bottlenecks that slow everyone down.

What Platform Teams Do

Platform teams build and operate internal infrastructure:

Developer experience: CI/CD pipelines, development environments, testing infrastructure, deployment tools.

Runtime infrastructure: Kubernetes clusters, service mesh, databases, caches, message queues.

Observability: Monitoring, logging, alerting, tracing infrastructure.

Security infrastructure: Secret management, authentication, authorization, security tooling.

The common thread: capabilities that every product team needs but shouldn’t build independently.

Platform as Product

The key insight: treat platform capabilities as products with internal customers.

Product Mindset

Users are your product teams. Understand their needs, pain points, and workflows. Build what helps them, not what’s technically interesting.

Adoption is your metric. A platform nobody uses provides no value. Track adoption and understand why teams do or don’t use your capabilities.

User experience matters. Internal tools often end up hard to use and poorly documented because “it’s just internal.” This attitude kills adoption.

Self-Service

Product teams shouldn’t need platform team involvement for standard operations.

Good: “I need a new database. I fill out a form, and it’s provisioned automatically.”

Bad: “I need a new database. I file a ticket and wait three weeks.”

Self-service increases product team velocity and keeps the platform team from becoming a bottleneck.

Documentation and Support

Internal platforms need documentation just as external products do.

Provide support channels: Slack, office hours, on-call for production issues. Good support builds trust and adoption.

Building the Team

Skills Mix

Platform teams need diverse skills:

Infrastructure: Deep knowledge of the underlying systems (Kubernetes, databases, networking).

Software engineering: Building tools and automation requires solid engineering skills.

Developer experience: Understanding developer workflows and building usable tools.

Operations: Running production infrastructure reliably.

Avoid teams that are all infrastructure experts with no software engineering skills, or all software engineers with no operational experience.

Size and Scope

Start small. A 50-person engineering organization doesn’t need a 20-person platform team.

Rule of thumb: the platform team should be roughly 10-15% of total engineering headcount. Adjust for your organization’s circumstances.

Organizational Placement

Platform teams typically report to engineering leadership, parallel to product engineering. They should have direct access to product teams and their pain points.

Avoid placing platform teams too far from product teams—they’ll lose touch with actual needs.

Common Patterns

Golden Paths

Define recommended ways to do common tasks:

“If you’re building a new service, here’s the recommended setup: this template, this CI pipeline, this monitoring configuration, this deployment process.”

Golden paths provide guidance without mandates. Teams can deviate when necessary but have a clear default.

Paved Roads

More structured than golden paths: the platform provides capabilities that make the right thing easy and the wrong thing hard.

“Deploying through our pipeline is one click. Deploying any other way requires manual infrastructure access that most people don’t have.”

Paved roads guide teams toward good practices through incentives rather than rules.

Capability APIs

Expose platform capabilities through well-defined APIs:

# Request a database
apiVersion: platform.company.com/v1
kind: Database
metadata:
  name: my-service-db
spec:
  engine: postgresql
  version: "13"
  size: small

APIs enable self-service, automation, and integration with team workflows.
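To make the self-service idea concrete, here is a minimal Python sketch of the validation a platform API might run on a request like the one above before provisioning anything. The supported engines, versions, and sizes, and the names `DatabaseRequest` and `parse_database_request`, are assumptions made for illustration, not part of any real platform.

```python
from dataclasses import dataclass

# Hypothetical catalogue of supported options; a real platform would define its own.
SUPPORTED_ENGINES = {"postgresql": {"13"}}
SUPPORTED_SIZES = {"small", "medium", "large"}


@dataclass(frozen=True)
class DatabaseRequest:
    name: str
    engine: str
    version: str
    size: str


def parse_database_request(manifest: dict) -> DatabaseRequest:
    """Validate a Database manifest (already parsed from YAML) into a typed request."""
    if manifest.get("kind") != "Database":
        raise ValueError(f"unsupported kind: {manifest.get('kind')!r}")

    name = manifest.get("metadata", {}).get("name")
    if not name:
        raise ValueError("metadata.name is required")

    spec = manifest.get("spec", {})
    engine = spec.get("engine")
    version = str(spec.get("version"))
    size = spec.get("size")

    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    if version not in SUPPORTED_ENGINES[engine]:
        raise ValueError(f"unsupported {engine} version: {version!r}")
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")

    return DatabaseRequest(name=name, engine=engine, version=version, size=size)


# The request from the example above, as a parsed dictionary.
example = {
    "apiVersion": "platform.company.com/v1",
    "kind": "Database",
    "metadata": {"name": "my-service-db"},
    "spec": {"engine": "postgresql", "version": "13", "size": "small"},
}
request = parse_database_request(example)
```

Rejecting bad requests with a clear error at submission time is what lets the rest of the flow run without a human in the loop.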

Measuring Success

Adoption Metrics

Low adoption indicates a problem: the capability doesn’t meet teams’ needs, it is hard to use, or teams don’t know it exists.
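One simple way to track adoption is the share of product teams using each capability. The sketch below assumes you can list which teams use what. The team names, capability names, and the 50% threshold for "low adoption" are all made up for illustration.

```python
from __future__ import annotations


def adoption_rate(teams_using: set[str], all_teams: set[str]) -> float:
    """Fraction of product teams that use a given capability."""
    if not all_teams:
        raise ValueError("all_teams must not be empty")
    # Intersect so that stale or unknown team names do not inflate the rate.
    return len(teams_using & all_teams) / len(all_teams)


# Hypothetical data: which teams use which platform capability.
all_teams = {"checkout", "search", "payments", "mobile"}
usage = {
    "ci-pipeline": {"checkout", "search", "payments", "mobile"},
    "managed-db": {"checkout"},
}

rates = {name: adoption_rate(teams, all_teams) for name, teams in usage.items()}

# Capabilities below an (assumed) 50% threshold are candidates for follow-up.
low_adoption = sorted(name for name, rate in rates.items() if rate < 0.5)
```

The number only tells you where to look. Finding out why a team stays away still takes a conversation.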

Velocity Metrics

The platform should measurably improve product team velocity.
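As one possible way to make "measurably" concrete, the sketch below compares a team's median lead time (commit to production) before and after it adopts a platform capability. Using lead time as the velocity measure is an assumption, and the sample numbers are invented.

```python
from statistics import median


def lead_time_change(before_hours: list, after_hours: list) -> float:
    """Relative change in median lead time; a negative value means faster delivery."""
    baseline = median(before_hours)
    if baseline <= 0:
        raise ValueError("baseline lead time must be positive")
    return (median(after_hours) - baseline) / baseline


# Hypothetical lead times in hours for one team's recent deployments.
before = [48, 72, 36, 60]
after = [12, 24, 18, 30]

change = lead_time_change(before, after)
print(f"Median lead time changed by {change:.0%}")
```

The median is used rather than the mean so that one unusually slow deployment does not dominate the comparison.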

Reliability Metrics

An unreliable platform creates reliability problems downstream for every team that depends on it.

Satisfaction Metrics

Survey product teams. Direct feedback identifies priorities.

Common Failure Modes

The Bottleneck

Platform team becomes a gatekeeper. Product teams wait for platform involvement for routine operations. Velocity decreases.

Fix: Invest in self-service. Automate common operations. Reserve platform team time for building capabilities, not executing them.

The Ivory Tower

Platform team builds technically sophisticated solutions that don’t match product team needs. Low adoption, wasted effort.

Fix: Embed with product teams. Understand their workflows. Involve them in design. Measure adoption ruthlessly.

The Cost Center

Platform team is seen as overhead, not value creator. Budget pressure, underinvestment, gradual capability degradation.

Fix: Demonstrate value in business terms. Connect platform improvements to product team velocity. Build executive relationships.

The Support Queue

Platform team spends all time on support tickets, no time on building. Capability stagnation, team burnout.

Fix: Invest in documentation and self-service to reduce support load. Set expectations about support SLAs. Protect time for development work.

Evolution Over Time

Platform needs evolve as the organization grows:

Early stage (< 50 engineers): You may not need a dedicated team. Senior engineers build shared tooling as part of their work.

Growth stage (50-200 engineers): Dedicated platform team emerges. Focus on highest-leverage capabilities: CI/CD, basic infrastructure.

Scale stage (> 200 engineers): Platform team grows, potentially splits into sub-teams (developer experience, infrastructure, security). More sophisticated capabilities.

Revisit platform strategy as the organization changes.
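The stages above, combined with the 10-15% rule of thumb, can be sketched as a small Python helper. This is a rough guide rather than a formula: treating the early stage as zero dedicated headcount is a simplification of "may not need a dedicated team", and the function names are made up for illustration.

```python
from __future__ import annotations


def platform_stage(engineers: int) -> str:
    """Map total engineering headcount to the stages described above."""
    if engineers < 0:
        raise ValueError("engineers must be non-negative")
    if engineers < 50:
        return "early"
    if engineers <= 200:
        return "growth"
    return "scale"


def suggested_platform_headcount(engineers: int) -> tuple[int, int]:
    """Return a (low, high) range using the 10-15% rule of thumb.

    Early-stage organizations get (0, 0): shared tooling is built by senior
    engineers as part of their regular work (a simplifying assumption).
    """
    if platform_stage(engineers) == "early":
        return (0, 0)
    low = engineers * 10 // 100          # 10%, rounded down
    high = (engineers * 15 + 99) // 100  # 15%, rounded up
    return (low, high)


for size in (30, 120, 300):
    print(size, platform_stage(size), suggested_platform_headcount(size))
```

For a 120-person organization this suggests a platform team of roughly 12 to 18 people, which you would then adjust for your own circumstances.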

Key Takeaways