Every growing engineering organization faces a choice: either every team builds its own deployment pipelines, monitoring, databases, and tooling, or a dedicated team builds shared infrastructure that everyone uses.
The first approach duplicates effort. The second creates a platform team.
Platform teams are force multipliers. Done well, they enable product teams to move faster by providing self-service capabilities. Done poorly, they become bottlenecks that slow everyone down.
What Platform Teams Do
Platform teams build and operate internal infrastructure:
Developer experience: CI/CD pipelines, development environments, testing infrastructure, deployment tools.
Runtime infrastructure: Kubernetes clusters, service mesh, databases, caches, message queues.
Observability: Monitoring, logging, alerting, tracing infrastructure.
Security infrastructure: Secret management, authentication, authorization, security tooling.
The common thread: capabilities that every product team needs but shouldn’t build independently.
Platform as Product
The key insight: treat platform capabilities as products with internal customers.
Product Mindset
Users are your product teams. Understand their needs, pain points, and workflows. Build what helps them, not what’s technically interesting.
Adoption is your metric. A platform nobody uses provides no value. Track adoption and understand why teams do or don’t use your capabilities.
User experience matters. Internal tools are often hard to use and poorly documented because teams excuse them as “just internal.” This attitude kills adoption.
Self-Service
Product teams shouldn’t need platform team involvement for standard operations.
Good: “I need a new database. I fill out a form, and it’s provisioned automatically.”
Bad: “I need a new database. I file a ticket and wait three weeks.”
Self-service increases product team velocity and keeps the platform team from becoming a bottleneck.
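A minimal sketch of the decision behind the “good” flow above, in Python. The catalog of standard engines and sizes, the `DatabaseRequest` fields, and the `route_request` function are all hypothetical names chosen for illustration; the only point is that standard requests never wait on a person.

```python
from dataclasses import dataclass

# Hypothetical catalog of options the platform can provision without review.
STANDARD_ENGINES = {"postgresql", "mysql"}
STANDARD_SIZES = {"small", "medium"}


@dataclass(frozen=True)
class DatabaseRequest:
    team: str
    name: str
    engine: str
    size: str


def route_request(request: DatabaseRequest) -> str:
    """Return "auto-provision" for standard requests, "needs-review" otherwise."""
    if request.engine in STANDARD_ENGINES and request.size in STANDARD_SIZES:
        return "auto-provision"
    return "needs-review"
```

Requests that fit the catalog are provisioned immediately; only unusual ones reach the platform team.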
Documentation and Support
Internal platforms need documentation like external products:
- Getting started guides
- Reference documentation
- Examples and tutorials
- Troubleshooting guides
Provide support channels: Slack, office hours, on-call for production issues. Good support builds trust and adoption.
Building the Team
Skills Mix
Platform teams need diverse skills:
Infrastructure: Deep knowledge of the underlying systems (Kubernetes, databases, networking).
Software engineering: Building tools and automation requires solid engineering skills.
Developer experience: Understanding developer workflows and building usable tools.
Operations: Running production infrastructure reliably.
Avoid teams that are all infrastructure experts with no software engineering skills, or all software engineers with no operational experience.
Size and Scope
Start small. A 50-person engineering organization doesn’t need a 20-person platform team.
Rule of thumb: the platform team should be roughly 10-15% of total engineering headcount. Adjust based on:
- How much shared infrastructure exists
- How standardized product team needs are
- How much platform work was previously duplicated
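The rule of thumb can be worked through as simple arithmetic. This is a sketch only: the function name is an assumption, and the 10% and 15% bounds are the article's rough guideline, not a precise formula.

```python
def platform_team_range(total_engineers: int, low_pct: int = 10, high_pct: int = 15) -> tuple[int, int]:
    """Return (min, max) platform headcount for a given engineering headcount.

    Integer arithmetic avoids floating-point surprises: the lower bound is
    rounded down and the upper bound is rounded up.
    """
    lower = total_engineers * low_pct // 100
    upper = -(-total_engineers * high_pct // 100)  # ceiling division
    return lower, upper


small_org = platform_team_range(50)    # (5, 8)
large_org = platform_team_range(200)   # (20, 30)
```

For a 50-person organization this gives roughly 5 to 8 people, which is why a 20-person platform team (40% of engineering) is out of proportion.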
Organizational Placement
Platform teams typically report to engineering leadership, parallel to product engineering. They should have direct access to product teams and their pain points.
Avoid placing platform teams too far from product teams; otherwise they’ll lose touch with actual needs.
Common Patterns
Golden Paths
Define recommended ways to do common tasks:
“If you’re building a new service, here’s the recommended setup: this template, this CI pipeline, this monitoring configuration, this deployment process.”
Golden paths provide guidance without mandates. Teams can deviate when necessary but have a clear default.
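One way to picture “a clear default that teams can deviate from” is a default setup with explicit overrides. The sketch below is illustrative only; the setup keys and every value in `GOLDEN_PATH` are placeholder names, not real templates or pipelines.

```python
# Hypothetical golden path for a new service; every value is a placeholder name.
GOLDEN_PATH = {
    "template": "service-template",
    "ci_pipeline": "standard-ci",
    "monitoring": "default-dashboards",
    "deployment": "pipeline-deploy",
}


def service_setup(overrides=None):
    """Return (setup, deviations): the golden path with a team's overrides applied."""
    setup = dict(GOLDEN_PATH)
    deviations = {}
    for key, value in (overrides or {}).items():
        if key not in GOLDEN_PATH:
            raise KeyError(f"unknown setup option: {key}")
        if value != GOLDEN_PATH[key]:
            setup[key] = value
            deviations[key] = value  # recorded so the platform team can learn from it
    return setup, deviations
```

Calling `service_setup()` with no arguments returns the recommended setup unchanged. Passing overrides is allowed, and the recorded deviations show the platform team where the default falls short.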
Paved Roads
More structured than golden paths: the platform provides capabilities that make the right thing easy and the wrong thing hard.
“Deploying through our pipeline is one click. Deploying any other way requires manual infrastructure access that most people don’t have.”
Paved roads guide teams toward good practices through incentives rather than rules.
Capability APIs
Expose platform capabilities through well-defined APIs:
    # Request a database
    apiVersion: platform.company.com/v1
    kind: Database
    metadata:
      name: my-service-db
    spec:
      engine: postgresql
      version: "13"
      size: small
APIs enable self-service, automation, and integration with team workflows.
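On the platform side, a request like the Database example above has to be validated before anything is provisioned. The following Python sketch assumes the manifest has already been parsed into a dictionary. The allowed engines and sizes and the `validate_database_manifest` function are hypothetical, since the article does not specify what the platform supports.

```python
# Hypothetical limits; a real platform would define its own.
ALLOWED_ENGINES = {"postgresql"}
ALLOWED_SIZES = {"small", "medium", "large"}


def validate_database_manifest(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if manifest.get("apiVersion") != "platform.company.com/v1":
        errors.append("apiVersion must be platform.company.com/v1")
    if manifest.get("kind") != "Database":
        errors.append("kind must be Database")
    if not (manifest.get("metadata") or {}).get("name"):
        errors.append("metadata.name is required")
    spec = manifest.get("spec") or {}
    if spec.get("engine") not in ALLOWED_ENGINES:
        errors.append(f"spec.engine must be one of {sorted(ALLOWED_ENGINES)}")
    if not isinstance(spec.get("version"), str):
        errors.append("spec.version must be a quoted string")
    if spec.get("size") not in ALLOWED_SIZES:
        errors.append(f"spec.size must be one of {sorted(ALLOWED_SIZES)}")
    return errors


# The request from the example above, as a parsed dictionary.
request = {
    "apiVersion": "platform.company.com/v1",
    "kind": "Database",
    "metadata": {"name": "my-service-db"},
    "spec": {"engine": "postgresql", "version": "13", "size": "small"},
}
problems = validate_database_manifest(request)  # empty list: request is valid
```

Returning all problems at once, rather than failing on the first, lets a team fix a request in one pass without asking the platform team for help.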
Measuring Success
Adoption Metrics
- What percentage of teams use the platform?
- Which capabilities have the highest and lowest adoption?
- Why do teams opt out?
Low adoption indicates a problem: the capability doesn’t meet team needs, is hard to use, or isn’t widely known.
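The first two questions reduce to a small calculation once you know which teams use which capability. The sketch below uses made-up team and capability names, and `adoption_rates` is a hypothetical helper, not part of any particular tool.

```python
def adoption_rates(usage, all_teams):
    """Percentage of teams using each capability, lowest adoption first."""
    if not all_teams:
        raise ValueError("all_teams must not be empty")
    rates = {
        capability: round(100 * len(teams & all_teams) / len(all_teams), 1)
        for capability, teams in usage.items()
    }
    return dict(sorted(rates.items(), key=lambda item: item[1]))


# Made-up team and capability names, for illustration only.
all_teams = {"payments", "search", "mobile", "growth"}
usage = {
    "ci-cd": {"payments", "search", "mobile", "growth"},
    "managed-db": {"payments"},
    "tracing": {"search", "mobile"},
}
rates = adoption_rates(usage, all_teams)
# rates == {"managed-db": 25.0, "tracing": 50.0, "ci-cd": 100.0}
```

Sorting lowest first puts the capabilities that most need a conversation with their non-users at the top. The third question, why teams opt out, still requires asking them.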
Velocity Metrics
- How long does it take teams to ship new services?
- How has this changed since the platform was introduced?
The platform should measurably improve product team velocity.
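A before-and-after comparison can be as simple as comparing medians. This sketch assumes you record, for each new service, the number of days from project start to first production deploy; that definition of “time to ship” and the sample numbers are assumptions made for illustration.

```python
from statistics import median


def velocity_change(before_days, after_days):
    """Compare median days-to-ship before and after the platform was introduced."""
    before = median(before_days)
    after = median(after_days)
    return {
        "before": before,
        "after": after,
        "change_pct": round(100 * (after - before) / before, 1),
    }


# Made-up sample data: days to ship for four services in each period.
result = velocity_change(before_days=[14, 21, 10, 30], after_days=[3, 5, 2, 6])
# result == {"before": 17.5, "after": 4.0, "change_pct": -77.1}
```

The median is used rather than the mean so that one unusually slow project does not dominate the comparison.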
Reliability Metrics
- Uptime of platform services
- Incident frequency affecting product teams
- Mean time to recovery
An unreliable platform creates downstream reliability problems for every team that depends on it.
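All three reliability metrics can be derived from a list of incidents with start and resolution times. The sketch below makes simplifying assumptions: incidents do not overlap, each one counts as full downtime, and all fall inside the reporting window. The `Incident` type and the dates are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime
    resolved: datetime


def reliability_summary(incidents, window_start, window_end):
    """Incident count, uptime percentage, and mean time to recovery for a window."""
    window = window_end - window_start
    downtime = sum((i.resolved - i.started for i in incidents), timedelta())
    mttr = downtime / len(incidents) if incidents else timedelta()
    return {
        "incident_count": len(incidents),
        "uptime_pct": round(100 * (1 - downtime / window), 3),
        "mttr_minutes": mttr.total_seconds() / 60,
    }


# Made-up incidents in a 30-day window: one of 30 minutes, one of 90 minutes.
incidents = [
    Incident(datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 30)),
    Incident(datetime(2024, 1, 20, 14, 0), datetime(2024, 1, 20, 15, 30)),
]
summary = reliability_summary(incidents, datetime(2024, 1, 1), datetime(2024, 1, 31))
# 120 minutes of downtime in 43,200 minutes: about 99.722% uptime, MTTR of 60 minutes
```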
Satisfaction Metrics
Survey product teams:
- How satisfied are you with platform capabilities?
- What’s working well?
- What needs improvement?
Direct feedback identifies priorities.
Common Failure Modes
The Bottleneck
The platform team becomes a gatekeeper. Product teams wait on the platform team for routine operations, and velocity drops.
Fix: Invest in self-service. Automate common operations. Reserve platform team time for building capabilities, not executing them.
The Ivory Tower
The platform team builds technically sophisticated solutions that don’t match product team needs. The result is low adoption and wasted effort.
Fix: Embed with product teams. Understand their workflows. Involve them in design. Measure adoption ruthlessly.
The Cost Center
The platform team is seen as overhead, not as a value creator. The result is budget pressure, underinvestment, and gradual capability degradation.
Fix: Demonstrate value in business terms. Connect platform improvements to product team velocity. Build executive relationships.
The Support Queue
The platform team spends all its time on support tickets and none on building. Capabilities stagnate and the team burns out.
Fix: Invest in documentation and self-service to reduce support load. Set expectations about support SLAs. Protect time for development work.
Evolution Over Time
Platform needs evolve as the organization grows:
Early stage (< 50 engineers): You may not need a dedicated team. Senior engineers build shared tooling as part of their work.
Growth stage (50-200 engineers): A dedicated platform team emerges. It focuses on the highest-leverage capabilities: CI/CD and basic infrastructure.
Scale stage (> 200 engineers): The platform team grows and potentially splits into sub-teams (developer experience, infrastructure, security). It offers more sophisticated capabilities.
Revisit platform strategy as the organization changes.
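The stage boundaries above can be written down as a small lookup, which makes the edge cases explicit: 50 and 200 engineers both fall in the growth stage. The function name is an assumption, and the thresholds are the article's rough guidance rather than hard limits.

```python
def platform_stage(engineers: int) -> str:
    """Map engineering headcount to the stages described above."""
    if engineers < 50:
        return "early"   # no dedicated team; senior engineers build shared tooling
    if engineers <= 200:
        return "growth"  # dedicated team focused on CI/CD and basic infrastructure
    return "scale"       # larger team, possibly split into sub-teams
```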
Key Takeaways
- Treat platform as a product with internal customers; adoption is your key metric
- Build for self-service; platform shouldn’t be a bottleneck
- Provide documentation and support like an external product
- Measure adoption, velocity impact, reliability, and satisfaction
- Avoid common failures: bottleneck, ivory tower, cost center, support queue
- Evolve platform strategy as the organization grows