Platform Engineering: Building Internal Developer Platforms

October 19, 2020

Platform engineering has emerged as a discipline for making development teams more productive. Instead of every team solving the same infrastructure problems, platform teams create self-service capabilities that others can use.

Here’s how to build internal developer platforms that actually help.

What Platform Engineering Is

The Problem It Solves

Without platforms:

Team A: Figures out Kubernetes deployment
Team B: Figures out Kubernetes deployment (differently)
Team C: Figures out Kubernetes deployment (also differently)

Result:
- Duplicated effort
- Inconsistent practices
- Different security postures
- Hard to maintain

With platforms:

Platform Team: Creates deployment abstraction
All Teams: Use standardized, secure, maintained deployment

Result:
- Consistent practices
- Centralized improvements
- Teams focus on business logic

Platform as Product

Treat the platform like a product, with developers as the customers whose needs drive the roadmap.

Core Components

Golden Paths

Paved roads for common tasks:

# Example: "Create a new service" golden path
steps:
  - name: Run template generator
    details:
      - Scaffolds code structure
      - Creates CI/CD pipeline
      - Sets up monitoring
      - Creates initial infrastructure

  - name: Push to repository
    details:
      - Triggers automated setup
      - Creates namespaces
      - Configures secrets
      - Deploys to dev environment

  - name: Follow README
    details:
      - Local development setup
      - Testing instructions
      - Deployment guide

Service Templates

Standardized starting points:

# Generate new service
platform create service \
  --name order-api \
  --type http-api \
  --language go \
  --team orders

# Creates:
# - Repository with standard structure
# - Dockerfile following best practices
# - Kubernetes manifests
# - CI/CD pipeline
# - Monitoring dashboards
# - Initial documentation

Template structure:

templates/http-api-go/
├── {{.Name}}/
│   ├── cmd/
│   │   └── main.go
│   ├── internal/
│   │   ├── handler/
│   │   └── service/
│   ├── Dockerfile
│   ├── Makefile
│   └── README.md
├── .github/
│   └── workflows/
│       └── ci.yaml
├── k8s/
│   ├── base/
│   └── overlays/
└── monitoring/
    └── dashboard.json
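
As a rough illustration of how a generator might turn this layout into a new repository, here is a minimal Python sketch. It assumes the only placeholder is the literal `{{.Name}}` shown in the tree, and that it can appear in both paths and file contents. The `render_template` and `demo` functions are hypothetical and not part of any real `platform` tool.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

PLACEHOLDER = "{{.Name}}"  # placeholder as it appears in the template tree


def render_template(template_dir: Path, output_dir: Path, name: str) -> List[Path]:
    """Copy a template tree, replacing the placeholder in paths and file contents."""
    created = []
    for src in sorted(template_dir.rglob("*")):
        relative = src.relative_to(template_dir)
        target = output_dir / str(relative).replace(PLACEHOLDER, name)
        if src.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(src.read_text().replace(PLACEHOLDER, name))
            created.append(target)
    return created


def demo() -> Dict[str, str]:
    """Build a tiny template in a temp directory and render it for 'order-api'."""
    with tempfile.TemporaryDirectory() as tmp:
        template = Path(tmp) / "templates" / "http-api-go"
        service_dir = template / PLACEHOLDER
        (service_dir / "cmd").mkdir(parents=True)
        (service_dir / "cmd" / "main.go").write_text("package main // {{.Name}}\n")
        (service_dir / "README.md").write_text("# {{.Name}}\n")

        out = Path(tmp) / "out"
        files = render_template(template, out, "order-api")
        # Map each rendered file (relative, POSIX-style path) to its contents.
        return {f.relative_to(out).as_posix(): f.read_text() for f in files}
```

Calling `demo()` returns the rendered files keyed by path, with `order-api` substituted everywhere the placeholder appeared. A real generator would likely use a proper template engine instead of plain string replacement.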

Self-Service Infrastructure

Developers provision what they need:

# Developer requests a database
apiVersion: platform.example.com/v1
kind: Database
metadata:
  name: orders-db
  namespace: orders-team
spec:
  engine: postgres
  version: "13"
  size: small
  backup:
    enabled: true
    retention: 7d
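
Before provisioning anything, a platform would probably validate a request like this one. The sketch below shows one way that check could look in Python. The supported versions, allowed sizes, and retention format are assumptions made for illustration, not a real catalog.

```python
import re
from typing import List

# Assumed catalog of what the platform offers; real values would differ.
SUPPORTED_VERSIONS = {"postgres": {"12", "13"}}
ALLOWED_SIZES = {"small", "medium", "large"}
RETENTION_PATTERN = re.compile(r"^[1-9]\d*d$")  # whole days, e.g. "7d"


def validate_database(manifest: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if manifest.get("kind") != "Database":
        errors.append("kind must be Database")

    spec = manifest.get("spec", {})
    engine = spec.get("engine")
    if engine not in SUPPORTED_VERSIONS:
        errors.append(f"unsupported engine: {engine}")
    elif str(spec.get("version")) not in SUPPORTED_VERSIONS[engine]:
        errors.append(f"unsupported {engine} version: {spec.get('version')}")

    if spec.get("size") not in ALLOWED_SIZES:
        errors.append(f"unknown size: {spec.get('size')}")

    backup = spec.get("backup", {})
    retention = str(backup.get("retention", ""))
    if backup.get("enabled") and not RETENTION_PATTERN.match(retention):
        errors.append("backup.retention must look like '7d'")
    return errors


# The request from the example above, as a parsed dictionary.
ORDERS_DB = {
    "apiVersion": "platform.example.com/v1",
    "kind": "Database",
    "metadata": {"name": "orders-db", "namespace": "orders-team"},
    "spec": {
        "engine": "postgres",
        "version": "13",
        "size": "small",
        "backup": {"enabled": True, "retention": "7d"},
    },
}
```

Returning every problem at once, rather than failing on the first, lets a developer fix the whole request in one pass.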

Platform handles:

CI/CD Platform

Standardized pipelines:

# Teams use shared pipeline components
jobs:
  build:
    uses: platform-team/workflows/.github/workflows/build-go.yml@v2
    with:
      go-version: '1.19'

  security-scan:
    uses: platform-team/workflows/.github/workflows/security-scan.yml@v2

  deploy:
    needs: [build, security-scan]
    uses: platform-team/workflows/.github/workflows/deploy-k8s.yml@v2
    with:
      environment: production
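
To see how many pipelines actually stay on the shared components, a platform team could scan each workflow's jobs. The Python sketch below assumes the jobs are already parsed into a dictionary and that shared workflows follow the `{owner}/{repo}/.github/workflows/{file}@{ref}` reference shape used above. The function names are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class WorkflowRef(NamedTuple):
    owner: str
    repo: str
    path: str
    ref: str


# Shape of a reusable-workflow reference: {owner}/{repo}/.github/workflows/{file}@{ref}
_USES_PATTERN = re.compile(
    r"^(?P<owner>[^/@]+)/(?P<repo>[^/@]+)/(?P<path>\.github/workflows/[^@]+)@(?P<ref>.+)$"
)


def parse_uses(uses: str) -> Optional[WorkflowRef]:
    """Split a `uses:` value into its parts, or return None if it does not match."""
    match = _USES_PATTERN.match(uses)
    return WorkflowRef(**match.groupdict()) if match else None


def jobs_off_the_paved_road(jobs: Dict[str, dict], owner: str, repo: str) -> List[str]:
    """Return names of jobs that do not call a workflow from the shared repository."""
    offenders = []
    for name, job in jobs.items():
        ref = parse_uses(job.get("uses", ""))
        if ref is None or (ref.owner, ref.repo) != (owner, repo):
            offenders.append(name)
    return offenders


EXAMPLE_JOBS = {
    "build": {"uses": "platform-team/workflows/.github/workflows/build-go.yml@v2"},
    "custom-step": {"runs-on": "ubuntu-latest"},  # hand-rolled job, not shared
}
```

A report like this is best used to spot gaps in the shared components rather than to block teams.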

Observability Platform

Consistent monitoring setup:

# Automatic instrumentation
apiVersion: platform.example.com/v1
kind: Service
metadata:
  name: orders-api
spec:
  observability:
    metrics: true      # Prometheus metrics injected
    tracing: true      # OpenTelemetry sidecar
    logging: structured # JSON logging configured
    dashboards: auto   # Generates standard dashboard

Developer Experience

Portal/Dashboard

Central hub for developers:

Platform Portal
├── Services
│   ├── My Services (list, health status)
│   ├── Create New Service
│   └── Service Catalog
├── Infrastructure
│   ├── My Resources (databases, caches, queues)
│   └── Request New Resource
├── Deployments
│   ├── Recent Deployments
│   └── Deploy Service
├── Documentation
│   ├── Getting Started
│   ├── How-To Guides
│   └── API Reference
└── Support
    ├── FAQs
    └── Request Help

CLI Tools

A command-line interface for power users:

# Common operations
platform services list
platform deploy orders-api --env production
platform logs orders-api --since 1h
platform db create --name cache --type redis

# Troubleshooting
platform debug orders-api
# Opens shell, attaches debugger, tails logs

# Local development
platform dev start
# Starts local Kubernetes with dependencies
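
A CLI with this shape can start as a thin argument parser in front of the platform's APIs. The Python sketch below covers only three of the commands above and stops at parsing. The environment names are an assumption based on the environments mentioned in this article, and nothing here reflects a real `platform` binary.

```python
import argparse

ENVIRONMENTS = ["dev", "staging", "production"]  # assumed set of environments


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a small subset of the hypothetical `platform` CLI."""
    parser = argparse.ArgumentParser(prog="platform")
    commands = parser.add_subparsers(dest="command", required=True)

    # platform services list
    services = commands.add_parser("services", help="inspect services")
    service_actions = services.add_subparsers(dest="action", required=True)
    service_actions.add_parser("list", help="list your services")

    # platform deploy <service> --env <environment>
    deploy = commands.add_parser("deploy", help="deploy a service")
    deploy.add_argument("service")
    deploy.add_argument("--env", required=True, choices=ENVIRONMENTS)

    # platform logs <service> --since <duration>
    logs = commands.add_parser("logs", help="show recent logs")
    logs.add_argument("service")
    logs.add_argument("--since", default="1h")

    return parser


def describe(argv: list) -> str:
    """Parse arguments and describe what would run; a real CLI would call an API here."""
    args = build_parser().parse_args(argv)
    if args.command == "deploy":
        return f"deploy {args.service} to {args.env}"
    if args.command == "logs":
        return f"logs for {args.service} since {args.since}"
    return f"{args.command} {args.action}"
```

For example, `describe(["deploy", "orders-api", "--env", "production"])` returns `"deploy orders-api to production"`. Restricting `--env` to known choices means typos are rejected before any deployment starts.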

Documentation

Discoverable, accurate docs:

# How to Deploy a Service

## Quick Start
```bash
platform deploy my-service --env staging
```

## Detailed Steps

### 1. Ensure CI passes

Your service must pass CI before deployment…

### 2. Choose environment

### 3. Verify deployment

```bash
platform status my-service
```

## Common Issues

### Deployment stuck

[Link to troubleshooting guide]

### Permission denied

[Link to access request process]


Building the Platform

Start Small

Don't build everything at once:

Phase 1 (Months 1-3):

Phase 2 (Months 4-6):

Phase 3 (Months 7-12):


Measure Adoption

Track platform success:

```yaml
# Example targets; values are quoted so that "> 50" and "95:5" parse as plain strings
metrics:
  adoption:
    - services_using_templates: "80%"
    - teams_on_platform: "90%"
    - self_service_vs_ticket_ratio: "95:5"

  efficiency:
    - time_to_first_deploy: "< 1 day"
    - deployment_frequency: "daily"
    - change_failure_rate: "< 5%"

  satisfaction:
    - developer_nps: "> 50"
    - support_ticket_volume: "decreasing"
```
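
The adoption figures can be computed from data most organizations already have. The Python sketch below assumes a service inventory that records whether each service was created from a template, plus counts of self-service and ticket-based requests. The record shape is hypothetical, and so is the rule that a team counts as "on the platform" once it owns at least one templated service.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class ServiceRecord:
    name: str
    team: str
    from_template: bool


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal; 0.0 when there is nothing to measure."""
    return round(100 * part / whole, 1) if whole else 0.0


def adoption_summary(
    services: Iterable[ServiceRecord],
    all_teams: Set[str],
    self_service_requests: int,
    ticket_requests: int,
) -> Dict[str, float]:
    """Summarize adoption as percentages."""
    services = list(services)
    templated = [s for s in services if s.from_template]
    # Assumption: a team is "on the platform" if it owns a templated service.
    teams_on_platform = {s.team for s in templated} & set(all_teams)
    total_requests = self_service_requests + ticket_requests
    return {
        "services_using_templates": percent(len(templated), len(services)),
        "teams_on_platform": percent(len(teams_on_platform), len(all_teams)),
        "self_service_share": percent(self_service_requests, total_requests),
    }


INVENTORY = [
    ServiceRecord("order-api", "orders", True),
    ServiceRecord("cart", "orders", True),
    ServiceRecord("billing", "payments", True),
    ServiceRecord("search-api", "search", True),
    ServiceRecord("legacy-batch", "legacy", False),
]
```

Whatever definitions you choose, keep them fixed over time so that trends stay comparable.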

Feedback Loops

Listen to developers:

Feedback channels:
- #platform-feedback Slack channel
- Monthly developer surveys
- Platform office hours
- Bug/feature request tracking

Balance Standardization and Flexibility

Too rigid:
"You must use this framework, this language, this pattern"
→ Developers work around the platform

Too flexible:
"Do whatever you want"
→ No consistency, platform provides little value

Right balance:
"Here's the golden path. You can deviate with approval."
→ Most use standards, edge cases are handled

Anti-Patterns

Platform as Gatekeeper

❌ "Submit a ticket to deploy"
❌ "Platform team must approve all changes"
❌ "Only platform team can modify infrastructure"

✓ "Self-service with guardrails"
✓ "Policies enforced automatically"
✓ "Exceptions have a clear process"
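
"Policies enforced automatically" can be as simple as a list of small check functions that run on every deployment request. The Python sketch below uses two illustrative rules drawn from earlier in this article: CI must pass before any deployment, and production deployments need a passing security scan. The `DeployRequest` shape and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DeployRequest:
    service: str
    environment: str
    ci_passed: bool
    security_scan_passed: bool


# A policy returns None when satisfied, or a message explaining the violation.
Policy = Callable[[DeployRequest], Optional[str]]


def require_ci(request: DeployRequest) -> Optional[str]:
    return None if request.ci_passed else "CI must pass before deployment"


def require_scan_in_production(request: DeployRequest) -> Optional[str]:
    if request.environment == "production" and not request.security_scan_passed:
        return "Production deploys require a passing security scan"
    return None


POLICIES: List[Policy] = [require_ci, require_scan_in_production]


def evaluate(request: DeployRequest) -> List[str]:
    """Run every policy and collect violations; an empty list means the deploy may proceed."""
    return [msg for policy in POLICIES if (msg := policy(request)) is not None]
```

Because the checks run automatically and explain themselves, developers get an immediate answer instead of waiting on a ticket. An exception process can then be modeled as one more policy.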

Building Without Users

❌ Build platform for 6 months, then release
❌ Assume you know what developers need
❌ Ignore feedback after release

✓ Start with one team, iterate
✓ Talk to developers constantly
✓ Ship small, improve continuously

One Size Fits All

❌ Same abstractions for all use cases
❌ No escape hatches
❌ Ignore legitimate edge cases

✓ Templates for common cases
✓ Lower-level access when needed
✓ Support for migration paths

Team Structure

Platform Team Responsibilities

platform_team:
  owns:
    - Service templates and scaffolding
    - CI/CD pipeline definitions
    - Kubernetes platform configuration
    - Observability infrastructure
    - Developer portal

  supports:
    - Teams adopting the platform
    - Troubleshooting platform issues
    - Feature requests and improvements

  does_not_own:
    - Individual service code
    - Business logic
    - Application-specific configuration

Interaction Model

Development Teams:
- Use platform for common tasks
- Request features through feedback channels
- Contribute back improvements

Platform Team:
- Build and maintain platform capabilities
- Support teams adopting the platform
- Gather requirements, prioritize roadmap
- On-call for platform issues

Key Takeaways

Platform engineering done well multiplies developer productivity. Done poorly, it becomes another bureaucratic hurdle. The difference is treating developers as customers whose needs drive the roadmap.