The Rise of Platform Engineering

November 7, 2022

DevOps promised developers would own their infrastructure. The reality: developers drowning in operational complexity. Platform engineering emerged to solve this—building internal platforms that give developers self-service capabilities without requiring them to become infrastructure experts.

Here’s why platform engineering matters and how to build effective platforms.

From DevOps to Platform Engineering

The DevOps Challenge

devops_reality:
  promise:
    - Developers own the full lifecycle
    - Break down silos
    - Ship faster

  challenge:
    - Cognitive overload on developers
    - Inconsistent practices across teams
    - Duplicated effort
    - Security and compliance gaps

  symptoms:
    - Every team builds their own CI/CD
    - Developers learning Kubernetes instead of coding
    - Shadow IT for cloud resources
    - Slow onboarding, tribal knowledge

Platform Engineering Solution

platform_engineering:
  definition: Building and maintaining internal platforms that enable developer self-service

  approach:
    - Treat developers as customers
    - Build golden paths
    - Reduce cognitive load
    - Enable without restricting

  outcome:
    - Developers focus on product
    - Consistent, secure infrastructure
    - Faster time to production
    - Scalable operations

Internal Developer Platform

Platform Components

internal_developer_platform:
  components:
    developer_portal:
      purpose: Single entry point
      capabilities:
        - Service catalog
        - Documentation
        - API reference
        - Team ownership

    self_service:
      purpose: Provision without tickets
      capabilities:
        - Create new services
        - Deploy to environments
        - Request resources
        - Manage configurations

    ci_cd:
      purpose: Automated pipelines
      capabilities:
        - Build automation
        - Testing frameworks
        - Deployment orchestration
        - Rollback mechanisms

    observability:
      purpose: Understand system behavior
      capabilities:
        - Metrics dashboards
        - Log aggregation
        - Distributed tracing
        - Alerting

    security:
      purpose: Built-in compliance
      capabilities:
        - Secrets management
        - Policy enforcement
        - Vulnerability scanning
        - Access control

The Golden Path

golden_path:
  concept: Opinionated, supported way to do things

  example_service_creation:
    input:
      - Service name
      - Team
      - Language/framework
      - Required integrations

    output:
      - Git repository with template
      - CI/CD pipeline configured
      - Kubernetes manifests
      - Monitoring dashboards
      - Security scanning
      - Documentation stub

  benefits:
    - 10 minutes vs. 2 weeks
    - Consistent structure
    - Built-in best practices
    - Security by default

Building Platforms

Platform as a Product

platform_as_product:
  customers: Internal developers

  product_thinking:
    user_research:
      - Interview developers
      - Observe pain points
      - Understand workflows

    roadmap:
      - Prioritize by developer impact
      - Balance features and stability
      - Communicate plans

    feedback_loops:
      - Usage metrics
      - Satisfaction surveys
      - Support channels

    marketing:
      - Internal documentation
      - Training sessions
      - Office hours

Team Structure

platform_team:
  size: 5-10% of engineering org typically

  skills:
    infrastructure:
      - Kubernetes, cloud platforms
      - Networking, security
      - Infrastructure as Code

    development:
      - API design
      - Frontend for portals
      - Tooling development

    product:
      - User research
      - Product management
      - Technical writing

  anti_patterns:
    - Pure ops team (no developer empathy)
    - Pure dev team (no ops expertise)
    - Ticket-driven (blocking team)

Build vs. Buy

build_vs_buy:
  build:
    when:
      - Unique requirements
      - Strong engineering culture
      - Competitive advantage
    cost: High initial, ongoing maintenance

  buy:
    when:
      - Commodity capability
      - Quick time to value
      - Resource constrained
    cost: Licensing, integration effort

  common_decisions:
    build:
      - Deployment pipelines (org-specific)
      - Service templates (culture-specific)
      - Internal APIs

    buy:
      - Source control (GitHub, GitLab)
      - Observability (Datadog, Grafana Cloud)
      - Secrets management (Vault, AWS Secrets)

Platform Capabilities

Service Catalog

# Backstage service catalog example
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  description: Handles order processing
  annotations:
    github.com/project-slug: company/order-service
    pagerduty.com/service-id: P123ABC
spec:
  type: service
  lifecycle: production
  owner: team-commerce
  dependsOn:
    - component:inventory-service
    - component:payment-service
  providesApis:
    - orders-api

Self-Service Templates

# Backstage software template
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-microservice
  title: Go Microservice
  description: Create a new Go microservice
spec:
  owner: platform-team
  type: service

  parameters:
    - title: Service Information
      properties:
        name:
          title: Service Name
          type: string
        owner:
          title: Owner Team
          type: string
          ui:field: OwnerPicker

    - title: Infrastructure
      properties:
        database:
          title: Database
          type: string
          enum: [none, postgresql, mongodb]
        queue:
          title: Message Queue
          type: string
          enum: [none, kafka, rabbitmq]

  steps:
    - id: fetch
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}

    - id: publish
      action: publish:github
      input:
        repoUrl: github.com?owner=company&repo=${{ parameters.name }}

    - id: register
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}

Automated Environments

environment_automation:
  preview_environments:
    trigger: Pull request opened
    creates:
      - Isolated namespace
      - Service deployment
      - Database copy
      - Unique URL
    destroyed: PR merged or closed

  staging:
    trigger: Merge to main
    creates:
      - Deploy to staging cluster
      - Run integration tests
      - Performance tests

  production:
    trigger: Promotion or schedule
    creates:
      - Canary deployment
      - Gradual rollout
      - Automatic rollback on errors

Measuring Platform Success

Platform Metrics

platform_metrics:
  adoption:
    - Percentage of services on platform
    - New services using golden path
    - Active users

  efficiency:
    - Time to first deployment
    - Time to create new service
    - Support tickets per developer

  satisfaction:
    - Developer NPS
    - Survey scores
    - Retention/churn

  reliability:
    - Platform uptime
    - CI/CD success rate
    - Mean time to recover

DORA Impact

platform_dora_impact:
  deployment_frequency:
    before: Monthly releases
    after: Daily releases
    driver: Self-service deployments

  lead_time:
    before: 2 weeks
    after: 1 day
    driver: Automated pipelines

  change_failure_rate:
    before: 30%
    after: 10%
    driver: Standardized testing, canary deploys

  mttr:
    before: 4 hours
    after: 30 minutes
    driver: Observability, runbooks

Key Takeaways

The goal is not the platform itself—it’s enabling developers to deliver value faster.