DORA Metrics: Measuring Engineering Effectiveness

January 24, 2022

The DORA (DevOps Research and Assessment) metrics have emerged as the standard for measuring software delivery performance. Years of research have shown that four key metrics predict both organizational performance and developer wellbeing. But measuring them properly requires thought.

Here’s how to implement DORA metrics effectively.

The Four Key Metrics

Overview

dora_metrics:
  deployment_frequency:
    definition: How often code is deployed to production
    question: "How frequently do you deploy?"
    elite: Multiple times per day
    high: Daily to weekly
    medium: Weekly to monthly
    low: Monthly to semi-annually

  lead_time_for_changes:
    definition: Time from commit to production
    question: "How long from commit to deploy?"
    elite: Less than one hour
    high: One day to one week
    medium: One week to one month
    low: One month to six months

  change_failure_rate:
    definition: Percentage of deployments causing failures
    question: "What percentage of deployments cause incidents?"
    elite: 0-15%
    high: 16-30%
    medium: 16-30%
    low: 16-30%  # Note: ranges overlap in original research

  mean_time_to_restore:
    definition: Time to recover from failure
    question: "How long to restore service?"
    elite: Less than one hour
    high: Less than one day
    medium: One day to one week
    low: One week to one month
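
Once you have measured values, bucketing them into these bands is mechanical. Here is a minimal sketch in Python, assuming lead time is expressed in hours and deployment frequency in deploys per week; the function names and exact band edges are illustrative approximations of the table above, not official DORA definitions.

# Rough mapping from measured values to DORA performance levels.
# Band edges approximate the table above; units (hours, deploys/week) are assumptions.

def classify_lead_time(hours: float) -> str:
    if hours < 1:
        return "elite"
    if hours <= 24 * 7:        # up to one week
        return "high"
    if hours <= 24 * 30:       # up to one month
        return "medium"
    return "low"


def classify_deployment_frequency(deploys_per_week: float) -> str:
    if deploys_per_week > 7:       # more than once per day on average
        return "elite"
    if deploys_per_week >= 1:      # daily to weekly
        return "high"
    if deploys_per_week >= 0.25:   # roughly weekly to monthly
        return "medium"
    return "low"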

Why These Metrics?

metric_rationale:
  deployment_frequency:
    - Indicates batch size
    - Smaller batches = lower risk
    - Enables fast feedback

  lead_time:
    - Measures pipeline efficiency
    - Reflects process overhead
    - Indicates responsiveness

  change_failure_rate:
    - Quality indicator
    - Balance to velocity metrics
    - Prevents "move fast, break things"

  mttr:
    - Resilience indicator
    - Measures incident response
    - Recovery over prevention

Implementation

Deployment Frequency

# Track deployments in your CD system
deployment_tracking:
  sources:
    - CI/CD pipeline completions
    - Kubernetes deployments
    - Cloud deployment events
    - Feature flag changes (optional)

  calculation:
    metric: Count of production deployments
    period: Per day/week
    granularity: Per team/service

  implementation:
    github_actions: |
      - name: Record deployment
        if: success()
        run: |
          # Record when the deploy finished, not when the commit was pushed
          curl -X POST "$METRICS_ENDPOINT/deployment" \
            -d "{\"service\": \"${{ github.repository }}\", \"timestamp\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}"

-- Query deployment frequency
SELECT
    service,
    DATE_TRUNC('week', deployed_at) as week,
    COUNT(*) as deployments
FROM deployments
WHERE deployed_at > NOW() - INTERVAL '90 days'
GROUP BY 1, 2
ORDER BY 1, 2;

Lead Time for Changes

lead_time_tracking:
  measurement_points:
    - Commit timestamp (start)
    - PR merge timestamp (optional)
    - Deploy to production timestamp (end)

  calculation:
    metric: Time from first commit to production deploy
    aggregation: Median (not mean; outliers skew the average)
    period: Rolling 30 days

  considerations:
    - Track per-commit, not per-deploy
    - Handle multi-commit deploys
    - Consider working hours vs. calendar time

# Calculate lead time for each commit shipped in a deployment
import statistics

import numpy


def calculate_lead_time(deployment):
    commits = get_commits_in_deployment(deployment)
    lead_times = []

    for commit in commits:
        # Seconds from commit to production deploy
        lead_time = (deployment.timestamp - commit.timestamp).total_seconds()
        lead_times.append(lead_time)

    return {
        'median': statistics.median(lead_times),
        'p90': numpy.percentile(lead_times, 90),
        'deployment_id': deployment.id
    }
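
The rolling 30-day median described above can then be computed over the per-commit lead times. A minimal sketch, assuming each record is a (deploy_time, lead_time_seconds) tuple with timezone-aware datetimes; the record shape is hypothetical.

import statistics
from datetime import datetime, timedelta, timezone


def rolling_median_lead_time(records, window_days=30):
    """Median lead time (seconds) over deployments in the last `window_days`.

    `records` is assumed to be an iterable of (deploy_time, lead_time_seconds)
    tuples; unmatched deployments simply fall outside the window.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=window_days)
    recent = [seconds for deploy_time, seconds in records if deploy_time >= cutoff]
    return statistics.median(recent) if recent else None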

Change Failure Rate

change_failure_tracking:
  definition: |
    Percentage of deployments that result in degraded
    service requiring remediation (rollback, hotfix, patch)

  what_counts:
    - Production incidents
    - Rollbacks
    - Hotfixes
    - Emergency patches

  what_doesnt:
    - Test environment failures
    - Failed deployments that don't reach users
    - Issues caught before production

  calculation:
    metric: Failed deployments / Total deployments
    period: Rolling 30 days

-- Query change failure rate
SELECT
    service,
    COUNT(CASE WHEN caused_incident THEN 1 END)::float /
    COUNT(*)::float as failure_rate
FROM deployments
WHERE deployed_at > NOW() - INTERVAL '30 days'
GROUP BY service;
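
The caused_incident flag in that query has to be populated somehow. One common approach, sketched below, is to link incidents and rollbacks back to the deployment that introduced them; the deployment_id fields on incident and rollback records are assumptions about your schema, not a standard.

def caused_failure(deployment, incidents, rollbacks):
    """True if any incident or rollback references this deployment.

    `incidents` and `rollbacks` are assumed to carry a `deployment_id`
    attribute linking them to the deployment that triggered them.
    """
    return any(i.deployment_id == deployment.id for i in incidents) or \
           any(r.deployment_id == deployment.id for r in rollbacks)


def change_failure_rate(deployments, incidents, rollbacks):
    """Failed deployments / total deployments over the window `deployments` covers."""
    if not deployments:
        return None
    failed = sum(1 for d in deployments if caused_failure(d, incidents, rollbacks))
    return failed / len(deployments)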

Mean Time to Restore

mttr_tracking:
  measurement_points:
    - Incident start (detection or report)
    - Incident resolved (service restored)

  considerations:
    - Use median, not mean
    - Measure restoration, not root cause fix
    - Link to deployments that caused issues

  sources:
    - Incident management system
    - PagerDuty/Opsgenie
    - Status page history
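
Unlike the other metrics, there is no query above, so here is a sketch of the calculation: take incidents that have both a detection timestamp and a restoration timestamp, and report the median. The started_at and resolved_at field names are assumptions about whatever your incident tool exports.

import statistics
from datetime import datetime, timedelta, timezone


def median_time_to_restore(incidents, window_days=30):
    """Median restore time (seconds) for incidents resolved in the last `window_days`.

    Each incident is assumed to expose timezone-aware `started_at` and
    `resolved_at` datetimes; unresolved incidents are skipped.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=window_days)
    durations = [
        (i.resolved_at - i.started_at).total_seconds()
        for i in incidents
        if i.resolved_at is not None and i.resolved_at >= cutoff
    ]
    return statistics.median(durations) if durations else None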

Data Collection Architecture

Pipeline Integration

# Metrics collection architecture
collection_architecture:
  sources:
    ci_cd:
      - GitHub Actions
      - GitLab CI
      - Jenkins
      - ArgoCD

    incident_management:
      - PagerDuty
      - Opsgenie
      - Custom incident system

    version_control:
      - GitHub/GitLab webhooks
      - Commit data

  storage:
    options:
      - Time-series database (InfluxDB, TimescaleDB)
      - Data warehouse (Snowflake, BigQuery)
      - Dedicated platform (Sleuth, LinearB, Faros)

  visualization:
    - Grafana dashboards
    - Custom dashboards
    - Weekly reports
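
Whichever storage option you choose, the glue is usually a small service that accepts events from CI and writes them into something like the deployments table queried earlier. A minimal sketch using Flask and psycopg2 against a Postgres/TimescaleDB table; the endpoint path, table schema, and DATABASE_URL variable are assumptions, not part of any tool listed above.

import os
from datetime import datetime, timezone

import psycopg2
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/deployment", methods=["POST"])
def record_deployment():
    """Accept {"service": ..., "timestamp": ...} and insert a deployments row."""
    event = request.get_json(force=True)
    deployed_at = event.get("timestamp") or datetime.now(timezone.utc).isoformat()

    # One connection per request keeps the sketch simple; use a pool in practice.
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO deployments (service, deployed_at) VALUES (%s, %s)",
                (event["service"], deployed_at),
            )
    finally:
        conn.close()

    return jsonify({"status": "recorded"}), 201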

Webhook Integration

# GitHub webhook for deployments
github_deployment_webhook:
  events:
    - deployment_status

  handler: |
    if event.deployment_status.state == 'success':
        record_deployment(
            service=event.repository.name,
            sha=event.deployment.sha,
            timestamp=event.deployment_status.updated_at,
            environment=event.deployment.environment
        )

        # Calculate lead time from commits
        commits = get_commits_since_last_deploy(
            repo=event.repository.name,
            sha=event.deployment.sha
        )
        for commit in commits:
            record_commit_lead_time(
                commit_sha=commit.sha,
                commit_time=commit.timestamp,
                deploy_time=event.deployment_status.updated_at
            )
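
Before trusting any of those payload fields, the receiver should verify the delivery actually came from GitHub. GitHub signs each webhook with an HMAC-SHA256 of the raw request body using your webhook secret and sends it in the X-Hub-Signature-256 header; a minimal check looks like the sketch below (the WEBHOOK_SECRET variable name is an assumption).

import hashlib
import hmac
import os


def verify_github_signature(raw_body: bytes, signature_header: str) -> bool:
    """Compare GitHub's X-Hub-Signature-256 header against our own HMAC of the body."""
    secret = os.environ["WEBHOOK_SECRET"].encode()
    expected = "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the signature via timing
    return hmac.compare_digest(expected, signature_header or "")

Reject any delivery that fails this check before recording deployments or lead times.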

Dashboards

Team Dashboard

team_dashboard:
  summary_metrics:
    - Current week deployment frequency
    - Rolling 30-day lead time (median)
    - Rolling 30-day change failure rate
    - Rolling 30-day MTTR (median)

  trends:
    - 12-week trend for each metric
    - Comparison to previous period
    - Benchmark against targets

  drill_down:
    - By service
    - By deployment
    - By incident

Executive Dashboard

executive_dashboard:
  organization_level:
    - DORA metrics by team/department
    - Distribution across performance levels
    - Trend over quarters

  comparison:
    - Elite/High/Medium/Low breakdown
    - Industry benchmarks
    - Year-over-year improvement

Using Metrics Effectively

What to Avoid

anti_patterns:
  gaming:
    problem: Teams optimize metric, not outcome
    example: Many small deploys that don't ship value
    solution: Pair with business metrics

  individual_measurement:
    problem: Using for individual performance
    example: Developer lead time tracking
    solution: Team-level metrics only

  punishing_failure:
    problem: Teams hide incidents to improve CFR
    example: Not reporting issues
    solution: Blameless culture, reward transparency

  comparing_incomparable:
    problem: Comparing teams with different contexts
    example: Platform team vs. product team
    solution: Compare to self, not others

What Works

effective_use:
  continuous_improvement:
    - Track team's own trend
    - Set improvement goals
    - Celebrate progress

  investment_justification:
    - "CI/CD investment reduced lead time 50%"
    - "Testing automation reduced CFR 40%"
    - Connect to business outcomes

  identifying_bottlenecks:
    - Long lead time → Pipeline or process issue
    - High CFR → Testing or quality issue
    - Long MTTR → Detection or response issue

  balanced_view:
    - Don't optimize one at expense of others
    - Velocity without quality is meaningless
    - Recovery matters as much as prevention

Key Takeaways

Measurement enables improvement, but only if used thoughtfully.