The DORA (DevOps Research and Assessment) metrics have emerged as the standard for measuring software delivery performance. After years of research, four key metrics stand out as predictors of both organizational performance and developer wellbeing. Measuring them properly, however, takes some care.
Here’s how to implement DORA metrics effectively.
The Four Key Metrics
Overview
dora_metrics:
  deployment_frequency:
    definition: How often code is deployed to production
    question: "How frequently do you deploy?"
    elite: Multiple times per day
    high: Daily to weekly
    medium: Weekly to monthly
    low: Monthly to semi-annually
  lead_time_for_changes:
    definition: Time from commit to production
    question: "How long from commit to deploy?"
    elite: Less than one hour
    high: One day to one week
    medium: One week to one month
    low: One month to six months
  change_failure_rate:
    definition: Percentage of deployments causing failures
    question: "What percentage of deployments cause incidents?"
    elite: 0-15%
    high: 16-30%
    medium: 16-30%
    low: 16-30%  # Note: ranges overlap in original research
  mean_time_to_restore:
    definition: Time to recover from failure
    question: "How long to restore service?"
    elite: Less than one hour
    high: Less than one day
    medium: One day to one week
    low: One week to one month
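To make the bands concrete, here is a minimal Python sketch of how measured values could be bucketed into these tiers. The function names and numeric cutoffs are illustrative simplifications of the table above (the published research derives tiers from cluster analysis, and the change failure rate bands overlap above 15%, so that metric is omitted here).

# Rough, illustrative classifier for one team's measurements against the bands above.
TIER_ORDER = {"low": 0, "medium": 1, "high": 2, "elite": 3}

def deploy_frequency_tier(deploys_per_day: float) -> str:
    if deploys_per_day > 1:        # multiple times per day
        return "elite"
    if deploys_per_day >= 1 / 7:   # daily to weekly
        return "high"
    if deploys_per_day >= 1 / 30:  # roughly weekly to monthly
        return "medium"
    return "low"

def lead_time_tier(median_hours: float) -> str:
    if median_hours < 1:
        return "elite"
    if median_hours <= 24 * 7:
        return "high"
    if median_hours <= 24 * 30:
        return "medium"
    return "low"

def restore_time_tier(median_hours: float) -> str:
    if median_hours < 1:
        return "elite"
    if median_hours <= 24:
        return "high"
    if median_hours <= 24 * 7:
        return "medium"
    return "low"

def overall_tier(*tiers: str) -> str:
    # Weakest metric wins in this simplified view; CFR is left out because the
    # published bands above 15% overlap and do not separate the tiers.
    return min(tiers, key=lambda t: TIER_ORDER[t])

For a team deploying three times a day with a 20-hour median lead time and a 4-hour median restore time, overall_tier(deploy_frequency_tier(3), lead_time_tier(20), restore_time_tier(4)) comes out as "high".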
Why These Metrics?
metric_rationale:
  deployment_frequency:
    - Indicates batch size
    - Smaller batches = lower risk
    - Enables fast feedback
  lead_time:
    - Measures pipeline efficiency
    - Reflects process overhead
    - Indicates responsiveness
  change_failure_rate:
    - Quality indicator
    - Balance to velocity metrics
    - Prevents "move fast, break things"
  mttr:
    - Resilience indicator
    - Measures incident response
    - Recovery over prevention
Implementation
Deployment Frequency
# Track deployments in your CD system
deployment_tracking:
  sources:
    - CI/CD pipeline completions
    - Kubernetes deployments
    - Cloud deployment events
    - Feature flag changes (optional)
  calculation:
    metric: Count of production deployments
    period: Per day/week
    granularity: Per team/service
  implementation:
    github_actions: |
      - name: Record deployment
        if: success()
        run: |
          # Record the deploy time itself, not the commit time
          curl -X POST "$METRICS_ENDPOINT/deployment" \
            -H "Content-Type: application/json" \
            -d "{\"service\": \"${{ github.repository }}\", \"timestamp\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}"
-- Query deployment frequency
SELECT
service,
DATE_TRUNC('week', deployed_at) as week,
COUNT(*) as deployments
FROM deployments
WHERE deployed_at > NOW() - INTERVAL '90 days'
GROUP BY 1, 2
ORDER BY 1, 2;
Lead Time for Changes
lead_time_tracking:
  measurement_points:
    - Commit timestamp (start)
    - PR merge timestamp (optional)
    - Deploy to production timestamp (end)
  calculation:
    metric: Time from first commit to production deploy
    aggregation: Median (not mean; outliers skew)
    period: Rolling 30 days
  considerations:
    - Track per-commit, not per-deploy
    - Handle multi-commit deploys
    - Consider working hours vs. calendar time
# Calculate lead time
import statistics

import numpy

def calculate_lead_time(deployment):
    # get_commits_in_deployment() is assumed to return the commits shipped by this deploy
    commits = get_commits_in_deployment(deployment)
    lead_times = [
        (deployment.timestamp - commit.timestamp).total_seconds()
        for commit in commits
    ]
    return {
        'median': statistics.median(lead_times),         # robust to outliers
        'p90': float(numpy.percentile(lead_times, 90)),  # keeps the long tail visible
        'deployment_id': deployment.id,
    }
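To connect this to the rolling 30-day median called for above, a small aggregation helper might look like the following; the (deployed_at, lead_time_seconds) record shape is an assumption, one row per commit as recommended.

import statistics
from datetime import datetime, timedelta, timezone

def rolling_median_lead_time(records, window_days=30, now=None):
    # records: iterable of (deployed_at, lead_time_seconds) pairs, one per commit,
    # e.g. flattened from calculate_lead_time() output (an assumed shape).
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    in_window = [seconds for deployed_at, seconds in records if deployed_at >= cutoff]
    return statistics.median(in_window) if in_window else None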
Change Failure Rate
change_failure_tracking:
  definition: |
    Percentage of deployments that result in degraded
    service requiring remediation (rollback, hotfix, patch)
  what_counts:
    - Production incidents
    - Rollbacks
    - Hotfixes
    - Emergency patches
  what_doesnt:
    - Test environment failures
    - Failed deployments that don't reach users
    - Issues caught before production
  calculation:
    metric: Failed deployments / Total deployments
    period: Rolling 30 days
-- Query change failure rate
SELECT
service,
COUNT(CASE WHEN caused_incident THEN 1 END)::float /
COUNT(*)::float as failure_rate
FROM deployments
WHERE deployed_at > NOW() - INTERVAL '30 days'
GROUP BY service;
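The query above assumes a caused_incident flag on each deployment. Here is one sketch of how that flag might get set, attributing each new incident to the most recent production deployment of the affected service; the dict shapes and field names are assumptions.

def find_suspect_deployment(incident, deployments):
    # deployments: dicts with "service", "deployed_at" (datetime), "caused_incident";
    # incident: dict with "service" and "started_at". Shapes are assumptions.
    candidates = [
        d for d in deployments
        if d["service"] == incident["service"] and d["deployed_at"] <= incident["started_at"]
    ]
    return max(candidates, key=lambda d: d["deployed_at"]) if candidates else None

def link_incident_to_deployment(incident, deployments):
    suspect = find_suspect_deployment(incident, deployments)
    if suspect is not None:
        suspect["caused_incident"] = True  # this is what the failure-rate query reads
    return suspect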
Mean Time to Restore
mttr_tracking:
  measurement_points:
    - Incident start (detection or report)
    - Incident resolved (service restored)
  considerations:
    - Use median, not mean
    - Measure restoration, not root cause fix
    - Link to deployments that caused issues
  sources:
    - Incident management system
    - PagerDuty/Opsgenie
    - Status page history
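There is no query above for this metric, so here is a minimal sketch of the restore-time calculation, assuming incidents are exported from the incident management system as dicts with started_at and resolved_at datetimes.

import statistics
from datetime import timedelta

def median_time_to_restore(incidents) -> timedelta:
    durations = [
        i["resolved_at"] - i["started_at"]
        for i in incidents
        if i.get("resolved_at") is not None  # ignore still-open incidents
    ]
    return statistics.median(durations) if durations else timedelta(0)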
Data Collection Architecture
Pipeline Integration
# Metrics collection architecture
collection_architecture:
  sources:
    ci_cd:
      - GitHub Actions
      - GitLab CI
      - Jenkins
      - ArgoCD
    incident_management:
      - PagerDuty
      - Opsgenie
      - Custom incident system
    version_control:
      - GitHub/GitLab webhooks
      - Commit data
  storage:
    options:
      - Time-series database (InfluxDB, TimescaleDB)
      - Data warehouse (Snowflake, BigQuery)
      - Dedicated platform (Sleuth, LinearB, Faros)
  visualization:
    - Grafana dashboards
    - Custom dashboards
    - Weekly reports
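Whichever sources and storage options you pick, it helps to normalize everything into a small internal event model before writing it out. Below is a sketch using Python dataclasses; the field names are assumptions rather than any standard schema.

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class DeploymentEvent:
    service: str
    sha: str
    deployed_at: datetime
    environment: str = "production"
    caused_incident: bool = False

@dataclass
class IncidentEvent:
    service: str
    started_at: datetime
    resolved_at: Optional[datetime] = None
    linked_deployment_sha: Optional[str] = None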
Webhook Integration
# GitHub webhook for deployments
github_deployment_webhook:
  events:
    - deployment_status
  handler: |
    if event.deployment_status.state == 'success':
        record_deployment(
            service=event.repository.name,
            sha=event.deployment.sha,
            timestamp=event.deployment_status.updated_at,
            environment=event.deployment.environment
        )
        # Calculate lead time from commits
        commits = get_commits_since_last_deploy(
            repo=event.repository.name,
            sha=event.deployment.sha
        )
        for commit in commits:
            record_commit_lead_time(
                commit_sha=commit.sha,
                commit_time=commit.timestamp,
                deploy_time=event.deployment_status.updated_at
            )
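For completeness, here is a minimal runnable receiver for the handler sketched above, using Flask as an assumed framework (any HTTP stack works). Webhook signature verification, retries, and real persistence are left out, and record_deployment is a stand-in for writing to your metrics store.

from flask import Flask, request

app = Flask(__name__)

def record_deployment(**fields):
    # Stand-in for the metrics store described under "Data Collection Architecture"
    print("recording deployment:", fields)

@app.route("/webhooks/github", methods=["POST"])  # path is arbitrary
def github_webhook():
    event = request.headers.get("X-GitHub-Event", "")
    payload = request.get_json(silent=True) or {}
    if event == "deployment_status":
        status = payload.get("deployment_status", {})
        deployment = payload.get("deployment", {})
        if status.get("state") == "success" and deployment.get("environment") == "production":
            record_deployment(
                service=payload.get("repository", {}).get("name"),
                sha=deployment.get("sha"),
                timestamp=status.get("updated_at"),
                environment=deployment.get("environment"),
            )
    return "", 204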
Dashboards
Team Dashboard
team_dashboard:
  summary_metrics:
    - Current week deployment frequency
    - Rolling 30-day lead time (median)
    - Rolling 30-day change failure rate
    - Rolling 30-day MTTR (median)
  trends:
    - 12-week trend for each metric
    - Comparison to previous period
    - Benchmark against targets
  drill_down:
    - By service
    - By deployment
    - By incident
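As a sketch of the roll-up behind those summary numbers, the function below turns 30 days of deployment and incident records for one team into the four headline values; the record shapes follow the earlier sketches and are assumptions.

import statistics

def team_summary(deployments, incidents, window_days=30):
    # deployments/incidents: lists of dicts shaped like the earlier sketches
    # ("caused_incident", "commit_lead_times_s", "started_at", "resolved_at").
    lead_times = [s for d in deployments for s in d.get("commit_lead_times_s", [])]
    failures = [d for d in deployments if d.get("caused_incident")]
    restore_times = [
        i["resolved_at"] - i["started_at"] for i in incidents if i.get("resolved_at")
    ]
    return {
        "deploys_per_week": len(deployments) / (window_days / 7),
        "median_lead_time_s": statistics.median(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / len(deployments) if deployments else None,
        "median_time_to_restore": statistics.median(restore_times) if restore_times else None,
    }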
Executive Dashboard
executive_dashboard:
  organization_level:
    - DORA metrics by team/department
    - Distribution across performance levels
    - Trend over quarters
  comparison:
    - Elite/High/Medium/Low breakdown
    - Industry benchmarks
    - Year-over-year improvement
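The "distribution across performance levels" view is just a count of teams per tier. A small sketch, assuming each team's tier has already been computed (for example with the classifier sketched under "The Four Key Metrics"):

from collections import Counter

def tier_distribution(team_tiers: dict[str, str]) -> dict[str, int]:
    # team_tiers: {"payments": "high", "search": "elite", ...} (assumed shape)
    counts = Counter(team_tiers.values())
    return {tier: counts.get(tier, 0) for tier in ("elite", "high", "medium", "low")}

# Example: tier_distribution({"payments": "high", "search": "elite", "mobile": "high"})
# -> {"elite": 1, "high": 2, "medium": 0, "low": 0}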
Using Metrics Effectively
What to Avoid
anti_patterns:
  gaming:
    problem: Teams optimize metric, not outcome
    example: Many small deploys that don't ship value
    solution: Pair with business metrics
  individual_measurement:
    problem: Using for individual performance
    example: Developer lead time tracking
    solution: Team-level metrics only
  punishing_failure:
    problem: Teams hide incidents to improve CFR
    example: Not reporting issues
    solution: Blameless culture, reward transparency
  comparing_incomparable:
    problem: Comparing teams with different contexts
    example: Platform team vs. product team
    solution: Compare to self, not others
What Works
effective_use:
  continuous_improvement:
    - Track team's own trend
    - Set improvement goals
    - Celebrate progress
  investment_justification:
    - "CI/CD investment reduced lead time 50%"
    - "Testing automation reduced CFR 40%"
    - Connect to business outcomes
  identifying_bottlenecks:
    - Long lead time → Pipeline or process issue
    - High CFR → Testing or quality issue
    - Long MTTR → Detection or response issue
  balanced_view:
    - Don't optimize one at expense of others
    - Velocity without quality is meaningless
    - Recovery matters as much as prevention
Key Takeaways
- DORA metrics: deployment frequency, lead time, change failure rate, MTTR
- These four metrics predict both performance and developer wellbeing
- Use median not mean for lead time and MTTR (outliers skew)
- Collect data from CI/CD, version control, and incident systems
- Link deployments to incidents for change failure rate
- Dashboard at team level, not individual
- Compare teams to themselves, not each other
- Avoid gaming: pair with business outcome metrics
- Use metrics for improvement, not punishment
- Balance velocity metrics with quality metrics
Measurement enables improvement, but only if used thoughtfully.