Engineering teams measure many things. Lines of code, story points, tickets closed. Most of these metrics are noise at best, actively harmful at worst. The right metrics drive improvement; the wrong ones create perverse incentives.
Here are metrics that actually matter.
The Problem with Common Metrics
Metrics That Don’t Work
```yaml
bad_metrics:
  lines_of_code:
    problem: Incentivizes verbosity
    reality: Best code is deleted code
    result: Bloated, unmaintainable systems
  story_points:
    problem: Velocity becomes the target
    reality: Point inflation
    result: Gaming instead of delivery
  tickets_closed:
    problem: Incentivizes splitting work
    reality: Quality of work is ignored
    result: Shallow work, rework
  hours_worked:
    problem: Rewards presence over impact
    reality: Burnout, inefficiency
    result: Declining productivity
```
Goodhart’s Law
```yaml
goodharts_law:
  statement: "When a measure becomes a target, it ceases to be a good measure"
  examples:
    - "Target: Deploy frequency → Result: Empty deployments"
    - "Target: Test coverage → Result: Meaningless tests"
    - "Target: PR merge time → Result: Rubber-stamp reviews"
  solution: Measure outcomes, not activities
```
DORA Metrics
The Four Key Metrics
```yaml
dora_metrics:
  deployment_frequency:
    what: How often code deploys to production
    elite: Multiple times per day
    high: Weekly to monthly
    medium: Monthly to every 6 months
    low: Less than every 6 months
  lead_time_for_changes:
    what: Time from commit to production
    elite: Less than one hour
    high: One day to one week
    medium: One week to one month
    low: More than one month
  mean_time_to_recover:
    what: Time to restore service after an incident
    elite: Less than one hour
    high: Less than one day
    medium: One day to one week
    low: More than one week
  change_failure_rate:
    what: Percentage of deployments causing incidents
    elite: 0-15%
    high: 16-30%
    medium: 31-45%
    low: 46-60%
```
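As a quick illustration of how to use the table, a lead-time measurement can be mapped onto a performance tier. The function below is a hypothetical sketch: the thresholds mirror the table above, and the gap between one hour and one day is treated as high.

```python
def classify_lead_time(median_hours: float) -> str:
    """Map a median lead time (in hours) to a DORA performance tier,
    using the thresholds from the table above."""
    if median_hours < 1:
        return "elite"
    if median_hours <= 7 * 24:      # up to one week
        return "high"
    if median_hours <= 30 * 24:     # up to roughly one month
        return "medium"
    return "low"
```

For example, a median lead time of 36 hours lands in the high band.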
Measuring DORA
```yaml
deployment_frequency:
  data_source: CI/CD pipeline
  calculation: Count of production deployments / time period
lead_time:
  data_source: Git + deployment events
  calculation: Median time from first commit to deployment
mttr:
  data_source: Incident management system
  calculation: Mean time from incident start to resolution
change_failure_rate:
  data_source: Incident management + deployments
  calculation: Incidents caused by changes / total deployments
```
```python
from datetime import datetime, timedelta
import statistics

# Calculate deployment frequency (deploys per day over the window)
def deployment_frequency(deployments, days=30):
    cutoff = datetime.now() - timedelta(days=days)
    recent = [d for d in deployments if d.date > cutoff]
    return len(recent) / days

# Calculate lead time: median hours from first commit to merge
# (merge time used here as a proxy for deployment time)
def lead_time(pull_requests, days=30):
    cutoff = datetime.now() - timedelta(days=days)
    recent = [pr for pr in pull_requests if pr.merged_at > cutoff]
    lead_times = [(pr.merged_at - pr.first_commit_at).total_seconds() / 3600 for pr in recent]
    return statistics.median(lead_times) if lead_times else 0.0
```
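The remaining two metrics follow the same pattern. The sketch below assumes hypothetical incident objects with `started_at`, `resolved_at`, and `caused_by_change` attributes; map those onto whatever your incident management system actually exposes.

```python
from datetime import datetime, timedelta
import statistics

# Mean time to recover, in hours. Assumes incident objects with
# started_at / resolved_at timestamps (hypothetical schema).
def mttr(incidents, days=30):
    cutoff = datetime.now() - timedelta(days=days)
    durations = [
        (i.resolved_at - i.started_at).total_seconds() / 3600
        for i in incidents
        if i.started_at > cutoff
    ]
    return statistics.mean(durations) if durations else 0.0

# Change failure rate. Assumes incidents carry a caused_by_change flag
# linking them back to a deployment (also hypothetical).
def change_failure_rate(incidents, deployments, days=30):
    cutoff = datetime.now() - timedelta(days=days)
    recent_deploys = [d for d in deployments if d.date > cutoff]
    failures = [i for i in incidents if i.started_at > cutoff and i.caused_by_change]
    return len(failures) / len(recent_deploys) if recent_deploys else 0.0
```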
System Reliability
SLIs, SLOs, SLAs
```yaml
reliability_metrics:
  availability:
    formula: Uptime / Total time
    example: "99.9% = 8.76 hours downtime/year"
    measurement: Synthetic monitoring, real user monitoring
  latency:
    formula: Request duration at a given percentile
    example: "p99 < 200ms"
    measurement: APM, distributed tracing
  error_rate:
    formula: Failed requests / Total requests
    example: "< 0.1% 5xx errors"
    measurement: Application logs, load balancer metrics
  throughput:
    formula: Requests handled per time unit
    example: "10,000 RPS sustained"
    measurement: Load balancer, application metrics
```
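To show how these formulas translate into code, here is a minimal sketch that computes availability, error rate, and a latency percentile from a flat list of request records. The `status` and `duration_ms` field names are illustrative, not any particular monitoring tool's schema.

```python
def availability(uptime_seconds, total_seconds):
    # Availability = uptime / total observed time
    return uptime_seconds / total_seconds

def error_rate(requests):
    # Failed requests / total requests, counting 5xx responses as failures
    if not requests:
        return 0.0
    failed = sum(1 for r in requests if r["status"] >= 500)
    return failed / len(requests)

def latency_percentile(requests, pct=99):
    # Request duration at the given percentile (nearest-rank method)
    durations = sorted(r["duration_ms"] for r in requests)
    index = min(len(durations) - 1, int(len(durations) * pct / 100))
    return durations[index]
```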
Error Budget
```yaml
error_budget:
  concept: "Amount of unreliability allowed before slowing feature work"
  calculation:
    slo: 99.9%
    budget: 0.1%
    monthly_minutes: 43,200
    budget_minutes: 43.2
  usage:
    - Track budget consumption
    - Slow feature work when the budget is exhausted
    - Balance reliability and velocity
```
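A minimal sketch of the arithmetic above: with a 99.9% SLO over a 30-day month (43,200 minutes), the budget is 43.2 minutes of downtime.

```python
def error_budget_status(downtime_minutes, slo=0.999, period_minutes=30 * 24 * 60):
    """Compare consumed downtime against the budget implied by the SLO."""
    budget = (1 - slo) * period_minutes      # 43.2 minutes for 99.9% over 30 days
    remaining = budget - downtime_minutes
    return {
        "budget_minutes": round(budget, 1),
        "remaining_minutes": round(remaining, 1),
        "exhausted": remaining <= 0,         # signal to slow feature work
    }
```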
Developer Experience
SPACE Framework
```yaml
space_framework:
  satisfaction_and_wellbeing:
    what: How developers feel about their work
    measures:
      - Survey satisfaction scores
      - Retention rates
      - Burnout indicators
  performance:
    what: Outcomes of developer work
    measures:
      - Quality (defects, incidents)
      - Customer impact
      - Code review quality
  activity:
    what: What developers do (use carefully)
    measures:
      - Commits, PRs (context matters)
      - Deployments
      - Code reviews completed
  communication_and_collaboration:
    what: How developers collaborate
    measures:
      - PR review turnaround
      - Documentation quality
      - Knowledge sharing
  efficiency_and_flow:
    what: Getting work done without friction
    measures:
      - Build times
      - Test suite duration
      - Time to first productive day
```
Developer Friction
```yaml
friction_metrics:
  time_to_first_commit:
    what: Days from start date to first merged code
    target: "< 1 week"
    indicates: Onboarding effectiveness
  build_time:
    what: Time from code change to runnable build
    target: "< 5 minutes"
    indicates: Development loop speed
  test_suite_duration:
    what: Time to run the full test suite
    target: "< 10 minutes"
    indicates: Feedback loop quality
  deploy_wait_time:
    what: Time from merge to production
    target: "< 1 hour"
    indicates: Pipeline efficiency
  pr_review_time:
    what: Time from PR open to first review
    target: "< 4 hours"
    indicates: Team collaboration
```
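Most of these numbers can be pulled from systems you already run. As one example, PR review turnaround can be derived from pull request data; the `opened_at` and `first_review_at` fields below are a hypothetical shape for whatever your Git host's API returns.

```python
import statistics
from datetime import datetime, timedelta

def pr_review_time_hours(pull_requests, days=30):
    # Median hours from PR open to first review over the last `days` days
    cutoff = datetime.now() - timedelta(days=days)
    waits = [
        (pr.first_review_at - pr.opened_at).total_seconds() / 3600
        for pr in pull_requests
        if pr.opened_at > cutoff and pr.first_review_at is not None
    ]
    return statistics.median(waits) if waits else 0.0
```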
Business Alignment
Impact Metrics
```yaml
impact_metrics:
  feature_adoption:
    what: Users actively using new features
    why: Measures actual value delivery
  customer_incidents:
    what: Customer-reported issues
    why: Quality from the user's perspective
  revenue_per_engineer:
    what: Company revenue / engineering headcount
    why: Engineering leverage (use carefully)
  time_to_market:
    what: Time from idea to customer availability
    why: Competitive advantage
```
Technical Debt Indicators
```yaml
tech_debt_metrics:
  rework_rate:
    what: Changes to recently changed code
    interpretation: High rework indicates quality issues
  incident_frequency:
    what: Production incidents per service
    interpretation: Reliability of the codebase
  deployment_pain:
    what: Failed deployments, rollbacks
    interpretation: Deployment automation quality
  dependency_age:
    what: Average age of dependencies
    interpretation: Security and maintenance burden
```
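Rework rate can be approximated from version control history alone. The sketch below assumes a chronologically sorted list of (timestamp, file path) records, for example extracted from `git log --name-only`, and counts a change as rework when the same file was already touched within the preceding three weeks (the window is an assumption, not a standard).

```python
from datetime import timedelta

def rework_rate(changes, window_days=21):
    """changes: list of (timestamp, file_path) tuples sorted by timestamp.
    A change counts as rework if the same file changed within the window before it."""
    last_touched = {}
    rework = 0
    for ts, path in changes:
        prev = last_touched.get(path)
        if prev is not None and ts - prev <= timedelta(days=window_days):
            rework += 1
        last_touched[path] = ts
    return rework / len(changes) if changes else 0.0
```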
Implementing Metrics
Start Small
```yaml
implementation_approach:
  phase_1:
    duration: 1-2 months
    metrics:
      - Deployment frequency
      - Lead time
    focus: Establish baseline, build tooling
  phase_2:
    duration: 2-3 months
    metrics:
      - MTTR
      - Change failure rate
    focus: Incident correlation
  phase_3:
    duration: Ongoing
    metrics:
      - Developer experience
      - Business alignment
    focus: Continuous improvement
```
Avoid Common Pitfalls
```yaml
pitfalls:
  individual_metrics:
    problem: Creates competition and gaming
    solution: Team-level metrics only
  too_many_metrics:
    problem: Attention gets fragmented
    solution: 3-5 key metrics maximum
  no_context:
    problem: Numbers without meaning
    solution: Add trends, comparisons, targets
  punitive_use:
    problem: Metrics become weapons
    solution: Use for improvement, not judgment
```
Key Takeaways
- Most common engineering metrics create perverse incentives
- DORA metrics (deployment frequency, lead time, MTTR, change failure rate) predict performance
- Error budgets balance reliability with feature velocity
- Developer experience metrics catch productivity killers
- Connect engineering metrics to business outcomes
- Measure at team level, not individual
- Use metrics for improvement, not judgment
- Start with few metrics, add as you mature
- Context matters more than raw numbers
- Goodhart’s Law: targets corrupt measures
Metrics are a lens for improvement, not a scorecard. Use them to ask questions, not to judge performance.