Kubernetes resource management seems simple: set requests and limits. In practice, it’s one of the most commonly misconfigured aspects of Kubernetes. Too low, and pods get killed or starved. Too high, and you waste money on unused capacity.
Here’s how to manage Kubernetes resources effectively.
Understanding Resources
Requests vs Limits
resources:
  requests:
    cpu: "100m"      # Guaranteed minimum
    memory: "128Mi"  # Used for scheduling
  limits:
    cpu: "500m"      # Maximum allowed (throttled)
    memory: "256Mi"  # Maximum allowed (OOM killed)
Requests: what the scheduler uses to place pods, and the amount the pod is guaranteed.
Limits: the maximum the pod is allowed to use.
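To see how requests stack up against a node's capacity, kubectl describe node prints an "Allocated resources" summary (the node name below is a placeholder, and the exact output layout varies slightly by version):
# Summarize the CPU/memory requests and limits already scheduled onto a node
kubectl describe node worker-1 | grep -A 8 "Allocated resources"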
How They Work
┌────────────────────────────────────────────────┐
│ Node                                           │
│ Allocatable: 4 CPU, 16Gi Memory                │
│                                                │
│  ┌──────────────────┐  ┌──────────────────┐    │
│  │ Pod A            │  │ Pod B            │    │
│  │ Request: 1 CPU   │  │ Request: 0.5 CPU │    │
│  │ Limit:   2 CPU   │  │ Limit:   1 CPU   │    │
│  └──────────────────┘  └──────────────────┘    │
│                                                │
│ Requested: 1.5 CPU (scheduling considers this) │
│ Available for scheduling: 2.5 CPU              │
│ Sum of limits: 3 CPU (can burst if available)  │
└────────────────────────────────────────────────┘
CPU vs Memory Behavior
cpu:
  under_request: Gets guaranteed share
  between_request_and_limit: Gets more if available
  at_limit: Throttled (not killed)
  behavior: Compressible resource

memory:
  under_request: Fine
  between_request_and_limit: May be reclaimed under pressure
  at_limit: OOM killed
  behavior: Incompressible resource
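A quick way to confirm these behaviors on a live workload (the pod name below is a placeholder):
# Was the container OOM killed? Check its last terminated state.
kubectl get pod my-app-7d4b9 \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Prints "OOMKilled" if the container hit its memory limit.
# CPU throttling does not appear in pod status; it shows up in cAdvisor
# metrics such as container_cpu_cfs_throttled_seconds_total (used in the
# alerts section later in this post).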
QoS Classes
Kubernetes assigns Quality of Service based on configuration:
Guaranteed
# Requests == Limits for all resources
resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"

# Result: QoS = Guaranteed
# - Highest priority
# - Last to be evicted
# - Best for production workloads
Burstable
# Requests < Limits (or only one set)
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

# Result: QoS = Burstable
# - Can use more than requested when available
# - Evicted before Guaranteed pods
# - Good for variable workloads
BestEffort
# No requests or limits set
resources: {}
# Result: QoS = BestEffort
# - First to be evicted under pressure
# - No guaranteed resources
# - Only for truly optional workloads
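You can check which class was assigned without reading the spec (the pod name is a placeholder):
# Print the QoS class Kubernetes assigned to a pod
kubectl get pod my-app-7d4b9 -o jsonpath='{.status.qosClass}'
# Prints one of: Guaranteed, Burstable, BestEffort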
Common Mistakes
No Limits
# Dangerous: No limits
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"

# No limits - pod can consume unlimited resources
# Problem:
# - One pod can starve others
# - Memory leak crashes the node
# - CPU hog affects all neighbors
Requests Too Low
# Underprovisioned
resources:
  requests:
    cpu: "10m"      # Way too low for real work
    memory: "32Mi"  # Unrealistic
# Problems:
# - Scheduler packs too many pods on nodes
# - Resource contention
# - Poor performance
Limits Too High
# Overprovisioned
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "4000m"   # 4 CPUs for a simple service?
    memory: "8Gi"  # 8GB for a stateless API?
# Problems:
# - Wasted cluster capacity
# - Higher costs
# - False sense of resource availability
Memory Limit Without Request
# Problematic
resources:
  limits:
    memory: "512Mi"

# No memory request set - Kubernetes copies the limit and uses it as the request
# Issue: the pod implicitly requests the full 512Mi, so the scheduler reserves
# more memory than the workload typically needs and you lose the option of a
# smaller request for bursty workloads
Right-Sizing
Observation-Based Sizing
# Get actual usage
kubectl top pods -n production
# Over time
kubectl top pods -n production --containers | tee usage.log
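kubectl top only shows point-in-time usage, so for a rough picture over a day you can sample it on an interval (the interval and log file below are arbitrary; a metrics stack is the better long-term source):
# Append a usage snapshot every 5 minutes
while true; do
  date >> usage.log
  kubectl top pods -n production --containers >> usage.log
  sleep 300
done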
VPA Recommendations
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommendations only
# Check recommendations
kubectl describe vpa my-app-vpa
# Shows:
# - Lower bound (minimum reasonable)
# - Target (recommended)
# - Upper bound (handle spikes)
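If you only want the target numbers, they can also be pulled straight from the VPA's status; the field path below assumes the standard VPA CRD and a single container:
# Extract the recommended (target) request for the first container
kubectl get vpa my-app-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'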
Metrics-Based Analysis
# Average CPU usage (cores) per pod
avg(rate(container_cpu_usage_seconds_total{
  pod=~"my-app-.*"
}[5m])) by (pod)

# Memory working set per pod
avg(container_memory_working_set_bytes{
  pod=~"my-app-.*"
}) by (pod)

# P95 CPU over the last hour
# (this counter has no histogram buckets, so take a quantile over a subquery)
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{
    pod=~"my-app-.*"
  }[5m])[1h:]
)
Sizing Strategy
strategy:
  requests:
    cpu: P75 of observed usage + 20% buffer
    memory: P95 of observed usage + 10% buffer
  limits:
    cpu: P99 of observed usage × 2 (or no limit)
    memory: Requests × 1.5 (or based on known max)

example:
  observed:
    cpu_p75: 80m
    cpu_p99: 150m
    memory_p95: 200Mi
  configuration:
    requests:
      cpu: "100m"      # 80m + 20%, rounded up
      memory: "220Mi"  # 200Mi + 10%
    limits:
      cpu: "300m"      # 150m × 2
      memory: "330Mi"  # 220Mi × 1.5
Autoscaling
Horizontal Pod Autoscaler
Scale replicas based on metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
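Note that averageUtilization is measured against the pods' CPU requests, so the HPA only behaves sensibly when requests are set. If scale-down feels too twitchy, autoscaling/v2 also accepts an optional behavior block under spec; the numbers below are illustrative, not recommendations:
  # Goes under spec: in the HPA above
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 50                    # Remove at most half the replicas
          periodSeconds: 60            # ...per 60-second window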
Vertical Pod Autoscaler
Adjust resource requests automatically:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # Apply recommendations automatically (pods are evicted and recreated with new requests)
  resourcePolicy:
    containerPolicies:
      - containerName: my-app
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 2Gi
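One caveat before enabling "Auto": if an HPA already scales the same Deployment on CPU or memory utilization, the two controllers will fight over the same signal. Check first, and keep the HPA on custom or external metrics if you need both:
# List HPAs in the namespace and what they target
kubectl get hpa -n production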
LimitRange and ResourceQuota
Default Limits
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      min:
        cpu: "10m"
        memory: "32Mi"
      max:
        cpu: "2"
        memory: "2Gi"
Namespace Quotas
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
Monitoring Resources
Key Metrics
efficiency_metrics:
  - CPU utilization vs request
  - Memory utilization vs request
  - Request vs limit ratio
  - OOM kill count
  - Throttling time

cluster_metrics:
  - Total requested vs allocatable
  - Namespace usage
  - Node utilization
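As a concrete starting point, the first two efficiency metrics can be expressed in PromQL, assuming cAdvisor and kube-state-metrics are being scraped (the namespace label is a placeholder; the container filters drop the pod-level cgroup and pause-container series to avoid double counting):
# CPU: actual usage as a fraction of the request, per pod
sum(rate(container_cpu_usage_seconds_total{namespace="production", container!="", container!="POD"}[5m])) by (pod)
  /
sum(kube_pod_container_resource_requests{namespace="production", resource="cpu"}) by (pod)

# Memory: working set as a fraction of the request, per pod
sum(container_memory_working_set_bytes{namespace="production", container!="", container!="POD"}) by (pod)
  /
sum(kube_pod_container_resource_requests{namespace="production", resource="memory"}) by (pod)
Ratios that sit well below 1 for weeks are the clearest sign of over-requesting.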
Alerts
alerts:
  - name: HighMemoryUsage
    expr: container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.9
    message: Pod approaching memory limit
  - name: CPUThrottling
    expr: rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0
    message: Pod experiencing CPU throttling
  - name: ClusterCPUOvercommitted
    expr: sum(kube_pod_container_resource_requests{resource="cpu"}) / sum(kube_node_status_allocatable{resource="cpu"}) > 0.9
    message: Cluster CPU requests approaching allocatable capacity
Key Takeaways
- Always set both requests and limits; defaults are dangerous
- Requests are for scheduling; limits are for protection
- CPU is throttled at limit; memory causes OOM kills
- Guaranteed QoS (requests==limits) is safest for production
- Right-size based on observed usage, not guesses
- VPA provides recommendations; use them
- HPA scales replicas; VPA adjusts individual pod resources
- LimitRange enforces defaults and bounds per namespace
- ResourceQuota controls total namespace consumption
- Monitor utilization vs. requests to identify waste
Proper resource configuration is the foundation of efficient Kubernetes clusters. Get it right, and you’ll have stable, cost-effective infrastructure. Get it wrong, and you’ll have either outages or wasted money.