Kubernetes Resource Management Done Right

September 5, 2022

Kubernetes resource management seems simple: set requests and limits. In practice, it’s one of the most misunderstood aspects of Kubernetes. Bad settings cause OOMKills, throttling, node pressure, and wasted money.

Here’s how to manage Kubernetes resources properly.

Understanding Resources

CPU vs Memory

resource_characteristics:
  cpu:
    compressible: true
    exceeded_behavior: Throttling (process slows down)
    unit: Millicores (1000m = 1 core)
    impact: Performance degradation, not crashes

  memory:
    compressible: false
    exceeded_behavior: OOMKilled (process terminated)
    unit: Bytes (Mi, Gi)
    impact: Container restarts, potential data loss

Requests vs Limits

requests:
  purpose: Scheduling and guaranteed resources
  guarantee: Always available to container
  scheduler: Uses for placement decisions
  formula: Sum(requests) <= Node allocatable

limits:
  purpose: Maximum resource usage
  behavior:
    cpu: Throttled when exceeded
    memory: OOMKilled when exceeded
  formula: Sum(limits) can exceed node allocatable (overcommit)
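
Concretely, requests and limits are set per container in the pod spec. A minimal sketch (name and image are placeholders, values illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: nginx:1.23        # placeholder image
      resources:
        requests:
          cpu: "250m"          # 0.25 core, used for scheduling
          memory: "128Mi"      # guaranteed to the container
        limits:
          cpu: "500m"          # throttled above this
          memory: "256Mi"      # OOMKilled above this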

Setting Values

CPU Settings

# Typical patterns
cpu_settings:
  cpu_bound_workload:
    request: "500m"
    limit: null  # Often no limit is better
    rationale: Let it use available CPU

  latency_sensitive:
    request: "1000m"
    limit: "1000m"  # Guaranteed CPU
    rationale: Prevent noisy neighbor throttling

  background_job:
    request: "100m"
    limit: null
    rationale: Low priority, use spare CPU

# CPU limit considered harmful?
no_cpu_limit_argument:
  problem: CPU limits cause throttling even when CPU is available
  evidence:
    - Container gets throttled
    - Latency spikes at throttle boundaries
    - CPU sits idle while containers throttle

  recommendation:
    - Set requests accurately
    - Consider omitting CPU limits
    - Use resource quotas to prevent runaway
    - Monitor and adjust
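
Putting the CPU guidance together with memory: a common shape is a CPU request sized from measurements, no CPU limit, and an explicit memory request and limit. A sketch, with placeholder names and values:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0  # placeholder
          resources:
            requests:
              cpu: "500m"        # from measured usage
              memory: "256Mi"
            limits:
              memory: "512Mi"    # no CPU limit to avoid CFS throttling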

Memory Settings

# Memory requires limits
memory_settings:
  always_set_both:
    request: "256Mi"
    limit: "512Mi"  # 2x request typical
    rationale: Prevent OOMKills, allow burst

  equal_for_predictable:
    request: "512Mi"
    limit: "512Mi"
    rationale: Predictable; qualifies for Guaranteed QoS (CPU must also match)

  sized_by_measurement:
    approach: Profile actual usage
    request: P90 actual usage
    limit: Maximum observed + buffer

Measuring Actual Usage

Prometheus Queries

# CPU usage per pod (cores), averaged over 5m
sum(rate(container_cpu_usage_seconds_total{
  namespace="production",
  container!="", container!="POD"
}[5m])) by (pod)

# P95 memory usage per container over the last day
quantile_over_time(0.95,
  container_memory_working_set_bytes{
    namespace="production",
    container!="", container!="POD"
  }[1d]
)

# CPU throttling percentage
sum(rate(container_cpu_cfs_throttled_periods_total[5m]))
/
sum(rate(container_cpu_cfs_periods_total[5m]))
* 100
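
For a quick point-in-time check without writing PromQL (metrics-server required), kubectl top shows current usage per container:

# Current CPU and memory per container
kubectl top pod -n production --containers

# A single pod
kubectl top pod <pod> -n production --containers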

VPA Recommendations

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Just recommend
  resourcePolicy:
    containerPolicies:
      - containerName: app
        minAllowed:
          cpu: "50m"
          memory: "64Mi"
        maxAllowed:
          cpu: "2"
          memory: "2Gi"

# Get VPA recommendations
kubectl describe vpa my-app-vpa

# Look for:
# Target:     Cpu: 200m, Memory: 300Mi
# Lower Bound: Cpu: 50m, Memory: 100Mi
# Upper Bound: Cpu: 500m, Memory: 600Mi
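
The same numbers live on the VPA object's status, which is handier for scripting. Assuming the field path from the VPA CRD's status:

# Raw recommendation, one entry per container
kubectl get vpa my-app-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations}'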

Quality of Service

QoS Classes

qos_classes:
  guaranteed:
    condition: requests == limits for both CPU and memory, on every container
    priority: Highest (last to be evicted)
    use_case: Critical workloads

  burstable:
    condition: Requests set but not equal to limits (or limits omitted)
    priority: Medium
    use_case: Most workloads

  besteffort:
    condition: No requests or limits
    priority: Lowest (first evicted)
    use_case: Never in production

# Guaranteed QoS
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: app
      resources:
        requests:
          cpu: "500m"
          memory: "256Mi"
        limits:
          cpu: "500m"      # Equal to request
          memory: "256Mi"  # Equal to request

Node Pressure and Eviction

eviction_thresholds:
  memory_available:
    soft: "100Mi"  # Graceful eviction starts
    hard: "50Mi"   # Immediate eviction

  nodefs_available:
    soft: "10%"
    hard: "5%"

eviction_order:
  1. BestEffort pods
  2. Burstable pods exceeding requests
  3. Burstable pods within requests
  4. Guaranteed pods (last resort)
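
On the kubelet side, these thresholds map to eviction fields in the KubeletConfiguration. A sketch using the example values above (grace periods are illustrative; soft thresholds require them):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionSoft:
  memory.available: "100Mi"
  nodefs.available: "10%"
evictionSoftGracePeriod:
  memory.available: "1m30s"
  nodefs.available: "1m30s"
evictionHard:
  memory.available: "50Mi"
  nodefs.available: "5%"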

Resource Quotas

# Namespace-level limits
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "50"

LimitRange Defaults

# Default resources for pods without specifications
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      max:
        cpu: "4"
        memory: "8Gi"
      min:
        cpu: "50m"
        memory: "64Mi"
      type: Container

Common Problems

OOMKilled

oomkilled_diagnosis:
  check_events:
    command: kubectl describe pod <pod>
    look_for: "OOMKilled"

  check_metrics:
    query: container_memory_working_set_bytes vs limit

  solutions:
    - Increase memory limit
    - Fix memory leak in application
    - Profile actual memory usage
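
Beyond describe, the last termination reason is recorded on the pod status, which is easier to check across many pods:

# Last termination reason for each container in the pod
kubectl get pod <pod> -n production \
  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'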

CPU Throttling

cpu_throttling_diagnosis:
  metrics:
    throttled: container_cpu_cfs_throttled_periods_total
    total: container_cpu_cfs_periods_total

  solutions:
    - Increase CPU limit
    - Remove CPU limit entirely
    - Use more replicas instead
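
To find which containers are throttled (rather than the cluster-wide ratio shown earlier), group the same metrics by pod and container:

# Throttling ratio per container (0 to 1)
sum by (pod, container) (
  rate(container_cpu_cfs_throttled_periods_total{namespace="production", container!=""}[5m])
)
/
sum by (pod, container) (
  rate(container_cpu_cfs_periods_total{namespace="production", container!=""}[5m])
)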

Pending Pods

pending_diagnosis:
  check_events:
    command: kubectl describe pod <pod>
    look_for: "Insufficient cpu/memory"

  solutions:
    - Reduce requests
    - Add nodes
    - Enable cluster autoscaler
    - Check node taints and tolerations
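
Scheduling failures also surface as events, which is quicker than describing pods one at a time:

# All scheduling failures in the namespace
kubectl get events -n production \
  --field-selector reason=FailedScheduling \
  --sort-by=.lastTimestamp

# What each node can actually offer
kubectl describe node <node> | grep -A 7 Allocatable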

Key Takeaways

Resource management is iterative: measure actual usage, adjust requests and limits, and repeat. Always set memory requests and limits, size CPU requests from real measurements and think hard before adding CPU limits, use VPA in recommendation mode to sanity-check your numbers, and enforce defaults and caps with LimitRange and ResourceQuota.