Kubernetes resource management seems simple: set requests and limits. In practice, it’s one of the most misunderstood aspects of Kubernetes. Bad settings cause OOMKills, throttling, node pressure, and wasted money.
Here’s how to manage Kubernetes resources properly.
Understanding Resources
CPU vs Memory
resource_characteristics:
  cpu:
    compressible: true
    exceeded_behavior: Throttling (process slows down)
    unit: Millicores (1000m = 1 core)
    impact: Performance degradation, not crashes
  memory:
    compressible: false
    exceeded_behavior: OOMKilled (process terminated)
    unit: Bytes (Mi, Gi)
    impact: Container restarts, potential data loss
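For concreteness, here is how those units appear in a container spec (values are illustrative; "500m" and "0.5" are equivalent CPU notations):

resources:
  requests:
    cpu: "500m"       # 0.5 core, expressed in millicores
    memory: "512Mi"   # mebibytes; "1Gi" equals 1024Mi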
Requests vs Limits
requests:
  purpose: Scheduling and guaranteed resources
  guarantee: Always available to the container
  scheduler: Uses requests for placement decisions
  formula: Sum(requests) <= node allocatable

limits:
  purpose: Maximum resource usage
  behavior:
    cpu: Throttled when exceeded
    memory: OOMKilled when exceeded
  formula: Sum(limits) can exceed node capacity (overcommit)
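Concretely, here is how both fields look on a container (name, image, and values are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: example-app        # illustrative name
spec:
  containers:
  - name: app
    image: nginx:1.25      # placeholder image
    resources:
      requests:
        cpu: "250m"        # what the scheduler reserves on the node
        memory: "256Mi"    # guaranteed to the container
      limits:
        cpu: "500m"        # throttled above this
        memory: "512Mi"    # OOMKilled above this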
Setting Values
CPU Settings
# Typical patterns
cpu_settings:
  cpu_bound_workload:
    request: "500m"
    limit: null  # Often no limit is better
    rationale: Let it use available CPU
  latency_sensitive:
    request: "1000m"
    limit: "1000m"  # Guaranteed CPU
    rationale: Prevent noisy-neighbor throttling
  background_job:
    request: "100m"
    limit: null
    rationale: Low priority, use spare CPU
# CPU limits considered harmful?
no_cpu_limit_argument:
  problem: CPU limits cause throttling even when CPU is available
  evidence:
    - Containers get throttled at the CFS quota boundary
    - Latency spikes at throttle boundaries
    - CPU sits idle while containers throttle
  recommendation:
    - Set requests accurately
    - Consider omitting CPU limits
    - Use resource quotas to prevent runaway usage
    - Monitor and adjust
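A sketch of that recommendation in a container spec (illustrative values): an accurate CPU request, no CPU limit, and a memory limit so the memory side stays bounded.

containers:
- name: app
  image: example/app:1.0   # placeholder image
  resources:
    requests:
      cpu: "500m"          # drives scheduling and the CFS fair-share weight
      memory: "256Mi"
    limits:
      memory: "512Mi"      # memory still gets a hard cap
      # no cpu limit: the container can burst into otherwise idle CPU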
Memory Settings
# Memory requires limits
memory_settings:
  always_set_both:
    request: "256Mi"
    limit: "512Mi"  # 2x the request is a typical starting point
    rationale: Prevent OOMKills, allow bursts
  equal_for_predictable:
    request: "512Mi"
    limit: "512Mi"
    rationale: Predictable; Guaranteed QoS if CPU request == limit too
  sized_by_measurement:
    approach: Profile actual usage
    request: P90 of actual usage
    limit: Maximum observed + buffer
Measuring Actual Usage
Prometheus Queries
# Average CPU usage per pod
avg(rate(container_cpu_usage_seconds_total{
  namespace="production",
  container!="POD"
}[5m])) by (pod)

# P95 memory usage per pod (over the past day)
max by (pod) (
  quantile_over_time(0.95,
    container_memory_working_set_bytes{
      namespace="production",
      container!="POD"
    }[1d]
  )
)

# CPU throttling percentage
sum(rate(container_cpu_cfs_throttled_periods_total[5m]))
/
sum(rate(container_cpu_cfs_periods_total[5m]))
* 100
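For a quick point-in-time view without Prometheus, kubectl top works as long as metrics-server is running in the cluster:

# Current CPU/memory usage per pod
kubectl top pod -n production

# Per-container breakdown
kubectl top pod -n production --containers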
VPA Recommendations
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Just recommend
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: "50m"
        memory: "64Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
# Get VPA recommendations
kubectl describe vpa my-app-vpa
# Look for:
# Target: Cpu: 200m, Memory: 300Mi
# Lower Bound: Cpu: 50m, Memory: 100Mi
# Upper Bound: Cpu: 500m, Memory: 600Mi
Quality of Service
QoS Classes
qos_classes:
  guaranteed:
    condition: requests == limits for both CPU and memory, in every container
    priority: Highest (last to be evicted)
    use_case: Critical workloads
  burstable:
    condition: At least one request or limit set, but not Guaranteed
    priority: Medium
    use_case: Most workloads
  besteffort:
    condition: No requests or limits at all
    priority: Lowest (first evicted)
    use_case: Never in production
# Guaranteed QoS
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod   # example name
spec:
  containers:
  - name: app
    image: nginx:1.25    # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"      # Equal to request
        memory: "256Mi"  # Equal to request
Node Pressure and Eviction
eviction_thresholds:
  memory_available:
    soft: "100Mi"  # Graceful eviction starts
    hard: "50Mi"   # Immediate eviction
  nodefs_available:
    soft: "10%"
    hard: "5%"

eviction_order:
  1. BestEffort pods
  2. Burstable pods exceeding their requests
  3. Burstable pods within their requests
  4. Guaranteed pods (last resort)
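These thresholds live in the kubelet configuration, not in pod specs. A minimal sketch using the values above (grace periods are illustrative; every soft threshold needs one):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "50Mi"
  nodefs.available: "5%"
evictionSoft:
  memory.available: "100Mi"
  nodefs.available: "10%"
evictionSoftGracePeriod:
  memory.available: "1m30s"   # how long the soft threshold must be breached
  nodefs.available: "2m"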
Resource Quotas
# Namespace-level limits
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "50"
LimitRange Defaults
# Default resources for containers that don't specify any
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - default:          # default limits
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:   # default requests
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
    type: Container
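After applying it, you can verify the defaults and confirm they get injected into pods that omit resources:

# Show the configured defaults, min, and max
kubectl describe limitrange default-limits

# The admission controller writes the defaults into the pod spec
kubectl get pod <pod> -o jsonpath='{.spec.containers[0].resources}'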
Common Problems
OOMKilled
oomkilled_diagnosis:
  check_events:
    command: kubectl describe pod <pod>
    look_for: "OOMKilled" (under Last State / Reason)
  check_metrics:
    query: container_memory_working_set_bytes vs the memory limit
  solutions:
    - Increase the memory limit
    - Fix memory leaks in the application
    - Profile actual memory usage
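A sketch of that usage-vs-limit comparison, assuming kube-state-metrics is installed to expose the configured limits:

# Memory working set as a fraction of the limit (close to 1.0 = OOMKill risk)
max by (namespace, pod, container) (
  container_memory_working_set_bytes{namespace="production", container!="", container!="POD"}
)
/
max by (namespace, pod, container) (
  kube_pod_container_resource_limits{namespace="production", resource="memory"}
)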
CPU Throttling
cpu_throttling_diagnosis:
  metrics:
    throttled: container_cpu_cfs_throttled_periods_total
    total: container_cpu_cfs_periods_total
  solutions:
    - Increase the CPU limit
    - Remove the CPU limit entirely
    - Use more replicas instead
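To find the worst offenders, the throttle ratio from earlier can be broken out per container:

# Throttle ratio per container, highest first
topk(10,
  sum by (namespace, pod, container) (
    rate(container_cpu_cfs_throttled_periods_total[5m])
  )
  /
  sum by (namespace, pod, container) (
    rate(container_cpu_cfs_periods_total[5m])
  )
)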
Pending Pods
pending_diagnosis:
  check_events:
    command: kubectl describe pod <pod>
    look_for: "Insufficient cpu" or "Insufficient memory"
  solutions:
    - Reduce requests
    - Add nodes
    - Enable the cluster autoscaler
    - Check node taints and tolerations
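To see how much headroom each node actually has (allocatable capacity vs. the requests already placed on it):

# "Allocated resources" section shows requested CPU/memory per node
kubectl describe nodes | grep -A 8 "Allocated resources"

# Quick view of allocatable capacity
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory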
Key Takeaways
- CPU is compressible (throttling); memory is not (OOMKill)
- Set requests based on actual measured usage
- Consider omitting CPU limits (throttling is often worse)
- Always set memory limits (prevent runaway consumption)
- VPA provides recommendations based on actual usage
- QoS class determines eviction priority
- Use LimitRange for default values
- Use ResourceQuota to limit namespace usage
- Monitor for throttling and OOMKills
- Profile actual usage before setting values
Resource management is iterative. Measure, adjust, repeat.