Kubernetes resource management seems simple: set requests and limits. In practice, it’s one of the most commonly misconfigured aspects of Kubernetes. Too low, and pods get killed or starved. Too high, and you waste money on unused capacity.
Here’s how to manage Kubernetes resources effectively.
Understanding Resources
Requests vs Limits
resources:
  requests:
    cpu: "100m"      # Guaranteed minimum
    memory: "128Mi"  # Used for scheduling
  limits:
    cpu: "500m"      # Maximum allowed (throttled)
    memory: "256Mi"  # Maximum allowed (OOM killed)
Requests: what the scheduler uses to place pods, and the amount the pod is guaranteed.
Limits: the maximum the pod is allowed to use.
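To see how requests stack up against a node's capacity, kubectl describe node prints an "Allocated resources" summary (the node name below is a placeholder, and the exact output layout varies slightly by version):
# Summarize the CPU/memory requests and limits already scheduled onto a node
kubectl describe node worker-1 | grep -A 8 "Allocated resources"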
How They Work
┌────────────────────────────────────────────────┐
│ Node                                           │
│ Allocatable: 4 CPU, 16Gi Memory                │
│                                                │
│  ┌──────────────────┐  ┌──────────────────┐    │
│  │ Pod A            │  │ Pod B            │    │
│  │ Request: 1 CPU   │  │ Request: 0.5 CPU │    │
│  │ Limit:   2 CPU   │  │ Limit:   1 CPU   │    │
│  └──────────────────┘  └──────────────────┘    │
│                                                │
│ Requested: 1.5 CPU (scheduling considers this) │
│ Available for scheduling: 2.5 CPU              │
│ Sum of limits: 3 CPU (can burst if available)  │
└────────────────────────────────────────────────┘
CPU vs Memory Behavior
cpu:
  under_request: Gets guaranteed share
  between_request_and_limit: Gets more if available
  at_limit: Throttled (not killed)
  behavior: Compressible resource

memory:
  under_request: Fine
  between_request_and_limit: May be reclaimed under pressure
  at_limit: OOM killed
  behavior: Incompressible resource
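A quick way to confirm these behaviors on a live workload (the pod name below is a placeholder):
# Was the container OOM killed? Check its last terminated state.
kubectl get pod my-app-7d4b9 \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Prints "OOMKilled" if the container hit its memory limit.
# CPU throttling does not appear in pod status; it shows up in cAdvisor
# metrics such as container_cpu_cfs_throttled_seconds_total (used in the
# alerts section later in this post).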
QoS Classes
Kubernetes assigns Quality of Service based on configuration:
Guaranteed
# Requests == Limits for all resources
resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"

# Result: QoS = Guaranteed
# - Highest priority
# - Last to be evicted
# - Best for production workloads
Burstable
# Requests < Limits (or only one set)
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

# Result: QoS = Burstable
# - Can use more than requested when available
# - Evicted before Guaranteed pods
# - Good for variable workloads
BestEffort
# No requests or limits set
resources: {}
# Result: QoS = BestEffort
# - First to be evicted under pressure
# - No guaranteed resources
# - Only for truly optional workloads
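You can check which class was assigned without reading the spec (the pod name is a placeholder):
# Print the QoS class Kubernetes assigned to a pod
kubectl get pod my-app-7d4b9 -o jsonpath='{.status.qosClass}'
# Prints one of: Guaranteed, Burstable, BestEffort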
Common Mistakes
No Limits
# Dangerous: No limits
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"

# No limits - pod can consume unlimited resources
# Problem:
# - One pod can starve others
# - Memory leak crashes the node
# - CPU hog affects all neighbors
Requests Too Low
# Underprovisioned
resources:
  requests:
    cpu: "10m"      # Way too low for real work
    memory: "32Mi"  # Unrealistic
# Problems:
# - Scheduler packs too many pods on nodes
# - Resource contention
# - Poor performance
Limits Too High
# Overprovisioned
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "4000m"   # 4 CPUs for a simple service?
    memory: "8Gi"  # 8GB for a stateless API?
# Problems:
# - Wasted cluster capacity
# - Higher costs
# - False sense of resource availability
Memory Limit Without Request
# Problematic
resources:
  limits:
    memory: "512Mi"

# No memory request set - Kubernetes copies the limit and uses it as the request
# Issue: the pod implicitly requests the full 512Mi, so the scheduler reserves
# more memory than the workload typically needs and you lose the option of a
# smaller request for bursty workloads
Right-Sizing
Observation-Based Sizing
# Get actual usage
kubectl top pods -n production
# Over time
kubectl top pods -n production --containers | tee usage.log
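kubectl top only shows point-in-time usage, so for a rough picture over a day you can sample it on an interval (the interval and log file below are arbitrary; a metrics stack is the better long-term source):
# Append a usage snapshot every 5 minutes
while true; do
  date >> usage.log
  kubectl top pods -n production --containers >> usage.log
  sleep 300
done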
VPA Recommendations
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommendations only
# Check recommendations
kubectl describe vpa my-app-vpa
# Shows:
# - Lower bound (minimum reasonable)
# - Target (recommended)
# - Upper bound (handle spikes)
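If you only want the target numbers, they can also be pulled straight from the VPA's status; the field path below assumes the standard VPA CRD and a single container:
# Extract the recommended (target) request for the first container
kubectl get vpa my-app-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'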
Metrics-Based Analysis
# Average CPU usage (cores) per pod
avg(rate(container_cpu_usage_seconds_total{
  pod=~"my-app-.*"
}[5m])) by (pod)

# Memory working set per pod
avg(container_memory_working_set_bytes{
  pod=~"my-app-.*"
}) by (pod)

# P95 CPU over the last hour
# (this counter has no histogram buckets, so take a quantile over a subquery)
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{
    pod=~"my-app-.*"
  }[5m])[1h:]
)
Sizing Strategy
strategy:
  requests:
    cpu: P75 of observed usage + 20% buffer
    memory: P95 of observed usage + 10% buffer
  limits:
    cpu: P99 of observed usage × 2 (or no limit)
    memory: Requests × 1.5 (or based on known max)

example:
  observed:
    cpu_p75: 80m
    cpu_p99: 150m
    memory_p95: 200Mi
  configuration:
    requests:
      cpu: "100m"      # 80m + 20%, rounded up
      memory: "220Mi"  # 200Mi + 10%
    limits:
      cpu: "300m"      # 150m × 2
      memory: "330Mi"  # 220Mi × 1.5
Autoscaling
Horizontal Pod Autoscaler
Scale replicas based on metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
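Note that averageUtilization is measured against the pods' CPU requests, so the HPA only behaves sensibly when requests are set. If scale-down feels too twitchy, autoscaling/v2 also accepts an optional behavior block under spec; the numbers below are illustrative, not recommendations:
  # Goes under spec: in the HPA above
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 50                    # Remove at most half the replicas
          periodSeconds: 60            # ...per 60-second window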
Vertical Pod Autoscaler
Adjust resource requests automatically:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # Apply recommendations automatically (pods are evicted and recreated with new requests)
  resourcePolicy:
    containerPolicies:
      - containerName: my-app
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 2Gi
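One caveat before enabling "Auto": if an HPA already scales the same Deployment on CPU or memory utilization, the two controllers will fight over the same signal. Check first, and keep the HPA on custom or external metrics if you need both:
# List HPAs in the namespace and what they target
kubectl get hpa -n production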
LimitRange and ResourceQuota
Default Limits
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      min:
        cpu: "10m"
        memory: "32Mi"
      max:
        cpu: "2"
        memory: "2Gi"
Namespace Quotas
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
Monitoring Resources
Key Metrics
efficiency_metrics:
  - CPU utilization vs request
  - Memory utilization vs request
  - Request vs limit ratio
  - OOM kill count
  - Throttling time

cluster_metrics:
  - Total requested vs allocatable
  - Namespace usage
  - Node utilization
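As a concrete starting point, the first two efficiency metrics can be expressed in PromQL, assuming cAdvisor and kube-state-metrics are being scraped (the namespace label is a placeholder; the container filters drop the pod-level cgroup and pause-container series to avoid double counting):
# CPU: actual usage as a fraction of the request, per pod
sum(rate(container_cpu_usage_seconds_total{namespace="production", container!="", container!="POD"}[5m])) by (pod)
  /
sum(kube_pod_container_resource_requests{namespace="production", resource="cpu"}) by (pod)

# Memory: working set as a fraction of the request, per pod
sum(container_memory_working_set_bytes{namespace="production", container!="", container!="POD"}) by (pod)
  /
sum(kube_pod_container_resource_requests{namespace="production", resource="memory"}) by (pod)
Ratios that sit well below 1 for weeks are the clearest sign of over-requesting.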
Alerts
alerts:
  - name: HighMemoryUsage
    expr: container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.9
    message: Pod approaching memory limit
  - name: CPUThrottling
    expr: rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0
    message: Pod experiencing CPU throttling
  - name: ClusterCPUOvercommitted
    expr: sum(kube_pod_container_resource_requests{resource="cpu"}) / sum(kube_node_status_allocatable{resource="cpu"}) > 0.9
    message: Cluster CPU requests approaching allocatable capacity
Key Takeaways
- Always set both requests and limits; defaults are dangerous
- Requests are for scheduling; limits are for protection
- CPU is throttled at limit; memory causes OOM kills
- Guaranteed QoS (requests==limits) is safest for production
- Right-size based on observed usage, not guesses
- VPA provides recommendations; use them
- HPA scales replicas; VPA adjusts individual pod resources
- LimitRange enforces defaults and bounds per namespace
- ResourceQuota controls total namespace consumption
- Monitor utilization vs. requests to identify waste
Proper resource configuration is the foundation of efficient Kubernetes clusters. Get it right, and you’ll have stable, cost-effective infrastructure. Get it wrong, and you’ll have either outages or wasted money.