Docker crossed from development curiosity to production reality sometime in 2014. Two years later, the ecosystem has matured significantly, but running containers at scale still requires careful planning and operational discipline. Here’s what we’ve learned from deploying Docker across multiple production environments.
The Promise and Reality
Docker’s value proposition is compelling: consistent environments from development to production, rapid deployment, efficient resource utilization, and simplified dependency management. These benefits are real, but they come with operational complexity that isn’t immediately obvious.
The marketing materials show containers spinning up in milliseconds and developers shipping code with confidence. The reality involves wrestling with networking, debugging opaque failures, managing image sprawl, and building operational tooling that the Docker ecosystem hasn’t yet standardized.
None of this means Docker isn’t worth adopting. It means approaching it with realistic expectations and investing in the operational foundations that make it work.
Image Management
Keep Images Small
Every megabyte in your image is a megabyte that must be transferred on every deployment to every host. Large images slow deployments, consume bandwidth, and waste storage.
Start with minimal base images. Alpine Linux provides a functional base in roughly 5MB. Compare that to Ubuntu’s 188MB or the default Debian image at 125MB. For most applications, Alpine’s musl libc and BusyBox utilities are sufficient.
Use multi-stage builds to separate build dependencies from runtime. Your Node.js application doesn’t need webpack in production; your Go service doesn’t need the compiler. Build in one stage, copy artifacts to a minimal runtime stage.
FROM golang:1.6 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o service .
FROM alpine:3.4
COPY --from=builder /app/service /service
CMD ["/service"]
Layer Caching Matters
Docker builds images in layers, caching each layer for reuse. Order your Dockerfile instructions from least to most frequently changing. Dependencies change less often than application code, so install dependencies before copying source files.
# Dependencies first (changes rarely)
COPY package.json .
RUN npm install
# Application code last (changes frequently)
COPY src/ src/
Tag Immutably
The latest tag is convenient and dangerous. In development it's a reasonable default. In production it makes rollbacks impossible and introduces deployment non-determinism: latest points to whatever was pushed most recently, not necessarily what you tested.
Tag images with commit SHAs, build numbers, or semantic versions. When something breaks, you need to deploy exactly the previous version, not whatever latest happens to point to now.
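One common scheme keys tags to the commit SHA. A minimal sketch (the image name and registry address are placeholders):

```shell
# Build and tag with the short commit SHA; myapp and registry.example.com are illustrative
TAG=$(git rev-parse --short HEAD)
docker build -t myapp:$TAG .
docker tag myapp:$TAG registry.example.com/myapp:$TAG
docker push registry.example.com/myapp:$TAG
```

Rolling back then means redeploying the previous SHA, which is always unambiguous.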
Networking
Container networking is where most production complexity lives. Docker’s default bridge networking works for simple cases but breaks down in multi-host environments.
Service Discovery
Containers get IP addresses dynamically, so hard-coding addresses doesn't work; you need service discovery. Options include:
DNS-based discovery. Docker’s built-in DNS resolves container names to IP addresses within a network. Simple and sufficient for single-host deployments.
External service discovery. Consul, etcd, or ZooKeeper provide distributed service registries. Containers register on startup, deregister on shutdown, and query the registry to find dependencies.
Orchestrator-provided discovery. Kubernetes, Docker Swarm, and Mesos provide built-in service discovery. If you’re using an orchestrator, leverage its native capabilities.
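For the single-host, DNS-based case, a user-defined network is all it takes; container names resolve automatically. A sketch, with illustrative image and container names:

```shell
# Containers on the same user-defined network resolve each other by name
docker network create appnet
docker run -d --name api --net appnet myapi:abc123
docker run --rm --net appnet appclient ping -c1 api
```

The client reaches the api container by name with no registry or configuration beyond the shared network.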
Overlay Networks
Multi-host networking requires overlay networks that span physical hosts. Docker’s overlay driver creates encrypted tunnels between hosts, presenting a flat network to containers regardless of physical topology.
Overlay networks add latency—typically 1-2ms per hop. For most applications, this is negligible. For latency-sensitive workloads, measure carefully.
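As a sketch, creating an overlay network and attaching a service to it looks like this, assuming a Docker 1.12 swarm is already initialized (network and service names are illustrative):

```shell
# Overlay networks span swarm hosts; containers see one flat subnet
docker network create -d overlay --subnet 10.0.9.0/24 app-overlay
docker service create --name web --network app-overlay nginx
```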
Host Networking
For maximum performance, containers can share the host’s network namespace directly. This eliminates NAT overhead and network virtualization but sacrifices isolation. Use sparingly for specific performance-critical services.
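Enabling host networking is a single flag; the container then binds ports directly on the host (the image name is a placeholder):

```shell
# No bridge, no NAT: the container shares the host's network stack
docker run -d --net=host myproxy:abc123
```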
Storage
Containers are ephemeral by design. Data written inside a container disappears when the container stops. Persistent data requires explicit storage configuration.
Volumes for Persistent Data
Docker volumes exist outside container lifecycles. Mount volumes for databases, uploaded files, and any data that must survive container restarts.
Name your volumes explicitly rather than letting Docker generate random names. postgres-data is easier to manage than a3f2b9c4d8e1.
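A minimal example with a Postgres container (volume and container names are illustrative):

```shell
# A named volume survives container restarts and replacements
docker volume create --name postgres-data
docker run -d --name db \
  -v postgres-data:/var/lib/postgresql/data \
  postgres:9.5
```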
Volume Drivers
For multi-host deployments, local volumes don’t suffice—containers need access to the same data regardless of which host they run on. Volume drivers like Flocker, REX-Ray, and vendor-specific plugins integrate with networked storage.
Choose storage backends based on performance requirements. Block storage (EBS, Ceph RBD) provides good performance for databases. Object storage (S3, Swift) suits static assets and backups. NFS and distributed filesystems work for shared configuration.
Database Considerations
Running databases in containers is controversial. The arguments against: databases need careful resource management, persistent storage, and operational attention that containers complicate. The arguments for: consistency, portability, and simplified provisioning.
Our experience: stateless applications benefit most from containerization. Databases can run in containers, but require careful volume configuration, resource limits, and operational procedures that account for container orchestration behaviors.
Logging and Monitoring
Centralized Logging
Container logs default to stdout/stderr, captured by Docker’s logging drivers. In production, you need these logs aggregated centrally.
Configure Docker to forward logs to a log aggregator: ELK stack, Graylog, Splunk, or cloud logging services. Include container metadata—image name, container ID, host—in structured log entries.
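As one example, forwarding to a Graylog endpoint via the GELF driver, with image and container metadata embedded in the tag (the address is a placeholder):

```shell
# Per-container logging driver; the same options can be set daemon-wide
docker run -d \
  --log-driver=gelf \
  --log-opt gelf-address=udp://graylog.internal:12201 \
  --log-opt tag="{{.ImageName}}/{{.ID}}" \
  myapp:abc123
```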
Avoid logging to files inside containers. It wastes ephemeral storage, complicates log rotation, and loses data when containers restart.
Metrics Collection
Monitor both host metrics (CPU, memory, disk, network at the host level) and container metrics (resource usage per container). Docker’s stats API exposes per-container metrics; tools like cAdvisor, Prometheus, and Datadog collect and aggregate them.
Set resource limits and alert when containers approach them. A container without memory limits can consume all host memory, affecting other containers. A container with limits will be OOM-killed, which is preferable to impacting neighbors.
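Limits are set per container at run time; the values here are illustrative, not recommendations:

```shell
# Hard memory cap plus relative CPU weight; exceeding -m triggers the OOM killer
docker run -d -m 512m --memory-swap 512m --cpu-shares 512 myapp:abc123
```

Setting --memory-swap equal to -m disables swap for the container, so the limit is a true ceiling.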
Health Checks
Docker 1.12 introduced native health checks. Define checks in your Dockerfile:
HEALTHCHECK --interval=30s --timeout=3s \
CMD curl -f http://localhost/health || exit 1
Orchestrators use health check status for service discovery and automatic recovery. An unhealthy container gets removed from load balancer pools; a persistently unhealthy container gets restarted.
Security
Run as Non-Root
By default, processes inside containers run as root. If a container is compromised, the attacker has root access within the container’s namespaces. While container isolation limits the blast radius, running as non-root adds defense in depth.
Create application users in your Dockerfile:
RUN adduser -D appuser
USER appuser
Read-Only Filesystems
If your application doesn’t need to write to the filesystem, run with --read-only. This prevents attackers from modifying container contents, even if they gain code execution.
For applications that need to write to specific locations (temp files, caches), mount writable volumes only where necessary.
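Combining a read-only root filesystem with targeted writable mounts might look like this (paths and names are illustrative):

```shell
# Root filesystem is immutable; /tmp is a tmpfs, the cache is a named volume
docker run -d --read-only \
  --tmpfs /tmp \
  -v app-cache:/var/cache/app \
  myapp:abc123
```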
Image Scanning
Container images inherit vulnerabilities from their base images and installed packages. Regularly scan images for known CVEs using tools like Clair, Trivy, or commercial scanning services.
Automate scanning in your CI pipeline. Block deployments of images with critical vulnerabilities. Rebuild and redeploy when base images publish security updates.
Registry Security
Container registries are high-value targets—compromise the registry, compromise every deployment. Run private registries with TLS, authentication, and access controls. Prefer managed registries (Docker Hub, ECR, GCR) that handle security operations.
Sign images cryptographically to verify provenance. Docker Content Trust, built on Notary, provides image signing and verification.
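Enabling Content Trust is a single environment variable; pushes are then signed and pulls verified (registry and image names are placeholders):

```shell
# With Content Trust enabled, docker push signs and docker pull verifies
export DOCKER_CONTENT_TRUST=1
docker push registry.example.com/myapp:abc123
```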
Orchestration
Running a few containers on a single host requires minimal tooling. Running hundreds of containers across dozens of hosts requires orchestration.
The Orchestration Landscape
The container orchestration space is fragmented in 2016. Major options include:
Docker Swarm. Native Docker clustering. Simpler than alternatives, less feature-rich. Good for Docker-native workflows and teams that want minimal operational overhead.
Kubernetes. Google’s container orchestrator, donated to the CNCF. Most feature-rich and complex. Strong community momentum. Steeper learning curve but more capability.
Mesos with Marathon. Data center operating system with container scheduling. Proven at large scale (Twitter, Airbnb). More complex to operate but handles mixed workloads well.
Amazon ECS. AWS-native container orchestration. Deep AWS integration, less portable. Good choice if you’re AWS-committed and want managed infrastructure.
Choosing an Orchestrator
For teams starting with containers, Docker Swarm’s simplicity is appealing. For teams planning significant scale, Kubernetes’ feature set justifies the learning investment. For teams with existing Mesos infrastructure, Marathon adds containers without additional operational overhead.
We'll likely see consolidation in this space. The market is moving toward Kubernetes as the standard, but it's too early to declare a winner.
Deployment Strategies
Blue-Green Deployments
Run two identical production environments: blue (current) and green (new). Deploy to green, verify health, switch traffic from blue to green. If problems emerge, switch back instantly.
With containers, blue-green is straightforward: deploy new containers, verify health checks pass, update load balancer configuration, drain old containers.
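A hand-rolled sketch of the swap, outside any orchestrator (names, ports, and the health endpoint are all illustrative):

```shell
# Start green alongside blue, verify it, then repoint traffic and retire blue
docker run -d --name app-green -p 8081:8080 myapp:abc124
curl -fs http://localhost:8081/health
# ...switch the load balancer upstream from :8080 (blue) to :8081 (green)...
docker stop app-blue && docker rm app-blue
```

Keeping the old containers around until the switch is confirmed is what makes the instant rollback possible.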
Rolling Updates
Update containers incrementally: start new containers, verify health, stop old containers, repeat until complete. Maintains capacity throughout deployment but extends the deployment window.
Orchestrators automate rolling updates. Configure the parallelism (how many containers update simultaneously) and health check thresholds.
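With Docker 1.12 swarm mode, for instance, a rolling update is one command (service and image names are illustrative):

```shell
# Update two containers at a time, pausing 10s between batches
docker service update \
  --image myapp:abc124 \
  --update-parallelism 2 \
  --update-delay 10s \
  web
```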
Canary Releases
Route a small percentage of traffic to new containers while the majority continues hitting the current version. Monitor error rates and latency; if the new version performs well, gradually increase its traffic share.
Canary releases require sophisticated traffic management—typically a service mesh or programmable load balancer.
Lessons Learned
After two years, here’s what we know:
Start simple. Run containers on a single host before attempting multi-host orchestration. Master image building before optimizing layer caching. Understand Docker networking before adding overlay complexity.
Invest in observability. Container environments are dynamic. Hosts change, containers move, IP addresses rotate. Without strong logging, metrics, and tracing, debugging production issues becomes impossible.
Treat images as artifacts. Build once, deploy everywhere. The same image runs in development, staging, and production. Configuration varies through environment variables, not image modifications.
Plan for failure. Containers crash. Hosts fail. Networks partition. Design applications to handle container restarts, implement health checks, and let orchestrators handle recovery.
Security isn’t optional. Container isolation isn’t perfect. Defense in depth—minimal images, non-root users, read-only filesystems, network policies—limits the impact of compromises.
Docker has transformed how we build and deploy software. The transformation isn’t free—it requires new skills, new tooling, and new operational practices. But for teams willing to invest, the benefits compound over time.
Key Takeaways
- Keep images small using Alpine base images and multi-stage builds
- Implement centralized logging and metrics collection before going to production
- Run containers as non-root users with read-only filesystems where possible
- Choose orchestration tools based on your team’s scale and operational maturity
- Treat containers as ephemeral; persist data through explicit volume configuration