“It works on my machine” often translates to “I don’t understand networking.” Application developers increasingly work with networked systems—microservices, APIs, cloud infrastructure—but many lack fundamental networking knowledge. When things fail, they’re helpless.
Understanding networking won’t make you a network engineer. But it will help you debug problems, design better systems, and communicate effectively with infrastructure teams. Here are the fundamentals every developer should know.
The Network Stack
Network communication happens in layers. Each layer handles specific concerns and provides services to layers above.
Physical Layer
The actual transmission medium: ethernet cables, fiber optics, wireless signals. Developers rarely interact with this layer directly, but understanding that physical constraints exist matters. Light speed limits latency. Bandwidth has physical limits.
Link Layer (Ethernet)
Handles communication on a local network segment. MAC addresses identify devices. Switches forward frames between devices on the same network.
Key concepts:
- MAC address: Hardware address identifying a network interface (e.g., 00:1A:2B:3C:4D:5E)
- Frame: Unit of data at this layer
- MTU (Maximum Transmission Unit): Largest packet a link can carry in a single frame (typically 1500 bytes for Ethernet)
Network Layer (IP)
Handles routing between networks. IP addresses identify hosts across the internet. Routers forward packets between networks.
Key concepts:
- IP address: Logical address identifying a host (e.g., 192.168.1.100 for IPv4)
- Subnet: Range of IP addresses on the same network (e.g., 192.168.1.0/24)
- Routing: Determining the path packets take between networks
- NAT: Network Address Translation, allowing multiple hosts to share one public IP
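To make the addressing concepts above concrete, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `same_subnet` is made up for illustration; the addresses are the examples from the list.

```python
import ipaddress

def same_subnet(host: str, cidr: str) -> bool:
    """Return True if the host address belongs to the given CIDR block."""
    return ipaddress.ip_address(host) in ipaddress.ip_network(cidr)

network = ipaddress.ip_network("192.168.1.0/24")
print(network.num_addresses)                             # 256 addresses in a /24
print(same_subnet("192.168.1.100", "192.168.1.0/24"))    # True: same network
print(same_subnet("192.168.2.100", "192.168.1.0/24"))    # False: traffic must go through a router
print(ipaddress.ip_address("192.168.1.100").is_private)  # True: not routable on the public internet
```

Hosts in a private range like this one are the typical case where NAT sits between them and the internet.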
Transport Layer (TCP/UDP)
Handles end-to-end communication between applications. Port numbers identify specific services on a host.
TCP (Transmission Control Protocol):
- Connection-oriented
- Reliable delivery (retransmits lost packets)
- Ordered delivery (data reaches the application in the order it was sent, even if packets arrive out of sequence)
- Flow control (prevents overwhelming receivers)
- Used for HTTP, databases, most application protocols
UDP (User Datagram Protocol):
- Connectionless
- Unreliable (packets may be lost)
- Unordered (packets may arrive out of sequence)
- No flow control
- Used for DNS, video streaming, real-time games
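The difference shows up directly in the socket API. The sketch below sends a payload over loopback with each protocol; the function names are hypothetical, and it assumes small payloads that fit in the socket buffers.

```python
import socket

def udp_send_receive(payload: bytes) -> bytes:
    """Send one datagram over loopback: no connection setup, no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
        receiver.settimeout(2)             # a lost datagram would otherwise block forever
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(65535)
        return data

def tcp_send_receive(payload: bytes) -> bytes:
    """Send bytes over loopback TCP: handshake first, then a reliable byte stream."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(2)
        # connect() performs the handshake before any data is exchanged
        with socket.create_connection(listener.getsockname(), timeout=2) as client:
            server_side, _addr = listener.accept()
            with server_side:
                server_side.settimeout(2)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)   # signal end of stream
                chunks = []
                while chunk := server_side.recv(4096):
                    chunks.append(chunk)
                return b"".join(chunks)

print(udp_send_receive(b"ping"))
print(tcp_send_receive(b"hello"))
```

Note that the UDP receiver needs its own timeout, because nothing in the protocol reports a lost datagram.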
Application Layer
Application-specific protocols built on TCP or UDP: HTTP, SMTP, FTP, PostgreSQL protocol, etc. This is where developers spend most of their time.
TCP Deep Dive
Most application communication uses TCP. Understanding TCP explains many application behaviors.
Three-Way Handshake
TCP connections begin with a handshake:
- Client sends SYN (synchronize)
- Server responds with SYN-ACK (synchronize-acknowledge)
- Client responds with ACK (acknowledge)
This takes one round-trip time (RTT) before data can flow. For distant servers, RTT might be 100-200ms—noticeable latency just to establish a connection.
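One way to see the handshake cost is to time connection setup. This is a rough sketch with a hypothetical helper name, `tcp_connect_ms`. If you pass a hostname instead of an IP address, the measurement also includes DNS resolution.

```python
import socket
import time

def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> float:
    """Return how long it took to establish a TCP connection, in milliseconds."""
    start = time.perf_counter()
    # create_connection returns once the three-way handshake has completed
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000
```

Calling `tcp_connect_ms("api.example.com", 443)` against a distant server should return roughly one RTT plus lookup time, while a server on the same machine or LAN should come back in well under a millisecond or two.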
Connection State
TCP connections have states:
- ESTABLISHED: Active connection
- TIME_WAIT: Connection closed, waiting before releasing resources (can cause port exhaustion if you create many short-lived connections)
- CLOSE_WAIT: The remote side has closed, but the local application has not yet closed its socket (a growing count usually indicates an application bug, such as leaked connections)
Check connection states with netstat or ss:
ss -tan state established
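For a quick overview of all states at once, you can tally the first column of the `ss -tan` output. The sketch below does that in Python. The sample text is made up but follows the shape `ss` prints, including its abbreviated state names (`ESTAB`, `TIME-WAIT`, `CLOSE-WAIT`).

```python
from collections import Counter

def count_tcp_states(ss_output: str) -> Counter:
    """Count connections per state from `ss -tan` text output."""
    lines = ss_output.strip().splitlines()
    # The first line is the column header; the state is the first column of each row.
    return Counter(line.split()[0] for line in lines[1:] if line.strip())

# Made-up sample for illustration only.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
LISTEN     0      128    0.0.0.0:22           0.0.0.0:*
ESTAB      0      0      192.168.1.100:52344  203.0.113.10:443
ESTAB      0      0      192.168.1.100:52346  203.0.113.10:443
TIME-WAIT  0      0      192.168.1.100:52340  203.0.113.10:443
CLOSE-WAIT 1      0      192.168.1.100:52338  203.0.113.10:443
"""

print(count_tcp_states(SAMPLE))
```

On a real machine you could feed it `subprocess.run(["ss", "-tan"], capture_output=True, text=True).stdout` instead of the sample.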
Congestion Control
TCP adjusts sending rate based on network conditions. New connections start slowly (slow start), increasing rate until packet loss indicates congestion.
This means:
- New connections are slower than established ones
- Connection reuse (keepalive) improves performance
- Packet loss dramatically affects throughput
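A toy model shows why new connections are slower. The sketch below counts the round trips needed to send a payload if the congestion window simply doubles every RTT. It assumes an initial window of 10 segments and a 1460-byte segment size (common values, but assumptions here), and it ignores loss, receiver limits, and the later switch to congestion avoidance.

```python
import math

def slow_start_round_trips(total_bytes: int, mss: int = 1460, initial_cwnd: int = 10) -> int:
    """Round trips needed to send total_bytes if the window doubles every RTT."""
    segments_left = math.ceil(total_bytes / mss)
    cwnd = initial_cwnd          # congestion window, in segments
    round_trips = 0
    while segments_left > 0:
        segments_left -= cwnd    # send one full window per round trip
        cwnd *= 2                # slow start: the window doubles each RTT
        round_trips += 1
    return round_trips

for size_kb in (10, 100, 1000):
    print(size_kb, "KB ->", slow_start_round_trips(size_kb * 1024), "round trips")
```

Under this model a 100 KB response on a fresh connection needs several round trips, while an already warmed-up connection with a large window could send it in one.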
Nagle’s Algorithm and Delayed ACK
By default, TCP batches small packets (Nagle’s algorithm) and delays acknowledgments. This improves efficiency but adds latency.
For latency-sensitive applications (interactive protocols, real-time systems), you might disable Nagle’s algorithm:
# sock is an existing TCP socket object, e.g. socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
DNS
Domain Name System translates hostnames to IP addresses. Every networked application depends on DNS.
DNS Resolution
When you connect to api.example.com:
- Application calls resolver library
- Resolver checks local cache
- If not cached, queries configured DNS server
- DNS server may query other servers (root, TLD, authoritative)
- Response cached according to TTL (Time To Live)
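From application code, the whole chain above sits behind a single resolver call. A minimal sketch using the standard library's `socket.getaddrinfo` follows; the helper names `resolve` and `timed_resolve` are made up, and `localhost` is used so the example runs without network access.

```python
import socket
import time

def resolve(hostname: str, port: int = 443) -> list[str]:
    """Resolve a hostname via the system resolver and return its unique IP addresses."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

def timed_resolve(hostname: str) -> tuple[list[str], float]:
    """Return the addresses plus how long the lookup took, in milliseconds."""
    start = time.perf_counter()
    addresses = resolve(hostname)
    return addresses, (time.perf_counter() - start) * 1000

addresses, elapsed_ms = timed_resolve("localhost")
print(addresses, f"{elapsed_ms:.1f} ms")
```

Note that `getaddrinfo` does not expose the record's TTL, so any caching you see at this level is done by the operating system or the upstream DNS server.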
DNS Issues
Common DNS problems:
- Slow resolution: Poor DNS server performance or connectivity
- Stale cache: TTL too long, serving outdated records
- DNS failure: Can’t resolve hostnames at all
Check DNS resolution:
dig api.example.com
nslookup api.example.com
DNS and Application Design
- Cache DNS responses appropriately (but respect TTL)
- Handle DNS failures gracefully
- Consider DNS as a failure point in dependency chains
- Use IP addresses directly for truly critical paths (but lose flexibility)
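As a sketch of the first two points, the class below caches lookups and falls back to the last known answer when resolution fails. `CachingResolver` is a hypothetical name, and the fixed `ttl_seconds` is an assumption: the standard resolver API does not report the real record TTL, so pick a value no longer than the TTLs you expect.

```python
import socket
import time
from typing import Callable, Optional

class CachingResolver:
    """Cache lookups for a fixed TTL and fall back to stale entries if DNS fails."""

    def __init__(self, ttl_seconds: float = 60.0,
                 lookup: Optional[Callable[[str], list[str]]] = None,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._lookup = lookup or self._system_lookup
        self._clock = clock
        self._cache: dict[str, tuple[float, list[str]]] = {}

    @staticmethod
    def _system_lookup(hostname: str) -> list[str]:
        infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
        return sorted({info[4][0] for info in infos})

    def resolve(self, hostname: str) -> list[str]:
        now = self._clock()
        cached = self._cache.get(hostname)
        if cached and now - cached[0] < self._ttl:
            return cached[1]          # fresh entry: skip the lookup
        try:
            addresses = self._lookup(hostname)
        except OSError:               # socket.gaierror is a subclass of OSError
            if cached:
                return cached[1]      # degrade: serve the stale answer
            raise                     # nothing cached: surface the failure
        self._cache[hostname] = (now, addresses)
        return addresses
```

Serving a stale answer during an outage is a trade-off: it keeps the application running, but it can also keep pointing at an address that has since changed.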
HTTP
Most applications communicate via HTTP. Understanding HTTP’s network behavior helps debug issues.
Connection Reuse
HTTP/1.1 made persistent connections the default: multiple requests travel over one TCP connection. This avoids handshake overhead per request.
HTTP/2 multiplexes many requests over one connection, further improving efficiency.
Connection reuse matters most for high-latency connections (mobile, distant servers).
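You can observe reuse locally with nothing but the standard library. In this sketch a tiny test server replies with the client's source port, so identical ports across requests mean the same TCP connection carried them. The handler and function names are invented for the demo.

```python
import http.client
import http.server
import threading

class PortEchoHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # lets the server keep the connection open

    def do_GET(self):
        body = str(self.client_address[1]).encode()   # the client's source port
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass   # keep the demo quiet

def source_ports(reuse: bool, request_count: int = 3) -> list[int]:
    """Make several requests and return the client port the server saw for each."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    ports, conn = [], None
    try:
        for _ in range(request_count):
            if conn is None:
                conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", "/")
            ports.append(int(conn.getresponse().read()))
            if not reuse:
                conn.close()    # force a new TCP connection for the next request
                conn = None
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()
    return ports

print("reused:  ", source_ports(reuse=True))
print("separate:", source_ports(reuse=False))
```

With reuse, every request reports the same port. Without it, each request pays for a fresh handshake and shows up from a different port.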
HTTP Latency Anatomy
A simple HTTP request involves:
- DNS resolution (if hostname not cached)
- TCP handshake (if new connection)
- TLS handshake (if HTTPS, adds 1-2 RTT)
- Request transmission
- Server processing
- Response transmission
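For a new HTTPS connection to a server with a 100ms round-trip time:
- DNS: 100ms (if not cached)
- TCP handshake: 100ms (1 RTT)
- TLS handshake: 200ms (2 RTT, as with TLS 1.2; TLS 1.3 needs only 1)
- Request/response: 200ms (one round trip plus server processing)
- Total: 600ms for one request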
With connection reuse:
- Request/response: 200ms
- Total: 200ms
This is why connection pooling and keepalive matter.
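The arithmetic above fits in a few lines of Python, which makes it easy to plug in your own numbers. This is only the simplified model from this section; the function and parameter names are made up, and the DNS lookup is assumed to cost one round trip.

```python
def request_latency_ms(rtt_ms: float, *, new_connection: bool, dns_cached: bool,
                       tls_round_trips: int = 2, exchange_ms: float = 200.0) -> float:
    """Estimate the time for one HTTPS request under the simplified model above."""
    total = 0.0
    if new_connection:
        if not dns_cached:
            total += rtt_ms                  # DNS lookup, assumed to cost one RTT
        total += rtt_ms                      # TCP three-way handshake
        total += tls_round_trips * rtt_ms    # TLS handshake
    total += exchange_ms                     # request/response on the open connection
    return total

print(request_latency_ms(100, new_connection=True, dns_cached=False))   # 600.0
print(request_latency_ms(100, new_connection=False, dns_cached=True))   # 200.0
```

Setting `tls_round_trips=1` shows what a one-round-trip TLS handshake would save on new connections.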
Debugging Network Issues
When the network breaks, systematic debugging helps.
Check Connectivity
Basic connectivity tests:
# Can you reach the host at all?
ping api.example.com
# Can you reach the specific port?
telnet api.example.com 443
nc -zv api.example.com 443
DNS Debugging
Verify DNS resolution:
# What IP does the hostname resolve to?
dig api.example.com
# Is DNS resolution slow?
time dig api.example.com
Connection Debugging
Examine connection state:
# Active connections
ss -tan
# Connections to specific host
ss -tan dst api.example.com
# Connections in TIME_WAIT (potential port exhaustion)
ss -H -tan state time-wait | wc -l
Traffic Capture
For deep debugging, capture network traffic:
# Capture HTTP traffic
tcpdump -i any -A port 80
# Capture and write to file for Wireshark analysis
tcpdump -i any -w capture.pcap host api.example.com
Traffic captures reveal exactly what’s happening on the wire: malformed requests, unexpected responses, timing issues.
Application-Level Tools
Use application-level tools too:
# HTTP request with timing breakdown
curl -w "@curl-format.txt" -o /dev/null -s https://api.example.com
# Where curl-format.txt contains:
# time_namelookup: %{time_namelookup}\n
# time_connect: %{time_connect}\n
# time_appconnect: %{time_appconnect}\n
# time_pretransfer: %{time_pretransfer}\n
# time_starttransfer: %{time_starttransfer}\n
# time_total: %{time_total}\n
Designing for Networks
Understanding networking influences application design:
Connection Pooling
Reuse TCP connections rather than creating new ones per request. Most HTTP client libraries support connection pooling; ensure it’s enabled and configured appropriately.
Timeouts
Set explicit timeouts for all network operations:
- Connection timeout: How long to wait for connection establishment
- Read timeout: How long to wait for response data
- Total timeout: Maximum time for complete operation
Operations without timeouts can hang indefinitely when networks fail.
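Here is how the three timeouts might look at the socket level. This is a sketch under assumptions rather than a recommended client: `read_all` is a hypothetical helper that reads until the peer closes the connection, and the default values are arbitrary.

```python
import socket
import time

def read_all(host: str, port: int, connect_timeout: float = 3.0,
             read_timeout: float = 5.0, total_timeout: float = 10.0) -> bytes:
    """Read until the peer closes, enforcing connect, read, and total timeouts."""
    deadline = time.monotonic() + total_timeout
    chunks = []
    # Connection timeout: bounds how long connection establishment may take.
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("total timeout exceeded")
            # Read timeout: bounds each recv(), never exceeding the total budget.
            sock.settimeout(min(read_timeout, remaining))
            chunk = sock.recv(4096)      # raises socket.timeout if no data arrives
            if not chunk:
                return b"".join(chunks)  # peer closed the connection
            chunks.append(chunk)
```

Higher-level libraries expose the same idea. For example, `requests` accepts a `(connect, read)` tuple such as `timeout=(3, 5)`, though it has no built-in total timeout.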
Retries with Backoff
Networks have transient failures. Implement retries with exponential backoff:
import time
import requests

def request_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            return requests.get(url, timeout=5)
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            time.sleep(2 ** attempt)  # back off 1s, then 2s, doubling each retry
Graceful Degradation
Design applications to function (perhaps with reduced capability) when network dependencies fail. Cache responses, provide fallbacks, and fail fast when dependencies are unavailable.
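One common way to fail fast is a circuit breaker: after repeated failures, stop calling the dependency for a while and go straight to a fallback. The sketch below is a deliberately small version of that pattern. The pattern is an illustration chosen here, and the class name, thresholds, and cooldown are all assumptions.

```python
import time
from typing import Any, Callable

class CircuitBreaker:
    """After repeated failures, skip the dependency for a while and use a fallback."""

    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None   # time the breaker tripped, or None while closed

    def call(self, operation: Callable[[], Any], fallback: Callable[[], Any]) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._cooldown:
                return fallback()        # fail fast: do not touch the dependency
            self._opened_at = None       # cooldown over: allow a trial call
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            return fallback()            # reduced capability instead of an error
        self._failures = 0               # success resets the failure count
        return result
```

A call might look like `breaker.call(lambda: fetch_prices(), lambda: cached_prices)`, where both names stand in for your own code. The fallback is where cached responses or default values come in.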
Key Takeaways
- Network communication happens in layers; each layer has specific responsibilities
- TCP provides reliable, ordered delivery at the cost of connection overhead and latency
- DNS is a dependency for virtually all networked applications; understand and handle DNS failures
- HTTP performance depends heavily on connection reuse and minimizing round trips
- Systematic debugging (connectivity, DNS, connections, traffic capture) finds network issues
- Design for networks: connection pooling, timeouts, retries, and graceful degradation