When everyone went remote, VPNs became critical infrastructure overnight. Systems designed for 20% remote usage suddenly needed to handle 100%. Many organizations learned hard lessons about VPN scalability.
Here’s what we learned and how to do better.
What Broke
Capacity Assumptions
before:
  design_assumption: 20% concurrent users
  typical_load: 500 connections
  peak_load: 1000 connections
  provisioned_capacity: 1500 connections
after:
  actual_requirement: 100% workforce
  needed_connections: 5000
  available: 1500
  result: Degraded performance, failed connections
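The gap in these figures is simple arithmetic, and it is worth making that arithmetic repeatable. The following is a minimal sketch in Python, not a sizing tool: the function names are hypothetical, and the workforce of 5,000 is an assumption inferred from the 5,000 connections needed at 100%.

```python
def required_connections(workforce: int, concurrent_pct: int, headroom_pct: int = 0) -> int:
    """Concurrent sessions to provision, rounded up to a whole connection."""
    if workforce < 0 or not 0 <= concurrent_pct <= 100 or headroom_pct < 0:
        raise ValueError("workforce, concurrent_pct, or headroom_pct out of range")
    numerator = workforce * concurrent_pct * (100 + headroom_pct)
    return -(-numerator // 10_000)  # integer ceiling division


def shortfall(needed: int, provisioned: int) -> int:
    """Connections that cannot be served; zero when capacity is sufficient."""
    return max(0, needed - provisioned)


WORKFORCE = 5000      # assumption: implied by needed_connections at 100%
PROVISIONED = 1500    # provisioned_capacity from the figures above

design_peak = required_connections(WORKFORCE, 20)    # the original 20% assumption
fully_remote = required_connections(WORKFORCE, 100)  # everyone connected at once

print(design_peak, fully_remote, shortfall(fully_remote, PROVISIONED))
```

With these inputs the design peak is 1,000 connections, the fully remote requirement is 5,000, and the shortfall against 1,500 provisioned connections is 3,500. Integer percentages are used so the results are exact rather than subject to floating-point rounding.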
Bandwidth Bottlenecks
Traditional VPN architecture:
                                    ┌─────────────────────┐
                                    │      Corporate      │
Remote Users ───► VPN Gateway ──────│       Network       │
     │                              │    (All traffic)    │
     │                              └─────────────────────┘
     │                                         │
     │                                         ▼
     └────────────────────────────────────► Internet
                  (Backhauled)          (Cloud services)
Problem: All traffic, including traffic bound for cloud services, routes through the corporate network
Result: Massive bandwidth consumption at the VPN gateway
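To see why backhauling hurts, it helps to put rough numbers on it. The sketch below is illustrative only: the per-user rate and the cloud share are made-up inputs, and it assumes the VPN gateway and the Internet egress share the same data center uplink, so cloud-bound traffic crosses that uplink twice (once inside the tunnel, once on the way back out).

```python
def full_tunnel_uplink_kbps(users: int, per_user_kbps: int, cloud_pct: int) -> dict:
    """Estimate uplink load when all user traffic is tunneled to the data center."""
    if users < 0 or per_user_kbps < 0 or not 0 <= cloud_pct <= 100:
        raise ValueError("inputs out of range")
    tunnel = users * per_user_kbps           # everything arrives through the gateway
    backhauled = tunnel * cloud_pct // 100   # the cloud-bound share leaves again
    return {
        "tunnel_kbps": tunnel,
        "backhauled_kbps": backhauled,
        "uplink_total_kbps": tunnel + backhauled,
    }


# Hypothetical inputs: 5000 users, 2 Mbps each, 70% of traffic bound for cloud services.
estimate = full_tunnel_uplink_kbps(users=5000, per_user_kbps=2000, cloud_pct=70)
for name, kbps in estimate.items():
    print(f"{name}: {kbps / 1_000_000:.1f} Gbps")
```

The absolute figures matter less than the shape: under these assumptions, the share of traffic that never needed to touch the corporate network is also the share that is carried twice.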
Single Points of Failure
failures_observed:
  - VPN gateway hardware failure
  - License server unreachable
  - Authentication service overloaded
  - Certificate expiration
  - ISP issues at data center
Emergency Scaling
Quick Wins
What organizations did immediately:
capacity:
  - Added VPN licenses
  - Deployed additional gateways
  - Upgraded hardware
  - Added bandwidth
optimization:
  - Enabled split tunneling (carefully)
  - Moved cloud services off VPN
  - Staggered work hours
  - Prioritized critical users
redundancy:
  - Added backup gateways
  - Multi-ISP connectivity
  - Geographic distribution
Split Tunneling
Route only corporate traffic through VPN:
# Before: All traffic through VPN
full_tunnel:
  bandwidth_per_user: High (all traffic)
  cloud_latency: High (backhauled)
  vpn_load: Very high
# After: Corporate only through VPN
split_tunnel:
  vpn_traffic:
    - Internal applications
    - Corporate databases
    - Admin interfaces
  direct_internet:
    - Microsoft 365
    - Salesforce
    - Zoom
    - General browsing
  result:
    bandwidth_reduction: 60-80%
    user_experience: Improved
    security_tradeoff: Requires endpoint protection
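At its core, split tunneling is a routing decision made per destination. The sketch below illustrates that decision with Python's standard `ipaddress` module. The corporate CIDR ranges and the `route_for` helper are hypothetical; a real VPN client applies routes pushed by the gateway rather than running code like this.

```python
import ipaddress

# Hypothetical corporate address space; replace with the real internal ranges.
CORPORATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def route_for(destination: str) -> str:
    """Return 'vpn' for corporate destinations and 'direct' for everything else."""
    address = ipaddress.ip_address(destination)
    if any(address in network for network in CORPORATE_NETWORKS):
        return "vpn"
    return "direct"


for destination in ("10.20.30.40", "172.16.5.1", "8.8.8.8"):
    print(destination, "->", route_for(destination))
```

Anything that takes the direct path bypasses corporate inspection, which is why the endpoint protection tradeoff noted above comes with this design.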
Cloud VPN Options
aws_client_vpn:
  - Managed service
  - Scales automatically
  - Pay per connection
  - Integrates with VPC
azure_vpn_gateway:
  - Native Azure integration
  - Multiple SKUs for scaling
  - Site-to-site and point-to-site
third_party_cloud:
  - Zscaler Private Access
  - Cloudflare Access
  - Prisma Access (Palo Alto)
Better Architecture
Regional Distribution
                   ┌─────────────────┐
                   │  Corporate DC   │
                   └────────┬────────┘
                            │
     ┌──────────────────────┼──────────────────────┐
     │                      │                      │
     ▼                      ▼                      ▼
┌──────────┐           ┌──────────┐           ┌──────────┐
│ Gateway  │           │ Gateway  │           │ Gateway  │
│ US-West  │           │ US-East  │           │    EU    │
└────┬─────┘           └────┬─────┘           └────┬─────┘
     │                      │                      │
  US-West                US-East                EU Users
   Users                  Users
High Availability
ha_configuration:
  load_balancing:
    - DNS round-robin
    - Global load balancer
    - Anycast routing
  redundancy:
    - Active-active gateways
    - Automatic failover
    - Health monitoring
  geographic:
    - Multiple regions
    - User-closest routing
    - Data sovereignty compliance
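User-closest routing and automatic failover can be combined into one selection rule: prefer a healthy gateway in the user's region, and otherwise fall back to the least loaded healthy gateway elsewhere. The sketch below shows that rule under simplifying assumptions. The `Gateway` record and `pick_gateway` function are hypothetical, health is a single boolean, and data sovereignty constraints are not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Gateway:
    name: str
    region: str
    healthy: bool
    active_connections: int
    max_connections: int


def pick_gateway(gateways: Sequence[Gateway], user_region: str) -> Optional[Gateway]:
    """Choose a healthy gateway with spare capacity, preferring the user's region."""
    candidates = [
        g for g in gateways
        if g.healthy and g.active_connections < g.max_connections
    ]
    if not candidates:
        return None  # nothing usable; the caller should alert
    # Sort key: same-region gateways first, then lowest utilization.
    return min(
        candidates,
        key=lambda g: (g.region != user_region, g.active_connections / g.max_connections),
    )


fleet = [
    Gateway("gw-us-west", "us-west", True, 100, 1500),
    Gateway("gw-us-east", "us-east", True, 1400, 1500),
    Gateway("gw-eu", "eu", False, 0, 1500),  # simulated outage
]

print(pick_gateway(fleet, "us-east").name)  # local gateway, although busy
print(pick_gateway(fleet, "eu").name)       # regional gateway is down, so fail over
```

In practice this decision usually lives in DNS, a global load balancer, or anycast routing rather than in application code; the sketch only makes the preference order explicit.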
Monitoring and Alerting
metrics_to_watch:
  capacity:
    - Active connections vs. limit
    - Bandwidth utilization
    - CPU/memory on gateways
  performance:
    - Connection time
    - Throughput per user
    - Latency
  availability:
    - Gateway health
    - Authentication success rate
    - Connection drops
alerts:
  - capacity > 70%: Warning
  - capacity > 85%: Critical
  - gateway_down: Page immediately
  - auth_failure_rate > 5%: Investigate
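The alert thresholds above translate directly into a small evaluation function. This is a sketch of the logic only, assuming the metrics have already been collected; the function name, parameters, and alert strings are hypothetical and not tied to any monitoring product.

```python
from typing import List


def evaluate_alerts(
    active_connections: int,
    connection_limit: int,
    gateway_up: bool,
    auth_failures: int,
    auth_attempts: int,
) -> List[str]:
    """Apply the 70% / 85% capacity and 5% auth-failure thresholds."""
    alerts = []
    if not gateway_up:
        alerts.append("gateway_down: page immediately")
    # Compare cross-multiplied integers to avoid floating-point edge cases.
    if active_connections * 100 > connection_limit * 85:
        alerts.append("capacity: critical")
    elif active_connections * 100 > connection_limit * 70:
        alerts.append("capacity: warning")
    if auth_attempts > 0 and auth_failures * 100 > auth_attempts * 5:
        alerts.append("auth_failure_rate: investigate")
    return alerts


print(evaluate_alerts(1100, 1500, True, 2, 100))
print(evaluate_alerts(1300, 1500, True, 10, 100))
```

The capacity check reports only the most severe level that applies, so a gateway above 85% raises a critical alert without also repeating the warning.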
What Comes Next
Zero Trust Transition
VPN is a bridge, not a destination:
migration_path:
  phase_1_immediate:
    - Scale existing VPN
    - Split tunneling
    - Add capacity
  phase_2_short_term:
    - Cloud apps via zero trust proxy
    - VPN for legacy only
    - Improved monitoring
  phase_3_long_term:
    - Identity-based access everywhere
    - VPN deprecated
    - Zero trust architecture
Application-Level Access
Move away from network-level trust:
current_model:
  VPN → Full network access → Applications
target_model:
  Identity → Policy → Specific application access
benefits:
  - No broad network access
  - Per-app authorization
  - Better visibility
  - Easier to scale
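The target model can be read as a function from an identity and an application to an allow or deny decision. The sketch below illustrates that shape with a default-deny policy table. The application names, group names, and the `is_allowed` helper are all hypothetical; real zero trust products evaluate much richer signals, such as device posture and location.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Identity:
    user: str
    groups: FrozenSet[str]
    mfa_verified: bool


# Hypothetical policy table: application -> groups allowed to reach it.
POLICIES = {
    "payroll": frozenset({"finance"}),
    "wiki": frozenset({"employees"}),
}


def is_allowed(identity: Identity, application: str) -> bool:
    """Grant access to one application only; unknown applications are denied."""
    allowed_groups = POLICIES.get(application)
    if allowed_groups is None:
        return False  # default deny
    return identity.mfa_verified and bool(identity.groups & allowed_groups)


alice = Identity("alice", frozenset({"employees", "finance"}), mfa_verified=True)
bob = Identity("bob", frozenset({"employees"}), mfa_verified=True)

print(is_allowed(alice, "payroll"), is_allowed(bob, "payroll"), is_allowed(bob, "wiki"))
```

Unlike a VPN session, a decision here never grants reachability to anything beyond the single application named in the request.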
Cloud-First Architecture
Design for remote access from the start:
principles:
  - Applications accessible from anywhere
  - Identity is the perimeter
  - Assume hostile network
  - Encrypt everything
implementation:
  - SaaS where possible
  - Cloud-native applications
  - Zero trust access
  - Strong authentication everywhere
Lessons Learned
Planning
capacity_planning:
  - Plan for 100% remote, not 20%
  - Test at expected scale
  - Have headroom for growth
  - Regular capacity reviews
redundancy:
  - No single points of failure
  - Test failover regularly
  - Geographic distribution
  - Multiple ISPs
Operations
monitoring:
  - Real-time capacity visibility
  - Performance metrics
  - User experience tracking
  - Proactive alerting
response:
  - Runbooks for common issues
  - Escalation procedures
  - Vendor support contracts
  - Capacity addition process
Architecture
design_principles:
  - Split tunnel where possible
  - Cloud services direct
  - Regional distribution
  - Scalable infrastructure
future_direction:
  - Zero trust over VPN
  - Application-level access
  - Identity-based security
  - Cloud-native architecture
Key Takeaways
- VPNs sized for partial remote use couldn’t handle a fully remote workforce
- Split tunneling reduces VPN load significantly; requires endpoint protection
- Geographic distribution improves performance and redundancy
- Cloud VPN services generally scale more easily than on-prem hardware
- Monitor capacity proactively; don’t wait for complaints
- Plan for peak, not average; have headroom
- VPN is transitional; zero trust is the direction
- Application-level access beats network-level access
- Design systems assuming remote access from the start
The VPN stress test of 2020 revealed architectural weaknesses. Use this as an opportunity to build more resilient, scalable remote access, ideally moving toward zero trust rather than doubling down on network perimeters.