When engineers discuss hybrid cloud connectivity, two technologies dominate the conversation: VPN (Virtual Private Network) and dedicated private connections (AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect). These aren't competing technologies—they're complementary tools that solve different problems and often coexist in production architectures.
Understanding when and how to use each is fundamental to hybrid cloud design. A misconfigured VPN can introduce crippling latency; an oversized Direct Connect circuit can waste thousands monthly. The difference between a hybrid architecture that enables business agility and one that becomes a maintenance burden often comes down to these connectivity decisions.
This page provides the technical depth needed to design, implement, and operate both connectivity types at production scale.
By the end of this page, you will understand the protocols, architectures, and operational characteristics of VPN and private connection technologies. You'll learn how to design for performance, redundancy, and cost optimization—and critically, how to choose between them based on workload requirements.
A Virtual Private Network creates an encrypted tunnel through the public internet, making geographically separated networks appear as a single, unified private network. For hybrid cloud, VPN connects on-premises data centers to cloud Virtual Private Clouds (VPCs).
| Phase | Purpose | Key Parameters | Result |
|---|---|---|---|
| Phase 1 (IKE) | Establish secure channel for negotiation | Encryption algo, hash algo, DH group, lifetime, authentication method | IKE Security Association (SA) |
| Phase 2 (IPSec) | Negotiate parameters for data encryption | ESP cipher, hash, PFS group, proxy identities (subnets) | IPSec Security Association (SA) |
AWS Site-to-Site VPN supports IKEv2 with AES-256-GCM. Azure VPN Gateway also supports IKEv2. GCP Cloud VPN offers both Classic VPN (static routing) and HA VPN (IKEv2 with BGP). Always check provider documentation for supported parameters; mismatches cause tunnel establishment failures.
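On the on-premises side, these two phases map directly onto IPsec daemon configuration. Below is a minimal sketch assuming a strongSwan-based customer gateway; the addresses, subnets, and pre-shared key are placeholders, and the proposals mirror common AWS defaults:

```bash
#!/bin/bash
# Hypothetical strongSwan config for one VPN tunnel (placeholders throughout).
# Phase 1 (IKE) parameters map to the 'ike=' line; Phase 2 (IPSec) to 'esp='.
cat >> /etc/ipsec.conf <<'EOF'
conn aws-tunnel-1
    keyexchange=ikev2                   # Phase 1: IKEv2
    ike=aes256gcm16-prfsha256-modp2048  # Phase 1: cipher, PRF, DH group 14
    esp=aes256gcm16-modp2048            # Phase 2: ESP cipher + PFS group
    ikelifetime=28800s                  # Phase 1 SA lifetime (8h)
    lifetime=3600s                      # Phase 2 SA lifetime (1h)
    left=%defaultroute                  # on-prem public interface
    leftsubnet=10.0.0.0/16              # on-prem prefixes (proxy identity)
    right=203.0.113.10                  # tunnel outside address (placeholder)
    rightsubnet=172.16.0.0/16           # VPC CIDR (proxy identity)
    authby=psk
    auto=start
EOF

cat >> /etc/ipsec.secrets <<'EOF'
%any 203.0.113.10 : PSK "replace-with-tunnel-psk"
EOF

ipsec restart   # reload config and bring the tunnel up
```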
Cloud providers offer several VPN deployment patterns, each with different availability, performance, and complexity characteristics. Selecting the right pattern depends on your SLA requirements and operational maturity.
Active-Active vs Active-Passive:
VPN tunnels can operate in two modes:
Active-Active — Both tunnels carry traffic simultaneously using ECMP (Equal-Cost Multi-Path) routing. Doubles effective bandwidth but requires BGP and customer gateway support.
Active-Passive — Primary tunnel carries all traffic; secondary activates only on primary failure. Simpler but wastes standby capacity and introduces failover delay.
Production deployments should prefer Active-Active for both bandwidth and seamless failover.
```hcl
# AWS Site-to-Site VPN with High Availability
# Creates two VPN connections to two Customer Gateways (on-prem side)

resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "hybrid-vpn-gateway"
  }
}

# First Customer Gateway (on-prem router 1)
resource "aws_customer_gateway" "onprem_1" {
  bgp_asn    = 65000
  ip_address = var.onprem_router_1_public_ip
  type       = "ipsec.1"

  tags = {
    Name = "onprem-cgw-1"
  }
}

# Second Customer Gateway (on-prem router 2)
resource "aws_customer_gateway" "onprem_2" {
  bgp_asn    = 65000
  ip_address = var.onprem_router_2_public_ip
  type       = "ipsec.1"

  tags = {
    Name = "onprem-cgw-2"
  }
}

# VPN Connection 1 - uses BGP for dynamic routing
resource "aws_vpn_connection" "vpn_1" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.onprem_1.id
  type                = "ipsec.1"
  static_routes_only  = false # use BGP
  tunnel1_inside_cidr = "169.254.10.0/30"
  tunnel2_inside_cidr = "169.254.10.4/30"

  tags = {
    Name = "vpn-connection-1"
  }
}

# VPN Connection 2 - redundant path through second router
resource "aws_vpn_connection" "vpn_2" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.onprem_2.id
  type                = "ipsec.1"
  static_routes_only  = false
  tunnel1_inside_cidr = "169.254.11.0/30"
  tunnel2_inside_cidr = "169.254.11.4/30"

  tags = {
    Name = "vpn-connection-2"
  }
}
```

Cloud VPN throughput per tunnel is typically limited: AWS ~1.25 Gbps, Azure ~650 Mbps (VpnGw1) up to ~1.25 Gbps (VpnGw3), GCP ~3 Gbps. For higher bandwidth, use multiple tunnels with ECMP or consider Direct Connect. Never assume VPN can handle bulk data migrations without testing.
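Once applied, one way to confirm that all four tunnels (two per connection) establish is the AWS CLI; the query below pulls per-tunnel telemetry from `describe-vpn-connections`:

```bash
#!/bin/bash
# Check tunnel health after apply. Each VPN connection reports telemetry
# for both of its tunnels; Status should read UP once IKE/IPSec complete.
aws ec2 describe-vpn-connections \
  --query 'VpnConnections[].VgwTelemetry[].[OutsideIpAddress,Status,StatusMessage]' \
  --output table
```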
Private connections (Direct Connect, ExpressRoute, Cloud Interconnect) establish dedicated network links between on-premises infrastructure and cloud providers, bypassing the public internet entirely. This provides consistent latency, higher bandwidth, and enhanced security—at significantly higher cost and complexity.
| Feature | AWS Direct Connect | Azure ExpressRoute | GCP Cloud Interconnect |
|---|---|---|---|
| Port Speeds | 1 Gbps, 10 Gbps, 100 Gbps | 50 Mbps - 100 Gbps | 10 Gbps, 100 Gbps (Dedicated) |
| Locations | 100+ worldwide | 180+ globally | 150+ locations |
| BGP Required | Yes | Yes | Yes (Dedicated), Optional (Partner) |
| Layer 2 Extension | Via partner overlay | ExpressRoute Global Reach | Partner Interconnect |
| Encryption Option | MACsec at edge routers | ExpressRoute Direct with MACsec | MACsec on 100G |
| Multi-Cloud | Via Transit Gateway + peering | ExpressRoute Global Reach | Partner Interconnect |
| Pricing Model | Port-hour + data egress | Bandwidth tier + data egress | Port + egress (no per-GB on some) |
Connection Types:
Dedicated Connection — You lease a physical port directly from the cloud provider's cage in a colocation facility. Requires your equipment to be in or connected to that facility. Highest performance, most complex.
Hosted Connection / Partner Connection — An AWS Partner / Azure Partner provides a sub-interface on their existing aggregation port. Simpler provisioning, potentially lower bandwidth options, shared infrastructure.
Colocation Model — Your equipment is in the same facility as the cloud provider's edge routers. Direct cross-connect establishment. Lowest latency.
Last-Mile Provider — A telecom provides a circuit from your data center to the colocation facility. Adds latency and cost but enables connectivity from anywhere.
A single Direct Connect port provides no redundancy. Physical circuit failures, equipment issues, or colocation facility problems will cause complete connectivity loss. Production deployments must design for resilience.
```yaml
# AWS Direct Connect High Availability Reference Architecture
# Using AWS CloudFormation to configure resilient Direct Connect

AWSTemplateFormatVersion: '2010-09-09'
Description: Direct Connect Resilient Architecture

# Supplied externally: the VGW and the two physical DX connections
# are provisioned separately and passed in as parameters.
Parameters:
  ProductionVGW:
    Type: String
    Description: Virtual Private Gateway ID to associate
  PrimaryConnectionId:
    Type: String
    Description: Physical Direct Connect connection 1
  SecondaryConnectionId:
    Type: String
    Description: Physical Direct Connect connection 2

Resources:
  # Direct Connect Gateway - shared across connections
  DirectConnectGateway:
    Type: AWS::DirectConnect::DirectConnectGateway
    Properties:
      Name: production-dx-gateway
      AmazonSideAsn: 64512

  # Virtual Private Gateway association to VPC
  GatewayAssociation:
    Type: AWS::DirectConnect::VirtualPrivateGatewayAssociation
    Properties:
      DirectConnectGatewayId: !Ref DirectConnectGateway
      VirtualPrivateGatewayId: !Ref ProductionVGW

  # Primary Private VIF - Location 1
  PrimaryVIF:
    Type: AWS::DirectConnect::VirtualInterface
    Properties:
      ConnectionId: !Ref PrimaryConnectionId  # physical connection 1
      VirtualInterfaceName: primary-private-vif
      Vlan: 100
      Asn: 65000  # customer ASN
      DirectConnectGatewayId: !Ref DirectConnectGateway
      AddressFamily: ipv4
      AmazonAddress: 169.254.100.1/30
      CustomerAddress: 169.254.100.2/30

  # Secondary Private VIF - Location 2
  SecondaryVIF:
    Type: AWS::DirectConnect::VirtualInterface
    Properties:
      ConnectionId: !Ref SecondaryConnectionId  # physical connection 2
      VirtualInterfaceName: secondary-private-vif
      Vlan: 200
      Asn: 65000
      DirectConnectGatewayId: !Ref DirectConnectGateway
      AddressFamily: ipv4
      AmazonAddress: 169.254.200.1/30
      CustomerAddress: 169.254.200.2/30

Outputs:
  DXGatewayId:
    Value: !Ref DirectConnectGateway
    Description: Direct Connect Gateway for VIF attachments
```

A single Direct Connect circuit has no SLA. AWS explicitly states: 'For production workloads or workloads where availability is critical, we recommend configuring two connections at different Direct Connect locations.' Never deploy mission-critical workloads on single-path connectivity.
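After provisioning, it's worth confirming that both virtual interfaces are up and their BGP sessions have established; the AWS CLI exposes this via `describe-virtual-interfaces`:

```bash
#!/bin/bash
# Confirm both VIFs are 'available' and BGP peers report 'up'.
aws directconnect describe-virtual-interfaces \
  --query 'virtualInterfaces[].[virtualInterfaceName,virtualInterfaceState,bgpPeers[0].bgpStatus]' \
  --output table
```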
Border Gateway Protocol (BGP) is the routing protocol that makes hybrid connectivity work. Understanding BGP is essential for designing traffic paths, implementing failover, and troubleshooting connectivity issues.
| Scenario | Technique | Configuration |
|---|---|---|
| Prefer Direct Connect over VPN | LOCAL_PREF | Set higher LOCAL_PREF (e.g., 200) on Direct Connect learned routes |
| Prefer primary DC location | AS_PATH Prepending | Prepend ASN on secondary location advertisements |
| Load balance across paths | ECMP | Ensure identical AS_PATH length and metrics on all paths |
| Control AWS-side preference | MED | Set lower MED on preferred customer gateway |
| Limit route propagation | BGP Communities | Use AWS-specific communities to control regional advertisement |
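The first two rows of the table translate to router configuration like the following sketch, which assumes FRRouting on the customer edge; the ASNs, peer addresses, and route-map names are illustrative:

```bash
#!/bin/bash
# Hypothetical FRRouting config: routes learned over the Direct Connect
# peering get LOCAL_PREF 200, so they win over VPN-learned routes
# (default LOCAL_PREF is 100).
vtysh <<'EOF'
configure terminal
 route-map DX-IN permit 10
  set local-preference 200
 exit
 router bgp 65000
  neighbor 169.254.100.1 remote-as 64512
  address-family ipv4 unicast
   neighbor 169.254.100.1 route-map DX-IN in
  exit-address-family
 exit
 ! On the SECONDARY site's router, apply a prepend route-map outbound
 ! instead, so remote peers prefer the primary site:
 route-map PREPEND-OUT permit 10
  set as-path prepend 65000 65000
end
write memory
EOF
```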
Failover Behavior:
When a BGP session fails (tunnel down, interface failure), routes are withdrawn. The BGP Hold Timer (default 90s in many implementations, configurable lower) determines how long before routes are considered stale.
For faster failover:

- Lower the BGP keepalive and hold timers where the platform allows; AWS VPN supports BGP with configurable timers.
- Use BFD where available; Direct Connect supports BFD for rapid failure detection.
Bidirectional Forwarding Detection (BFD) can detect link failures in milliseconds, triggering BGP route withdrawal immediately. For production hybrid architectures requiring minimal failover time, enabling BFD on Direct Connect Virtual Interfaces is highly recommended.
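As a sketch, again assuming FRRouting on the customer edge, enabling BFD and tightening timers on a Direct Connect peering might look like this (timer values and the peer address are illustrative):

```bash
#!/bin/bash
# Hypothetical FRRouting snippet: tighten BGP timers and enable BFD so
# failures are detected in seconds or less rather than at hold-timer expiry.
vtysh <<'EOF'
configure terminal
 router bgp 65000
  ! keepalive 10s, hold 30s (vendor defaults are often 60/180)
  neighbor 169.254.100.1 timers 10 30
  ! withdraw routes as soon as BFD declares the session down
  neighbor 169.254.100.1 bfd
end
write memory
EOF
```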
Connectivity is only useful if it performs well. Hybrid architectures introduce latency, potential bandwidth constraints, and TCP optimization challenges that require careful attention.
| Connection Type | Typical Latency | Latency Variability | Suitable Workloads |
|---|---|---|---|
| VPN over Internet | 20-100ms+ | High (jitter, congestion) | Async operations, batch jobs, fault-tolerant apps |
| Direct Connect (same metro) | 1-5ms | Very Low | Transactional databases, real-time sync |
| Direct Connect (cross-region) | 10-50ms | Low | Cross-region replication, DR workloads |
| Direct Connect + VPN overlay | 5-15ms overhead | Low | Encrypted Direct Connect traffic |
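Before committing a workload to a path, measure it. A quick check with standard Linux tools; the target address is a placeholder for an instance across the hybrid boundary:

```bash
#!/bin/bash
# Measure RTT and jitter across the hybrid link; run against both the
# VPN and Direct Connect paths and compare.
TARGET=172.16.0.10

# Summary line reports rtt min/avg/max/mdev; 'mdev' approximates jitter
ping -c 100 -i 0.2 "$TARGET" | tail -2

# Per-hop loss and latency, if mtr is installed
mtr --report --report-cycles 50 "$TARGET"
```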
```bash
#!/bin/bash
# Linux TCP optimization for high-latency hybrid links
# Apply on application servers connecting across hybrid boundary

# Enable TCP SACK (Selective Acknowledgment)
sysctl -w net.ipv4.tcp_sack=1

# Enable TCP Window Scaling
sysctl -w net.ipv4.tcp_window_scaling=1

# Increase TCP buffer sizes for high BDP links
# BDP = Bandwidth * Delay (e.g., 100 Mbps * 50ms = 625KB)
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

# Enable TCP timestamps for PAWS (Protection Against Wrapped Sequences)
sysctl -w net.ipv4.tcp_timestamps=1

# Congestion control: consider BBR for WAN links
modprobe tcp_bbr
sysctl -w net.ipv4.tcp_congestion_control=bbr

# MTU path discovery
sysctl -w net.ipv4.ip_no_pmtu_disc=0

echo "TCP optimization applied for hybrid connectivity"
```

On high-latency links, small TCP windows cause underutilization regardless of available bandwidth. Calculate BDP: if you have 100 Mbps and 50ms RTT, you need 625KB in-flight data to saturate the link. Default TCP window sizes may be insufficient.
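To make that arithmetic concrete, a quick shell check using the same example numbers:

```bash
#!/bin/bash
# Bandwidth-Delay Product: bytes that must be in flight to keep the pipe full.
# BDP (bytes) = bandwidth (bits/s) / 8 * RTT (s)
BW_MBPS=100   # link bandwidth in Mbit/s
RTT_MS=50     # round-trip time in ms
echo "BDP: $(( BW_MBPS * 1000000 / 8 * RTT_MS / 1000 )) bytes"   # -> 625000
```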
Connectivity costs are often underestimated in hybrid cloud budgets. Understanding the pricing models and optimization strategies can result in significant savings.
VPN costs are comparatively simple: AWS Site-to-Site VPN, for example, charges roughly $0.05 per connection-hour (about $36/month) plus standard data transfer rates. Direct Connect pricing has more components:
| Cost Component | Pricing (Example) | Notes |
|---|---|---|
| Port Hour (1 Gbps) | ~$0.30/hour ($219/month) | Fixed cost regardless of utilization |
| Port Hour (10 Gbps) | ~$2.25/hour ($1,642/month) | Significant commitment |
| Data Transfer Out (same region) | ~$0.02/GB | Reduced vs internet egress (~$0.09/GB) |
| Data Transfer Out (cross-region) | ~$0.02/GB | Same as standard inter-region rates |
| Partner Port (Hosted) | Partner-dependent | Often lower for sub-1Gbps needs |
| Cross-Connect | $100-500/month | Colocation facility charges |
Direct Connect's reduced data transfer rates ($0.02/GB vs $0.09/GB) mean it becomes cost-effective at scale. For AWS, if you transfer >4TB/month, Direct Connect 1 Gbps often pays for itself through egress savings alone. Build a spreadsheet with your expected traffic patterns.
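A quick sanity check of that breakeven point, using the illustrative prices from the table above (swap in your region's actual rates):

```bash
#!/bin/bash
# Monthly GB at which DX egress savings cover the fixed port + cross-connect.
awk 'BEGIN {
  port  = 219    # 1 Gbps DX port, USD/month
  xconn = 300    # colocation cross-connect, USD/month (mid-range)
  inet  = 0.09   # internet egress, USD/GB
  dx    = 0.02   # DX egress, USD/GB
  printf "breakeven: %.0f GB/month\n", (port + xconn) / (inet - dx)
}'
# -> ~7414 GB/month; port-only (no cross-connect), ~3129 GB/month
```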
We've covered the technical depth of both connectivity options. Let's consolidate into a decision framework:
| Factor | Choose VPN | Choose Direct Connect |
|---|---|---|
| Budget | Tight; under $500/month | Can invest $1000+/month |
| Deployment Speed | Need connectivity in hours/days | Can wait weeks/months |
| Latency Requirement | Tolerant of 50-100ms+ variability | Requires consistent <10ms |
| Bandwidth | Under 500 Mbps sustained | Gbps-level throughput needed |
| Data Volume | Low-medium (< 5 TB/month) | High (many TBs/month) |
| Compliance | Standard security OK | Data must not traverse internet |
| Redundancy | Acceptable periodic downtime | 99.99% availability required |
What's next:
With connectivity established, the next challenge is data. How do you synchronize databases across hybrid boundaries? How do you handle data sovereignty requirements? The next page explores Hybrid Data Strategies—replication patterns, caching, and consistency models for data that spans on-prem and cloud.
You now have comprehensive knowledge of VPN and Direct Connect technologies. This technical foundation enables you to design, implement, and operate hybrid connectivity that meets your organization's performance, security, and budget requirements.