Every time you access a website, stream video, or send an email across the internet, your packets traverse multiple autonomous systems—independent networks operated by different organizations worldwide. The protocol that enables these 70,000+ networks to cooperate and exchange routing information is the Border Gateway Protocol (BGP).
BGP is the de facto standard for inter-domain routing, implementing path vector concepts to enable the global internet to function as a coherent whole despite being composed of independently managed networks. It's often called "the protocol that runs the internet," and for good reason: without BGP, the internet as we know it would not exist.
This page explores how BGP implements path vector routing in practice, from session establishment to route advertisement to the complex decision-making that determines how packets flow across the global network.
By the end of this page, you will understand BGP's role as the internet's exterior gateway protocol, how BGP sessions are established and maintained, the structure of BGP messages and UPDATE processing, the critical distinction between eBGP and iBGP, how BGP implements the path vector concepts covered in previous pages, and BGP's operational characteristics that make it suitable for internet-scale routing.
The Border Gateway Protocol (BGP) is a path vector routing protocol designed for inter-domain routing: exchanging routing information between autonomous systems (ASes), each identified by an autonomous system number (ASN). BGP is defined in RFC 4271 and is currently at version 4 (BGP-4).
BGP's Core Purpose:
BGP enables autonomous systems to:
Why BGP for Inter-Domain Routing?
The internet's architecture demands specific characteristics that BGP provides:
| Requirement | Challenge | BGP Solution |
|---|---|---|
| Scale | 70,000+ ASes, 950,000+ prefixes | Incremental updates, abstraction via AS path |
| Autonomy | Each AS operates independently | Policy-based routing, local administrative control |
| Stability | Global impact of routing mistakes | Deliberately rate-limited updates, route flap damping, conservative defaults |
| Trust | No central authority, competitive operators | Path visibility, cryptographic extensions (RPKI/BGPsec) |
| Flexibility | Diverse business relationships | Rich attribute set, community signaling |
| Loop Prevention | Complex multi-path topologies | AS path loop detection (guaranteed) |
BGP vs Interior Gateway Protocols:
It's essential to understand BGP's position relative to IGPs (OSPF, IS-IS, EIGRP):
| Aspect | IGP (OSPF, IS-IS) | EGP (BGP) |
|---|---|---|
| Scope | Within one organization | Between organizations |
| Trust Level | High (single administration) | Low (competing entities) |
| Optimization Goal | Fastest convergence | Stability, policy compliance |
| Metric | Cost, bandwidth, delay | AS path, policy attributes |
| Information Sharing | Full topology (link state) | Path + attributes |
| Typical Scale | Hundreds of routers | Millions of routes |
| Update Frequency | Sub-second | Designed for stability |
BGP version 4 introduced CIDR support and is the only version in widespread use today. Its current specification is RFC 4271 (2006), which obsoleted RFC 1771 (1995). Extensions for multiprotocol support (RFC 4760), 4-byte ASNs (RFC 6793), and security (RPKI, BGPsec) have evolved the protocol while maintaining backward compatibility.
BGP operates fundamentally differently from IGPs. Rather than discovering neighbors automatically, BGP requires explicit configuration of peer relationships. Sessions are established over TCP port 179, providing reliable, ordered delivery of routing information.
The BGP Session Lifecycle:
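1. Configuration Phase: Operators configure each BGP neighbor explicitly, specifying the peer's IP address and AS number.
2. TCP Connection: BGP initiates a TCP connection to port 179. Either peer can initiate, but configuration must permit the connection. The TCP session provides reliable, ordered delivery, so BGP itself needs no retransmission or sequencing logic.
3. BGP State Machine: Once the TCP connection is up, BGP follows a finite state machine, summarized in the table below.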
BGP States Explained:
| State | Description | Transitions |
|---|---|---|
| Idle | Initial state; no connection | Start → Connect |
| Connect | Attempting TCP connection | Success → OpenSent, Fail → Active |
| Active | Waiting for peer connection (passive) | Success → OpenSent, Retry → Connect |
| OpenSent | OPEN message sent, awaiting reply | OPEN received → OpenConfirm |
| OpenConfirm | OPEN exchange complete, awaiting KEEPALIVE | KEEPALIVE → Established |
| Established | Fully operational, exchanging routes | Error/Timeout → Idle |
Only in the Established state does BGP exchange routing information. Any error (invalid message, timeout, session reset) returns the session to Idle.
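The state table above can be sketched as a small transition map. This is a simplified illustration, not the full RFC 4271 state machine: the event names (`start`, `tcp_success`, and so on) are hypothetical labels chosen for this example, and any event not listed for a state is treated as an error that returns the session to Idle.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECT = auto()
    ACTIVE = auto()
    OPEN_SENT = auto()
    OPEN_CONFIRM = auto()
    ESTABLISHED = auto()


# (current state, event) -> next state.
# Event names are illustrative labels, not identifiers from the RFC.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_success"): State.OPEN_SENT,
    (State.CONNECT, "tcp_fail"): State.ACTIVE,
    (State.ACTIVE, "tcp_success"): State.OPEN_SENT,
    (State.ACTIVE, "retry_timer"): State.CONNECT,
    (State.OPEN_SENT, "open_received"): State.OPEN_CONFIRM,
    (State.OPEN_CONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "update_received"): State.ESTABLISHED,
}


def next_state(state, event):
    """Return the next state; unlisted (state, event) pairs fall back to Idle."""
    return TRANSITIONS.get((state, event), State.IDLE)


def run(events, state=State.IDLE):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = next_state(state, event)
    return state
```

Feeding `["start", "tcp_success", "open_received", "keepalive_received"]` into `run` walks the happy path to `State.ESTABLISHED`; appending an unexpected event such as `"hold_timer_expired"` drops the session back to `State.IDLE`.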
Timers Governing Sessions:
Hold Timer: 90 seconds (value suggested by RFC 4271; vendor defaults vary)
  Negotiated between peers (the lower of the two proposed values is used)
  If no message is received within the hold time → session reset
Keepalive: 30 seconds (typically 1/3 of the hold timer)
  Sent periodically to maintain the session
  Indicates "I'm alive, no routing changes"
Connect Retry: 120 seconds (value suggested by RFC 4271)
  Time between connection attempts after failure
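As a rough illustration of the timer rules above, the following sketch derives the effective hold and keepalive timers from the two values proposed in the OPEN messages. The function name `negotiate_timers` is hypothetical; the rule that a hold time must be either 0 or at least 3 seconds comes from RFC 4271.

```python
def negotiate_timers(local_hold, peer_hold):
    """Return (hold_time, keepalive_interval) in seconds.

    The session uses the smaller of the two proposed hold times.
    A hold time of 0 disables keepalives entirely.
    """
    hold = min(local_hold, peer_hold)
    if hold == 0:
        return 0, 0
    if hold < 3:
        # RFC 4271 requires the hold time to be 0 or at least 3 seconds.
        raise ValueError("hold time must be 0 or >= 3 seconds")
    return hold, hold // 3
```

For example, a router proposing 90 seconds that peers with one proposing 180 seconds ends up with a 90-second hold timer and a 30-second keepalive interval.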
BGP's default 90-second hold timer means peer failure can take up to 90 seconds to detect. Bidirectional Forwarding Detection (BFD) provides sub-second failure detection independent of BGP timers. When BFD detects failure, it immediately notifies BGP to tear down the session, enabling rapid failover without modifying BGP timers.
BGP uses four message types to establish sessions and exchange routing information. Each message follows a common header format with a 16-byte marker, 2-byte length, and 1-byte type indicator.
Message Type Summary:
| Type | Code | Purpose | When Sent |
|---|---|---|---|
| OPEN | 1 | Establish session, negotiate capabilities | After TCP connection established |
| UPDATE | 2 | Advertise or withdraw routes | When routing information changes |
| NOTIFICATION | 3 | Report error, close session | Upon error detection |
| KEEPALIVE | 4 | Maintain session, confirm OPEN | Periodically (every 30s default) |
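The common header is small enough to sketch with Python's `struct` module. This is a minimal example, assuming the type codes assigned in RFC 4271 (1 = OPEN, 2 = UPDATE, 3 = NOTIFICATION, 4 = KEEPALIVE) and the 4096-byte maximum message size from the same RFC; the helper names `build_header` and `parse_header` are invented for illustration.

```python
import struct

MARKER = b"\xff" * 16
# Network byte order: 16-byte marker, 2-byte length, 1-byte type = 19 bytes.
HEADER = struct.Struct("!16sHB")
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def build_header(msg_type, body_len=0):
    """Build a header for a message whose body is body_len bytes long."""
    return HEADER.pack(MARKER, HEADER.size + body_len, msg_type)


def parse_header(data):
    """Validate a header and return (type name, total message length)."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("bad marker")
    if not HEADER.size <= length <= 4096:
        raise ValueError("bad length")
    if msg_type not in MESSAGE_TYPES:
        raise ValueError("bad type")
    return MESSAGE_TYPES[msg_type], length
```

A KEEPALIVE is just `build_header(4)`: 19 bytes with no body, which `parse_header` reports as `("KEEPALIVE", 19)`.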
OPEN Message:
Sent immediately after TCP connection. Contains:
OPEN Message Fields:
Version: BGP version (4)
My AS: Sender's Autonomous System Number
Hold Time: Maximum seconds between messages (0 = no keepalives)
BGP ID: Router ID (typically highest loopback IP)
Opt Params: Capability negotiation (multiprotocol, 4-byte ASN, etc.)
The OPEN exchange allows peers to negotiate capabilities. If capabilities don't match (e.g., one requires features the other doesn't support), the session fails.
UPDATE Message:
The workhorse of BGP—carries routing information:
UPDATE Message Fields:
Withdrawn Routes: List of prefixes to remove from table
Path Attributes: Attributes applying to announced prefixes
- ORIGIN, AS_PATH, NEXT_HOP (mandatory)
- LOCAL_PREF, MED, COMMUNITY, etc.
NLRI: Network Layer Reachability Information (announced prefixes)
A single UPDATE can withdraw some routes and announce others. This efficiency is crucial at internet scale.
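To make the field layout concrete, here is a hedged sketch that encodes and decodes a simple IPv4 UPDATE. It makes several simplifying assumptions that a real implementation would not: 2-byte ASNs in AS_PATH, a single AS_SEQUENCE segment, no extended-length attributes, and only the four attributes ORIGIN, AS_PATH, NEXT_HOP, and LOCAL_PREF. All function names are hypothetical.

```python
import ipaddress
import struct


def encode_prefix(prefix):
    """Encode a prefix as <length in bits><minimum number of address bytes>."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def decode_prefixes(data):
    """Decode a run of IPv4 prefixes encoded as above."""
    prefixes, i = [], 0
    while i < len(data):
        plen = data[i]
        nbytes = (plen + 7) // 8
        addr = data[i + 1:i + 1 + nbytes] + bytes(4 - nbytes)
        prefixes.append(f"{ipaddress.IPv4Address(addr)}/{plen}")
        i += 1 + nbytes
    return prefixes


def encode_attr(flags, type_code, value):
    # Assumes the value is shorter than 256 bytes (no extended-length flag).
    return bytes([flags, type_code, len(value)]) + value


def build_update(as_path, next_hop, local_pref, announce, withdraw=()):
    # AS_SEQUENCE segment: type 2, ASN count, then 2-byte ASNs (assumption).
    segment = bytes([2, len(as_path)]) + b"".join(
        struct.pack("!H", asn) for asn in as_path)
    attrs = (
        encode_attr(0x40, 1, b"\x00")                                # ORIGIN = IGP
        + encode_attr(0x40, 2, segment)                              # AS_PATH
        + encode_attr(0x40, 3, ipaddress.ip_address(next_hop).packed)  # NEXT_HOP
        + encode_attr(0x40, 5, struct.pack("!I", local_pref))        # LOCAL_PREF
    )
    withdrawn = b"".join(encode_prefix(p) for p in withdraw)
    nlri = b"".join(encode_prefix(p) for p in announce)
    body = (struct.pack("!H", len(withdrawn)) + withdrawn
            + struct.pack("!H", len(attrs)) + attrs + nlri)
    return b"\xff" * 16 + struct.pack("!HB", 19 + len(body), 2) + body


def parse_update(msg):
    length, msg_type = struct.unpack_from("!HB", msg, 16)
    if msg_type != 2:
        raise ValueError("not an UPDATE message")
    body = msg[19:length]
    wlen = struct.unpack_from("!H", body, 0)[0]
    alen = struct.unpack_from("!H", body, 2 + wlen)[0]
    start = 4 + wlen
    attrs, i = {}, start
    while i < start + alen:
        type_code, vlen = body[i + 1], body[i + 2]
        attrs[type_code] = body[i + 3:i + 3 + vlen]
        i += 3 + vlen
    return {
        "withdrawn": decode_prefixes(body[2:2 + wlen]),
        "attrs": attrs,  # attribute type code -> raw value bytes
        "announced": decode_prefixes(body[start + alen:]),
    }
```

Calling `build_update([65001, 65002], "192.168.1.1", 100, ["192.0.2.0/24"])` and passing the result to `parse_update` round-trips the announced prefix, and passing `withdraw=[...]` as well shows how one message can carry withdrawals and announcements together.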
BGP UPDATE Message (Example)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Header:
  Marker: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
  Length: 00 36 (54 bytes)
  Type:   02 (UPDATE)

Withdrawn Routes Length: 00 00 (0 bytes - no withdrawals)

Total Path Attribute Length: 00 1B (27 bytes)

Path Attributes:
  ORIGIN (Type 1):
    Flags: 40 (Transitive, Well-known)
    Type:  01
    Len:   01
    Value: 00 (IGP)

  AS_PATH (Type 2):
    Flags: 40 (Transitive, Well-known)
    Type:  02
    Len:   06
    Value: 02 02 FD E9 FD EA (AS_SEQUENCE of 2 ASNs: 65001 65002, 2-byte encoding)

  NEXT_HOP (Type 3):
    Flags: 40 (Transitive, Well-known)
    Type:  03
    Len:   04
    Value: C0 A8 01 01 (192.168.1.1)

  LOCAL_PREF (Type 5):
    Flags: 40
    Type:  05
    Len:   04
    Value: 00 00 00 64 (100)

NLRI (Announced Prefixes):
  18 C0 00 02 (192.0.2.0/24)

KEEPALIVE Message:
Simplest message—just the 19-byte header with type 4. Sent periodically to indicate the session is healthy without routing changes.
NOTIFICATION Message:
Sent when an error occurs. Contains error code and subcode:
| Error Code | Meaning | Common Subcodes |
|---|---|---|
| 1 | Message Header Error | Bad length, bad type |
| 2 | OPEN Message Error | Bad AS, bad BGP ID, bad capability |
| 3 | UPDATE Message Error | Malformed attribute, invalid NLRI |
| 4 | Hold Timer Expired | No messages within hold time |
| 5 | Finite State Machine | Unexpected message for current state |
| 6 | Cease | Administrative shutdown, resource limit |
After sending a NOTIFICATION, the session is terminated.
Any NOTIFICATION message immediately terminates the BGP session. There is no recovery mechanism within the message exchange—the only option is to re-establish the session from scratch. This conservative approach ensures that error conditions don't propagate bad routing information.
BGP operates in two modes with fundamentally different behaviors: External BGP (eBGP) between autonomous systems and Internal BGP (iBGP) within an autonomous system. Understanding these differences is essential for proper BGP deployment.
eBGP (External BGP):
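Connects routers in different autonomous systems. When advertising a route over eBGP, a router prepends its own ASN to the AS path and, by default, sets the next-hop to itself. iBGP sessions, which connect routers inside the same AS, do neither.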
Why iBGP Doesn't Modify AS Path:
If iBGP modified the AS path, internal routers would see artificially lengthened paths:
Incorrect (if iBGP prepended):
  Edge router learns:    192.0.2.0/24 via AS_PATH 65002
  Advertises over iBGP:  192.0.2.0/24 via AS_PATH 65001 65002
  Internal router sees:  Path length 2 for a destination only one AS away!

Correct (iBGP preserves):
  Edge router learns:    192.0.2.0/24 via AS_PATH 65002
  Advertises over iBGP:  192.0.2.0/24 via AS_PATH 65002
  Internal router sees:  Path length 1 (accurate inter-domain path)
The AS path represents the inter-domain path, not internal hops. iBGP routers are within the same AS and don't add to the AS path length.
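The prepend rule and the AS path loop check can be sketched in a few lines. This is an illustrative model only: AS paths are plain Python lists, the session type is passed as the string `"ebgp"` or `"ibgp"`, and the function names are hypothetical.

```python
def advertise(as_path, local_as, session):
    """Return the AS path as it would be sent over the given session type."""
    if session == "ebgp":
        # Crossing an AS boundary: prepend the local ASN.
        return [local_as] + list(as_path)
    if session == "ibgp":
        # Staying inside the AS: the path is passed on unchanged.
        return list(as_path)
    raise ValueError("session must be 'ebgp' or 'ibgp'")


def accept(as_path, local_as):
    """AS path loop detection: reject routes that already contain our ASN."""
    return local_as not in as_path
```

With these helpers, `advertise([65002], 65001, "ibgp")` keeps the path at `[65002]`, while the same call with `"ebgp"` yields `[65001, 65002]`; a router in AS 65001 receiving that second path back would reject it via `accept`.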
The iBGP Full Mesh Requirement:
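iBGP has a split-horizon rule: routes learned via iBGP cannot be advertised to other iBGP peers. This prevents iBGP loops, but it means every iBGP router must peer directly with every other iBGP router in the AS: a full mesh of n(n-1)/2 sessions for n routers. The key differences between eBGP and iBGP are summarized below: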
| Behavior | eBGP | iBGP |
|---|---|---|
| AS Path | Prepends local ASN | Unchanged |
| Next-Hop | Changed to advertising router | Unchanged (by default) |
| Advertisement Rule | Based on export policy | Split-horizon (no iBGP→iBGP) |
| Loop Prevention | AS path check | Full mesh / route reflectors |
| Physical Requirement | Usually direct connection | Any reachability via IGP |
| Session Config | ebgp-multihop if indirect | update-source for loopback |
Since iBGP doesn't change the next-hop by default, interior routers may receive routes with next-hops they can't reach (external peer IPs). Configure 'neighbor x.x.x.x next-hop-self' toward iBGP peers on the eBGP-facing routers so that they rewrite the next-hop to their own address (normally the loopback used as the iBGP session source), which iBGP peers can reach via the IGP.
The iBGP full mesh requirement creates scalability challenges: 100 routers require 4,950 sessions. Two mechanisms address this: Route Reflectors and Confederations.
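The session counts follow directly from the n(n-1)/2 formula. The sketch below also estimates the count for a route reflector design under one assumed topology: every client peers with every reflector, and the reflectors are fully meshed among themselves. Both function names are hypothetical.

```python
def full_mesh_sessions(n):
    """Number of iBGP sessions needed to fully mesh n routers."""
    return n * (n - 1) // 2


def route_reflector_sessions(clients, reflectors=1):
    """Sessions when each client peers with every RR and the RRs are
    fully meshed with each other (an assumed, commonly used layout)."""
    return clients * reflectors + full_mesh_sessions(reflectors)
```

For 100 routers, `full_mesh_sessions(100)` gives the 4,950 sessions quoted above, whereas dedicating two of those routers as redundant reflectors for the other 98 needs `route_reflector_sessions(98, 2)` = 197 sessions.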
Route Reflectors (RR):
Route reflectors break the iBGP split-horizon rule in a controlled manner:
Traditional Full Mesh:        With Route Reflector:

R1 ──── R2                           R1
  │╲  ╱│                             │
  │ ╲╱ │                             │
  │ ╱╲ │                             RR (Route Reflector)
  │╱  ╲│                            ╱│╲
R3 ──── R4                        R2 R3 R4

6 sessions                    4 sessions (one per client)
How Route Reflection Works:
! Route Reflector Configuration
router bgp 65001
 bgp cluster-id 1.1.1.1
 !
 ! Client peers - RR will reflect routes between them
 neighbor 10.0.0.1 remote-as 65001
 neighbor 10.0.0.1 route-reflector-client
 neighbor 10.0.0.2 remote-as 65001
 neighbor 10.0.0.2 route-reflector-client
 neighbor 10.0.0.3 remote-as 65001
 neighbor 10.0.0.3 route-reflector-client
 !
 ! Non-client iBGP peer (if any)
 neighbor 10.0.0.10 remote-as 65001
 ! (no route-reflector-client = standard iBGP rules)

! Route Reflection Rules:
! From client:     Reflect to all clients + non-clients
! From non-client: Reflect to clients only
! From eBGP:       Reflect to all clients + non-clients

RR Loop Prevention:
Route reflectors use two attributes to prevent loops:
| Attribute | Purpose | Check |
|---|---|---|
| ORIGINATOR_ID | Identifies first advertiser | If ORIGINATOR_ID = local router ID, discard |
| CLUSTER_LIST | List of RR clusters traversed | If CLUSTER_LIST contains local cluster ID, discard |
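The reflection rules and the two loop-prevention checks can be modeled roughly as follows. This is a simplified sketch: the `Route` class and both functions are hypothetical, router and cluster IDs are plain strings, and the attribute handling (set ORIGINATOR_ID if absent, prepend the local cluster ID to CLUSTER_LIST) follows the behavior described in RFC 4456.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    originator_id: Optional[str] = None  # ORIGINATOR_ID, if already set
    cluster_list: List[str] = field(default_factory=list)  # CLUSTER_LIST


def reflect(route, source_kind, source_router_id, cluster_id):
    """Return (route to send, set of iBGP peer kinds that should receive it).

    source_kind is 'client', 'non-client', or 'ebgp'.
    """
    if source_kind in ("client", "ebgp"):
        targets = {"client", "non-client"}
    elif source_kind == "non-client":
        targets = {"client"}
    else:
        raise ValueError("unknown source kind")
    if source_kind == "ebgp":
        # Ordinary advertisement of an eBGP route: no RR attributes added.
        return route, targets
    reflected = Route(
        prefix=route.prefix,
        originator_id=route.originator_id or source_router_id,
        cluster_list=[cluster_id] + route.cluster_list,
    )
    return reflected, targets


def should_accept(route, router_id, cluster_id):
    """Apply the ORIGINATOR_ID and CLUSTER_LIST checks from the table above."""
    if route.originator_id == router_id:
        return False
    if cluster_id in route.cluster_list:
        return False
    return True
```

A route reflected from client `10.0.0.1` by cluster `1.1.1.1` carries both values, so it is discarded if it ever returns to that client or to any reflector in the same cluster.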
Confederations:
An alternative to RRs: divide the AS into sub-ASes using private ASNs:
External View:
  AS 65001

Internal View:
┌───────────────────────────┐
│         AS 65001          │
│  ┌─────────┐ ┌─────────┐  │
│  │Sub-AS   │ │Sub-AS   │  │
│  │ 65500   │─│ 65501   │  │
│  └─────────┘ └─────────┘  │
│         │       │         │
│        ┌─────────┐        │
│        │Sub-AS   │        │
│        │ 65502   │        │
│        └─────────┘        │
└───────────────────────────┘
Route reflectors are far more common than confederations due to simpler operation. Confederations provide more natural policy boundaries but add complexity. Most large networks use hierarchical RR designs (RRs in multiple regions, potentially reflecting to each other) rather than confederations.
Understanding BGP theory is essential, but production deployment requires additional considerations around security, stability, and operational practices.
The Global BGP Routing Table:
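As of 2024, the global BGP table contains approximately 950,000 prefixes announced by more than 70,000 autonomous systems.
Each BGP router in the default-free zone (DFZ) must store and process this entire table. This requires substantial memory and processing capacity, along with the protective practices below: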
| Practice | Purpose | Implementation |
|---|---|---|
| Maximum-Prefix Limits | Prevent table explosion from leaks | Set per-neighbor limits with warning thresholds |
| Prefix Filtering | Block bogons, too-specifics, own space | Maintain updated filter lists from IRR/RPKI |
| MD5/TCP-AO Authentication | Prevent session hijacking | Configure matching passwords on both peers |
| GTSM (TTL Security) | Prevent distant attack attempts | Require TTL=255 for received packets |
| BFD | Fast failure detection | Sub-second failover independent of hold timer |
| Graceful Restart | Survive control plane failures | Maintain forwarding during short outages |
Common BGP Deployment Patterns:
Pattern 1: Multi-homed Enterprise

          ┌────────────┐
          │ Enterprise │
          │  AS 65001  │
          └─────┬──────┘
                │ iBGP
     ┌──────────┴──────────┐
     │                     │
┌────▼────┐           ┌────▼────┐
│ Router  │           │ Router  │
│   R1    │           │   R2    │
└────┬────┘           └────┬────┘
     │eBGP                 │eBGP
┌────▼────┐           ┌────▼────┐
│  ISP A  │           │  ISP B  │
│AS 65010 │           │AS 65020 │
└─────────┘           └─────────┘

Pattern 2: Transit Provider

  Upstream Tier 1 Providers
        │           │
        │eBGP       │eBGP
┌───────▼───────────▼───────┐
│      Transit Network      │
│         AS 65100          │
│  ┌─RR──┐    ┌─RR──┐       │
│  │  1  │────│  2  │       │
│  └──┬──┘    └──┬──┘       │
│     │iBGP      │iBGP      │
│  ┌──┴──────────┴──┐       │
│  │  Edge Routers  │       │
│  └──┬──────────┬──┘       │
└─────┼──────────┼──────────┘
      │eBGP      │eBGP
  Customer A  Customer B
! Production eBGP Session Configuration
router bgp 65001
 bgp log-neighbor-changes
 no bgp enforce-first-as
 bgp bestpath as-path multipath-relax
 !
 ! Upstream transit provider
 neighbor 192.0.2.1 remote-as 65010
 neighbor 192.0.2.1 description UPSTREAM-ISP-A
 neighbor 192.0.2.1 password SECRET-MD5-KEY
 neighbor 192.0.2.1 ttl-security hops 1
 ! Size the limit above the expected feed; a full table may exceed this example value
 neighbor 192.0.2.1 maximum-prefix 900000 90 restart 15
 neighbor 192.0.2.1 route-map UPSTREAM-IN in
 neighbor 192.0.2.1 route-map UPSTREAM-OUT out
 neighbor 192.0.2.1 fall-over bfd
 !
 ! Graceful restart configuration
 bgp graceful-restart
 bgp graceful-restart restart-time 120
 bgp graceful-restart stalepath-time 360
!
! Security filters: deny bogons, then permit everything else up to /24
ip prefix-list BOGON-V4 seq 5 deny 0.0.0.0/8 le 32
ip prefix-list BOGON-V4 seq 10 deny 10.0.0.0/8 le 32
ip prefix-list BOGON-V4 seq 15 deny 127.0.0.0/8 le 32
ip prefix-list BOGON-V4 seq 20 deny 169.254.0.0/16 le 32
ip prefix-list BOGON-V4 seq 25 deny 172.16.0.0/12 le 32
ip prefix-list BOGON-V4 seq 30 deny 192.168.0.0/16 le 32
ip prefix-list BOGON-V4 seq 35 deny 224.0.0.0/4 le 32
ip prefix-list BOGON-V4 seq 100 permit 0.0.0.0/0 le 24
!
! Apply to import policy. A route-map "match" succeeds only for prefixes the
! prefix-list permits, so this clause must be "permit"; bogons and prefixes
! longer than /24 fall through to the route-map's implicit deny.
route-map UPSTREAM-IN permit 10
 match ip address prefix-list BOGON-V4

A single BGP misconfiguration can cause global routing problems. Always use prefix filters, maximum-prefix limits, and verify configurations before deploying. Test changes during maintenance windows and have rollback procedures ready. Consider using BGP looking glasses to verify how your announcements appear from external perspectives.
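The intent of the inbound filter (reject bogon space, reject anything more specific than /24, accept the rest) can be expressed as a short sketch with Python's standard `ipaddress` module. The bogon list mirrors the prefix-list entries in the configuration; the function name `permit_inbound` is hypothetical, and the sketch covers IPv4 only.

```python
import ipaddress

# Same ranges as the BOGON-V4 prefix-list deny entries.
BOGONS = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/4",
)]
MAX_PREFIX_LEN = 24  # mirrors "permit 0.0.0.0/0 le 24"


def permit_inbound(prefix):
    """Return True if an announced IPv4 prefix should be accepted."""
    net = ipaddress.ip_network(prefix)
    # "deny X le 32" matches X itself and every more-specific prefix inside it.
    if any(net.subnet_of(bogon) for bogon in BOGONS):
        return False
    return net.prefixlen <= MAX_PREFIX_LEN
```

Here `permit_inbound("8.8.8.0/24")` is accepted, while `"10.1.0.0/16"` (private space) and `"8.8.8.128/25"` (too specific) are both rejected.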
BGP has evolved significantly since its initial specification. Modern deployments leverage numerous extensions that enhance functionality, security, and flexibility.
Multiprotocol BGP (MP-BGP):
RFC 4760 extended BGP to carry routing information for multiple address families:
| AFI | SAFI | Description |
|---|---|---|
| 1 | 1 | IPv4 Unicast |
| 1 | 2 | IPv4 Multicast |
| 1 | 4 | IPv4 MPLS Labels |
| 1 | 128 | IPv4 MPLS VPN |
| 2 | 1 | IPv6 Unicast |
| 2 | 2 | IPv6 Multicast |
| 2 | 4 | IPv6 MPLS Labels |
| 2 | 128 | IPv6 MPLS VPN |
| 25 | 70 | EVPN |
MP-BGP is foundational for MPLS VPNs, IPv6 deployment, and modern data center fabrics.
4-Byte AS Numbers:
The original 2-byte ASN space (0-65535) was exhausted. RFC 6793 (which obsoleted the earlier RFC 4893) defines 4-byte ASNs:
2-byte range: 0 - 65535
4-byte range: 0 - 4,294,967,295
Notation:
asdot: 3.15 (old-style, rarely used)
asplain: 196623 (recommended)
Private ranges:
2-byte: 64512 - 65534
4-byte: 4200000000 - 4294967294
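Converting between the two notations is simple arithmetic: the asdot form `high.low` equals `high * 65536 + low` in asplain. The sketch below assumes the asdot convention in which ASNs below 65536 are written as plain integers; the function names are hypothetical.

```python
def asdot_to_asplain(text):
    """Convert '3.15' style notation (or a plain number) to an integer ASN."""
    if "." not in text:
        return int(text)
    high, low = (int(part) for part in text.split("."))
    if not (0 <= high <= 65535 and 0 <= low <= 65535):
        raise ValueError("each asdot component must fit in 16 bits")
    return high * 65536 + low


def asplain_to_asdot(asn):
    """Convert an integer ASN to asdot; 2-byte ASNs stay plain integers."""
    if not 0 <= asn <= 4294967295:
        raise ValueError("ASN out of 4-byte range")
    high, low = divmod(asn, 65536)
    return f"{high}.{low}" if high else str(low)


def is_private(asn):
    """True if the ASN falls in one of the private ranges listed above."""
    return 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294
```

This reproduces the example above: `asdot_to_asplain("3.15")` returns 196623, and `asplain_to_asdot(196623)` returns `"3.15"`.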
RPKI (Resource Public Key Infrastructure):
RPKI provides cryptographic verification of route origins:
RPKI Validation:
Prefix: 192.0.2.0/24
Origin AS: 65001
ROA exists for AS 65001 → Valid
Prefix: 192.0.2.0/24
Origin AS: 65999 (wrong AS)
ROA exists for AS 65001 → Invalid → DROP
RPKI adoption is accelerating, with major networks implementing "reject invalid" policies.
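Route origin validation can be sketched as a lookup against a list of ROAs. This is a minimal model of the procedure in RFC 6811, not a production validator: the `ROA` class and `validate` function are hypothetical, ROAs are held in a plain list, and the `max_length` field (the longest prefix length the ROA authorizes) is part of real ROAs even though the example above does not show it.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    prefix: str      # authorized address block
    max_length: int  # longest prefix length the origin may announce
    origin_as: int   # AS authorized to originate the block


def validate(prefix, origin_as, roas):
    """Return 'valid', 'invalid', or 'not-found' for an announcement."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue  # this ROA does not cover the announced prefix
        covered = True
        if roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched none: invalid.
    return "invalid" if covered else "not-found"
```

With a single ROA for 192.0.2.0/24 and AS 65001, the announcement from AS 65001 is valid and the one from AS 65999 is invalid, matching the example above; a prefix with no covering ROA is reported as not-found, and a "reject invalid" policy leaves it untouched.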
| Extension | RFC | Purpose | Status |
|---|---|---|---|
| MP-BGP | 4760 | Multi-address-family support | Universal |
| 4-Byte ASN | 6793 | Extended AS number space | Universal |
| Add-Path | 7911 | Advertise multiple paths per prefix | Growing adoption |
| RPKI/ROA | 6480, 6811 | Route origin validation | Major providers |
| BGPsec | 8205 | Full path validation | Limited deployment |
| Large Communities | 8092 | 96-bit community tags | Growing adoption |
| Flowspec | 8955 (obsoletes 5575) | Distribute DDoS mitigation rules | ISPs/Security |
Traditional BGP advertises only the best path. Add-Path (RFC 7911) allows advertising multiple paths, enabling faster convergence when the best path fails—receivers already know alternatives. This is particularly valuable with route reflectors, where the RR's best path might not be optimal for all clients.
BGP is the practical embodiment of path vector routing, implementing the concepts of AS path information, loop prevention, and policy routing at global scale. Understanding BGP operations—from session establishment to route selection to modern extensions—is essential for anyone working with internet routing.
Key Concepts Mastered:
What's Next:
Having explored path vector concepts and BGP implementation, the final page provides a comprehensive comparison with other routing paradigms. We'll examine how path vector, distance vector, and link state approaches differ, when each is appropriate, and how they work together in modern networks.
You now understand how BGP implements path vector routing, the mechanics of BGP sessions and message processing, the distinction between eBGP and iBGP, scaling mechanisms, production deployment practices, and modern protocol extensions.