When you design a client-server system, you draw boxes and arrows: clients connect to load balancers, which distribute traffic to servers, which access databases. The architecture exists on a whiteboard before any code is written.
Decentralized systems are different. There is no global blueprint. No central planner decides which peers connect to which. The architecture emerges from simple rules that individual peers follow independently. This emergence is both P2P's greatest strength and its most profound engineering challenge.
In this page, we'll explore how P2P systems organize themselves, the structural patterns that arise from local interactions, and the mechanisms that enable coordination without central control.
By the end of this page, you'll understand: how unstructured and structured P2P overlays work, the mathematics of Distributed Hash Tables (DHTs), how peers discover and maintain connections, and the tradeoffs between different decentralized topologies.
"Decentralization" is not binary—it's a spectrum. Understanding where a system falls on this spectrum is crucial for evaluating its properties and tradeoffs.
The Decentralization Spectrum:
| Level | Description | Control Point | Examples |
|---|---|---|---|
| Centralized | Single authority controls all operations | Central server(s) | Traditional web apps, email servers |
| Federated | Multiple authorities cooperate under shared rules | Federation members | Email (SMTP), Matrix, Mastodon |
| Distributed with Coordinators | Peers handle data, servers coordinate | Coordinator nodes | BitTorrent with trackers, original Skype |
| Partially Decentralized | Some privileged nodes (super-peers) aid coordination | Super-peers | Kazaa, Gnutella 0.6 (ultrapeers) |
| Fully Decentralized | All peers equal, no special nodes required | None (emergent) | Bitcoin, Gnutella 0.4, IPFS |
Why Full Decentralization is Rare:
Fully decentralized systems face significant practical challenges: peers must be discoverable without any fixed infrastructure, traffic must traverse NATs and firewalls, and the network must stay correct as peers constantly join and leave.
Most "decentralized" systems actually use a hybrid model—decentralizing the expensive parts (data transfer, storage) while keeping some coordination centralized or semi-centralized.
The Bootstrap Paradox:
Every P2P network faces a paradox: to join a decentralized network, you need to know at least one peer already in the network. Solutions include hardcoded bootstrap node lists shipped with the client, DNS seeds that resolve to long-lived peers, and caching the addresses of peers seen in previous sessions.
Even Bitcoin—often called fully decentralized—ships with hardcoded bootstrap nodes. The goal isn't eliminating all central points, but ensuring the system continues functioning if any particular node disappears. Decentralization is about resilience, not purity.
The simplest form of P2P architecture is the unstructured overlay. In unstructured networks, peers connect to each other without following a predetermined topology. The network structure is essentially random or driven by local decisions.
How Unstructured Overlays Work:
Each peer maintains connections to a handful of neighbors chosen more or less arbitrarily. To find content, a peer asks its neighbors, which forward the request to their neighbors, and so on; there is no global index telling anyone where a file lives.
Gnutella: The Classic Unstructured Network:
Gnutella (2000) exemplified unstructured P2P. Its protocol defined simple message types: Ping and Pong for discovering neighbors, Query and QueryHit for searching, and Push for reaching peers behind firewalls.
Queries spread through the network with a TTL (Time To Live) that decrements at each hop. This limits flooding but also limits reach—content held only by distant peers might never be found.
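Here is a toy sketch of TTL-limited flooding over an invented six-peer overlay; real Gnutella peers see only their own neighbor lists and deduplicate forwarded queries by message ID:

```python
# Toy model of TTL-limited query flooding over an unstructured overlay.
# `neighbors` is an illustrative adjacency list; real peers only know their own links.
from collections import deque

neighbors = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D", "E"],
    "D": ["B", "C"], "E": ["C", "F"], "F": ["E"],
}

def flood_query(origin: str, ttl: int) -> set:
    """Return the set of peers a query reaches before its TTL expires."""
    reached = {origin}
    frontier = deque([(origin, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue                      # TTL exhausted: stop forwarding
        for nxt in neighbors[peer]:
            if nxt not in reached:        # real peers also drop duplicate message IDs
                reached.add(nxt)
                frontier.append((nxt, remaining - 1))
    return reached

print(flood_query("A", ttl=1))   # reaches A, B, C only (immediate neighbors)
print(flood_query("A", ttl=2))   # reaches A through E; F stays out of reach
```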
Scalability Problems:
Unstructured networks face fundamental scalability limits: every search may touch a large fraction of the network, so total query traffic grows with both the number of peers and the number of searches, and rare content can still go unfound within the TTL horizon.
Gnutella 0.4 struggled once the network grew into the thousands of users. Its successor, Gnutella 0.6, introduced "ultrapeers": high-capacity nodes that index content from multiple regular peers, creating a two-tier semi-structured network.
Modern unstructured networks often replace flooding with 'random walks'—queries that traverse a random path through the network—or gossip protocols where peers probabilistically share information. These reduce message overhead while maintaining reasonable discovery rates for popular content.
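For contrast, here is a sketch of a random-walk search over the same kind of invented overlay: instead of contacting every neighbor, the query follows one randomly chosen edge per step:

```python
# Sketch of a random-walk query: one message hops along random edges,
# trading lower overhead for a probabilistic (not guaranteed) hit.
import random

neighbors = {                             # same illustrative overlay as above
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D", "E"],
    "D": ["B", "C"], "E": ["C", "F"], "F": ["E"],
}

def random_walk(origin: str, has_content, max_hops: int = 16):
    """Follow one random edge per step until a holder is found or hops run out."""
    current = origin
    for hop in range(max_hops):
        if has_content(current):
            return current, hop
        current = random.choice(neighbors[current])
    return None, max_hops

holders = {"E", "F"}                      # arbitrary peers that hold the content
print(random_walk("A", lambda p: p in holders))   # e.g. ('E', 3); the path varies per run
```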
Distributed Hash Tables (DHTs) represent the most elegant solution to P2P's discovery problem. A DHT provides a distributed key-value store with one remarkable property: any key can be found in O(log N) hops, where N is the number of nodes, no matter how large the network grows.
The Core DHT Concept:
DHTs work by assigning every peer and every piece of content to positions in a key space (typically a ring of integers from 0 to 2^n - 1, using n-bit identifiers): each peer hashes its identity to obtain a position, each content item is hashed to a key in the same space, and each key is stored by the peer(s) whose positions are closest to it.
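A minimal sketch of this mapping step, assuming SHA-1 as the hash function (as Chord and many Kademlia-based systems use); the address and file name here are made up:

```python
# Peers and content are hashed into the same identifier space (SHA-1 -> 160-bit ring).
import hashlib

def to_key(data: bytes) -> int:
    """Map arbitrary bytes to a position on the ring [0, 2^160)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

node_pos    = to_key(b"203.0.113.7:6881")    # a peer's position, e.g. from its address
content_pos = to_key(b"ubuntu-24.04.iso")    # a content key, e.g. from its name or data

# Whichever node's position is "closest" to content_pos stores the item
# (or a pointer to the peers that have it). "Closest" is defined next.
print(hex(node_pos)[:12], hex(content_pos)[:12])
```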
The "Closest" Definition:
Different DHT systems define "closest" differently: Chord measures the clockwise distance from one identifier to the next around the ring, while Kademlia measures the bitwise XOR of the two identifiers (both are covered below).
Each definition creates different routing properties, but all enable O(log N) lookup.
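To make the contrast concrete, here is a small illustration of those two definitions using toy 6-bit identifiers (the IDs are arbitrary):

```python
# Two common definitions of "closest", shown on toy 6-bit identifiers.
RING = 2 ** 6

def chord_distance(a: int, b: int) -> int:
    """Chord: distance is the clockwise gap from a to b around the ring (asymmetric)."""
    return (b - a) % RING

def kademlia_distance(a: int, b: int) -> int:
    """Kademlia: distance is the XOR of the two IDs (symmetric)."""
    return a ^ b

a, b = 8, 54
print(chord_distance(a, b), chord_distance(b, a))        # 46 18: direction matters
print(kademlia_distance(a, b), kademlia_distance(b, a))  # 62 62: same in both directions
```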
Why O(log N) Lookups?
DHT routing tables are structured so each peer knows about peers at exponentially increasing distances. In Chord with N nodes, each node keeps a "finger table" of roughly log N entries, where entry i points to the first node at least 2^(i-1) positions ahead on the ring, giving contacts at distances of about 1, 2, 4, 8, and so on.
This "finger table" structure means each routing decision cuts the remaining distance roughly in half—classic binary search translated to a distributed setting.
DHTs and blockchains are often confused because both are distributed. The key difference: DHTs distribute data storage and lookup (content is on one or few nodes), while blockchains replicate all data to all nodes (every node has full ledger). DHTs optimize for efficiency; blockchains optimize for consensus.
Chord (MIT, 2001) is the foundational DHT protocol, influential for its elegance and mathematical precision. Understanding Chord's design illuminates core DHT concepts.
Chord's Key Insight:
Arrange all possible keys on a ring (modular arithmetic). Each node is responsible for keys between itself and its predecessor. To find any key, route clockwise around the ring—but accelerate by jumping to fingers.
/* Chord Finger Table for Node n with m-bit identifiers */

// Entry i in the finger table of node n:
// finger[i].start = (n + 2^(i-1)) mod 2^m    // where i = 1..m
// finger[i].node  = successor(finger[i].start)

// Example: Node 8 in a 6-bit (64-key) ring
// m = 6, n = 8

  i | start = (8 + 2^(i-1)) mod 64 | successor
----|------------------------------|----------
  1 | (8 + 1)  mod 64 = 9          | Node 14 (first node ≥ 9)
  2 | (8 + 2)  mod 64 = 10         | Node 14
  3 | (8 + 4)  mod 64 = 12         | Node 14
  4 | (8 + 8)  mod 64 = 16         | Node 21
  5 | (8 + 16) mod 64 = 24         | Node 32
  6 | (8 + 32) mod 64 = 40         | Node 42

// Each finger reaches exponentially further around the ring
// Lookup for any key requires at most O(m) hops, i.e. O(log N) for a network of N nodes

Chord Lookup Algorithm:
function find_successor(key):
    // All intervals are taken on the ring, i.e. modulo 2^m.
    if key in (n, successor]:              // key falls between this node and its successor
        return successor
    else:
        n' = closest_preceding_finger(key)
        return n'.find_successor(key)      // remote call: n' continues the lookup

function closest_preceding_finger(key):
    for i = m downto 1:                    // scan fingers from farthest to nearest
        if finger[i].node in (n, key):     // finger lies strictly between this node and the key
            return finger[i].node
    return n                               // no closer finger known
The lookup recursively finds the closest known predecessor of the target key, then asks that node to continue the search. Each hop approximately halves the remaining distance to the target.
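To make the routing concrete, here is a minimal, runnable Python sketch of a Chord-style lookup over a small static ring (no joins, leaves, or failures). The node IDs match the finger-table example above; the function names and hop counter are illustrative rather than taken from any particular implementation:

```python
M = 6                              # identifier bits: positions 0 .. 63 on the ring
RING = 2 ** M
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # assumed example node IDs

def first_at_or_after(k):
    """The first node whose ID is >= k, wrapping around the ring."""
    for node in sorted(NODES):
        if node >= k % RING:
            return node
    return sorted(NODES)[0]

def successor_of(n):
    """The node immediately clockwise of n (exclusive)."""
    return first_at_or_after((n + 1) % RING)

def in_interval(x, a, b, inclusive_right=False):
    """Is x inside the ring interval (a, b) (or (a, b] if inclusive_right), with wrap-around?"""
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    return x > a or x < b or (inclusive_right and x == b)

def finger_table(n):
    """finger[i] = successor of (n + 2^(i-1)) mod 2^M, for i = 1..M."""
    return [first_at_or_after((n + 2 ** (i - 1)) % RING) for i in range(1, M + 1)]

def closest_preceding_finger(n, key):
    for f in reversed(finger_table(n)):
        if in_interval(f, n, key):
            return f
    return n

def find_successor(n, key, hops=0):
    """Route from node n toward the node responsible for key, counting hops."""
    succ = successor_of(n)
    if in_interval(key, n, succ, inclusive_right=True):
        return succ, hops
    nxt = closest_preceding_finger(n, key)
    nxt = succ if nxt == n else nxt        # no closer finger known: fall back to the successor
    return find_successor(nxt, key, hops + 1)

print(finger_table(8))        # [14, 14, 14, 21, 32, 42]  (matches the table above)
print(find_successor(8, 54))  # (56, 2): node 56 owns key 54, reached in 2 hops from node 8
```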
Chord Maintenance:
Unlike static data structures, DHT networks constantly change as peers join and leave. Chord maintains correctness through periodic stabilization (each node checks whether a new node has slipped in between it and its successor and updates its pointers accordingly), a fix_fingers routine that refreshes finger-table entries over time, and successor lists that provide backup routes if the immediate successor fails.
These background processes ensure routing remains correct even as the network evolves.
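As a sketch of what two of these background routines look like, the toy, single-process code below follows the shape of the stabilize/notify pseudocode in the Chord paper; the Node class and between() helper are illustrative, and fix_fingers and successor lists are omitted:

```python
# Toy sketch of Chord's stabilize/notify cycle on an in-memory ring.
RING = 2 ** 6

def between(x, a, b):
    """Is x strictly inside the ring interval (a, b)?"""
    return (a < x < b) if a < b else (x > a or x < b)

class Node:
    def __init__(self, node_id):
        self.id, self.successor, self.predecessor = node_id, self, None

    def stabilize(self):
        """Periodic: adopt any node that has slipped in between us and our successor."""
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """Called by a node that believes it is our predecessor."""
        if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

# A new node joins by pointing its successor at any existing node; the periodic
# stabilize calls then repair successor/predecessor pointers on their own.
a, b, c = Node(8), Node(21), Node(42)
a.successor, b.successor, c.successor = b, c, a     # an already-correct ring
new = Node(14); new.successor = b                   # node 14 joins between 8 and 21
for _ in range(3):
    for n in (a, b, c, new):
        n.stabilize()
print(a.successor.id, new.successor.id)             # 14 21: the ring now includes node 14
```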
In networks with high churn (peers frequently joining/leaving), maintenance overhead dominates. Lookups may fail because routing tables reference departed nodes. Production DHTs like Kademlia add redundancy—multiple contacts per routing entry—to handle churn gracefully.
While Chord provided theoretical foundation, Kademlia (2002) became the dominant real-world DHT, powering BitTorrent's distributed tracker, IPFS, Ethereum's node discovery, and many other systems.
Kademlia's Key Innovations:
Kademlia pairs a symmetric, XOR-based distance metric with redundant "k-bucket" routing tables; both are described below.
The XOR Distance Metric:
Kademlia's brilliance is using XOR as a distance function:
distance(a, b) = a XOR b
Why XOR works beautifully: the distance from a node to itself is zero, it is symmetric (distance(a, b) = distance(b, a)), it obeys the triangle inequality, and it is unidirectional.
For any target, there's exactly one node at each distance. This enables consistent routing without asymmetric special cases.
k-Bucket Structure:
For 160-bit node IDs, Kademlia maintains 160 buckets: bucket i holds contacts whose XOR distance from the local node falls in the range [2^i, 2^(i+1)).
Each bucket holds up to k contacts (typically k = 20). Buckets covering distant ranges fill quickly, since half of all nodes land in the farthest bucket's range; buckets for very nearby distances often hold few or no entries, because few nodes fall that close in the ID space.
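A small sketch of XOR distance and bucket indexing; the ID derivation and names here are illustrative:

```python
# Sketch of XOR distance and k-bucket indexing with 160-bit IDs.
import hashlib

ID_BITS = 160                         # Kademlia uses 160-bit IDs (e.g. SHA-1)
K = 20                                # typical bucket capacity

def node_id(seed: bytes) -> int:
    """Derive a 160-bit ID, e.g. by hashing an address or public key."""
    return int.from_bytes(hashlib.sha1(seed).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    return a ^ b

def bucket_index(local: int, other: int) -> int:
    """Bucket i covers distances in [2^i, 2^(i+1)): i is the position of the
    highest bit in which the two IDs differ."""
    d = xor_distance(local, other)
    return d.bit_length() - 1         # -1 means the IDs are identical

local = node_id(b"alice")
peer  = node_id(b"bob")
print(bucket_index(local, peer))      # almost always a high index: two random IDs
                                      # differ in the top bit half the time
```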
BitTorrent's mainline DHT, used by hundreds of millions of peers, is Kademlia. IPFS uses Kademlia for content routing. Ethereum uses Kademlia-inspired protocols for node discovery. Kademlia's robustness has been proven at planetary scale.
A decentralized network is only as good as its ability to discover and maintain peer connections. This infrastructure, often invisible to users, determines network health.
The Join Process:
When a new peer joins a DHT network, it typically walks through the following steps (a toy sketch follows the list):
1. Contact bootstrap node, learn initial peers
2. Generate own node ID (hash of IP/key)
3. Perform lookup for own ID to find nearest neighbors
4. Populate routing table from lookup responses
5. Store any data for keys this node is now responsible for
6. Begin participating in lookups and storage
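Below is a toy, single-process imitation of these steps using a Kademlia-style XOR metric; the network dictionary stands in for real routing tables, and a real node would perform an iterative lookup for its own ID over the wire:

```python
# Toy imitation of the DHT join steps above (Kademlia-flavoured, in-memory only).
import hashlib

K = 3                                       # contacts kept per lookup (illustrative)
network = {}                                # node_id -> set of known contact IDs

def new_id(seed: str) -> int:
    # Small 32-bit IDs (a slice of SHA-1) just to keep the printout readable
    return int.from_bytes(hashlib.sha1(seed.encode()).digest()[:4], "big")

def closest(ids, target, n=K):
    return sorted(ids, key=lambda i: i ^ target)[:n]    # XOR distance, as in Kademlia

def join(seed: str, bootstrap_id: int) -> int:
    node_id = new_id(seed)                               # step 2: derive our own ID
    known = network[bootstrap_id] | {bootstrap_id}       # step 1: the bootstrap's view
    contacts = set(closest(known, node_id))              # step 3 (simplified): peers near our ID
    network[node_id] = contacts                          # step 4: seed our routing table
    for c in contacts:                                   # existing peers learn about us in return
        network[c].add(node_id)
    return node_id                                       # steps 5-6 (moving keys, serving lookups) omitted

first = new_id("node-0")
network[first] = set()
for i in range(1, 8):
    join(f"node-{i}", first)
newcomer = join("newcomer", first)
print(f"newcomer {newcomer:#010x} knows {sorted(hex(c) for c in network[newcomer])}")
```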
Handling Departures:
Peers leave networks in one of two ways: gracefully or abruptly.
Graceful departure: the peer notifies its neighbors and transfers the keys it stores to the nearest remaining nodes before disconnecting, so routing tables and data are updated immediately.
Abrupt departure (crash, network failure): neighbors only discover the loss through failed requests or missed heartbeats, routing entries must be repaired lazily, and any data stored solely on the departed node is gone.
Replication is crucial: if each key is stored on k nodes, up to k-1 can fail simultaneously without data loss.
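As a rough, back-of-the-envelope illustration (assuming failures are independent): if each replica is unreachable at any given moment with probability 0.1, a single copy is unavailable 10% of the time, while k = 3 replicas are all unavailable at once only 0.1^3 = 0.1% of the time.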
An attacker who controls all of a victim's routing table entries can 'eclipse' that peer from the real network, controlling everything they see. Defense requires diverse peer selection, limiting connections per IP range, and cryptographic node ID generation.
Between fully decentralized and client-server lies an important middle ground: super-peer (or ultrapeer/hub) architectures. These systems acknowledge that peers are not truly equal—some have more bandwidth, storage, or uptime.
The Super-Peer Model:
The network forms two tiers: a relatively small core of well-provisioned super-peers and a much larger population of ordinary "leaf" peers attached to them.
How Super-Peers Work:
Leaves connect to one or a few super-peers and upload an index of the files they share. Super-peers answer queries from these indexes and forward unresolved queries to other super-peers, so leaves never carry search traffic themselves.
Kazaa and FastTrack:
Kazaa (2001) popularized super-peer architecture. It automatically promoted capable nodes to super-peer status based on available bandwidth, uptime, and whether the node was publicly reachable rather than stuck behind NAT.
This hybrid model scaled far better than pure flooding networks while retaining decentralization's resilience.
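As a sketch of how a super-peer might resolve queries for its leaves (the index structure, class, and names are invented for illustration):

```python
# Toy super-peer: leaves upload file lists; queries hit the local index first and are
# only forwarded to other super-peers if there is no match.
from collections import defaultdict

class SuperPeer:
    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)        # filename -> set of leaf addresses
        self.peers = []                      # other super-peers we can forward to

    def register_leaf(self, leaf: str, files: list):
        for f in files:
            self.index[f].add(leaf)

    def query(self, filename: str, ttl: int = 2) -> set:
        hits = set(self.index.get(filename, set()))
        if not hits and ttl > 0:             # miss: forward to neighbouring super-peers
            for sp in self.peers:
                hits |= sp.query(filename, ttl - 1)
        return hits

sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.peers, sp2.peers = [sp2], [sp1]
sp1.register_leaf("leaf-a", ["song.mp3"])
sp2.register_leaf("leaf-b", ["movie.avi"])
print(sp1.query("movie.avi"))                # {'leaf-b'}: resolved one super-peer away
```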
Advantages and Disadvantages:
| Aspect | Advantage | Disadvantage |
|---|---|---|
| Query Efficiency | Queries resolved within super-peer indexes | Super-peers become bottlenecks |
| Network Load | Leaves have minimal overhead | Super-peers bear disproportionate load |
| Resilience | Super-peer failure only affects its leaves | Coordinated attack on super-peers is effective |
| Heterogeneity | Matches node capabilities to roles | Creates unequal "classes" of peers |
| NAT Handling | Super-peers typically have public IPs | NAT-bound nodes can't become super-peers |
Original Skype (2003-2012) used super-peers for call setup and NAT traversal. Calls routed through super-peers when direct connection failed. When Microsoft acquired Skype, they moved to centralized supernodes in data centers—trading decentralization for operational control.
We've explored the architectural foundations of decentralized P2P systems; let's recap the key ideas and look at what comes next.
What's next:
With architectural foundations established, we'll examine P2P's most visible application: file sharing. We'll explore how systems like BitTorrent revolutionized large file distribution, the protocols enabling efficient content exchange, and the engineering that makes swarm-based transfer work.
You now understand the architectural spectrum from centralized to fully decentralized, how unstructured and structured overlays work, the mathematics of DHT routing, and the practical super-peer compromise. Next, we'll see these concepts applied to file sharing systems.