Imagine you're given an expense account with these rules: you receive $100 per day automatically deposited, you can save up to $500 maximum in your account, and you can spend as fast as you want—as long as you have money in the account. Some days you spend nothing and your balance grows. Then one day, you make a large purchase using your accumulated balance. Over time, your average spending matches your income, but you have the flexibility to spend in bursts when needed.
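The account analogy maps directly onto code. A minimal sketch (the function name and dollar figures are just the analogy's, not part of any real API), starting from an empty account:

```python
# Toy model of the expense-account analogy: $100/day income, $500 cap,
# spending allowed only while the balance covers it.
DAILY_INCOME = 100
BALANCE_CAP = 500

def simulate(spending_per_day):
    """Return the end-of-day balances for a list of daily spends."""
    balance = 0
    balances = []
    for spend in spending_per_day:
        balance = min(BALANCE_CAP, balance + DAILY_INCOME)  # income deposited, capped
        balance -= min(spend, balance)                      # can't overspend the balance
        balances.append(balance)
    return balances

# Five quiet days, then a $450 burst purchase:
print(simulate([0, 0, 0, 0, 0, 450]))  # → [100, 200, 300, 400, 500, 50]
```

The balance saturates at the $500 cap on day five, then a single large purchase spends the accumulated credit — exactly the burst behavior the token bucket allows.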
This is the Token Bucket Algorithm—an elegant traffic shaping mechanism that solves the leaky bucket's fundamental limitation by allowing controlled bursting while still enforcing average rate limits. It has become the dominant algorithm for traffic shaping, rate limiting, and QoS implementation in modern networks.
By the end of this page, you will deeply understand the Token Bucket Algorithm—how it differs from leaky bucket, its mathematical properties, implementation approaches, single and dual rate variants, and why it dominates modern traffic management. You will be able to implement token bucket and analyze its behavior in network scenarios.
The Token Bucket Algorithm uses a different metaphor than the leaky bucket:

Components:

- A bucket that holds tokens (permits to send), up to a maximum capacity
- A token generator that adds tokens to the bucket at a constant rate
- Arriving packets, each of which must consume tokens equal to its size before transmission

Behavior:

- Tokens accumulate whenever the flow is idle or sending below the token rate, up to the bucket capacity
- If enough tokens are available, a packet is transmitted immediately and the tokens are consumed
- If tokens are insufficient, the packet is non-conforming: it waits (shaping) or is dropped/marked (policing)
Key Insight: Why Token Bucket Allows Bursting
The crucial difference from leaky bucket:
| Aspect | Leaky Bucket | Token Bucket |
|---|---|---|
| What's in bucket | Packets (data) | Tokens (permits) |
| Output rate | Constant (leak rate) | Variable (up to line rate) |
| Burst capability | None—output ≤ leak rate | Yes—can use accumulated tokens |
| Empty bucket | No packets to send | Must wait for tokens |
With token bucket, if you've been idle and accumulated tokens, you can transmit a burst immediately. This 'burst credit' is essential for: web page loads (many objects fetched at once), the startup phase of video streams, and TCP's inherently bursty window-based transmission.
Think of tokens as 'bandwidth credits' that you earn at a steady rate. You can save up credits (up to the bucket limit) and spend them whenever you want. If you spend faster than you earn, you eventually run out and must wait. The long-term average spending rate is limited by the earning rate, but short-term bursts are allowed.
The mathematical analysis of token bucket reveals its powerful properties.
Core Parameters:

Let us define:

- r = token generation rate (bytes per second)
- b = bucket size (bytes), the maximum number of tokens that can accumulate
- R = peak (line) rate at which the output link can transmit, with R > r
- n(t) = tokens in the bucket at time t
- a(t) = cumulative bytes accepted as conforming by time t

Token Dynamics:
| Property | Formula | Interpretation |
|---|---|---|
| Token accumulation | dn/dt = r (when n < b) | Tokens accumulate at constant rate |
| Token bound | 0 ≤ n(t) ≤ b | Never negative, never exceeds bucket size |
| On transmission | n(t⁺) = n(t⁻) - packet_size | Transmitting consumes tokens |
| Conforming traffic | a(t) - a(s) ≤ b + r(t - s) | Arrival curve same as leaky bucket |
| Maximum burst | b bytes at rate R | Initial burst limited by bucket size |
Arrival Curve and Burst:
Token bucket with parameters (r, b) constrains conforming traffic to the arrival curve:
A(t) = b + r·t
This is identical to the leaky bucket! The difference is entirely in the output: the leaky bucket releases traffic at the constant rate r, while the token bucket can release accumulated bytes at the full line rate R.
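The arrival-curve bound can be checked empirically. A small simulation sketch (all names and numbers are illustrative): drive a token bucket with a greedy sender and verify that the conforming output between any two sample instants s < t never exceeds b + r·(t − s).

```python
def run_greedy(r, b, R, dt, steps):
    """Simulate a greedy sender limited by a token bucket (r, b).
    Returns cumulative conforming bytes sampled every dt seconds."""
    tokens = b            # bucket starts full
    sent = 0.0
    cumulative = [0.0]
    for _ in range(steps):
        tokens = min(b, tokens + r * dt)   # token accumulation, capped at b
        want = R * dt                      # greedy sender offers full line rate
        take = min(want, tokens)           # transmission limited by tokens
        tokens -= take
        sent += take
        cumulative.append(sent)
    return cumulative

r, b, R, dt = 125_000, 500_000, 12_500_000, 0.001
cum = run_greedy(r, b, R, dt, 2000)

# Conforming traffic obeys the arrival curve: a(t) - a(s) <= b + r*(t - s)
eps = 1e-6
for s in range(0, len(cum), 100):
    for t in range(s, len(cum), 100):
        assert cum[t] - cum[s] <= b + r * (t - s) * dt + eps
print("arrival curve respected")
```

No matter how the greedy sender bursts, the cumulative output never escapes the A(t) = b + r·t envelope.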
Burst Duration:
When bucket is full, a burst of b bytes can be transmitted at peak rate R:
Burst duration = b / (R - r)
After the burst, transmission rate falls to r (must wait for token accumulation).
Example Calculation: with r = 125 KB/s (1 Mbps), b = 500 KB, and R = 12.5 MB/s (100 Mbps), a full bucket sustains a peak-rate burst of 500 KB / (12.5 MB/s − 0.125 MB/s) ≈ 40 ms, during which roughly 505 KB are transmitted (500 KB of stored tokens plus ~5 KB generated during the burst itself).
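As a worked example (assuming r = 125 KB/s, b = 500 KB, and R = 100 Mbps, the figures used elsewhere on this page):

```python
# Worked burst-duration example (illustrative numbers:
# 1 Mbps token rate, 500 KB bucket, 100 Mbps line rate).
r = 125_000      # token rate, bytes/s   (1 Mbps)
b = 500_000      # bucket size, bytes
R = 12_500_000   # line (peak) rate, bytes/s  (100 Mbps)

burst_duration = b / (R - r)          # seconds the peak-rate burst can last
bytes_in_burst = R * burst_duration   # equals b + r * burst_duration

print(f"burst duration: {burst_duration * 1000:.1f} ms")   # → 40.4 ms
print(f"bytes sent in burst: {bytes_in_burst:,.0f}")       # → 505,051
```

Note that the burst carries slightly more than b bytes, because tokens keep arriving at rate r while the burst is in flight.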
Token bucket and leaky bucket enforce the same long-term average rate (r). The difference is entirely in short-term behavior: leaky bucket smooths everything to constant rate, while token bucket allows bursts up to size b. Both constrain traffic to the same arrival curve A(t) = b + rt.
Token bucket can be implemented efficiently using lazy calculation—tokens are computed on-demand rather than with a separate timer.
Algorithm (Pseudo-code):
```
Token Bucket:
    Parameters:
        bucket_size: maximum tokens (bytes)
        token_rate:  token generation rate (bytes per second)

    State:
        tokens:      current token count (starts at bucket_size)
        last_update: timestamp of last token calculation

    ConsumeTokens(packet_size, current_time):
        // Add tokens accumulated since last update
        elapsed = current_time - last_update
        tokens = min(bucket_size, tokens + token_rate * elapsed)
        last_update = current_time

        // Check if packet can be transmitted
        if tokens >= packet_size:
            tokens -= packet_size
            return CONFORMING       // Transmit immediately
        else:
            return NON_CONFORMING   // Wait, drop, or mark
```
Key Implementation Points:

- Lazy evaluation: tokens are computed from elapsed time on each packet arrival, so no timer or background thread is needed
- The min() with bucket_size enforces the burst cap b
- Token counts should use floating point (or high-resolution integers) to avoid rounding drift at high rates
- In multi-threaded code, the update-and-consume sequence must be atomic (a lock or compare-and-swap)
```python
import time
import threading
from enum import Enum


class TokenBucketResult(Enum):
    CONFORMING = "green"        # Within committed rate
    NON_CONFORMING = "red"      # Exceeds rate, no tokens


class TokenBucket:
    """
    Implements a Token Bucket traffic shaper/policer.

    Allows controlled bursting up to bucket size while
    enforcing average rate over time.
    """

    def __init__(
        self,
        bucket_size_bytes: int,
        token_rate_bps: float,
        initial_tokens: int = None
    ):
        """
        Initialize the token bucket.

        Args:
            bucket_size_bytes: Maximum bucket capacity (burst size)
            token_rate_bps: Rate of token generation (bytes per second)
            initial_tokens: Starting tokens (defaults to full bucket)
        """
        self.bucket_size = bucket_size_bytes
        self.token_rate = token_rate_bps
        self.tokens = initial_tokens if initial_tokens is not None else bucket_size_bytes
        self.last_update = time.time()
        self.lock = threading.Lock()

        # Statistics
        self.total_conforming = 0
        self.total_non_conforming = 0

    def _add_tokens(self, current_time: float):
        """Calculate and add tokens accumulated since last update."""
        elapsed = current_time - self.last_update
        if elapsed > 0:
            new_tokens = elapsed * self.token_rate
            self.tokens = min(self.bucket_size, self.tokens + new_tokens)
            self.last_update = current_time

    def consume(self, packet_size: int) -> TokenBucketResult:
        """
        Attempt to consume tokens for a packet.

        Args:
            packet_size: Size of the packet in bytes

        Returns:
            CONFORMING if packet can be sent,
            NON_CONFORMING if insufficient tokens
        """
        with self.lock:
            current_time = time.time()
            self._add_tokens(current_time)

            if self.tokens >= packet_size:
                self.tokens -= packet_size
                self.total_conforming += packet_size
                return TokenBucketResult.CONFORMING
            else:
                self.total_non_conforming += packet_size
                return TokenBucketResult.NON_CONFORMING

    def consume_blocking(self, packet_size: int, timeout: float = None) -> bool:
        """
        Wait for tokens if necessary, then consume.
        This implements traffic shaping (delay) rather than policing (drop).

        Args:
            packet_size: Size of the packet in bytes
            timeout: Maximum time to wait (None = wait forever)

        Returns:
            True if tokens consumed, False if timeout
        """
        start_time = time.time()

        while True:
            with self.lock:
                current_time = time.time()
                self._add_tokens(current_time)

                if self.tokens >= packet_size:
                    self.tokens -= packet_size
                    self.total_conforming += packet_size
                    return True

                # Calculate wait time for needed tokens
                tokens_needed = packet_size - self.tokens
                wait_time = tokens_needed / self.token_rate

            # Check timeout (outside the lock)
            if timeout is not None:
                elapsed = time.time() - start_time
                if elapsed + wait_time > timeout:
                    return False

            # Sleep outside the lock; cap at 100 ms so other callers stay responsive
            time.sleep(min(wait_time, 0.1))

    def get_stats(self) -> dict:
        """Return current statistics."""
        with self.lock:
            self._add_tokens(time.time())
            return {
                "current_tokens": self.tokens,
                "bucket_fill_pct": self.tokens / self.bucket_size * 100,
                "total_conforming": self.total_conforming,
                "total_non_conforming": self.total_non_conforming,
                "conformance_rate": (
                    self.total_conforming
                    / (self.total_conforming + self.total_non_conforming + 0.001)
                ),
            }


# Example usage
if __name__ == "__main__":
    # 100 KB burst, 10 Mbps average (1.25 MB/s)
    bucket = TokenBucket(
        bucket_size_bytes=100_000,
        token_rate_bps=1_250_000
    )

    print("=== Burst Test (bucket starts full) ===")
    # Immediate burst - should all succeed (100 KB in bucket)
    for i in range(10):
        result = bucket.consume(10_000)  # 10 KB packets
        print(f"Packet {i+1}: {result.value}, tokens: {bucket.tokens:.0f}")

    print("\n=== Continuing (bucket now empty) ===")
    # These should fail (no tokens left)
    for i in range(3):
        result = bucket.consume(10_000)
        print(f"Packet {i+1}: {result.value}")

    print("\n=== After 0.1s wait ===")
    time.sleep(0.1)  # Accumulate ~125 KB of tokens
    for i in range(5):
        result = bucket.consume(10_000)
        print(f"Packet {i+1}: {result.value}")

    print(f"\nStats: {bucket.get_stats()}")
```

The implementation above shows both approaches: consume() is non-blocking (returns immediately, caller handles non-conforming), while consume_blocking() implements shaping by waiting for tokens. Production systems often use non-blocking with a separate queue for waiting packets.
The Single Rate Three Color Marker (srTCM) defined in RFC 2697 extends basic token bucket to provide three-level conformance classification.
Structure:
Two buckets share a single token rate:

- Committed bucket (C): capacity CBS, fills at CIR
- Excess bucket (E): capacity EBS, fills only from C's overflow

Tokens flow into the committed bucket at rate CIR (Committed Information Rate). When the committed bucket is full, overflow tokens fill the excess bucket.
Three Colors:
| Color | Condition | Meaning | Typical Action |
|---|---|---|---|
| 🟢 Green | Packet size ≤ tokens in C bucket | Within committed rate | Forward unchanged |
| 🟡 Yellow | Packet size > C tokens, ≤ E tokens | Within excess burst | Forward, mark for potential drop |
| 🔴 Red | Packet size exceeds both C and E tokens | Exceeds all limits | Drop or mark lowest priority |
srTCM Algorithm:
```
srTCM Parameters:
    CIR: Committed Information Rate (tokens per second)
    CBS: Committed Burst Size (committed bucket capacity)
    EBS: Excess Burst Size (excess bucket capacity)

State:
    Tc: Tokens in committed bucket (0 to CBS)
    Te: Tokens in excess bucket (0 to EBS)
    last_update: Timestamp

On Packet Arrival:
    // Update tokens based on elapsed time
    tokens_to_add = CIR × elapsed_time

    // Fill committed bucket first
    previous_Tc = Tc
    Tc = min(CBS, Tc + tokens_to_add)

    // Overflow to excess bucket
    if Tc == CBS:
        overflow = tokens_to_add - (CBS - previous_Tc)
        Te = min(EBS, Te + overflow)

    // Classify packet (RFC 2697, color-blind mode)
    if packet_size <= Tc:
        Tc -= packet_size
        return GREEN
    else if packet_size <= Te:
        Te -= packet_size
        return YELLOW
    else:
        return RED
```
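The marker translates to a compact Python sketch. This is an illustrative, byte-granularity approximation of RFC 2697's color-blind marker; the class and method names are mine, not from any standard library.

```python
import time

GREEN, YELLOW, RED = "green", "yellow", "red"

class SrTCM:
    """Single Rate Three Color Marker sketch (RFC 2697, color-blind mode).
    Buckets start full; token arithmetic is done in bytes."""

    def __init__(self, cir, cbs, ebs):
        self.cir, self.cbs, self.ebs = cir, cbs, ebs
        self.tc, self.te = float(cbs), float(ebs)   # C and E buckets
        self.last = time.monotonic()

    def _refill(self, now):
        tokens = max(0.0, now - self.last) * self.cir
        self.last = now
        room_c = self.cbs - self.tc
        self.tc += min(tokens, room_c)                                  # C fills first
        self.te = min(self.ebs, self.te + max(0.0, tokens - room_c))    # overflow to E

    def mark(self, size, now=None):
        self._refill(time.monotonic() if now is None else now)
        if size <= self.tc:          # enough committed tokens
            self.tc -= size
            return GREEN
        if size <= self.te:          # enough excess tokens
            self.te -= size
            return YELLOW
        return RED                   # neither bucket suffices
```

For example, with CIR=1000 B/s, CBS=100 B, EBS=50 B and no elapsed time between packets, an 80-byte packet is green, a following 40-byte packet is yellow (E bucket), and a third 40-byte packet is red.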
Use Case Example:
Enterprise buys 100 Mbps committed service:
| Traffic Pattern | Classification | Result |
|---|---|---|
| Steady 80 Mbps | All green | Guaranteed delivery |
| Burst to 150 Mbps (short) | Green + yellow | Mostly delivered |
| Sustained 150 Mbps | Green + yellow + red | Guaranteed 100 Mbps only |
Network devices can operate in 'color-aware' mode (respecting incoming color markings) or 'color-blind' mode (re-classifying all traffic). Color-aware mode allows end-to-end differentiated service; color-blind mode enforces local policy regardless of upstream markings.
The Two Rate Three Color Marker (trTCM) defined in RFC 2698 provides more flexible traffic contracts by using two independent token rates.
Structure:
Two independent buckets with separate rates:

- Committed bucket (C): capacity CBS, fills at CIR
- Peak bucket (P): capacity PBS, fills at PIR (with PIR > CIR)

Each bucket fills at its own rate independently.

Difference from srTCM: instead of one rate with an overflow bucket, trTCM enforces two separate rates, so 'yellow' means 'between CIR and PIR' rather than 'spending saved excess burst'.
trTCM Algorithm:
```
trTCM Parameters:
    CIR: Committed Information Rate
    CBS: Committed Burst Size
    PIR: Peak Information Rate (PIR > CIR)
    PBS: Peak Burst Size

State:
    Tc: Tokens in committed bucket
    Tp: Tokens in peak bucket

On Packet Arrival:
    // Update both buckets independently
    Tc = min(CBS, Tc + CIR × elapsed_time)
    Tp = min(PBS, Tp + PIR × elapsed_time)

    // Classify (check peak bucket first!)
    if packet_size > Tp:
        return RED              // Exceeds peak rate
    else if packet_size > Tc:
        Tp -= packet_size
        return YELLOW           // Between CIR and PIR
    else:
        Tc -= packet_size
        Tp -= packet_size
        return GREEN            // Within committed rate
```
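A matching Python sketch of RFC 2698's color-blind marker (again illustrative, byte-granularity; names are mine):

```python
import time

GREEN, YELLOW, RED = "green", "yellow", "red"

class TrTCM:
    """Two Rate Three Color Marker sketch (RFC 2698, color-blind mode)."""

    def __init__(self, cir, cbs, pir, pbs):
        self.cir, self.cbs = cir, cbs
        self.pir, self.pbs = pir, pbs
        self.tc, self.tp = float(cbs), float(pbs)   # both buckets start full
        self.last = time.monotonic()

    def mark(self, size, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.last = now
        # The two buckets refill independently, each at its own rate.
        self.tc = min(self.cbs, self.tc + self.cir * elapsed)
        self.tp = min(self.pbs, self.tp + self.pir * elapsed)
        if size > self.tp:            # peak bucket is checked FIRST
            return RED
        if size > self.tc:            # within peak but above committed
            self.tp -= size
            return YELLOW
        self.tc -= size               # within committed: both buckets pay
        self.tp -= size
        return GREEN
```

Note that a green packet debits both buckets, so committed traffic also counts against the peak rate, exactly as the pseudocode above specifies.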
Key Differences in Behavior:
| Scenario | srTCM Result | trTCM Result |
|---|---|---|
| Short burst, then idle | Green (C bucket fills first) | Depends on burst size |
| Sustained traffic at 1.5×CIR | Yellow until E bucket drains, then red | Yellow (while < PIR) |
| Sustained traffic at 2×CIR | Red (no capacity) | Red (exceeds PIR) |
Use Case Example:

Consider a service plan with a guaranteed rate plus a burstable ceiling, say 100 Mbps committed (CIR) with bursting permitted up to 500 Mbps (PIR):

Customer behavior:

- Traffic at or below 100 Mbps: green, guaranteed delivery
- Traffic between 100 and 500 Mbps: yellow, delivered when spare capacity exists
- Traffic above 500 Mbps: red, dropped at the network edge
In trTCM, the peak bucket is checked FIRST. This is counter-intuitive but critical: if a packet exceeds peak rate, it's red regardless of committed bucket status. The committed bucket only distinguishes green from yellow for packets that pass the peak rate check.
Understanding when to use each algorithm is crucial for network design. Let's analyze their differences in depth.
Fundamental Difference:

The leaky bucket stores packets and releases them at a constant rate; the token bucket stores permission to send and lets packets through at line rate whenever permits are available.

This leads to dramatically different behavior:
| Aspect | Leaky Bucket | Token Bucket |
|---|---|---|
| What's in the bucket | Packets (data) | Tokens (permits) |
| Bucket fills with | Arriving packets | Time (tokens accumulate) |
| Bucket drains by | Constant rate transmission | Packet consumption |
| Maximum output rate | Exactly r (constant) | Line rate R (if tokens available) |
| Average output rate | ≤ r | ≤ r |
| Burst handling | Absorbed and smoothed | Transmitted immediately if tokens exist |
| After idle period | No pent-up capacity | Full bucket of credits to spend |
| Output burstiness | Zero (perfectly smooth) | Up to bucket size b |
| Work conserving | No | Yes (when tokens available) |
| Delay characteristic | Variable (queue-based) | Zero when tokens available |
Graphical Comparison:
Consider 1 MB of data arriving as a burst, with a token/leak rate of 125 KB/s (1 Mbps), a bucket size of 500 KB, and a line rate of 100 Mbps:
Leaky Bucket Output:

```
Time 0:      1 MB arrives
Time 0-8s:   Output at exactly 125 KB/s (8 seconds to transmit 1 MB)
Total time:  8 seconds
```

Token Bucket Output (starts with full bucket):

```
Time 0:       1 MB arrives, 500 KB of tokens available
Time 0-40ms:  Burst ~505 KB at 100 Mbps (500 KB stored tokens + ~5 KB generated)
Time 40ms+:   Remaining ~495 KB at 125 KB/s (~3.96 seconds)
Total time:   4 seconds
```

Token bucket is twice as fast for this burst scenario!
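The two completion times can be computed directly (same illustrative numbers: 125 KB/s average rate, 500 KB bucket, 100 Mbps line rate):

```python
# Completion time for a 1 MB burst under each shaper.
DATA = 1_000_000
r, b, R = 125_000, 500_000, 12_500_000   # bytes/s, bytes, bytes/s

leaky_time = DATA / r                    # everything drains at rate r

# Token bucket: the burst phase sends R * t_burst bytes at line rate,
# then the remainder drains at the token rate r.
t_burst = b / (R - r)
sent_in_burst = R * t_burst
token_time = t_burst + (DATA - sent_in_burst) / r

print(f"leaky bucket: {leaky_time:.2f} s")   # → 8.00 s
print(f"token bucket: {token_time:.2f} s")   # → 4.00 s
```

Algebraically the total simplifies to (DATA − b) / r, which is why the token bucket finishes in exactly half the time here: the bucket effectively gives the flow b bytes 'for free'.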
Use Leaky Bucket when you need constant output rate (feeding a fixed-rate channel, eliminating jitter). Use Token Bucket for everything else—it's more flexible, better for TCP performance, and what modern traffic shapers typically implement.
Real networks need more than simple rate limiting—they need to share bandwidth across multiple classes hierarchically. Hierarchical Token Bucket (HTB) extends token bucket concepts to create sophisticated bandwidth allocation trees.
HTB Structure:
```
                root (total bandwidth)
                         │
          ┌──────────────┼──────────────┐
          │              │              │
        Voice          Video       Best Effort
       (strict)      (10 Mbps)      (5 Mbps)
                         │
                 ┌───────┼───────┐
                 │       │       │
               User1   User2   User3
              3 Mbps  3 Mbps  4 Mbps
```
Key HTB Concepts:

- rate: the bandwidth guaranteed to a class
- ceil: the maximum bandwidth a class may reach by borrowing
- Classes form a tree; packets are enqueued only at leaf classes
- prio: among classes competing to borrow, higher priority is served first
Borrowing Mechanism:
HTB allows classes to 'borrow' unused bandwidth from siblings: a class sending below its rate lends its surplus to the parent, and a sibling that has exceeded its rate (but not its ceil) can borrow that surplus. Bandwidth is therefore never wasted while any class still has demand.
Example Configuration:
```shell
# Linux tc HTB configuration example

# Root qdisc
tc qdisc add dev eth0 root handle 1: htb default 30

# Root class (100 Mbps total)
tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit

# Voice class: 5 Mbps guaranteed, 10 Mbps ceiling, high priority
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit ceil 10mbit prio 0

# Video class: 30 Mbps guaranteed, 50 Mbps ceiling
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 30mbit ceil 50mbit prio 1

# Best effort: 10 Mbps guaranteed, 100 Mbps ceiling (uses leftovers)
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 10mbit ceil 100mbit prio 2

# Filters to classify traffic
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dport 5060 0xffff classid 1:10   # SIP -> voice class
```
Benefits of HTB:
| Feature | Benefit |
|---|---|
| Hierarchical structure | Matches organizational boundaries (departments, users) |
| Guaranteed rates | SLA compliance for critical traffic |
| Ceiling rates | Allows bursting into unused capacity |
| Priority-based sharing | High-priority traffic gets excess first |
| Work conservation | No bandwidth wasted when classes not using share |
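The borrowing idea can be illustrated with a deliberately simplified allocation round. This is a toy model, not the real HTB algorithm (which operates per-packet with priorities and quanta); the class names and rate/ceil numbers mirror the tc example above, and the borrowing pass simply hands out leftovers in order.

```python
def allocate(total, classes, demand):
    """One round of HTB-style sharing (toy model):
    every class first receives min(demand, guaranteed rate); leftover
    parent bandwidth is then lent out, capped by each class's ceil."""
    grant = {}
    spare = total
    for name, (rate, ceil) in classes.items():
        g = min(demand[name], rate)          # guaranteed share
        grant[name] = g
        spare -= g
    for name, (rate, ceil) in classes.items():   # borrowing pass
        extra = min(demand[name] - grant[name],  # unmet demand
                    ceil - grant[name],          # ceiling headroom
                    max(spare, 0))               # parent's leftovers
        grant[name] += extra
        spare -= extra
    return grant

classes = {"voice": (5, 10), "video": (30, 50), "best": (10, 100)}  # (rate, ceil) Mbps
demand  = {"voice": 2, "video": 45, "best": 80}
print(allocate(100, classes, demand))  # → {'voice': 2, 'video': 45, 'best': 53}
```

Video borrows 15 Mbps beyond its guarantee (staying under its 50 Mbps ceil), and best effort absorbs the remaining 43 Mbps: the full 100 Mbps is used, which is the work-conservation property in the table above.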
HTB is one of the most widely used queuing disciplines in Linux networking. It's the foundation for ISP customer bandwidth management, enterprise QoS, container bandwidth limits, and VM traffic shaping. Understanding HTB principles is essential for network engineering at scale.
Token bucket is ubiquitous in modern networking and software systems. Let's examine key applications:
1. API Rate Limiting:
Virtually every web API uses token bucket for rate limiting:
```python
import time
from redis import Redis


class RedisTokenBucket:
    """
    Distributed token bucket using Redis for API rate limiting.
    Thread-safe and works across multiple servers.
    """

    def __init__(
        self,
        redis: Redis,
        key_prefix: str,
        rate_per_second: float,
        bucket_size: int
    ):
        self.redis = redis
        self.key_prefix = key_prefix
        self.rate = rate_per_second
        self.bucket_size = bucket_size

    def allow(self, key: str, tokens: int = 1) -> tuple[bool, dict]:
        """
        Check if request is allowed and consume tokens.
        Uses a Redis Lua script for atomic operation.
        """
        now = time.time()
        bucket_key = f"{self.key_prefix}:{key}"

        # Lua script for atomic token bucket operation
        lua_script = """
        local tokens_key = KEYS[1]
        local timestamp_key = KEYS[2]
        local rate = tonumber(ARGV[1])
        local bucket_size = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])
        local requested = tonumber(ARGV[4])

        -- Get current state
        local tokens = tonumber(redis.call('get', tokens_key) or bucket_size)
        local last_update = tonumber(redis.call('get', timestamp_key) or now)

        -- Add tokens based on elapsed time
        local elapsed = math.max(0, now - last_update)
        local new_tokens = math.min(bucket_size, tokens + (elapsed * rate))

        -- Check and consume
        local allowed = 0
        if new_tokens >= requested then
            new_tokens = new_tokens - requested
            allowed = 1
        end

        -- Update state
        redis.call('set', tokens_key, new_tokens)
        redis.call('set', timestamp_key, now)
        redis.call('expire', tokens_key, 3600)
        redis.call('expire', timestamp_key, 3600)

        return {allowed, new_tokens, bucket_size - new_tokens}
        """

        result = self.redis.eval(
            lua_script,
            2,  # Number of keys
            f"{bucket_key}:tokens",
            f"{bucket_key}:timestamp",
            self.rate,
            self.bucket_size,
            now,
            tokens
        )

        allowed, remaining, deficit = result
        return bool(allowed), {
            "remaining": int(remaining),
            "limit": self.bucket_size,
            "retry_after": deficit / self.rate if not allowed else 0
        }


# Example usage for API
if __name__ == "__main__":
    redis_client = Redis(host='localhost', port=6379)

    # 100 requests per second, 500 request burst
    limiter = RedisTokenBucket(
        redis=redis_client,
        key_prefix="api:ratelimit",
        rate_per_second=100,
        bucket_size=500
    )

    # Simulate API calls
    user_id = "user_12345"
    for i in range(10):
        allowed, info = limiter.allow(user_id)
        print(f"Request {i+1}: {'Allowed' if allowed else 'Rate Limited'}, {info}")
```

When you see HTTP headers like X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset, you're seeing token bucket state exposed. These headers tell clients how many 'tokens' remain (Remaining), the bucket size (Limit), and when tokens refill (Reset).
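That header convention can be sketched with a small in-memory limiter. The header names follow the common X-RateLimit-* pattern, which varies between APIs; the class itself is illustrative, not any particular framework's API.

```python
import time

class HeaderLimiter:
    """In-memory token bucket that reports its state in the conventional
    X-RateLimit-* header style."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def check(self, cost=1):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        allowed = self.tokens >= cost
        if allowed:
            self.tokens -= cost
        # Seconds until the bucket is full again (one common 'Reset' convention)
        reset = (self.burst - self.tokens) / self.rate
        headers = {
            "X-RateLimit-Limit": str(self.burst),
            "X-RateLimit-Remaining": str(int(self.tokens)),
            "X-RateLimit-Reset": f"{reset:.2f}",
        }
        return allowed, headers

limiter = HeaderLimiter(rate=100, burst=500)
allowed, headers = limiter.check()
print(allowed, headers)
```

A web framework would attach these headers to every response, so well-behaved clients can pace themselves instead of hammering the limit.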
We've comprehensively explored the Token Bucket Algorithm—the dominant traffic shaping mechanism in modern networking due to its flexibility, efficiency, and TCP-friendly behavior.
What's Next:
With the two fundamental algorithms mastered—leaky bucket and token bucket—we'll now explore Rate Limiting patterns in the next page. You'll learn how these algorithms are applied to build practical rate limiting systems for APIs, services, and network protection.
You now have comprehensive understanding of the Token Bucket Algorithm—its model, mathematics, implementation, single/dual rate variants, hierarchical extensions, and widespread applications. This knowledge is directly applicable to designing and configuring traffic management systems.