Imagine you're given an expense account with these rules: you receive $100 per day automatically deposited, you can save up to $500 maximum in your account, and you can spend as fast as you want—as long as you have money in the account. Some days you spend nothing and your balance grows. Then one day, you make a large purchase using your accumulated balance. Over time, your average spending matches your income, but you have the flexibility to spend in bursts when needed.
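The account analogy maps directly onto code. A minimal sketch (the function name and dollar figures are just the analogy's, not part of any real API), starting from an empty account:

```python
# Toy model of the expense-account analogy: $100/day income, $500 cap,
# spending allowed only while the balance covers it.
DAILY_INCOME = 100
BALANCE_CAP = 500

def simulate(spending_per_day):
    """Return the end-of-day balances for a list of daily spends."""
    balance = 0
    balances = []
    for spend in spending_per_day:
        balance = min(BALANCE_CAP, balance + DAILY_INCOME)  # income deposited, capped
        balance -= min(spend, balance)                      # can't overspend the balance
        balances.append(balance)
    return balances

# Five quiet days, then a $450 burst purchase:
print(simulate([0, 0, 0, 0, 0, 450]))  # → [100, 200, 300, 400, 500, 50]
```

The balance saturates at the $500 cap on day five, then a single large purchase spends the accumulated credit — exactly the burst behavior the token bucket allows.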
This is the Token Bucket Algorithm—an elegant traffic shaping mechanism that solves the leaky bucket's fundamental limitation by allowing controlled bursting while still enforcing average rate limits. It has become the dominant algorithm for traffic shaping, rate limiting, and QoS implementation in modern networks.
By the end of this page, you will deeply understand the Token Bucket Algorithm—how it differs from leaky bucket, its mathematical properties, implementation approaches, single and dual rate variants, and why it dominates modern traffic management. You will be able to implement token bucket and analyze its behavior in network scenarios.
The Token Bucket Algorithm uses a different metaphor than the leaky bucket:

Components:

- A bucket that holds tokens (permits to send), up to a maximum capacity
- A token generator that adds tokens to the bucket at a constant rate
- Arriving packets, each of which must consume tokens equal to its size before transmission

Behavior:

- Tokens accumulate whenever the flow is idle or sending below the token rate, up to the bucket capacity
- If enough tokens are available, a packet is transmitted immediately and the tokens are consumed
- If tokens are insufficient, the packet is non-conforming: it waits (shaping) or is dropped/marked (policing)
Key Insight: Why Token Bucket Allows Bursting
The crucial difference from leaky bucket:
| Aspect | Leaky Bucket | Token Bucket |
|---|---|---|
| What's in bucket | Packets (data) | Tokens (permits) |
| Output rate | Constant (leak rate) | Variable (up to line rate) |
| Burst capability | None—output ≤ leak rate | Yes—can use accumulated tokens |
| Empty bucket | No packets to send | Must wait for tokens |
With token bucket, if you've been idle and accumulated tokens, you can transmit a burst immediately. This 'burst credit' is essential for: web page loads (many objects fetched at once), the startup phase of video streams, and TCP's inherently bursty window-based transmission.
Think of tokens as 'bandwidth credits' that you earn at a steady rate. You can save up credits (up to the bucket limit) and spend them whenever you want. If you spend faster than you earn, you eventually run out and must wait. The long-term average spending rate is limited by the earning rate, but short-term bursts are allowed.
The mathematical analysis of token bucket reveals its powerful properties.
Core Parameters:

Let us define:

- r = token generation rate (bytes per second)
- b = bucket size (bytes), the maximum number of tokens that can accumulate
- R = peak (line) rate at which the output link can transmit, with R > r
- n(t) = tokens in the bucket at time t
- a(t) = cumulative bytes accepted as conforming by time t

Token Dynamics:
| Property | Formula | Interpretation |
|---|---|---|
| Token accumulation | dn/dt = r (when n < b) | Tokens accumulate at constant rate |
| Token bound | 0 ≤ n(t) ≤ b | Never negative, never exceeds bucket size |
| On transmission | n(t⁺) = n(t⁻) - packet_size | Transmitting consumes tokens |
| Conforming traffic | a(t) - a(s) ≤ b + r(t - s) | Arrival curve same as leaky bucket |
| Maximum burst | b bytes at rate R | Initial burst limited by bucket size |
Arrival Curve and Burst:
Token bucket with parameters (r, b) constrains conforming traffic to the arrival curve:
A(t) = b + r·t
This is identical to the leaky bucket! The difference is entirely in the output: the leaky bucket releases traffic at the constant rate r, while the token bucket can release accumulated bytes at the full line rate R.
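The arrival-curve bound can be checked empirically. A small simulation sketch (all names and numbers are illustrative): drive a token bucket with a greedy sender and verify that the conforming output between any two sample instants s < t never exceeds b + r·(t − s).

```python
def run_greedy(r, b, R, dt, steps):
    """Simulate a greedy sender limited by a token bucket (r, b).
    Returns cumulative conforming bytes sampled every dt seconds."""
    tokens = b            # bucket starts full
    sent = 0.0
    cumulative = [0.0]
    for _ in range(steps):
        tokens = min(b, tokens + r * dt)   # token accumulation, capped at b
        want = R * dt                      # greedy sender offers full line rate
        take = min(want, tokens)           # transmission limited by tokens
        tokens -= take
        sent += take
        cumulative.append(sent)
    return cumulative

r, b, R, dt = 125_000, 500_000, 12_500_000, 0.001
cum = run_greedy(r, b, R, dt, 2000)

# Conforming traffic obeys the arrival curve: a(t) - a(s) <= b + r*(t - s)
eps = 1e-6
for s in range(0, len(cum), 100):
    for t in range(s, len(cum), 100):
        assert cum[t] - cum[s] <= b + r * (t - s) * dt + eps
print("arrival curve respected")
```

No matter how the greedy sender bursts, the cumulative output never escapes the A(t) = b + r·t envelope.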
Burst Duration:
When bucket is full, a burst of b bytes can be transmitted at peak rate R:
Burst duration = b / (R - r)
After the burst, transmission rate falls to r (must wait for token accumulation).
Example Calculation: with r = 125 KB/s (1 Mbps), b = 500 KB, and R = 12.5 MB/s (100 Mbps), a full bucket sustains a peak-rate burst of 500 KB / (12.5 MB/s − 0.125 MB/s) ≈ 40 ms, during which roughly 505 KB are transmitted (500 KB of stored tokens plus ~5 KB generated during the burst itself).
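As a worked example (assuming r = 125 KB/s, b = 500 KB, and R = 100 Mbps, the figures used elsewhere on this page):

```python
# Worked burst-duration example (illustrative numbers:
# 1 Mbps token rate, 500 KB bucket, 100 Mbps line rate).
r = 125_000      # token rate, bytes/s   (1 Mbps)
b = 500_000      # bucket size, bytes
R = 12_500_000   # line (peak) rate, bytes/s  (100 Mbps)

burst_duration = b / (R - r)          # seconds the peak-rate burst can last
bytes_in_burst = R * burst_duration   # equals b + r * burst_duration

print(f"burst duration: {burst_duration * 1000:.1f} ms")   # → 40.4 ms
print(f"bytes sent in burst: {bytes_in_burst:,.0f}")       # → 505,051
```

Note that the burst carries slightly more than b bytes, because tokens keep arriving at rate r while the burst is in flight.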
Token bucket and leaky bucket enforce the same long-term average rate (r). The difference is entirely in short-term behavior: leaky bucket smooths everything to constant rate, while token bucket allows bursts up to size b. Both constrain traffic to the same arrival curve A(t) = b + rt.
Token bucket can be implemented efficiently using lazy calculation—tokens are computed on-demand rather than with a separate timer.
Algorithm (Pseudo-code):
```
Token Bucket:
    Parameters:
        bucket_size: maximum tokens (bytes)
        token_rate:  token generation rate (bytes per second)

    State:
        tokens:      current token count (starts at bucket_size)
        last_update: timestamp of last token calculation

    ConsumeTokens(packet_size, current_time):
        // Add tokens accumulated since last update
        elapsed = current_time - last_update
        tokens = min(bucket_size, tokens + token_rate * elapsed)
        last_update = current_time

        // Check if packet can be transmitted
        if tokens >= packet_size:
            tokens -= packet_size
            return CONFORMING       // Transmit immediately
        else:
            return NON_CONFORMING   // Wait, drop, or mark
```
Key Implementation Points:

- Lazy evaluation: tokens are computed from elapsed time on each packet arrival, so no timer or background thread is needed
- The min() with bucket_size enforces the burst cap b
- Token counts should use floating point (or high-resolution integers) to avoid rounding drift at high rates
- In multi-threaded code, the update-and-consume sequence must be atomic (a lock or compare-and-swap)
```python
import time
import threading
from enum import Enum


class TokenBucketResult(Enum):
    CONFORMING = "green"        # Within committed rate
    NON_CONFORMING = "red"      # Exceeds rate, no tokens


class TokenBucket:
    """
    Implements a Token Bucket traffic shaper/policer.

    Allows controlled bursting up to bucket size while
    enforcing average rate over time.
    """

    def __init__(
        self,
        bucket_size_bytes: int,
        token_rate_bps: float,
        initial_tokens: int = None
    ):
        """
        Initialize the token bucket.

        Args:
            bucket_size_bytes: Maximum bucket capacity (burst size)
            token_rate_bps: Rate of token generation (bytes per second)
            initial_tokens: Starting tokens (defaults to full bucket)
        """
        self.bucket_size = bucket_size_bytes
        self.token_rate = token_rate_bps
        self.tokens = initial_tokens if initial_tokens is not None else bucket_size_bytes
        self.last_update = time.time()
        self.lock = threading.Lock()

        # Statistics
        self.total_conforming = 0
        self.total_non_conforming = 0

    def _add_tokens(self, current_time: float):
        """Calculate and add tokens accumulated since last update."""
        elapsed = current_time - self.last_update
        if elapsed > 0:
            new_tokens = elapsed * self.token_rate
            self.tokens = min(self.bucket_size, self.tokens + new_tokens)
            self.last_update = current_time

    def consume(self, packet_size: int) -> TokenBucketResult:
        """
        Attempt to consume tokens for a packet.

        Args:
            packet_size: Size of the packet in bytes

        Returns:
            CONFORMING if packet can be sent,
            NON_CONFORMING if insufficient tokens
        """
        with self.lock:
            current_time = time.time()
            self._add_tokens(current_time)

            if self.tokens >= packet_size:
                self.tokens -= packet_size
                self.total_conforming += packet_size
                return TokenBucketResult.CONFORMING
            else:
                self.total_non_conforming += packet_size
                return TokenBucketResult.NON_CONFORMING

    def consume_blocking(self, packet_size: int, timeout: float = None) -> bool:
        """
        Wait for tokens if necessary, then consume.
        This implements traffic shaping (delay) rather than policing (drop).

        Args:
            packet_size: Size of the packet in bytes
            timeout: Maximum time to wait (None = wait forever)

        Returns:
            True if tokens consumed, False if timeout
        """
        start_time = time.time()

        while True:
            with self.lock:
                current_time = time.time()
                self._add_tokens(current_time)

                if self.tokens >= packet_size:
                    self.tokens -= packet_size
                    self.total_conforming += packet_size
                    return True

                # Calculate wait time for needed tokens
                tokens_needed = packet_size - self.tokens
                wait_time = tokens_needed / self.token_rate

            # Check timeout (outside the lock)
            if timeout is not None:
                elapsed = time.time() - start_time
                if elapsed + wait_time > timeout:
                    return False

            # Sleep outside the lock; cap at 100 ms so other callers stay responsive
            time.sleep(min(wait_time, 0.1))

    def get_stats(self) -> dict:
        """Return current statistics."""
        with self.lock:
            self._add_tokens(time.time())
            return {
                "current_tokens": self.tokens,
                "bucket_fill_pct": self.tokens / self.bucket_size * 100,
                "total_conforming": self.total_conforming,
                "total_non_conforming": self.total_non_conforming,
                "conformance_rate": (
                    self.total_conforming
                    / (self.total_conforming + self.total_non_conforming + 0.001)
                ),
            }


# Example usage
if __name__ == "__main__":
    # 100 KB burst, 10 Mbps average (1.25 MB/s)
    bucket = TokenBucket(
        bucket_size_bytes=100_000,
        token_rate_bps=1_250_000
    )

    print("=== Burst Test (bucket starts full) ===")
    # Immediate burst - should all succeed (100 KB in bucket)
    for i in range(10):
        result = bucket.consume(10_000)  # 10 KB packets
        print(f"Packet {i+1}: {result.value}, tokens: {bucket.tokens:.0f}")

    print("\n=== Continuing (bucket now empty) ===")
    # These should fail (no tokens left)
    for i in range(3):
        result = bucket.consume(10_000)
        print(f"Packet {i+1}: {result.value}")

    print("\n=== After 0.1s wait ===")
    time.sleep(0.1)  # Accumulate ~125 KB of tokens
    for i in range(5):
        result = bucket.consume(10_000)
        print(f"Packet {i+1}: {result.value}")

    print(f"\nStats: {bucket.get_stats()}")
```

The implementation above shows both approaches: consume() is non-blocking (returns immediately, caller handles non-conforming), while consume_blocking() implements shaping by waiting for tokens. Production systems often use non-blocking with a separate queue for waiting packets.
The Single Rate Three Color Marker (srTCM) defined in RFC 2697 extends basic token bucket to provide three-level conformance classification.
Structure:
Two buckets share a single token rate:

- Committed bucket (C): capacity CBS, fills at CIR
- Excess bucket (E): capacity EBS, fills only from C's overflow

Tokens flow into the committed bucket at rate CIR (Committed Information Rate). When the committed bucket is full, overflow tokens fill the excess bucket.
Three Colors:
| Color | Condition | Meaning | Typical Action |
|---|---|---|---|
| 🟢 Green | Packet size ≤ tokens in C bucket | Within committed rate | Forward unchanged |
| 🟡 Yellow | Packet size > C tokens, ≤ E tokens | Within excess burst | Forward, mark for potential drop |
| 🔴 Red | Packet size exceeds both C and E tokens | Exceeds all limits | Drop or mark lowest priority |
srTCM Algorithm:
```
srTCM Parameters:
    CIR: Committed Information Rate (tokens per second)
    CBS: Committed Burst Size (committed bucket capacity)
    EBS: Excess Burst Size (excess bucket capacity)

State:
    Tc: Tokens in committed bucket (0 to CBS)
    Te: Tokens in excess bucket (0 to EBS)
    last_update: Timestamp

On Packet Arrival:
    // Update tokens based on elapsed time
    tokens_to_add = CIR × elapsed_time

    // Fill committed bucket first
    previous_Tc = Tc
    Tc = min(CBS, Tc + tokens_to_add)

    // Overflow to excess bucket
    if Tc == CBS:
        overflow = tokens_to_add - (CBS - previous_Tc)
        Te = min(EBS, Te + overflow)

    // Classify packet (RFC 2697, color-blind mode)
    if packet_size <= Tc:
        Tc -= packet_size
        return GREEN
    else if packet_size <= Te:
        Te -= packet_size
        return YELLOW
    else:
        return RED
```
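The marker translates to a compact Python sketch. This is an illustrative, byte-granularity approximation of RFC 2697's color-blind marker; the class and method names are mine, not from any standard library.

```python
import time

GREEN, YELLOW, RED = "green", "yellow", "red"

class SrTCM:
    """Single Rate Three Color Marker sketch (RFC 2697, color-blind mode).
    Buckets start full; token arithmetic is done in bytes."""

    def __init__(self, cir, cbs, ebs):
        self.cir, self.cbs, self.ebs = cir, cbs, ebs
        self.tc, self.te = float(cbs), float(ebs)   # C and E buckets
        self.last = time.monotonic()

    def _refill(self, now):
        tokens = max(0.0, now - self.last) * self.cir
        self.last = now
        room_c = self.cbs - self.tc
        self.tc += min(tokens, room_c)                                  # C fills first
        self.te = min(self.ebs, self.te + max(0.0, tokens - room_c))    # overflow to E

    def mark(self, size, now=None):
        self._refill(time.monotonic() if now is None else now)
        if size <= self.tc:          # enough committed tokens
            self.tc -= size
            return GREEN
        if size <= self.te:          # enough excess tokens
            self.te -= size
            return YELLOW
        return RED                   # neither bucket suffices
```

For example, with CIR=1000 B/s, CBS=100 B, EBS=50 B and no elapsed time between packets, an 80-byte packet is green, a following 40-byte packet is yellow (E bucket), and a third 40-byte packet is red.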
Use Case Example:
Enterprise buys 100 Mbps committed service:
| Traffic Pattern | Classification | Result |
|---|---|---|
| Steady 80 Mbps | All green | Guaranteed delivery |
| Burst to 150 Mbps (short) | Green + yellow | Mostly delivered |
| Sustained 150 Mbps | Green + yellow + red | Guaranteed 100 Mbps only |
Network devices can operate in 'color-aware' mode (respecting incoming color markings) or 'color-blind' mode (re-classifying all traffic). Color-aware mode allows end-to-end differentiated service; color-blind mode enforces local policy regardless of upstream markings.
The Two Rate Three Color Marker (trTCM) defined in RFC 2698 provides more flexible traffic contracts by using two independent token rates.
Structure:
Two independent buckets with separate rates:

- Committed bucket (C): capacity CBS, fills at CIR
- Peak bucket (P): capacity PBS, fills at PIR (with PIR > CIR)

Each bucket fills at its own rate independently.

Difference from srTCM: instead of one rate with an overflow bucket, trTCM enforces two separate rates, so 'yellow' means 'between CIR and PIR' rather than 'spending saved excess burst'.
trTCM Algorithm:
```
trTCM Parameters:
    CIR: Committed Information Rate
    CBS: Committed Burst Size
    PIR: Peak Information Rate (PIR > CIR)
    PBS: Peak Burst Size

State:
    Tc: Tokens in committed bucket
    Tp: Tokens in peak bucket

On Packet Arrival:
    // Update both buckets independently
    Tc = min(CBS, Tc + CIR × elapsed_time)
    Tp = min(PBS, Tp + PIR × elapsed_time)

    // Classify (check peak bucket first!)
    if packet_size > Tp:
        return RED              // Exceeds peak rate
    else if packet_size > Tc:
        Tp -= packet_size
        return YELLOW           // Between CIR and PIR
    else:
        Tc -= packet_size
        Tp -= packet_size
        return GREEN            // Within committed rate
```
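A matching Python sketch of RFC 2698's color-blind marker (again illustrative, byte-granularity; names are mine):

```python
import time

GREEN, YELLOW, RED = "green", "yellow", "red"

class TrTCM:
    """Two Rate Three Color Marker sketch (RFC 2698, color-blind mode)."""

    def __init__(self, cir, cbs, pir, pbs):
        self.cir, self.cbs = cir, cbs
        self.pir, self.pbs = pir, pbs
        self.tc, self.tp = float(cbs), float(pbs)   # both buckets start full
        self.last = time.monotonic()

    def mark(self, size, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.last = now
        # The two buckets refill independently, each at its own rate.
        self.tc = min(self.cbs, self.tc + self.cir * elapsed)
        self.tp = min(self.pbs, self.tp + self.pir * elapsed)
        if size > self.tp:            # peak bucket is checked FIRST
            return RED
        if size > self.tc:            # within peak but above committed
            self.tp -= size
            return YELLOW
        self.tc -= size               # within committed: both buckets pay
        self.tp -= size
        return GREEN
```

Note that a green packet debits both buckets, so committed traffic also counts against the peak rate, exactly as the pseudocode above specifies.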
Key Differences in Behavior:
| Scenario | srTCM Result | trTCM Result |
|---|---|---|
| Short burst, then idle | Green (C bucket fills first) | Depends on burst size |
| Sustained traffic at 1.5×CIR | Yellow until E bucket drains, then red | Yellow (while < PIR) |
| Sustained traffic at 2×CIR | Red (no capacity) | Red (exceeds PIR) |
Use Case Example:

Consider a service plan with a guaranteed rate plus a burstable ceiling, say 100 Mbps committed (CIR) with bursting permitted up to 500 Mbps (PIR):

Customer behavior:

- Traffic at or below 100 Mbps: green, guaranteed delivery
- Traffic between 100 and 500 Mbps: yellow, delivered when spare capacity exists
- Traffic above 500 Mbps: red, dropped at the network edge
In trTCM, the peak bucket is checked FIRST. This is counter-intuitive but critical: if a packet exceeds peak rate, it's red regardless of committed bucket status. The committed bucket only distinguishes green from yellow for packets that pass the peak rate check.
Understanding when to use each algorithm is crucial for network design. Let's analyze their differences in depth.
Fundamental Difference:

The leaky bucket stores packets and releases them at a constant rate; the token bucket stores permission to send and lets packets through at line rate whenever permits are available.

This leads to dramatically different behavior:
| Aspect | Leaky Bucket | Token Bucket |
|---|---|---|
| What's in the bucket | Packets (data) | Tokens (permits) |
| Bucket fills with | Arriving packets | Time (tokens accumulate) |
| Bucket drains by | Constant rate transmission | Packet consumption |
| Maximum output rate | Exactly r (constant) | Line rate R (if tokens available) |
| Average output rate | ≤ r | ≤ r |
| Burst handling | Absorbed and smoothed | Transmitted immediately if tokens exist |
| After idle period | No pent-up capacity | Full bucket of credits to spend |
| Output burstiness | Zero (perfectly smooth) | Up to bucket size b |
| Work conserving | No | Yes (when tokens available) |
| Delay characteristic | Variable (queue-based) | Zero when tokens available |
Graphical Comparison:
Consider 1 MB of data arriving as a burst, with a token/leak rate of 125 KB/s (1 Mbps), a bucket size of 500 KB, and a line rate of 100 Mbps:
Leaky Bucket Output:

```
Time 0:      1 MB arrives
Time 0-8s:   Output at exactly 125 KB/s (8 seconds to transmit 1 MB)
Total time:  8 seconds
```

Token Bucket Output (starts with full bucket):

```
Time 0:       1 MB arrives, 500 KB of tokens available
Time 0-40ms:  Burst ~505 KB at 100 Mbps (500 KB stored tokens + ~5 KB generated)
Time 40ms+:   Remaining ~495 KB at 125 KB/s (~3.96 seconds)
Total time:   4 seconds
```

Token bucket is twice as fast for this burst scenario!
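The two completion times can be computed directly (same illustrative numbers: 125 KB/s average rate, 500 KB bucket, 100 Mbps line rate):

```python
# Completion time for a 1 MB burst under each shaper.
DATA = 1_000_000
r, b, R = 125_000, 500_000, 12_500_000   # bytes/s, bytes, bytes/s

leaky_time = DATA / r                    # everything drains at rate r

# Token bucket: the burst phase sends R * t_burst bytes at line rate,
# then the remainder drains at the token rate r.
t_burst = b / (R - r)
sent_in_burst = R * t_burst
token_time = t_burst + (DATA - sent_in_burst) / r

print(f"leaky bucket: {leaky_time:.2f} s")   # → 8.00 s
print(f"token bucket: {token_time:.2f} s")   # → 4.00 s
```

Algebraically the total simplifies to (DATA − b) / r, which is why the token bucket finishes in exactly half the time here: the bucket effectively gives the flow b bytes 'for free'.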
Use Leaky Bucket when you need constant output rate (feeding a fixed-rate channel, eliminating jitter). Use Token Bucket for everything else—it's more flexible, better for TCP performance, and what modern traffic shapers typically implement.
Real networks need more than simple rate limiting—they need to share bandwidth across multiple classes hierarchically. Hierarchical Token Bucket (HTB) extends token bucket concepts to create sophisticated bandwidth allocation trees.
HTB Structure:
```
                root (total bandwidth)
                         │
          ┌──────────────┼──────────────┐
          │              │              │
        Voice          Video       Best Effort
       (strict)      (10 Mbps)      (5 Mbps)
                         │
                 ┌───────┼───────┐
                 │       │       │
               User1   User2   User3
              3 Mbps  3 Mbps  4 Mbps
```
Key HTB Concepts:

- rate: the bandwidth guaranteed to a class
- ceil: the maximum bandwidth a class may reach by borrowing
- Classes form a tree; packets are enqueued only at leaf classes
- prio: among classes competing to borrow, higher priority is served first
Borrowing Mechanism:
HTB allows classes to 'borrow' unused bandwidth from siblings: a class sending below its rate lends its surplus to the parent, and a sibling that has exceeded its rate (but not its ceil) can borrow that surplus. Bandwidth is therefore never wasted while any class still has demand.
Example Configuration:
```shell
# Linux tc HTB configuration example

# Root qdisc
tc qdisc add dev eth0 root handle 1: htb default 30

# Root class (100 Mbps total)
tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit

# Voice class: 5 Mbps guaranteed, 10 Mbps ceiling, high priority
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit ceil 10mbit prio 0

# Video class: 30 Mbps guaranteed, 50 Mbps ceiling
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 30mbit ceil 50mbit prio 1

# Best effort: 10 Mbps guaranteed, 100 Mbps ceiling (uses leftovers)
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 10mbit ceil 100mbit prio 2

# Filters to classify traffic
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dport 5060 0xffff classid 1:10   # SIP -> voice class
```
Benefits of HTB:
| Feature | Benefit |
|---|---|
| Hierarchical structure | Matches organizational boundaries (departments, users) |
| Guaranteed rates | SLA compliance for critical traffic |
| Ceiling rates | Allows bursting into unused capacity |
| Priority-based sharing | High-priority traffic gets excess first |
| Work conservation | No bandwidth wasted when classes not using share |
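The borrowing idea can be illustrated with a deliberately simplified allocation round. This is a toy model, not the real HTB algorithm (which operates per-packet with priorities and quanta); the class names and rate/ceil numbers mirror the tc example above, and the borrowing pass simply hands out leftovers in order.

```python
def allocate(total, classes, demand):
    """One round of HTB-style sharing (toy model):
    every class first receives min(demand, guaranteed rate); leftover
    parent bandwidth is then lent out, capped by each class's ceil."""
    grant = {}
    spare = total
    for name, (rate, ceil) in classes.items():
        g = min(demand[name], rate)          # guaranteed share
        grant[name] = g
        spare -= g
    for name, (rate, ceil) in classes.items():   # borrowing pass
        extra = min(demand[name] - grant[name],  # unmet demand
                    ceil - grant[name],          # ceiling headroom
                    max(spare, 0))               # parent's leftovers
        grant[name] += extra
        spare -= extra
    return grant

classes = {"voice": (5, 10), "video": (30, 50), "best": (10, 100)}  # (rate, ceil) Mbps
demand  = {"voice": 2, "video": 45, "best": 80}
print(allocate(100, classes, demand))  # → {'voice': 2, 'video': 45, 'best': 53}
```

Video borrows 15 Mbps beyond its guarantee (staying under its 50 Mbps ceil), and best effort absorbs the remaining 43 Mbps: the full 100 Mbps is used, which is the work-conservation property in the table above.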
HTB is one of the most widely used queuing disciplines in Linux networking. It's the foundation for ISP customer bandwidth management, enterprise QoS, container bandwidth limits, and VM traffic shaping. Understanding HTB principles is essential for network engineering at scale.
Token bucket is ubiquitous in modern networking and software systems. Let's examine key applications:
1. API Rate Limiting:
Virtually every web API uses token bucket for rate limiting:
```python
import time
from redis import Redis


class RedisTokenBucket:
    """
    Distributed token bucket using Redis for API rate limiting.
    Thread-safe and works across multiple servers.
    """

    def __init__(
        self,
        redis: Redis,
        key_prefix: str,
        rate_per_second: float,
        bucket_size: int
    ):
        self.redis = redis
        self.key_prefix = key_prefix
        self.rate = rate_per_second
        self.bucket_size = bucket_size

    def allow(self, key: str, tokens: int = 1) -> tuple[bool, dict]:
        """
        Check if request is allowed and consume tokens.
        Uses a Redis Lua script for atomic operation.
        """
        now = time.time()
        bucket_key = f"{self.key_prefix}:{key}"

        # Lua script for atomic token bucket operation
        lua_script = """
        local tokens_key = KEYS[1]
        local timestamp_key = KEYS[2]
        local rate = tonumber(ARGV[1])
        local bucket_size = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])
        local requested = tonumber(ARGV[4])

        -- Get current state
        local tokens = tonumber(redis.call('get', tokens_key) or bucket_size)
        local last_update = tonumber(redis.call('get', timestamp_key) or now)

        -- Add tokens based on elapsed time
        local elapsed = math.max(0, now - last_update)
        local new_tokens = math.min(bucket_size, tokens + (elapsed * rate))

        -- Check and consume
        local allowed = 0
        if new_tokens >= requested then
            new_tokens = new_tokens - requested
            allowed = 1
        end

        -- Update state
        redis.call('set', tokens_key, new_tokens)
        redis.call('set', timestamp_key, now)
        redis.call('expire', tokens_key, 3600)
        redis.call('expire', timestamp_key, 3600)

        return {allowed, new_tokens, bucket_size - new_tokens}
        """

        result = self.redis.eval(
            lua_script,
            2,  # Number of keys
            f"{bucket_key}:tokens",
            f"{bucket_key}:timestamp",
            self.rate,
            self.bucket_size,
            now,
            tokens
        )

        allowed, remaining, deficit = result
        return bool(allowed), {
            "remaining": int(remaining),
            "limit": self.bucket_size,
            "retry_after": deficit / self.rate if not allowed else 0
        }


# Example usage for API
if __name__ == "__main__":
    redis_client = Redis(host='localhost', port=6379)

    # 100 requests per second, 500 request burst
    limiter = RedisTokenBucket(
        redis=redis_client,
        key_prefix="api:ratelimit",
        rate_per_second=100,
        bucket_size=500
    )

    # Simulate API calls
    user_id = "user_12345"
    for i in range(10):
        allowed, info = limiter.allow(user_id)
        print(f"Request {i+1}: {'Allowed' if allowed else 'Rate Limited'}, {info}")
```

When you see HTTP headers like X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset, you're seeing token bucket state exposed. These headers tell clients how many 'tokens' remain (Remaining), the bucket size (Limit), and when tokens refill (Reset).
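That header convention can be sketched with a small in-memory limiter. The header names follow the common X-RateLimit-* pattern, which varies between APIs; the class itself is illustrative, not any particular framework's API.

```python
import time

class HeaderLimiter:
    """In-memory token bucket that reports its state in the conventional
    X-RateLimit-* header style."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def check(self, cost=1):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        allowed = self.tokens >= cost
        if allowed:
            self.tokens -= cost
        # Seconds until the bucket is full again (one common 'Reset' convention)
        reset = (self.burst - self.tokens) / self.rate
        headers = {
            "X-RateLimit-Limit": str(self.burst),
            "X-RateLimit-Remaining": str(int(self.tokens)),
            "X-RateLimit-Reset": f"{reset:.2f}",
        }
        return allowed, headers

limiter = HeaderLimiter(rate=100, burst=500)
allowed, headers = limiter.check()
print(allowed, headers)
```

A web framework would attach these headers to every response, so well-behaved clients can pace themselves instead of hammering the limit.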
We've comprehensively explored the Token Bucket Algorithm—the dominant traffic shaping mechanism in modern networking due to its flexibility, efficiency, and TCP-friendly behavior.
What's Next:
With the two fundamental algorithms mastered—leaky bucket and token bucket—we'll now explore Rate Limiting patterns in the next page. You'll learn how these algorithms are applied to build practical rate limiting systems for APIs, services, and network protection.
You now have comprehensive understanding of the Token Bucket Algorithm—its model, mathematics, implementation, single/dual rate variants, hierarchical extensions, and widespread applications. This knowledge is directly applicable to designing and configuring traffic management systems.