Rate Limiting and Throttling

Level: Advanced

Duration: 90 mins

Topic: Rate Limiting and Throttling

Protection Against Abuse

The Unseen Battlefield

Every second of every day, your production systems face a relentless barrage of requests. Most are legitimate—users navigating your application, services communicating, integrations pulling data. But hidden within this traffic flows a darker current: automated scripts probing for weaknesses, credential-stuffing bots testing stolen passwords, scrapers harvesting your data, and malicious actors attempting to overwhelm your infrastructure.

Rate limiting is your first line of defense—a mechanism that governs how frequently clients can make requests to your systems. Far from being a simple traffic management technique, rate limiting is a sophisticated security control that protects availability, prevents abuse, ensures fair resource allocation, and maintains service quality under adversarial conditions.

What You Will Learn

By the end of this page, you will understand why rate limiting is a non-negotiable security requirement, the comprehensive threat landscape it addresses, the various abuse vectors you must defend against, and the foundational principles that guide effective rate limiting design in production systems.

Why Rate Limiting Is Non-Negotiable

In the early days of web development, rate limiting was often treated as an afterthought—something to add when traffic became problematic. Today, this approach is not merely inadequate; it's dangerous. Modern distributed systems face a threat landscape that demands proactive, architectural-level protection.

The fundamental reality: Any system exposed to untrusted clients—whether public APIs, web applications, or internal services accessible to multiple teams—will eventually face abuse. The question isn't if but when and how severe.

Core Reasons Rate Limiting Is Essential

•Availability Protection — Without rate limiting, a single misbehaving client can consume resources needed by thousands of legitimate users. A runaway script hitting your API 10,000 times per second can degrade service for everyone.
•Cost Control — Cloud resources cost money. Every CPU cycle, every byte of bandwidth, every database query has a price. Unbounded requests translate to unbounded costs, and attackers don't pay your cloud bills.
•Security Defense — Many attack vectors rely on high request volumes: brute-force authentication, credential stuffing, enumeration attacks, and application-level DoS. Rate limiting is a critical mitigation.
•Fair Resource Allocation — In multi-tenant systems, one customer's aggressive usage shouldn't impact others. Rate limiting ensures resource fairness across your customer base.
•Service Quality Maintenance — Even legitimate traffic spikes can overwhelm systems. Rate limiting provides backpressure that maintains response times for requests that are accepted.
•Operational Stability — Predictable load patterns are essential for capacity planning, alerting, and on-call sustainability. Rate limiting creates that predictability.

The Cost of Inaction

Organizations that delay implementing rate limiting often learn its importance through painful incidents: massive AWS bills from scraper attacks, complete service outages from credential-stuffing floods, or degraded user experience that drives customers to competitors. The cost of implementing rate limiting is measured in engineering hours; the cost of not implementing it is measured in revenue, reputation, and recovery time.

The Threat Landscape

Understanding the threats you face is prerequisite to designing effective defenses. The threat landscape for any internet-facing service is vast and continuously evolving, but several categories of abuse consistently appear across industries and system types.

The spectrum of attackers ranges from naive scripts to sophisticated, well-resourced adversaries. Your rate limiting strategy must account for this entire spectrum—simple defenses stop simple attacks, but you also need depth to resist determined adversaries.

Common Threat Categories and Their Characteristics
Threat Category	Attacker Goal	Request Pattern	Rate Limiting Response
Credential Stuffing	Test stolen username/password pairs	High-volume login attempts from distributed IPs	Strict authentication endpoint limits, account-level tracking
Brute Force Attacks	Guess secrets (passwords, tokens, codes)	Sequential attempts against single account/resource	Per-account limits with exponential backoff
Web Scraping	Extract data for resale or competition	Systematic crawling of product/content pages	Page-level limits, bot detection integration
API Abuse	Exceed free tier, bypass quotas	Sustained requests at the maximum allowed rate	Tiered limits based on subscription/trust level
Enumeration Attacks	Discover valid users, resources, or IDs	Probe endpoints with variations	Per-endpoint limits, behavioral analysis
Application DoS	Exhaust server resources	Expensive operations at high volume	Operation-specific limits, computational cost tracking
Inventory Hoarding	Reserve items without purchasing	Add-to-cart flooding during sales	Session-based limits, action-specific throttling
Competitive Intelligence	Monitor pricing, availability	Periodic polling of public endpoints	Aggregate limits, detection and blocking

The evolution of attack sophistication:

Modern attackers have adapted to basic rate limiting. They understand IP-based limits and work around them through:

Residential proxy networks — Millions of IP addresses from compromised home routers and IoT devices
Cloud provider IP rotation — Scripted creation of cloud instances to obtain fresh IPs
Botnet distribution — Thousands of infected machines each sending a few requests
Human CAPTCHA farms — Low-wage workers solving challenges that should stop bots
Browser automation — Selenium, Puppeteer, and Playwright simulating real browsers

This sophistication doesn't make rate limiting useless—it makes layered, intelligent rate limiting essential.

Defense in Depth

Rate limiting is most effective as one layer in a defense-in-depth strategy. Combine it with bot detection, CAPTCHA challenges, behavioral analysis, device fingerprinting, and authentication controls. No single mechanism stops all attacks, but layers compound to create robust protection.

Abuse Vectors in Detail

Let's examine the most prevalent abuse vectors in detail, understanding their mechanics, impact, and how rate limiting specifically addresses each.

Credential stuffing is the automated injection of stolen username/password pairs into login forms. Attackers obtain these credentials from data breaches—billions of credentials are available on dark web marketplaces—and test them against other services, exploiting password reuse.

Scale of the problem: A single attacker might test millions of credential pairs per day across thousands of targets. Success rates of 0.1-2% might seem low, but against millions of attempts, this yields thousands of compromised accounts.
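The arithmetic behind that claim is easy to check. The sketch below uses a hypothetical helper, `expected_compromises`, and an assumed volume of two million attempts; only the 0.1-2% success range comes from the text above.

```python
def expected_compromises(attempts: int, success_rate: float) -> int:
    """Expected number of working logins found (illustrative helper, not a library API)."""
    return round(attempts * success_rate)


attempts = 2_000_000  # assumed attempt count, chosen only for illustration
low = expected_compromises(attempts, 0.001)  # 0.1% success rate
high = expected_compromises(attempts, 0.02)  # 2% success rate
print(f"{low:,} to {high:,} compromised accounts")
```

Even at the low end of the range, the yield is in the thousands, which is why slowing the attempt rate matters more than blocking any individual request.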

Request patterns:

High volume of login attempts
Each attempt uses different credentials
Often distributed across many source IPs
Attempts may be slow and stealthy or fast and obvious
Target is typically the /login, /auth, or /token endpoint

Rate limiting countermeasures:

Per-IP limits on authentication endpoints — Limit login attempts from each IP (e.g., 10/minute)
Per-account limits — Track failed attempts per username regardless of source IP
Global endpoint limits — Cap total authentication throughput during attacks
Progressive delays — Increase response time after repeated failures
Integration with CAPTCHA — Trigger challenges as a client approaches its limit
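A minimal sketch of how the per-IP limit, the per-account limit, and progressive delays might fit together is shown below. It assumes a single process with in-memory state, and the class name `LoginAttemptLimiter`, its method names, and the default thresholds are all hypothetical; a production system would typically keep this state in a shared store and evict idle keys.

```python
from collections import defaultdict, deque


class LoginAttemptLimiter:
    """Sliding-window limits on logins, tracked per source IP and per username."""

    def __init__(self, per_ip=10, per_account=5, window_s=60.0):
        self.per_ip = per_ip
        self.per_account = per_account
        self.window_s = window_s
        self._by_ip = defaultdict(deque)       # ip -> timestamps of allowed attempts
        self._by_account = defaultdict(deque)  # username -> timestamps of failures

    def _prune(self, timestamps, now):
        # Drop entries that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_s:
            timestamps.popleft()

    def allow(self, ip, username, now):
        """Return True if this login attempt may proceed."""
        ip_q, account_q = self._by_ip[ip], self._by_account[username]
        self._prune(ip_q, now)
        self._prune(account_q, now)
        if len(ip_q) >= self.per_ip or len(account_q) >= self.per_account:
            return False
        ip_q.append(now)
        return True

    def record_failure(self, username, now):
        """Count a failed login against the account, whatever IP it came from."""
        self._by_account[username].append(now)

    def delay_for(self, username, now, base_s=0.5, cap_s=30.0):
        """Progressive delay: doubles with each recent failure, up to a cap."""
        failures = self._by_account[username]
        self._prune(failures, now)
        if not failures:
            return 0.0
        return min(cap_s, base_s * 2 ** (len(failures) - 1))
```

Because failures are counted per username, an attacker who rotates through many IPs still runs into the account limit, which is the case a per-IP limit alone misses.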

The Economic Model of Abuse

Understanding the economics of abuse is crucial for designing effective rate limits. Attackers operate under constraints just like any other business—they seek maximum return for minimum investment. Your goal is to shift these economics unfavorably.

The attacker's cost-benefit calculation:

Attackers consider: time to achieve goal, infrastructure costs (IPs, compute, proxies), success probability, value of successful attack, and risk of detection/consequences. Effective rate limiting increases time and infrastructure costs while reducing success probability.

Attacker Costs You Increase

•Time cost — Delays and limits extend attack duration from minutes to days or weeks
•IP infrastructure — Forcing IP rotation requires proxy services ($50-500+/month for quality residential proxies)
•Compute resources — Browser automation to bypass detection is far more expensive than simple HTTP requests
•Human labor — Triggering CAPTCHAs forces use of solving services (~$2-3 per 1000 challenges)
•Complexity — Layered defenses require sophisticated tooling and expertise

Attacker Benefits You Reduce

•Success rate — Limits reduce how many attempts succeed within time constraints
•Data freshness — Scrapers get stale data if limited to weekly instead of real-time access
•Attack scalability — What works against one target may not scale to many
•Account value — Limits on what compromised accounts can do reduce the incentive to attack
•Anonymity — Detection and logging increase the risk of consequences

The defender's efficiency:

Rate limiting is highly asymmetric in defenders' favor:

Defender	Attacker
Implemented once, then tuned periodically	Ongoing cat-and-mouse
Marginal cost per request ≈ $0	$0.001-0.01+ per request through proxies
Legitimate users rarely hit limits	All traffic pays the cost
Scales naturally with infrastructure	Must scale attack infrastructure

The key insight: even imperfect rate limiting dramatically changes attack economics. You don't need to stop every request—you need to make attacks economically unviable.

The Tipping Point

For many attack types, there's a rate threshold below which the attack becomes impractical. Credential stuffing at 1 request/minute/IP might take weeks to test a meaningful credential set—long enough for credentials to be rotated and the attacker's infrastructure to be detected and blocked. Your goal is to find and enforce that tipping point.
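To make the tipping point concrete, the following sketch estimates how long an attack takes and what it costs in proxy fees. The one-million-credential set and the IP pool sizes are assumptions chosen for illustration, the function names are hypothetical, and the per-request price is the low end of the range quoted in the table above.

```python
def attack_duration_days(credentials: int, ips: int, per_ip_per_minute: float) -> float:
    """Days needed to test every credential when each IP is held to the limit."""
    attempts_per_day = ips * per_ip_per_minute * 60 * 24
    return credentials / attempts_per_day


def proxy_cost_usd(credentials: int, cost_per_request: float) -> float:
    """Proxy spend if every attempt is routed through a paid proxy."""
    return credentials * cost_per_request


credentials = 1_000_000  # assumed size of the stolen credential set
for ips in (1, 100, 10_000):  # assumed attacker IP pool sizes
    days = attack_duration_days(credentials, ips, per_ip_per_minute=1)
    print(f"{ips:>6} IPs: {days:8.2f} days")

print(f"proxy cost at $0.001/request: ${proxy_cost_usd(credentials, 0.001):,.0f}")
```

The same numbers also show the limit of per-IP enforcement: the duration shrinks in proportion to the size of the IP pool, so a large enough pool brings the attack back under a day unless account-level or aggregate limits are in place as well.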

Principles of Effective Rate Limiting

Before diving into algorithms and implementations, let's establish the principles that guide effective rate limiting design. These principles hold true regardless of which specific algorithm or architecture you choose.

Foundational Principles

•Principle 1: Limit at Multiple Layers — Apply limits at network edge (WAF/CDN), load balancer, API gateway, and application layers. Each layer catches different attack patterns and provides defense even if other layers fail.
•Principle 2: Granular Over Global — A single global limit is a blunt tool: set high, it constrains no individual abuser; set low, one abuser can exhaust it and lock out everyone else. Limit per IP, per user, per account, per API key, per endpoint, and per action type. The right dimension depends on the threat.
•Principle 3: Different Limits for Different Endpoints — Not all endpoints are equal. Login attempts need strict limits (5/minute), while reading public content might allow 100/minute. Authentication, payment, and admin endpoints always need the strictest limits.
•Principle 4: Consider the Legitimate Maximum — Set limits based on what legitimate users actually need, with reasonable headroom. If your heaviest legitimate users make 60 requests/minute, a limit of 100/minute provides buffer while still constraining abuse.
•Principle 5: Fail Closed with Grace — When a request is rate limited, reject it with a clear, helpful response, such as HTTP 429 Too Many Requests. Include retry timing when possible, for example in a Retry-After header. Never silently drop requests; clients need feedback to adjust behavior.
•Principle 6: Monitor and Adjust — Rate limits aren't set-and-forget. Monitor limit hits, analyze patterns, adjust thresholds based on real traffic, and iterate continuously.
•Principle 7: Distinguish by Trust Level — Authenticated users might get higher limits than anonymous. Verified accounts higher than new. Paid tiers higher than free. Enterprise customers might have custom limits.
•Principle 8: Plan for Distributed Attacks — Assume attackers control many IPs. Limits that only work against single-source attacks provide false security. Consider aggregate limits and behavioral detection.
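Principles 3, 5, and 7 can be sketched together: a lookup table of limits by trust level and endpoint class, plus a rejection that tells the client when to retry. The table layout, the names `LIMITS`, `limit_for`, `Rejection`, and `too_many_requests`, and every value other than the 5/minute login and 100/minute read figures are assumptions for illustration, not recommended settings.

```python
import math
from dataclasses import dataclass

# Requests per minute by trust level and endpoint class (illustrative values only).
LIMITS = {
    "anonymous": {"login": 5, "read": 100},
    "authenticated": {"login": 5, "read": 300},
    "paid": {"login": 5, "read": 1000},
}


@dataclass
class Rejection:
    status: int
    headers: dict
    body: dict


def limit_for(trust_level: str, endpoint_class: str) -> int:
    """Look up the per-minute limit for a caller and endpoint class."""
    return LIMITS[trust_level][endpoint_class]


def too_many_requests(limit: int, window_resets_in_s: float) -> Rejection:
    """Build an HTTP 429 response that includes retry timing."""
    retry_after = max(1, math.ceil(window_resets_in_s))
    return Rejection(
        status=429,
        headers={"Retry-After": str(retry_after)},  # whole seconds until retry
        body={
            "error": "rate_limited",
            "limit_per_minute": limit,
            "retry_after_s": retry_after,
        },
    )
```

Note that the login limit stays the same across trust levels in this sketch: a higher tier earns more read throughput, not more password guesses.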

The Art of Limit Setting

Setting rate limits is as much art as science. Start conservative (lower limits), monitor aggressively, and adjust based on legitimate user impact. It's easier to relax limits for users who complain than to recover from an attack that exploited limits that were too permissive.

Rate Limiting Dimensions

Effective rate limiting requires tracking requests across multiple dimensions. Understanding which dimension to limit is often more important than the specific limit value.

Rate Limiting Dimensions and Their Applications
Dimension	Use Case	Strengths	Weaknesses
Source IP	General abuse prevention	Simple; needs no client identity beyond the IP address	Bypassable with IP rotation; punishes users behind NAT/proxy
API Key	Developer/integration limits	Precise accountability, revocable	Requires authentication; shared keys are problematic
User Account	Per-user fairness	Follows user across IPs/devices	Requires authentication; doesn't limit unauthenticated abuse
Session	Anonymous user tracking	Works without login	Sessions can be discarded and recreated
Endpoint/Route	Protect expensive operations	Granular risk-based protection	Requires endpoint classification; complexity grows
Geographical Region	Block high-risk regions	Simple, effective for regional attacks	Collateral damage; sophisticated attackers use VPNs
User Agent/Device	Bot detection integration	Catches simple bots	Trivially spoofable; mostly useful as one signal
Combination (composite key)	High-precision limiting	Catches distributed attacks	Requires more state; higher complexity

Composite keys for precision:

The most effective rate limiting often combines dimensions. Examples:

IP + Endpoint: 10 login attempts per IP per minute (slows credential stuffing from any single IP while allowing normal browsing)
Account + Action Type: 3 password changes per account per day (prevents account takeover via password cycling)
API Key + Endpoint Class: Premium endpoints have per-key weekly limits
IP + Geographic Anomaly: Stricter limits for IPs in regions where the account has never logged in

The choice of dimensions should be driven by your threat model. For authentication abuse, account-level limits are essential since attackers don't care which IP they use to compromise your account.
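One way to picture a composite key is as a tuple of dimension values that indexes a counter. The sketch below uses a fixed-window counter purely for brevity; `FixedWindowLimiter` is a hypothetical in-memory class, and the key layout is an assumption rather than a prescribed format.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Counts requests per composite key within fixed time windows."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        # (dimension values..., window index) -> request count.
        # Old windows are never evicted here; a real store would expire them.
        self._counts = defaultdict(int)

    def allow(self, key_parts: tuple, now: float) -> bool:
        window = int(now // self.window_s)
        key = (*key_parts, window)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


# IP + Endpoint: 10 login attempts per IP per minute.
login_by_ip = FixedWindowLimiter(limit=10, window_s=60)
# Account + Action Type: 3 password changes per account per day.
password_changes = FixedWindowLimiter(limit=3, window_s=24 * 60 * 60)

print(login_by_ip.allow(("203.0.113.7", "POST /login"), now=0.0))
print(password_changes.allow(("account-42", "password_change"), now=0.0))
```

Each limiter has its own limit and window, so adding a dimension means adding a key component rather than changing the counting logic.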

The NAT Problem

Many users share IP addresses: corporate networks, universities, mobile carriers using Carrier-Grade NAT, coffee shops. An aggressive IP-based limit of 10 requests/minute might impact hundreds of legitimate users behind a corporate proxy. Always consider this when setting IP-based limits—you may need higher IP limits combined with other dimensions for precision.

The Defense Hierarchy

Rate limiting exists within a broader defense hierarchy. Understanding where it fits helps you deploy it effectively and know when to rely on other mechanisms.

Defense layer responsibilities:

Network edge (CDN/WAF): Block volumetric attacks, known bad IPs, and obvious bot patterns before traffic reaches your infrastructure
Load balancer: Enforce connection limits, distribute load, and provide basic request rate limiting
API Gateway: Primary rate limiting enforcement with sophisticated algorithms and multi-dimensional limits
Authentication layer: Apply user-level and account-level limits post-authentication
Business logic: Action-specific limits (e.g., max 3 checkout attempts per session, max 10 friend requests per day)

Why layer? Because attackers target weaknesses at all levels. A network-layer defense won't catch application-layer abuse. A business logic limit won't help if the attacker is overwhelming your database with unauthenticated requests. Each layer focuses on what it can see best.
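The idea that each layer sees something different can be sketched as an ordered chain of checks where the first rejection wins. Everything here is hypothetical: the `Request` shape, the blocked IP, the layer checks, and the per-user checkout counter standing in for the per-session limit mentioned above.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Request(NamedTuple):
    ip: str
    user_id: Optional[str]
    action: str


Check = Callable[[Request], bool]  # returns True when the request may pass

BLOCKED_IPS = {"198.51.100.23"}     # assumed edge blocklist
checkout_attempts = {"user-1": 3}   # assumed counter kept by the business layer

LAYERS: List[Tuple[str, Check]] = [
    ("network edge", lambda r: r.ip not in BLOCKED_IPS),
    ("authentication layer", lambda r: r.action != "checkout" or r.user_id is not None),
    ("business logic", lambda r: r.action != "checkout"
        or checkout_attempts.get(r.user_id, 0) < 3),
]


def first_rejecting_layer(request: Request, layers=LAYERS) -> Optional[str]:
    """Return the name of the first layer that rejects, or None if all pass."""
    for name, check in layers:
        if not check(request):
            return name
    return None


print(first_rejecting_layer(Request("198.51.100.23", "user-2", "browse")))
print(first_rejecting_layer(Request("203.0.113.7", "user-1", "checkout")))
```

The edge check knows nothing about checkout counts, and the business check never sees traffic the edge has already dropped, which is the division of labor the layer list describes.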

The Cost of State

Each rate limiting layer that tracks state consumes memory and may require distributed coordination. A WAF tracking millions of IPs, an API gateway tracking per-key limits, and a business layer tracking per-user actions all have storage and synchronization costs. Design your layers thoughtfully—not every layer needs full-precision tracking.
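A rough back-of-the-envelope comparison shows why precision has a price. The sketch below counts only raw payload bytes and ignores key storage and data-structure overhead; the 8-byte sizes, the one million tracked keys, and the function names are assumptions for illustration.

```python
def sliding_log_bytes(keys: int, max_requests_per_window: int,
                      bytes_per_timestamp: int = 8) -> int:
    """Full precision: one stored timestamp per request, up to the limit, per key."""
    return keys * max_requests_per_window * bytes_per_timestamp


def window_counter_bytes(keys: int, bytes_per_counter: int = 8,
                         bytes_per_window_start: int = 8) -> int:
    """Reduced precision: one counter and one window start time per key."""
    return keys * (bytes_per_counter + bytes_per_window_start)


keys = 1_000_000  # assumed number of tracked IPs
log_mb = sliding_log_bytes(keys, max_requests_per_window=100) / 1_000_000
counter_mb = window_counter_bytes(keys) / 1_000_000
print(f"per-request log: {log_mb:.0f} MB, per-key counter: {counter_mb:.0f} MB")
```

Under these assumptions the counter needs a small fraction of the memory, which is one reason a high-volume layer such as the edge might use coarse counters while a lower-volume layer keeps exact records.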

Summary: Protection Against Abuse

We've established why rate limiting is a critical security control and explored the threat landscape it addresses. Let's consolidate the key insights:

Key Takeaways

•Rate limiting is non-negotiable — Every internet-facing system faces abuse. The question is when, not if.
•The threat landscape is diverse — Credential stuffing, brute force, scraping, application DoS, and more require different limiting strategies.
•Economics drive attackers — Effective rate limiting makes attacks economically unviable, not technically impossible.
•Multiple dimensions are essential — IP-only limiting is insufficient. Combine IP, user, endpoint, and action-type limits.
•Layered defense wins — Rate limit at the network edge, load balancer, API gateway, authentication, and business logic layers.
•Tune to legitimate use — Set limits based on what real users need, with margin for spikes.
•Sophisticated attackers adapt — Expect distributed attacks, proxy networks, and evasion. Plan accordingly.
•Monitor and iterate — Rate limits require ongoing tuning based on traffic patterns and attack evolution.

What's next:

Now that we understand why rate limiting is essential and the threats it addresses, we'll explore the algorithms that power rate limiting implementations. The next page covers Token Bucket, Leaky Bucket, Fixed Window, Sliding Window Log, and Sliding Window Counter—understanding their mechanics, trade-offs, and optimal use cases.

Page Complete

You now understand the foundational case for rate limiting as a security control. It's not merely a performance optimization—it's a critical defense mechanism that protects availability, controls costs, prevents abuse, and ensures fair resource allocation. Next, we'll dive into the algorithms that make rate limiting work.
