In the age of social media, where every character counts and attention spans are measured in milliseconds, URL shorteners have become critical infrastructure for the internet. Services like Bitly, TinyURL, and Twitter's t.co handle billions of redirects daily, powering everything from marketing campaigns to tweet embeds.
What appears to be a trivially simple service—converting a long URL into a short one—is actually a fascinating system design challenge that touches on nearly every core distributed systems concept: high-availability storage, extreme read scalability, low-latency serving, analytics at scale, and security considerations.
In this module, we'll design a URL shortening service capable of handling 100 million URL creations per day and 1 billion redirects per day—a scale that requires careful architectural thinking from the ground up.
By the end of this page, you will understand the complete functional and non-functional requirements for a URL shortener at scale. You'll learn to systematically decompose the problem, identify the three core operations (shorten, redirect, analytics), and establish the constraints that will drive all subsequent design decisions.
Before diving into requirements, let's thoroughly understand what a URL shortener actually does and why it exists. A URL shortener is a service that:
- Accepts a long URL (e.g., https://www.example.com/products/category/electronics/laptops?brand=dell&sort=price&page=12)
- Returns a short URL (e.g., https://short.url/a7Xk2B)

Why do URL shorteners exist?

- Branded short domains (e.g., amzn.to, youtu.be) reinforce brand identity while providing URL shortening.

Bitly processes over 10 billion clicks per month. Twitter's t.co wraps every link in every tweet, at a volume of hundreds of millions of tweets per day. TinyURL has been running since 2002 with billions of redirects. These services demonstrate that URL shortening is serious infrastructure, not a toy project.
Functional requirements define what the system does—the features and capabilities it must provide. For a URL shortener, we identify three core operations plus several supporting features.
The primary function is accepting a long URL and returning a short one. Let's decompose this thoroughly:
| Feature Aspect | Requirements | Design Implications |
|---|---|---|
| Input Validation | Accept valid HTTP/HTTPS URLs; reject malformed URLs, local addresses, forbidden domains | Need URL parsing, validation logic, blocklist management |
| Short Code Generation | Generate unique 6-8 character alphanumeric codes | Need encoding scheme, collision handling, uniqueness guarantees |
| Custom Aliases | Allow users to request specific short codes (premium feature) | Need availability checking, reservation system, alias validation |
| Expiration Support | Optional TTL (time-to-live) for short URLs | Need expiration tracking, cleanup jobs, expiration metadata storage |
| User Association | Link short URLs to user accounts (for management) | Need authentication, user-URL mapping, access control |
| Duplicate Detection | Same long URL → same short code (or new one based on policy) | Need efficient lookup by long URL, policy configuration |
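As a minimal sketch of the input-validation row above, the following TypeScript helper parses a candidate URL with the standard `URL` class and rejects malformed input, non-HTTP schemes, obviously local addresses, and blocklisted hosts. The names `validateLongUrl`, `BLOCKED_HOSTS`, and `MAX_URL_LENGTH`, the placeholder blocklist entry, and the 2,048-character limit are illustrative assumptions, not part of the design itself.

```typescript
// Hypothetical validation helper; rules and limits are illustrative assumptions.
type ValidationResult =
  | { ok: true; normalizedUrl: string }
  | { ok: false; reason: string };

const BLOCKED_HOSTS = new Set<string>(["malware.example"]); // placeholder blocklist
const MAX_URL_LENGTH = 2048; // assumed upper bound on accepted URL length

// Matches only the most obvious local targets. A production check would also
// cover IPv6, the remaining private ranges, and hosts that resolve to them.
function isLocalHost(hostname: string): boolean {
  return (
    hostname === "localhost" ||
    /^(127|10)\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(hostname) ||
    /^192\.168\.\d{1,3}\.\d{1,3}$/.test(hostname)
  );
}

function validateLongUrl(input: string): ValidationResult {
  if (input.length > MAX_URL_LENGTH) {
    return { ok: false, reason: "too long" };
  }
  let parsed: URL;
  try {
    parsed = new URL(input); // throws on malformed input
  } catch {
    return { ok: false, reason: "malformed URL" };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, reason: "unsupported scheme" };
  }
  if (isLocalHost(parsed.hostname)) {
    return { ok: false, reason: "local address" };
  }
  if (BLOCKED_HOSTS.has(parsed.hostname)) {
    return { ok: false, reason: "forbidden domain" };
  }
  return { ok: true, normalizedUrl: parsed.toString() };
}
```

A creation endpoint could call `validateLongUrl` first and answer with a 400-style validation error whenever `ok` is false, before any short code is generated.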
The most frequent operation—redirecting users from short URLs to original destinations. This must be extremely fast and highly available:
Every redirect represents a data point. Analytics transform URL shorteners from simple utilities into valuable marketing infrastructure:
Critical design constraint: analytics collection must never delay or block redirects. Users click links expecting instant response. If analytics processing is slow or failing, redirects must still work. This implies asynchronous analytics processing.
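One way to honor this constraint is to let the redirect path do nothing more than a non-blocking enqueue, and to drop the event rather than wait when the buffer is full. The sketch below assumes an in-memory array standing in for a real queue or event stream; `ClickEvent`, `AnalyticsBuffer`, and `handleRedirect` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical sketch: an in-memory buffer stands in for a real event queue.
interface ClickEvent {
  shortCode: string;
  timestamp: number;
}

class AnalyticsBuffer {
  private events: ClickEvent[] = [];
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  // Never throws and never waits: when the buffer is full the event is dropped,
  // trading analytics completeness for redirect availability.
  tryEnqueue(event: ClickEvent): boolean {
    if (this.events.length >= this.maxSize) return false;
    this.events.push(event);
    return true;
  }

  // A background worker would call this periodically and ship the batch downstream.
  drain(): ClickEvent[] {
    const batch = this.events;
    this.events = [];
    return batch;
  }
}

function handleRedirect(
  shortCode: string,
  lookup: (code: string) => string | undefined,
  analytics: AnalyticsBuffer,
): { status: number; location?: string } {
  const longUrl = lookup(shortCode);
  if (longUrl === undefined) return { status: 404 };
  // The enqueue result is intentionally ignored: analytics must not affect the redirect.
  analytics.tryEnqueue({ shortCode, timestamp: Date.now() });
  return { status: 302, location: longUrl };
}
```

The design choice here is that a failing or saturated analytics pipeline degrades click counts, never redirects.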
Beyond core operations, production URL shorteners require numerous supporting features. These often distinguish basic hobby projects from enterprise-grade systems:
| Feature Category | Capabilities | Why It Matters |
|---|---|---|
| User Management | Registration, authentication, API keys, rate limits per user | Enable personalized experiences, prevent abuse, monetization |
| Link Management | Edit destination URLs, enable/disable links, bulk operations | Enterprise users need workflow tools, not just URL generation |
| API Access | REST/GraphQL APIs with authentication, webhooks for events | Programmatic access for integration with marketing tools, CMS systems |
| Custom Domains | Users bring their own domains (e.g., link.company.com) | Branding, trust, enterprise requirement |
| QR Code Generation | Generate QR codes for short URLs | Physical-to-digital bridging for marketing materials |
| Security Features | Phishing detection, malware scanning, spam prevention | Protect users, maintain service reputation, legal compliance |
Robust system design anticipates edge cases before they become production incidents:
In a system design interview, you won't cover all features. List the ones you're aware of, then explicitly scope: 'For this discussion, let's focus on core shortening, redirection, and basic analytics, deferring custom domains and QR codes.' This shows awareness without scope creep.
Non-functional requirements define how well the system works—its quality attributes. For a URL shortener, these are particularly critical because the service must be fast, available, and scalable to be useful at all.
Let's establish concrete scale targets that will drive architecture decisions:
| Metric | Target Value | Implication |
|---|---|---|
| URL creations per day | 100 million (1,157 per second avg, 5,000 peak) | Sustained write load during URL creation; needs an efficient write path and storage |
| Redirects per day | 1 billion (11,574 per second avg, 50,000+ peak) | Read-heavy overall; 10:1 read/write ratio is conservative |
| Total URLs stored | ~365 billion over 10 years (100M/day × 3,650 days) | Massive storage requirements; data retention policies needed |
| Analytics events per day | 1 billion+ (one per redirect) | High-throughput event streaming; aggregate storage |
| Geographic distribution | Global (6 continents) | Multi-region deployment; data residency considerations |
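The creation target also bounds how long short codes need to be. The sketch below assumes a base-62 alphabet (digits plus upper- and lower-case letters), which is one common reading of "alphanumeric"; `keyspace` and `minCodeLength` are hypothetical helper names, and the headroom factor is an arbitrary safety margin chosen for illustration.

```typescript
// Assumed alphabet: 10 digits + 26 upper-case + 26 lower-case = 62 symbols.
const ALPHABET_SIZE = 62;

// Number of distinct codes of exactly `length` characters.
// Integer multiplication keeps the result exact for the lengths used here.
function keyspace(length: number): number {
  let total = 1;
  for (let i = 0; i < length; i++) total *= ALPHABET_SIZE;
  return total;
}

// Smallest code length whose keyspace covers `totalUrls` times a headroom factor.
function minCodeLength(totalUrls: number, headroom: number): number {
  let length = 1;
  while (keyspace(length) < totalUrls * headroom) length++;
  return length;
}

const urlsPerDay = 100_000_000;
const tenYearTotal = urlsPerDay * 365 * 10; // 365 billion URLs

console.log(keyspace(6)); // 56800235584 (about 56.8 billion)
console.log(keyspace(7)); // 3521614606208 (about 3.5 trillion)
console.log(minCodeLength(tenYearTotal, 1)); // 7
console.log(minCodeLength(tenYearTotal, 10)); // 8
```

Under these assumptions, six characters cannot hold ten years of creations, seven characters just cover them, and eight leave more than 10x headroom, which is consistent with the 6-8 character range in the functional requirements.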
URL shorteners are in the critical path for every click. Slow redirects directly impact user experience and can increase bounce rates:
A URL shortener being down means every link using it is broken—potentially millions of links embedded across the internet:
| Aspect | Requirement | Translation |
|---|---|---|
| Redirect availability | 99.99% (four nines) | ~52 minutes downtime per year |
| URL creation availability | 99.9% (three nines) | ~8.7 hours downtime per year |
| Analytics availability | 99.5% | More tolerant; eventual consistency acceptable |
| Data durability | 99.999999999% (eleven nines) | No data loss is acceptable; use replication |
| Disaster recovery | RPO: 1 minute, RTO: 5 minutes | Minimal data loss, fast recovery |
Notice that redirect availability (99.99%) is higher than creation availability (99.9%). This is intentional—reads are far more critical and frequent. During partial outages, we might disable URL creation while maintaining redirects. This matches business priority: existing links must work.
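The translation from an availability percentage to a yearly downtime budget is simple arithmetic, sketched below. The helper name `downtimeMinutesPerYear` is hypothetical, and the calculation assumes a 365-day year.

```typescript
const MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600 minutes, ignoring leap years

// Allowed downtime per year, in minutes, for a given availability percentage.
function downtimeMinutesPerYear(availabilityPercent: number): number {
  return MINUTES_PER_YEAR * (1 - availabilityPercent / 100);
}

console.log(downtimeMinutesPerYear(99.99).toFixed(1)); // about 52.6 minutes
console.log((downtimeMinutesPerYear(99.9) / 60).toFixed(2)); // about 8.76 hours
console.log((downtimeMinutesPerYear(99.5) / 60).toFixed(1)); // about 43.8 hours
```

Each additional nine cuts the budget by a factor of ten, which is why the stricter target is reserved for the redirect path.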
Rough capacity estimation helps validate whether our design decisions are feasible and identifies the primary resource constraints.
Let's calculate how much storage we need for URL mappings:
```
URL Storage Calculation
=======================

Data per URL entry:
- Short code: 7 bytes (6-8 chars, let's use 7)
- Long URL: 200 bytes (average, can be up to 2KB)
- Creation timestamp: 8 bytes
- Expiration time: 8 bytes (optional)
- User ID: 8 bytes (optional)
- Click count: 8 bytes
- Flags/metadata: 10 bytes
-----------------------------------
Total per URL: ~250 bytes

URLs created per day: 100,000,000
Bytes per day: 100M × 250 = 25 GB/day
Bytes per year: 25 GB × 365 = 9.125 TB/year
Bytes over 10 years: ~91 TB for URL mappings alone

With indexes and replication (3x):
Total storage (10 years): ~275 TB + indexes ≈ 400 TB
```

Let's estimate network bandwidth for redirection traffic:
```
Bandwidth Calculation
=====================

Redirects per day: 1,000,000,000 (1 billion)
Redirects per second (avg): 1B / 86,400 ≈ 11,574 RPS
Redirects per second (peak): ~50,000 RPS (4-5x average)

Request size (incoming):
- HTTP headers: 500 bytes (average)
- Short URL in path: 20 bytes
Total incoming: ~520 bytes per request

Response size (outgoing):
- HTTP 301/302 headers: 300 bytes
- Location header: 200 bytes (destination URL)
- Body (minimal): 100 bytes
Total outgoing: ~600 bytes per response

Incoming bandwidth:
- Average: 11,574 × 520 bytes = 6.0 MB/s
- Peak: 50,000 × 520 bytes = 26 MB/s

Outgoing bandwidth:
- Average: 11,574 × 600 bytes = 6.9 MB/s
- Peak: 50,000 × 600 bytes = 30 MB/s

Total peak bandwidth: ~60 MB/s (~480 Mbps)
Note: This is modest; network is NOT the bottleneck.
```

Caching is critical for achieving low-latency redirects. Let's estimate cache requirements:
```
Cache Sizing Calculation
========================

Observation: URL access follows power-law distribution
- 20% of URLs receive 80% of traffic
- Top 1% of URLs receive ~50% of traffic

To achieve 90%+ cache hit rate:
- Cache the top 10-20% of URLs by access frequency

URLs created per day: 100,000,000
Active URLs (accessed in last 30 days): ~500,000,000
Top 20% active URLs: 100,000,000

Memory per cached entry:
- Short code: 7 bytes
- Long URL: 200 bytes
- Metadata (TTL, etc): 50 bytes
- Redis overhead: 50 bytes
Total: ~300 bytes

Cache size for top 20%:
100M × 300 bytes = 30 GB per region

With 5 global regions: 150 GB total cache
(This is very achievable with modern Redis clusters)
```

Our estimates show the system is feasible: ~400 TB of storage over 10 years is manageable with distributed databases; ~60 MB/s of bandwidth is modest; a 30 GB cache per region is achievable with Redis. The primary challenges are managing 50K+ RPS during peaks and achieving sub-50ms latency globally.
Understanding the read/write ratio is fundamental to URL shortener design because it profoundly impacts every architectural decision.
```
Read/Write Ratio Calculation
=============================

Writes (URL creation): 100 million/day = 1,157/second avg
Reads (redirects): 1,000 million/day = 11,574/second avg

Read/Write Ratio: 1000M / 100M = 10:1

This is actually conservative. In practice:
- Popular links get clicked thousands of times
- Many URLs created by spammers never get clicked
- Marketing campaigns generate millions of clicks per URL

Real-world ratio often: 100:1 or higher
```

This extreme read-heavy ratio shapes nearly every design decision:
When asked 'Why cache?' in a URL shortener interview, the answer isn't just 'for speed'—it's 'because our read/write ratio exceeds 10:1, caching reduces database load by an order of magnitude while providing sub-10ms latency for the vast majority of requests.' Numbers matter.
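The effect of caching on database load can be illustrated with a cache-aside lookup: check the cache first, fall back to the database on a miss, and populate the cache on the way back. The sketch below uses a plain `Map` with no eviction or TTL in place of a real cache such as Redis; `CachedResolver` and the `loadFromDb` callback are hypothetical names.

```typescript
// Cache-aside sketch. A Map stands in for the cache; there is no eviction or TTL,
// which a real deployment would need.
class CachedResolver {
  hits = 0;
  misses = 0;
  private readonly cache = new Map<string, string>();
  private readonly loadFromDb: (shortCode: string) => string | undefined;

  constructor(loadFromDb: (shortCode: string) => string | undefined) {
    this.loadFromDb = loadFromDb;
  }

  resolve(shortCode: string): string | undefined {
    const cached = this.cache.get(shortCode);
    if (cached !== undefined) {
      this.hits++;
      return cached;
    }
    this.misses++;
    const longUrl = this.loadFromDb(shortCode); // only misses reach the database
    if (longUrl !== undefined) this.cache.set(shortCode, longUrl);
    return longUrl;
  }
}

// Ten clicks on the same link cause a single database read.
let databaseReads = 0;
const resolver = new CachedResolver((code) => {
  databaseReads++;
  return code === "a7Xk2B" ? "https://www.example.com/" : undefined;
});
for (let i = 0; i < 10; i++) resolver.resolve("a7Xk2B");
console.log(databaseReads, resolver.hits, resolver.misses); // 1 9 1
```

In this sketch unknown codes are not cached, so repeated lookups of a nonexistent code would still reach the database. Whether to cache such negative results is a separate design decision.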
Defining clear system boundaries and APIs helps us understand exactly what the system does and how external clients interact with it.
Let's define the primary API contract:
```typescript
// ===============================
// URL Shortener Core API
// ===============================

// 1. Create Short URL
// POST /api/v1/shorten
interface CreateShortUrlRequest {
  longUrl: string;        // Required: destination URL
  customAlias?: string;   // Optional: user-requested short code
  expiresAt?: string;     // Optional: ISO timestamp for expiration
  title?: string;         // Optional: descriptive title
  tags?: string[];        // Optional: categorization tags
}

interface CreateShortUrlResponse {
  shortCode: string;      // Generated short code (e.g., "a7Xk2B")
  shortUrl: string;       // Full short URL (e.g., "https://short.url/a7Xk2B")
  longUrl: string;        // Original URL (echoed back)
  createdAt: string;      // ISO timestamp
  expiresAt?: string;     // Expiration if set
}

// 2. Redirect (Browser-facing, not API)
// GET /{shortCode}
// Response: 301/302 Redirect to longUrl
// Headers: Location: {longUrl}

// 3. Get URL Info
// GET /api/v1/urls/{shortCode}
interface GetUrlInfoResponse {
  shortCode: string;
  shortUrl: string;
  longUrl: string;
  createdAt: string;
  expiresAt?: string;
  clicks: number;         // Total click count
  status: "active" | "expired" | "disabled";
}

// 4. Get URL Analytics
// GET /api/v1/urls/{shortCode}/analytics
interface GetAnalyticsRequest {
  startDate: string;      // ISO date
  endDate: string;        // ISO date
  granularity: "hour" | "day" | "week" | "month";
}

interface GetAnalyticsResponse {
  shortCode: string;
  totalClicks: number;
  uniqueClicks: number;
  clicksByDate: { date: string; clicks: number }[];
  clicksByCountry: { country: string; clicks: number }[];
  clicksByReferrer: { referrer: string; clicks: number }[];
  clicksByDevice: { device: string; clicks: number }[];
}

// 5. Update Short URL
// PATCH /api/v1/urls/{shortCode}
interface UpdateUrlRequest {
  longUrl?: string;       // Update destination
  expiresAt?: string;     // Update expiration
  disabled?: boolean;     // Enable/disable
}

// 6. Delete Short URL
// DELETE /api/v1/urls/{shortCode}
// Response: 204 No Content
```

Precise HTTP status codes communicate system state to clients effectively:
| Scenario | Status Code | Response |
|---|---|---|
| URL created successfully | 201 Created | Full ShortUrl response body |
| Redirect successful | 301 or 302 | Empty body, Location header |
| URL not found | 404 Not Found | Error message, suggestions |
| URL expired | 410 Gone | Expiration notice |
| Custom alias taken | 409 Conflict | Suggestion for alternatives |
| Invalid long URL | 400 Bad Request | Validation error details |
| Rate limit exceeded | 429 Too Many Requests | Retry-After header |
| Authentication required | 401 Unauthorized | Auth instructions |
| Server error | 500/503 | Error ID for support |
301 (Permanent) tells browsers to cache the redirect, so repeat visits from the same browser can skip our service entirely. That is good for SEO but undercounts clicks. 302 (Temporary) is not cached by default, so browsers hit our service on every click, which is essential for click tracking. Shorteners that prioritize analytics typically use 302 (or 307) for links they want to track and reserve 301 for cases where it is explicitly requested.
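The redirect-side rows of the status table, together with the 301-versus-302 choice, can be sketched as a single pure function. The `LinkRecord` shape, its `trackClicks` flag, and the `Cache-Control: no-store` header on tracked redirects are assumptions made for illustration; the API contract above does not define them.

```typescript
// Hypothetical stored record; `trackClicks` is an assumed per-link setting.
interface LinkRecord {
  longUrl: string;
  expiresAtMs?: number; // absolute expiration time in epoch milliseconds
  trackClicks: boolean;
}

interface RedirectResponse {
  status: 301 | 302 | 404 | 410;
  headers: Record<string, string>;
}

function buildRedirectResponse(
  record: LinkRecord | undefined,
  nowMs: number,
): RedirectResponse {
  if (record === undefined) {
    return { status: 404, headers: {} }; // unknown short code
  }
  if (record.expiresAtMs !== undefined && record.expiresAtMs <= nowMs) {
    return { status: 410, headers: {} }; // link existed but has expired
  }
  if (record.trackClicks) {
    // Temporary redirect so every click reaches the service.
    // `no-store` is a defensive assumption to discourage intermediary caching.
    return {
      status: 302,
      headers: { Location: record.longUrl, "Cache-Control": "no-store" },
    };
  }
  // Permanent redirect: browsers may cache it and skip the service next time.
  return { status: 301, headers: { Location: record.longUrl } };
}
```

Keeping this mapping free of I/O makes it straightforward to unit test every row of the status table without a running server.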
We've completed a thorough requirements analysis for our URL shortener. Let's consolidate the key findings:
| Operation | Volume | Latency Target | Availability |
|---|---|---|---|
| URL Shortening | 100M/day (1.2K/sec) | <200ms | 99.9% |
| URL Redirection | 1B/day (12K/sec, 50K peak) | <50ms | 99.99% |
| Analytics Query | ~1M/day | <500ms | 99.5% |
With requirements established, subsequent pages will address:
You now have a comprehensive understanding of URL shortener requirements. The three core operations—shorten, redirect, analytics—along with detailed capacity estimates and API design provide the foundation for all subsequent architectural decisions. Next, we'll explore URL encoding strategies for generating unique short codes.