Loading system design...
Design a web service like Pastebin or GitHub Gists where users can store and share plain text or code snippets via a unique URL. The system separates lightweight metadata (stored in a relational database) from potentially large paste content (stored in object storage like S3), making it an excellent study in storage-tier separation.
| Metric | Value |
|---|---|
| New pastes created | 5 million / month (~2 per second) |
| Paste reads | 25 million / month (~10 per second) |
| Read-to-write ratio | 5:1 |
| Average paste size | 10 KB |
| Max paste size | 10 MB |
| Metadata per paste | ~500 bytes |
| Content storage for 5 years | 5M × 12 × 5 × 10KB ≈ 3 TB (S3) |
| Metadata storage for 5 years | 5M × 12 × 5 × 500B ≈ 150 GB (DB) |
| Unique paste keys (8 chars Base62) | 62⁸ ≈ 218 trillion |
Users can create a paste by uploading text content and receive a unique short URL to access it
Anyone with the paste URL can retrieve and view the text content
Pastes support syntax highlighting for popular programming languages (Python, Java, C++, JS, etc.)
Users can set an expiration time on pastes: 10 minutes, 1 hour, 1 day, 1 week, 1 month, never
Pastes can be public (discoverable), unlisted (link-only), or private (requires authentication)
Registered users can view, edit, and delete their own pastes; anonymous pastes are immutable
Provide a RESTful API for programmatic paste creation, retrieval, and deletion
Support paste size up to 10 MB per paste
Track view count and basic analytics per paste
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system. 'Low latency search under 100ms' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?