At the heart of every distributed system lies a fundamental question that shapes architecture, performance, cost, and operational complexity: How should we store our data?
This isn't a trivial question. The storage paradigm you choose influences everything from how applications read and write data, to how your infrastructure scales, to how much you pay your cloud provider each month. The wrong choice can lead to performance bottlenecks, spiraling costs, or architectural dead-ends that require expensive rewrites.
In modern systems engineering, three distinct storage paradigms dominate: Block Storage, File Storage, and Object Storage. Each was designed for different eras and different problems, yet all three remain relevant today—sometimes within the same system.
By the end of this page, you will deeply understand the architectural differences between block, file, and object storage. You'll know their internal mechanics, performance characteristics, scalability limits, and—most critically—when to choose each paradigm. This knowledge is foundational for every storage decision you'll make as a systems architect.
Block storage is the oldest and most fundamental storage paradigm—the raw material from which all other storage types are built. Understanding block storage is essential because it reveals the underlying mechanics that file and object storage abstract away.
What is a Block?
A block is the smallest addressable unit of storage on a physical disk. Historically, blocks were 512 bytes, but modern systems typically use 4KB blocks (4,096 bytes). When you read or write data to a disk, you're always operating at the block level, even if higher-level abstractions hide this from you.
Block storage presents a volume as a linear sequence of blocks, each identified by a numeric address called a Logical Block Address (LBA). There's no built-in structure—no files, no folders, no metadata about what the blocks contain. It's raw, unstructured storage: just blocks waiting to be written.
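To make LBA addressing concrete, here is a minimal Python sketch, assuming a raw disk image file (the path disk.img is hypothetical). The only "structure" is arithmetic: a block's byte offset is its LBA times the block size.

```python
import os

BLOCK_SIZE = 4096  # 4 KB blocks, typical on modern disks

def read_block(path: str, lba: int) -> bytes:
    """Read one block from a raw device or disk image by its LBA."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # LBA -> byte offset; no files, folders, or metadata involved.
        return os.pread(fd, BLOCK_SIZE, lba * BLOCK_SIZE)
    finally:
        os.close(fd)

# Example: fetch block 10 from a local disk image (hypothetical path).
# block = read_block("disk.img", 10)
```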
How Block Storage Works Internally
When an application writes a file to a block volume, here's what happens:

1. The application issues a write() system call to the operating system.
2. The filesystem (e.g., ext4, XFS, NTFS) maps the file and offset to one or more logical block addresses.
3. The block layer queues and schedules the I/O, and the device driver issues commands to the disk (or, for networked volumes, across the storage network).
4. The disk writes the data to the physical blocks and acknowledges completion back up the stack.

This layered architecture explains both block storage's flexibility and its overhead. The filesystem provides structure, but it also adds latency and complexity.
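To make the syscall boundary concrete, here is a minimal sketch (the filename is illustrative): the application sees only write() and fsync(); the filesystem and block layers below handle the mapping to physical blocks.

```python
import os

# Open (or create) a file on a block-backed filesystem.
fd = os.open("example.dat", os.O_WRONLY | os.O_CREAT, 0o644)

# One write() system call; the filesystem maps this buffer
# to logical block addresses behind the scenes.
os.write(fd, b"x" * 4096)  # one 4 KB block's worth of data

# fsync() pushes the dirty blocks (data and metadata) down
# through the block layer to the device before returning.
os.fsync(fd)
os.close(fd)
```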
Cloud providers offer networked block storage as services: AWS Elastic Block Store (EBS), Azure Managed Disks, and Google Persistent Disks. These appear to instances as local disks but are actually network-attached volumes replicated across multiple physical drives for durability. This introduces network latency but provides persistence independent of any single compute instance.
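As a hedged sketch of provisioning networked block storage, the boto3 calls below create and attach an EBS volume; the region, zone, instance ID, and device name are placeholder assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

# Provision a networked block volume; it exists independently of any instance.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # must match the instance's zone
    Size=100,                       # GiB
    VolumeType="gp3",
)

# Wait until the volume is ready, then attach it to an instance,
# where it appears to the OS as a local disk.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # hypothetical instance ID
    Device="/dev/sdf",
)
```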
| Aspect | Characteristics | Implications |
|---|---|---|
| Latency | Microseconds (local SSD) to single-digit milliseconds (networked) | Suitable for latency-sensitive workloads like databases |
| IOPS | Thousands to hundreds of thousands per volume | Can handle high-frequency random I/O |
| Throughput | Hundreds of MB/s to multiple GB/s | Depends on block size and storage tier |
| Scalability | Limited by volume size (typically 1-64 TB) | Must add volumes or re-provision for larger capacity |
| Durability | Depends on replication strategy | Cloud block storage offers up to 99.999% durability on premium tiers |
When Block Storage Excels
Block storage is the right choice when you need:

- Low-latency, high-IOPS random I/O, such as database storage engines
- Boot and root volumes for virtual machines
- Full control over the filesystem and its tuning
- High-performance storage dedicated to a single host
File storage adds a layer of abstraction above block storage, organizing data into files within a hierarchical directory structure. More importantly, file storage systems are designed for shared access—multiple clients can simultaneously read from and write to the same filesystem.
The Network File System Model
Unlike block storage, where each volume attaches to a single host, file storage uses network protocols to enable concurrent access:

- NFS (Network File System): the dominant protocol in Unix/Linux environments
- SMB/CIFS (Server Message Block): the dominant protocol in Windows environments
These protocols handle the complex coordination of multiple clients accessing the same files: managing locks, caching, consistency, and permissions across a network.
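Once a share is mounted, clients use ordinary file APIs and the protocol performs the coordination. A minimal sketch, assuming an NFS share mounted at the hypothetical path /mnt/share, of taking an advisory POSIX lock before writing:

```python
import fcntl

path = "/mnt/share/report.csv"  # file on a mounted network share (assumed)

with open(path, "a") as f:
    # Advisory POSIX lock; on NFSv4 the server tracks it, so other
    # clients' exclusive-lock requests block until we release.
    fcntl.lockf(f, fcntl.LOCK_EX)
    try:
        f.write("row,of,data\n")
    finally:
        fcntl.lockf(f, fcntl.LOCK_UN)
```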
File Storage Architecture
A typical network file storage system consists of:

- Clients that mount the shared filesystem over the network
- A network transport carrying NFS or SMB traffic
- A file server that owns the filesystem and its metadata
- Backend block storage (local disks or a SAN) holding the actual data
The file server is the critical component. It maintains the filesystem's metadata, handles client requests, manages caching, and ensures consistency. This centralization is both file storage's strength (simplified management) and weakness (potential bottleneck, single point of failure).
Traditional file storage struggles at scale. The hierarchical namespace becomes a bottleneck as directory listings with millions of files slow dramatically. The metadata server can become overwhelmed. Locking protocols create contention. These limitations drove the development of object storage for web-scale applications.
| Aspect | Characteristics | Implications |
|---|---|---|
| Latency | Sub-millisecond (local) to milliseconds (network) | Network hops add latency; caching critical for performance |
| IOPS | Limited by server and network | Shared among all clients; can become bottleneck |
| Throughput | Network-limited, typically 1-10 Gbps aggregate | All clients share available bandwidth |
| Scalability | Tens of millions of files; petabytes with scale-out NAS | Metadata operations limit file count |
| Concurrency | Good for read-heavy; complex for write-heavy | Locking creates serialization points |
When File Storage Excels
File storage is the right choice when you need:

- Shared access to the same files from many clients
- POSIX filesystem semantics for legacy or off-the-shelf applications
- Hierarchical organization that users and tools navigate directly
- Workflows like content management, development environments, and media editing
Cloud providers offer managed file storage: AWS EFS (Elastic File System), Azure Files, Google Cloud Filestore. These handle the infrastructure complexity—replication, availability, capacity management—while providing standard NFS/SMB interfaces. They scale better than traditional NAS but still have the fundamental performance characteristics of file protocols.
Object storage represents a fundamental reimagining of how data should be stored and accessed at internet scale. Born from the needs of web companies managing petabytes of unstructured data—images, videos, logs, backups—object storage trades the familiar filesystem paradigm for a simpler, infinitely scalable model.
The Object Model
In object storage, data is stored as discrete objects, each consisting of:

- The data itself: an opaque byte stream of any size
- Metadata: system and user-defined key-value pairs describing the object
- A unique key: the object's identifier within a flat namespace

A sketch of these three parts appears below.
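A hedged boto3 sketch showing all three parts together; the bucket name, key, and metadata values are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# One object = key + data + metadata, stored in a flat namespace.
s3.put_object(
    Bucket="example-media",          # hypothetical bucket
    Key="videos/2024/intro.mp4",     # the key; "/" is just a character in it
    Body=open("intro.mp4", "rb"),    # the data: an opaque byte stream
    Metadata={"codec": "h264", "duration-seconds": "93"},  # user metadata
)

# Retrieval is by full key; no directory traversal is involved.
obj = s3.get_object(Bucket="example-media", Key="videos/2024/intro.mp4")
```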
Why Object Storage Scales Where Others Fail
The key insight behind object storage is that filesystem semantics—hierarchical directories, in-place modifications, POSIX locking—are the primary barriers to infinite scale. Object storage removes these constraints:
No directory tree: With millions of objects, listing a directory becomes impossibly slow. Object storage eliminates directories entirely. Objects are accessed by their full key, which can be hashed to locate data instantly regardless of total object count (a toy placement function appears after this list).
No locks: File locking protocols break down at scale and across networks. Object storage uses immutable operations—you PUT a new version of an object rather than modifying it in place. This eliminates lock contention.
Stateless protocol: HTTP is stateless, meaning servers don't maintain client sessions. Any server can handle any request, enabling horizontal scaling. Compare this with stateful file protocols such as NFSv4, where the server must track client sessions and locks.
Data distribution: Objects are automatically distributed across a cluster, with replication for durability. There's no single server bottleneck—the system can grow by adding nodes.
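To see why key-based access sidesteps directory bottlenecks, here is a toy placement function, with node names and replication factor as illustrative assumptions: hashing the key locates the owning nodes directly, with no central lookup and no directory walk.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # toy storage cluster
REPLICAS = 2  # each object is stored on two nodes for durability

def locate(key: str) -> list[str]:
    """Map an object key to the nodes holding its replicas."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    start = digest % len(NODES)
    # Replicas on consecutive nodes; production systems use consistent
    # hashing so that adding a node relocates minimal data.
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

print(locate("videos/2024/intro.mp4"))  # e.g., ['node-c', 'node-d']
```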
Object storage's design reflects CAP theorem trade-offs. To achieve partition tolerance and high availability at global scale, early object stores accepted eventual consistency. AWS S3 originally exhibited read-after-write consistency only for new objects. As of December 2020, S3 offers strong read-after-write consistency—a significant engineering achievement that maintains scalability while eliminating eventual consistency complexities.
| Aspect | Characteristics | Implications |
|---|---|---|
| Latency | Tens to hundreds of milliseconds per request | HTTP overhead; not suitable for low-latency workloads |
| IOPS | Varies widely; optimized for throughput, not IOPS | Not designed for high-frequency random access |
| Throughput | Multi-gigabit per second for large objects | Excellent for streaming large files |
| Scalability | Effectively unlimited—exabytes, trillions of objects | Designed for infinite horizontal scale |
| Durability | 11 nines (99.999999999%) or higher | Data replicated across multiple locations |
When Object Storage Excels
Object storage is the right choice when you need:

- Effectively unlimited capacity for unstructured data: images, videos, logs, backups
- Static assets, user uploads, data lakes, and machine learning datasets
- The lowest per-gigabyte cost, with throughput favored over latency
- Simple HTTP access from anywhere, without mounting a filesystem
Now that we understand each paradigm individually, let's systematically compare them across the dimensions that matter most in real-world system design: access patterns, scalability, performance, cost, and operational complexity.
| Dimension | Block Storage | File Storage | Object Storage |
|---|---|---|---|
| Access Protocol | SCSI, iSCSI, NVMe | NFS, SMB/CIFS | HTTP/REST (S3 API) |
| Namespace Model | Flat (LBAs) | Hierarchical (directories/files) | Flat (key-based, simulated hierarchy) |
| Smallest Unit | Block (4KB typical) | File (variable size) | Object (variable size) |
| Concurrent Access | Single host | Multi-host (with coordination) | Unlimited (stateless) |
| Modification Model | In-place updates | In-place updates with locking | Object replacement only |
| Scalability Limit | Volume size (tens of TB) | Millions of files, PBs with scale-out | Trillions of objects, exabytes |
| Typical Latency | Sub-millisecond to milliseconds | Milliseconds | Tens of milliseconds |
| Cost (per GB) | Highest | Middle | Lowest |
| Durability | Volume-dependent | Server-dependent | 11+ nines (distributed) |
Decision Framework for Storage Selection
Use this systematic approach when choosing storage for a new system (a toy version of this framework appears as code after this list):

1. Characterize Your Access Pattern: Is the I/O small and random (block), shared and hierarchical (file), or large, sequential, and write-once (object)?
2. Assess Scalability Requirements: Will you stay within tens of terabytes (block), reach petabytes of shared files (file), or grow toward exabytes and billions of objects (object)?
3. Evaluate Performance Needs: Do you need sub-millisecond latency and high IOPS (block), shared POSIX semantics with moderate latency (file), or high aggregate throughput with relaxed latency (object)?
4. Consider Cost Constraints: Per-gigabyte cost typically falls from block (highest) through file to object (lowest), so push bulk data toward object storage when access patterns allow.
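A minimal sketch of the framework as code; the workload attributes and thresholds are illustrative assumptions, not hard rules.

```python
def suggest_storage(random_small_io: bool, shared_posix: bool,
                    capacity_tb: float, latency_ms_budget: float) -> str:
    """Toy decision helper mirroring the four steps above."""
    if random_small_io and latency_ms_budget < 5:
        return "block"   # databases, VM disks: low latency, high IOPS
    if shared_posix and capacity_tb < 1_000:
        return "file"    # shared editing, legacy apps: POSIX over the network
    return "object"      # large, flat, write-once data at the lowest cost

# Example: a video archive -- large sequential reads, no POSIX requirement.
print(suggest_storage(random_small_io=False, shared_posix=False,
                      capacity_tb=5_000, latency_ms_budget=100))  # "object"
```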
Real-world systems often combine storage paradigms. A video streaming platform might use: block storage for its transcoding servers' local processing, file storage for shared editing workflows, and object storage for final delivery. Each paradigm handles the use case it was designed for. Don't force one paradigm to do everything.
Understanding how we arrived at today's storage landscape illuminates why each paradigm exists and where the industry is heading.
The Historical Progression
1960s-1970s: Block Storage Era
Storage was block-based by necessity. Magnetic drums and early disks provided raw block access. Operating systems developed filesystems to impose structure, but the underlying model was blocks and addresses.

1980s-1990s: File Storage Emergence
As networks connected computers, sharing files across machines became essential. NFS (1984) and SMB (1983) protocols enabled network-attached storage. File servers became central infrastructure in enterprises.

2000s: Object Storage Revolution
Web companies faced scale problems that file storage couldn't solve. Amazon launched S3 in 2006 to store unlimited objects behind a simple HTTP API. Google's GFS (2003) and Bigtable (2006) papers influenced distributed storage design across the industry. The industry recognized that HTTP-based, flat-namespace storage could scale where hierarchical filesystems failed.

2010s-Present: Convergence and Specialization
The paradigms now coexist, with each claiming its optimal domain. New technologies blur the lines: cloud block storage that's network-attached, file storage gateways over object storage, and hybrid solutions offering multiple access protocols to the same data.
Modern data architectures often use object storage as a 'data lake'—a central repository for raw data in any format. Analytics engines (Spark, Presto, Athena) query this data directly, treating object storage as a distributed filesystem. This pattern leverages object storage's cost-effectiveness and scalability for massive analytical workloads.
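As a hedged sketch of the pattern, the boto3 call below submits a SQL query that Athena executes directly against objects in S3; the database, table, and bucket names are assumptions.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")  # region assumed

# Athena scans the objects backing the table; S3 plays the role of
# a distributed filesystem for the query engine.
response = athena.start_query_execution(
    QueryString="SELECT user_id, COUNT(*) FROM clickstream GROUP BY user_id",
    QueryExecutionContext={"Database": "analytics"},        # hypothetical DB
    ResultConfiguration={"OutputLocation": "s3://example-results/"},
)
print(response["QueryExecutionId"])
```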
Before we conclude, let's address misconceptions that lead to poor storage decisions:
Misconception 1: "Object storage is just for backup and archive"
This was true in object storage's early days when latency was high and APIs were limited. Today, object storage serves as primary storage for many applications: static assets, user uploads, data lakes, machine learning datasets. Modern applications treat S3 as a primary data store, not a dumping ground for cold data.
Misconception 2: "Block storage is always faster"
Block storage has lower latency per operation, but for large sequential reads, object storage can achieve higher aggregate throughput. A workload that reads 1GB files benefits from object storage's parallel chunking more than from block storage's low-latency 4KB operations.
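A sketch of those parallel, ranged reads, assuming a hypothetical bucket and key: each worker issues an independent HTTP range request, so aggregate throughput scales with the number of workers.

```python
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "example-media", "videos/2024/intro.mp4"  # assumptions
CHUNK = 8 * 1024 * 1024  # 8 MB per range request

size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]

def fetch(offset: int) -> bytes:
    # Independent HTTP GET for one byte range of the object.
    end = min(offset + CHUNK, size) - 1
    resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={offset}-{end}")
    return resp["Body"].read()

with ThreadPoolExecutor(max_workers=8) as pool:
    parts = list(pool.map(fetch, range(0, size, CHUNK)))

data = b"".join(parts)  # reassembled in order
```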
Misconception 3: "File storage is obsolete"
File storage remains essential for workloads requiring shared POSIX access: legacy applications, content management, development environments, and media workflows. Cloud-managed file storage (EFS, Azure Files) is actually growing faster than traditional NAS as organizations lift-and-shift file-based workloads.
Misconception 4: "I should minimize storage types for simplicity"
The opposite is often true. Using the right storage type for each workload simplifies operations by avoiding workarounds. Databases on block storage, shared assets on file storage, archives on object storage—each in its natural habitat—is simpler than forcing one paradigm to serve all needs.
Mismatched storage paradigms waste money and create operational pain. Running object-storage-shaped workloads on block storage can cost an order of magnitude more per gigabyte. Using file storage for billions of objects creates bottlenecks and outages. Performance-sensitive databases on high-latency object storage fail to meet SLAs. Always match the paradigm to the workload.
We've covered substantial ground in understanding the fundamental storage paradigms. Let's consolidate the key insights:

- Block storage exposes raw blocks addressed by LBAs: the lowest-latency, single-host access that databases and VMs need
- File storage layers a shared, hierarchical, POSIX-compliant filesystem over the network, with the file server as both its strength and its bottleneck
- Object storage trades filesystem semantics for a flat, key-addressed, HTTP-accessed model that scales to exabytes at the lowest cost
- Real systems combine paradigms; match each workload to the storage it was designed for
What's next:
Now that we understand the fundamental differences between storage paradigms, we'll dive deep into the object storage model specifically. The next page explores the internal architecture of object storage systems—how objects are structured, stored, accessed, and distributed across clusters. This understanding is essential for designing systems that effectively leverage object storage at scale.
You now understand the three fundamental storage paradigms and when to choose each. Block storage provides low-latency, single-host access for databases and VMs. File storage enables shared access with POSIX semantics. Object storage delivers infinite scale at the lowest cost for web-scale data. This foundation prepares you for understanding object storage's internal model in the next page.