Amazon S3 has become the de facto standard API for object storage. Countless applications, frameworks, and tools are built assuming S3 compatibility. But what if you need to run object storage on-premises? What if regulatory requirements prevent cloud usage? What if you simply want to avoid cloud vendor lock-in?
MinIO answers these questions with radical simplicity. Instead of building a complex distributed system with multiple component types (like Ceph), MinIO focuses on doing one thing exceptionally well: providing high-performance S3-compatible object storage that's easy to deploy and operate.
Launched in 2014 and written in Go, MinIO has become the most popular self-hosted S3-compatible object storage, with over 44,000 GitHub stars and adoption by companies like Adobe, PayPal, and Siemens. Its appeal lies in its simplicity: a single binary that can run standalone on a laptop for development or distributed across hundreds of nodes for enterprise scale.
By the end of this page, you will understand MinIO's architecture, its erasure coding implementation, how data is distributed across drives and nodes, multi-site replication, bucket versioning, and the operational characteristics that make MinIO the go-to choice for self-hosted object storage.
MinIO was built with a contrarian philosophy: distributed storage doesn't need to be complicated. While systems like Ceph and HDFS introduce multiple specialized components, MinIO takes a minimalist approach.
The key insight: MinIO recognized that most distributed storage complexity exists to handle edge cases and provide features that many workloads don't need. By focusing on object storage (not file or block) and accepting certain constraints (immutable objects, erasure coding only), MinIO achieves remarkable simplicity.
Comparison of Architecture Complexity:
| System | Components | Dependencies |
|---|---|---|
| MinIO | MinIO server (single binary) | None (Go static binary) |
| Ceph | MON, MGR, OSD, MDS, RGW | Multiple daemons, Paxos-based MON quorum |
| HDFS | NameNode, DataNode, ZKFailoverController | JVM, ZooKeeper for HA |
| GlusterFS | glusterd, glusterfsd, geo-replication | Multiple daemons per node |
MinIO's simplicity translates directly to operational benefits: faster deployment, easier troubleshooting, lower training requirements, and fewer moving parts that can fail. For teams without dedicated storage engineers, this simplicity can be the deciding factor.
MinIO's architecture is elegantly simple compared to other distributed storage systems. A MinIO deployment consists of MinIO server processes, each managing local drives, with all nodes being equal peers (no master/slave distinction).
Key Architectural Concepts:
| Term | Description |
|---|---|
| Server | A MinIO process managing one or more drives |
| Drive | A mounted disk path (XFS-formatted recommended) |
| Server Pool | A group of server nodes whose drives are pooled together; each pool is subdivided into erasure sets |
| Erasure Set | A subset of drives across which one object's data/parity shards are distributed |
| Bucket | Logical container for objects (S3 concept) |
| Object | A file stored in MinIO, accessed via S3 API |
Deployment Modes:
Standalone (single-node single-drive) — For development and testing. No redundancy.
Standalone erasure code (single-node multi-drive) — One node with 4+ drives. Erasure coding within the node.
Distributed (multi-node) — Multiple nodes, each with multiple drives. Erasure coding spans nodes for node-level fault tolerance.
# Single node, 4 drives
minio server /data1 /data2 /data3 /data4
# Distributed, 4 nodes × 4 drives each
minio server http://node{1...4}/data{1...4}
The distributed syntax ({1...4}) is MinIO-specific shorthand for specifying multiple nodes and drives.
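The expansion is purely mechanical; here is a small Python sketch of what the `{a...b}` shorthand unrolls to (illustrative only—this is not MinIO's actual parser):

```python
import itertools
import re

def expand(template: str) -> list[str]:
    """Expand MinIO-style {a...b} ranges, e.g. http://node{1...4}/data{1...4}.
    Multiple ranges expand as a cross product, with earlier ranges varying slowest."""
    # re.split with two capture groups yields: literal, lo, hi, literal, lo, hi, ..., literal
    parts = re.split(r"\{(\d+)\.\.\.(\d+)\}", template)
    literals = parts[0::3]
    ranges = [range(int(lo), int(hi) + 1) for lo, hi in zip(parts[1::3], parts[2::3])]
    results = []
    for combo in itertools.product(*ranges):
        pieces = [literals[0]]
        for number, literal in zip(combo, literals[1:]):
            pieces.append(str(number))
            pieces.append(literal)
        results.append("".join(pieces))
    return results

# 4 nodes x 4 drives -> 16 drive endpoints
endpoints = expand("http://node{1...4}/data{1...4}")
```

Each endpoint names one drive on one node; MinIO derives the erasure set layout from this list, which is why every server in the deployment must be started with the identical argument.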
Every MinIO node is a peer—there's no designated master. Any node can serve any request. This eliminates single points of failure and simplifies operations. Use a load balancer for distributing client traffic, but any node can handle any operation.
MinIO uses erasure coding (not replication) as its primary durability mechanism. This provides space efficiency comparable to RAID while tolerating multiple drive failures.
How Erasure Coding Works in MinIO:
When an object is written, MinIO divides it into data shards and computes parity shards using Reed-Solomon encoding:
Object: photo.jpg (8 MB)
Erasure set: 8 drives (default configuration)
Data shards: 4 (50% of drives)
Parity shards: 4 (50% of drives)

Distribution:
Drive 1: photo.jpg.part.1 (data shard 1, 2MB)
Drive 2: photo.jpg.part.2 (data shard 2, 2MB)
Drive 3: photo.jpg.part.3 (data shard 3, 2MB)
Drive 4: photo.jpg.part.4 (data shard 4, 2MB)
Drive 5: photo.jpg.part.5 (parity shard 1, 2MB)
Drive 6: photo.jpg.part.6 (parity shard 2, 2MB)
Drive 7: photo.jpg.part.7 (parity shard 3, 2MB)
Drive 8: photo.jpg.part.8 (parity shard 4, 2MB)

Storage used: 16 MB total (2x raw size)
Can tolerate: loss of any 4 drives
Reading: need any 4 shards (data or parity) to reconstruct

Erasure Set Sizing:
The number of drives per erasure set determines fault tolerance and storage efficiency:
| Drives | Data | Parity | Efficiency | Fault Tolerance |
|---|---|---|---|---|
| 4 | 2 | 2 | 50% | 2 drives |
| 8 | 4 | 4 | 50% | 4 drives |
| 16 | 8 | 8 | 50% | 8 drives |
| 16 | 12 | 4 | 75% | 4 drives |
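The table's arithmetic is simple enough to script. A sketch of the trade-off (the shard-size rounding is a simplification; MinIO's real accounting includes metadata overhead):

```python
from dataclasses import dataclass
from math import ceil

@dataclass
class ErasureConfig:
    data: int    # data shards per object
    parity: int  # parity shards per object

    @property
    def drives(self) -> int:
        return self.data + self.parity

    @property
    def efficiency(self) -> float:
        # usable fraction of raw capacity
        return self.data / self.drives

    @property
    def fault_tolerance(self) -> int:
        # any `parity` drives can fail without data loss
        return self.parity

    def raw_bytes(self, object_size: int) -> int:
        # each shard holds object_size / data bytes (rounded up),
        # and one shard lands on every drive in the set
        shard = ceil(object_size / self.data)
        return shard * self.drives

cfg = ErasureConfig(data=12, parity=4)   # the 16-drive, 4-parity row
print(cfg.efficiency)                    # 0.75
```

Raising parity buys fault tolerance at the cost of both capacity (lower efficiency) and write amplification (more raw bytes per object).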
MinIO automatically calculates erasure set size based on total drives. For example, 16 drives across 4 nodes creates one erasure set of 16 drives.
Inline vs. Multi-Part:
Small objects (default <128KB) are written inline—all shards in a single write. Larger objects use multi-part upload where each part is independently striped.
You cannot change erasure set configuration after deployment. Adding drives requires adding complete expansion pools, not individual drives. Plan your initial configuration carefully—mistakes require data migration to fix.
MinIO stores objects in a straightforward directory structure on each drive. Understanding this format helps with troubleshooting and recovery.
On-Disk Layout:
/data1/                      # Drive mount point
├── .minio.sys/              # System metadata
│   ├── config/              # Cluster configuration
│   ├── buckets/             # Bucket metadata
│   └── pool.bin             # Pool layout
├── bucket-name/             # User bucket
│   ├── object-name/         # Object directory
│   │   ├── xl.meta          # Object metadata
│   │   └── part.1           # Data/parity shard
│   └── prefix/
│       └── nested-object/
│           ├── xl.meta
│           └── part.1
└── another-bucket/
    └── ...

xl.meta File:
The xl.meta file stores the object's metadata; recent MinIO releases serialize it in a compact binary (msgpack) encoding, equivalent to this JSON:
{
  "version": "1.0.0",
  "checksum": {
    "algorithm": "highwayhash256S",
    "sum": "..."
  },
  "erasure": {
    "algorithm": "ReedSolomon",
    "data": 4,
    "parity": 4
  },
  "meta": {
    "content-type": "image/jpeg",
    "x-amz-meta-custom": "value"
  },
  "parts": [...]
}
Key Points:
Because objects are stored as regular files with self-describing metadata, disaster recovery is straightforward. If MinIO won't start, you can still browse the data directory and recover files manually. Compare this to Ceph's OSD format, which requires Ceph tools to access.
MinIO's primary interface is the S3 API. Compatibility is extensive—most S3 SDKs and tools work without modification.
| Category | Feature | Support |
|---|---|---|
| Basic Operations | GetObject, PutObject, DeleteObject | ✓ Full |
| Bucket Operations | Create, Delete, List, Policy | ✓ Full |
| Multipart Upload | Initiate, Upload, Complete, Abort | ✓ Full |
| Versioning | Bucket versioning, version listing | ✓ Full |
| Object Lock | Governance/Compliance mode, retention | ✓ Full |
| Lifecycle | Expiration, transition rules | ✓ Full |
| Server-Side Encryption | SSE-S3, SSE-KMS, SSE-C | ✓ Full |
| Notifications | Bucket events to Kafka, AMQP, Webhook | ✓ Full |
| Select API | S3 Select for CSV/JSON/Parquet | ✓ Full |
| Access Control | Bucket policies, IAM policies | ✓ Full |
| Presigned URLs | Temporary access URLs | ✓ Full |
Using Standard S3 Tools:
# AWS CLI with MinIO
aws --endpoint-url http://minio:9000 s3 ls
aws --endpoint-url http://minio:9000 s3 cp file.txt s3://bucket/
# Python boto3
import boto3
s3 = boto3.client(
    's3',
    endpoint_url='http://minio:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin',
)
s3.upload_file('file.txt', 'bucket', 'file.txt')
# mc (MinIO Client - specialized tool)
mc alias set myminio http://minio:9000 minioadmin minioadmin
mc cp file.txt myminio/bucket/
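The presigned URLs from the compatibility table are nothing exotic: they are ordinary AWS Signature Version 4 query-string signatures, which is why MinIO can honor URLs generated by any S3 SDK. A stdlib-only sketch of the mechanism (deliberately simplified: unsigned payload, host-only signed headers, URL-safe key assumed; in practice use boto3's `generate_presigned_url`):

```python
import hashlib
import hmac
import urllib.parse
from datetime import datetime, timezone

def presign_get(endpoint, bucket, key, access_key, secret_key,
                expires=3600, region="us-east-1", now=None):
    """Build an S3 presigned GET URL via SigV4 query signing (stdlib only)."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    scope = f"{now:%Y%m%d}/{region}/s3/aws4_request"
    host = urllib.parse.urlparse(endpoint).netloc
    path = f"/{bucket}/{key}"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{urllib.parse.quote(k, safe='')}={urllib.parse.quote(v, safe='')}"
        for k, v in sorted(params.items())
    )
    # Canonical request: method, URI, query, headers (trailing \n), signed headers, payload hash
    canonical_request = "\n".join([
        "GET", path, canonical_query,
        f"host:{host}\n", "host", "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request"
    signing_key = f"AWS4{secret_key}".encode()
    for piece in (f"{now:%Y%m%d}", region, "s3", "aws4_request"):
        signing_key = hmac.new(signing_key, piece.encode(), hashlib.sha256).digest()
    signature = hmac.new(signing_key, string_to_sign.encode(), hashlib.sha256).hexdigest()
    return f"{endpoint}{path}?{canonical_query}&X-Amz-Signature={signature}"
```

Anyone holding the URL can fetch the object until the expiry passes, without ever seeing the secret key—the server recomputes the same HMAC chain and compares signatures.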
MinIO Client (mc):
While standard S3 tools work, MinIO provides mc—a feature-rich CLI for advanced operations:
# Mirror local directory to MinIO
mc mirror /local/data myminio/bucket

# Watch bucket for changes
mc watch myminio/bucket

# Find large objects
mc find myminio/bucket --larger-than 100MB

# Server-side copy between buckets
mc cp --recursive myminio/source/ myminio/dest/

# Diff between local and remote
mc diff /local/data myminio/bucket

# Admin operations
mc admin info myminio
mc admin heal myminio
mc admin update myminio

While MinIO targets 100% S3 compatibility, some edge cases differ. Pre-signed URLs may have different default expiration. Some newer S3 features may lag. Always test your specific use case, especially if migrating from AWS S3.
MinIO provides full S3-compatible versioning and object lock capabilities, essential for regulatory compliance and data protection.
Bucket Versioning:
When enabled, every write creates a new version instead of overwriting:
# Enable versioning
mc version enable myminio/bucket

# Upload creates version
mc cp file.txt myminio/bucket/   # Creates version V1

# Modify and re-upload
echo "modified" >> file.txt
mc cp file.txt myminio/bucket/   # Creates version V2

# Original still accessible
mc ls --versions myminio/bucket/file.txt
# file.txt (v2) 2024-01-15 10:30:00
# file.txt (v1) 2024-01-15 10:00:00

# Restore a specific version
mc cp --version-id v1 myminio/bucket/file.txt restored.txt

Object Lock (WORM):
Object Lock prevents objects from being deleted or overwritten, even by administrators:
| Mode | Description | Can Override |
|---|---|---|
| Governance | Protected but can be bypassed with special permission | Yes, with s3:BypassGovernanceRetention |
| Compliance | Cannot be bypassed by anyone, including root | No, wait for retention to expire |
Use Cases: regulatory record retention (e.g., financial or healthcare archives), tamper-proof audit logs, and ransomware-resistant backups.
# Create bucket with object lock enabled
mc mb --with-lock myminio/secure-bucket

# Set default retention (30 days governance)
mc retention set --default GOVERNANCE 30d myminio/secure-bucket

# Upload with specific retention
mc cp file.txt myminio/secure-bucket/ \
  --retention-mode compliance \
  --retention-until 2025-01-01

# Apply legal hold
mc legalhold set myminio/secure-bucket/file.txt

# Attempting delete fails
mc rm myminio/secure-bucket/file.txt
# ERROR: Object is locked

Once an object is in Compliance mode, NOTHING can delete it until retention expires—not even MinIO administrators with root credentials. Test thoroughly in development before applying in production. Misconfigured retention periods are extremely costly.
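The governance/compliance distinction boils down to a small decision rule. A hypothetical sketch (field names are illustrative, not MinIO's internal model):

```python
from datetime import date

def delete_allowed(mode: str, retain_until: date, today: date,
                   has_bypass_permission: bool = False,
                   legal_hold: bool = False) -> bool:
    """Can this object version be deleted right now?"""
    if legal_hold:
        return False                   # legal hold blocks deletion in any mode
    if today >= retain_until:
        return True                    # retention period has expired
    if mode == "GOVERNANCE":
        return has_bypass_permission   # requires s3:BypassGovernanceRetention
    return False                       # COMPLIANCE: nobody can delete; wait it out
```

Note the asymmetry: legal hold has no expiry date at all and must be explicitly released, whereas retention always runs out eventually.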
MinIO supports bucket-level replication between clusters for disaster recovery, geo-distribution, and data synchronization.
Replication Modes:
| Mode | Description | Use Case |
|---|---|---|
| Server-Side Replication | Automatic, near-real-time replication | DR, multi-datacenter |
| Active-Active | Bidirectional replication between sites | Global active-active |
| Active-Passive | One-way replication to standby site | Traditional DR |
| Batch Replication | Scheduled bulk sync | Initial seeding, catch-up |
Configuring Bucket Replication:
# Set up remote target
mc admin bucket remote add myminio/source-bucket \
  http://accesskey:secretkey@remote-minio:9000/target-bucket \
  --service replication

# Enable replication
mc replicate add myminio/source-bucket \
  --remote-bucket target-bucket \
  --replicate "delete,delete-marker,existing-objects"
# Check replication status
mc replicate status myminio/source-bucket
What Gets Replicated: object data and user metadata, tags, and—per the --replicate flags above—delete operations, delete markers, and objects that existed before replication was enabled.
MinIO exposes replication metrics via Prometheus. Monitor minio_cluster_replication_pending_bytes and minio_cluster_replication_failed_operations to catch replication issues before they cause data loss during site failures.
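The metrics named above arrive in Prometheus exposition format. A minimal stdlib parser sketch for spot checks (the sample values are made up; a real deployment would scrape MinIO's metrics endpoint with Prometheus itself):

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse simple Prometheus exposition lines into {metric_name: value}.
    Ignores comments and labels for brevity (last labeled sample wins)."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]   # strip {label="..."} if present
        out[name] = float(value)
    return out

sample = """
# HELP minio_cluster_replication_pending_bytes Bytes awaiting replication
minio_cluster_replication_pending_bytes{server="node1"} 1048576
minio_cluster_replication_failed_operations{server="node1"} 3
"""

metrics = parse_metrics(sample)
if metrics.get("minio_cluster_replication_failed_operations", 0) > 0:
    print("replication falling behind; investigate before relying on failover")
```

Alerting on a sustained rise in pending bytes catches the dangerous case: a link that is up but slower than the write rate, silently widening the window of data that would be lost in a site failure.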
MinIO is designed for high performance on modern hardware. Understanding its performance characteristics helps optimize deployments.
Benchmark Results (MinIO published):
| Metric | Performance |
|---|---|
| GET (read) | 325 GiB/s aggregate |
| PUT (write) | 165 GiB/s aggregate |
| Single object | 10+ GB/s for large objects |
| Small objects | 100k+ objects/second |
Results from 32-node cluster with NVMe drives and 100GbE networking
Performance Factors: drive media (NVMe vs. SSD vs. HDD), inter-node network bandwidth, erasure set configuration (more parity means more computation and raw I/O per object), object size distribution, and client concurrency.
Small Object Optimization:
Small objects (<128KB by default) are handled specially: their data is stored inline within the xl.meta file alongside the metadata, so a read or write touches one file per drive instead of two, roughly halving metadata I/O for small-object workloads.
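A sketch of the threshold decision and its consequences (the 128 KiB cutoff is the default cited above; the I/O cost model is illustrative, not MinIO's internal accounting):

```python
INLINE_THRESHOLD = 128 * 1024  # default inline cutoff, in bytes

def write_strategy(object_size: int, data: int = 4, parity: int = 4) -> dict:
    """Rough model: inline objects piggyback on the per-drive metadata write;
    larger objects are striped into data+parity shard files."""
    if object_size < INLINE_THRESHOLD:
        # data embedded in xl.meta: one file written per drive in the set
        return {"mode": "inline", "files_per_drive": 1}
    shard = -(-object_size // data)  # ceil division
    return {
        "mode": "striped",
        "files_per_drive": 2,        # xl.meta plus a part file
        "shard_bytes": shard,
        "raw_bytes": shard * (data + parity),
    }
```

This is why object size distribution matters so much for sizing: a workload of millions of 10 KB objects is metadata-bound, while a workload of 1 GB objects is bandwidth-bound.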
Performance Tuning:
# Set GOMAXPROCS explicitly if needed (Go defaults to all available cores)
export GOMAXPROCS=
minio server ...
# Use multiple drives with high queue depth
minio server /mnt/disk{1...12}
# Run with explicit config and certificate directories
minio server --config-dir=/config --certs-dir=/certs /data{1...4}
Published benchmarks use optimal conditions (NVMe, 100GbE, large objects). Real-world performance depends on your hardware, object size distribution, and access patterns. Always benchmark with your actual workload before sizing production deployments.
MinIO is built for cloud-native environments, with first-class Kubernetes support through the MinIO Operator.
MinIO Operator:
The operator manages MinIO tenants (clusters) on Kubernetes:
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: minio-tenant
  namespace: minio
spec:
  image: minio/minio:latest
  pools:
    - servers: 4
      volumesPerServer: 4
      volumeClaimTemplate:
        spec:
          storageClassName: local-nvme
          resources:
            requests:
              storage: 1Ti
  requestAutoCert: true
  s3:
    bucketDNS: true
Key Kubernetes Features:
| Feature | Description |
|---|---|
| Operator | Manages tenant lifecycle, upgrades, configuration |
| Console | Web UI for administration, metrics visualization |
| CSI Driver | Use MinIO as PVC backend for other workloads |
| Auto-TLS | Automatic certificate generation and rotation |
| Prometheus Integration | Native metrics export for monitoring |
| Horizontal Scaling | Add pools via tenant spec updates |
| StatefulSet Per Pool | Predictable pod naming and storage binding |
# Install MinIO Operator
helm repo add minio-operator https://operator.min.io
helm install minio-operator minio-operator/operator

# Create tenant
kubectl apply -f tenant.yaml

# Access MinIO Console
kubectl port-forward svc/minio-tenant-console 9443:9443

# Get credentials
kubectl get secret minio-tenant-user-1 -o jsonpath='{.data.CONSOLE_ACCESS_KEY}' | base64 -d

For best performance, use local NVMe drives (local persistent volumes) rather than network storage. MinIO provides its own redundancy through erasure coding—you don't need the storage layer to be redundant too. Replicating already-erasure-coded data through a SAN is wasteful.
We've explored MinIO's approach to self-hosted object storage—achieving S3 compatibility and high performance through radical simplicity. The key concepts: a single Go binary with all nodes as equal peers; Reed-Solomon erasure coding as the sole durability mechanism; extensive S3 API coverage including versioning, object lock, and replication; and first-class Kubernetes support through the MinIO Operator.
What's Next:
We've now explored four major distributed storage systems: HDFS, Ceph, GlusterFS, and MinIO. Each has distinct strengths and ideal use cases. In the final page of this module, we'll synthesize this knowledge with a framework for choosing distributed storage—helping you select the right system for your specific requirements.
You now understand MinIO's architecture at a level sufficient for designing and deploying self-hosted object storage solutions. You can explain erasure coding trade-offs, S3 compatibility, and operational characteristics.