While Apache Cassandra is optimized for availability and write performance, another wide-column store emerged from the Hadoop ecosystem with different priorities: Apache HBase. Built as an open-source implementation of Google's Bigtable, HBase provides random, real-time read/write access to massive datasets—billions of rows and millions of columns—while integrating seamlessly with the Hadoop ecosystem for batch analytics.
HBase powers some of the world's most demanding data platforms. Facebook's messaging system originally ran on HBase before moving to internal solutions. Alibaba processes petabytes of data through HBase for their e-commerce platform. Adobe, Yahoo, and countless others rely on HBase for workloads that require both random access and batch processing on the same data.
The key distinction from Cassandra lies in HBase's architectural choices: where Cassandra prioritizes availability in CAP theorem terms, HBase prioritizes consistency. HBase uses a leader-based architecture with ZooKeeper coordination, ensuring strong consistency guarantees that Cassandra cannot provide by default. This makes HBase the natural choice when you need the wide-column model with strong, single-row consistency guarantees.
By the end of this page, you will understand HBase's master-based architecture, how it leverages HDFS for storage and ZooKeeper for coordination, the RegionServer model for data distribution, and when HBase is the right choice compared to Cassandra or other wide-column stores.
HBase's design is a direct descendant of Google's Bigtable paper (2006), which described a distributed storage system for managing petabytes of data across thousands of commodity servers. Understanding this lineage explains many of HBase's architectural decisions.
The Google Bigtable Influence
Google designed Bigtable to meet specific requirements for their internal services:
Bigtable achieved this through a master-based architecture using Google's GFS (distributed file system) for storage and Chubby (distributed lock service) for coordination.
HBase as an Open-Source Clone
HBase maps Bigtable's components to the Hadoop ecosystem:
| Bigtable Component | HBase Equivalent | Function |
|---|---|---|
| GFS | HDFS | Distributed file system for storage |
| Chubby | ZooKeeper | Distributed coordination service |
| Bigtable Master | HMaster | Cluster management, region assignment |
| Tablet Server | RegionServer | Data serving, read/write operations |
| SSTable | HFile | Immutable sorted data files |

These lineage choices give HBase a markedly different profile from Cassandra:

| Aspect | HBase | Cassandra |
|---|---|---|
| CAP Position | CP (Consistent, Partition-tolerant) | AP (Available, Partition-tolerant) |
| Architecture | Master-based (HMaster + RegionServers) | Masterless (peer-to-peer) |
| Storage | HDFS (distributed file system) | Local storage per node |
| Coordination | ZooKeeper (centralized) | Gossip protocol (decentralized) |
| Consistency | Strong (row-level atomicity) | Tunable (eventual to strong) |
| Write model | Single RegionServer per row | Any node, any time |
| Failure handling | Failover with brief unavailability | No single point of failure |
HBase's Niche: Hadoop Ecosystem Integration
HBase's primary advantage is its tight integration with the Hadoop ecosystem:
This makes HBase ideal for organizations already invested in the Hadoop ecosystem who need random access to large datasets that are also processed by batch jobs.
Choose HBase when you need: (1) Strong consistency guarantees per row, (2) integration with Hadoop batch processing, (3) random access to massive datasets, or (4) column-level security and cell-level ACLs. If you don't have Hadoop or don't need these features, Cassandra's simpler operational model may be preferable.
HBase's architecture consists of several interconnected components, each with specific responsibilities. Understanding these components is essential for operations and troubleshooting.
The Master Server (HMaster)
The HMaster handles cluster coordination and metadata management:
Importantly, the HMaster is not on the data path—clients communicate directly with RegionServers for reads and writes. The HMaster handles only metadata operations, making it a less critical single point of failure than it might first appear.
```text
HBASE ARCHITECTURE

Clients (Java / Thrift / REST)
  (1) Get region location from ZooKeeper / hbase:meta
  (2) Read/write directly to the owning RegionServer

ZooKeeper Ensemble
  • Root region location      • Active HMaster election
  • RegionServer liveness     • Cluster configuration
  • Schema version            • Distributed locking

HMaster (Active)              HMaster (Standby)
  • Region assignment           Watches ZK for failover
  • DDL operations
  • Load balancing
  • NOT on the data path

RegionServer 1     RegionServer 2     RegionServer 3   ...
  Region A           Region C           Region E
  Region B           Region D           Region F
  • WAL (local)      • WAL (local)      • WAL (local)
  • MemStore         • MemStore         • MemStore
  • Block Cache      • Block Cache      • Block Cache
       │                  │                  │
       ▼                  ▼                  ▼
HDFS (shared storage layer)
  HFiles (SSTables)  – immutable sorted data files
  WAL files          – write-ahead logs for durability
  Replicated 3x by default
```

RegionServers: The Data Serving Workhorses
RegionServers handle all read and write operations. Each RegionServer:
ZooKeeper: The Coordination Layer
ZooKeeper provides distributed coordination services that HBase depends on:
HDFS: The Storage Layer
Unlike Cassandra (which uses local storage), HBase stores all data on HDFS:
HDFS is optimized for throughput, not latency. Typical HDFS read latency is 5-10ms vs. <1ms for local SSD. HBase mitigates this with aggressive caching (block cache, bucket cache) but cannot match pure local storage performance. Consider this when evaluating HBase for latency-sensitive workloads.
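To make the caching trade-off concrete, here is a back-of-envelope model of expected read latency as a function of block-cache hit rate. The latency figures are hypothetical, drawn from the rough ranges above; real numbers depend on hardware, cache sizing, and workload.

```java
// Back-of-envelope HBase read latency model (illustrative numbers only).
// expected = hitRate * cacheLatency + (1 - hitRate) * hdfsLatency
public class ReadLatencyModel {

    static double expectedLatencyMs(double cacheHitRate,
                                    double cacheLatencyMs,
                                    double hdfsLatencyMs) {
        return cacheHitRate * cacheLatencyMs + (1 - cacheHitRate) * hdfsLatencyMs;
    }

    public static void main(String[] args) {
        // Assume ~0.5ms for a block-cache hit and ~8ms for an HDFS read (hypothetical).
        for (double hit : new double[]{0.50, 0.90, 0.99}) {
            System.out.printf("hit rate %.0f%% -> ~%.2f ms expected%n",
                    hit * 100, expectedLatencyMs(hit, 0.5, 8.0));
        }
    }
}
```

At a 99% hit rate the HDFS penalty nearly disappears, which is why cache sizing dominates HBase read-latency tuning.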
HBase's data model closely follows the Bigtable column-family model we explored earlier, with some HBase-specific terminology and characteristics.
Tables, Rows, Column Families, and Cells
HBase organizes data hierarchically:
```java
// HBase table structure example: User Activity Tracking
// Table: user_activity
// Column Families: 'profile', 'sessions', 'metrics'

// Create table with column families
HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("user_activity"));

// Column family: profile (static data, rarely changes)
HColumnDescriptor profileCF = new HColumnDescriptor("profile");
profileCF.setMaxVersions(1);                  // Only keep latest version
profileCF.setTimeToLive(HConstants.FOREVER);  // Never expire
profileCF.setCompression(Compression.Algorithm.SNAPPY);
tableDescriptor.addFamily(profileCF);

// Column family: sessions (time-series, expires after 30 days)
HColumnDescriptor sessionsCF = new HColumnDescriptor("sessions");
sessionsCF.setMaxVersions(1);
sessionsCF.setTimeToLive(30 * 24 * 60 * 60);  // 30 days TTL
sessionsCF.setCompression(Compression.Algorithm.LZ4);
tableDescriptor.addFamily(sessionsCF);

// Column family: metrics (counters, keep history)
HColumnDescriptor metricsCF = new HColumnDescriptor("metrics");
metricsCF.setMaxVersions(100);                // Keep 100 versions for trend analysis
metricsCF.setTimeToLive(365 * 24 * 60 * 60);  // 1 year TTL
tableDescriptor.addFamily(metricsCF);

admin.createTable(tableDescriptor);

// Data structure after writes:
//
// Row Key: "user_12345"
// ├── Column Family: "profile"
// │   ├── "name": "Alice"                  (version: t1)
// │   ├── "email": "alice@example.com"     (version: t1)
// │   └── "created_at": "2024-01-01"       (version: t1)
// ├── Column Family: "sessions"
// │   ├── "sess_20240115_001": {ip: "1.2.3.4", device: "mobile"}   (version: t5)
// │   ├── "sess_20240115_002": {ip: "5.6.7.8", device: "desktop"}  (version: t6)
// │   └── ... (older sessions auto-deleted after 30 days)
// └── Column Family: "metrics"
//     ├── "page_views": "1523"  (versions: t7=1523, t6=1520, t5=1515, ...)
//     └── "clicks": "342"       (versions: t7=342, t6=340, ...)
```

Regions: The Unit of Distribution
A table is horizontally partitioned into regions, where each region contains a contiguous range of row keys:
The region model provides:
```text
TABLE REGION DISTRIBUTION

Table: user_activity (sorted by row key lexicographically)
Row keys: "user_00001" ... "user_99999"

Logical table view:
  user_00001 | user_00002 | ... | user_50000 | ... | user_99999

Physical region distribution:
  Region 1                Region 2                Region 3
  [user_00001,            [user_33334,            [user_66667,
   user_33333]             user_66666]             user_99999]
       │                       │                       │
       ▼                       ▼                       ▼
  RegionServer 1          RegionServer 2          RegionServer 3

Client query: GET user_45000
  1. Client asks ZK/Meta: "Which region has user_45000?"
  2. Answer: Region 2 ([user_33334, user_66666])
  3. Client caches the region location
  4. Client sends the GET directly to RegionServer 2
  5. RegionServer 2 reads from MemStore or HFiles
  6. RegionServer 2 returns the result to the client

Note: the HMaster is NOT involved in data operations!
```

New tables start with a single region, creating a hotspot for writes. Pre-split tables with expected region boundaries:
```
create 'user_data', 'cf1', SPLITS => ['user_3', 'user_6', 'user_9']
```
This immediately distributes writes across multiple RegionServers.
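The client-side region lookup described above amounts to a floor search over sorted region start keys: each region owns a contiguous key range, so the owning region is the one with the greatest start key less than or equal to the row key. A minimal sketch, with illustrative region boundaries and server names (the real client caches hbase:meta entries rather than a hand-built map):

```java
import java.util.TreeMap;

// Sketch of client-side region location. Regions hold contiguous, sorted
// row-key ranges, so finding the right region is a floorEntry lookup on
// region start keys. Boundaries and server names below are hypothetical.
public class RegionLocator {

    // Map: region start key -> RegionServer hosting it ("" = first region).
    static final TreeMap<String, String> REGIONS = new TreeMap<>();
    static {
        REGIONS.put("",           "regionserver1"); // [start, user_33334)
        REGIONS.put("user_33334", "regionserver2"); // [user_33334, user_66667)
        REGIONS.put("user_66667", "regionserver3"); // [user_66667, end)
    }

    static String serverFor(String rowKey) {
        // Greatest start key <= rowKey identifies the owning region
        return REGIONS.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        System.out.println(serverFor("user_45000")); // falls in the middle region
    }
}
```

This is also why row-key design matters so much in HBase: lexicographic ordering determines both region boundaries and which server absorbs each write.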
HBase's read and write paths are optimized for different access patterns. Understanding these paths is essential for performance tuning and capacity planning.
The Write Path
When a client writes to HBase:
```text
HBASE WRITE PATH

Client
  (1) Put/Delete request for row "user_45000"
  (2) Look up region location (cached or from hbase:meta)
      │
      ▼
RegionServer 2  (hosts the region for user_45000)
  (3) Acquire row lock (ensures atomicity within the row)
  (4) Append to WAL (Write-Ahead Log)
      • Sequential write to HDFS
      • Synced to disk for durability
      • Can batch multiple edits for throughput
  (5) Write to MemStore (in-memory)
      • One MemStore per column family per region
      • Sorted data structure (ConcurrentSkipListMap)
  (6) Release row lock
  (7) Return success to client

Background operations:
  (8) MemStore flush (when size threshold reached, ~128MB)
      • Create a new HFile on HDFS
      • Clear the MemStore
      • Delete the corresponding WAL entries
  (9) Compaction (background process)
      • Minor: merge recent HFiles (reduce file count)
      • Major: merge all HFiles (remove deleted data)
```

Critical WAL behavior: by default the WAL append is synced before the write is acknowledged; syncing can be deferred to trade a small durability window for throughput (controlled by `hbase.regionserver.optionallogflushinterval`).

The Read Path
Reading is more complex because data may exist in multiple locations:
```text
HBASE READ PATH

Client
  (1) Get request for row "user_45000"
  (2) Look up region location (cached)
      │
      ▼
RegionServer 2 — data location search:
  (3) Block Cache (LRU cache of HFile blocks)
      └─ Hot data served from memory
      │  (cache miss)
      ▼
  (4) MemStore (recent writes, always checked)
      └─ O(log n) search in sorted structure
      │  (also check HFiles)
      ▼
  (5) Bloom filters (per HFile, in memory)
      └─ "Might this HFile contain row X?"
      │  (positive → read from disk)
      ▼
  (6) HFile block index (sparse index)
      └─ Find which block contains row X
      ▼
  (7) Read block from HDFS
      └─ Read data block (~64KB), add it to the Block Cache
  (8) Merge results
      └─ Combine MemStore + HFiles
      └─ Apply timestamp filtering
      └─ Return the most recent version(s)
  (9) Return result to client
```

HBase provides strong consistency within rows—a significant difference from Cassandra's eventual-consistency default. This is achieved through a single-RegionServer-per-row design and careful coordination via ZooKeeper.
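The merge step in the read path above can be sketched as follows. This is a deliberate simplification: the real implementation merges sorted scanners over the MemStore and each HFile while honoring versions and delete markers, but the core rule is the same—among candidate cells for a column, the highest timestamp wins. The `Cell` record here is illustrative, not the HBase `Cell` interface.

```java
import java.util.Comparator;
import java.util.List;

// Simplified sketch of read-path result merging: candidate cells for the same
// row/column may come from the MemStore and several HFiles; the returned
// value is the one with the highest timestamp (most recent version).
public class ReadMerge {

    record Cell(String source, long timestamp, String value) {}

    static Cell mergeNewest(List<Cell> candidates) {
        return candidates.stream()
                .max(Comparator.comparingLong(Cell::timestamp))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<Cell> candidates = List.of(
                new Cell("hfile-1",  100L, "v1"),  // older flushed version
                new Cell("hfile-2",  200L, "v2"),  // newer flushed version
                new Cell("memstore", 300L, "v3")); // unflushed, newest
        System.out.println(mergeNewest(candidates).value()); // prints v3
    }
}
```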
Row-Level Atomicity
HBase guarantees that all operations within a single row are atomic:
However, there are no cross-row transactions in core HBase (though Apache Phoenix adds this capability).
```java
// Row-level atomic operations in HBase

// Atomic multi-column put (all or nothing within a row)
Put put = new Put(Bytes.toBytes("user_12345"));
put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("email"), Bytes.toBytes("alice@example.com"));
put.addColumn(Bytes.toBytes("metrics"), Bytes.toBytes("last_updated"), Bytes.toBytes(System.currentTimeMillis()));
table.put(put); // Atomic: all columns written together

// Check-and-put: read-modify-write atomically
byte[] rowKey = Bytes.toBytes("user_12345");
byte[] family = Bytes.toBytes("profile");
byte[] qualifier = Bytes.toBytes("status");

// Only update if current value is "pending"
Put updatePut = new Put(rowKey);
updatePut.addColumn(family, qualifier, Bytes.toBytes("active"));

boolean success = table.checkAndPut(
    rowKey, family, qualifier,
    Bytes.toBytes("pending"),  // Expected current value
    updatePut                  // New value if condition met
);
// Returns true if the update was applied, false if the condition was not met

// Atomic increment (counter pattern)
long newValue = table.incrementColumnValue(
    Bytes.toBytes("user_12345"),
    Bytes.toBytes("metrics"),
    Bytes.toBytes("page_views"),
    1  // Increment by 1
);
// Atomic: no lost updates even under concurrent increments

// Batch operations within a row are also atomic
RowMutations mutations = new RowMutations(Bytes.toBytes("user_12345"));
mutations.add(new Put(Bytes.toBytes("user_12345"))
    .addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Alice Updated")));
mutations.add(new Delete(Bytes.toBytes("user_12345"))
    .addColumn(Bytes.toBytes("profile"), Bytes.toBytes("old_field")));
table.mutateRow(mutations); // Atomic: both put and delete applied together
```

ZooKeeper's Role in Consistency
ZooKeeper enables HBase's consistency guarantees by providing:
1. RegionServer Liveness Detection
2. HMaster Election
3. Catalog Table (hbase:meta) Location
```text
/hbase                                   (root znode)
├── /hbase/root-region-server            # Location of the hbase:meta region
│     <data>: "regionserver2:16020"
├── /hbase/master                        # Active HMaster
│     <data>: "hmaster1:16000"
├── /hbase/backup-masters                # Standby HMasters
│   ├── hmaster2:16000                   # Ephemeral znodes
│   └── hmaster3:16000
├── /hbase/rs                            # RegionServer znodes
│   ├── regionserver1,16020,1704067200   # Ephemeral: alive while RS is running
│   ├── regionserver2,16020,1704067201
│   ├── regionserver3,16020,1704067202
│   └── ...
├── /hbase/table                         # Table metadata
│   ├── user_activity                    # Table state (enabled/disabled)
│   └── orders
├── /hbase/splitWAL                      # WAL-splitting coordination
│   └── <task znodes for recovery>
└── /hbase/region-in-transition          # Regions being reassigned
    └── <transient state during moves>
```

While HBase can survive individual ZooKeeper node failures (it is a quorum-based system), losing the ZooKeeper quorum (a majority of nodes) will prevent HBase from functioning. Always deploy ZooKeeper as an ensemble of three or more nodes across different failure domains.
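The quorum arithmetic behind that warning is simple majority math: an ensemble of n nodes needs floor(n/2) + 1 nodes to form a quorum, so it tolerates n minus that many failures.

```java
// ZooKeeper ensemble quorum arithmetic: a strict majority (floor(n/2) + 1)
// must be alive, so the ensemble tolerates f = n - quorum failures.
public class QuorumMath {

    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n : new int[]{1, 3, 4, 5}) {
            System.out.printf("ensemble=%d quorum=%d tolerates=%d failure(s)%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

Note that a 4-node ensemble tolerates no more failures than a 3-node one (both survive a single failure), which is why odd-sized ensembles of 3 or 5 are the standard deployment.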
Both HBase and Cassandra are wide-column stores, but they serve different needs. Understanding their differences helps you choose the right tool for your use case.
Architectural Differences Summary
| Aspect | HBase | Cassandra |
|---|---|---|
| Consistency Model | Strong (row-level) | Eventual (tunable) |
| Architecture | Master-based (HMaster) | Masterless (peer-to-peer) |
| Storage Backend | HDFS | Local disk per node |
| Coordination | ZooKeeper | Gossip protocol |
| Write Latency | ~10-20ms (HDFS overhead) | ~1-5ms (local writes) |
| Read Latency | ~5-50ms (depends on cache) | ~1-10ms |
| Availability | Region unavailable during failover (~30s) | Always available |
| Scaling | Add RegionServers + wait for rebalance | Add nodes, vnodes auto-balance |
| Multi-DC | Async cluster replication or CopyTable | Built-in multi-DC replication |
| Operational Complexity | High (ZK + HDFS + HBase) | Medium (single system) |
| Hadoop Integration | Native MapReduce/Spark support | External connectors |
Choose HBase When:
Choose Cassandra When:
One practical heuristic: If you're already using Hadoop and need random access to data that's also processed by Spark/MapReduce, choose HBase. If you're starting fresh and need a distributed database with simple operations and high availability, choose Cassandra.
We've explored Apache HBase as the Hadoop-native wide-column store, understanding its architecture, data model, and trade-offs compared to Cassandra. Let's consolidate the key insights:
What's Next:
With a deep understanding of both column-family model fundamentals and two major implementations (Cassandra and HBase), we'll now explore the workload characteristics that make wide-column stores shine. The next page examines write-optimized workloads—understanding why these databases excel at high-throughput writes and how to design systems that leverage this strength.
You now understand HBase's architecture, consistency model, and how it differs from Cassandra. This knowledge enables you to evaluate HBase for your use cases, design effective data models, and make informed decisions about which wide-column store fits your requirements.