When Google needed to store the entire indexed web—billions of pages with their content, links, and metadata—traditional databases couldn't cope. The solution was Bigtable, a distributed storage system that introduced the column-family model to the world. This model, later adopted and refined by Apache Cassandra, Apache HBase, and ScyllaDB, powers some of the largest data systems ever built.
The column-family model represents a fundamental rethinking of data organization. Rather than storing data strictly row-by-row (as relational and document databases do), column-family stores group columns into families that are stored together. This seemingly simple change has profound implications for storage efficiency, query patterns, and horizontal scalability.
When Facebook needed to store messages for its billions of users, it chose HBase. When Apple needed to store petabytes of iCloud data, it built on Cassandra. These systems handle millions of writes per second while distributing data across thousands of nodes.
This page provides comprehensive coverage of the column-family data model. You'll understand its origins in Google's Bigtable paper, master the concepts of column families, rows, and sparse storage, learn data modeling patterns, explore consistency and partitioning strategies, and recognize the workloads where column-family stores are the optimal choice.
Before diving into column-family databases, we must understand the fundamental difference between row-oriented and column-oriented storage—and how column-family stores blend both approaches.
Row-Oriented Storage (Traditional RDBMS, Document DBs):
Data is stored contiguously by row. All fields for a single record sit together on disk.
Row 1: [id=1, name="Alice", email="alice@ex.com", age=30, city="NYC"]
Row 2: [id=2, name="Bob", email="bob@ex.com", age=25, city="LA"]
Row 3: [id=3, name="Carol", email="carol@ex.com", age=35, city="NYC"]
Advantages: Excellent for transactional workloads that read/write entire records. Disadvantages: Wasteful when queries only need specific columns (must read entire rows).
Column-Oriented Storage (Analytical Databases):
Data is stored contiguously by column. All values for a single field sit together.
name: ["Alice", "Bob", "Carol"]
email: ["alice@ex.com", "bob@ex.com", "carol@ex.com"]
age: [30, 25, 35]
city: ["NYC", "LA", "NYC"]
Advantages: Excellent for analytical queries scanning specific columns. Enables powerful compression (same-type values cluster together). Disadvantages: Expensive for transactional workloads touching entire records.
| Characteristic | Row-Oriented | Column-Oriented |
|---|---|---|
| Single Record Read | Fast (one seek) | Slow (multiple seeks) |
| Column Scan (aggregate) | Slow (read all columns) | Fast (read only needed columns) |
| Single Record Write | Fast (append to one location) | Slow (update multiple column files) |
| Compression | Limited (mixed types) | Excellent (same-type values) |
| Best For | OLTP, transactions | OLAP, analytics |
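The trade-off in the table above can be made concrete with a toy sketch. This is illustrative only (the records and fields are invented): the same data is laid out both ways, and each layout makes one kind of query cheap and the other awkward.

```python
# Row-oriented: each record stored together.
rows = [
    {"id": 1, "name": "Alice", "age": 30, "city": "NYC"},
    {"id": 2, "name": "Bob",   "age": 25, "city": "LA"},
    {"id": 3, "name": "Carol", "age": 35, "city": "NYC"},
]

# Column-oriented: each field stored together.
columns = {
    "id":   [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age":  [30, 25, 35],
    "city": ["NYC", "LA", "NYC"],
}

# Analytical query: average age.
# The row store must touch every record; the column store reads one array.
avg_row = sum(r["age"] for r in rows) / len(rows)
avg_col = sum(columns["age"]) / len(columns["age"])
assert avg_row == avg_col == 30.0

# Transactional query: fetch one whole record.
# The row store does one lookup; the column store does one lookup per field.
record = {field: values[1] for field, values in columns.items()}
assert record == rows[1]
```

On disk the difference is seeks and cache locality rather than dictionary lookups, but the access pattern is the same: aggregates favor columns, whole-record reads favor rows.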
Column-Family: A Hybrid Approach:
Column-family databases like Cassandra and HBase take a middle path. Data is organized by column families—groups of related columns stored together. Within each column family, data can be accessed by row or by column depending on the query.
This hybrid enables fast writes and row-key lookups (row-style access within a partition), efficient reads of related columns (family grouping), and sparse storage without NULL overhead.
The term 'column-family' is often confused with 'columnar' databases. Columnar databases (Vertica, ClickHouse, Redshift) are purely column-oriented for analytics. Column-family databases (Cassandra, HBase) organize columns into families and optimize for different access patterns. They share concepts but serve different purposes.
The column-family model organizes data using a nested map structure. Understanding this structure is essential for effective data modeling.
Core Concepts:
Row Key (Partition Key): The primary identifier for a row. Determines data distribution across nodes.
Column Family: A container for columns, similar to a table. Column families are defined at schema time.
Column: A name-value pair within a column family. Unlike relational columns, columns can vary per row (sparse storage).
Timestamp: Each cell version is timestamped, enabling versioning and time-travel queries.
The logical structure can be visualized as a nested map:
Map<RowKey, Map<ColumnFamily, Map<Column, Map<Timestamp, Value>>>>
```jsonc
// Logical Structure of Column-Family Data
{
  // Row Key: "user:alice"
  "user:alice": {
    // Column Family: "profile"
    "profile": {
      "name":  {"2024-01-15T10:00:00": "Alice Chen"},
      "email": {"2024-01-15T10:00:00": "alice@example.com"},
      "status": {
        "2024-01-20T08:00:00": "active",  // Latest version
        "2024-01-01T00:00:00": "pending"  // Previous version
      }
    },
    // Column Family: "activity"
    "activity": {
      "last_login":  {"2024-01-20T08:30:00": "2024-01-20T08:30:00Z"},
      "login_count": {"2024-01-20T08:30:00": "142"},
      "last_action": {"2024-01-20T09:15:00": "viewed_dashboard"}
    }
  },
  // Row Key: "user:bob"
  "user:bob": {
    "profile": {
      "name":  {"2024-01-10T09:00:00": "Bob Smith"},
      "email": {"2024-01-10T09:00:00": "bob@example.com"},
      "phone": {"2024-01-10T09:00:00": "+1-555-123-4567"}  // Column not in Alice's row!
    },
    "activity": {
      "last_login": {"2024-01-19T14:20:00": "2024-01-19T14:20:00Z"}
      // Note: Bob has fewer activity columns than Alice
    }
  }
}
```

Sparse Storage:
A critical feature of column-family stores is sparse storage. Unlike relational tables, where every row must have every column (with NULLs for missing data), column-family stores only store columns that have values. This is transformative for storage efficiency, schema flexibility, and wide rows with thousands of potential columns.
Example: An e-commerce product catalog might have 500 possible attribute columns (color, size, voltage, material, etc.), but each product only has 10-20 relevant attributes. In a relational database, every product row carries 480+ NULLs. In a column-family store, only populated columns exist.
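A small sketch of the catalog example above (attribute names are invented): the dense row carries a slot for every possible attribute, while the sparse row stores only what is populated.

```python
# 500 possible attribute columns, as in the product-catalog example.
ATTRIBUTES = [f"attr_{i}" for i in range(500)]

# Relational-style row: every column present, mostly NULL/None.
dense_row = {a: None for a in ATTRIBUTES}
dense_row.update({"attr_0": "red", "attr_1": "M", "attr_2": "cotton"})

# Column-family-style row: only columns with values exist.
sparse_row = {"attr_0": "red", "attr_1": "M", "attr_2": "cotton"}

assert len(dense_row) == 500   # 497 of these slots are NULL
assert len(sparse_row) == 3    # nothing stored for absent attributes

# Both rows agree on every value that actually exists.
assert all(dense_row[k] == v for k, v in sparse_row.items())
```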
While columns within a family are flexible, column families themselves are typically defined at table creation time. Creating many column families impacts performance (each is stored separately). The general guidance is to group frequently-accessed-together columns into the same family.
Column-family databases are designed from the ground up for massive horizontal scalability. They distribute data across clusters of commodity servers, handling petabytes of data and millions of operations per second.
Partitioning Strategy:
Data is distributed based on the partition key (often the row key or part of it). Each partition is assigned to specific nodes using:
Consistent Hashing: Nodes and keys are mapped to a hash ring. Each node owns a range of the ring. This minimizes data movement when nodes join or leave.
Token Ranges: Each node is assigned token ranges. Keys hash to tokens, determining their home node.
Virtual Nodes (vnodes): Each physical node owns multiple small token ranges, improving load distribution and recovery performance.
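These three mechanisms compose: keys hash to tokens, each physical node claims several vnode positions on the ring, and a key belongs to the first vnode clockwise from its token. The sketch below is a minimal illustration (node names, vnode count, and the use of MD5 are arbitrary choices, not any database's actual scheme); it shows the key property that adding a node only relocates the keys in the ranges its vnodes claim.

```python
import hashlib
from bisect import bisect_right

def token(key: str) -> int:
    """Map a key to a position on the hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=8):
    """Each physical node owns several virtual positions (vnodes)."""
    return sorted((token(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

def owner(ring, key):
    """A key belongs to the first vnode clockwise from its token."""
    tokens = [t for t, _ in ring]
    i = bisect_right(tokens, token(key)) % len(ring)
    return ring[i][1]

ring = build_ring(["node-a", "node-b", "node-c"])
before = {f"user:{i}": owner(ring, f"user:{i}") for i in range(1000)}

# Add a fourth node: only keys falling in node-d's vnode ranges move.
ring2 = build_ring(["node-a", "node-b", "node-c", "node-d"])
after = {k: owner(ring2, k) for k in before}
moved = [k for k in before if before[k] != after[k]]
assert moved and all(after[k] == "node-d" for k in moved)
```

With naive modulo placement (`hash(key) % node_count`), adding a node would remap nearly every key; here only roughly a quarter of them move.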
```sql
-- Cassandra Table with Composite Key

CREATE TABLE user_activity (
    user_id UUID,
    activity_date DATE,
    activity_time TIMESTAMP,
    activity_type TEXT,
    details TEXT,
    PRIMARY KEY ((user_id, activity_date), activity_time)
);

-- Key Structure:
-- (user_id, activity_date) = Partition Key (determines node)
-- activity_time            = Clustering Key (determines order within partition)

-- This design:
-- 1. Distributes by user AND date (partitions per user per day)
-- 2. Stores activities within a day sorted by time
-- 3. Enables efficient range queries within a partition

-- Efficient: Get all activities for a user on a specific day
SELECT * FROM user_activity
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
  AND activity_date = '2024-01-15';

-- Efficient: Get activities in a time range within a day
SELECT * FROM user_activity
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
  AND activity_date = '2024-01-15'
  AND activity_time >= '2024-01-15 09:00:00'
  AND activity_time < '2024-01-15 17:00:00';

-- INEFFICIENT: Query without partition key (full cluster scan)
-- SELECT * FROM user_activity WHERE activity_type = 'login';
```

Replication:
Data durability is achieved through replication across multiple nodes. The replication factor (RF) sets how many copies of each partition exist, and a tunable consistency level controls how many replicas must acknowledge each read or write:
| Level | Writes | Reads | Use Case |
|---|---|---|---|
| ONE | Ack from 1 replica | Read from 1 replica | Fastest; eventual consistency; suitable for logs |
| QUORUM | Ack from majority (RF/2 + 1) | Read from majority | Strong consistency with availability |
| ALL | Ack from all replicas | Read from all replicas | Strongest; any node failure blocks operation |
| LOCAL_QUORUM | Quorum in local DC | Quorum in local DC | Low latency with cross-DC replication |
| EACH_QUORUM | Quorum in each DC | N/A | Strong consistency across data centers |
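The arithmetic behind the table is simple and worth internalizing: with replication factor RF, a quorum is `RF // 2 + 1`, and a read is guaranteed to observe the latest write whenever the read and write replica counts overlap, i.e. `R + W > RF`. A quick sketch:

```python
# Tunable consistency arithmetic (Cassandra-style quorums).
def quorum(rf: int) -> int:
    """Majority of replicas: RF // 2 + 1."""
    return rf // 2 + 1

rf = 3
w = quorum(rf)  # QUORUM writes wait for 2 of 3 replicas
r = quorum(rf)  # QUORUM reads consult 2 of 3 replicas
assert w == r == 2
assert r + w > rf  # read and write sets must share at least one replica

# ONE + ONE trades that overlap away for speed: a read may land on a
# replica the write never reached, hence eventual consistency.
assert 1 + 1 <= rf

# The guarantee scales: RF=5 gives quorum 3, and 3 + 3 > 5.
assert quorum(5) == 3
```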
Write Path:
Locally, each write goes to two places: the commit log, an append-only file that guarantees durability, and the memtable, an in-memory structure that serves fast reads of recent data.
Poor partition key choice leads to 'hot partitions'—single nodes handling disproportionate traffic. If user_id is the partition key for a celebrity's activity, their node becomes a bottleneck. Design partition keys to distribute load evenly (composite keys, time bucketing).
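One common fix for the celebrity problem is a synthetic shard component in the partition key. The sketch below is illustrative (the shard count and key layout are design choices, not a fixed rule): writes scatter across several physical partitions, and reads fan out across all shards and merge.

```python
import random

random.seed(0)  # deterministic for the example
SHARDS = 8      # split one logical partition across 8 physical partitions

def partition_key(user_id: str) -> tuple:
    # Writes pick a shard, so a celebrity's traffic spreads across
    # SHARDS partitions (and therefore across nodes).
    return (user_id, random.randrange(SHARDS))

def all_partitions(user_id: str) -> list:
    # Reads must query every shard and merge the results.
    return [(user_id, s) for s in range(SHARDS)]

keys = {partition_key("celebrity-42") for _ in range(1000)}
assert keys <= set(all_partitions("celebrity-42"))
assert len(keys) > 1  # load no longer lands on a single partition
```

The trade: write load spreads evenly, but every read pays a fan-out cost, so sharding only makes sense for partitions that are genuinely hot.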
Data modeling in column-family databases inverts relational thinking. Instead of normalizing data and creating flexible schemas, you model for queries—designing tables around specific access patterns.
The Golden Rule:
Design your tables based on your queries, not your entities.
In relational databases, you model entities (Users, Orders, Products) and join them at query time. In column-family stores, you model query patterns and denormalize data to support them.
Example: E-commerce Order System
Queries needed:
1. Get all orders for a user (most recent first)
2. Get a specific order by its ID
3. Get all orders containing a given product
4. Get daily order statistics
```sql
-- Query-Driven Data Modeling in Cassandra

-- Custom types for nested data (created first, since the tables use them)
CREATE TYPE order_item (
    product_id UUID,
    product_name TEXT,
    quantity INT,
    price DECIMAL
);

CREATE TYPE address (
    street TEXT,
    city TEXT,
    state TEXT,
    zip TEXT,
    country TEXT
);

-- Query 1: Get all orders for a user (recent first)
CREATE TABLE orders_by_user (
    user_id UUID,
    order_date TIMESTAMP,
    order_id UUID,
    total DECIMAL,
    status TEXT,
    items LIST<FROZEN<order_item>>,
    PRIMARY KEY (user_id, order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC);

-- Query 2: Get a specific order by ID
CREATE TABLE orders_by_id (
    order_id UUID PRIMARY KEY,
    user_id UUID,
    order_date TIMESTAMP,
    total DECIMAL,
    status TEXT,
    items LIST<FROZEN<order_item>>,
    shipping_address FROZEN<address>,
    billing_address FROZEN<address>
);

-- Query 3: Get all orders for a product
CREATE TABLE orders_by_product (
    product_id UUID,
    order_date TIMESTAMP,
    order_id UUID,
    user_id UUID,
    quantity INT,
    price DECIMAL,
    PRIMARY KEY (product_id, order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC);

-- Query 4: Daily order statistics
CREATE TABLE daily_order_stats (
    stat_date DATE,
    hour INT,
    order_count COUNTER,
    total_revenue COUNTER,  -- store as cents: counters must be integers
    PRIMARY KEY (stat_date, hour)
);
```

Notice the denormalization: the same order data exists in multiple tables. This violates relational normalization principles but is the correct approach for column-family stores, because there are no joins: each table answers one query pattern from a single partition, trading cheap storage for fast, predictable reads.
The application is responsible for maintaining consistency across denormalized tables—typically through batch writes or eventual consistency.
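A sketch of what "the application maintains consistency" means in practice. The table names mirror the schema above; the dict-backed storage and the `place_order` helper are illustrative stand-ins (in Cassandra this would typically be a logged BATCH so the inserts eventually apply together).

```python
# One dict per query table; real code would issue one INSERT per table.
tables = {"orders_by_user": [], "orders_by_id": [], "orders_by_product": []}

def place_order(order: dict) -> None:
    """Write the same logical order to every table that serves a query."""
    tables["orders_by_user"].append(
        {"user_id": order["user_id"], "order_id": order["order_id"]})
    tables["orders_by_id"].append(order)
    for product_id in order["items"]:
        tables["orders_by_product"].append(
            {"product_id": product_id, "order_id": order["order_id"]})

place_order({"user_id": "u1", "order_id": "o1", "items": ["p1", "p2"]})

assert len(tables["orders_by_user"]) == 1
assert len(tables["orders_by_id"]) == 1
assert len(tables["orders_by_product"]) == 2  # one row per product
```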
Artem Chebotko developed a visual notation for Cassandra data modeling. Tables are drawn with partition keys (K) and clustering columns (C) clearly marked. This notation helps visualize data distribution and access patterns. Search for 'Chebotko diagram' for examples.
Column-family stores are optimized for write-heavy workloads with specific read patterns. Understanding the internal mechanics explains their performance characteristics.
The LSM Tree Architecture:
Most column-family stores use Log-Structured Merge (LSM) trees for storage: writes land in an in-memory memtable (with a commit-log append for durability), are flushed to immutable sorted files (SSTables) when the memtable fills, and are periodically merged by background compaction.
This architecture makes writes extremely fast (memory + sequential I/O) at the cost of read overhead (may need to check multiple SSTables).
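These mechanics fit in a few lines of Python. This is a minimal sketch, not a real storage engine: the class name, the flush threshold, and the use of plain dicts as "SSTables" are all illustrative, but the write path (log append + memtable, flush when full) and the read path (memtable first, then newest SSTable to oldest) match the description above.

```python
class TinyLSM:
    """Toy LSM tree: commit log + memtable + immutable flushed tables."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []   # durability: sequential append
        self.memtable = {}     # fast in-memory writes
        self.sstables = []     # immutable sorted tables, newest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: sort and freeze the memtable as a new "SSTable".
            self.sstables.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:        # most recent data first
            return self.memtable[key]
        for table in self.sstables:     # then newest-to-oldest SSTables
            if key in table:
                return table[key]
        return None                     # may have checked several tables

db = TinyLSM()
for i in range(7):
    db.put(f"k{i}", i)
assert db.get("k0") == 0       # served from a flushed SSTable
assert db.get("k6") == 6       # served from the memtable
assert len(db.sstables) == 2   # two flushes of 3 entries each
```

Note how a single `get` of an old key walks multiple tables; that is exactly the read overhead that compaction, Bloom filters, and caches exist to reduce.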
```text
# LSM Tree Write Path

        Client Write
             │
      ┌──────┴──────┐
      ▼             ▼
┌─────────────┐  ┌─────────────┐
│ Commit Log  │  │  Memtable   │
│  (Append)   │  │ (In-Memory) │
└─────────────┘  └──────┬──────┘
   Durability           │ When full
                        ▼
                 ┌─────────────┐
                 │  SSTable 1  │
                 │ (Immutable) │
                 └──────┬──────┘
                 ┌──────┴──────┐
                 │  SSTable 2  │
                 └──────┬──────┘
                        ▼
                 ┌─────────────┐
                 │ Compaction  │
                 │   (Merge)   │
                 └─────────────┘

# LSM Tree Read Path

        Client Read
             │
             ▼
┌─────────────┐
│  Memtable   │ ◄── Check first (most recent)
└──────┬──────┘
       │ Not found
       ▼
┌─────────────┐
│ Bloom Filter│ ◄── Quick "definitely not here" check
└──────┬──────┘
       │ Maybe here
       ▼
┌─────────────┐
│SSTable Index│ ◄── Locate potential block
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Data Block  │ ◄── Read actual data
└─────────────┘
```

Read Optimization Strategies:
Since reads may span multiple SSTables, column-family stores employ several optimization techniques:
Bloom Filters: Probabilistic data structures that quickly determine if a key is definitely not in an SSTable, avoiding unnecessary disk reads.
Key Cache: Caches SSTable index positions for frequently accessed partition keys.
Row Cache: Caches complete rows for hot data (use carefully—can cause memory pressure).
Compression: SSTables are compressed, reducing I/O at the cost of CPU.
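The Bloom filter is the most distinctive of these techniques, and its core property fits in a small sketch. This toy version is illustrative (the bit-array size, three hash functions, and SHA-256-based hashing are arbitrary choices): `k` hash functions set `k` bits per key, and a lookup answers "definitely absent" if any bit is clear. False positives are possible; false negatives are not.

```python
import hashlib

class BloomFilter:
    def __init__(self, size: int = 1024, hashes: int = 3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, key: str):
        # Derive `hashes` independent positions by salting one hash.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        # All bits set -> "maybe here"; any bit clear -> "definitely not".
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for key in ("user:alice", "user:bob"):
    bf.add(key)

# Present keys always pass; absent keys almost always fail fast,
# letting the read path skip that SSTable's disk read entirely.
assert bf.might_contain("user:alice")
assert bf.might_contain("user:bob")
```

Each SSTable carries its own filter, so a read over many tables usually touches disk for only the one table that actually holds the key.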
Compaction Strategies: Different strategies optimize for different workloads.
| Strategy | Behavior | Best For |
|---|---|---|
| SizeTiered (STCS) | Merge SSTables of similar size | Write-heavy workloads; default strategy |
| Leveled (LCS) | Organize SSTables into levels; tighter control | Read-heavy workloads; more predictable latencies |
| TimeWindow (TWCS) | Compact time-bucketed SSTables | Time-series data; efficient expiration |
| Unified (UCS) | Adaptive; combines STCS and LCS benefits | Mixed workloads; newer Scylla/Cassandra |
Column-family stores don't update in place. Every 'update' writes a new timestamped version. Deletes write 'tombstones'. Compaction eventually removes old versions and tombstones. This means: frequent updates to the same rows create write amplification; deletes don't immediately reclaim space.
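The version-and-tombstone lifecycle can be sketched directly. This is a conceptual illustration (the `TOMBSTONE` sentinel and integer timestamps are stand-ins): every write, including a delete, appends a new timestamped version, and compaction reconciles them with last-write-wins.

```python
TOMBSTONE = object()  # sentinel marking a delete

def write(versions: list, ts: int, value) -> None:
    """Every 'update' is just a new timestamped version."""
    versions.append((ts, value))

def compact(versions: list):
    """Keep only the newest version; drop the cell if it's a tombstone."""
    ts, value = max(versions, key=lambda v: v[0])  # highest timestamp wins
    return None if value is TOMBSTONE else (ts, value)

cell = []
write(cell, 1, "pending")
write(cell, 2, "active")
assert compact(cell) == (2, "active")  # nothing was rewritten in place

write(cell, 3, TOMBSTONE)              # a delete is a tombstone write
assert compact(cell) is None           # space reclaimed only at compaction
```

Until compaction runs, all three versions of the cell occupy space on disk, which is exactly why frequent updates amplify writes and deletes do not immediately free storage.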
Apache Cassandra is the most widely deployed column-family database. Originally developed at Facebook for inbox search, it combines Bigtable's data model with Amazon Dynamo's distributed architecture.
Key Architectural Features:
- Masterless, peer-to-peer design: every node can serve any request, with no single point of failure
- Gossip protocol for cluster membership and failure detection
- Tunable consistency per query, from ONE to ALL
- Built-in multi-datacenter replication
- CQL, a SQL-like query language
```sql
-- Cassandra CQL Examples

-- Create keyspace (database) with replication strategy
CREATE KEYSPACE ecommerce
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,  -- 3 replicas in dc1
    'dc2': 2   -- 2 replicas in dc2
};

USE ecommerce;

-- Create table with composite primary key
-- (product_id is a TIMEUUID so the clustering key can be range-queried by time)
CREATE TABLE products (
    category TEXT,
    product_id TIMEUUID,
    name TEXT,
    price DECIMAL,
    description TEXT,
    attributes MAP<TEXT, TEXT>,
    tags SET<TEXT>,
    created_at TIMESTAMP,
    PRIMARY KEY (category, product_id)
);

-- Insert with collection types
INSERT INTO products (category, product_id, name, price, attributes, tags, created_at)
VALUES (
    'electronics',
    now(),  -- now() generates a TIMEUUID
    'Wireless Headphones',
    99.99,
    {'brand': 'AudioMax', 'color': 'black', 'bluetooth': '5.0'},
    {'wireless', 'audio', 'bestseller'},
    toTimestamp(now())
);

-- Query within partition (efficient)
SELECT * FROM products WHERE category = 'electronics';

-- Query with clustering key range (efficient)
SELECT * FROM products
WHERE category = 'electronics'
  AND product_id >= minTimeuuid('2024-01-01')
  AND product_id < minTimeuuid('2024-02-01');

-- Update collection
UPDATE products SET tags = tags + {'new-arrival'}
WHERE category = 'electronics' AND product_id = ?;

-- Insert with TTL (data expires automatically)
INSERT INTO products (category, product_id, name, price, created_at)
VALUES ('flash-sale', now(), 'Limited Offer Item', 49.99, toTimestamp(now()))
USING TTL 86400;  -- Expires in 24 hours

-- Lightweight transactions (LWT) for compare-and-set
UPDATE products SET price = 89.99
WHERE category = 'electronics' AND product_id = ?
IF price = 99.99;  -- Only update if current price is 99.99
```

ScyllaDB is a C++ reimplementation of Cassandra, offering the same data model and CQL but with significantly better performance (10x throughput claimed). It's wire-compatible, meaning applications can switch without code changes. Consider ScyllaDB for new deployments requiring Cassandra semantics.
Apache HBase is a column-family database modeled directly after Google's Bigtable. Unlike Cassandra's peer-to-peer architecture, HBase uses a master-slave design built on top of HDFS (Hadoop Distributed File System).
HBase Architecture:
- HMaster: coordinates the cluster, assigns regions to servers, and handles schema changes
- RegionServers: serve reads and writes for the regions (contiguous row-key ranges) they host
- ZooKeeper: coordinates cluster state and master election
- HDFS: provides replicated, shared storage for all data files
This architecture provides strong consistency (unlike Cassandra's eventual consistency) but introduces the HMaster as a potential bottleneck.
| Aspect | Cassandra | HBase |
|---|---|---|
| Architecture | Peer-to-peer, masterless | Master-slave (HMaster + RegionServers) |
| Consistency | Tunable (eventual to strong) | Strong (single-row ACID) |
| Storage | Local storage on each node | HDFS (shared storage) |
| Query Language | CQL (SQL-like) | Java API, shell, Phoenix (SQL layer) |
| Hadoop Integration | Spark connector | Native (runs on Hadoop) |
| Multi-DC | Built-in | Add-on (HBase replication) |
| Best For | Write-heavy, geo-distributed | Hadoop ecosystem, batch + random access |
```java
// HBase Java API Examples

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

// Connect to HBase
Configuration config = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(config);
TableName tableName = TableName.valueOf("users");

// Create table with column families
Admin admin = connection.getAdmin();
HTableDescriptor tableDesc = new HTableDescriptor(tableName);
tableDesc.addFamily(new HColumnDescriptor("profile"));
tableDesc.addFamily(new HColumnDescriptor("activity"));
admin.createTable(tableDesc);

// Get table reference
Table table = connection.getTable(tableName);

// Put (Insert/Update)
Put put = new Put(Bytes.toBytes("user:alice"));
put.addColumn(
    Bytes.toBytes("profile"),    // Column family
    Bytes.toBytes("name"),       // Column qualifier
    Bytes.toBytes("Alice Chen")  // Value
);
put.addColumn(
    Bytes.toBytes("profile"),
    Bytes.toBytes("email"),
    Bytes.toBytes("alice@example.com")
);
put.addColumn(
    Bytes.toBytes("activity"),
    Bytes.toBytes("last_login"),
    Bytes.toBytes("2024-01-20T08:30:00Z")
);
table.put(put);

// Get (Read single row)
Get get = new Get(Bytes.toBytes("user:alice"));
get.addFamily(Bytes.toBytes("profile"));  // Only profile family
Result result = table.get(get);
byte[] name = result.getValue(
    Bytes.toBytes("profile"),
    Bytes.toBytes("name")
);
System.out.println("Name: " + Bytes.toString(name));

// Scan (Range query)
Scan scan = new Scan()
    .withStartRow(Bytes.toBytes("user:a"))
    .withStopRow(Bytes.toBytes("user:z"))
    .addFamily(Bytes.toBytes("profile"));

ResultScanner scanner = table.getScanner(scan);
for (Result row : scanner) {
    String rowKey = Bytes.toString(row.getRow());
    String userName = Bytes.toString(
        row.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"))
    );
    System.out.println(rowKey + ": " + userName);
}
scanner.close();
```

When to Choose HBase: you are already invested in the Hadoop ecosystem, you need strong single-row consistency, or your workload mixes batch processing with random reads and writes.
Apache Phoenix provides a SQL layer over HBase, enabling SQL queries, secondary indexes, and JDBC connectivity. This dramatically lowers the barrier to entry for teams familiar with SQL while retaining HBase's scalability.
Column-family databases excel at time-series data—metrics, logs, sensor readings, and event streams. The write-optimized LSM architecture handles high ingestion rates, while the sorted storage enables efficient time-range queries.
Time-Series Modeling Patterns:
The key challenge in time-series modeling is partition sizing. Too wide (all data for a sensor in one partition) creates hot spots and unbounded growth. Too narrow (every reading in separate partition) prevents efficient range scans.
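The standard middle ground is the time-bucketed partition, where the application derives a bucket from each reading's timestamp. A sketch of the bucket computation (hourly granularity is one choice among many; daily or weekly buckets follow the same pattern):

```python
from datetime import datetime, timedelta

def hour_bucket(ts: datetime) -> datetime:
    """Round a timestamp down to its hour: the partition's time bucket."""
    return ts.replace(minute=0, second=0, microsecond=0)

reading_time = datetime(2024, 1, 15, 10, 23, 45)
assert hour_bucket(reading_time) == datetime(2024, 1, 15, 10, 0, 0)

# A day of once-a-minute readings lands in 24 bounded partitions per
# sensor, instead of one partition that grows forever.
buckets = {hour_bucket(datetime(2024, 1, 15) + timedelta(seconds=s))
           for s in range(0, 86400, 60)}
assert len(buckets) == 24
```

Each partition's size is now bounded by the bucket width times the ingestion rate, and a time-range query touches only the handful of buckets that overlap the range.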
```sql
-- Time-Series Data Modeling in Cassandra

-- Pattern 1: Time-Bucketed Partitions
-- Partition by sensor + time bucket; order by timestamp within bucket

CREATE TABLE sensor_readings (
    sensor_id UUID,
    bucket TIMESTAMP,  -- Hourly bucket
    reading_time TIMESTAMP,
    temperature FLOAT,
    humidity FLOAT,
    pressure FLOAT,
    PRIMARY KEY ((sensor_id, bucket), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC)
  AND default_time_to_live = 2592000  -- 30 days TTL
  AND compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': 1
  };

-- Writes: Application calculates bucket from reading time
-- E.g., bucket = reading_time rounded down to the hour

INSERT INTO sensor_readings (sensor_id, bucket, reading_time, temperature, humidity, pressure)
VALUES (
    123e4567-e89b-12d3-a456-426614174000,
    '2024-01-15 10:00:00',  -- Bucket
    '2024-01-15 10:23:45',  -- Exact time
    22.5, 45.0, 1013.25
);

-- Efficient: Get the latest readings for a sensor within one bucket
SELECT * FROM sensor_readings
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
  AND bucket = '2024-01-15 10:00:00'
ORDER BY reading_time DESC
LIMIT 100;

-- Rollup table for dashboards (populated by the application or a batch job)
CREATE TABLE sensor_hourly_stats (
    sensor_id UUID,
    hour TIMESTAMP,
    min_temperature FLOAT,
    max_temperature FLOAT,
    avg_temperature FLOAT,
    reading_count INT,
    PRIMARY KEY (sensor_id, hour)
) WITH CLUSTERING ORDER BY (hour DESC);

-- Pattern 2: Rollup Tables for Different Granularities
-- Minute -> Hour -> Day -> Month hierarchies
-- Application or batch job populates rollups
```

| Strategy | Partition Key | Pros | Cons |
|---|---|---|---|
| Single partition per entity | sensor_id | Simple queries | Unbounded partition growth; hot spots |
| Time-bucketed | (sensor_id, bucket) | Bounded partitions; time-range efficient | Cross-bucket queries need multiple fetches |
| Reverse timestamp | sensor_id, -timestamp | Latest data first | Complex timestamp handling |
| Sharded by time | (sensor_id, shard) | Spread writes | More complex routing logic |
IoT at Scale:
IoT deployments generate massive write volumes: a fleet of 100,000 sensors reporting every 10 seconds produces 10,000 writes per second (roughly 864 million readings per day), and fleets in the millions push this into the hundreds of thousands of writes per second.
Column-family stores handle this by:
- Absorbing writes through the LSM path (memory plus sequential I/O)
- Bucketing partitions by time to bound their size and spread load
- Expiring old data automatically with TTLs
- Using time-window compaction to drop whole expired SSTables cheaply
- Scaling write throughput linearly by adding nodes
For pure time-series workloads, consider purpose-built databases like InfluxDB, TimescaleDB (Postgres extension), or QuestDB. They offer time-series-specific features (downsampling, continuous queries, optimized storage) that column-family stores lack. Use Cassandra/HBase when time-series is one of multiple workloads.
The column-family model enables data systems at scales that relational databases cannot approach. By organizing data around column families, embracing denormalization, and optimizing for write-heavy workloads, these databases power some of the world's largest applications. Let's consolidate the essential concepts:
- Model tables around queries, not entities; denormalize freely and let the application keep copies in sync
- Choose partition keys that bound partition size and distribute load evenly
- Sparse storage means only populated columns exist; no NULL overhead
- LSM trees make writes fast; reads pay for it, mitigated by Bloom filters, caches, and compaction
- Every write (including a delete) is a new timestamped version, reconciled at compaction
- Consistency is tunable per operation, from ONE through QUORUM to ALL
What's Next:
We've now explored four modern data models: document, key-value, graph, and column-family. Each excels in specific scenarios. The final page of this module addresses the critical question: how do you choose? We'll develop a framework for model selection based on data characteristics, query patterns, scalability requirements, and organizational factors.
You now possess comprehensive knowledge of the column-family data model—its theoretical foundations, distributed architecture, data modeling principles, implementation systems, and optimal use cases. This understanding enables you to evaluate when column-family stores are the right tool for massive-scale, write-intensive workloads.