Apache Cassandra is the most widely deployed wide-column store in production environments worldwide. Originally developed at Facebook for inbox search and released as open source in 2008, Cassandra has become the go-to choice for applications requiring massive write throughput, linear horizontal scalability, and always-on availability across datacenters.
Companies like Netflix, Apple, Instagram, Spotify, and Uber run Cassandra clusters with thousands of nodes, handling petabytes of data. Understanding Cassandra isn't just academic—it's preparation for real-world distributed system challenges.
This page moves from theory to practice. We'll design schemas, write queries, and explore the operational aspects that distinguish hobby projects from production systems.
By the end of this page, you will:

- Understand CQL (Cassandra Query Language) deeply
- Be able to design Cassandra schemas for common use cases
- Understand primary key design and its implications
- Know how to model data for query patterns
- Understand operational considerations like compaction and repair
CQL (Cassandra Query Language) looks superficially like SQL but has fundamentally different semantics. Understanding these differences is crucial for effective Cassandra development.
A keyspace is Cassandra's equivalent of a database schema. It defines the replication strategy, the number of replicas per datacenter, and whether writes are recorded in the commit log (durable writes).
```sql
-- Development keyspace (single datacenter, low replication)
CREATE KEYSPACE IF NOT EXISTS dev_app
WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
};

-- Production keyspace (multi-datacenter, high availability)
CREATE KEYSPACE IF NOT EXISTS prod_app
WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'us-east-1': 3,   -- 3 replicas in US East
    'eu-west-1': 3,   -- 3 replicas in EU West
    'ap-south-1': 2   -- 2 replicas in Asia Pacific
}
AND DURABLE_WRITES = true;  -- Commit log enabled (default)

-- Switch to keyspace
USE prod_app;

-- View keyspace details
DESCRIBE KEYSPACE prod_app;
```

Unlike relational tables that model entities, Cassandra tables model query patterns. Each table is designed to answer specific queries efficiently.
Key Concepts:
| Term | Description |
|---|---|
| Partition Key | Determines data distribution; all rows with the same partition key are stored together on the same replica nodes |
| Clustering Columns | Sort data within a partition; enable range queries within partition |
| Primary Key | Combination of partition key + clustering columns; unique identifier |
| Static Columns | Shared across all rows in a partition |
| Regular Columns | Standard data columns |
```sql
-- Simple primary key: single-column partition key
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    email TEXT,
    name TEXT,
    created_at TIMESTAMP
);

-- Compound primary key: partition key + clustering column
-- Partition key: user_id
-- Clustering column: post_time (orders data within partition)
CREATE TABLE user_posts (
    user_id UUID,
    post_time TIMESTAMP,
    post_id UUID,
    content TEXT,
    PRIMARY KEY (user_id, post_time)
) WITH CLUSTERING ORDER BY (post_time DESC);  -- Most recent first

-- Composite partition key: multiple columns form the partition key
-- Partition key: (tenant_id, user_id) -- both required for lookups
-- Useful for multi-tenant applications
CREATE TABLE tenant_user_events (
    tenant_id UUID,
    user_id UUID,
    event_time TIMESTAMP,
    event_type TEXT,
    event_data TEXT,
    PRIMARY KEY ((tenant_id, user_id), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- Multiple clustering columns: hierarchical sorting
CREATE TABLE sensor_readings (
    sensor_id UUID,
    year INT,
    month INT,
    day INT,
    reading_time TIMESTAMP,
    value DECIMAL,
    PRIMARY KEY (sensor_id, year, month, day, reading_time)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, reading_time DESC);
```

Primary key columns cannot be updated after insertion. If you need to 'change' a primary key value, you must delete the old row and insert a new one. This is by design—the primary key determines physical location, and changing it would require moving data.
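One concept from the table above has no example yet: static columns. Below is a minimal sketch using a hypothetical `cart_items` table; a static column is stored once per partition and shared by every row in it.

```sql
-- Hypothetical shopping-cart table illustrating a STATIC column
CREATE TABLE cart_items (
    cart_id UUID,             -- Partition key
    item_id UUID,             -- Clustering column
    quantity INT,
    cart_status TEXT STATIC,  -- One value per cart, shared by all items
    PRIMARY KEY (cart_id, item_id)
);

-- Updating the static column changes it for every row in the partition
UPDATE cart_items SET cart_status = 'checked_out' WHERE cart_id = ?;
```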
Cassandra supports rich data types including native types, collections, and user-defined types.
| Type | Description | Example |
|---|---|---|
| UUID | Universally unique identifier | 550e8400-e29b-41d4-a716-446655440000 |
| TIMEUUID | Time-ordered UUID (v1) | 50554d6e-29bb-11e5-b345-feff819cdc9f |
| TEXT/VARCHAR | UTF-8 string | 'Hello World' |
| INT/BIGINT | 32/64-bit integer | 42, 9223372036854775807 |
| DECIMAL | Variable precision | 3.14159265359 |
| DOUBLE | 64-bit floating point | 3.14159 |
| BOOLEAN | true/false | true |
| TIMESTAMP | Date+time in milliseconds | '2024-01-15T10:30:00+0000' |
| DATE | Date only | '2024-01-15' |
| TIME | Time only (nanoseconds) | '10:30:00.000000000' |
| BLOB | Binary data | 0xCAFEBABE |
| INET | IP address | '192.168.1.1' |
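Since TIMEUUID appears throughout the schemas that follow, here is a minimal sketch of how it combines uniqueness with time ordering, assuming a hypothetical `events` table.

```sql
-- now() generates a TIMEUUID at write time; toTimestamp() extracts
-- the embedded time for display
CREATE TABLE events (
    stream_id UUID,
    event_id TIMEUUID,
    payload TEXT,
    PRIMARY KEY (stream_id, event_id)
) WITH CLUSTERING ORDER BY (event_id DESC);

INSERT INTO events (stream_id, event_id, payload) VALUES (?, now(), 'created');

SELECT toTimestamp(event_id) AS occurred_at, payload
FROM events
WHERE stream_id = ?;
```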
```sql
-- LIST: Ordered collection, allows duplicates
-- Use for: Order history, comment threads
CREATE TABLE articles (
    article_id UUID PRIMARY KEY,
    title TEXT,
    tags LIST<TEXT>,        -- ['tech', 'database', 'cassandra']
    related_ids LIST<UUID>
);

-- Update list
UPDATE articles SET tags = tags + ['nosql'] WHERE article_id = ?;
UPDATE articles SET tags = ['new_tag'] + tags WHERE article_id = ?;  -- Prepend
UPDATE articles SET tags[0] = 'updated' WHERE article_id = ?;        -- By index

-- SET: Unordered collection, no duplicates
-- Use for: Unique tags, roles, permissions
CREATE TABLE user_settings (
    user_id UUID PRIMARY KEY,
    roles SET<TEXT>,        -- {'admin', 'user', 'moderator'}
    favorite_categories SET<TEXT>
);

-- Update set
UPDATE user_settings SET roles = roles + {'editor'} WHERE user_id = ?;
UPDATE user_settings SET roles = roles - {'user'} WHERE user_id = ?;

-- MAP: Key-value pairs
-- Use for: Attributes, metadata, preferences
CREATE TABLE products (
    product_id UUID PRIMARY KEY,
    name TEXT,
    attributes MAP<TEXT, TEXT>,  -- {'color': 'blue', 'size': 'XL'}
    prices MAP<TEXT, DECIMAL>    -- {'USD': 99.99, 'EUR': 89.99}
);

-- Update map
UPDATE products SET attributes['weight'] = '500g' WHERE product_id = ?;
UPDATE products SET attributes = attributes + {'material': 'cotton'} WHERE product_id = ?;
```

For complex nested structures, define custom types:
```sql
-- Create UDT for address
CREATE TYPE address (
    street TEXT,
    city TEXT,
    state TEXT,
    zip_code TEXT,
    country TEXT
);

-- Create UDT for contact info
CREATE TYPE contact_info (
    email TEXT,
    phone TEXT,
    address FROZEN<address>  -- Nested UDT must be frozen
);

-- Use UDTs in a table
CREATE TABLE customers (
    customer_id UUID PRIMARY KEY,
    name TEXT,
    billing_address FROZEN<address>,
    shipping_address FROZEN<address>,
    contact FROZEN<contact_info>
);

-- Insert with UDT
INSERT INTO customers (customer_id, name, billing_address)
VALUES (
    uuid(),
    'John Doe',
    {
        street: '123 Main St',
        city: 'New York',
        state: 'NY',
        zip_code: '10001',
        country: 'USA'
    }
);

-- Query UDT fields
SELECT name, billing_address.city FROM customers WHERE customer_id = ?;
```

The FROZEN keyword treats the entire collection/UDT as a blob—you must replace it entirely to update. Non-frozen collections allow element-level updates but have size limits and performance implications. Use FROZEN for UDTs in primary keys and when you typically update the whole structure.
Cassandra queries are constrained by primary key structure. Understanding what queries are efficient—and which are impossible—is essential for schema design.
```sql
-- Table for examples
CREATE TABLE orders (
    customer_id UUID,
    order_date DATE,
    order_time TIMESTAMP,
    order_id UUID,
    total DECIMAL,
    status TEXT,
    PRIMARY KEY (customer_id, order_date, order_time)
) WITH CLUSTERING ORDER BY (order_date DESC, order_time DESC);

-- ✅ EFFICIENT: Full partition key with equality
SELECT * FROM orders WHERE customer_id = ?;

-- ✅ EFFICIENT: Partition key + clustering column range
SELECT * FROM orders
WHERE customer_id = ?
AND order_date >= '2024-01-01' AND order_date <= '2024-01-31';

-- ✅ EFFICIENT: Partition key + prefix of clustering columns
SELECT * FROM orders
WHERE customer_id = ?
AND order_date = '2024-01-15';

-- ❌ INEFFICIENT: Missing partition key (full cluster scan)
SELECT * FROM orders WHERE order_date = '2024-01-15';
-- Error: Cannot execute this query... ALLOW FILTERING

-- ❌ INEFFICIENT: Skipping clustering column
SELECT * FROM orders
WHERE customer_id = ?
AND order_time > '2024-01-15T10:00:00';  -- Skipped order_date!
-- Error: PRIMARY KEY column "order_date" cannot be restricted

-- ⚠️ DANGEROUS: Non-key column filter
SELECT * FROM orders
WHERE customer_id = ?
AND status = 'pending'
ALLOW FILTERING;  -- Scans all rows for this customer!

-- ✅ EFFICIENT: Ordering matches clustering order
SELECT * FROM orders
WHERE customer_id = ?
ORDER BY order_date DESC, order_time DESC;

-- ❌ INVALID: Ordering conflicts with clustering
SELECT * FROM orders
WHERE customer_id = ?
ORDER BY order_date ASC;  -- Table ordered DESC!
-- Error: Order by currently only supports the ordering specified at table definition
```

ALLOW FILTERING lets Cassandra run otherwise-rejected queries by scanning rows and discarding non-matches, which is almost never acceptable in production.
When you need ALLOW FILTERING, redesign your schema instead. Create a new table optimized for that query pattern.
If your code contains ALLOW FILTERING, you almost certainly have a schema design problem. The rare exceptions are: small tables (hundreds of rows), one-time admin queries, or analytics on replicas. Never use it for user-facing production queries.
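For the dangerous status filter shown above, the redesign might look like the sketch below; the table name `orders_by_status` is hypothetical.

```sql
-- Status becomes part of the partition key, so the query is a
-- direct partition read instead of a filtered scan
CREATE TABLE orders_by_status (
    customer_id UUID,
    status TEXT,
    order_time TIMESTAMP,
    order_id UUID,
    total DECIMAL,
    PRIMARY KEY ((customer_id, status), order_time)
) WITH CLUSTERING ORDER BY (order_time DESC);

-- Replaces the ALLOW FILTERING query
SELECT * FROM orders_by_status
WHERE customer_id = ? AND status = 'pending';
```

The trade-off: when an order's status changes, the application must delete the row from the old partition and insert it into the new one.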
Pattern 1: Time-Series Partitioning
```sql
-- Problem: IoT sensor data with millions of readings
-- Query: Get recent readings for a sensor

-- Bad design: All readings in one partition
CREATE TABLE sensor_data_bad (
    sensor_id UUID,
    reading_time TIMESTAMP,
    value DECIMAL,
    PRIMARY KEY (sensor_id, reading_time)
);
-- Problem: Partition grows unbounded!

-- Good design: Time-bucketed partitions
CREATE TABLE sensor_data (
    sensor_id UUID,
    day DATE,  -- Bucket by day
    reading_time TIMESTAMP,
    value DECIMAL,
    PRIMARY KEY ((sensor_id, day), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- Query today's readings for a sensor
SELECT * FROM sensor_data
WHERE sensor_id = ? AND day = toDate(now())
LIMIT 100;

-- Query last 7 days (requires 7 queries, one per partition)
-- Application-side: Loop over date range
-- Or use: IN clause (limited to small sets)
SELECT * FROM sensor_data
WHERE sensor_id = ?
AND day IN ('2024-01-15', '2024-01-14', '2024-01-13', ...)
LIMIT 100;
```

Cassandra schema design follows a disciplined process that starts with queries, not entities. This is fundamentally different from relational design.
Step 1: Define Your Queries
Before writing any schema, enumerate all queries your application needs:
Q1: Get user profile by user_id
Q2: Get all posts by a user, ordered by time (most recent first)
Q3: Get all comments on a post, ordered by time
Q4: Get user's timeline (posts from users they follow)
Q5: Get trending posts in a category
Step 2: Design Tables for Each Query
Each query typically becomes one table. Yes, this means denormalization. Embrace it.
Step 3: Analyze Access Patterns
For each table: estimate the partition size and how it grows over time, the read/write ratio, and how hot the busiest partitions will be.
```sql
-- Q1: Get user profile by user_id
-- Simple key-value lookup
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    username TEXT,
    email TEXT,
    bio TEXT,
    profile_image_url TEXT,
    created_at TIMESTAMP
);

-- Q2: Get all posts by a user, ordered by time
-- Partition by user, cluster by time
CREATE TABLE posts_by_user (
    user_id UUID,
    post_id TIMEUUID,
    content TEXT,
    media_urls SET<TEXT>,
    PRIMARY KEY (user_id, post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);

-- Counter columns cannot mix with regular columns,
-- so like counts live in a dedicated counter table
CREATE TABLE post_like_counts (
    post_id TIMEUUID PRIMARY KEY,
    like_count COUNTER
);

-- Q3: Get all comments on a post, ordered by time
-- Partition by post, cluster by time
CREATE TABLE comments_by_post (
    post_id TIMEUUID,
    comment_id TIMEUUID,
    user_id UUID,
    content TEXT,
    PRIMARY KEY (post_id, comment_id)
) WITH CLUSTERING ORDER BY (comment_id ASC);  -- Oldest first

-- Q4: Get user's timeline
-- Denormalized timeline per user (fan-out on write)
CREATE TABLE timeline_by_user (
    user_id UUID,
    post_id TIMEUUID,
    author_id UUID,
    author_username TEXT,  -- Denormalized! Avoids a join
    content_preview TEXT,  -- First 200 chars
    PRIMARY KEY (user_id, post_id)
) WITH CLUSTERING ORDER BY (post_id DESC)
AND default_time_to_live = 2592000;  -- 30 days TTL

-- Q5: Get trending posts in a category
-- Bucketed by time window + category
CREATE TABLE trending_posts (
    time_bucket TEXT,  -- '2024-01-15T10' (hourly bucket)
    category TEXT,
    score DECIMAL,     -- Trending score
    post_id TIMEUUID,
    author_username TEXT,
    content_preview TEXT,
    PRIMARY KEY ((time_bucket, category), score, post_id)
) WITH CLUSTERING ORDER BY (score DESC, post_id DESC);
```

When data is denormalized across tables, updates require touching multiple tables. Strategies:
1. Logged Batches (Keeping Denormalized Tables in Sync)
```sql
-- Logged batch: all statements are eventually applied together
-- (atomic), even across partitions, at the cost of extra coordination.
-- Isolation is only guaranteed within a single partition.
-- Use when: denormalized tables must stay in sync
BEGIN BATCH
    INSERT INTO posts_by_user (user_id, post_id, content) VALUES (?, ?, ?);
    INSERT INTO timeline_by_user (user_id, post_id, author_id, author_username, content_preview)
    VALUES (?, ?, ?, ?, ?);
APPLY BATCH;

-- UNLOGGED batch: No atomicity guarantee, slightly faster
-- Use when: updates affect different partitions but you want them
-- sent together (reduces round trips)
BEGIN UNLOGGED BATCH
    INSERT INTO posts_by_user (user_id, post_id, content) VALUES (?, ?, ?);
    INSERT INTO posts_by_category (category, post_id, user_id, content) VALUES (?, ?, ?, ?);
APPLY BATCH;
```

2. Eventual Consistency with Async Writers
For high-throughput scenarios, accept that denormalized views may be slightly behind:
User creates post:
1. Write to posts_by_user (synchronous, visible immediately)
2. Queue message to update timelines, trending, etc.
3. Background workers fan out the updates (eventual; see the sketch below)
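The statements a worker issues during fan-out might look like this sketch; the `followers_by_user` table is a hypothetical addition for looking up who follows the author.

```sql
-- Hypothetical follower lookup: who should receive this post?
SELECT follower_id FROM followers_by_user WHERE user_id = ?;

-- Then, once per follower (issued asynchronously by the worker):
INSERT INTO timeline_by_user
    (user_id, post_id, author_id, author_username, content_preview)
VALUES (?, ?, ?, ?, ?);
```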
3. Materialized Views (Use Cautiously)
Cassandra supports materialized views that sync with the base table automatically, but they have serious limitations: they add overhead to every base-table write, can silently drift out of sync, complicate repair, and have long been flagged as experimental by the project.
Production recommendation: Prefer explicit denormalization over materialized views. It's more predictable and debuggable.
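For reference, the syntax looks like the sketch below (the view name is illustrative). Every base-table primary key column must appear in the view's primary key, and filtered columns require IS NOT NULL.

```sql
-- A view Cassandra maintains automatically; contrast with the explicit
-- users_by_email lookup table shown later on this page
CREATE MATERIALIZED VIEW users_by_email_mv AS
    SELECT user_id, email, username FROM users
    WHERE email IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY (email, user_id);
```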
Always estimate partition sizes. A partition should typically be under 100MB. If you expect unbounded growth (like all user posts forever), bucket by time. Calculate: rows × row_size × expected_growth. If it exceeds limits, add bucketing to partition key.
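A worked version of that calculation for the sensor schema from earlier, with illustrative numbers:

```sql
-- 1 reading/second              = 86,400 rows/day per sensor
-- ~50 bytes per row (assumed)   = ~4.3 MB/day per sensor
-- Unbucketed: 365 days x 4.3 MB = ~1.6 GB per partition, far past 100MB
-- Bucketed by day (as in sensor_data): each partition stays near 4.3 MB
```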
Cassandra's default eventually-consistent model doesn't guarantee strict ordering of operations. Lightweight Transactions (LWT) provide linearizable consistency—at a cost.
```sql
-- IF NOT EXISTS: Insert only if row doesn't exist
-- Use for: User registration (unique email)
INSERT INTO users (user_id, email, username)
VALUES (uuid(), 'alice@example.com', 'alice')
IF NOT EXISTS;
-- Returns: [applied] = true/false

-- Result handling:
-- [applied] | user_id | email             | username
-- true      | null    | null              | null      <- Success
-- false     | abc-123 | alice@example.com | alice     <- Already exists

-- IF condition: Update only if condition matches
-- Use for: Optimistic locking, state transitions
UPDATE orders
SET status = 'shipped'
WHERE order_id = ?
IF status = 'processing';  -- Only if currently 'processing'

-- IF EXISTS: Update only if row exists
UPDATE users
SET email = 'newemail@example.com'
WHERE user_id = ?
IF EXISTS;

-- Multiple conditions (optimistic concurrency: read the current
-- quantity first, then write the decremented value only if it is
-- unchanged; CQL cannot compute quantity - 1 server-side for
-- non-counter columns)
UPDATE inventory
SET quantity = ?       -- New value computed client-side
WHERE product_id = ?
IF quantity = ?        -- Value read before the update (and > 0)
AND reserved = false;  -- Not reserved

-- Delete with condition
DELETE FROM sessions
WHERE user_id = ?
IF session_token = ?;  -- Validate token before delete
```

LWT uses the Paxos consensus protocol, which is expensive:
| Aspect | Regular Write | LWT Write |
|---|---|---|
| Round Trips | 1 | 4 (prepare, promise, propose, commit) |
| Latency | ~2-5ms | ~20-50ms |
| Throughput | High | Low (contention-limited) |
| Lock Scope | None | Partition-level |
Guidelines for LWT: reserve it for operations where correctness genuinely requires linearizability (uniqueness checks, state-machine transitions, token validation); keep contended partitions small; and keep LWT off hot partitions in the request path.
Multiple concurrent LWT operations on the same partition serialize, creating contention. A high-traffic registration flow is fine when the email is the partition key: each IF NOT EXISTS check contends only with other attempts on that exact email. But if your LWTs all target one shared partition, such as a counter-like row, they queue behind each other and create a severe bottleneck.
Native secondary indexes in Cassandra have severe limitations. Understanding these limitations—and alternatives—is crucial.
Cassandra supports creating indexes on non-primary-key columns:
```sql
-- Create secondary index
CREATE INDEX users_email_idx ON users (email);

-- Now this query works without ALLOW FILTERING
SELECT * FROM users WHERE email = 'alice@example.com';

-- Index on a collection column
CREATE INDEX articles_tags_idx ON articles (tags);

-- Query the collection index
SELECT * FROM articles WHERE tags CONTAINS 'cassandra';

-- Index on map keys, values, or entries
CREATE INDEX product_attrs_keys_idx ON products (KEYS(attributes));
CREATE INDEX product_attrs_values_idx ON products (VALUES(attributes));
CREATE INDEX product_attrs_entries_idx ON products (ENTRIES(attributes));
```

1. Distributed Index Architecture
Unlike relational B-tree indexes, Cassandra's secondary indexes are local to each node. A query on an indexed column must fan out to every node (or token range), consult each node's local index, and merge the results on the coordinator.
This is essentially a full cluster scan.
2. Anti-Patterns
| Scenario | Problem |
|---|---|
| High-cardinality columns | Index becomes as large as data |
| Low-cardinality columns | Few values → hot spots |
| Frequently updated columns | Index maintenance overhead |
| Large result sets | Memory pressure on coordinator |
3. When Secondary Indexes Are Acceptable
Secondary indexes can be acceptable when the query also restricts the partition key, so the index lookup touches only the replicas for that partition, for example `WHERE user_id = ? AND status = 'active'`, and when the indexed column has medium cardinality and result sets stay small.

1. Denormalized Lookup Tables
Create a table with the query column as partition key:
```sql
-- Primary table
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    email TEXT,
    username TEXT
);

-- Lookup table: email → user_id
CREATE TABLE users_by_email (
    email TEXT PRIMARY KEY,
    user_id UUID
);

-- Insert requires both tables (batch or async)
BEGIN BATCH
    INSERT INTO users (user_id, email, username) VALUES (?, ?, ?);
    INSERT INTO users_by_email (email, user_id) VALUES (?, ?);
APPLY BATCH;

-- Query: Find user by email
-- Step 1: Get user_id from lookup table
SELECT user_id FROM users_by_email WHERE email = ?;
-- Step 2: Get full user record
SELECT * FROM users WHERE user_id = ?;
```

2. SASI Indexes (Experimental)
SASI (SSTable Attached Secondary Index) performs better for certain use cases, notably prefix/LIKE queries and sparse data, but it remains experimental and is not recommended for critical production workloads.
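A sketch of SASI usage on the users table; note that recent Cassandra releases ship with SASI disabled by default, so it must be enabled in cassandra.yaml first.

```sql
-- PREFIX mode supports LIKE 'prefix%' queries
CREATE CUSTOM INDEX users_username_sasi ON users (username)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'PREFIX' };

SELECT * FROM users WHERE username LIKE 'ali%';
```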
3. External Search Systems
For complex search requirements (full-text search, faceting, fuzzy matching), feed the data into a dedicated search system such as Elasticsearch or Apache Solr and keep Cassandra as the source of truth.
Default to denormalized lookup tables. They're explicit, predictable, and scale. Use secondary indexes only when: queries are rare, result sets are small, and the indexed column has medium cardinality. For search use cases, integrate Elasticsearch.
Running Cassandra in production requires understanding key operational aspects.
Compaction merges SSTables and removes obsolete data. Strategy choice impacts read/write performance:
| Strategy | Best For | Trade-offs |
|---|---|---|
| SizeTieredCompactionStrategy (STCS) | Write-heavy, mixed workloads | Higher space amplification |
| LeveledCompactionStrategy (LCS) | Read-heavy, consistent latency | Higher write amplification |
| TimeWindowCompactionStrategy (TWCS) | Time-series with TTL | Efficient TTL, bucketed compaction |
```sql
-- Create table with a specific compaction strategy
CREATE TABLE metrics (
    sensor_id UUID,
    hour TIMESTAMP,
    value DECIMAL,
    PRIMARY KEY (sensor_id, hour)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': 1
}
AND default_time_to_live = 604800;  -- 7 days TTL

-- Change compaction strategy (online migration)
ALTER TABLE users WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': 160
};

-- Review recent compaction history
SELECT * FROM system.compaction_history;
```

Regular repair ensures replicas stay in sync:
```bash
# Full repair (Merkle tree comparison)
nodetool repair -full keyspace_name

# Incremental repair (only unrepaired data)
nodetool repair keyspace_name

# Repair a specific table
nodetool repair keyspace_name table_name

# Check repair/streaming status
nodetool netstats
```
Repair Scheduling:

- Run a full repair on every node at least once within gc_grace_seconds (default 10 days)
- Stagger repairs across nodes and off-peak hours to limit the load impact

Key metrics and alert thresholds to monitor:

| Metric | Warning Threshold | Critical Threshold |
|---|---|---|
| Read Latency (p99) | > 10ms | > 100ms |
| Write Latency (p99) | > 5ms | > 50ms |
| Compaction Pending | > 50 | > 200 |
| Heap Usage | > 75% | > 90% |
| SSTable Count/Table | > 20 | > 50 |
| Dropped Mutations | > 0 | - |
Cassandra uses tombstones (deletion markers) that persist for gc_grace_seconds (default 10 days). If you don't repair within this window, deleted data can 'resurrect.' For tables with TTL-based deletion, ensure repairs run well within this window. Reduce gc_grace_seconds cautiously—it affects deletion propagation.
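If you do tune it, the setting is per table; the five-day value below is only illustrative:

```sql
-- Only safe if every node is repaired well within the new window
ALTER TABLE sensor_data WITH gc_grace_seconds = 432000;  -- 5 days
```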
We've covered extensive ground on Apache Cassandra, from CQL fundamentals to production operations. The key insights: design tables around your queries rather than your entities, embrace denormalization, choose primary keys that bound partition growth, avoid ALLOW FILTERING and unscoped secondary indexes, and treat compaction, repair, and monitoring as first-class concerns.
What's Next:
Now that we understand Cassandra's mechanics, the next page explores Time-Series Data—a workload where column-family stores truly shine, powering IoT, monitoring, and analytics applications.
You now have practical, hands-on knowledge of Apache Cassandra. These patterns—query-driven schema design, denormalization, and operational awareness—are directly applicable to production Cassandra deployments at any scale.