Unlike the relational world—where virtually all databases share the table-based model and SQL interface—the NoSQL ecosystem is remarkably diverse. This diversity exists because different data access patterns demand fundamentally different data organizations.
Consider three engineering challenges:
Challenge 1: A web session cache requiring sub-millisecond lookups by session ID, handling 100,000 reads per second.
Challenge 2: A content management system storing articles with varying metadata, supporting flexible queries on multiple fields.
Challenge 3: A social network recommending friends-of-friends, traversing relationship graphs with millions of nodes.
No single data model optimally serves all three. A hash map serves Challenge 1 but can't query by content. A document store serves Challenge 2, but graph traversal across documents is expensive. A graph database excels at Challenge 3 but isn't optimal for simple key-value lookups.
NoSQL's answer: specialized databases for specialized problems.
By the end of this page, you will understand the four primary NoSQL database categories—their data models, internal architectures, performance characteristics, ideal use cases, and limitations. You'll be equipped to evaluate which category fits your specific requirements.
The NoSQL landscape is typically organized into four primary categories based on their fundamental data models:
1. Key-Value Stores — The simplest model: data stored as key-value pairs, optimized for high-speed lookups by key.
2. Document Databases — Semi-structured documents (typically JSON/BSON) with nested data, supporting queries on document fields.
3. Column-Family Databases — Data organized by columns rather than rows, optimized for wide, sparse tables and time-series data.
4. Graph Databases — Data modeled as nodes (entities) and edges (relationships), optimized for relationship traversal and pattern matching.
Each category represents a different philosophy about how data should be organized, accessed, and scaled. The choice isn't about which is "best"—it's about which best fits your access patterns.
| Category | Data Model | Query Power | Best For | Trade-off |
|---|---|---|---|---|
| Key-Value | Key → Value (opaque) | Get/Put by key only | Caching, sessions, simple lookups | No complex queries |
| Document | JSON/BSON documents | Rich queries on fields | Content, catalogs, user profiles | Join complexity |
| Column-Family | Rows with dynamic columns | Partition/clustering key queries | Time-series, analytics, wide data | Complex modeling |
| Graph | Nodes and edges | Relationship traversal | Social networks, recommendations | Non-graph queries slow |
Key-value stores represent the simplest NoSQL data model: a giant distributed hash map. Data is stored as key-value pairs where the key is a unique identifier and the value is an opaque blob that the database doesn't interpret.
Key → Value
"user:123" → {binary blob: user data}
"session:abc" → {binary blob: session data}
"config:app" → {binary blob: config JSON}
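The database provides only key-based operations: get, put, and delete.
Some key-value stores extend this with features such as key expiration (TTL), atomic counters, and built-in data structures.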
The simplicity of key-value stores enables extreme performance:
No query parsing: Just a key lookup, O(1) on average with hashing
No schema validation: The value is opaque, so there is no validation overhead
Trivial partitioning: Hash the key to locate the shard
Minimal coordination: Single-key operations don't span nodes
Result: Sub-millisecond latencies at massive scale.
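The points above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular product is implemented: the class name `ShardedKeyValueStore` and its methods are hypothetical, each "shard" is just an in-process dict, and simple modulo hashing stands in for the more elaborate partitioning schemes real systems use.

```python
import hashlib
from typing import Optional


class ShardedKeyValueStore:
    """Toy key-value store (hypothetical): values are opaque bytes, shards are dicts."""

    def __init__(self, num_shards: int = 4) -> None:
        self.shards: list[dict[str, bytes]] = [{} for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Use a stable hash; Python's built-in hash() is randomized per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: bytes) -> None:
        # No parsing or validation: the value is stored as-is.
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str) -> Optional[bytes]:
        # A single-key read touches exactly one shard.
        return self.shards[self.shard_for(key)].get(key)

    def delete(self, key: str) -> bool:
        return self.shards[self.shard_for(key)].pop(key, None) is not None


store = ShardedKeyValueStore(num_shards=4)
store.put("session:abc", b'{"userId": "1001"}')
print(store.get("session:abc"))
print(store.get("session:missing"))
```

Because every operation names exactly one key, the store never has to consult more than one shard, which is the property that keeps coordination minimal.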
# Redis: In-memory key-value store with data structures

# Simple key-value operations
SET user:1001 '{"name":"Alice","email":"alice@example.com"}'
GET user:1001

# TTL for session management (expires in 3600 seconds)
SET session:abc123 '{"userId":"1001","created":"2024-01-15"}' EX 3600
TTL session:abc123

# Atomic increment for counters
INCR page:views:homepage  # Returns 1 (first view)
INCR page:views:homepage  # Returns 2

# Hash type for structured data (more efficient updates)
HSET user:1002 name "Bob" email "bob@example.com" age "30"
HGET user:1002 email  # Get single field
HINCRBY user:1002 age 1  # Increment age atomically

# Sorted sets for leaderboards
ZADD leaderboard 1500 "player:1" 2300 "player:2" 1800 "player:3"
ZREVRANGE leaderboard 0 2 WITHSCORES  # Top 3 players

Redis — In-memory data structure store; supports strings, hashes, lists, sets, sorted sets. Used for caching, real-time analytics, pub/sub messaging. Single-threaded event loop provides atomicity.
Amazon DynamoDB — Fully managed, serverless, with automatic scaling. Supports key-value and simple document operations. Strong consistency option available.
Memcached — Simple in-memory caching; no persistence, no data structures. Lighter than Redis, purely for caching.
etcd — Distributed key-value store using Raft consensus. Used for configuration management and service discovery in Kubernetes.
Riak KV — Distributed key-value store inspired by Amazon Dynamo. Highly available with configurable consistency.
Choose key-value stores when: (1) Access is only by known keys, (2) Speed is critical (sub-millisecond), (3) Data model is simple, (4) You need caching, session storage, or simple counters. Avoid when: Complex queries are needed, relationships between data matter, or you need to search by data content.
Document databases extend the key-value model by making the value structured and queryable. Instead of opaque blobs, values are semi-structured documents—typically JSON, BSON (binary JSON), or XML—with fields that can be indexed and queried.
A document is a self-contained unit of data with a unique identifier and nested structure:
{
"_id": "product_12345",
"name": "Wireless Headphones",
"brand": "AudioTech",
"price": 149.99,
"categories": ["electronics", "audio", "wireless"],
"specs": {
"battery_life": "40 hours",
"driver_size": "40mm",
"weight": "250g"
},
"reviews": [
{"user": "alice", "rating": 5, "text": "Great sound!"},
{"user": "bob", "rating": 4, "text": "Good value."}
],
"created_at": "2024-01-15T10:30:00Z"
}
Unlike key-value stores, document databases can:
Query by field value ({"brand": "AudioTech"})
Query nested fields (specs.battery_life)
Query array contents (categories contains "audio")
Document databases embrace schema-on-read: documents in the same collection can have different structures.
The power: Rapid iteration, polymorphic data, schema evolution without migrations.
The responsibility: The application must handle varying structures; validation moves to code.
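To make the query capabilities and the schema-on-read responsibility concrete, here is a minimal Python sketch of field, nested-field, and array matching over documents that do not all share the same shape. The helpers `get_path`, `matches`, and `find` are hypothetical names invented for illustration; a real document database would use indexes rather than scanning every document.

```python
from typing import Any

_MISSING = object()  # sentinel: distinguishes "field absent" from a stored None


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'specs.battery_life'; return _MISSING if absent."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(doc: dict[str, Any], query: dict[str, Any]) -> bool:
    """True if every queried path equals the wanted value, or is a list containing it."""
    for path, wanted in query.items():
        actual = get_path(doc, path)
        if actual is _MISSING:
            return False
        if isinstance(actual, list):
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True


def find(collection: list[dict[str, Any]], query: dict[str, Any]) -> list[str]:
    return [doc["_id"] for doc in collection if matches(doc, query)]


# Documents in one collection with different structures.
products = [
    {"_id": "p1", "brand": "AudioTech", "categories": ["electronics", "audio"],
     "specs": {"battery_life": "40 hours"}},
    {"_id": "p2", "brand": "AudioTech", "categories": ["cables"]},  # no specs
    {"_id": "p3", "brand": "SoundCo", "specs": {"battery_life": "40 hours"}},  # no categories
]

print(find(products, {"brand": "AudioTech"}))              # ['p1', 'p2']
print(find(products, {"specs.battery_life": "40 hours"}))  # ['p1', 'p3']
print(find(products, {"categories": "audio"}))             # ['p1']
```

Note how a missing field is treated as "no match" rather than an error; that decision is exactly the kind of handling that moves into application code under schema-on-read.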
Modern document databases offer optional schema validation:
This provides a middle ground—flexibility with guardrails.
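As a rough illustration of "flexibility with guardrails", the sketch below checks only a handful of declared fields and lets everything else through. It does not reproduce any particular database's validator syntax; the rule format `PRODUCT_RULES` and the `validate` function are assumptions made up for this example.

```python
from typing import Any

# Hypothetical rule set: field name -> (expected type, required?)
PRODUCT_RULES: dict[str, tuple[type, bool]] = {
    "name": (str, True),
    "price": (float, True),
    "categories": (list, False),
}


def validate(doc: dict[str, Any], rules: dict[str, tuple[type, bool]]) -> list[str]:
    """Return a list of violations; fields not mentioned in the rules are allowed."""
    errors: list[str] = []
    for field_name, (expected_type, required) in rules.items():
        if field_name not in doc:
            if required:
                errors.append(f"missing required field: {field_name}")
        elif not isinstance(doc[field_name], expected_type):
            errors.append(f"{field_name} should be {expected_type.__name__}")
    return errors


# Extra, undeclared fields such as "color" pass through untouched.
print(validate({"name": "Wireless Headphones", "price": 149.99, "color": "black"},
               PRODUCT_RULES))
# A document that breaks the declared rules is reported, not silently stored.
print(validate({"price": "cheap"}, PRODUCT_RULES))
```

The design choice here is that validation is additive: it constrains the fields you care about without turning the collection into a fixed schema.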
// MongoDB: Rich document queries and aggregations

// Insert a document
db.products.insertOne({
  name: "Wireless Headphones",
  brand: "AudioTech",
  price: 149.99,
  categories: ["electronics", "audio"],
  specs: { battery_life: "40 hours", weight: "250g" }
});

// Query by field value
db.products.find({ brand: "AudioTech" });

// Query nested fields
db.products.find({ "specs.battery_life": "40 hours" });

// Query array elements
db.products.find({ categories: "audio" });

// Range query with sorting
db.products.find({ price: { $gte: 100, $lte: 200 } })
  .sort({ price: 1 });

// Aggregation pipeline: Average price by brand
db.products.aggregate([
  { $group: {
      _id: "$brand",
      avgPrice: { $avg: "$price" },
      count: { $sum: 1 }
  }},
  { $sort: { avgPrice: -1 }}
]);

// Text search (requires text index)
db.products.createIndex({ name: "text", "specs.features": "text" });
db.products.find({ $text: { $search: "wireless noise cancelling" }});

MongoDB — The most widely adopted document database. BSON storage, rich query language, multi-document ACID transactions, sharding, replication. Cloud-native (MongoDB Atlas).
Couchbase — Combines document store with built-in caching (Memcached-compatible). N1QL query language (SQL-like). Strong in mobile sync scenarios.
Amazon DocumentDB — MongoDB-compatible managed service on AWS. Separates compute from storage for easy scaling.
CouchDB — REST API, MVCC (multi-version concurrency), built for offline-first with sync capabilities. JSON documents, JavaScript MapReduce.
Firestore (Google Cloud) — Serverless document database. Real-time sync, offline support, strong consistency. Popular for mobile/web applications.
Choose document databases when: (1) Data naturally fits document structure (varied attributes, nested data), (2) Schema needs to evolve rapidly, (3) Queries are diverse but predictable, (4) Most access is single-document or uses known patterns. Examples: Content management, product catalogs, user profiles, event logging.
Column-family databases (also called wide-column stores) organize each row's data into column families: groups of related columns that are stored together, rather than as one fixed-width row. Each row can have different columns. This model excels for sparse, wide data and time-series workloads.
The column-family model has several key concepts:
Row Key: Unique identifier for a row (similar to primary key)
Column Family: A grouping of related columns, defined upfront
Column: A name-value pair within a column family
Timestamp: Each column value is versioned by time
Row Key | Column Family: profile | Column Family: activity
------------+---------------------------------+---------------------------
user:alice | name: "Alice" | last_login: "2024-01-15"
| email: "alice@example.com" | login_count: 42
| avatar: <binary> |
------------+---------------------------------+---------------------------
user:bob | name: "Bob" | last_login: "2024-01-14"
| (no email—sparse columns!) | login_count: 17
| | failed_logins: 3
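The layout above can be modeled as nested maps: row key, then column family, then column name, then a timestamped value. The Python sketch below is a simplified illustration with a hypothetical class name, `ColumnFamilyTable`; it assumes that only the newest version of each cell is kept, whereas real systems may retain several versions.

```python
import time
from typing import Any, Optional


class ColumnFamilyTable:
    """Toy model: row key -> column family -> column name -> (value, timestamp)."""

    def __init__(self, families: list[str]) -> None:
        self.families = set(families)  # column families are declared up front
        self.rows: dict[str, dict[str, dict[str, tuple[Any, float]]]] = {}

    def put(self, row_key: str, family: str, column: str, value: Any,
            timestamp: Optional[float] = None) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time() if timestamp is None else timestamp
        cells = self.rows.setdefault(row_key, {}).setdefault(family, {})
        existing = cells.get(column)
        # Assumption: the write with the newest timestamp wins.
        if existing is None or ts >= existing[1]:
            cells[column] = (value, ts)

    def get(self, row_key: str, family: str, column: str) -> Any:
        cell = self.rows.get(row_key, {}).get(family, {}).get(column)
        return None if cell is None else cell[0]


table = ColumnFamilyTable(["profile", "activity"])
table.put("user:alice", "profile", "email", "alice@example.com", timestamp=1.0)
table.put("user:alice", "activity", "login_count", 41, timestamp=1.0)
table.put("user:alice", "activity", "login_count", 42, timestamp=2.0)
table.put("user:bob", "activity", "failed_logins", 3, timestamp=1.0)

print(table.get("user:alice", "activity", "login_count"))  # 42
print(table.get("user:bob", "profile", "email"))           # None (sparse column)
```

Columns are created per row on write, so a row that never sets `email` simply has no cell for it; nothing is stored for the absent value.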
Key characteristics:
Row-oriented storage (traditional RDBMS):
Column-family storage:
Example query comparison:
"Get the last_login for all users who logged in this month"
The column-store reads significantly less data from disk.
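A small Python sketch can make the comparison tangible. It counts how many cells each layout touches to answer the query, using cell count as a crude stand-in for disk I/O. The function names and sample data are hypothetical, and the sketch assumes that each column family is stored separately so one family can be read without the others.

```python
USERS = {
    "user:alice": {
        "profile": {"name": "Alice", "email": "alice@example.com", "avatar": b"\x00" * 1024},
        "activity": {"last_login": "2024-01-15", "login_count": 42},
    },
    "user:bob": {
        "profile": {"name": "Bob"},
        "activity": {"last_login": "2024-01-14", "login_count": 17, "failed_logins": 3},
    },
}


def row_oriented_scan(rows, month):
    """Each row is read as one unit, so every cell is touched to reach last_login."""
    cells_read = 0
    matches = {}
    for row_key, families in rows.items():
        cells_read += sum(len(columns) for columns in families.values())
        last_login = families["activity"]["last_login"]
        if last_login.startswith(month):
            matches[row_key] = last_login
    return matches, cells_read


def column_family_scan(rows, family, month):
    """Only the requested family is read; other families are never touched."""
    cells_read = 0
    matches = {}
    for row_key, families in rows.items():
        columns = families[family]
        cells_read += len(columns)
        last_login = columns["last_login"]
        if last_login.startswith(month):
            matches[row_key] = last_login
    return matches, cells_read


row_result, row_cells = row_oriented_scan(USERS, "2024-01")
cf_result, cf_cells = column_family_scan(USERS, "activity", "2024-01")
print(row_result == cf_result)  # True: same answer either way
print(row_cells, cf_cells)      # 9 5
```

The gap widens as the untouched families grow; here the 1 KB avatar is read by the row-oriented scan and skipped entirely by the column-family scan.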
-- Apache Cassandra: Distributed column-family database

-- Create keyspace (like a database)
CREATE KEYSPACE iot_data
WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};

USE iot_data;

-- Create table with partition key and clustering columns
CREATE TABLE sensor_readings (
  device_id UUID,
  timestamp TIMESTAMP,
  sensor_type TEXT,
  value DOUBLE,
  unit TEXT,
  PRIMARY KEY ((device_id), timestamp, sensor_type)
) WITH CLUSTERING ORDER BY (timestamp DESC);

-- The PRIMARY KEY has two parts:
-- (device_id) = partition key → determines which node stores data
-- timestamp, sensor_type = clustering columns → sort order within partition

-- Insert time-series data
INSERT INTO sensor_readings (device_id, timestamp, sensor_type, value, unit)
VALUES (123e4567-e89b-12d3-a456-426614174000, '2024-01-15 10:00:00', 'temperature', 22.5, 'celsius');

-- Efficient query: reads from single partition
SELECT * FROM sensor_readings
WHERE device_id = 123e4567-e89b-12d3-a456-426614174000
AND timestamp >= '2024-01-15 00:00:00'
AND timestamp < '2024-01-16 00:00:00';

-- Efficient: latest 100 readings (clustering order is DESC)
SELECT * FROM sensor_readings
WHERE device_id = 123e4567-e89b-12d3-a456-426614174000
LIMIT 100;

-- INEFFICIENT: Full table scan! Requires ALLOW FILTERING
SELECT * FROM sensor_readings WHERE value > 25; -- Don't do this!

Apache Cassandra — Decentralized, peer-to-peer architecture. No single point of failure. CQL query language. Used by Netflix, Apple, Instagram at massive scale.
ScyllaDB — Cassandra-compatible, written in C++ for higher performance. Drop-in replacement claiming 10x throughput.
Apache HBase — Built on Hadoop HDFS. Strongly consistent. Integrates with Hadoop ecosystem for batch processing.
Google Bigtable — Google's proprietary wide-column store (the original). Cloud Bigtable offers managed access. Powers Google Search, Maps, YouTube.
Azure Cosmos DB (Cassandra API) — Multi-model with Cassandra-compatible API. Global distribution with turnkey replication.
Choose column-family databases when: (1) Write throughput is critical, (2) Data has time-series characteristics, (3) You need massive scale with predictable query patterns, (4) Data is wide and sparse. Examples: IoT sensor data, event logging, metrics/monitoring, messaging systems, activity feeds. Avoid for: Ad-hoc queries, complex analytics, applications requiring joins.
Graph databases represent data as nodes (entities) and edges (relationships between entities). Both nodes and edges can have properties (key-value attributes). This model excels when relationships are as important as the data itself.
Nodes: Entities with labels (types) and properties
(alice:Person {name: "Alice", age: 30})
(bob:Person {name: "Bob", age: 28})
(techcorp:Company {name: "TechCorp", industry: "Software"})
Edges (Relationships): Connections with types, direction, and properties
(alice)-[:KNOWS {since: 2020}]->(bob)
(alice)-[:WORKS_AT {role: "Engineer", since: 2019}]->(techcorp)
(bob)-[:WORKS_AT {role: "Manager", since: 2018}]->(techcorp)
Graph Queries: Traverse relationships, find patterns
// Find Alice's colleagues (people who work at the same company)
MATCH (alice:Person {name: 'Alice'})-[:WORKS_AT]->(company)<-[:WORKS_AT]-(colleague)
RETURN colleague.name
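For readers more comfortable with general-purpose code, the same model and colleague query can be sketched in Python. The `PropertyGraph` class and its methods are hypothetical and exist only for illustration; the key idea is that each node keeps direct lists of its outgoing and incoming edges, so the query walks edges instead of joining tables.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    rel_type: str
    dst: str
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph with per-node adjacency lists (hypothetical API)."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.outgoing: dict[str, list[Edge]] = defaultdict(list)
        self.incoming: dict[str, list[Edge]] = defaultdict(list)

    def add_node(self, node_id: str, label: str, **props) -> None:
        self.nodes[node_id] = Node(node_id, label, props)

    def add_edge(self, src: str, rel_type: str, dst: str, **props) -> None:
        edge = Edge(src, rel_type, dst, props)
        self.outgoing[src].append(edge)
        self.incoming[dst].append(edge)

    def colleagues(self, person_id: str) -> list[str]:
        """Follow WORKS_AT out to each company, then WORKS_AT back in to other people."""
        names = set()
        for job in self.outgoing.get(person_id, []):
            if job.rel_type != "WORKS_AT":
                continue
            for other in self.incoming.get(job.dst, []):
                if other.rel_type == "WORKS_AT" and other.src != person_id:
                    names.add(self.nodes[other.src].props["name"])
        return sorted(names)


graph = PropertyGraph()
graph.add_node("alice", "Person", name="Alice", age=30)
graph.add_node("bob", "Person", name="Bob", age=28)
graph.add_node("techcorp", "Company", name="TechCorp", industry="Software")
graph.add_edge("alice", "KNOWS", "bob", since=2020)
graph.add_edge("alice", "WORKS_AT", "techcorp", role="Engineer", since=2019)
graph.add_edge("bob", "WORKS_AT", "techcorp", role="Manager", since=2018)

print(graph.colleagues("alice"))  # ['Bob']
```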
The problem with graphs in relational databases:
Consider finding friends-of-friends-of-friends (3 hops) in a relational database:
-- Relational: a three-way self-join on a million-row table
SELECT DISTINCT f3.friend_id
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN friendships f3 ON f2.friend_id = f3.user_id
WHERE f1.user_id = 'alice';
This query becomes exponentially slower as:
Graph-native advantage:
Graph databases store relationships as first-class citizens—edges are directly navigable pointers, not computed joins:
// Graph: direct traversal; cost grows with the neighborhood explored, not the total node count
MATCH (alice:Person {name: 'Alice'})-[:KNOWS*3]->(friend)
RETURN DISTINCT friend.name;
Traversal time depends on the local subgraph size, not total database size. Finding 3-hop connections for Alice takes roughly the same time whether the database has 1,000 or 1,000,000,000 users, because only Alice's neighborhood is visited.
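The claim about local cost can be checked with a small breadth-first search in Python. This is a sketch, not a model of any database's storage engine: the function `within_hops` and the sample data are hypothetical, the graph is a plain adjacency dict, and the search collects everyone reachable in up to `max_hops` hops rather than at exactly three hops.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Breadth-first search from start.

    Returns (nodes reachable in 1..max_hops hops, number of edges examined).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    edges_examined = 0
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            edges_examined += 1
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen, edges_examined


# Alice's neighborhood.
small = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"], "dave": ["erin"]}

# Same neighborhood plus 100,000 unrelated users.
large = dict(small)
for i in range(100_000):
    large[f"stranger{i}"] = [f"stranger{i + 1}"]

print(within_hops(small, "alice", 3))
print(within_hops(large, "alice", 3)[1] == within_hops(small, "alice", 3)[1])  # True
```

The number of edges examined is identical in both graphs because the search never touches a node outside Alice's neighborhood. In a densely connected graph that neighborhood can itself be large, so the cost still grows with the hop count and the average number of connections.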
// Neo4j Cypher: Graph query language

// Create nodes and relationships
CREATE (alice:Person {name: 'Alice', title: 'Engineer'})
CREATE (bob:Person {name: 'Bob', title: 'Manager'})
CREATE (carol:Person {name: 'Carol', title: 'Director'})
CREATE (techcorp:Company {name: 'TechCorp'})

CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
CREATE (bob)-[:KNOWS {since: 2019}]->(carol)
CREATE (alice)-[:WORKS_AT {role: 'Engineer'}]->(techcorp)
CREATE (bob)-[:WORKS_AT {role: 'Manager'}]->(techcorp)
CREATE (carol)-[:WORKS_AT {role: 'Director'}]->(techcorp);

// Pattern matching: Find Alice's colleagues
MATCH (alice:Person {name: 'Alice'})-[:WORKS_AT]->(company)<-[:WORKS_AT]-(colleague)
WHERE colleague <> alice
RETURN colleague.name, colleague.title;

// Multi-hop traversal: Friends-of-friends Alice doesn't know directly
MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof)
WHERE NOT (alice)-[:KNOWS]->(fof) AND alice <> fof
RETURN DISTINCT fof.name;

// Path finding: Shortest path between two people
MATCH path = shortestPath(
  (alice:Person {name: 'Alice'})-[:KNOWS*]-(target:Person {name: 'Carol'})
)
RETURN path, length(path) as hops;

// Recommendation: People who know people Alice knows, ranked by connection count
MATCH (alice:Person {name: 'Alice'})-[:KNOWS]->(friend)-[:KNOWS]->(recommendation)
WHERE NOT (alice)-[:KNOWS]->(recommendation) AND alice <> recommendation
RETURN recommendation.name, COUNT(*) as mutual_friends
ORDER BY mutual_friends DESC
LIMIT 5;

Neo4j — The most popular graph database. Native graph storage. Cypher query language. Strong developer tooling, visualization.
Amazon Neptune — Fully managed graph database supporting both property graph (Gremlin) and RDF (SPARQL) models.
ArangoDB — Multi-model: document + graph + key-value. AQL query language. Single engine for multiple data models.
JanusGraph — Open-source, distributed graph database. Supports multiple storage backends (Cassandra, HBase). Integrates with TinkerPop/Gremlin.
TigerGraph — Enterprise-focused, optimized for analytics. Native parallel graph computation for massive-scale analytics.
Choose graph databases when: (1) Relationships are central to queries (friend-of-friend, shortest path), (2) Data is naturally connected (social networks, knowledge graphs), (3) Query patterns involve variable-length paths, (4) You need real-time recommendations or fraud detection. Avoid for: Simple CRUD operations, time-series data, heavy aggregation/analytics workloads.
The boundaries between NoSQL categories are blurring. Modern databases increasingly support multiple data models within a single system, offering flexibility without managing multiple database types.
ArangoDB — Single engine supporting documents, graphs, and key-value. Unified AQL query language works across models.
Azure Cosmos DB — Multi-API approach: MongoDB-compatible document, Cassandra-compatible column, Gremlin graphs, and table storage. Same underlying engine.
Couchbase — Primarily document but with key-value access patterns, full-text search, and analytics.
OrientDB — Document and graph in unified model. Records can have relationships like graph databases.
Multi-model databases offer convenience but face a challenge: optimizing for multiple models is hard. A database tuned for document queries may not match a purpose-built graph database for deep traversals.
Use multi-model when:
Use purpose-built when:
The alternative to multi-model is 'polyglot persistence'—using different databases for different purposes. Use Redis for caching, PostgreSQL for transactions, Elasticsearch for search, Neo4j for recommendations. Each excels at its purpose but increases operational complexity. There's no universal right answer—evaluate based on your team's capabilities and requirements.
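One common shape that polyglot persistence takes is the cache-aside pattern, where a key-value cache sits in front of a system of record. The Python sketch below illustrates that pattern only; the text above does not prescribe it. The class `CacheAsideRepository` is hypothetical, and both stores are plain dicts standing in for a cache such as Redis and a transactional database such as PostgreSQL.

```python
from typing import Any, Optional


class CacheAsideRepository:
    """Reads go to the cache first; writes go to the database and invalidate the cache."""

    def __init__(self, cache: dict, database: dict) -> None:
        self.cache = cache        # stand-in for a key-value cache
        self.database = database  # stand-in for the system of record
        self.db_reads = 0         # counts how often the slower store is hit

    def get_user(self, user_id: str) -> Optional[Any]:
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        record = self.database.get(user_id)
        if record is not None:
            self.cache[key] = record  # populate the cache for later reads
        return record

    def update_user(self, user_id: str, record: Any) -> None:
        self.database[user_id] = record
        self.cache.pop(f"user:{user_id}", None)  # drop the stale cached copy


repo = CacheAsideRepository(cache={}, database={"1001": {"name": "Alice"}})
repo.get_user("1001")   # miss: reads the database, fills the cache
repo.get_user("1001")   # hit: served from the cache
print(repo.db_reads)    # 1
repo.update_user("1001", {"name": "Alice B."})
print(repo.get_user("1001"), repo.db_reads)
```

Even this small sketch shows where the operational cost comes from: two stores now have to be kept consistent, and the invalidation logic lives in application code.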
We've surveyed the four primary NoSQL categories. Each represents a different philosophy about data organization optimized for different access patterns.
| If You Need... | Choose | Example Use Cases |
|---|---|---|
| Fastest possible lookups by key | Key-Value | Caching, sessions, rate limiting |
| Flexible documents with rich queries | Document | Product catalogs, user profiles, CMS |
| High write throughput for time-series | Column-Family | IoT sensors, metrics, event logging |
| Relationship traversal and pattern matching | Graph | Social networks, recommendations, fraud detection |
| Multiple patterns in one system | Multi-Model | Unified architecture, varied access patterns |
What's next:
Now that we understand the NoSQL categories, we'll explore how to choose between them. The next page examines use case selection—practical decision frameworks for matching database technology to specific requirements.
You now understand the four primary NoSQL database categories—key-value, document, column-family, and graph—including their data models, strengths, limitations, and ideal use cases. You're equipped to evaluate which category best fits specific application requirements.