Choosing a database is one of the most consequential technical decisions in a project. It influences application architecture, operational complexity, scalability paths, and even hiring requirements. A poor choice creates friction for years; a good choice becomes invisible infrastructure that simply works.
Yet database selection is often treated superficially—following trends, copying other companies, or defaulting to familiar technology. The result: teams wrestling with databases unsuited to their actual access patterns.
Effective database selection requires systematic analysis of requirements, honest assessment of trade-offs, and recognition that there's rarely a single "best" answer.
This page provides frameworks and heuristics for matching NoSQL (and relational) databases to real-world requirements.
By the end of this page, you will have practical frameworks for database selection, understand the key factors that drive technology choices, and be able to evaluate databases against specific use case requirements. You'll see real-world examples demonstrating how workload characteristics map to database choices.
Before evaluating specific databases, establish a framework for understanding your requirements. The following dimensions drive database selection decisions:
Questions to ask:
| Requirement | Relational | Key-Value | Document | Column-Family | Graph |
|---|---|---|---|---|---|
| Complex transactions | ★★★★★ | ★☆☆☆☆ | ★★★☆☆ | ★★☆☆☆ | ★★★☆☆ |
| Flexible schema | ★★☆☆☆ | ★★★★★ | ★★★★★ | ★★★★☆ | ★★★★☆ |
| Simple lookups | ★★★☆☆ | ★★★★★ | ★★★★☆ | ★★★★★ | ★★★☆☆ |
| Complex queries | ★★★★★ | ★☆☆☆☆ | ★★★★☆ | ★★★☆☆ | ★★★☆☆ |
| Relationship queries | ★★★☆☆ | ★☆☆☆☆ | ★★☆☆☆ | ★☆☆☆☆ | ★★★★★ |
| Write throughput | ★★★☆☆ | ★★★★★ | ★★★★☆ | ★★★★★ | ★★★☆☆ |
| Horizontal scale | ★★☆☆☆ | ★★★★★ | ★★★★☆ | ★★★★★ | ★★★☆☆ |
| Strong consistency | ★★★★★ | ★★★☆☆ | ★★★☆☆ | ★★★☆☆ | ★★★★☆ |
The most important factor in database selection is access patterns—how you read and write data. A database optimized for your access patterns will outperform a theoretically 'better' database that doesn't match your usage. Start with: 'What are the 5 most common queries?' and 'What's the write pattern?'
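Once the top queries and the write pattern are written down, the ratings table above can serve as a rough scoring aid. The sketch below is one possible way to do that, not a prescribed method: the star ratings are transcribed from the table, while the weights and the `rank_categories` helper are hypothetical and only illustrate the idea.

```python
# Star ratings transcribed from the table above (1 to 5 stars).
CATEGORIES = ["Relational", "Key-Value", "Document", "Column-Family", "Graph"]
RATINGS = {
    "Complex transactions": [5, 1, 3, 2, 3],
    "Flexible schema":      [2, 5, 5, 4, 4],
    "Simple lookups":       [3, 5, 4, 5, 3],
    "Complex queries":      [5, 1, 4, 3, 3],
    "Relationship queries": [3, 1, 2, 1, 5],
    "Write throughput":     [3, 5, 4, 5, 3],
    "Horizontal scale":     [2, 5, 4, 5, 3],
    "Strong consistency":   [5, 3, 3, 3, 4],
}


def rank_categories(weights):
    """Return (category, weighted average rating) pairs, best first.

    `weights` maps a requirement name to its importance (hypothetical scale).
    """
    total = sum(weights.values())
    scores = []
    for i, category in enumerate(CATEGORIES):
        weighted = sum(w * RATINGS[req][i] for req, w in weights.items())
        scores.append((category, round(weighted / total, 2)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Example: an order-processing workload (weights are illustrative only).
order_processing = {
    "Complex transactions": 3,
    "Strong consistency": 2,
    "Complex queries": 1,
}
ranking = rank_categories(order_processing)
print(ranking)
```

A score like this only narrows the field. It does not replace checking the candidate against the actual queries.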
Before examining NoSQL choices, recognize that relational databases remain the right choice for many workloads. NoSQL isn't a replacement—it's an alternative for specific scenarios.
Complex transactions are required: Banking, inventory, order processing—any scenario where partial updates are unacceptable.
Ad-hoc querying is common: Business intelligence, reporting, analytics on structured data. SQL's expressiveness is unmatched for exploratory queries.
Data integrity is paramount: Healthcare records, financial audits, regulatory compliance—domains where data validity is non-negotiable.
Relationships are complex but well-defined: When data fits naturally into normalized tables with foreign key relationships.
The team has SQL expertise: Familiarity reduces errors and speeds development.
PostgreSQL and MySQL are incredibly capable, well-understood, and continuously improving. PostgreSQL now supports JSON documents, full-text search, and horizontal scaling (via Citus). Don't switch to NoSQL because it's trendy—switch because your specific requirements demand trade-offs that NoSQL provides.
Key-value stores are appropriate when access patterns are simple—lookups by known keys—and performance is critical.
Session Management
Key: session:{session_id}
Value: {user_id, expires, permissions, metadata}
Operations: GET (validate session), SET (create/update), DELETE (logout)
Requirements: Sub-millisecond reads, TTL expiration, high throughput
Choice: Redis, Memcached, DynamoDB
Caching Layer
Key: cache:{entity}:{id} (e.g., cache:user:12345)
Value: Serialized entity from primary database
Operations: GET (cache hit/miss), SET (populate cache), DELETE (invalidate)
Requirements: Speed, TTL, eventual consistency acceptable
Choice: Redis, Memcached
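The GET / SET / DELETE operations above are commonly combined into a cache-aside read path. The following is a minimal sketch under stated assumptions: `TTLCache` is an in-memory stand-in for Redis or Memcached, and `fake_primary` is a hypothetical loader standing in for the primary database.

```python
import time


class TTLCache:
    """In-memory stand-in for a key-value cache with per-key TTL."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


def get_user(cache, load_from_primary, user_id, ttl_seconds=300):
    """Cache-aside read: try the cache, fall back to the primary database."""
    key = f"cache:user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit
    user = load_from_primary(user_id)  # cache miss
    cache.set(key, user, ttl_seconds)
    return user


# Fake primary database that records how often it is queried.
db_calls = []


def fake_primary(user_id):
    db_calls.append(user_id)
    return {"id": user_id, "name": "Alice"}


cache = TTLCache()
first = get_user(cache, fake_primary, 12345)   # miss, loads from primary
second = get_user(cache, fake_primary, 12345)  # hit, served from cache
cache.delete("cache:user:12345")               # invalidate after an update
third = get_user(cache, fake_primary, 12345)   # miss again, reloads
```

Between an update to the primary and the DELETE (or TTL expiry), readers may see stale data, which is why this pattern assumes eventual consistency is acceptable.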
Rate Limiting
Key: ratelimit:{user_id}:{window}
Value: Request count in current window
Operations: INCR (atomic increment), GET (check limit), EXPIRE (window reset)
Requirements: Atomic operations, TTL, very high throughput
Choice: Redis
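The INCR / EXPIRE combination above amounts to a fixed-window counter. Here is a hedged sketch of that logic: `CounterStore` is an in-memory, single-threaded stand-in written for this example, and `allow_request` is a hypothetical helper. With the redis-py client, the corresponding calls would be `incr` and `expire`, and the atomicity would come from the server rather than from this class.

```python
import time


class CounterStore:
    """In-memory stand-in for the INCR / EXPIRE operations of a key-value store."""

    def __init__(self, clock=time.time):
        self._counts = {}
        self._expires = {}
        self._clock = clock

    def incr(self, key):
        # Drop the key if its TTL has passed, then increment.
        if key in self._expires and self._clock() >= self._expires[key]:
            self._counts.pop(key, None)
            self._expires.pop(key, None)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]

    def expire(self, key, seconds):
        self._expires[key] = self._clock() + seconds


def allow_request(store, user_id, limit, window_seconds, now=None):
    """Fixed-window limiter: at most `limit` requests per user per window."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"ratelimit:{user_id}:{window}"
    count = store.incr(key)
    if count == 1:
        store.expire(key, window_seconds)  # first hit in a window sets the TTL
    return count <= limit


store = CounterStore()
results = [
    allow_request(store, "u1", limit=3, window_seconds=60, now=1000.0)
    for _ in range(5)
]
next_window = allow_request(store, "u1", limit=3, window_seconds=60, now=1060.0)
print(results, next_window)
```

Fixed windows allow short bursts at window boundaries. Sliding-window or token-bucket variants trade some simplicity for smoother limits.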
Feature Flags
Key: feature:{feature_name}
Value: {enabled, percentage_rollout, user_whitelist}
Operations: GET (check flag state)
Requirements: Fast reads, rare writes, simple caching
Choice: Redis, etcd (for distributed config)
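A flag value shaped like `{enabled, percentage_rollout, user_whitelist}` still needs an evaluation rule on the application side. One common approach, sketched here as an assumption rather than as the source's method, hashes the feature name and user ID into a stable bucket from 0 to 99. The `rollout_bucket` and `is_enabled` helpers, the `new_checkout` flag, and the `flags` dict (standing in for `GET feature:{feature_name}`) are all hypothetical.

```python
import hashlib


def rollout_bucket(feature_name, user_id):
    """Map (feature, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{feature_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, feature_name, user_id):
    """Evaluate a flag shaped like {enabled, percentage_rollout, user_whitelist}."""
    if not flag.get("enabled", False):
        return False
    if user_id in flag.get("user_whitelist", []):
        return True
    return rollout_bucket(feature_name, user_id) < flag.get("percentage_rollout", 0)


# Stand-in for the key-value store; a real lookup would be GET feature:new_checkout.
flags = {
    "feature:new_checkout": {
        "enabled": True,
        "percentage_rollout": 25,
        "user_whitelist": ["alice"],
    },
}
flag = flags["feature:new_checkout"]
print(is_enabled(flag, "new_checkout", "alice"))
```

Because the bucket depends only on the feature name and user ID, a given user gets the same answer on every request, and raising `percentage_rollout` only ever adds users.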
| Requirement | Best Choice | Why |
|---|---|---|
| In-memory speed, data structures | Redis | Rich data structures: lists, sets, sorted sets, streams |
| Pure caching, simplicity | Memcached | Simpler, multi-threaded, pure cache semantics |
| Managed, serverless scaling | DynamoDB | Auto-scaling, no operational overhead, pay-per-request |
| Distributed config, coordination | etcd | Strong consistency via Raft, Kubernetes-native |
| High availability, eventual consistency | Riak KV | Dynamo-inspired, masterless architecture |
Document databases are appropriate when data is naturally document-shaped with varied attributes, and queries go beyond simple key lookups.
Content Management System
{
"_id": "article_12345",
"title": "Understanding NoSQL Databases",
"author": {"name": "Alice", "bio": "..."},
"body": "...",
"tags": ["database", "nosql", "tutorial"],
"metadata": {"views": 1234, "published": "2024-01-15"},
"comments": [{"user": "bob", "text": "Great article!"}]
}
Queries: By tag, by author, full-text search, recent articles
Choice: MongoDB (rich queries), Couchbase (caching + documents)
Product Catalog
{
"_id": "sku_12345",
"name": "Wireless Headphones",
"category": ["electronics", "audio"],
"price": 149.99,
"attributes": {
"battery_life": "40 hours",
"driver_size": "40mm"
}
}
Queries: By category, price range, attribute filters, text search
Choice: MongoDB, Elasticsearch (if search is primary)
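To show what "by category, price range, attribute filters" looks like against documents of this shape, the sketch below evaluates a MongoDB-style filter over plain Python dicts. The `matches` function is a toy evaluator covering only equality, array membership, dotted paths, and `$gte`/`$lte`; it is not MongoDB's implementation. The second catalog entry (`sku_67890`) is invented for the example.

```python
def get_path(doc, dotted):
    """Resolve a dotted path such as 'attributes.driver_size'; None if missing."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(doc, query):
    """Toy evaluator for a small subset of MongoDB-style filters."""
    for path, condition in query.items():
        value = get_path(doc, path)
        if isinstance(condition, dict):  # range operators
            if value is None:
                return False
            if "$gte" in condition and not value >= condition["$gte"]:
                return False
            if "$lte" in condition and not value <= condition["$lte"]:
                return False
        elif isinstance(value, list):  # array field: any element may match
            if condition not in value:
                return False
        elif value != condition:
            return False
    return True


catalog = [
    {"_id": "sku_12345", "name": "Wireless Headphones",
     "category": ["electronics", "audio"], "price": 149.99,
     "attributes": {"battery_life": "40 hours", "driver_size": "40mm"}},
    {"_id": "sku_67890", "name": "USB-C Cable",  # hypothetical second product
     "category": ["electronics", "accessories"], "price": 9.99,
     "attributes": {"length": "1m"}},
]

query = {
    "category": "audio",
    "price": {"$gte": 100, "$lte": 200},
    "attributes.driver_size": "40mm",
}
hits = [doc["_id"] for doc in catalog if matches(doc, query)]
print(hits)
```

Note that the two products carry different attribute keys. This is the varied-attribute case where a document model avoids sparse columns or an entity-attribute-value table.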
User Profiles
{
"_id": "user_12345",
"email": "alice@example.com",
"preferences": {"theme": "dark", "notifications": true},
"sessions": [{"device": "mobile", "last_active": "..."}]
}
Access pattern: Usually single-document reads/writes
Choice: MongoDB, DynamoDB (if simple access patterns)
| Requirement | Best Choice | Why |
|---|---|---|
| Rich queries, aggregations, transactions | MongoDB | Most complete feature set, ACID transactions |
| Real-time mobile sync, offline-first | Firestore, CouchDB | Built-in sync, conflict resolution |
| Hybrid caching + document | Couchbase | Memcached-compatible caching layer built-in |
| AWS ecosystem, serverless | DynamoDB | Managed, auto-scaling, tight AWS integration |
| Search-first with documents | Elasticsearch | Optimized for full-text search, analytics |
Column-family databases are appropriate for time-series data, high write throughput, and workloads with well-defined query patterns.
IoT Sensor Data
Primary Key: (device_id), timestamp
Columns: sensor readings, status flags, metadata
Query: "Last 24 hours of readings for device X"
Write: Append-only, millions of records/second
Choice: Cassandra, ScyllaDB, TimescaleDB
Metrics and Monitoring
Primary Key: (metric_name, time_bucket), timestamp
Columns: value, tags, aggregates
Query: "Average CPU for server Y in last hour"
Write: High-cardinality metrics from thousands of hosts
Choice: Cassandra, TimescaleDB, InfluxDB
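The `(metric_name, time_bucket), timestamp` key above bounds partition size and lets a time-range query touch only the buckets it overlaps. The sketch below models that idea with a plain dictionary. The hourly bucket size, the `MetricsTable` class, and the `cpu.server_y` metric name are assumptions made for illustration; a real wide-column store would handle storage and ordering itself.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(hours=1)  # assumed bucket size


def time_bucket(ts):
    """Truncate a timestamp to the start of its hourly bucket."""
    return ts.replace(minute=0, second=0, microsecond=0)


class MetricsTable:
    """Dict-based stand-in for a table keyed by (metric_name, time_bucket), timestamp."""

    def __init__(self):
        self._partitions = defaultdict(list)  # partition key -> [(timestamp, value)]

    def write(self, metric_name, ts, value):
        self._partitions[(metric_name, time_bucket(ts))].append((ts, value))

    def read_range(self, metric_name, start, end):
        """Read rows in [start, end), visiting only buckets that overlap the range."""
        rows = []
        bucket = time_bucket(start)
        while bucket < end:
            for ts, value in self._partitions.get((metric_name, bucket), []):
                if start <= ts < end:
                    rows.append((ts, value))
            bucket += BUCKET
        return sorted(rows)


table = MetricsTable()
base = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)
for minutes, cpu in [(0, 40.0), (30, 60.0), (70, 80.0), (130, 20.0)]:
    table.write("cpu.server_y", base + timedelta(minutes=minutes), cpu)

window = table.read_range(
    "cpu.server_y", base + timedelta(minutes=15), base + timedelta(minutes=75)
)
average = sum(value for _, value in window) / len(window)
print(average)
```

Choosing the bucket size is a modeling decision: too large and partitions grow without bound, too small and every query fans out across many partitions.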
Activity Feeds
Primary Key: (user_id), timestamp
Columns: activity type, actor, object, metadata
Query: "Recent 50 activities for user X"
Write: Fan-out events across millions of users
Choice: Cassandra (used by Instagram, Netflix)
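The "fan-out on write, single-partition read" shape of this workload can be sketched in a few lines. This is an illustrative model only: the `followers` mapping, the event fields, and both helper functions are hypothetical, and the `feeds` dictionary stands in for a table partitioned by `user_id` and ordered by timestamp.

```python
from collections import defaultdict

# Hypothetical follow graph: actor -> users who should see the actor's activity.
followers = {"alice": ["bob", "carol"], "bob": ["carol"]}

# Stand-in for a table partitioned by user_id: partition -> [(timestamp, activity)].
feeds = defaultdict(list)


def publish(actor, activity_type, obj, timestamp):
    """Fan-out on write: append the event to every follower's feed partition."""
    event = {"type": activity_type, "actor": actor, "object": obj}
    for follower in followers.get(actor, []):
        feeds[follower].append((timestamp, event))


def recent_activities(user_id, limit=50):
    """Single-partition read: newest `limit` activities for one user."""
    rows = sorted(feeds.get(user_id, []), key=lambda row: row[0], reverse=True)
    return rows[:limit]


publish("alice", "posted", "photo_1", timestamp=1)
publish("bob", "liked", "photo_1", timestamp=2)
publish("alice", "posted", "photo_2", timestamp=3)
carol_feed = recent_activities("carol", limit=2)
print(carol_feed)
```

The design trades write amplification (one write per follower) for cheap reads, which suits a store optimized for high write throughput.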
Messaging and Chat History
Primary Key: (conversation_id), message_timestamp
Columns: sender, content, attachments, read_status
Query: "Messages in conversation X, last 100"
Write: Real-time message delivery
Choice: Cassandra, ScyllaDB
| Requirement | Best Choice | Why |
|---|---|---|
| General-purpose wide-column, proven scale | Apache Cassandra | Battle-tested at Netflix, Apple; large community |
| Cassandra-compatible, higher performance | ScyllaDB | C++ reimplementation, 10x performance claims |
| Hadoop ecosystem integration | HBase | Built on HDFS, integrates with Spark, Hive |
| Managed, Google-scale | Cloud Bigtable | Managed, integrates with GCP data ecosystem |
| Purpose-built time-series | TimescaleDB, InfluxDB | Optimized for time-series queries, retention policies |
Column-family databases require careful data modeling—you must design tables around query patterns before writing code. This is a different discipline from relational modeling. Invest in learning or hire expertise; poor data modeling in Cassandra leads to full-table scans and performance disasters.
Graph databases are appropriate when relationships between entities are the primary focus of queries.
Social Network Features
Nodes: Person, Post, Group, Event
Edges: FOLLOWS, LIKES, MEMBER_OF, ATTENDS
Queries:
- Friends-of-friends not yet connected
- Influencer identification (high-degree nodes)
- Community detection
Choice: Neo4j (feature-rich), Neptune (managed AWS)
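The "friends-of-friends not yet connected" query is a two-hop traversal with an exclusion. The sketch below expresses it over an in-memory adjacency map to make the traversal explicit. The `follows` data and the ranking by mutual-connection count are assumptions for illustration. A graph database would express the same pattern declaratively in its query language.

```python
# Hypothetical FOLLOWS edges as an adjacency map: person -> people they follow.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, person):
    """People two hops away who are not already direct connections."""
    direct = graph.get(person, set())
    candidates = {}
    for friend in direct:
        for fof in graph.get(friend, set()):
            if fof != person and fof not in direct:
                candidates[fof] = candidates.get(fof, 0) + 1  # mutual connections
    # Most mutual connections first; ties broken alphabetically for stable output.
    return sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))


suggestions = friends_of_friends(follows, "alice")
print(suggestions)
```

In a relational schema the same question needs a self-join per hop, which is one reason deep relationship queries favor graph stores.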
Recommendation Engine
Nodes: User, Product, Category
Edges: PURCHASED, VIEWED, SIMILAR_TO
Queries:
- "Products bought by people who bought X"
- "Shortest path between user preferences and product"
- Collaborative filtering via graph
Choice: Neo4j, TigerGraph (analytics scale)
Knowledge Graph / Semantic Web
Nodes: Entity (Person, Place, Concept)
Edges: Relationships with types and properties
Queries:
- "Find all people connected to Company X within 3 hops"
- Pattern matching for entities
Choice: Neo4j, Neptune (RDF/SPARQL support)
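A bounded-hop query such as "within 3 hops" is a breadth-first search that stops expanding at the hop limit. The sketch below shows that traversal over a small, invented edge list. Treating edges as undirected is an assumption of this example, and all entity and relationship names are hypothetical.

```python
from collections import deque

edges = [  # (source, relationship type, target), all hypothetical
    ("Company X", "EMPLOYS", "Alice"),
    ("Alice", "KNOWS", "Bob"),
    ("Bob", "KNOWS", "Carol"),
    ("Carol", "KNOWS", "Dan"),
    ("Company X", "LOCATED_IN", "Berlin"),
]
people = {"Alice", "Bob", "Carol", "Dan"}


def within_hops(edges, start, max_hops):
    """Breadth-first search over undirected edges; returns {node: hop distance}."""
    adjacency = {}
    for source, _rel, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


reachable = within_hops(edges, "Company X", max_hops=3)
connected_people = sorted(node for node in reachable if node in people)
print(connected_people)
```

Here Dan sits four hops from Company X and is therefore excluded, while Berlin is reachable but filtered out because it is not a person.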
Fraud Detection
Nodes: Account, Device, Transaction, IP Address
Edges: USES, TRANSACTED_WITH, LOGGED_FROM
Queries:
- "Find accounts sharing devices with known fraudsters"
- Ring detection among accounts
- Abnormal relationship patterns
Choice: Neo4j, TigerGraph, Amazon Neptune
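"Find accounts sharing devices with known fraudsters" is a two-hop pattern: account, device, account. The sketch below runs that pattern over a list of USES edges held in memory. The account and device identifiers, the `known_fraudsters` set, and the `accounts_sharing_devices` helper are invented for illustration.

```python
from collections import defaultdict

uses = [  # (account, device) pairs for USES edges, all hypothetical
    ("acct_1", "device_a"),
    ("acct_2", "device_a"),
    ("acct_2", "device_b"),
    ("acct_3", "device_b"),
    ("acct_4", "device_c"),
]
known_fraudsters = {"acct_1"}


def accounts_sharing_devices(uses, flagged):
    """Return {account: shared devices} for unflagged accounts that share a device with a flagged one."""
    accounts_by_device = defaultdict(set)
    for account, device in uses:
        accounts_by_device[device].add(account)

    suspects = defaultdict(set)
    for device, accounts in accounts_by_device.items():
        if accounts & flagged:  # at least one flagged account used this device
            for account in accounts - flagged:
                suspects[account].add(device)
    return dict(suspects)


suspects = accounts_sharing_devices(uses, known_fraudsters)
print(suspects)
```

Only `acct_2` is returned. `acct_3` shares a device with `acct_2` but not with a flagged account, so reaching it would take a further hop, the kind of extension that becomes a ring-detection query.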
| Requirement | Best Choice | Why |
|---|---|---|
| Enterprise graph, rich features | Neo4j | Most mature, Cypher language, great tooling |
| AWS managed, multi-model | Amazon Neptune | Managed, supports Gremlin and SPARQL |
| Real-time analytics at scale | TigerGraph | Optimized for iterative analytics, massive graphs |
| Multi-model (document + graph) | ArangoDB | Unified AQL for documents and graphs |
| Open source, distributed | JanusGraph | Supports Cassandra/HBase backends, TinkerPop standard |
Let's walk through realistic decision processes for common scenarios.
Requirements:
Decision:
Rationale: Polyglot persistence lets each database handle what it does best. Alternatively, use PostgreSQL for everything if scale is modest and the team prefers simplicity.
Requirements:
Decision:
Rationale: The write throughput requirement eliminates traditional RDBMS for sensor data. Cassandra's partition design fits time-series naturally.
Requirements:
Decision:
Rationale: At startup scale, MongoDB could handle the social graph, but planning for 100M users pushes toward specialized solutions. Instagram famously moved its social graph to Cassandra.
Many successful applications start with a single database (often PostgreSQL or MongoDB) and add specialized databases as scaling demands grow. Don't prematurely optimize with polyglot persistence; it adds operational complexity. Specialize when you have concrete evidence that a specialized database solves a real problem.
Learning from common mistakes is as valuable as understanding best practices.
Every database you add requires: monitoring, backups, security configuration, capacity planning, incident response, and team training. A single well-chosen database often beats three "optimal" ones. Measure the complexity cost against the performance benefit.
We've established frameworks and heuristics for matching databases to requirements. The key is systematic analysis, not intuition or trends.
Module Complete:
This concludes our exploration of NoSQL databases at the overview level. You now understand what NoSQL databases are, why they emerged, the theoretical foundations (CAP/BASE), the four primary categories, and how to select the right database for specific use cases.
Subsequent modules will dive deep into each NoSQL category: key-value stores, document databases, column-family databases, and graph databases—exploring their architectures, query languages, and practical implementation patterns.
You now have a comprehensive understanding of the NoSQL landscape and practical frameworks for database selection. You can analyze requirements systematically, evaluate database categories against specific use cases, and avoid common selection pitfalls. You're prepared to dive deeper into specific NoSQL database categories in the following modules.