Loading learning content...
In any system design interview, the question of which database to use often emerges early and carries enormous weight. This isn't a superficial technology choice—it's a foundational architectural decision that influences every subsequent aspect of your design: data models, query patterns, consistency guarantees, scaling strategies, operational complexity, and ultimately, whether the system succeeds or fails under load.
The interviewer isn't looking for you to name-drop technologies. They want to see systematic reasoning—a structured approach that weighs requirements against capabilities, understands trade-offs, and arrives at a justified recommendation. This is what separates junior candidates who pick databases based on familiarity from senior engineers who select them based on principled analysis.
By the end of this page, you will possess a comprehensive framework for database selection in system design interviews. You'll understand how to analyze requirements, map them to database capabilities, evaluate multiple candidates, articulate trade-offs, and present a well-reasoned recommendation that demonstrates principal-engineer-level thinking.
Effective database selection requires a structured methodology. The DARTS Framework provides a systematic approach that interviewers respect because it demonstrates both technical depth and engineering judgment:
Each phase of this framework builds upon the previous, creating a logical chain of reasoning that leads to a defensible database choice. Let's examine each phase in detail.
When asked 'What database would you use?', resist the urge to immediately name a technology. Instead, say: 'Let me walk through my analysis systematically.' This signals maturity and thoroughness—qualities interviewers value highly in senior candidates.
Phase 1: Data Model Analysis
Before selecting a database, you must deeply understand the shape and nature of your data:
| Data Model Characteristic | Favorable Database Type | Rationale |
|---|---|---|
| Highly relational with complex joins | Relational (PostgreSQL, MySQL) | ACID compliance, foreign keys, SQL joins are fundamental |
| Flexible schema, document-like records | Document (MongoDB, CouchDB) | Schema-less design accommodates varying structures |
| Simple key-value with high throughput | Key-Value (Redis, DynamoDB) | Minimal overhead, optimized for direct access |
| Time-stamped metrics/events | Time-Series (InfluxDB, TimescaleDB) | Optimized for temporal queries, efficient compression |
| Deeply interconnected entities | Graph (Neo4j, Amazon Neptune) | Native traversal operations, path-finding algorithms |
| Wide-column analytical data | Column-Family (Cassandra, HBase) | Optimized for write-heavy, time-partitioned workloads |
Phase 2: Access Pattern Evaluation
The way your application accesses data is often more important than the data model itself. Different databases optimize for radically different access patterns:
A principal-level understanding requires going beyond surface-level categorization into the architectural principles that give each database type its characteristics. Let's examine the major categories through this lens.
Relational Database Management Systems (RDBMS)
Relational databases have remained the backbone of enterprise applications for 50+ years because they solve the hardest problems in data management: maintaining consistency in the face of concurrent modifications and system failures.
Architectural Foundation:
When to Select:
Production Examples:
Relational databases scale vertically more easily than horizontally. Horizontal scaling (sharding) breaks joins across shards and complicates transactions. Consider this fundamental limitation when evaluating for massive scale.
Real-world systems rarely use a single database. Polyglot persistence—using different databases for different use cases within the same system—is the norm at scale. Understanding when and how to combine databases demonstrates senior-level architectural thinking.
The Command Query Responsibility Segregation (CQRS) Pattern
CQRS separates read and write models, often backed by different databases. This acknowledges that read and write workloads have different optimization requirements:
┌─────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Writes │────►│ Event Store / │────►│ Read Database │
│ (Commands) │ │ Write Database │ │ (Denormalized) │
└─────────────┘ └──────────────────┘ └─────────────────┘
│
▼
┌─────────────┐
│ Reads │
│ (Queries) │
└─────────────┘
Write-Optimized Store (e.g., PostgreSQL, Event Store):
Read-Optimized Store (e.g., Elasticsearch, Redis, MongoDB):
Every additional database increases operational burden: more monitoring, more backups, more failure modes, more expertise needed. Only introduce additional databases when the benefit clearly outweighs the complexity. Start simple, add databases when specific pain points emerge.
When presenting database selection in an interview, structure your reasoning as a decision tree that progressively narrows options based on requirements.
START: What is the primary data model?├── Highly Relational (complex joins, referential integrity)│ └── RDBMS: PostgreSQL, MySQL, SQL Server│ ├── Need global distribution? → CockroachDB, Spanner│ └── Need extreme write scale? → Consider sharding or Vitess│├── Documents with Varying Schema│ └── Document DB: MongoDB, CouchDB│ ├── Need search? → Add Elasticsearch│ └── Need transactions? → MongoDB 4.0+│├── Simple Key-Value Access│ └── Key-Value: Redis, DynamoDB│ ├── Need persistence? → Redis with AOF/RDB, DynamoDB│ └── Pure cache? → Memcached, Redis│├── Connected/Graph Data│ └── Graph DB: Neo4j, Neptune│ ├── Need scale? → JanusGraph, Neptune│ └── ACID required? → Neo4j│├── Time-Series / Metrics│ └── Time-Series: InfluxDB, TimescaleDB│ ├── Need SQL? → TimescaleDB│ └── High cardinality? → QuestDB, ClickHouse│└── Wide-Column / Analytics └── Column-Family: Cassandra, HBase, ClickHouse ├── Real-time analytics? → ClickHouse, Druid └── Write-heavy time-series? → CassandraRequirement Prioritization Matrix
Not all requirements are equal. Prioritize them and map to databases:
| Priority | Requirement Type | Example | Database Implication |
|---|---|---|---|
| P0 | Must Have | ACID for payments | Eliminate non-ACID options |
| P1 | Should Have | Sub-100ms reads | Influences caching layer |
| P2 | Nice to Have | Full-text search | May add secondary database |
| P3 | Future Consideration | ML integration | Impacts long-term choice |
Start by identifying P0 requirements—these are non-negotiable filters. Then evaluate remaining candidates against P1 and P2 requirements to differentiate.
Interviewers often appreciate answers that start with 'it depends' followed by a clear framework. This shows you understand that context matters. Say: 'My recommendation depends on X, Y, and Z. Let me walk through how each affects the choice.'
Beyond matching data models and access patterns, several operational and strategic factors influence database selection. Senior engineers consider these carefully.
Scaling Characteristics Comparison
Understanding how different databases scale is crucial for system design:
| Database Type | Vertical Scaling | Horizontal Scaling | Scaling Complexity |
|---|---|---|---|
| PostgreSQL | Excellent | Difficult (Citus, sharding) | Moderate-High |
| MySQL | Excellent | Moderate (Vitess, ProxySQL) | Moderate |
| MongoDB | Good | Built-in sharding | Low-Moderate |
| Cassandra | Limited | Excellent (designed for it) | Low |
| DynamoDB | N/A (managed) | Automatic | Very Low |
| Redis | Excellent (large instances) | Redis Cluster | Moderate |
| Neo4j | Good | Limited (Enterprise) | High |
| ClickHouse | Excellent | Good (sharding) | Moderate |
Let's apply the DARTS framework to a realistic interview scenario.
Design a system for a social media platform that allows users to post updates, follow other users, and view a personalized feed. The system should handle 100 million daily active users and prioritize feed delivery latency.
Step 1: Data Model Analysis
Core entities:
Relationship complexity:
Step 2: Access Pattern Evaluation
Step 3: Requirements Mapping
| Requirement | Priority | Implication |
|---|---|---|
| Feed latency < 200ms | P0 | Pre-computed or cached feeds |
| 100M DAU scale | P0 | Horizontal scaling required |
| Durability (no lost posts) | P0 | Persistent storage with replication |
| Real-time feed updates | P1 | Async updates acceptable |
| Search functionality | P2 | Can be eventually consistent |
Step 4: Trade-off Assessment
Given the requirements, we need:
Step 5: Selection Justification
I recommend a polyglot architecture:
PostgreSQL for user accounts, authentication, and ad billing
Cassandra for posts and feeds
Redis for hot data caching
Elasticsearch for search
This combination balances consistency, availability, and partition tolerance according to the specific needs of each data type.
Even experienced engineers make database selection mistakes. Awareness of common pitfalls helps you avoid them in interviews and production.
When uncertain, it's mature to say: 'Given what I know, I'd start with X because it handles our P0 requirements. But I'd monitor closely and be prepared to introduce Y if query pattern Z emerges.' This shows pragmatism and adaptability.
Database selection is both science and art. Let's consolidate the key takeaways:
What's Next:
With database selection mastered, the next page dives into Schema Design—how to structure your data within the chosen database to optimize for performance, maintainability, and evolution. You'll learn to design schemas that support both current needs and future growth.
You now have a comprehensive framework for database selection in system design interviews. Remember: interviewers care less about which database you choose and more about how systematically and thoughtfully you arrive at your recommendation.