Loading content...
How do you describe the structure of data in a database? Not the data itself—but the shape of it. What kinds of entities exist? What attributes do they have? How do they relate to one another? What rules govern their behavior?
These questions are answered by Data Models—formal frameworks that provide the vocabulary, structures, and rules for describing data at the logical level. A data model is to database design what a blueprint is to architecture: it defines the structure before any concrete data exists.
In this page, we'll explore what data models are, why they matter, and survey the major data modeling paradigms from historical to contemporary.
By the end of this page, you'll understand what a data model is and its three fundamental components. You'll be introduced to major data modeling paradigms—hierarchical, network, relational, object-oriented, and document—understanding their strengths, limitations, and appropriate use cases.
A data model is an abstract representation of data structures and their relationships. More formally:
A data model is a collection of concepts for describing data, relationships among data, and constraints on the data.
Data models serve as the bridge between the messy complexity of the real world and the precise, formal structures that databases can store and manipulate.
| Model | Structure | Operations | Constraints |
|---|---|---|---|
| Relational | Tables, columns, rows | SQL, relational algebra | Keys, foreign keys, CHECK, NOT NULL |
| Document | Documents, embedded objects, arrays | CRUD, aggregation pipelines | Schema validation, unique indexes |
| Graph | Nodes, edges, properties | Traversals, pattern matching (Cypher) | Uniqueness, relationship rules |
| Key-Value | Keys and opaque values | GET, SET, DELETE | Key uniqueness |
| Hierarchical | Parent-child tree structure | Navigation (parent, child, sibling) | Single parent constraint |
A data model is abstract—it defines structures and rules conceptually. Implementation is how a specific DBMS realizes that model. Multiple DBMS products can implement the same data model differently. PostgreSQL, MySQL, and Oracle all implement the relational model, but their internal implementations differ significantly.
Data models can be categorized based on their level of abstraction and purpose. Understanding these categories helps in selecting the right model for different stages of database design.
Record-Based vs. Object-Based Models:
Record-Based Models (relational, network, hierarchical) — Data is represented as fixed-format records. Each record has the same structure (columns). The relational model extended this with powerful algebraic operations.
Object-Based Models (entity-relationship, object-oriented) — Data is represented as objects that can have complex internal structure. Objects encapsulate both data and behavior. The ER model is purely conceptual; OO databases attempted implementation.
Modern data models often blend characteristics: Document databases allow complex nested structures (object-like) but store them as schema-flexible records. Graph databases combine node/edge structures with property storage.
Good database design flows from conceptual to logical to physical. Start with business understanding (conceptual), select an appropriate data model (logical), then optimize for your DBMS (physical). Skipping levels leads to awkward designs that don't match business needs or perform poorly.
The Relational Model, proposed by Edgar F. Codd in 1970, revolutionized database technology and remains the dominant data model today. Its key insight: represent data as mathematical relations (tables), with a solid foundation in set theory and first-order logic.
Why the Relational Model Won:
12345678910111213141516171819202122232425262728293031
-- Relation (Table) definition with constraintsCREATE TABLE employees ( -- Attributes (Columns) with domains employee_id INTEGER PRIMARY KEY, -- Key attribute first_name VARCHAR(50) NOT NULL, -- Domain: strings up to 50 chars last_name VARCHAR(50) NOT NULL, email VARCHAR(100) UNIQUE, -- Uniqueness constraint hire_date DATE NOT NULL, -- Domain: dates salary DECIMAL(10,2) CHECK (salary > 0), -- Domain constraint department_id INTEGER REFERENCES departments(id) -- Foreign key (referential integrity)); -- Tuples (Rows) - instances of the relationINSERT INTO employees VALUES (1, 'Alice', 'Chen', 'achen@co.com', '2020-03-15', 85000.00, 3), (2, 'Bob', 'Smith', 'bsmith@co.com', '2019-07-01', 92000.00, 2), (3, 'Carol', 'Jones', 'cjones@co.com', '2021-01-20', 78000.00, 3); -- Relational operations (declarative)-- Selection (σ): Filter rowsSELECT * FROM employees WHERE salary > 80000; -- Projection (π): Select columnsSELECT first_name, last_name FROM employees; -- Join (⋈): Combine related tablesSELECT e.first_name, d.name AS departmentFROM employees eJOIN departments d ON e.department_id = d.id;In 1985, Codd published 12 rules (actually 13, numbered 0-12) that a DBMS must satisfy to be considered 'fully relational.' No commercial DBMS satisfies all rules completely, but they remain the theoretical benchmark for relational compliance.
Before the relational model dominated, two earlier models shaped database history. Understanding them provides context for why the relational model was revolutionary.
The Hierarchical Model (1960s)
The hierarchical model organizes data as a tree structure with parent-child relationships. Each child record has exactly one parent; each parent can have multiple children. IBM's IMS (Information Management System), developed in 1966, was the dominant implementation.
Structure:
Example: A Company Structure
[COMPANY]
|
+---------+---------+
| |
[DEPARTMENT] [DEPARTMENT]
| |
+--+--+ +----+
| | | |
[EMP] [EMP] [EMP] [EMP]
Limitations:
IBM's IMS is still operational in many large enterprises, particularly in banking and insurance. Legacy systems running on IMS process billions of transactions daily. This longevity demonstrates both the robustness of the software and the cost of migration.
| Aspect | Hierarchical | Network | Relational |
|---|---|---|---|
| Structure | Trees | Directed graphs | Tables |
| Relationships | One-to-many only | One-to-many, many-to-many | Any via foreign keys |
| Access Method | Tree navigation | Pointer traversal | Declarative queries (SQL) |
| Data Independence | Low | Low | High |
| Query Language | Procedural DML | Procedural DML | Declarative SQL |
| Flexibility | Rigid | Moderate | High |
The fundamental problem with hierarchical and network models was program-data dependence. Applications contained explicit navigation code tied to physical data structures. Any structural change required rewriting applications. The relational model's declarative approach freed applications from this burden.
The Document Model emerged in the 2000s as part of the NoSQL movement, addressing limitations of the relational model for certain use cases. Documents are self-describing, semi-structured data objects—typically JSON or BSON—that can contain nested structures and arrays.
Key Characteristics:
123456789101112131415161718192021222324252627282930313233343536373839404142
// A document representing an order (MongoDB-style){ "_id": ObjectId("507f1f77bcf86cd799439011"), "order_date": ISODate("2024-01-15T10:30:00Z"), "status": "shipped", // Embedded document: Customer info "customer": { "id": 12345, "name": "Alice Chen", "email": "alice@example.com", "address": { "street": "123 Main St", "city": "San Francisco", "state": "CA", "zip": "94102" } }, // Array of embedded documents: Line items "items": [ { "product_id": "PRD-001", "name": "Wireless Keyboard", "quantity": 2, "unit_price": 49.99 }, { "product_id": "PRD-042", "name": "USB Hub", "quantity": 1, "unit_price": 24.99 } ], // Computed/stored: total "total": 124.97, // Flexible: fields can vary between documents "gift_message": "Happy Birthday!", "tracking_number": "1Z999AA10123456784"}When to Use Document Model:
✅ Good Fit:
❌ Poor Fit:
In document databases, the key design decision is when to embed related data versus reference it. Embed when: data is accessed together, has 1:1 or 1:few relationship, rarely changes. Reference when: data has many:many relationships, changes frequently, or is accessed independently.
The database landscape has diversified significantly. Beyond relational and document models, several specialized models have emerged for specific use cases.
Key-Value Model
The simplest NoSQL model: data stored as key-value pairs. The key is a unique identifier; the value is opaque to the database (can be anything: string, JSON, binary blob).
Structure:
user:1001 → {"name": "Alice", "email": "a@ex.com"}
session:abc123 → {"user_id": 1001, "expires": ...}
cache:product:42 → "<html>...product page...</html>"
Operations: GET, SET, DELETE (extremely fast)
Use Cases:
Limitations:
Modern applications often use multiple data models. A single system might use PostgreSQL for transactions, Redis for caching, Neo4j for recommendations, and Pinecone for semantic search. This 'polyglot persistence' approach uses each model where it excels.
Selecting a data model is a critical architectural decision with long-term implications. There's no universally 'best' model—the right choice depends on your specific requirements.
| Requirement | Best Model | Why |
|---|---|---|
| Complex transactions, referential integrity | Relational | ACID guarantees, constraint enforcement |
| Flexible, evolving schema | Document | Schema-less, JSON-native |
| Massive write throughput | Wide-Column | Distributed, optimized for writes |
| Relationship-heavy, traversals | Graph | O(1) relationship traversal |
| Simple, ultra-fast lookups | Key-Value | Minimal overhead, in-memory |
| Semantic/similarity search | Vector | ANN algorithms, embedding support |
| General purpose, unknown future | Relational | Most flexible, best tooling |
If you're unsure, start with a relational database. PostgreSQL with its extension ecosystem (JSONB for documents, PostGIS for geo, pgvector for AI) can handle remarkably diverse workloads. Specialize only when you hit clear limitations or have obvious specialized needs.
We've explored the landscape of data models—from historical to contemporary. Let's consolidate the key insights:
What's Next:
With a solid understanding of data models, we'll explore Database Languages—the formal languages used to define and manipulate data. We'll examine DDL, DML, DCL, and TCL, understanding how each category of language supports different aspects of database management.
You now understand what data models are, how they've evolved, and how to choose between them. This knowledge is fundamental for database design—the model you select shapes every subsequent design decision and the capabilities available to your applications.