When an architect designs a building, they begin not by choosing paint colors or furniture, but by establishing the fundamental structures: walls, floors, beams, and how they connect. Similarly, the structural aspect of a data model defines the fundamental building blocks and organizational principles that determine what data can exist and how it can be arranged.
The structural component answers the most basic question about any database: "What can data look like?" This isn't a trivial question. The answer profoundly shapes everything else—what queries are natural, what updates are efficient, what constraints are expressible, and ultimately, what problems the database is good at solving.
Different data models answer this question differently, which is precisely why we have relational databases, document stores, graph databases, and more. Each structural approach enables certain capabilities while making others difficult or impossible.
By the end of this page, you will understand how the structural component of data models works, including the concepts of data objects, attributes, types, and relationships. You'll be able to analyze the structural choices of any data model and understand how those choices affect system capabilities.
The structural component of a data model formally specifies the types of data objects that can exist, the properties or attributes these objects can have, and the relationships that can connect different objects. This specification acts as a grammar for data—a set of rules that determine what constitutes valid, well-formed data.
Core structural concepts:
Every data model's structural component defines these fundamental elements:
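1. Data Objects/Containers: The primary organizational units for data. Different models call these different things: tuples grouped into tables (relational), documents grouped into collections (document), and nodes connected by edges (graph).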
2. Attributes/Properties: The characteristics or values associated with data objects. An employee might have attributes like name, salary, and hire_date. The structural component specifies what attributes are allowed and, often, what types of values they can hold.
3. Relationships: How different data objects connect or reference each other. An order relates to a customer; an employee relates to a department. Different models express relationships in fundamentally different ways.
Think of structure like the grammar of a language. English grammar says a sentence needs a subject and verb; data model structure says what kinds of objects exist and how they can be composed. Just as grammar violations produce nonsense sentences, structural violations produce invalid data.
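The structural specification therefore covers which kinds of objects may exist, which attributes each may carry and what types their values may take, and which relationships may connect them.
In systems that enforce a schema, such as relational databases, these rules are not merely documentation: the database itself rejects any attempt to create data that violates the structure. This enforcement is a large part of what makes such databases reliable.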
Let's examine each structural element more rigorously, understanding the design space and tradeoffs involved.
Data Objects:
The primary structural question is: what is the atomic unit of data? What is the smallest piece that has independent identity and can be operated on individually?
In the relational model, the atomic unit is the tuple (row): a collection of attribute values, one for each attribute of its relation. Tuples exist within relations (tables), which are sets of tuples that all share the same structure (the same attributes).
In the document model, the atomic unit is the document—a self-contained, possibly nested structure (typically JSON or BSON). Documents exist within collections, but unlike relational tables, documents in the same collection may have different structures.
In the graph model, there are two atomic units: nodes (representing entities) and edges (representing relationships). Both can have properties. This dual structure is what makes graph databases well suited to relationship-heavy data.
| Data Model | Primary Object | Grouping Mechanism | Structure Flexibility |
|---|---|---|---|
| Relational | Tuple (Row) | Relation (Table) | Rigid—all tuples share schema |
| Document | Document | Collection | Flexible—documents can differ |
| Graph | Node + Edge | Graph / Labels | Flexible—properties can vary |
| Key-Value | Value (blob) | Keyspace / Bucket | Maximum—value is opaque |
| Column-Family | Row with columns | Column Family | Semi-flexible—sparse columns |
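As a rough illustration of the table, the sketch below lays out one customer and one order under each model as plain JavaScript values. These are not any database's storage format or API; the identifiers and field names are made up for the example. The hypothetical helper `hasUniformShape` checks the property behind the "Structure Flexibility" column: every tuple in a relation shares one set of attributes, while documents in a collection need not.

```javascript
// Relational: rows in a table all have the same attributes.
const customers = [{ id: 42, name: "Alice Smith" }];
const orders = [{ id: 1001, customer_id: 42, order_date: "2024-01-15" }];

// Document: a self-contained, possibly nested value in a collection.
const orderDocs = [
  { _id: "order_1001", customer: { name: "Alice Smith" }, order_date: "2024-01-15" },
];

// Graph: nodes and edges are both stored objects that can carry properties.
const nodes = [
  { id: "n1", labels: ["Customer"], props: { name: "Alice Smith" } },
  { id: "n2", labels: ["Order"], props: { date: "2024-01-15" } },
];
const edges = [{ from: "n1", to: "n2", type: "PLACED", props: {} }];

// Key-value: the store sees only a key and an opaque value (here, a serialized string).
const kv = new Map([["order:1001", JSON.stringify(orderDocs[0])]]);

// Column-family: each row is a sparse map of column name -> value.
const orderColumns = new Map([["order:1001", { customer_name: "Alice Smith" }]]);

// Hypothetical helper: true if every object in `rows` has exactly the same keys.
function hasUniformShape(rows) {
  const shape = (r) => Object.keys(r).sort().join(",");
  return rows.every((r) => shape(r) === shape(rows[0]));
}

// A second document with a different shape is legal in a document collection.
const mixedDocs = [
  ...orderDocs,
  { _id: "order_1002", customer: { name: "Bob Jones" }, gift_note: "Happy birthday" },
];

console.log(hasUniformShape(orders)); // true
console.log(hasUniformShape(mixedDocs)); // false
```

Nothing in this sketch enforces anything; in a relational database, the table definition is what makes a row of the wrong shape impossible to store.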
Attributes and Types:
Attributes describe the properties of data objects. The structural component specifies:
Type systems vary significantly across models:
Relational databases traditionally use a fixed set of primitive types: INTEGER, VARCHAR, DATE, BOOLEAN, and so on. Each attribute has exactly one declared type, which can change only through an explicit schema migration (for example, ALTER TABLE).
Document databases support nested types: an address attribute might contain an object with street, city, and zip sub-attributes. This enables data locality: related information is stored together.
Graph databases typically support simple types on nodes and edges, though some modern graph databases allow complex property values.
The fundamental distinction is between schema-on-write (relational) where structure is enforced when data is stored, and schema-on-read (document, key-value) where structure is interpreted when data is accessed. This isn't about which is 'better'—it's about different tradeoffs between development agility and data consistency.
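The difference is mainly about when the check happens. In the minimal sketch below, which assumes a made-up `employeeSchema` mapping attribute names to JavaScript `typeof` results, the hypothetical `insertChecked` validates at write time and refuses non-conforming rows, while the hypothetical `readSalary` accepts whatever was stored and interprets it at read time. Real databases implement both very differently; only the timing is the point here.

```javascript
// Hypothetical declared schema: attribute name -> expected `typeof` result.
// Every declared attribute is treated as required.
const employeeSchema = { name: "string", salary: "number" };

// Schema-on-write: check the row before storing it; reject it if it does not conform.
function insertChecked(table, row, schema) {
  for (const [attr, type] of Object.entries(schema)) {
    if (typeof row[attr] !== type) throw new TypeError(`${attr} must be a ${type}`);
  }
  for (const attr of Object.keys(row)) {
    if (!Object.prototype.hasOwnProperty.call(schema, attr)) {
      throw new TypeError(`unknown attribute: ${attr}`);
    }
  }
  table.push(row);
}

// Schema-on-read: storage accepts anything; the reader handles each shape it may meet.
function readSalary(doc) {
  if (typeof doc.salary === "number") return doc.salary;
  if (typeof doc.salary === "string") return Number(doc.salary); // e.g. written by older code
  return null; // attribute absent
}

const table = [];
insertChecked(table, { name: "Ada", salary: 5000 }, employeeSchema);

let rejected = false;
try {
  insertChecked(table, { name: "Bob", salary: "4000" }, employeeSchema);
} catch (err) {
  rejected = true; // the string salary never reaches storage
}

// A collection that stored similar records without any checks:
const collection = [{ name: "Ada", salary: 5000 }, { name: "Bob", salary: "4000" }, { name: "Cy" }];
const salaries = collection.map(readSalary);
console.log(table.length, rejected, salaries); // 1 true [ 5000, 4000, null ]
```

The write-time version keeps bad data out entirely; the read-time version keeps accepting writes but pushes knowledge of every past and present shape into reader code.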
Real-world data is inherently interconnected. Customers place orders; orders contain products; products belong to categories; employees work in departments. How relationships are represented is perhaps the most consequential structural decision a data model makes.
Relational Model: Foreign Keys and Joins
The relational model represents relationships implicitly through shared attribute values. An Orders table might have a customer_id attribute whose values correspond to id values in a Customers table. This "foreign key" creates a link without physically connecting the data.
To traverse the relationship, queries perform joins—operations that match tuples based on attribute values. This approach is extremely flexible but can be expensive for heavily connected data with many-hop traversals.
```sql
-- Implicit relationship via matching attribute values
SELECT c.name, o.order_date
FROM Customers c
JOIN Orders o ON c.id = o.customer_id;
```
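One common way an engine can evaluate that join is a hash join: build a lookup table keyed by `Customers.id`, then probe it with each order's `customer_id`. The sketch below is a simplified in-memory illustration of that strategy, not how any particular database executes the query; `hashJoin` and the sample rows are invented for the example.

```javascript
// Inner join on Customers.id = Orders.customer_id using a hash join.
function hashJoin(customers, orders) {
  // Build phase: index customers by primary key.
  const byId = new Map(customers.map((c) => [c.id, c]));
  const result = [];
  // Probe phase: for each order, look up the customer with the matching id.
  for (const o of orders) {
    const c = byId.get(o.customer_id);
    if (c !== undefined) {
      // Project the same columns as SELECT c.name, o.order_date.
      result.push({ name: c.name, order_date: o.order_date });
    }
  }
  return result;
}

const customers = [
  { id: 1, name: "Alice Smith" },
  { id: 2, name: "Bob Jones" },
];
const orders = [
  { id: 10, customer_id: 1, order_date: "2024-01-15" },
  { id: 11, customer_id: 1, order_date: "2024-02-03" },
  // No customer 3: a foreign key would forbid this row; here it shows that
  // an inner join simply drops unmatched rows.
  { id: 12, customer_id: 3, order_date: "2024-02-10" },
];

const joined = hashJoin(customers, orders);
console.log(joined);
```

The relationship exists only as equal values in two columns; the join reconstructs it at query time, which is why heavily connected, many-hop queries need many such joins.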
Document Model: Embedding and References
Document databases offer two relationship strategies:
Embedding: Include related data directly within the document. An Order document might contain the full Customer information and an array of LineItems. This eliminates joins but creates data duplication.
```json
{
  "order_id": 1001,
  "customer": {
    "name": "Alice Smith",
    "email": "alice@example.com"
  },
  "items": [
    { "product": "Widget", "quantity": 3 },
    { "product": "Gadget", "quantity": 1 }
  ]
}
```
Referencing: Store an identifier pointing to another document, similar to foreign keys. This avoids duplication but requires multiple queries to resolve.
The choice between embedding and referencing is a key document database design decision, trading between read efficiency (embedding) and update consistency (referencing).
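A minimal sketch of that trade-off, using plain JavaScript objects to stand in for documents and a `Map` for a customers collection (all names are hypothetical): changing Alice's email means rewriting every order in the embedded layout but only one record in the referenced layout, while reading an order together with its customer needs an extra lookup only in the referenced layout.

```javascript
// Embedded: every order carries its own copy of the customer's email.
const embeddedOrders = [
  { order_id: 1001, customer: { id: "cust_42", email: "alice@example.com" } },
  { order_id: 1002, customer: { id: "cust_42", email: "alice@example.com" } },
];

// Referenced: orders hold only the customer id; the email is stored once.
const customersById = new Map([
  ["cust_42", { id: "cust_42", email: "alice@example.com" }],
]);
const referencedOrders = [
  { order_id: 1001, customer_id: "cust_42" },
  { order_id: 1002, customer_id: "cust_42" },
];

// Embedded update: every copy must be found and rewritten (missing one leaves stale data).
function updateEmbeddedEmail(orders, customerId, email) {
  let rewritten = 0;
  for (const o of orders) {
    if (o.customer.id === customerId) {
      o.customer.email = email;
      rewritten++;
    }
  }
  return rewritten;
}

// Referenced update: one record changes, and every order sees the new value.
function updateReferencedEmail(customers, customerId, email) {
  customers.get(customerId).email = email;
  return 1;
}

// The cost moves to reads: resolving the reference is an extra lookup.
function readOrder(order, customers) {
  return { ...order, customer: customers.get(order.customer_id) };
}

const rewrittenEmbedded = updateEmbeddedEmail(embeddedOrders, "cust_42", "alice.smith@example.com");
const rewrittenReferenced = updateReferencedEmail(customersById, "cust_42", "alice.smith@example.com");
const resolved = readOrder(referencedOrders[1], customersById);
console.log(rewrittenEmbedded, rewrittenReferenced, resolved.customer.email);
// 2 1 alice.smith@example.com
```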
Graph Model: First-Class Relationships
Graph databases treat relationships as first-class citizens—as important as the entities themselves. Relationships (edges) are stored explicitly with their own identity, properties, and direction.
```cypher
// The relationship is an explicit, stored structure with its own properties
CREATE (alice:Customer)-[:PLACED {date: '2024-01-15'}]->(order:Order)
```
This explicit storage makes relationship traversal extremely efficient. Finding "all customers who bought products also bought by Alice" requires multiple joins in relational but is a simple traversal in graph databases.
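The "also bought" question can be sketched as four hops over an adjacency list: Alice to her orders, orders to products, products back to every order containing them, and those orders back to their customers. In this minimal in-memory sketch (the `Graph` class, node ids, and edges are all hypothetical), each hop reads only the edge lists stored with the current nodes, which is the intuition behind the efficiency claim; real graph engines add storage and indexing details the sketch ignores. In a relational schema, each hop would typically become a join.

```javascript
// Hypothetical in-memory graph: each node id maps to its outgoing and incoming edges.
class Graph {
  constructor() {
    this.out = new Map(); // node -> [{ type, to }]
    this.in = new Map(); // node -> [{ type, from }]
  }
  addEdge(from, type, to) {
    if (!this.out.has(from)) this.out.set(from, []);
    if (!this.in.has(to)) this.in.set(to, []);
    this.out.get(from).push({ type, to });
    this.in.get(to).push({ type, from });
  }
  // Follow outgoing edges of one type from a collection of nodes.
  forward(nodes, type) {
    const next = new Set();
    for (const n of nodes) {
      for (const e of this.out.get(n) ?? []) if (e.type === type) next.add(e.to);
    }
    return next;
  }
  // Follow incoming edges of one type (walk the edges backwards).
  backward(nodes, type) {
    const next = new Set();
    for (const n of nodes) {
      for (const e of this.in.get(n) ?? []) if (e.type === type) next.add(e.from);
    }
    return next;
  }
}

// Customers other than `customer` who bought at least one product `customer` bought.
function alsoBought(g, customer) {
  const orders = g.forward([customer], "PLACED");
  const products = g.forward(orders, "CONTAINS");
  const ordersWithThoseProducts = g.backward(products, "CONTAINS");
  const buyers = g.backward(ordersWithThoseProducts, "PLACED");
  buyers.delete(customer);
  return buyers;
}

const g = new Graph();
g.addEdge("alice", "PLACED", "order1");
g.addEdge("order1", "CONTAINS", "widget");
g.addEdge("bob", "PLACED", "order2");
g.addEdge("order2", "CONTAINS", "widget");
g.addEdge("carol", "PLACED", "order3");
g.addEdge("order3", "CONTAINS", "gadget");

console.log([...alsoBought(g, "alice")]); // [ 'bob' ]
```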
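Implications of relationship representation: the representation determines which access paths are cheap. Relational joins favor flexible, set-based queries; document embedding favors reading a whole aggregate at once; graph edges favor multi-hop traversal.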
The schema is the concrete specification of structure for a particular database—the actual definition of what tables exist, what columns they have, what types those columns are, and so forth. While the data model provides the structural vocabulary, the schema applies that vocabulary to a specific domain.
Schema in relational databases:
Relational schemas are typically defined using Data Definition Language (DDL):
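```sql
-- Minimal Departments table assumed so that the foreign key below has a target
CREATE TABLE Departments (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE Employees (
    employee_id   INTEGER PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,              -- required
    email         VARCHAR(255) UNIQUE,                -- no two employees share an email
    department_id INTEGER REFERENCES Departments(id), -- foreign key
    hire_date     DATE DEFAULT CURRENT_DATE,
    salary        DECIMAL(10,2) CHECK (salary > 0)
);
```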
This schema is precise, explicit, and enforced. Every row must conform. The database rejects any data that doesn't match.
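To make "the database rejects any data that doesn't match" concrete, here is a minimal in-memory sketch of the checks implied by the `Employees` definition. It illustrates the rules, not how a DBMS implements constraints: the `departmentIds` set and the `insertEmployee` function are hypothetical, and NULL handling follows standard SQL (a NULL passes `CHECK` and `REFERENCES`, and most databases allow several NULLs in a `UNIQUE` column).

```javascript
// Hypothetical in-memory stand-ins for the Departments and Employees tables.
const departmentIds = new Set([1, 2]);
const employees = [];

// Apply the constraints from the Employees definition before storing a row.
function insertEmployee(row) {
  // PRIMARY KEY: must be present and unique
  if (row.employee_id == null) throw new Error("employee_id is required");
  if (employees.some((e) => e.employee_id === row.employee_id)) {
    throw new Error("duplicate employee_id");
  }
  // NOT NULL
  if (row.name == null) throw new Error("name must not be null");
  // UNIQUE: NULLs are not compared, so several rows may omit an email
  if (row.email != null && employees.some((e) => e.email === row.email)) {
    throw new Error("duplicate email");
  }
  // REFERENCES Departments(id): a non-null value must match an existing department
  if (row.department_id != null && !departmentIds.has(row.department_id)) {
    throw new Error("unknown department_id");
  }
  // CHECK (salary > 0): a NULL salary passes, as in SQL
  if (row.salary != null && !(row.salary > 0)) {
    throw new Error("salary must be positive");
  }
  // DEFAULT CURRENT_DATE: fill in today's date (UTC, YYYY-MM-DD) when omitted
  const hire_date = row.hire_date ?? new Date().toISOString().slice(0, 10);
  employees.push({ ...row, hire_date });
}

insertEmployee({ employee_id: 1, name: "Ada", email: "ada@example.com", department_id: 1, salary: 5000 });

const rejections = [];
for (const bad of [
  { employee_id: 2, name: null },
  { employee_id: 3, name: "Bob", email: "ada@example.com" },
  { employee_id: 4, name: "Cy", department_id: 99 },
  { employee_id: 5, name: "Di", salary: -10 },
]) {
  try {
    insertEmployee(bad);
  } catch (err) {
    rejections.push(err.message);
  }
}
console.log(employees.length, rejections.length); // 1 4
```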
Implicit schema in document databases:
Document databases are often called "schemaless," but this is misleading. The data has structure—it's just not enforced by the database. The schema exists implicitly in application code, documentation, or conventions.
Some document databases now offer optional schema validation:
```javascript
// MongoDB schema validation (run in mongosh)
db.createCollection("employees", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "email"],
      properties: {
        name: { bsonType: "string" },
        email: { bsonType: "string" },
        // bsonType "decimal" matches Decimal128 values only
        salary: { bsonType: "decimal", minimum: 0 }
      }
    }
  }
});
```
This provides a middle ground—structure when you want it, flexibility when you need it.
Every database has a schema—even if it's implicit. When people say 'schemaless,' they mean 'schema not enforced by the database.' The schema still exists; it's just pushed to application code. This can be liberating or dangerous, depending on how it's managed.
Schema evolution:
Real-world schemas change. Business requirements evolve, new features are added, mistakes are corrected. Different structural approaches handle evolution differently:
The table below compares how the relational, document, and graph models handle common schema changes.
| Characteristic | Relational | Document | Graph |
|---|---|---|---|
| Adding new attribute | ALTER TABLE required | Just add it | Just add it |
| Removing attribute | ALTER TABLE required | Stop using it | Stop using it |
| Changing attribute type | Complex migration | Application handles | Application handles |
| Adding new entity type | CREATE TABLE | Start using it | Start using it |
| Data consistency | Enforced uniformly | May vary by document | May vary by node |
| Migration complexity | High, but controlled | Low, but risky | Low, but risky |
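The "Application handles" entries usually mean the application carries the migration logic itself. One approach, sketched below under made-up assumptions (older employee documents stored a single `phone` string, newer ones a `phones` array, and a hypothetical `schema_version` field records which is which), is to upgrade each document as it is read instead of rewriting the whole collection at once.

```javascript
// Hypothetical history: v1 employee documents stored one `phone` string;
// v2 documents store a `phones` array. Old documents are left as they are.
const CURRENT_VERSION = 2;

// One upgrade step per version: upgrades[n] turns a vN document into vN+1.
const upgrades = {
  1: (doc) => {
    const { phone, ...rest } = doc;
    return { ...rest, phones: phone ? [phone] : [], schema_version: 2 };
  },
};

// Upgrade-on-read: whatever shape was stored, callers see the current one.
function readEmployee(stored) {
  // Documents written before versioning existed have no field; treat them as v1.
  let doc = { schema_version: 1, ...stored };
  while (doc.schema_version < CURRENT_VERSION) {
    doc = upgrades[doc.schema_version](doc);
  }
  return doc;
}

const stored = [
  { name: "Ada", phone: "555-0100" }, // written by old code
  { name: "Bob", phones: ["555-0199"], schema_version: 2 }, // written by new code
];
const employees = stored.map(readEmployee);
console.log(employees.map((e) => e.phones));
```

This is the "Low, but risky" trade-off in practice: no migration step blocks deployment, but every reader must go through code like `readEmployee`, or it will see two shapes.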
Structural integrity refers to the guarantee that data conforms to the defined structure. This is where the structural and constraint components of a data model interact—structure defines what's possible, constraints define what's valid.
Levels of structural integrity:
1. Type integrity: Values match their declared types. An integer column contains only integers; a date column contains only valid dates. This is the most basic form of structural integrity.
2. Collection integrity: Data objects conform to their container's rules. In relational, every tuple in a relation has exactly the attributes defined for that relation. In strict document schemas, every document matches the validation rules.
3. Referential integrity: Relationships reference valid targets. A foreign key points to an existing row; a document reference resolves to an actual document. This prevents "orphan" references that point nowhere.
4. Structural consistency: The overall structure remains coherent. Hierarchies are properly nested; graphs don't have malformed edges; collections don't contain invalid members.
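A relational database with a declared foreign key refuses to create orphan references in the first place. Where references are not enforced by the database (typically document references), an audit like the minimal sketch below is one way to detect them. `findOrphans` and the sample data are hypothetical.

```javascript
// Return the child records whose reference points at no existing parent.
// A null/undefined reference is not checked, matching SQL foreign-key semantics.
function findOrphans(children, refField, parents, keyField) {
  const existing = new Set(parents.map((p) => p[keyField]));
  return children.filter((c) => c[refField] != null && !existing.has(c[refField]));
}

const customers = [{ id: 1 }, { id: 2 }];
const orders = [
  { id: 10, customer_id: 1 },
  { id: 11, customer_id: 3 }, // no customer 3: an orphan
  { id: 12, customer_id: null }, // no reference at all
];

console.log(findOrphans(orders, "customer_id", customers, "id").map((o) => o.id)); // [ 11 ]
```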
The best systems enforce structural integrity at multiple levels: application validation for immediate feedback, database constraints for ultimate safety. Never rely on application code alone: databases outlive applications, and direct database access can bypass application rules.
Every structural choice involves trade-offs. Understanding these trade-offs is essential for selecting the right model and designing effective schemas.
Normalization vs. Denormalization:
The relational model encourages normalization—organizing data to eliminate redundancy. Each fact is stored once, and relationships connect the pieces. This minimizes update anomalies but requires joins to reassemble related data.
Document models often favor denormalization—embedding related data together. This optimizes reads (all data in one place) but complicates updates (changes must be made in multiple places).
Flat vs. Nested Structure:
Relational tables are flat—each row is a simple tuple of values. Complex structures must be decomposed into multiple tables.
Documents can be deeply nested—arrays within objects within arrays. This matches hierarchical data naturally but can become unwieldy for deeply nested updates or queries.
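Decomposition can be sketched as a function that takes one nested order document and emits flat rows for separate tables. The sample document and the output row shapes are assumptions that loosely follow the e-commerce example later on this page; `flattenOrder` is a hypothetical helper, and it does not deduplicate products shared across orders.

```javascript
// Split one nested order document into flat rows, one array per table.
function flattenOrder(doc) {
  const customers = [{ id: doc.customer.id, name: doc.customer.name }];
  const orders = [{ id: doc._id, customer_id: doc.customer.id, order_date: doc.order_date }];
  // Each element of the embedded array becomes rows in separate tables,
  // linked back to the parent order by order_id.
  const products = doc.items.map((it) => ({ id: it.product_id, name: it.name, price: it.price }));
  const orderItems = doc.items.map((it) => ({
    order_id: doc._id,
    product_id: it.product_id,
    quantity: it.quantity,
  }));
  return { customers, orders, products, orderItems };
}

const orderDoc = {
  _id: "order_1001",
  customer: { id: "cust_42", name: "Alice Smith" },
  order_date: "2024-01-15",
  items: [
    { product_id: "prod_1", name: "Widget", price: 29.99, quantity: 3 },
    { product_id: "prod_2", name: "Gadget", price: 49.99, quantity: 1 },
  ],
};

const tables = flattenOrder(orderDoc);
console.log(tables.orderItems);
```

Going the other way, from flat rows back to the nested shape, is the work a join performs at query time.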
| Trade-off | Choice A | Choice B | When to Choose A | When to Choose B |
|---|---|---|---|---|
| Normalization | Normalized (no redundancy) | Denormalized (embedded) | Frequent updates, consistency critical | Read-heavy, performance critical |
| Structure rigidity | Fixed schema | Flexible schema | Data well-understood, consistency needed | Rapid iteration, varying data shapes |
| Nesting | Flat tables | Nested documents | Complex relationships, ad-hoc queries | Hierarchical data, contained access |
| Relationship storage | Implicit (foreign keys) | Explicit (stored edges) | Set-based operations, ACID needs | Graph traversal, relationship queries |
| Type specificity | Rich type system | Minimal types | Data quality, complex types | Flexibility, unknown data shapes |
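Practical implications: these aren't abstract concerns. They shape daily development, from how queries are written to how schema changes are rolled out and where validation logic lives.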
There is no universally correct choice. The structural decisions must match the specific requirements, access patterns, and constraints of each system.
The 'right' structural choice depends entirely on context: access patterns, update frequency, consistency requirements, development pace, and team expertise. Senior engineers choose structures based on these factors, not database trends or personal preferences.
Let's examine a concrete example to see how different data models structure the same information. Consider an e-commerce domain with customers, orders, and products.
Relational Structure:
```sql
-- Separate tables; relationships via foreign keys
CREATE TABLE Customers (
    id    INTEGER PRIMARY KEY,
    name  VARCHAR(100),
    email VARCHAR(255)
);

-- Referenced by OrderItems, so it must be defined as well
CREATE TABLE Products (
    id    INTEGER PRIMARY KEY,
    name  VARCHAR(100),
    price DECIMAL(10,2)
);

CREATE TABLE Orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES Customers(id),
    order_date  DATE
);

-- Junction table: one row per product in an order
CREATE TABLE OrderItems (
    order_id   INTEGER REFERENCES Orders(id),
    product_id INTEGER REFERENCES Products(id),
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id)
);
```
Data is normalized, relationships are implicit, structure is rigid.
Document Structure:
```json
{
  "_id": "order_1001",
  "customer": {
    "id": "cust_42",
    "name": "Alice Smith",
    "email": "alice@example.com"
  },
  "order_date": "2024-01-15",
  "items": [
    { "product_id": "prod_1", "name": "Widget", "price": 29.99, "quantity": 3 },
    { "product_id": "prod_2", "name": "Gadget", "price": 49.99, "quantity": 1 }
  ]
}
```
Data is denormalized (customer and product info embedded), self-contained, and flexible.
Graph Structure:
```cypher
// Nodes and explicit relationships
CREATE (alice:Customer {name: 'Alice Smith', email: 'alice@example.com'})
CREATE (order:Order {date: '2024-01-15'})
CREATE (widget:Product {name: 'Widget', price: 29.99})
CREATE (gadget:Product {name: 'Gadget', price: 49.99})
CREATE (alice)-[:PLACED]->(order)
CREATE (order)-[:CONTAINS {quantity: 3}]->(widget)
CREATE (order)-[:CONTAINS {quantity: 1}]->(gadget)
```
Entities are normalized, relationships are explicit and first-class citizens.
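The structural component of a data model defines the vocabulary and rules for data organization. We've explored this component in depth: data objects, attributes and types, relationships, schemas and schema evolution, structural integrity, and the trade-offs between structural choices.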
What's next:
Structure tells us what data can look like, but not what we can do with it. The next page explores the operational aspect—the operations that data models support for retrieving, creating, modifying, and deleting data. We'll see how structural choices influence what operations are natural and efficient.
You now understand the structural component of data models—the foundation upon which all data organization rests. Different structural approaches enable different capabilities and make different trade-offs. Next, we'll examine how those structures are manipulated through operations.