Data Model Concepts - Learning Module

Structural Aspect

The Architecture of Data

When an architect designs a building, they begin not by choosing paint colors or furniture, but by establishing the fundamental structures: walls, floors, beams, and how they connect. Similarly, the structural aspect of a data model defines the fundamental building blocks and organizational principles that determine what data can exist and how it can be arranged.

The structural component answers the most basic question about any database: "What can data look like?" This isn't a trivial question. The answer profoundly shapes everything else—what queries are natural, what updates are efficient, what constraints are expressible, and ultimately, what problems the database is good at solving.

Different data models answer this question differently, which is precisely why we have relational databases, document stores, graph databases, and more. Each structural approach enables certain capabilities while making others difficult or impossible.

What You Will Learn

By the end of this page, you will understand how the structural component of data models works, including the concepts of data objects, attributes, types, and relationships. You'll be able to analyze the structural choices of any data model and understand how those choices affect system capabilities.

What is Structure in a Data Model?

The structural component of a data model formally specifies the types of data objects that can exist, the properties or attributes these objects can have, and the relationships that can connect different objects. This specification acts as a grammar for data—a set of rules that determine what constitutes valid, well-formed data.

Core structural concepts:

Every data model's structural component defines these fundamental elements:

1. Data Objects/Containers: The primary organizational units for data. Different models call these different things:

Relational: Tables (Relations)
Document: Collections and Documents
Graph: Nodes and Edges
Key-Value: Keyspaces and Entries

2. Attributes/Properties: The characteristics or values associated with data objects. An employee might have attributes like name, salary, and hire_date. The structural component specifies what attributes are allowed and, often, what types of values they can hold.

3. Relationships: How different data objects connect or reference each other. An order relates to a customer; an employee relates to a department. Different models express relationships in fundamentally different ways.

Structure as Grammar

Think of structure like the grammar of a language. English grammar says a sentence needs a subject and verb; data model structure says what kinds of objects exist and how they can be composed. Just as grammar violations produce nonsense sentences, structural violations produce invalid data.

The structural specification includes:

Names and identifiers: How data objects and attributes are named and referenced
Data types: What kinds of values are permitted (integers, strings, dates, etc.)
Composition rules: How simple elements combine into complex structures
Collection semantics: Whether duplicates are allowed, ordering is preserved, etc.
NULL handling: How missing or unknown values are represented
Relationship cardinalities: One-to-one, one-to-many, many-to-many connections
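The "collection semantics" item is easy to overlook. The formal relational model treats a relation as a set, so duplicate tuples cannot exist. SQL tables are bags (multisets) unless a key forbids duplicates, and they promise no row order without ORDER BY. The following is a minimal sketch using Python's built-in sqlite3 module; the `Staff` table and its rows are hypothetical.

```python
import sqlite3

rows = [("Alice", "Sales"), ("Alice", "Sales"), ("Bob", "IT")]

# Formal relational model: a relation is a set, so the duplicate tuple collapses.
relation = set(rows)

# SQL: a table without a key behaves as a bag and keeps both copies.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)", rows)
bag_count = conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]

# DISTINCT recovers set semantics at query time.
distinct_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT name, dept FROM Staff)"
).fetchone()[0]

print(len(relation), bag_count, distinct_count)  # 2 3 2
```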

In systems that enforce a schema, these structural rules are not merely documentation: the database itself checks them, and attempts to create data that violates the structure result in errors. This enforcement is a large part of what makes such databases reliable.

Structural Elements in Detail

Let's examine each structural element more rigorously, understanding the design space and tradeoffs involved.

Data Objects:

The primary structural question is: what is the atomic unit of data? What is the smallest piece that has independent identity and can be operated on individually?

In the relational model, the atomic unit is the tuple (row): a collection of attribute values, one for each attribute of the relation. Tuples exist within relations (tables), which are sets of tuples all having the same structure (same attributes).

In the document model, the atomic unit is the document—a self-contained, possibly nested structure (typically JSON or BSON). Documents exist within collections, but unlike relational tables, documents in the same collection may have different structures.

In the graph model, there are two atomic units: nodes (representing entities) and edges (representing relationships). Both can have properties. This dual structure is what makes graph databases well suited for relationship-heavy data.

Primary Data Objects Across Models
Data Model	Primary Object	Grouping Mechanism	Structure Flexibility
Relational	Tuple (Row)	Relation (Table)	Rigid—all tuples share schema
Document	Document	Collection	Flexible—documents can differ
Graph	Node + Edge	Graph / Labels	Flexible—properties can vary
Key-Value	Value (blob)	Keyspace / Bucket	Maximum—value is opaque
Column-Family	Row with columns	Column Family	Semi-flexible—sparse columns
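To make the table concrete, here is a minimal Python sketch, using only built-in types, of how the same customer data might be held as each model's primary object. All variable names are illustrative assumptions, not any database's API.

```python
# Relational: a relation is a set of tuples that all share one heading.
customer_heading = ("id", "name", "email")
customers = {
    (42, "Alice Smith", "alice@example.com"),
    (43, "Bob Jones", "bob@example.com"),
}

# Document: documents in one collection may have different shapes.
customer_docs = [
    {"_id": "cust_42", "name": "Alice Smith", "email": "alice@example.com"},
    {"_id": "cust_43", "name": "Bob Jones", "phones": ["555-0100"]},  # no email field
]

# Graph: nodes and edges are separate objects, and both carry properties.
nodes = {
    "alice": {"label": "Customer", "name": "Alice Smith"},
    "o1001": {"label": "Order", "date": "2024-01-15"},
}
edges = [("alice", "PLACED", "o1001", {"date": "2024-01-15"})]

# Key-value: the store sees only a key and an opaque value.
kv_store = {"customer:42": b'{"name": "Alice Smith"}'}

# Every tuple matches the heading, while the documents have two distinct key sets.
shapes = {frozenset(doc) for doc in customer_docs}
print(all(len(row) == len(customer_heading) for row in customers), len(shapes))  # True 2
```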

Attributes and Types:

Attributes describe the properties of data objects. The structural component specifies:

Attribute names: What attributes exist and how they're referenced
Attribute domains/types: What values are legal for each attribute
Attribute constraints: Required vs. optional, unique vs. duplicate-allowed
Attribute composition: Simple (single value) vs. composite (structured) vs. multivalued (lists)

Type systems vary significantly across models:

Relational databases traditionally use a fixed set of primitive types: INTEGER, VARCHAR, DATE, BOOLEAN, etc. Each attribute is declared with exactly one type, which changes only through an explicit schema change such as ALTER TABLE.

Document databases support nested types: an address attribute might contain an object with street, city, and zip sub-attributes. This enables data locality: related information stays together.

Graph databases typically support simple types on nodes and edges, though some modern graph databases allow complex property values.

Schema Rigidity vs. Flexibility

The fundamental distinction is between schema-on-write (relational) where structure is enforced when data is stored, and schema-on-read (document, key-value) where structure is interpreted when data is accessed. This isn't about which is 'better'—it's about different tradeoffs between development agility and data consistency.

Representing Relationships

Real-world data is inherently interconnected. Customers place orders; orders contain products; products belong to categories; employees work in departments. How relationships are represented is perhaps the most consequential structural decision a data model makes.

Relational Model: Foreign Keys and Joins

The relational model represents relationships implicitly through shared attribute values. An Orders table might have a customer_id attribute whose values correspond to id values in a Customers table. This "foreign key" creates a link without physically connecting the data.

To traverse the relationship, queries perform joins—operations that match tuples based on attribute values. This approach is extremely flexible but can be expensive for heavily connected data with many-hop traversals.

-- Implicit relationship via matching attribute values
SELECT c.name, o.order_date
FROM Customers c
JOIN Orders o ON c.id = o.customer_id;
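To make "match tuples based on attribute values" concrete, here is a minimal in-memory hash join in Python that produces the same pairs as the SQL query above. Real query optimizers choose among nested-loop, hash, and merge joins; this sketch shows only the hash variant, and the sample rows are hypothetical.

```python
from collections import defaultdict

customers = [
    {"id": 1, "name": "Alice Smith"},
    {"id": 2, "name": "Bob Jones"},
]
orders = [
    {"id": 1001, "customer_id": 1, "order_date": "2024-01-15"},
    {"id": 1002, "customer_id": 1, "order_date": "2024-02-03"},
    {"id": 1003, "customer_id": 2, "order_date": "2024-02-10"},
]

def hash_join(left, right, left_key, right_key):
    """Inner equi-join: pair rows whose left_key value equals right_key value."""
    # Build phase: index the left input by its join attribute.
    index = defaultdict(list)
    for left_row in left:
        index[left_row[left_key]].append(left_row)
    # Probe phase: look up each right row's value in the index.
    for right_row in right:
        for left_row in index.get(right_row[right_key], []):
            yield left_row, right_row

result = [(c["name"], o["order_date"])
          for c, o in hash_join(customers, orders, "id", "customer_id")]
print(result)
# [('Alice Smith', '2024-01-15'), ('Alice Smith', '2024-02-03'), ('Bob Jones', '2024-02-10')]
```

Nothing links the two lists except equal values in `id` and `customer_id`; the relationship exists only when the join is computed.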

Document Model: Embedding and References

Document databases offer two relationship strategies:

Embedding: Include related data directly within the document. An Order document might contain the full Customer information and an array of LineItems. This eliminates joins but creates data duplication.

{
  "order_id": 1001,
  "customer": {
    "name": "Alice Smith",
    "email": "alice@example.com"
  },
  "items": [
    { "product": "Widget", "quantity": 3 },
    { "product": "Gadget", "quantity": 1 }
  ]
}

Referencing: Store an identifier pointing to another document, similar to foreign keys. This avoids duplication but requires multiple queries to resolve.

The choice between embedding and referencing is a key document database design decision, trading between read efficiency (embedding) and update consistency (referencing).
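The read-side difference can be sketched with plain Python dictionaries standing in for two collections. The collections, IDs, and the `customer_email` helper are hypothetical. Some databases can resolve references server-side (MongoDB's `$lookup` aggregation stage, for example), but the extra lookup still happens.

```python
# Hypothetical in-memory "collections"; a real store would add I/O per read.
customers = {"cust_42": {"name": "Alice Smith", "email": "alice@example.com"}}

orders = {
    # Embedded: the customer data is copied into the order.
    "order_1001": {"customer": {"name": "Alice Smith", "email": "alice@example.com"}},
    # Referenced: the order stores only the customer's identifier.
    "order_1002": {"customer_id": "cust_42"},
}

def customer_email(order_id):
    """Return (email, number_of_reads) for an order."""
    order = orders[order_id]
    reads = 1
    if "customer" in order:
        # Embedded: the customer data travels with the order.
        return order["customer"]["email"], reads
    # Referenced: resolve the identifier with a second read.
    customer = customers[order["customer_id"]]
    reads += 1
    return customer["email"], reads

print(customer_email("order_1001"))  # ('alice@example.com', 1)
print(customer_email("order_1002"))  # ('alice@example.com', 2)
```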

Graph Model: First-Class Relationships

Graph databases treat relationships as first-class citizens—as important as the entities themselves. Relationships (edges) are stored explicitly with their own identity, properties, and direction.

// Relationship is an explicit, stored structure
(alice:Customer)-[:PLACED {date: '2024-01-15'}]->(order:Order)

This explicit storage makes relationship traversal very efficient. Finding "all customers who bought any product that Alice also bought" requires a chain of joins in a relational database but is a short traversal in a graph database.
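That traversal can be sketched in Python over a small hypothetical purchase graph held as adjacency lists. `out_edges`, `in_edges`, and `co_buyers` are illustrative names, not a graph database API. Each hop looks up a node's stored neighbors instead of matching values across whole tables.

```python
from collections import defaultdict

# Hypothetical graph: customer -PLACED-> order -CONTAINS-> product.
edges = [
    ("alice", "PLACED", "o1"), ("o1", "CONTAINS", "widget"), ("o1", "CONTAINS", "gadget"),
    ("bob", "PLACED", "o2"), ("o2", "CONTAINS", "widget"),
    ("carol", "PLACED", "o3"), ("o3", "CONTAINS", "gadget"), ("o3", "CONTAINS", "gizmo"),
    ("dave", "PLACED", "o4"), ("o4", "CONTAINS", "gizmo"),
]

# Adjacency lists in both directions, so each hop is a direct lookup.
out_edges = defaultdict(list)
in_edges = defaultdict(list)
for src, rel, dst in edges:
    out_edges[(src, rel)].append(dst)
    in_edges[(dst, rel)].append(src)

def co_buyers(customer):
    """Customers (other than `customer`) who bought any product `customer` bought."""
    # Hop 1 and 2: customer -> orders -> products.
    products = {p for o in out_edges[(customer, "PLACED")] for p in out_edges[(o, "CONTAINS")]}
    # Hop 3 and 4: products -> orders -> customers, walking edges backwards.
    orders = {o for p in products for o in in_edges[(p, "CONTAINS")]}
    buyers = {c for o in orders for c in in_edges[(o, "PLACED")]}
    return buyers - {customer}

print(sorted(co_buyers("alice")))  # ['bob', 'carol']
```

The relational equivalent chains Orders and OrderItems through repeated joins; the graph form expresses the same path directly as a sequence of hops.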

Implications of relationship representation:

Implicit Relationships (Relational)

•Relationships discovered at query time
•Extremely flexible—any attribute can join to any other
•Can be expensive for multi-hop traversals
•Easy to analyze with set-based operations
•Natural for ad-hoc queries

Explicit Relationships (Graph)

•Relationships pre-computed and stored
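•Efficient traversal: each hop follows stored edges instead of searching for matches, so its cost depends on the node's number of connections rather than on table size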
•Less flexible—relationships must be defined upfront
•Complex for set-based aggregate operations
•Natural for path-finding and network analysis

Schema and Schema Evolution

The schema is the concrete specification of structure for a particular database—the actual definition of what tables exist, what columns they have, what types those columns are, and so forth. While the data model provides the structural vocabulary, the schema applies that vocabulary to a specific domain.

Schema in relational databases:

Relational schemas are typically defined using Data Definition Language (DDL):

-- Assumes a Departments table with an id key already exists
CREATE TABLE Employees (
    employee_id   INTEGER PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,
    email         VARCHAR(255) UNIQUE,
    department_id INTEGER REFERENCES Departments(id),
    hire_date     DATE DEFAULT CURRENT_DATE,
    salary        DECIMAL(10,2) CHECK (salary > 0)
);

This schema is precise, explicit, and enforced. Every row must conform. The database rejects any data that doesn't match.
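The enforcement can be observed by running this DDL in SQLite through Python's sqlite3 module. This is a sketch under stated assumptions: the `Departments` table is a hypothetical addition so the foreign key has a target, and `try_insert` is an invented helper. SQLite accepts the DDL as written but does not enforce VARCHAR lengths, and it checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Employees (
    employee_id   INTEGER PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,
    email         VARCHAR(255) UNIQUE,
    department_id INTEGER REFERENCES Departments(id),
    hire_date     DATE DEFAULT CURRENT_DATE,
    salary        DECIMAL(10,2) CHECK (salary > 0)
);
INSERT INTO Departments VALUES (1, 'Engineering');
""")

def try_insert(sql, params):
    """Return 'ok', or the integrity error message SQLite raises."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

insert = "INSERT INTO Employees (name, email, department_id, salary) VALUES (?, ?, ?, ?)"
print(try_insert(insert, ("Alice", "alice@example.com", 1, 95000)))  # ok
print(try_insert(insert, (None, "bob@example.com", 1, 80000)))       # NOT NULL violation
print(try_insert(insert, ("Carol", "alice@example.com", 1, 70000)))  # UNIQUE violation
print(try_insert(insert, ("Dave", "dave@example.com", 1, -5)))       # CHECK violation
print(try_insert(insert, ("Eve", "eve@example.com", 99, 60000)))     # FOREIGN KEY violation
```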

Implicit schema in document databases:

Document databases are often called "schemaless," but this is misleading. The data has structure—it's just not enforced by the database. The schema exists implicitly in application code, documentation, or conventions.

Some document databases now offer optional schema validation:

// MongoDB schema validation
db.createCollection("employees", {
  validator: {
    $jsonSchema: {
      required: ["name", "email"],
      properties: {
        name: { bsonType: "string" },
        email: { bsonType: "string" },
        salary: { bsonType: "decimal", minimum: 0 }
      }
    }
  }
});

This provides a middle ground—structure when you want it, flexibility when you need it.
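To make the rules concrete, here is a plain-Python sketch of the checks that validator expresses: required fields, one type per field, and a minimum. It is a hypothetical illustration, not how MongoDB evaluates `$jsonSchema`; `validate`, `REQUIRED`, and `PROPERTIES` are invented names, and Python's `Decimal` stands in for BSON decimal.

```python
from decimal import Decimal

# Hypothetical rule set mirroring the validator above: field -> (type, minimum).
REQUIRED = ("name", "email")
PROPERTIES = {
    "name": (str, None),
    "email": (str, None),
    "salary": (Decimal, Decimal(0)),
}

def validate(doc):
    """Return a list of problems; an empty list means the document passes."""
    problems = [f"missing required field {f!r}" for f in REQUIRED if f not in doc]
    for field, (expected, minimum) in PROPERTIES.items():
        if field not in doc:
            continue  # optional unless listed in REQUIRED
        value = doc[field]
        if not isinstance(value, expected):
            problems.append(f"{field!r} should be {expected.__name__}")
        elif minimum is not None and value < minimum:
            problems.append(f"{field!r} is below {minimum}")
    return problems

print(validate({"name": "Alice", "email": "a@example.com", "salary": Decimal("95000")}))  # []
print(validate({"name": "Bob", "salary": 80000}))
# ["missing required field 'email'", "'salary' should be Decimal"]
```

The second document fails partly because 80000 is an `int`: as with `bsonType: "decimal"`, the check is on the stored type, not the numeric value. Fields not listed in `PROPERTIES` are still accepted, which matches `$jsonSchema`'s default unless `additionalProperties` is set to false.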

There's No Such Thing as Schemaless

Every database has a schema—even if it's implicit. When people say 'schemaless,' they mean 'schema not enforced by the database.' The schema still exists; it's just pushed to application code. This can be liberating or dangerous, depending on how it's managed.

Schema evolution:

Real-world schemas change. Business requirements evolve, new features are added, mistakes are corrected. Different structural approaches handle evolution differently:

Relational schema migration:

Explicit ALTER TABLE operations
Migrations are usually versioned (and ideally reversible) through migration tooling
Changing column types or constraints can be complex
May require downtime for large tables
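A versioned migration runner can be sketched in a few lines with SQLite, which stores a schema version number in `PRAGMA user_version`. The `MIGRATIONS` list and `migrate` function are hypothetical. This sketch only moves forward; a reversible setup would pair each step with a matching "down" migration.

```python
import sqlite3

# Hypothetical ordered migrations; list position + 1 is the version each produces.
MIGRATIONS = [
    "CREATE TABLE Employees (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE Employees ADD COLUMN email TEXT",
    "ALTER TABLE Employees ADD COLUMN active INTEGER NOT NULL DEFAULT 1",
]

def migrate(conn):
    """Apply every migration newer than the version recorded in the database."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current + 1, len(MIGRATIONS) + 1):
        conn.execute(MIGRATIONS[version - 1])
        # Record progress so a rerun skips steps that already succeeded.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]

conn = sqlite3.connect(":memory:")
print(migrate(conn))  # 3
conn.execute("INSERT INTO Employees (name) VALUES ('Alice')")
print(conn.execute("SELECT name, email, active FROM Employees").fetchall())  # [('Alice', None, 1)]
print(migrate(conn))  # 3 (nothing left to apply)
```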

Document schema evolution:

New fields can be added without migration
Old documents retain old structure (schema versions coexist)
Application code handles multiple schema versions
Can lead to technical debt if not managed carefully

Graph schema evolution:

New node/edge labels and properties can be added freely
Existing data remains unchanged
Similar flexibility to documents, with similar risks

Schema Evolution Characteristics
Characteristic	Relational	Document	Graph
Adding new attribute	ALTER TABLE required	Just add it	Just add it
Removing attribute	ALTER TABLE required	Stop using it	Stop using it
Changing attribute type	Complex migration	Application handles	Application handles
Adding new entity type	CREATE TABLE	Start using it	Start using it
Data consistency	Enforced uniformly	May vary by document	May vary by node
Migration complexity	High, but controlled	Low, but risky	Low, but risky

Structural Integrity

Structural integrity refers to the guarantee that data conforms to the defined structure. This is where the structural and constraint components of a data model interact—structure defines what's possible, constraints define what's valid.

Levels of structural integrity:

1. Type integrity: Values match their declared types. An integer column contains only integers; a date column contains only valid dates. This is the most basic form of structural integrity.

2. Collection integrity: Data objects conform to their container's rules. In relational, every tuple in a relation has exactly the attributes defined for that relation. In strict document schemas, every document matches the validation rules.

3. Referential integrity: Relationships reference valid targets. A foreign key points to an existing row; a document reference resolves to an actual document. This prevents "orphan" references that point nowhere.

4. Structural consistency: The overall structure remains coherent. Hierarchies are properly nested; graphs don't have malformed edges; collections don't contain invalid members.

Enforcement Mechanisms

•Type checking: Database validates types on insert/update, rejecting invalid values
•NOT NULL constraints: Prevents missing values where required
•UNIQUE constraints: Prevents duplicate values where needed
•CHECK constraints: Custom validation rules (e.g., salary > 0)
•Foreign key constraints: Enforces referential integrity between tables
•Schema validation: Document databases can validate against JSON Schema
•Application-level validation: When database doesn't enforce, code must

Defense in Depth

The best systems enforce structural integrity at multiple levels: application validation for immediate feedback, database constraints for ultimate safety. Never rely on application code alone—databases outlive applications, and direct database access can bypass application rules.
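The two layers can be sketched with SQLite: a hypothetical `add_employee` function validates input for friendly errors, while a CHECK constraint in the table still stops a client that skips the application code entirely. Table and function names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "salary NUMERIC CHECK (salary > 0))"
)

def add_employee(name, salary):
    """Application layer: validate first so the caller gets a clear message."""
    if not name:
        raise ValueError("name is required")
    if salary <= 0:
        raise ValueError("salary must be positive")
    with conn:
        conn.execute("INSERT INTO Employees (name, salary) VALUES (?, ?)", (name, salary))

add_employee("Alice", 95000)

# A script that bypasses add_employee is still stopped by the database constraint.
try:
    with conn:
        conn.execute("INSERT INTO Employees (name, salary) VALUES ('Mallory', -1)")
except sqlite3.IntegrityError as exc:
    print("rejected by database:", exc)
```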

Structural Trade-offs

Every structural choice involves trade-offs. Understanding these trade-offs is essential for selecting the right model and designing effective schemas.

Normalization vs. Denormalization:

The relational model encourages normalization—organizing data to eliminate redundancy. Each fact is stored once, and relationships connect the pieces. This minimizes update anomalies but requires joins to reassemble related data.

Document models often favor denormalization—embedding related data together. This optimizes reads (all data in one place) but complicates updates (changes must be made in multiple places).
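The update cost of denormalization shows up even in a toy example. In this Python sketch, which uses hypothetical data, renaming a customer touches one record in the normalized layout but every embedded copy in the denormalized one. Skipping any copy would leave the data inconsistent.

```python
# Normalized: the customer's name is stored once and referenced by id.
customers = {42: {"name": "Alice Smith"}}
orders_normalized = [{"order_id": n, "customer_id": 42} for n in range(1001, 1004)]

# Denormalized: each order embeds its own copy of the customer's name.
orders_embedded = [{"order_id": n, "customer": {"id": 42, "name": "Alice Smith"}}
                   for n in range(1001, 1004)]

def rename_normalized(customer_id, new_name):
    customers[customer_id]["name"] = new_name
    return 1  # one record written

def rename_embedded(customer_id, new_name):
    written = 0
    for order in orders_embedded:  # every copy must be found and changed
        if order["customer"]["id"] == customer_id:
            order["customer"]["name"] = new_name
            written += 1
    return written

print(rename_normalized(42, "Alice Jones"))  # 1
print(rename_embedded(42, "Alice Jones"))    # 3
# Normalized orders see the new name through the reference (a join, in SQL terms).
print(customers[orders_normalized[0]["customer_id"]]["name"])  # Alice Jones
```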

Flat vs. Nested Structure:

Relational tables are flat—each row is a simple tuple of values. Complex structures must be decomposed into multiple tables.

Documents can be deeply nested—arrays within objects within arrays. This matches hierarchical data naturally but can become unwieldy for deeply nested updates or queries.
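Decomposing a nested structure into flat tables can be sketched as a small Python function. The document and the `flatten` helper are hypothetical. The `items` array becomes separate rows that carry the parent key, and the customer's name is dropped because, in a normalized design, it lives only in the Customers table.

```python
order_doc = {
    "_id": "order_1001",
    "customer": {"id": "cust_42", "name": "Alice Smith"},
    "order_date": "2024-01-15",
    "items": [
        {"product_id": "prod_1", "quantity": 3},
        {"product_id": "prod_2", "quantity": 1},
    ],
}

def flatten(doc):
    """Split one nested order into flat rows for Orders and OrderItems tables."""
    order_row = (doc["_id"], doc["customer"]["id"], doc["order_date"])
    # Each array element becomes its own row, linked back by the order's key.
    item_rows = [(doc["_id"], item["product_id"], item["quantity"]) for item in doc["items"]]
    return order_row, item_rows

order_row, item_rows = flatten(order_doc)
print(order_row)   # ('order_1001', 'cust_42', '2024-01-15')
print(item_rows)   # [('order_1001', 'prod_1', 3), ('order_1001', 'prod_2', 1)]
```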

Structural Trade-off Matrix
Trade-off	Choice A	Choice B	When to Choose A	When to Choose B
Normalization	Normalized (no redundancy)	Denormalized (embedded)	Frequent updates, consistency critical	Read-heavy, performance critical
Structure rigidity	Fixed schema	Flexible schema	Data well-understood, consistency needed	Rapid iteration, varying data shapes
Nesting	Flat tables	Nested documents	Complex relationships, ad-hoc queries	Hierarchical data, contained access
Relationship storage	Implicit (foreign keys)	Explicit (stored edges)	Set-based operations, ad-hoc joins	Graph traversal, relationship queries
Type specificity	Rich type system	Minimal types	Data quality, complex types	Flexibility, unknown data shapes

Practical implications:

These aren't abstract concerns—they affect daily development:

Query complexity: Normalized structures require joins; developers must write more complex queries but get more flexibility.
Update complexity: Denormalized structures require coordinated updates; a customer name change might need to update thousands of embedded copies.
Schema changes: Rigid schemas require migrations; flexible schemas accumulate structural debt.
Data quality: Enforced structure catches errors early; flexible structure allows bad data to accumulate.
Development speed: Flexible structures accelerate early development; rigid structures pay off as systems mature.

There is no universally correct choice. The structural decisions must match the specific requirements, access patterns, and constraints of each system.

Context Determines Structure

The 'right' structural choice depends entirely on context: access patterns, update frequency, consistency requirements, development pace, and team expertise. Senior engineers choose structures based on these factors, not database trends or personal preferences.

Comparing Structural Approaches

Let's examine a concrete example to see how different data models structure the same information. Consider an e-commerce domain with customers, orders, and products.

Relational Structure:

-- Four tables; relationships via foreign keys
CREATE TABLE Customers (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255)
);

CREATE TABLE Products (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2)
);

CREATE TABLE Orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES Customers(id),
    order_date DATE
);

-- Resolves the many-to-many link between Orders and Products
CREATE TABLE OrderItems (
    order_id INTEGER REFERENCES Orders(id),
    product_id INTEGER REFERENCES Products(id),
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

Data is normalized, relationships are implicit, structure is rigid.

Document Structure:

// Single document with embedded data
{
  "_id": "order_1001",
  "customer": {
    "id": "cust_42",
    "name": "Alice Smith",
    "email": "alice@example.com"
  },
  "order_date": "2024-01-15",
  "items": [
    { "product_id": "prod_1", "name": "Widget", "price": 29.99, "quantity": 3 },
    { "product_id": "prod_2", "name": "Gadget", "price": 49.99, "quantity": 1 }
  ]
}

Data is denormalized (customer and product info embedded), self-contained, and flexible.

Graph Structure:

// Nodes and explicit relationships
CREATE (alice:Customer {name: 'Alice Smith', email: 'alice@example.com'})
CREATE (order:Order {date: '2024-01-15'})
CREATE (widget:Product {name: 'Widget', price: 29.99})
CREATE (gadget:Product {name: 'Gadget', price: 49.99})

CREATE (alice)-[:PLACED]->(order)
CREATE (order)-[:CONTAINS {quantity: 3}]->(widget)
CREATE (order)-[:CONTAINS {quantity: 1}]->(gadget)

Entities are normalized, relationships are explicit and first-class citizens.

Query Implications

•Relational: Finding all orders with total > $100 requires joining three tables, but is a single declarative query that the optimizer can execute efficiently.
•Document: Retrieving an order together with its customer and items is fast (a single document read), but finding all customers who ordered a specific product requires scanning every order document unless an index on items.product_id exists.
•Graph: Finding customers who bought products similar to Alice's purchases is efficient (traversal), but calculating aggregate statistics across all orders is complex.
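The relational query from the first bullet can be run end to end with SQLite through Python's sqlite3 module. This is a sketch with hypothetical sample rows; it also joins Customers to report each name. SQLite stores DECIMAL values with numeric affinity (floating point here), so a production system needing exact money arithmetic would use a database with a true decimal type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (id INTEGER PRIMARY KEY, name VARCHAR(100), email VARCHAR(255));
CREATE TABLE Products  (id INTEGER PRIMARY KEY, name VARCHAR(100), price DECIMAL(10,2));
CREATE TABLE Orders    (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES Customers(id), order_date DATE);
CREATE TABLE OrderItems (order_id INTEGER REFERENCES Orders(id),
                         product_id INTEGER REFERENCES Products(id),
                         quantity INTEGER, PRIMARY KEY (order_id, product_id));

INSERT INTO Customers  VALUES (42, 'Alice Smith', 'alice@example.com');
INSERT INTO Products   VALUES (1, 'Widget', 29.99), (2, 'Gadget', 49.99);
INSERT INTO Orders     VALUES (1001, 42, '2024-01-15'), (1002, 42, '2024-02-01');
INSERT INTO OrderItems VALUES (1001, 1, 3), (1001, 2, 1), (1002, 1, 1);
""")

# One declarative statement; the optimizer decides join order and method.
big_orders = conn.execute("""
    SELECT o.id, c.name, SUM(p.price * oi.quantity) AS total
    FROM Orders o
    JOIN Customers  c  ON c.id = o.customer_id
    JOIN OrderItems oi ON oi.order_id = o.id
    JOIN Products   p  ON p.id = oi.product_id
    GROUP BY o.id, c.name
    HAVING SUM(p.price * oi.quantity) > 100
""").fetchall()

for order_id, name, total in big_orders:
    print(order_id, name, round(total, 2))  # 1001 Alice Smith 139.96
```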

Summary: Structural Aspect

The structural component of a data model defines the vocabulary and rules for data organization. We've explored this component in depth:

Key Takeaways

•Structure defines the data vocabulary — What objects exist (tables, documents, nodes), what attributes they have, and what types those attributes can be.
•Relationships are structurally fundamental — How relationships are represented (implicit via keys, embedded, or explicit edges) shapes the entire system's capability.
•Schemas apply structure to domains — The model provides the vocabulary; the schema uses that vocabulary for a specific application.
•Schema evolution is inevitable — Structural choices determine how easy or hard it is to change the database as requirements evolve.
•Structural integrity ensures correctness — Type checking, constraints, and validation maintain data quality.
•Every structural choice is a trade-off — Normalization vs. denormalization, rigidity vs. flexibility, flatness vs. nesting—context determines the right balance.

What's next:

Structure tells us what data can look like, but not what we can do with it. The next page explores the operational aspect—the operations that data models support for retrieving, creating, modifying, and deleting data. We'll see how structural choices influence what operations are natural and efficient.

You now understand the structural component of data models—the foundation upon which all data organization rests. Different structural approaches enable different capabilities and make different trade-offs. Next, we'll examine how those structures are manipulated through operations.