Data Model Concepts - Learning Module

Loading content...

0/252

Operational Aspect

The Verbs of Data

If the structural component of a data model provides the nouns—tables, documents, nodes, attributes—then the operational component provides the verbs: the actions we can perform on data. What can we retrieve? How can we modify? What transformations are possible? These questions define the operational aspect of every data model.

The operational component is where the database becomes useful. Structure alone is just organization; operations are what enable databases to answer questions, support applications, and power business processes. A beautifully structured database that cannot be efficiently queried is useless.

Moreover, the operational capabilities of a data model profoundly influence how we think about problems. The operations available shape the solutions we can imagine. An engineer fluent in SQL thinks differently about data problems than one fluent in graph traversals or MapReduce. Understanding operations across multiple models expands your problem-solving repertoire.

What You Will Learn

By the end of this page, you will understand the operational component of data models, including CRUD operations, query languages, the distinction between procedural and declarative approaches, and how different models enable different operational patterns. You'll appreciate why query language design is a critical aspect of any data model.

Core Operation Categories

Every data model must define operations for fundamental data manipulation tasks. While the specific syntax and semantics vary, all models address the same core operation categories—often summarized by the acronym CRUD:

Create (Insert): Adding new data to the database. This includes specifying what data to add and where it belongs in the structure.

Read (Query/Select): Retrieving data based on specified criteria. This is typically the most complex operation category, encompassing simple lookups, complex searches, aggregations, and multi-object correlations.

Update (Modify): Changing existing data. Updates must identify what to change, specify the new values, and maintain structural integrity.

Delete (Remove): Eliminating data from the database. Deletion must handle dependent data and maintain referential integrity.

CRUD Operations Across Data Models
Operation	Relational (SQL)	Document (MongoDB)	Graph (Cypher)
Create	`INSERT INTO table VALUES (...)`	`db.collection.insertOne({...})`	`CREATE (n:Label {props})`
Read	`SELECT ... FROM ... WHERE ...`	`db.collection.find({query})`	`MATCH (n:Label) RETURN n`
Update	`UPDATE table SET ... WHERE ...`	`db.collection.updateOne({filter}, {$set: {...}})`	`MATCH (n) SET n.prop = value`
Delete	`DELETE FROM table WHERE ...`	`db.collection.deleteOne({filter})`	`MATCH (n) DELETE n`

Beyond basic CRUD:

While CRUD forms the foundation, sophisticated data models provide operations far beyond simple create/read/update/delete:

Aggregation: Computing summaries (COUNT, SUM, AVG) over groups of data
Joining: Combining data from multiple sources based on relationships
Traversal: Following paths through connected data (especially in graphs)
Projection: Selecting specific attributes from data objects
Filtering: Applying predicates to select subsets of data
Ordering: Sorting results by specified criteria
Grouping: Organizing data into categories for aggregation
Pagination: Retrieving results in manageable chunks
Transformation: Computing derived values from stored data

The richness of these operations determines how much computation can be pushed to the database versus handled in application code.

Procedural vs. Declarative Operations

A fundamental distinction in data model operations is between procedural and declarative approaches. This distinction has profound implications for usability, optimization, and system design.

Procedural (Navigational) Operations:

Procedural operations specify how to retrieve data—the exact sequence of steps the database should follow. Early database systems (hierarchical and network models) used navigational interfaces where programs:

Start at a known record
Follow a link to the next record
Evaluate conditions
Continue or branch based on results
Repeat until done

// Pseudo-code for navigational access
GET FIRST Customer WHERE name = 'Alice'
WHILE more_orders
    GET NEXT Order FOR Customer
    PROCESS order
END

This approach gives programmers fine-grained control but requires intimate knowledge of the physical data organization. Changes to data layout require rewriting programs.

Declarative Operations:

Declarative operations specify what data is desired, leaving the how to the database system. This is the defining characteristic of the relational model and SQL:

-- Declare WHAT we want, not HOW to get it
SELECT o.order_id, o.total
FROM Customers c
JOIN Orders o ON c.id = o.customer_id
WHERE c.name = 'Alice'
ORDER BY o.order_date DESC;

The same query works regardless of whether data is stored in a heap, indexed by B-tree, or distributed across servers. The database's query optimizer determines the execution strategy.

Procedural Limitations

•Tightly coupled to physical structure
•Changes in storage require code changes
•Optimization is programmer's responsibility
•Complex logic for simple queries
•Difficult to parallelize automatically
•Error-prone low-level navigation

Declarative Advantages

•Independent of physical implementation
•Same query works across storage changes
•Database optimizes automatically
•Concise expression of intent
•Enables automatic parallelization
•Higher-level abstraction reduces errors

The Power of Declarative

The declarative approach was Codd's key insight. By separating 'what' from 'how', query optimization becomes the database's problem, not the programmer's. As databases improve, existing queries automatically get faster—without changing a single line of application code.

Query Languages

A query language is the formal notation through which users express operations. Query languages are the user-facing manifestation of a data model's operational component. Different data models have different query languages, each designed to naturally express the model's structure and relationships.

SQL (Structured Query Language):

SQL is the standard query language for relational databases, designed around the concepts of tables, rows, and set operations:

-- SQL expresses operations on tables as sets
SELECT c.name, COUNT(o.id) as order_count, SUM(o.total) as lifetime_value
FROM Customers c
LEFT JOIN Orders o ON c.id = o.customer_id
WHERE c.signup_date > '2023-01-01'
GROUP BY c.id, c.name
HAVING COUNT(o.id) >= 5
ORDER BY lifetime_value DESC
LIMIT 100;

SQL's power comes from its basis in relational algebra and calculus, enabling complex queries through composition of simple operations.

MongoDB Query Language:

MongoDB uses a document-centric query language based on JSON-like syntax, designed for querying nested structures:

// MongoDB queries operate on document collections
db.customers.aggregate([
  { $match: { signup_date: { $gt: ISODate('2023-01-01') } } },
  { $lookup: {
      from: 'orders',
      localField: '_id',
      foreignField: 'customer_id',
      as: 'orders'
  }},
  { $addFields: {
      order_count: { $size: '$orders' },
      lifetime_value: { $sum: '$orders.total' }
  }},
  { $match: { order_count: { $gte: 5 } } },
  { $sort: { lifetime_value: -1 } },
  { $limit: 100 },
  { $project: { name: 1, order_count: 1, lifetime_value: 1 } }
]);

The aggregation pipeline expresses data transformation as a sequence of stages.

Cypher (Graph Query Language):

Cypher uses ASCII-art syntax to express graph patterns, making relationship traversal intuitive:

// Cypher queries express patterns in graph structure
MATCH (c:Customer)-[:PLACED]->(o:Order)
WHERE c.signup_date > date('2023-01-01')
WITH c, COUNT(o) as order_count, SUM(o.total) as lifetime_value
WHERE order_count >= 5
RETURN c.name, order_count, lifetime_value
ORDER BY lifetime_value DESC
LIMIT 100;

// Cypher excels at relationship patterns
MATCH (alice:Customer {name: 'Alice'})-[:PURCHASED]->(:Product)<-[:PURCHASED]-(similar:Customer)
RETURN similar.name, COUNT(*) as shared_products
ORDER BY shared_products DESC;

The arrow syntax (-->) makes relationship direction and traversal explicit and readable.

Language Shapes Thought

Each query language makes certain queries natural and others awkward. SQL excels at set operations and aggregations. Cypher excels at path finding and pattern matching. MongoDB excels at document manipulation and nested data. Learning multiple query languages expands the types of problems you can easily solve.

Read Operations in Depth

Read operations are the most complex and varied aspect of any data model's operations. They must support everything from simple lookups to complex analytical queries. Let's examine the key read operation patterns:

Point Queries (Exact Lookup):

Retrieving a single record by its unique identifier is the simplest read operation:

-- Relational: Lookup by primary key
SELECT * FROM Customers WHERE id = 42;

// Document: Lookup by _id
db.customers.findOne({ _id: ObjectId('...') });

// Graph: Lookup by property
MATCH (c:Customer {id: 42}) RETURN c;

All models optimize point queries with indexes. The performance should be O(log n) or O(1) regardless of database size.

Range Queries:

Retrieving records within a range of values:

-- Find orders in a date range
SELECT * FROM Orders 
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';

Range queries benefit from sorted indexes (B-trees) and are common in time-series and ordered data access.

Complex Predicates:

Combining multiple conditions:

-- Multiple conditions with AND/OR
SELECT * FROM Products
WHERE category = 'Electronics'
  AND price < 500
  AND (brand = 'Apple' OR brand = 'Samsung')
  AND in_stock = true;

Optimizers analyze predicates to determine the best execution strategy, potentially using multiple indexes.

Aggregation Queries:

Computing summaries over groups of data:

-- Aggregate with grouping
SELECT 
    category,
    COUNT(*) as product_count,
    AVG(price) as avg_price,
    MIN(price) as min_price,
    MAX(price) as max_price
FROM Products
GROUP BY category
HAVING COUNT(*) > 10;

Aggregations may require scanning significant data volumes. Column-oriented storage and pre-computed aggregates can dramatically accelerate these queries.

Join Operations:

Combining data from multiple sources is where relational databases excel:

-- Join across three tables
SELECT c.name, o.order_date, p.name as product, oi.quantity
FROM Customers c
JOIN Orders o ON c.id = o.customer_id
JOIN OrderItems oi ON o.id = oi.order_id
JOIN Products p ON oi.product_id = p.id
WHERE c.id = 42;

Join algorithms (nested loop, hash join, merge join) are chosen by the optimizer based on data statistics.

Read Operation Considerations

•Selectivity: How many records match the query? High selectivity (few matches) enables efficient indexed access; low selectivity may require full scans.
•Index availability: Queries on indexed columns are fast; queries on non-indexed columns may scan the entire table.
•Result size: Returning millions of rows to an application is rarely efficient; pagination and aggregation reduce data movement.
•Query complexity: Complex queries with many joins or subqueries require sophisticated optimization.
•Consistency requirements: Real-time reads may require transactional consistency; analytical reads may accept slightly stale data.

Write Operations in Depth

Write operations—creating, updating, and deleting data—have different characteristics and challenges than reads. While reads can be parallelized across replicas and cached, writes must ultimately modify the authoritative data store.

Insert Operations:

-- Single row insert
INSERT INTO Customers (name, email, signup_date)
VALUES ('Alice Smith', 'alice@example.com', CURRENT_DATE);

-- Bulk insert
INSERT INTO Customers (name, email, signup_date)
VALUES 
    ('Bob Jones', 'bob@example.com', CURRENT_DATE),
    ('Carol White', 'carol@example.com', CURRENT_DATE),
    ('David Brown', 'david@example.com', CURRENT_DATE);

Inserts must:

Validate data against constraints (types, NOT NULL, UNIQUE, foreign keys)
Maintain indexes (potentially expensive for many indexes)
Write to transaction logs for durability
Handle auto-generated values (sequences, UUIDs)

Update Operations:

-- Update with condition
UPDATE Customers
SET last_login = CURRENT_TIMESTAMP,
    login_count = login_count + 1
WHERE id = 42;

-- Bulk update
UPDATE Products
SET price = price * 1.10
WHERE category = 'Electronics' AND last_price_update < '2024-01-01';

Updates are more complex than inserts:

Must locate records to update (read + write)
May require exclusive locks during modification
Must maintain referential integrity if key values change
May trigger cascading updates in related tables
Index maintenance for modified indexed columns

Delete Operations:

-- Delete with condition
DELETE FROM Orders
WHERE order_date < '2020-01-01' AND status = 'cancelled';

-- Delete with referential consideration
DELETE FROM Customers WHERE id = 42;
-- May fail if orders exist for this customer (foreign key constraint)
-- Or may cascade to delete related orders (ON DELETE CASCADE)

Deletes must consider:

Referential integrity: What happens to related records?
Cascading: Should deletion propagate to dependent data?
Soft delete: Should data be marked deleted rather than removed?
Space reclamation: When is storage actually freed?

Write Amplification

A single logical write can trigger multiple physical operations: updating the main table, updating each relevant index, writing to the transaction log, and potentially updating materialized views or replicating to secondary nodes. This 'write amplification' is why heavily-indexed tables are slower to write than read.

Upsert Operations:

Many applications need "insert if not exists, update if exists" semantics:

-- PostgreSQL upsert
INSERT INTO UserStats (user_id, login_count, last_login)
VALUES (42, 1, CURRENT_TIMESTAMP)
ON CONFLICT (user_id) DO UPDATE 
SET login_count = UserStats.login_count + 1,
    last_login = CURRENT_TIMESTAMP;

// MongoDB upsert
db.userStats.updateOne(
  { user_id: 42 },
  { $inc: { login_count: 1 }, $set: { last_login: new Date() } },
  { upsert: true }
);

Upserts are essential for idempotent operations, especially in distributed systems where messages may be delivered multiple times.

Transaction Operations

Real-world data modifications often require multiple operations to be treated as a single logical unit. Transactions group operations together with guarantees about atomic execution.

ACID Properties:

Transactions in relational databases provide ACID guarantees:

Atomicity: All operations succeed or all fail. No partial updates.
Consistency: Transactions move the database from one valid state to another.
Isolation: Concurrent transactions don't interfere with each other.
Durability: Once committed, changes survive system failures.

-- Transfer money between accounts (must be atomic)
BEGIN TRANSACTION;

UPDATE Accounts SET balance = balance - 100 WHERE id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE id = 2;

-- Check invariant
IF (SELECT balance FROM Accounts WHERE id = 1) < 0 THEN
    ROLLBACK;  -- Undo everything
ELSE
    COMMIT;    -- Make permanent
END IF;

Isolation Levels:

Databases offer different isolation levels, trading consistency for performance:

Read Uncommitted: Can see uncommitted changes from other transactions (dirty reads)
Read Committed: Only see committed changes, but can see different data if re-queried
Repeatable Read: Queries return same result within a transaction
Serializable: Transactions execute as if sequentially, maximum isolation

-- Set isolation level
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
-- Operations here see consistent snapshot
COMMIT;

Transaction Support Across Models
Data Model	Single-Record Transactions	Multi-Record Transactions	Distributed Transactions
Relational	Yes, full ACID	Yes, full ACID	Yes, with 2PC (complex)
Document (MongoDB)	Yes, full ACID	Yes (since v4.0)	Yes, with limitations
Key-Value (Redis)	Atomic commands	MULTI/EXEC for pipelining	Limited (Redis Cluster)
Graph (Neo4j)	Yes, full ACID	Yes, full ACID	Limited in distributed mode

CAP Theorem Implications

Distributed databases face a fundamental trade-off: you cannot have strong consistency, high availability, and network partition tolerance simultaneously. Many NoSQL databases relaxed ACID guarantees to achieve distribution and availability, offering 'eventual consistency' instead. Modern systems often provide tunable consistency, letting applications choose the right trade-off per operation.

Specialized Operations

Beyond standard CRUD, different data models provide specialized operations tailored to their structure and use cases.

Graph Traversal Operations:

Graph databases excel at relationship-based queries:

// Find shortest path between two nodes
MATCH path = shortestPath(
  (a:Person {name: 'Alice'})-[:KNOWS*]-(b:Person {name: 'Bob'})
)
RETURN path;

// Find all paths up to 4 hops
MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..4]->(friend)
RETURN DISTINCT friend.name;

// Pattern matching with variable-length relationships
MATCH (user:User)-[:PURCHASED]->(:Product)<-[:PURCHASED]-(other:User)
      -[:PURCHASED]->(recommended:Product)
WHERE NOT (user)-[:PURCHASED]->(recommended)
RETURN recommended, COUNT(*) as score
ORDER BY score DESC;

These traversal operations would require complex recursive queries in SQL.

Document Array Operations:

Document databases provide operations for nested arrays:

// Add element to array
db.posts.updateOne(
  { _id: postId },
  { $push: { comments: { author: 'Alice', text: 'Great post!' } } }
);

// Query within nested arrays
db.posts.find({
  'comments.author': 'Alice',
  'comments.likes': { $gt: 10 }
});

// Update specific array element
db.posts.updateOne(
  { _id: postId, 'comments.author': 'Alice' },
  { $inc: { 'comments.$.likes': 1 } }
);

Full-Text Search Operations:

Many databases now include text search:

-- PostgreSQL full-text search
SELECT title, ts_rank(search_vector, query) as rank
FROM articles, plainto_tsquery('english', 'database optimization') query
WHERE search_vector @@ query
ORDER BY rank DESC;

// MongoDB text search
db.articles.find(
  { $text: { $search: 'database optimization' } },
  { score: { $meta: 'textScore' } }
).sort({ score: { $meta: 'textScore' } });

Other Specialized Operations

•Geospatial queries: Finding points within distance, polygon containment, nearest neighbors
•Time-series operations: Downsampling, gap filling, rolling windows, time-based aggregations
•Vector similarity: Finding similar embeddings for ML/AI applications
•Change streams: Subscribing to real-time data changes
•Stored procedures: Server-side code execution for complex logic
•Window functions: Calculations across sets of related rows (SQL)

Summary: Operational Aspect

The operational component defines what we can do with data—the actions, queries, and transformations that make databases useful:

Key Takeaways

•Operations enable data utility — Structure without operations is just organization. CRUD operations form the foundation; aggregation, joining, and traversal extend the capabilities.
•Declarative beats procedural — Specifying 'what' instead of 'how' enables automatic optimization and shields applications from physical changes.
•Query languages reflect model structure — SQL for tables, Cypher for graphs, aggregation pipelines for documents. Each makes certain queries natural.
•Read operations vary in complexity — From simple point lookups to complex analytical aggregations, understanding query patterns is essential for performance.
•Write operations have hidden costs — Index maintenance, constraint checking, logging, and replication all add overhead to writes.
•Transactions provide guarantees — ACID properties ensure correctness for multi-operation changes, with isolation levels trading consistency for performance.

What's next:

Structure defines what data can look like; operations define what we can do with it; but neither tells us what data must be true. The next page explores the constraint aspect—the rules and invariants that ensure data quality and correctness. Constraints are what separate a database from a simple data dump.

Page Complete

You now understand the operational component of data models—the verbs that bring data to life. Operations transform static structures into dynamic, queryable systems. Next, we'll examine how constraints ensure that all this activity produces correct, meaningful results.