Loading content...
If the structural component of a data model provides the nouns—tables, documents, nodes, attributes—then the operational component provides the verbs: the actions we can perform on data. What can we retrieve? How can we modify? What transformations are possible? These questions define the operational aspect of every data model.
The operational component is where the database becomes useful. Structure alone is just organization; operations are what enable databases to answer questions, support applications, and power business processes. A beautifully structured database that cannot be efficiently queried is useless.
Moreover, the operational capabilities of a data model profoundly influence how we think about problems. The operations available shape the solutions we can imagine. An engineer fluent in SQL thinks differently about data problems than one fluent in graph traversals or MapReduce. Understanding operations across multiple models expands your problem-solving repertoire.
By the end of this page, you will understand the operational component of data models, including CRUD operations, query languages, the distinction between procedural and declarative approaches, and how different models enable different operational patterns. You'll appreciate why query language design is a critical aspect of any data model.
Every data model must define operations for fundamental data manipulation tasks. While the specific syntax and semantics vary, all models address the same core operation categories—often summarized by the acronym CRUD:
Create (Insert): Adding new data to the database. This includes specifying what data to add and where it belongs in the structure.
Read (Query/Select): Retrieving data based on specified criteria. This is typically the most complex operation category, encompassing simple lookups, complex searches, aggregations, and multi-object correlations.
Update (Modify): Changing existing data. Updates must identify what to change, specify the new values, and maintain structural integrity.
Delete (Remove): Eliminating data from the database. Deletion must handle dependent data and maintain referential integrity.
| Operation | Relational (SQL) | Document (MongoDB) | Graph (Cypher) |
|---|---|---|---|
| Create | INSERT INTO table VALUES (...) | db.collection.insertOne({...}) | CREATE (n:Label {props}) |
| Read | SELECT ... FROM ... WHERE ... | db.collection.find({query}) | MATCH (n:Label) RETURN n |
| Update | UPDATE table SET ... WHERE ... | db.collection.updateOne({filter}, {$set: {...}}) | MATCH (n) SET n.prop = value |
| Delete | DELETE FROM table WHERE ... | db.collection.deleteOne({filter}) | MATCH (n) DELETE n |
Beyond basic CRUD:
While CRUD forms the foundation, sophisticated data models provide operations far beyond simple create/read/update/delete:
The richness of these operations determines how much computation can be pushed to the database versus handled in application code.
A fundamental distinction in data model operations is between procedural and declarative approaches. This distinction has profound implications for usability, optimization, and system design.
Procedural (Navigational) Operations:
Procedural operations specify how to retrieve data—the exact sequence of steps the database should follow. Early database systems (hierarchical and network models) used navigational interfaces where programs:
// Pseudo-code for navigational access
GET FIRST Customer WHERE name = 'Alice'
WHILE more_orders
GET NEXT Order FOR Customer
PROCESS order
END
This approach gives programmers fine-grained control but requires intimate knowledge of the physical data organization. Changes to data layout require rewriting programs.
Declarative Operations:
Declarative operations specify what data is desired, leaving the how to the database system. This is the defining characteristic of the relational model and SQL:
-- Declare WHAT we want, not HOW to get it
SELECT o.order_id, o.total
FROM Customers c
JOIN Orders o ON c.id = o.customer_id
WHERE c.name = 'Alice'
ORDER BY o.order_date DESC;
The same query works regardless of whether data is stored in a heap, indexed by B-tree, or distributed across servers. The database's query optimizer determines the execution strategy.
The declarative approach was Codd's key insight. By separating 'what' from 'how', query optimization becomes the database's problem, not the programmer's. As databases improve, existing queries automatically get faster—without changing a single line of application code.
A query language is the formal notation through which users express operations. Query languages are the user-facing manifestation of a data model's operational component. Different data models have different query languages, each designed to naturally express the model's structure and relationships.
SQL (Structured Query Language):
SQL is the standard query language for relational databases, designed around the concepts of tables, rows, and set operations:
-- SQL expresses operations on tables as sets
SELECT c.name, COUNT(o.id) as order_count, SUM(o.total) as lifetime_value
FROM Customers c
LEFT JOIN Orders o ON c.id = o.customer_id
WHERE c.signup_date > '2023-01-01'
GROUP BY c.id, c.name
HAVING COUNT(o.id) >= 5
ORDER BY lifetime_value DESC
LIMIT 100;
SQL's power comes from its basis in relational algebra and calculus, enabling complex queries through composition of simple operations.
MongoDB Query Language:
MongoDB uses a document-centric query language based on JSON-like syntax, designed for querying nested structures:
// MongoDB queries operate on document collections
db.customers.aggregate([
{ $match: { signup_date: { $gt: ISODate('2023-01-01') } } },
{ $lookup: {
from: 'orders',
localField: '_id',
foreignField: 'customer_id',
as: 'orders'
}},
{ $addFields: {
order_count: { $size: '$orders' },
lifetime_value: { $sum: '$orders.total' }
}},
{ $match: { order_count: { $gte: 5 } } },
{ $sort: { lifetime_value: -1 } },
{ $limit: 100 },
{ $project: { name: 1, order_count: 1, lifetime_value: 1 } }
]);
The aggregation pipeline expresses data transformation as a sequence of stages.
Cypher (Graph Query Language):
Cypher uses ASCII-art syntax to express graph patterns, making relationship traversal intuitive:
// Cypher queries express patterns in graph structure
MATCH (c:Customer)-[:PLACED]->(o:Order)
WHERE c.signup_date > date('2023-01-01')
WITH c, COUNT(o) as order_count, SUM(o.total) as lifetime_value
WHERE order_count >= 5
RETURN c.name, order_count, lifetime_value
ORDER BY lifetime_value DESC
LIMIT 100;
// Cypher excels at relationship patterns
MATCH (alice:Customer {name: 'Alice'})-[:PURCHASED]->(:Product)<-[:PURCHASED]-(similar:Customer)
RETURN similar.name, COUNT(*) as shared_products
ORDER BY shared_products DESC;
The arrow syntax (-->) makes relationship direction and traversal explicit and readable.
Each query language makes certain queries natural and others awkward. SQL excels at set operations and aggregations. Cypher excels at path finding and pattern matching. MongoDB excels at document manipulation and nested data. Learning multiple query languages expands the types of problems you can easily solve.
Read operations are the most complex and varied aspect of any data model's operations. They must support everything from simple lookups to complex analytical queries. Let's examine the key read operation patterns:
Point Queries (Exact Lookup):
Retrieving a single record by its unique identifier is the simplest read operation:
-- Relational: Lookup by primary key
SELECT * FROM Customers WHERE id = 42;
// Document: Lookup by _id
db.customers.findOne({ _id: ObjectId('...') });
// Graph: Lookup by property
MATCH (c:Customer {id: 42}) RETURN c;
All models optimize point queries with indexes. The performance should be O(log n) or O(1) regardless of database size.
Range Queries:
Retrieving records within a range of values:
-- Find orders in a date range
SELECT * FROM Orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';
Range queries benefit from sorted indexes (B-trees) and are common in time-series and ordered data access.
Complex Predicates:
Combining multiple conditions:
-- Multiple conditions with AND/OR
SELECT * FROM Products
WHERE category = 'Electronics'
AND price < 500
AND (brand = 'Apple' OR brand = 'Samsung')
AND in_stock = true;
Optimizers analyze predicates to determine the best execution strategy, potentially using multiple indexes.
Aggregation Queries:
Computing summaries over groups of data:
-- Aggregate with grouping
SELECT
category,
COUNT(*) as product_count,
AVG(price) as avg_price,
MIN(price) as min_price,
MAX(price) as max_price
FROM Products
GROUP BY category
HAVING COUNT(*) > 10;
Aggregations may require scanning significant data volumes. Column-oriented storage and pre-computed aggregates can dramatically accelerate these queries.
Join Operations:
Combining data from multiple sources is where relational databases excel:
-- Join across three tables
SELECT c.name, o.order_date, p.name as product, oi.quantity
FROM Customers c
JOIN Orders o ON c.id = o.customer_id
JOIN OrderItems oi ON o.id = oi.order_id
JOIN Products p ON oi.product_id = p.id
WHERE c.id = 42;
Join algorithms (nested loop, hash join, merge join) are chosen by the optimizer based on data statistics.
Write operations—creating, updating, and deleting data—have different characteristics and challenges than reads. While reads can be parallelized across replicas and cached, writes must ultimately modify the authoritative data store.
Insert Operations:
-- Single row insert
INSERT INTO Customers (name, email, signup_date)
VALUES ('Alice Smith', 'alice@example.com', CURRENT_DATE);
-- Bulk insert
INSERT INTO Customers (name, email, signup_date)
VALUES
('Bob Jones', 'bob@example.com', CURRENT_DATE),
('Carol White', 'carol@example.com', CURRENT_DATE),
('David Brown', 'david@example.com', CURRENT_DATE);
Inserts must:
Update Operations:
-- Update with condition
UPDATE Customers
SET last_login = CURRENT_TIMESTAMP,
login_count = login_count + 1
WHERE id = 42;
-- Bulk update
UPDATE Products
SET price = price * 1.10
WHERE category = 'Electronics' AND last_price_update < '2024-01-01';
Updates are more complex than inserts:
Delete Operations:
-- Delete with condition
DELETE FROM Orders
WHERE order_date < '2020-01-01' AND status = 'cancelled';
-- Delete with referential consideration
DELETE FROM Customers WHERE id = 42;
-- May fail if orders exist for this customer (foreign key constraint)
-- Or may cascade to delete related orders (ON DELETE CASCADE)
Deletes must consider:
A single logical write can trigger multiple physical operations: updating the main table, updating each relevant index, writing to the transaction log, and potentially updating materialized views or replicating to secondary nodes. This 'write amplification' is why heavily-indexed tables are slower to write than read.
Upsert Operations:
Many applications need "insert if not exists, update if exists" semantics:
-- PostgreSQL upsert
INSERT INTO UserStats (user_id, login_count, last_login)
VALUES (42, 1, CURRENT_TIMESTAMP)
ON CONFLICT (user_id) DO UPDATE
SET login_count = UserStats.login_count + 1,
last_login = CURRENT_TIMESTAMP;
// MongoDB upsert
db.userStats.updateOne(
{ user_id: 42 },
{ $inc: { login_count: 1 }, $set: { last_login: new Date() } },
{ upsert: true }
);
Upserts are essential for idempotent operations, especially in distributed systems where messages may be delivered multiple times.
Real-world data modifications often require multiple operations to be treated as a single logical unit. Transactions group operations together with guarantees about atomic execution.
ACID Properties:
Transactions in relational databases provide ACID guarantees:
-- Transfer money between accounts (must be atomic)
BEGIN TRANSACTION;
UPDATE Accounts SET balance = balance - 100 WHERE id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE id = 2;
-- Check invariant
IF (SELECT balance FROM Accounts WHERE id = 1) < 0 THEN
ROLLBACK; -- Undo everything
ELSE
COMMIT; -- Make permanent
END IF;
Isolation Levels:
Databases offer different isolation levels, trading consistency for performance:
-- Set isolation level
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
-- Operations here see consistent snapshot
COMMIT;
| Data Model | Single-Record Transactions | Multi-Record Transactions | Distributed Transactions |
|---|---|---|---|
| Relational | Yes, full ACID | Yes, full ACID | Yes, with 2PC (complex) |
| Document (MongoDB) | Yes, full ACID | Yes (since v4.0) | Yes, with limitations |
| Key-Value (Redis) | Atomic commands | MULTI/EXEC for pipelining | Limited (Redis Cluster) |
| Graph (Neo4j) | Yes, full ACID | Yes, full ACID | Limited in distributed mode |
Distributed databases face a fundamental trade-off: you cannot have strong consistency, high availability, and network partition tolerance simultaneously. Many NoSQL databases relaxed ACID guarantees to achieve distribution and availability, offering 'eventual consistency' instead. Modern systems often provide tunable consistency, letting applications choose the right trade-off per operation.
Beyond standard CRUD, different data models provide specialized operations tailored to their structure and use cases.
Graph Traversal Operations:
Graph databases excel at relationship-based queries:
// Find shortest path between two nodes
MATCH path = shortestPath(
(a:Person {name: 'Alice'})-[:KNOWS*]-(b:Person {name: 'Bob'})
)
RETURN path;
// Find all paths up to 4 hops
MATCH (a:Person {name: 'Alice'})-[:KNOWS*1..4]->(friend)
RETURN DISTINCT friend.name;
// Pattern matching with variable-length relationships
MATCH (user:User)-[:PURCHASED]->(:Product)<-[:PURCHASED]-(other:User)
-[:PURCHASED]->(recommended:Product)
WHERE NOT (user)-[:PURCHASED]->(recommended)
RETURN recommended, COUNT(*) as score
ORDER BY score DESC;
These traversal operations would require complex recursive queries in SQL.
Document Array Operations:
Document databases provide operations for nested arrays:
// Add element to array
db.posts.updateOne(
{ _id: postId },
{ $push: { comments: { author: 'Alice', text: 'Great post!' } } }
);
// Query within nested arrays
db.posts.find({
'comments.author': 'Alice',
'comments.likes': { $gt: 10 }
});
// Update specific array element
db.posts.updateOne(
{ _id: postId, 'comments.author': 'Alice' },
{ $inc: { 'comments.$.likes': 1 } }
);
Full-Text Search Operations:
Many databases now include text search:
-- PostgreSQL full-text search
SELECT title, ts_rank(search_vector, query) as rank
FROM articles, plainto_tsquery('english', 'database optimization') query
WHERE search_vector @@ query
ORDER BY rank DESC;
// MongoDB text search
db.articles.find(
{ $text: { $search: 'database optimization' } },
{ score: { $meta: 'textScore' } }
).sort({ score: { $meta: 'textScore' } });
The operational component defines what we can do with data—the actions, queries, and transformations that make databases useful:
What's next:
Structure defines what data can look like; operations define what we can do with it; but neither tells us what data must be true. The next page explores the constraint aspect—the rules and invariants that ensure data quality and correctness. Constraints are what separate a database from a simple data dump.
You now understand the operational component of data models—the verbs that bring data to life. Operations transform static structures into dynamic, queryable systems. Next, we'll examine how constraints ensure that all this activity produces correct, meaningful results.