If DDL defines the skeleton of a database—its structure and form—then Data Manipulation Language (DML) is the living, breathing circulatory system that pumps data through it. DML is where databases transition from static definitions to dynamic, working systems that power applications, process transactions, and deliver insights.
Every application you've ever used—every e-commerce checkout, every social media post, every banking transaction—ultimately translates into DML statements executed against a database. When you add an item to your cart, an INSERT runs. When you update your profile, an UPDATE executes. When you search for products, a SELECT queries millions of rows. When you delete a message, a DELETE removes it permanently.
DML accounts for the overwhelming majority of database activity in production systems, often 99.9% or more. While DDL might run occasionally during deployments, DML runs continuously—thousands, millions, even billions of times per day in large-scale systems. Mastering DML is therefore not optional; it is the core competency of anyone who works with databases.
By the end of this page, you will deeply understand all four DML statements—SELECT, INSERT, UPDATE, and DELETE—their precise semantics, performance characteristics, and real-world usage patterns. You'll learn how DML interacts with transactions, how it affects storage and indexes, and the critical differences between set-based and row-based operations.
Data Manipulation Language (DML) is the subset of SQL that provides commands for interacting with the data stored in database tables. While DDL defines what containers exist, DML operates on the contents of those containers—creating, reading, updating, and deleting the actual information stored within.
The Four Fundamental Operations (CRUD):
DML provides exactly four operations that map directly to the CRUD paradigm fundamental to all data management:
| Operation | SQL Statement | Action | Example |
|---|---|---|---|
| Create | INSERT | Add new rows to a table | INSERT INTO users VALUES (...) |
| Read | SELECT | Retrieve data from one or more tables | SELECT * FROM users WHERE ... |
| Update | UPDATE | Modify existing rows in a table | UPDATE users SET email = ... |
| Delete | DELETE | Remove rows from a table | DELETE FROM users WHERE ... |
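To make the mapping concrete, here is a minimal sketch of one full CRUD lifecycle against a hypothetical `users` table (the column names are illustrative, not from a specific schema):

```sql
-- Create: add a new user
INSERT INTO users (username, email)
VALUES ('jsmith', 'jsmith@example.com');

-- Read: retrieve the row we just created
SELECT user_id, username, email
FROM users
WHERE username = 'jsmith';

-- Update: change the user's email
UPDATE users
SET email = 'john.smith@example.com'
WHERE username = 'jsmith';

-- Delete: remove the user
DELETE FROM users
WHERE username = 'jsmith';
```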
DML vs DDL: The Fundamental Distinction:
The distinction between DML and DDL is not merely categorical—it reflects fundamentally different operations with different behaviors:
| Aspect | DML | DDL |
|---|---|---|
| Target | Data (table rows) | Metadata (table definitions) |
| Frequency | Constant (production workload) | Occasional (deployments) |
| Transaction | Participates in transactions | Auto-commits (most RDBMS) |
| Rollback | Can be rolled back | Cannot (usually) |
| Locking | Row-level or page-level locks | Schema-level locks |
| Logging | Full transaction logging | Minimal logging |
| Volume | Billions of operations/day possible | Hundreds at most |
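One way to see the transaction-behavior difference in practice: DML changes can be undone before commit, while DDL in many systems commits implicitly (PostgreSQL, which supports transactional DDL, is a notable exception). A minimal sketch, assuming a hypothetical `employees` table:

```sql
BEGIN;
DELETE FROM employees WHERE department_id = 99;        -- DML: provisional until commit
SELECT COUNT(*) FROM employees WHERE department_id = 99; -- returns 0 inside the transaction
ROLLBACK;                                               -- the deleted rows reappear
-- In MySQL, by contrast, a DDL statement such as DROP TABLE would have
-- implicitly committed everything before it and could not be rolled back.
```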
Online Transaction Processing (OLTP) systems are dominated by DML. High-frequency INSERT, UPDATE, and DELETE operations characterize transactional workloads in banking, e-commerce, and operational systems. The ability to execute thousands of DML operations per second while maintaining ACID properties is the defining challenge of database engineering.
The SELECT statement is arguably the most important and most complex statement in SQL. It retrieves data from one or more tables based on specified criteria, transforming raw stored data into meaningful information. SELECT accounts for the vast majority of database operations in most systems.
Why SELECT Is Special:
Unlike INSERT, UPDATE, and DELETE which modify data, SELECT is a read-only operation. This non-destructive nature allows SELECT statements to run without acquiring exclusive locks (in most isolation levels), enabling high concurrency. However, SELECT's apparent simplicity belies enormous complexity:
```sql
-- =====================================================
-- SELECT Statement: Complete Anatomy
-- =====================================================

-- The full SELECT clause order (not all are required):
SELECT                  -- 5. Project: Choose columns to return
    [DISTINCT]          -- Optional: Remove duplicate rows
    column_list,        -- Columns, expressions, aggregates
    aggregate_functions
FROM                    -- 1. Source: Identify tables
    table_name
    [JOIN other_table   -- Combine multiple tables
        ON condition]
WHERE                   -- 2. Filter: Row-level conditions (before grouping)
    conditions
GROUP BY                -- 3. Aggregate: Group rows for aggregate functions
    columns
HAVING                  -- 4. Filter: Group-level conditions (after grouping)
    aggregate_conditions
ORDER BY                -- 6. Sort: Control row order in results
    columns [ASC|DESC]
LIMIT                   -- 7. Paginate: Restrict number of rows returned
    row_count
OFFSET                  -- 8. Skip: Start from the N-th row
    skip_count;

-- EXECUTION ORDER is different from WRITTEN ORDER:
-- FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT/OFFSET
```
```sql
-- =====================================================
-- SELECT: From Simple to Complex
-- =====================================================

-- 1. Simple retrieval: All columns from a table
SELECT * FROM employees;

-- 2. Column projection: Specific columns only
SELECT first_name, last_name, email FROM employees;

-- 3. Filtering: WHERE clause with conditions
SELECT * FROM employees
WHERE department_id = 10 AND salary > 50000;

-- 4. Sorting: ORDER BY for result ordering
SELECT first_name, last_name, hire_date
FROM employees
ORDER BY hire_date DESC, last_name ASC;

-- 5. Aliasing: Rename columns in output
SELECT
    first_name AS "First Name",
    last_name  AS "Last Name",
    salary * 12 AS annual_salary
FROM employees;

-- 6. DISTINCT: Remove duplicate values
SELECT DISTINCT department_id FROM employees;

-- 7. Aggregation: Compute summaries
SELECT
    department_id,
    COUNT(*)       AS employee_count,
    AVG(salary)    AS avg_salary,
    MIN(hire_date) AS earliest_hire,
    MAX(salary)    AS highest_salary,
    SUM(salary)    AS total_payroll
FROM employees
GROUP BY department_id;

-- 8. HAVING: Filter aggregated results
SELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;

-- 9. JOIN: Combine data from multiple tables
SELECT
    e.first_name,
    e.last_name,
    d.department_name,
    m.first_name AS manager_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
LEFT JOIN employees m ON e.manager_id = m.employee_id;

-- 10. Subquery: Query within a query
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- 11. CTE: Common Table Expression for readable complex queries
WITH high_earners AS (
    SELECT * FROM employees WHERE salary > 100000
),
department_stats AS (
    SELECT department_id, COUNT(*) AS count
    FROM high_earners
    GROUP BY department_id
)
SELECT d.department_name, ds.count
FROM department_stats ds
JOIN departments d ON ds.department_id = d.department_id;

-- 12. Window Functions: Analytics within result sets
SELECT
    first_name,
    department_id,
    salary,
    AVG(salary) OVER (PARTITION BY department_id) AS dept_avg,
    salary - AVG(salary) OVER (PARTITION BY department_id) AS diff_from_avg,
    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
FROM employees;
```

SELECT performance depends heavily on: (1) Indexing—ensure WHERE and JOIN columns are indexed. (2) Selectivity—WHERE clauses that filter more data perform better. (3) Projection—SELECT only needed columns, avoid SELECT *. (4) JOIN order—start with the most selective table. Understanding EXPLAIN/EXPLAIN ANALYZE output is essential for query optimization.
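As a starting point for that kind of tuning, here is a minimal sketch of inspecting a query plan (PostgreSQL syntax; the index name in the comment is hypothetical):

```sql
EXPLAIN ANALYZE
SELECT first_name, last_name
FROM employees
WHERE department_id = 10;

-- In the output, compare access methods:
--   "Seq Scan on employees"               -> every row was read; consider an index
--   "Index Scan using idx_employees_dept" -> the planner used an index
-- EXPLAIN alone shows the plan; adding ANALYZE actually executes the query
-- and reports real row counts and timings alongside the estimates.
```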
The INSERT statement adds new rows to a table. It is the only DML statement that increases a table's row count, and it is the primary mechanism by which data enters a system.
INSERT Semantics:
Every INSERT must provide values for all columns that:
- Are declared NOT NULL, and
- Have no DEFAULT value and are not auto-generated (e.g., identity or serial columns).

Omitted columns receive their DEFAULT value or NULL.
The DBMS validates each inserted row against all column constraints, CHECK constraints, and foreign key constraints before the row is permanently added.
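A minimal sketch of how these rules play out, using a hypothetical table (PostgreSQL-style SERIAL syntax):

```sql
CREATE TABLE tasks (
    task_id     SERIAL PRIMARY KEY,          -- auto-generated: may be omitted
    title       VARCHAR(200) NOT NULL,       -- NOT NULL, no default: must be supplied
    status      VARCHAR(20) DEFAULT 'OPEN',  -- has a default: may be omitted
    assignee_id INT REFERENCES employees(employee_id)  -- nullable FK: may be omitted
);

-- Valid: only the mandatory column is supplied; the rest use defaults or NULL
INSERT INTO tasks (title) VALUES ('Write report');

-- Fails: title violates its NOT NULL constraint
-- INSERT INTO tasks (status) VALUES ('OPEN');

-- Fails if employee 9999 does not exist: FK validation rejects the row
-- INSERT INTO tasks (title, assignee_id) VALUES ('Review', 9999);
```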
```sql
-- =====================================================
-- INSERT Statement: Complete Guide
-- =====================================================

-- 1. Single row insert with explicit columns
INSERT INTO employees (first_name, last_name, email, department_id, salary)
VALUES ('John', 'Smith', 'john.smith@company.com', 10, 75000);

-- 2. Single row with all columns (order must match table definition)
INSERT INTO employees VALUES
    (101, 'Jane', 'Doe', 'jane.doe@company.com', '2024-01-15', NULL,
     82000, 'ACTIVE', 10, NULL, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);

-- 3. Multiple rows in a single statement (more efficient)
INSERT INTO employees (first_name, last_name, email, department_id, salary)
VALUES
    ('Alice', 'Johnson', 'alice.j@company.com', 20, 65000),
    ('Bob', 'Williams', 'bob.w@company.com', 20, 71000),
    ('Carol', 'Brown', 'carol.b@company.com', 10, 68000);

-- 4. INSERT ... SELECT: Insert from query results
INSERT INTO employee_archive (employee_id, first_name, last_name, termination_date)
SELECT employee_id, first_name, last_name, termination_date
FROM employees
WHERE employment_status = 'TERMINATED';

-- 5. INSERT with DEFAULT values
INSERT INTO employees (first_name, last_name, email, department_id, salary)
VALUES ('New', 'Hire', 'new.hire@company.com', 10, DEFAULT);

-- 6. INSERT with RETURNING (PostgreSQL) - get inserted data back
INSERT INTO employees (first_name, last_name, email, department_id, salary)
VALUES ('David', 'Lee', 'david.lee@company.com', 30, 90000)
RETURNING employee_id, first_name, email;

-- 7. INSERT or UPDATE on conflict (UPSERT)
-- PostgreSQL: ON CONFLICT
INSERT INTO employees (employee_id, first_name, last_name, email, department_id, salary)
VALUES (101, 'John', 'Smith', 'john.smith@company.com', 10, 80000)
ON CONFLICT (employee_id) DO UPDATE
SET salary = EXCLUDED.salary,
    updated_at = CURRENT_TIMESTAMP;

-- MySQL: ON DUPLICATE KEY UPDATE
-- INSERT INTO employees (employee_id, first_name, ...) VALUES (...)
-- ON DUPLICATE KEY UPDATE salary = VALUES(salary);

-- 8. INSERT IGNORE (MySQL) - Skip rows that would violate constraints
-- INSERT IGNORE INTO employees ...

-- 9. Bulk insert from external file (PostgreSQL COPY)
-- COPY employees FROM '/path/to/employees.csv' WITH (FORMAT csv, HEADER true);
```

INSERT Performance Considerations:
INSERT operations have significant system impact beyond simply adding data:
Index Maintenance: Every index on the table must be updated. A table with 10 indexes means 10 additional structures to modify per INSERT. This is why high-frequency insert tables often minimize indexes.
Constraint Validation: Foreign key constraints require lookups to parent tables. CHECK constraints must be evaluated. UNIQUE constraints require index probes.
Transaction Logging: Every INSERT is logged to the transaction log (WAL/redo log) for durability and recovery. High-volume inserts can stress logging infrastructure.
Lock Contention: While row-level locking minimizes contention, extremely high INSERT rates can create page-level contention on table hot spots.
Auto-increment Allocation: Sequence-backed IDs require atomic operations that can become bottlenecks at extreme scale.
For large data loads, individual INSERT statements are catastrophically slow. Use: (1) Multi-row INSERT (insert many rows per statement). (2) COPY command (PostgreSQL) or LOAD DATA (MySQL) for file-based bulk loading. (3) Disable indexes during load, rebuild after. (4) Wrap in single transaction or use autocommit=off. (5) Consider unlogged tables or bulk load utilities for extreme volumes.
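Combining several of these techniques, a bulk-load session might look like the following sketch (PostgreSQL syntax; the file path and index name are hypothetical):

```sql
-- Drop secondary indexes before the load (keep the primary key)
DROP INDEX IF EXISTS idx_employees_email;

BEGIN;  -- one transaction instead of one commit per row

-- File-based bulk load: far faster than individual INSERT statements
COPY employees (first_name, last_name, email, department_id, salary)
FROM '/data/employees.csv'
WITH (FORMAT csv, HEADER true);

COMMIT;

-- Rebuild the index once, over the full data set
CREATE INDEX idx_employees_email ON employees (email);
```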
The UPDATE statement modifies the values of existing rows in a table. It is one of the most powerful—and therefore dangerous—DML statements. An UPDATE without a proper WHERE clause will modify every row in the table, often with catastrophic consequences.
UPDATE Semantics:
UPDATE conceptually works in two phases:
1. Identify: Evaluate the WHERE clause to determine the set of target rows.
2. Modify: Apply the SET assignments to every row in that set.
All identified rows are modified atomically as part of the enclosing transaction.
```sql
-- =====================================================
-- UPDATE Statement: Complete Guide
-- =====================================================

-- 1. Basic UPDATE: Single column, single row (by primary key)
UPDATE employees SET salary = 85000 WHERE employee_id = 101;

-- 2. UPDATE multiple columns
UPDATE employees
SET salary = 90000,
    employment_status = 'ACTIVE',
    updated_at = CURRENT_TIMESTAMP
WHERE employee_id = 101;

-- 3. UPDATE with calculations
UPDATE employees
SET salary = salary * 1.10  -- 10% raise
WHERE department_id = 20;

-- 4. UPDATE with expressions from other columns
UPDATE employees
SET email = LOWER(first_name) || '.' || LOWER(last_name) || '@newdomain.com'
WHERE email LIKE '%@olddomain.com';

-- 5. UPDATE with subquery (correlated)
UPDATE employees e
SET salary = (
    SELECT AVG(salary) FROM employees
    WHERE department_id = e.department_id
)
WHERE salary < 30000;

-- 6. UPDATE with JOIN (PostgreSQL syntax)
UPDATE employees e
SET salary = e.salary * 1.15
FROM departments d
WHERE e.department_id = d.department_id
  AND d.department_name = 'Engineering';

-- 7. UPDATE with JOIN (MySQL syntax)
-- UPDATE employees e
-- JOIN departments d ON e.department_id = d.department_id
-- SET e.salary = e.salary * 1.15
-- WHERE d.department_name = 'Engineering';

-- 8. UPDATE with CASE expression
UPDATE employees
SET employment_status = CASE
    WHEN termination_date IS NOT NULL THEN 'TERMINATED'
    WHEN hire_date > CURRENT_DATE - INTERVAL '90 days' THEN 'PROBATION'
    ELSE 'ACTIVE'
END;

-- 9. UPDATE with RETURNING (PostgreSQL) - see what was changed
UPDATE employees
SET salary = salary * 1.05
WHERE department_id = 10
RETURNING employee_id, first_name, salary AS new_salary;

-- 10. UPDATE with LIMIT (MySQL) - update only N rows
-- UPDATE employees SET processed = true
-- WHERE processed = false
-- LIMIT 1000;

-- 11. NEVER DO THIS - Updates ALL rows!
-- UPDATE employees SET salary = 0;  -- DISASTER!
```

Before executing any UPDATE in production, first run a SELECT with the identical WHERE clause to verify exactly which rows will be affected. This simple practice has prevented countless disasters:

```sql
-- Step 1: Verify target rows
SELECT * FROM employees WHERE department_id = 10;

-- Step 2: If the results look correct, execute the UPDATE
UPDATE employees SET salary = salary * 1.10 WHERE department_id = 10;
```
The DELETE statement removes rows from a table. Like UPDATE, DELETE operates on all rows matching the WHERE clause. A DELETE without WHERE removes every row in the table—a destructive operation that differs from TRUNCATE primarily in its transaction behavior and logging.
DELETE Semantics:
DELETE:
- Removes entire rows; to clear individual column values, use UPDATE ... SET column = NULL instead.
- Operates on exactly the rows matching the WHERE clause; without a WHERE clause, it targets every row in the table.
- Fires any DELETE triggers and enforces foreign key actions (RESTRICT, CASCADE, SET NULL).
- Participates fully in transactions and can be rolled back before COMMIT.
```sql
-- =====================================================
-- DELETE Statement: Complete Guide
-- =====================================================

-- 1. Basic DELETE: Remove specific rows
DELETE FROM employees WHERE employee_id = 101;

-- 2. DELETE with compound conditions
DELETE FROM employees
WHERE employment_status = 'TERMINATED'
  AND termination_date < CURRENT_DATE - INTERVAL '7 years';

-- 3. DELETE with subquery
DELETE FROM orders
WHERE customer_id IN (
    SELECT customer_id FROM customers
    WHERE account_status = 'CLOSED'
);

-- 4. DELETE with EXISTS (often more efficient than IN)
DELETE FROM order_items oi
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.order_id = oi.order_id
      AND o.order_status = 'CANCELLED'
);

-- 5. DELETE with JOIN (PostgreSQL using USING)
DELETE FROM employees e
USING departments d
WHERE e.department_id = d.department_id
  AND d.is_active = false;

-- 6. DELETE with JOIN (MySQL syntax)
-- DELETE e FROM employees e
-- JOIN departments d ON e.department_id = d.department_id
-- WHERE d.is_active = false;

-- 7. DELETE with RETURNING (PostgreSQL) - capture deleted data
DELETE FROM employees
WHERE employment_status = 'TERMINATED'
RETURNING employee_id, first_name, last_name, email;

-- 8. DELETE with LIMIT (MySQL) - delete in batches
-- Useful for large deletes to avoid long-held locks
-- DELETE FROM log_entries WHERE created_at < '2023-01-01' LIMIT 10000;

-- 9. Soft delete pattern (common alternative to DELETE)
UPDATE employees
SET deleted_at = CURRENT_TIMESTAMP,
    deleted = true
WHERE employee_id = 101;
-- Later: SELECT * FROM employees WHERE deleted = false;

-- 10. DELETE with cascade behavior (depends on FK definition)
-- If the FK was defined with ON DELETE CASCADE:
DELETE FROM departments WHERE department_id = 10;
-- This automatically deletes all employees in department 10!

-- 11. NEVER DO THIS (unless intentional):
-- DELETE FROM employees;  -- Removes ALL data!
```

DELETE Performance and Storage:
DELETE operations have significant implications beyond just removing data:
1. Storage Reclamation: DELETE does not immediately reclaim disk space. Deleted rows are marked as "dead" but the space remains allocated. PostgreSQL requires VACUUM to reclaim space; MySQL InnoDB may need OPTIMIZE TABLE for full reclamation.
2. Index Maintenance: Every index must be updated to remove references to deleted rows. Tables with many indexes experience proportionally slower DELETE performance.
3. Foreign Key Checks: If other tables reference the rows being deleted, the DBMS must check referential integrity. Depending on the ON DELETE action (RESTRICT, CASCADE, SET NULL), this may block the delete, cascade to other tables, or update foreign key values.
4. Transaction Log Volume: Unlike TRUNCATE (minimal logging), DELETE logs each deleted row individually. Deleting millions of rows generates massive transaction logs, which affects backup size, replication lag, and point-in-time recovery.
5. Lock Escalation: Deleting large numbers of rows may cause lock escalation from row-level to table-level locks, blocking other transactions. Deleting in bounded batches, as sketched below, keeps each transaction small.
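A common mitigation for points 1, 4, and 5 is to purge in bounded batches and reclaim space afterwards. A minimal sketch in PostgreSQL syntax (which lacks DELETE ... LIMIT, hence the subquery; the log_entries table is hypothetical):

```sql
-- Repeat this statement until it reports 0 rows deleted:
DELETE FROM log_entries
WHERE id IN (
    SELECT id
    FROM log_entries
    WHERE created_at < '2023-01-01'
    LIMIT 10000          -- bound the work (and the locks) per transaction
);

-- After the purge, reclaim dead-row space and refresh planner statistics
VACUUM (ANALYZE) log_entries;
```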
Many production systems never physically DELETE data. Instead, they use 'soft delete' patterns—adding a deleted_at timestamp or is_deleted boolean. This preserves data for auditing, enables 'undelete' functionality, and avoids referential integrity complications. Application queries simply filter WHERE deleted = false. Physical deletion happens later through scheduled maintenance jobs.
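If you adopt soft deletes, a partial index can keep the common "live rows only" queries fast. A sketch of the deleted_at variant of the pattern (partial indexes are a PostgreSQL feature; the index name is hypothetical):

```sql
-- 'Delete' a row by stamping it rather than removing it
UPDATE employees
SET deleted_at = CURRENT_TIMESTAMP
WHERE employee_id = 101;

-- Index only the live rows, so active-employee lookups stay small and fast
CREATE INDEX idx_employees_live
ON employees (last_name)
WHERE deleted_at IS NULL;

-- Application queries filter the same way
SELECT * FROM employees WHERE deleted_at IS NULL;
```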
A defining characteristic of DML (that distinguishes it from DDL in most databases) is its participation in transactions. DML statements can be grouped into atomic units of work that either fully succeed or fully fail, maintaining database consistency.
Transaction ACID Properties:
DML operations within transactions guarantee:
| Property | Meaning | DML Example |
|---|---|---|
| Atomicity | All operations complete or none do | Transfer: debit AND credit both happen or neither |
| Consistency | Data moves from one valid state to another | Constraints are never violated after commit |
| Isolation | Concurrent transactions don't interfere | Other users don't see uncommitted changes |
| Durability | Committed changes survive crashes | Once committed, data persists even through power loss |
```sql
-- =====================================================
-- DML with Transaction Control
-- =====================================================

-- 1. Basic transaction: Transfer $500 from account A to account B
BEGIN;

UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 500 WHERE account_id = 'B';

-- Verify the transfer
SELECT account_id, balance FROM accounts WHERE account_id IN ('A', 'B');

-- If everything looks correct:
COMMIT;
-- If something went wrong:
-- ROLLBACK;

-- 2. Transaction with savepoints (partial rollback)
BEGIN;

INSERT INTO orders (customer_id, total) VALUES (100, 299.99);
-- Assume this returns order_id = 5001

SAVEPOINT after_order;

INSERT INTO order_items (order_id, product_id, quantity) VALUES (5001, 'PROD-A', 2);
INSERT INTO order_items (order_id, product_id, quantity) VALUES (5001, 'PROD-B', 1);

-- Oops, PROD-C is out of stock
INSERT INTO order_items (order_id, product_id, quantity) VALUES (5001, 'PROD-C', 3);
-- This might fail due to an inventory constraint

ROLLBACK TO after_order;  -- Undo only the items, keep the order

-- Try an alternative product
INSERT INTO order_items (order_id, product_id, quantity) VALUES (5001, 'PROD-D', 3);

COMMIT;

-- 3. Read-only transaction (optimization for reporting)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;

SELECT COUNT(*) FROM orders WHERE order_date = CURRENT_DATE;
SELECT SUM(total) FROM orders WHERE order_date = CURRENT_DATE;

COMMIT;

-- 4. Explicit locking for critical updates
BEGIN;

-- Lock the row to prevent concurrent modification
SELECT * FROM accounts WHERE account_id = 'A' FOR UPDATE;

-- Now safe to read, calculate, and update
UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A';

COMMIT;
```

By default, many database clients operate in 'autocommit' mode—each DML statement is immediately committed as its own transaction. This is convenient for ad-hoc queries but dangerous for multi-statement operations that should succeed or fail together. Disable autocommit (or issue explicit BEGIN ... COMMIT transactions) for any operation requiring atomicity.
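To see why this matters, consider the transfer example again. A sketch of the autocommit failure mode (using the same hypothetical accounts table as above):

```sql
-- With autocommit ON, these are two independent transactions.
-- If the second statement fails (or the session dies between them),
-- the debit has already committed and the $500 simply vanishes:
UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 500 WHERE account_id = 'B';

-- Wrapped explicitly, a failure anywhere lets you roll back both:
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 500 WHERE account_id = 'B';
COMMIT;
```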
A critical concept for DML mastery is understanding that SQL operates on sets, not individual rows. Every DML statement processes an entire set of rows at once; cursors are the procedural exception. This set-based paradigm is fundamentally different from procedural programming and is the key to writing efficient SQL.
The Anti-Pattern: Row-by-Row Processing
Programmers accustomed to procedural languages often write SQL like this (pseudocode):
```
for each row in (SELECT * FROM employees):
    UPDATE employees SET processed = true WHERE id = row.id
```
This pattern, implemented via cursors or application loops, is catastrophically slow compared to set-based equivalents.
```sql
-- =====================================================
-- Set-Based vs Row-Based: Practical Examples
-- =====================================================

-- PROBLEM: Mark all orders over $1000 as 'HIGH_VALUE'

-- ❌ ROW-BY-ROW (via cursor - DO NOT DO THIS)
-- DECLARE order_cursor CURSOR FOR SELECT order_id FROM orders WHERE total > 1000;
-- OPEN order_cursor;
-- LOOP
--     FETCH order_cursor INTO @order_id;
--     UPDATE orders SET category = 'HIGH_VALUE' WHERE order_id = @order_id;
-- END LOOP;
-- CLOSE order_cursor;
-- TIME: Minutes to hours for large tables

-- ✅ SET-BASED (proper SQL)
UPDATE orders SET category = 'HIGH_VALUE' WHERE total > 1000;
-- TIME: Seconds, regardless of table size

-- =====================================================
-- PROBLEM: Copy active employees to a new table

-- ❌ ROW-BY-ROW (from application code)
-- for row in db.query("SELECT * FROM employees WHERE status = 'ACTIVE'"):
--     db.execute("INSERT INTO active_employees VALUES (...)", row)
-- TIME: 1000 rows = 1000 INSERT round-trips

-- ✅ SET-BASED (single statement)
INSERT INTO active_employees
SELECT * FROM employees WHERE status = 'ACTIVE';
-- TIME: One operation, bulk optimized

-- =====================================================
-- PROBLEM: Calculate and update running totals

-- ❌ ROW-BY-ROW thinking: loop through, accumulate, update each row

-- ✅ SET-BASED: Window function + UPDATE
UPDATE sales s
SET running_total = sub.running_total
FROM (
    SELECT sale_id,
           SUM(amount) OVER (ORDER BY sale_date) AS running_total
    FROM sales
) sub
WHERE s.sale_id = sub.sale_id;
```

When writing SQL, ask: 'What set of rows do I need to affect, and what transformation do I need to apply?' NOT 'How do I loop through rows and modify each one?' This mental shift is the difference between SQL that runs in seconds vs. hours. If you find yourself reaching for a cursor or loop, pause and look for a set-based alternative.
Data Manipulation Language is the workhorse of database operations—the statements that applications execute millions of times daily to create, read, update, and delete data. DML mastery is essential for anyone who works with databases.
Let's consolidate the key concepts from this page:
- DML comprises the four CRUD statements: SELECT (read), INSERT (create), UPDATE (modify), and DELETE (remove).
- Unlike DDL, DML operates on data rather than metadata, participates in transactions, and can be rolled back before commit.
- SELECT is read-only and highly concurrent; its clauses execute in a fixed logical order (FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT/OFFSET).
- INSERT, UPDATE, and DELETE all carry hidden costs: index maintenance, constraint validation, and transaction logging.
- Always verify UPDATE and DELETE targets with a SELECT first; a missing WHERE clause affects every row in the table.
- Think in sets, not loops: one set-based statement outperforms thousands of row-by-row operations.
What's Next:
With DML mastered, we turn to the language that controls who can execute these operations—Data Control Language (DCL). The next page explores GRANT and REVOKE: the statements that implement database security and access control.
You now understand DML: the language that manipulates database data. From SELECT's query power to INSERT/UPDATE/DELETE's modification capabilities, DML is the bridge between applications and stored data. Next, we'll explore DCL—the language that secures access to database operations.