If DDL defines the skeleton of a database—its structure and form—then Data Manipulation Language (DML) is the living, breathing circulatory system that pumps data through it. DML is where databases transition from static definitions to dynamic, working systems that power applications, process transactions, and deliver insights.
Every application you've ever used—every e-commerce checkout, every social media post, every banking transaction—ultimately translates into DML statements executed against a database. When you add an item to your cart, an INSERT runs. When you update your profile, an UPDATE executes. When you search for products, a SELECT queries millions of rows. When you delete a message, a DELETE removes it permanently.
DML accounts for the overwhelming majority of database activity in production systems, often 99.9% or more. While DDL might run occasionally during deployments, DML runs continuously—thousands, millions, even billions of times per day in large-scale systems. Mastering DML is therefore not optional; it is the core competency of anyone who works with databases.
By the end of this page, you will deeply understand all four DML statements—SELECT, INSERT, UPDATE, and DELETE—their precise semantics, performance characteristics, and real-world usage patterns. You'll learn how DML interacts with transactions, how it affects storage and indexes, and the critical differences between set-based and row-based operations.
Data Manipulation Language (DML) is the subset of SQL that provides commands for interacting with the data stored in database tables. While DDL defines what containers exist, DML operates on the contents of those containers—creating, reading, updating, and deleting the actual information stored within.
The Four Fundamental Operations (CRUD):
DML provides exactly four operations that map directly to the CRUD paradigm fundamental to all data management:
| Operation | SQL Statement | Action | Example |
|---|---|---|---|
| Create | INSERT | Add new rows to a table | INSERT INTO users VALUES (...) |
| Read | SELECT | Retrieve data from one or more tables | SELECT * FROM users WHERE ... |
| Update | UPDATE | Modify existing rows in a table | UPDATE users SET email = ... |
| Delete | DELETE | Remove rows from a table | DELETE FROM users WHERE ... |
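To make the mapping concrete, here is a minimal sketch of one full CRUD lifecycle against a hypothetical `users` table (the column names are illustrative, not from a specific schema):

```sql
-- Create: add a new user
INSERT INTO users (username, email)
VALUES ('jsmith', 'jsmith@example.com');

-- Read: retrieve the row we just created
SELECT user_id, username, email
FROM users
WHERE username = 'jsmith';

-- Update: change the user's email
UPDATE users
SET email = 'john.smith@example.com'
WHERE username = 'jsmith';

-- Delete: remove the user
DELETE FROM users
WHERE username = 'jsmith';
```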
DML vs DDL: The Fundamental Distinction:
The distinction between DML and DDL is not merely categorical—it reflects fundamentally different operations with different behaviors:
| Aspect | DML | DDL |
|---|---|---|
| Target | Data (table rows) | Metadata (table definitions) |
| Frequency | Constant (production workload) | Occasional (deployments) |
| Transaction | Participates in transactions | Auto-commits (most RDBMS) |
| Rollback | Can be rolled back | Cannot (usually) |
| Locking | Row-level or page-level locks | Schema-level locks |
| Logging | Full transaction logging | Minimal logging |
| Volume | Billions of operations/day possible | Hundreds at most |
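One way to see the transaction-behavior difference in practice: DML changes can be undone before commit, while DDL in many systems commits implicitly (PostgreSQL, which supports transactional DDL, is a notable exception). A minimal sketch, assuming a hypothetical `employees` table:

```sql
BEGIN;
DELETE FROM employees WHERE department_id = 99;        -- DML: provisional until commit
SELECT COUNT(*) FROM employees WHERE department_id = 99; -- returns 0 inside the transaction
ROLLBACK;                                               -- the deleted rows reappear
-- In MySQL, by contrast, a DDL statement such as DROP TABLE would have
-- implicitly committed everything before it and could not be rolled back.
```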
Online Transaction Processing (OLTP) systems are dominated by DML. High-frequency INSERT, UPDATE, and DELETE operations characterize transactional workloads in banking, e-commerce, and operational systems. The ability to execute thousands of DML operations per second while maintaining ACID properties is the defining challenge of database engineering.
The SELECT statement is arguably the most important and most complex statement in SQL. It retrieves data from one or more tables based on specified criteria, transforming raw stored data into meaningful information. SELECT accounts for the vast majority of database operations in most systems.
Why SELECT Is Special:
Unlike INSERT, UPDATE, and DELETE which modify data, SELECT is a read-only operation. This non-destructive nature allows SELECT statements to run without acquiring exclusive locks (in most isolation levels), enabling high concurrency. However, SELECT's apparent simplicity belies enormous complexity:
```sql
-- =====================================================
-- SELECT Statement: Complete Anatomy
-- =====================================================

-- The full SELECT clause order (not all are required):
SELECT                  -- 5. Project: Choose columns to return
    [DISTINCT]          -- Optional: Remove duplicate rows
    column_list,        -- Columns, expressions, aggregates
    aggregate_functions
FROM                    -- 1. Source: Identify tables
    table_name
    [JOIN other_table   -- Combine multiple tables
        ON condition]
WHERE                   -- 2. Filter: Row-level conditions (before grouping)
    conditions
GROUP BY                -- 3. Aggregate: Group rows for aggregate functions
    columns
HAVING                  -- 4. Filter: Group-level conditions (after grouping)
    aggregate_conditions
ORDER BY                -- 6. Sort: Control row order in results
    columns [ASC|DESC]
LIMIT                   -- 7. Paginate: Restrict number of rows returned
    row_count
OFFSET                  -- 8. Skip: Start from the N-th row
    skip_count;

-- EXECUTION ORDER is different from WRITTEN ORDER:
-- FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT/OFFSET
```
```sql
-- =====================================================
-- SELECT: From Simple to Complex
-- =====================================================

-- 1. Simple retrieval: All columns from a table
SELECT * FROM employees;

-- 2. Column projection: Specific columns only
SELECT first_name, last_name, email FROM employees;

-- 3. Filtering: WHERE clause with conditions
SELECT * FROM employees
WHERE department_id = 10 AND salary > 50000;

-- 4. Sorting: ORDER BY for result ordering
SELECT first_name, last_name, hire_date
FROM employees
ORDER BY hire_date DESC, last_name ASC;

-- 5. Aliasing: Rename columns in output
SELECT
    first_name AS "First Name",
    last_name  AS "Last Name",
    salary * 12 AS annual_salary
FROM employees;

-- 6. DISTINCT: Remove duplicate values
SELECT DISTINCT department_id FROM employees;

-- 7. Aggregation: Compute summaries
SELECT
    department_id,
    COUNT(*)       AS employee_count,
    AVG(salary)    AS avg_salary,
    MIN(hire_date) AS earliest_hire,
    MAX(salary)    AS highest_salary,
    SUM(salary)    AS total_payroll
FROM employees
GROUP BY department_id;

-- 8. HAVING: Filter aggregated results
SELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;

-- 9. JOIN: Combine data from multiple tables
SELECT
    e.first_name,
    e.last_name,
    d.department_name,
    m.first_name AS manager_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
LEFT JOIN employees m ON e.manager_id = m.employee_id;

-- 10. Subquery: Query within a query
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- 11. CTE: Common Table Expression for readable complex queries
WITH high_earners AS (
    SELECT * FROM employees WHERE salary > 100000
),
department_stats AS (
    SELECT department_id, COUNT(*) AS count
    FROM high_earners
    GROUP BY department_id
)
SELECT d.department_name, ds.count
FROM department_stats ds
JOIN departments d ON ds.department_id = d.department_id;

-- 12. Window Functions: Analytics within result sets
SELECT
    first_name,
    department_id,
    salary,
    AVG(salary) OVER (PARTITION BY department_id) AS dept_avg,
    salary - AVG(salary) OVER (PARTITION BY department_id) AS diff_from_avg,
    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
FROM employees;
```

SELECT performance depends heavily on: (1) Indexing—ensure WHERE and JOIN columns are indexed. (2) Selectivity—WHERE clauses that filter more data perform better. (3) Projection—SELECT only needed columns, avoid SELECT *. (4) JOIN order—start with the most selective table. Understanding EXPLAIN/EXPLAIN ANALYZE output is essential for query optimization.
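As a starting point for that kind of tuning, here is a minimal sketch of inspecting a query plan (PostgreSQL syntax; the index name in the comment is hypothetical):

```sql
EXPLAIN ANALYZE
SELECT first_name, last_name
FROM employees
WHERE department_id = 10;

-- In the output, compare access methods:
--   "Seq Scan on employees"               -> every row was read; consider an index
--   "Index Scan using idx_employees_dept" -> the planner used an index
-- EXPLAIN alone shows the plan; adding ANALYZE actually executes the query
-- and reports real row counts and timings alongside the estimates.
```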
The INSERT statement adds new rows to a table. It is the only DML statement that increases a table's row count, and it is the primary mechanism by which data enters a system.
INSERT Semantics:
Every INSERT must provide values for all columns that:
- Are declared NOT NULL, and
- Have no DEFAULT value and are not auto-generated (e.g., identity or serial columns).

Omitted columns receive their DEFAULT value or NULL.
The DBMS validates each inserted row against all column constraints, CHECK constraints, and foreign key constraints before the row is permanently added.
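A minimal sketch of how these rules play out, using a hypothetical table (PostgreSQL-style SERIAL syntax):

```sql
CREATE TABLE tasks (
    task_id     SERIAL PRIMARY KEY,          -- auto-generated: may be omitted
    title       VARCHAR(200) NOT NULL,       -- NOT NULL, no default: must be supplied
    status      VARCHAR(20) DEFAULT 'OPEN',  -- has a default: may be omitted
    assignee_id INT REFERENCES employees(employee_id)  -- nullable FK: may be omitted
);

-- Valid: only the mandatory column is supplied; the rest use defaults or NULL
INSERT INTO tasks (title) VALUES ('Write report');

-- Fails: title violates its NOT NULL constraint
-- INSERT INTO tasks (status) VALUES ('OPEN');

-- Fails if employee 9999 does not exist: FK validation rejects the row
-- INSERT INTO tasks (title, assignee_id) VALUES ('Review', 9999);
```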
```sql
-- =====================================================
-- INSERT Statement: Complete Guide
-- =====================================================

-- 1. Single row insert with explicit columns
INSERT INTO employees (first_name, last_name, email, department_id, salary)
VALUES ('John', 'Smith', 'john.smith@company.com', 10, 75000);

-- 2. Single row with all columns (order must match table definition)
INSERT INTO employees VALUES
    (101, 'Jane', 'Doe', 'jane.doe@company.com', '2024-01-15', NULL,
     82000, 'ACTIVE', 10, NULL, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);

-- 3. Multiple rows in a single statement (more efficient)
INSERT INTO employees (first_name, last_name, email, department_id, salary)
VALUES
    ('Alice', 'Johnson', 'alice.j@company.com', 20, 65000),
    ('Bob', 'Williams', 'bob.w@company.com', 20, 71000),
    ('Carol', 'Brown', 'carol.b@company.com', 10, 68000);

-- 4. INSERT ... SELECT: Insert from query results
INSERT INTO employee_archive (employee_id, first_name, last_name, termination_date)
SELECT employee_id, first_name, last_name, termination_date
FROM employees
WHERE employment_status = 'TERMINATED';

-- 5. INSERT with DEFAULT values
INSERT INTO employees (first_name, last_name, email, department_id, salary)
VALUES ('New', 'Hire', 'new.hire@company.com', 10, DEFAULT);

-- 6. INSERT with RETURNING (PostgreSQL) - get inserted data back
INSERT INTO employees (first_name, last_name, email, department_id, salary)
VALUES ('David', 'Lee', 'david.lee@company.com', 30, 90000)
RETURNING employee_id, first_name, email;

-- 7. INSERT or UPDATE on conflict (UPSERT)
-- PostgreSQL: ON CONFLICT
INSERT INTO employees (employee_id, first_name, last_name, email, department_id, salary)
VALUES (101, 'John', 'Smith', 'john.smith@company.com', 10, 80000)
ON CONFLICT (employee_id) DO UPDATE
SET salary = EXCLUDED.salary,
    updated_at = CURRENT_TIMESTAMP;

-- MySQL: ON DUPLICATE KEY UPDATE
-- INSERT INTO employees (employee_id, first_name, ...) VALUES (...)
-- ON DUPLICATE KEY UPDATE salary = VALUES(salary);

-- 8. INSERT IGNORE (MySQL) - Skip rows that would violate constraints
-- INSERT IGNORE INTO employees ...

-- 9. Bulk insert from external file (PostgreSQL COPY)
-- COPY employees FROM '/path/to/employees.csv' WITH (FORMAT csv, HEADER true);
```

INSERT Performance Considerations:
INSERT operations have significant system impact beyond simply adding data:
Index Maintenance: Every index on the table must be updated. A table with 10 indexes means 10 additional structures to modify per INSERT. This is why high-frequency insert tables often minimize indexes.
Constraint Validation: Foreign key constraints require lookups to parent tables. CHECK constraints must be evaluated. UNIQUE constraints require index probes.
Transaction Logging: Every INSERT is logged to the transaction log (WAL/redo log) for durability and recovery. High-volume inserts can stress logging infrastructure.
Lock Contention: While row-level locking minimizes contention, extremely high INSERT rates can create page-level contention on table hot spots.
Auto-increment Allocation: Sequence-backed IDs require atomic operations that can become bottlenecks at extreme scale.
For large data loads, individual INSERT statements are catastrophically slow. Use: (1) Multi-row INSERT (insert many rows per statement). (2) COPY command (PostgreSQL) or LOAD DATA (MySQL) for file-based bulk loading. (3) Disable indexes during load, rebuild after. (4) Wrap in single transaction or use autocommit=off. (5) Consider unlogged tables or bulk load utilities for extreme volumes.
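Combining several of these techniques, a bulk-load session might look like the following sketch (PostgreSQL syntax; the file path and index name are hypothetical):

```sql
-- Drop secondary indexes before the load (keep the primary key)
DROP INDEX IF EXISTS idx_employees_email;

BEGIN;  -- one transaction instead of one commit per row

-- File-based bulk load: far faster than individual INSERT statements
COPY employees (first_name, last_name, email, department_id, salary)
FROM '/data/employees.csv'
WITH (FORMAT csv, HEADER true);

COMMIT;

-- Rebuild the index once, over the full data set
CREATE INDEX idx_employees_email ON employees (email);
```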
The UPDATE statement modifies the values of existing rows in a table. It is one of the most powerful—and therefore dangerous—DML statements. An UPDATE without a proper WHERE clause will modify every row in the table, often with catastrophic consequences.
UPDATE Semantics:
UPDATE conceptually works in two phases:
1. Identify: Evaluate the WHERE clause to determine the set of target rows.
2. Modify: Apply the SET assignments to every row in that set.
All identified rows are modified atomically as part of the enclosing transaction.
```sql
-- =====================================================
-- UPDATE Statement: Complete Guide
-- =====================================================

-- 1. Basic UPDATE: Single column, single row (by primary key)
UPDATE employees SET salary = 85000 WHERE employee_id = 101;

-- 2. UPDATE multiple columns
UPDATE employees
SET salary = 90000,
    employment_status = 'ACTIVE',
    updated_at = CURRENT_TIMESTAMP
WHERE employee_id = 101;

-- 3. UPDATE with calculations
UPDATE employees
SET salary = salary * 1.10  -- 10% raise
WHERE department_id = 20;

-- 4. UPDATE with expressions from other columns
UPDATE employees
SET email = LOWER(first_name) || '.' || LOWER(last_name) || '@newdomain.com'
WHERE email LIKE '%@olddomain.com';

-- 5. UPDATE with subquery (correlated)
UPDATE employees e
SET salary = (
    SELECT AVG(salary) FROM employees
    WHERE department_id = e.department_id
)
WHERE salary < 30000;

-- 6. UPDATE with JOIN (PostgreSQL syntax)
UPDATE employees e
SET salary = e.salary * 1.15
FROM departments d
WHERE e.department_id = d.department_id
  AND d.department_name = 'Engineering';

-- 7. UPDATE with JOIN (MySQL syntax)
-- UPDATE employees e
-- JOIN departments d ON e.department_id = d.department_id
-- SET e.salary = e.salary * 1.15
-- WHERE d.department_name = 'Engineering';

-- 8. UPDATE with CASE expression
UPDATE employees
SET employment_status = CASE
    WHEN termination_date IS NOT NULL THEN 'TERMINATED'
    WHEN hire_date > CURRENT_DATE - INTERVAL '90 days' THEN 'PROBATION'
    ELSE 'ACTIVE'
END;

-- 9. UPDATE with RETURNING (PostgreSQL) - see what was changed
UPDATE employees
SET salary = salary * 1.05
WHERE department_id = 10
RETURNING employee_id, first_name, salary AS new_salary;

-- 10. UPDATE with LIMIT (MySQL) - update only N rows
-- UPDATE employees SET processed = true
-- WHERE processed = false
-- LIMIT 1000;

-- 11. NEVER DO THIS - Updates ALL rows!
-- UPDATE employees SET salary = 0;  -- DISASTER!
```

Before executing any UPDATE in production, first run a SELECT with the identical WHERE clause to verify exactly which rows will be affected. This simple practice has prevented countless disasters:

```sql
-- Step 1: Verify target rows
SELECT * FROM employees WHERE department_id = 10;

-- Step 2: If the results look correct, execute the UPDATE
UPDATE employees SET salary = salary * 1.10 WHERE department_id = 10;
```
The DELETE statement removes rows from a table. Like UPDATE, DELETE operates on all rows matching the WHERE clause. A DELETE without WHERE removes every row in the table—a destructive operation that differs from TRUNCATE primarily in its transaction behavior and logging.
DELETE Semantics:
DELETE:
- Removes entire rows; to clear individual column values, use UPDATE ... SET column = NULL instead.
- Operates on exactly the rows matching the WHERE clause; without a WHERE clause, it targets every row in the table.
- Fires any DELETE triggers and enforces foreign key actions (RESTRICT, CASCADE, SET NULL).
- Participates fully in transactions and can be rolled back before COMMIT.
```sql
-- =====================================================
-- DELETE Statement: Complete Guide
-- =====================================================

-- 1. Basic DELETE: Remove specific rows
DELETE FROM employees WHERE employee_id = 101;

-- 2. DELETE with compound conditions
DELETE FROM employees
WHERE employment_status = 'TERMINATED'
  AND termination_date < CURRENT_DATE - INTERVAL '7 years';

-- 3. DELETE with subquery
DELETE FROM orders
WHERE customer_id IN (
    SELECT customer_id FROM customers
    WHERE account_status = 'CLOSED'
);

-- 4. DELETE with EXISTS (often more efficient than IN)
DELETE FROM order_items oi
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.order_id = oi.order_id
      AND o.order_status = 'CANCELLED'
);

-- 5. DELETE with JOIN (PostgreSQL using USING)
DELETE FROM employees e
USING departments d
WHERE e.department_id = d.department_id
  AND d.is_active = false;

-- 6. DELETE with JOIN (MySQL syntax)
-- DELETE e FROM employees e
-- JOIN departments d ON e.department_id = d.department_id
-- WHERE d.is_active = false;

-- 7. DELETE with RETURNING (PostgreSQL) - capture deleted data
DELETE FROM employees
WHERE employment_status = 'TERMINATED'
RETURNING employee_id, first_name, last_name, email;

-- 8. DELETE with LIMIT (MySQL) - delete in batches
-- Useful for large deletes to avoid long-held locks
-- DELETE FROM log_entries WHERE created_at < '2023-01-01' LIMIT 10000;

-- 9. Soft delete pattern (common alternative to DELETE)
UPDATE employees
SET deleted_at = CURRENT_TIMESTAMP,
    deleted = true
WHERE employee_id = 101;
-- Later: SELECT * FROM employees WHERE deleted = false;

-- 10. DELETE with cascade behavior (depends on FK definition)
-- If the FK was defined with ON DELETE CASCADE:
DELETE FROM departments WHERE department_id = 10;
-- This automatically deletes all employees in department 10!

-- 11. NEVER DO THIS (unless intentional):
-- DELETE FROM employees;  -- Removes ALL data!
```

DELETE Performance and Storage:
DELETE operations have significant implications beyond just removing data:
1. Storage Reclamation: DELETE does not immediately reclaim disk space. Deleted rows are marked as "dead" but the space remains allocated. PostgreSQL requires VACUUM to reclaim space; MySQL InnoDB may need OPTIMIZE TABLE for full reclamation.
2. Index Maintenance: Every index must be updated to remove references to deleted rows. Tables with many indexes experience proportionally slower DELETE performance.
3. Foreign Key Checks: If other tables reference the rows being deleted, the DBMS must check referential integrity. Depending on the ON DELETE action (RESTRICT, CASCADE, SET NULL), this may block the delete, cascade to other tables, or update foreign key values.
4. Transaction Log Volume: Unlike TRUNCATE (minimal logging), DELETE logs each deleted row individually. Deleting millions of rows generates massive transaction logs, which affects backup size, replication lag, and point-in-time recovery.
5. Lock Escalation: Deleting large numbers of rows may cause lock escalation from row-level to table-level locks, blocking other transactions. Deleting in bounded batches, as sketched below, keeps each transaction small.
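A common mitigation for points 1, 4, and 5 is to purge in bounded batches and reclaim space afterwards. A minimal sketch in PostgreSQL syntax (which lacks DELETE ... LIMIT, hence the subquery; the log_entries table is hypothetical):

```sql
-- Repeat this statement until it reports 0 rows deleted:
DELETE FROM log_entries
WHERE id IN (
    SELECT id
    FROM log_entries
    WHERE created_at < '2023-01-01'
    LIMIT 10000          -- bound the work (and the locks) per transaction
);

-- After the purge, reclaim dead-row space and refresh planner statistics
VACUUM (ANALYZE) log_entries;
```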
Many production systems never physically DELETE data. Instead, they use 'soft delete' patterns—adding a deleted_at timestamp or is_deleted boolean. This preserves data for auditing, enables 'undelete' functionality, and avoids referential integrity complications. Application queries simply filter WHERE deleted = false. Physical deletion happens later through scheduled maintenance jobs.
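If you adopt soft deletes, a partial index can keep the common "live rows only" queries fast. A sketch of the deleted_at variant of the pattern (partial indexes are a PostgreSQL feature; the index name is hypothetical):

```sql
-- 'Delete' a row by stamping it rather than removing it
UPDATE employees
SET deleted_at = CURRENT_TIMESTAMP
WHERE employee_id = 101;

-- Index only the live rows, so active-employee lookups stay small and fast
CREATE INDEX idx_employees_live
ON employees (last_name)
WHERE deleted_at IS NULL;

-- Application queries filter the same way
SELECT * FROM employees WHERE deleted_at IS NULL;
```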
A defining characteristic of DML (that distinguishes it from DDL in most databases) is its participation in transactions. DML statements can be grouped into atomic units of work that either fully succeed or fully fail, maintaining database consistency.
Transaction ACID Properties:
DML operations within transactions guarantee:
| Property | Meaning | DML Example |
|---|---|---|
| Atomicity | All operations complete or none do | Transfer: debit AND credit both happen or neither |
| Consistency | Data moves from one valid state to another | Constraints are never violated after commit |
| Isolation | Concurrent transactions don't interfere | Other users don't see uncommitted changes |
| Durability | Committed changes survive crashes | Once committed, data persists even through power loss |
```sql
-- =====================================================
-- DML with Transaction Control
-- =====================================================

-- 1. Basic transaction: Transfer $500 from account A to account B
BEGIN;

UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 500 WHERE account_id = 'B';

-- Verify the transfer
SELECT account_id, balance FROM accounts WHERE account_id IN ('A', 'B');

-- If everything looks correct:
COMMIT;
-- If something went wrong:
-- ROLLBACK;

-- 2. Transaction with savepoints (partial rollback)
BEGIN;

INSERT INTO orders (customer_id, total) VALUES (100, 299.99);
-- Assume this returns order_id = 5001

SAVEPOINT after_order;

INSERT INTO order_items (order_id, product_id, quantity) VALUES (5001, 'PROD-A', 2);
INSERT INTO order_items (order_id, product_id, quantity) VALUES (5001, 'PROD-B', 1);

-- Oops, PROD-C is out of stock
INSERT INTO order_items (order_id, product_id, quantity) VALUES (5001, 'PROD-C', 3);
-- This might fail due to an inventory constraint

ROLLBACK TO after_order;  -- Undo only the items, keep the order

-- Try an alternative product
INSERT INTO order_items (order_id, product_id, quantity) VALUES (5001, 'PROD-D', 3);

COMMIT;

-- 3. Read-only transaction (optimization for reporting)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;

SELECT COUNT(*) FROM orders WHERE order_date = CURRENT_DATE;
SELECT SUM(total) FROM orders WHERE order_date = CURRENT_DATE;

COMMIT;

-- 4. Explicit locking for critical updates
BEGIN;

-- Lock the row to prevent concurrent modification
SELECT * FROM accounts WHERE account_id = 'A' FOR UPDATE;

-- Now safe to read, calculate, and update
UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A';

COMMIT;
```

By default, many database clients operate in 'autocommit' mode—each DML statement is immediately committed as its own transaction. This is convenient for ad-hoc queries but dangerous for multi-statement operations that should succeed or fail together. Disable autocommit (or issue explicit BEGIN ... COMMIT transactions) for any operation requiring atomicity.
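To see why this matters, consider the transfer example again. A sketch of the autocommit failure mode (using the same hypothetical accounts table as above):

```sql
-- With autocommit ON, these are two independent transactions.
-- If the second statement fails (or the session dies between them),
-- the debit has already committed and the $500 simply vanishes:
UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 500 WHERE account_id = 'B';

-- Wrapped explicitly, a failure anywhere lets you roll back both:
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 500 WHERE account_id = 'B';
COMMIT;
```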
A critical concept for DML mastery is understanding that SQL operates on sets, not individual rows. Every DML statement processes an entire set of rows at once; cursors are the procedural exception. This set-based paradigm is fundamentally different from procedural programming and is the key to writing efficient SQL.
The Anti-Pattern: Row-by-Row Processing
Programmers accustomed to procedural languages often write SQL like this (pseudocode):
```
for each row in (SELECT * FROM employees):
    UPDATE employees SET processed = true WHERE id = row.id
```
This pattern, implemented via cursors or application loops, is catastrophically slow compared to set-based equivalents.
```sql
-- =====================================================
-- Set-Based vs Row-Based: Practical Examples
-- =====================================================

-- PROBLEM: Mark all orders over $1000 as 'HIGH_VALUE'

-- ❌ ROW-BY-ROW (via cursor - DO NOT DO THIS)
-- DECLARE order_cursor CURSOR FOR SELECT order_id FROM orders WHERE total > 1000;
-- OPEN order_cursor;
-- LOOP
--     FETCH order_cursor INTO @order_id;
--     UPDATE orders SET category = 'HIGH_VALUE' WHERE order_id = @order_id;
-- END LOOP;
-- CLOSE order_cursor;
-- TIME: Minutes to hours for large tables

-- ✅ SET-BASED (proper SQL)
UPDATE orders SET category = 'HIGH_VALUE' WHERE total > 1000;
-- TIME: Seconds, regardless of table size

-- =====================================================
-- PROBLEM: Copy active employees to a new table

-- ❌ ROW-BY-ROW (from application code)
-- for row in db.query("SELECT * FROM employees WHERE status = 'ACTIVE'"):
--     db.execute("INSERT INTO active_employees VALUES (...)", row)
-- TIME: 1000 rows = 1000 INSERT round-trips

-- ✅ SET-BASED (single statement)
INSERT INTO active_employees
SELECT * FROM employees WHERE status = 'ACTIVE';
-- TIME: One operation, bulk optimized

-- =====================================================
-- PROBLEM: Calculate and update running totals

-- ❌ ROW-BY-ROW thinking: loop through, accumulate, update each row

-- ✅ SET-BASED: Window function + UPDATE
UPDATE sales s
SET running_total = sub.running_total
FROM (
    SELECT sale_id,
           SUM(amount) OVER (ORDER BY sale_date) AS running_total
    FROM sales
) sub
WHERE s.sale_id = sub.sale_id;
```

When writing SQL, ask: 'What set of rows do I need to affect, and what transformation do I need to apply?' NOT 'How do I loop through rows and modify each one?' This mental shift is the difference between SQL that runs in seconds vs. hours. If you find yourself reaching for a cursor or loop, pause and look for a set-based alternative.
Data Manipulation Language is the workhorse of database operations—the statements that applications execute millions of times daily to create, read, update, and delete data. DML mastery is essential for anyone who works with databases.
Let's consolidate the key concepts from this page:
- DML comprises the four CRUD statements: SELECT (read), INSERT (create), UPDATE (modify), and DELETE (remove).
- Unlike DDL, DML operates on data rather than metadata, participates in transactions, and can be rolled back before commit.
- SELECT is read-only and highly concurrent; its clauses execute in a fixed logical order (FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT/OFFSET).
- INSERT, UPDATE, and DELETE all carry hidden costs: index maintenance, constraint validation, and transaction logging.
- Always verify UPDATE and DELETE targets with a SELECT first; a missing WHERE clause affects every row in the table.
- Think in sets, not loops: one set-based statement outperforms thousands of row-by-row operations.
What's Next:
With DML mastered, we turn to the language that controls who can execute these operations—Data Control Language (DCL). The next page explores GRANT and REVOKE: the statements that implement database security and access control.
You now understand DML: the language that manipulates database data. From SELECT's query power to INSERT/UPDATE/DELETE's modification capabilities, DML is the bridge between applications and stored data. Next, we'll explore DCL—the language that secures access to database operations.