Every piece of data in a relational database started with an INSERT statement. Every customer record, every transaction, every log entry—all were brought into existence through this fundamental SQL operation. While seemingly simple, the INSERT statement is rich with nuances that separate amateur database developers from seasoned professionals.
Understanding INSERT deeply means understanding how data enters your system, how to ensure data integrity during insertion, and how to optimize for both single-record operations and massive bulk loads. In production systems handling millions of insertions per day, these details become critical.
By the end of this page, you will master all forms of the INSERT statement, understand how to handle NULL values and defaults, leverage INSERT with SELECT for data migration, and recognize performance implications of different insertion strategies. You'll be equipped to write INSERTs that are correct, efficient, and maintainable.
The INSERT statement adds new rows to a table. While the concept is straightforward, SQL provides multiple syntactic forms to accommodate different scenarios. Understanding each form—and when to use it—is essential for effective database programming.
The Basic INSERT Syntax:
The INSERT statement follows a consistent logical structure across all its variations. At its core, you specify the target table, the columns you are populating, and the values those columns should receive:
```sql
-- Complete syntax pattern for INSERT
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);

-- Example: Inserting a new employee record
INSERT INTO employees (employee_id, first_name, last_name, email, hire_date, salary)
VALUES (1001, 'Sarah', 'Mitchell', 'sarah.mitchell@company.com', '2024-01-15', 75000.00);

-- The statement above:
-- 1. Targets the 'employees' table
-- 2. Specifies exactly which columns will receive values
-- 3. Provides values in the same order as the column list
```

Always explicitly list the columns in INSERT statements. While SQL allows omitting the column list (inserting into all columns in their table definition order), this practice creates fragile code. If the table structure changes—a new column is added, or columns are reordered—existing INSERT statements may fail or silently insert data into wrong columns.
Why the Column List Matters:
Consider what happens when columns are omitted versus specified:
```sql
-- FRAGILE: No column list
-- Depends on table column order
INSERT INTO employees
VALUES (
    1001,
    'Sarah',
    'Mitchell',
    'sarah.mitchell@company.com',
    '2024-01-15',
    75000.00
);

-- If a column is added to the table,
-- this INSERT will break with:
-- "Column count doesn't match value count"
```

```sql
-- ROBUST: Explicit column list
-- Independent of table column order
INSERT INTO employees (
    employee_id,
    first_name,
    last_name,
    email,
    hire_date,
    salary
)
VALUES (
    1001,
    'Sarah',
    'Mitchell',
    'sarah.mitchell@company.com',
    '2024-01-15',
    75000.00
);

-- Table changes won't break this INSERT
-- as long as listed columns exist
```

The most common form of INSERT adds one row at a time. This is the foundation of all data entry operations, from user registration to order placement to logging events.
Complete Single Row INSERT:
A well-formed single-row INSERT includes all required columns with valid values, respecting data types and constraints:
```sql
-- Creating a sample table to demonstrate INSERT operations
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    category VARCHAR(50) NOT NULL,
    unit_price DECIMAL(10, 2) NOT NULL,
    units_in_stock INTEGER DEFAULT 0,
    discontinued BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    description TEXT
);

-- Single row INSERT with all columns specified
INSERT INTO products (
    product_id,
    product_name,
    category,
    unit_price,
    units_in_stock,
    discontinued,
    created_at,
    description
)
VALUES (
    1,
    'Wireless Bluetooth Headphones',
    'Electronics',
    79.99,
    150,
    FALSE,
    '2024-01-20 10:30:00',
    'High-quality wireless headphones with noise cancellation and 30-hour battery life.'
);

-- The INSERT:
-- 1. Assigns product_id = 1 (primary key)
-- 2. Sets required fields (product_name, category, unit_price)
-- 3. Overrides defaults for units_in_stock and discontinued
-- 4. Explicitly sets created_at (could let it default)
-- 5. Provides optional description
```

Data Type Matching:
Values must match or be implicitly convertible to the column's data type. SQL performs implicit type conversions in some cases, but relying on implicit conversion is risky and can lead to subtle bugs.
| Column Type | Valid Value Examples | Invalid/Problematic Values |
|---|---|---|
| INTEGER | 42, 0, -100 | 'forty-two', 3.14 (silently rounded or truncated rather than rejected) |
| VARCHAR(50) | 'Hello', 'A', '' | 123 (works but implicit), string > 50 chars (truncated or error) |
| DECIMAL(10,2) | 99.99, 100, 0.01 | 'expensive', too many digits (overflow) |
| DATE | '2024-01-15', DATE '2024-01-15' | 'January 15' (depends on DB), 2024-01-15 (unquoted) |
| BOOLEAN | TRUE, FALSE, 1, 0 | 'yes' (some DBs), 2 (some DBs treat as true) |
| TIMESTAMP | '2024-01-15 10:30:00', CURRENT_TIMESTAMP | '10:30 AM' (ambiguous) |
In SQL, string values must be enclosed in single quotes ('string'), not double quotes. Double quotes are reserved for identifiers (column/table names) in standard SQL. Writing "Hello" instead of 'Hello' will cause errors or unexpected behavior depending on your database system.
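To make these rules concrete, here is a small illustrative sketch against the products table defined above. The specific product values are invented for the example, and typed literals such as TIMESTAMP '...' are standard SQL but not accepted by every system:

```sql
-- Preferring explicit, unambiguous literals over implicit conversion
-- (illustrative values only)
INSERT INTO products (product_id, product_name, category, unit_price, units_in_stock, discontinued, created_at)
VALUES (
    7,                                  -- INTEGER literal: no quotes
    'Laptop Sleeve 15"',                -- string in single quotes; an embedded double quote is fine
    'Accessories',
    CAST('24.99' AS DECIMAL(10, 2)),    -- explicit cast instead of relying on implicit conversion
    60,
    FALSE,                              -- BOOLEAN literal (use 1/0 where BOOLEAN is not supported)
    TIMESTAMP '2024-01-21 09:00:00'     -- typed timestamp literal removes format ambiguity
);
```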
Modern SQL allows inserting multiple rows in a single INSERT statement. This capability dramatically improves performance compared to executing many individual INSERT statements, particularly over network connections where each statement incurs round-trip latency.
Multi-Row VALUES Syntax:
```sql
-- Inserting multiple rows in a single statement
INSERT INTO products (product_id, product_name, category, unit_price, units_in_stock)
VALUES
    (2, 'USB-C Charging Cable', 'Electronics', 12.99, 500),
    (3, 'Mechanical Keyboard', 'Electronics', 149.99, 75),
    (4, 'Ergonomic Mouse', 'Electronics', 59.99, 200),
    (5, 'Monitor Stand', 'Office', 34.99, 120),
    (6, 'Desk Lamp LED', 'Office', 45.99, 85);

-- This single statement inserts 5 rows
-- Much more efficient than 5 separate INSERT statements

-- Performance comparison (conceptual):
-- 5 separate INSERTs: 5 network round-trips, 5 parse operations, 5 commits
-- 1 multi-row INSERT: 1 network round-trip, 1 parse operation, 1 commit
```

Performance Benefits of Multi-Row INSERT:
The performance gains from multi-row INSERT come from several sources: fewer network round-trips, a single parse and plan step, and a single commit instead of many. The same pattern is a natural fit for seeding initial data:
```sql
-- Practical example: Seeding initial data for a new application

-- Step 1: Insert categories
INSERT INTO categories (category_id, category_name, description)
VALUES
    (1, 'Electronics', 'Electronic devices and accessories'),
    (2, 'Office Supplies', 'Office and workspace equipment'),
    (3, 'Software', 'Software licenses and subscriptions'),
    (4, 'Services', 'Professional services and support');

-- Step 2: Insert products with category references
INSERT INTO products (product_id, product_name, category_id, unit_price, units_in_stock, active)
VALUES
    (101, 'Laptop Pro 15"', 1, 1299.99, 50, TRUE),
    (102, 'Wireless Mouse', 1, 29.99, 200, TRUE),
    (103, 'USB Hub 7-Port', 1, 39.99, 150, TRUE),
    (104, 'Standing Desk', 2, 599.99, 25, TRUE),
    (105, 'Office Chair Ergo', 2, 449.99, 40, TRUE),
    (106, 'IDE License Annual', 3, 199.99, NULL, TRUE),  -- NULL for digital goods
    (107, 'Cloud Storage 1TB', 3, 99.99, NULL, TRUE),
    (108, 'Tech Support Monthly', 4, 149.99, NULL, TRUE);

-- Note: NULL for units_in_stock where concept doesn't apply (digital/services)
```

While multi-row INSERT is faster, there are practical limits. Most databases have maximum statement size limits. For very large data loads (thousands to millions of rows), break insertions into batches of 100-1000 rows each. This balances performance with memory usage and transaction log size.
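As a rough sketch of that batching advice (transaction syntax and the ideal batch size vary by database, and the row values here are placeholders), a large load might be split like this:

```sql
-- Hypothetical bulk load split into batches, one multi-row INSERT per batch
-- Committing per batch keeps each transaction and its log usage bounded
BEGIN;
INSERT INTO products (product_id, product_name, category, unit_price, units_in_stock)
VALUES
    (1001, 'Bulk Item 1001', 'Bulk', 9.99, 10),
    (1002, 'Bulk Item 1002', 'Bulk', 9.99, 10);
    -- ... more rows, up to a few hundred per statement
COMMIT;

BEGIN;
INSERT INTO products (product_id, product_name, category, unit_price, units_in_stock)
VALUES
    (1501, 'Bulk Item 1501', 'Bulk', 9.99, 10),
    (1502, 'Bulk Item 1502', 'Bulk', 9.99, 10);
    -- ... next batch
COMMIT;
```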
Understanding how NULL values and DEFAULT values interact with INSERT is crucial for writing correct data entry logic. These concepts determine what happens when you don't provide a value for a column.
NULL: The Absence of a Value:
NULL represents the absence of a value—not zero, not empty string, but truly unknown or inapplicable. When inserting, you can explicitly insert NULL or allow NULL by omitting the column.
```sql
-- Table definition with nullable and non-nullable columns
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,   -- Required
    last_name VARCHAR(50) NOT NULL,    -- Required
    email VARCHAR(100) NOT NULL,       -- Required
    phone VARCHAR(20),                 -- Optional (allows NULL)
    fax VARCHAR(20),                   -- Optional (allows NULL)
    middle_name VARCHAR(50),           -- Optional (allows NULL)
    notes TEXT                         -- Optional (allows NULL)
);

-- Method 1: Explicitly insert NULL for optional columns
INSERT INTO customers (customer_id, first_name, last_name, email, phone, fax, middle_name, notes)
VALUES (1, 'John', 'Smith', 'john.smith@email.com', NULL, NULL, NULL, NULL);

-- Method 2: Omit optional columns entirely (they become NULL)
INSERT INTO customers (customer_id, first_name, last_name, email)
VALUES (2, 'Jane', 'Doe', 'jane.doe@email.com');
-- phone, fax, middle_name, and notes are automatically NULL

-- Method 3: Mix - provide some optional columns, omit others
INSERT INTO customers (customer_id, first_name, last_name, email, phone)
VALUES (3, 'Robert', 'Johnson', 'r.johnson@email.com', '555-1234');
-- fax, middle_name, notes are NULL; phone has a value
```

DEFAULT Values:
Columns can have DEFAULT values that are automatically applied when no value is provided. Defaults differ from NULL—they provide an actual value rather than leaving the column as unknown.
```sql
-- Table with various DEFAULT definitions
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date DATE DEFAULT CURRENT_DATE,
    status VARCHAR(20) DEFAULT 'pending',
    priority INTEGER DEFAULT 3,
    subtotal DECIMAL(10, 2) NOT NULL,
    tax_rate DECIMAL(4, 4) DEFAULT 0.0825,
    shipping_cost DECIMAL(8, 2) DEFAULT 0.00,
    notes TEXT DEFAULT 'No special instructions',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert with all defaults applied
INSERT INTO orders (order_id, customer_id, subtotal)
VALUES (1001, 42, 150.00);
-- Result: order_date = today, status = 'pending', priority = 3,
--         tax_rate = 0.0825, shipping_cost = 0.00,
--         notes = 'No special instructions', created_at = now

-- Override specific defaults
INSERT INTO orders (order_id, customer_id, subtotal, status, priority, shipping_cost)
VALUES (1002, 42, 500.00, 'confirmed', 1, 25.00);
-- status, priority, shipping_cost are overridden; other defaults still apply

-- Using DEFAULT keyword explicitly
INSERT INTO orders (order_id, customer_id, subtotal, status, priority)
VALUES (1003, 43, 75.00, DEFAULT, DEFAULT);
-- Explicitly requests default values (same as omitting)
```

| Scenario | Behavior | Example Result |
|---|---|---|
| Column omitted, has DEFAULT | Auto-applies DEFAULT | order_date = CURRENT_DATE |
| Column omitted, no DEFAULT, allows NULL | NULL assigned | phone = NULL |
| Column omitted, NOT NULL, no DEFAULT | ERROR | Insert fails |
| NULL explicitly inserted, allows NULL | NULL assigned | notes = NULL |
| NULL explicitly inserted, NOT NULL | ERROR | Insert fails |
| DEFAULT keyword used | DEFAULT value assigned | status = 'pending' |
Explicitly inserting NULL bypasses any DEFAULT value. If a column has DEFAULT 'pending' and you insert NULL, the column will be NULL, not 'pending'. This is a common source of bugs—if you want the default, either omit the column or use the DEFAULT keyword.
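To make that pitfall concrete, here is a short sketch against the orders table above (the order values are illustrative):

```sql
-- status has DEFAULT 'pending' and allows NULL

-- Omitting the column, or passing DEFAULT, applies the default
INSERT INTO orders (order_id, customer_id, subtotal)
VALUES (1004, 44, 60.00);                 -- status = 'pending'

INSERT INTO orders (order_id, customer_id, subtotal, status)
VALUES (1005, 44, 60.00, DEFAULT);        -- status = 'pending'

-- Explicitly inserting NULL bypasses the default
INSERT INTO orders (order_id, customer_id, subtotal, status)
VALUES (1006, 44, 60.00, NULL);           -- status = NULL, not 'pending'
```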
One of the most powerful features of SQL is the ability to insert data from a query result. The INSERT ... SELECT pattern enables data transformation, migration, and complex data population without moving data to an application layer.
Basic INSERT ... SELECT Syntax:
```sql
-- Basic pattern: Insert results of a SELECT into a table
INSERT INTO target_table (column1, column2, column3)
SELECT source_col1, source_col2, source_col3
FROM source_table
WHERE condition;

-- The SELECT must produce columns matching the INSERT column list
-- in number, order, and compatible data types

-- Example: Copy all discontinued products to an archive table
INSERT INTO products_archive (product_id, product_name, category, unit_price, archived_date)
SELECT product_id, product_name, category, unit_price, CURRENT_DATE
FROM products
WHERE discontinued = TRUE;

-- The SELECT supplies CURRENT_DATE as the archived_date for every copied row
```

Data Transformation During Insert:
INSERT ... SELECT shines when you need to transform data during the copy process. You can apply functions, compute expressions, and join multiple tables:
```sql
-- Create a summary table from detailed data
CREATE TABLE monthly_sales_summary (
    summary_id INTEGER PRIMARY KEY,
    year_month VARCHAR(7),
    category VARCHAR(50),
    total_orders INTEGER,
    total_revenue DECIMAL(12, 2),
    avg_order_value DECIMAL(10, 2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Populate summary from detailed orders
-- (DATE_TRUNC / TO_CHAR shown in PostgreSQL syntax)
INSERT INTO monthly_sales_summary (summary_id, year_month, category, total_orders, total_revenue, avg_order_value)
SELECT
    ROW_NUMBER() OVER (ORDER BY DATE_TRUNC('month', o.order_date), p.category),
    TO_CHAR(DATE_TRUNC('month', o.order_date), 'YYYY-MM'),
    p.category,
    COUNT(DISTINCT o.order_id),
    SUM(oi.quantity * oi.unit_price),
    SUM(oi.quantity * oi.unit_price) / COUNT(DISTINCT o.order_id)
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= DATE '2024-01-01'
GROUP BY DATE_TRUNC('month', o.order_date), p.category;

-- This single INSERT:
-- 1. Joins orders, order_items, and products
-- 2. Aggregates data by month and category
-- 3. Computes order counts, totals, and per-order averages
-- 4. Generates unique IDs with ROW_NUMBER()
-- 5. Stores results in summary table
```

```sql
-- Data migration example: Splitting a denormalized table

-- Old structure (denormalized)
-- old_orders: order_id, customer_name, customer_email, product_name, quantity, price

-- New normalized structure
CREATE TABLE customers_new (
    customer_id INTEGER PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL,
    customer_email VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE orders_new (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers_new(customer_id),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Step 1: Populate customers (deduplicate first, then number the distinct rows)
INSERT INTO customers_new (customer_id, customer_name, customer_email)
SELECT
    ROW_NUMBER() OVER (ORDER BY customer_email),
    customer_name,
    customer_email
FROM (
    SELECT DISTINCT customer_name, customer_email
    FROM old_orders
) AS distinct_customers;

-- Step 2: Populate orders with customer references
-- DISTINCT in case the denormalized table repeats an order per line item
INSERT INTO orders_new (order_id, customer_id)
SELECT DISTINCT o.order_id, c.customer_id
FROM old_orders o
JOIN customers_new c ON o.customer_email = c.customer_email;

-- This migration:
-- 1. Extracts unique customers from denormalized data
-- 2. Assigns new customer IDs
-- 3. Links orders to new customer IDs via email matching
```

INSERT ... SELECT runs as a single operation, making it highly efficient for bulk data movement. For large operations, ensure adequate transaction log space and consider running during low-traffic periods. Some databases support parallel INSERT ... SELECT for even better performance on large datasets.
A common need after INSERT is to retrieve generated values—particularly auto-generated primary keys, default timestamps, or computed columns. Modern SQL provides mechanisms to return inserted data without a subsequent SELECT.
The RETURNING Clause (PostgreSQL, Oracle, SQLite):
```sql
-- PostgreSQL / Oracle / SQLite: RETURNING clause
-- Get auto-generated ID and timestamp after insert

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,           -- Auto-increment (PostgreSQL)
    email VARCHAR(100) NOT NULL UNIQUE,
    display_name VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT TRUE
);

-- Insert and return generated values
INSERT INTO users (email, display_name)
VALUES ('newuser@email.com', 'New User')
RETURNING user_id, created_at;

-- Result:
-- user_id | created_at
-- --------+------------------------
--      42 | 2024-01-20 10:30:00.123

-- Return all columns
INSERT INTO users (email, display_name)
VALUES ('another@email.com', 'Another User')
RETURNING *;

-- Return expressions
INSERT INTO users (email, display_name)
VALUES ('third@email.com', 'Third User')
RETURNING
    user_id,
    'User created: ' || display_name AS message,
    created_at::DATE AS created_date;
```

SQL Server: OUTPUT Clause:
SQL Server uses the OUTPUT clause with the special INSERTED pseudo-table:
```sql
-- SQL Server: OUTPUT clause
CREATE TABLE users (
    user_id INT IDENTITY(1,1) PRIMARY KEY,   -- Auto-increment (SQL Server)
    email VARCHAR(100) NOT NULL UNIQUE,
    display_name VARCHAR(100),
    created_at DATETIME2 DEFAULT GETDATE(),
    is_active BIT DEFAULT 1
);

-- Insert and output the generated values
INSERT INTO users (email, display_name)
OUTPUT INSERTED.user_id, INSERTED.created_at
VALUES ('newuser@email.com', 'New User');

-- Output all columns
INSERT INTO users (email, display_name)
OUTPUT INSERTED.*
VALUES ('another@email.com', 'Another User');

-- Output into a table variable (capture for later use)
DECLARE @InsertedUsers TABLE (user_id INT, created_at DATETIME2);

INSERT INTO users (email, display_name)
OUTPUT INSERTED.user_id, INSERTED.created_at INTO @InsertedUsers
VALUES ('third@email.com', 'Third User');

-- Use the captured data
SELECT * FROM @InsertedUsers;
```

MySQL: LAST_INSERT_ID():
MySQL doesn't have a RETURNING clause but provides the LAST_INSERT_ID() function:
```sql
-- MySQL: Using LAST_INSERT_ID()
CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(100) NOT NULL UNIQUE,
    display_name VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT TRUE
);

-- Insert the row
INSERT INTO users (email, display_name)
VALUES ('newuser@email.com', 'New User');

-- Get the generated ID (must call immediately after INSERT)
SELECT LAST_INSERT_ID();
-- Returns: 42

-- Get full row with generated values
SELECT * FROM users WHERE user_id = LAST_INSERT_ID();

-- IMPORTANT: LAST_INSERT_ID() is connection-specific
-- It's safe in concurrent environments—each connection gets its own value
```

Retrieving generated IDs is critical for application code that needs to reference newly inserted records. Without a mechanism like RETURNING, OUTPUT, or LAST_INSERT_ID(), you would be forced into a follow-up query such as SELECT MAX(user_id), which is a race condition in concurrent environments. Always use the database's native mechanism for this.
Constraints protect data integrity, and INSERT statements must satisfy all applicable constraints to succeed. Understanding constraint violations—and how to handle them—is essential for robust data operations.
Common Constraint Violations:
```sql
-- Table with multiple constraints
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    email VARCHAR(100) NOT NULL UNIQUE,
    department_id INTEGER REFERENCES departments(department_id),
    salary DECIMAL(10, 2) CHECK (salary > 0),
    hire_date DATE NOT NULL
);

-- PRIMARY KEY violation
INSERT INTO employees (employee_id, email, department_id, salary, hire_date)
VALUES (1, 'john@company.com', 10, 50000, '2024-01-15');
-- Works

INSERT INTO employees (employee_id, email, department_id, salary, hire_date)
VALUES (1, 'jane@company.com', 10, 55000, '2024-01-16');
-- ERROR: duplicate key value violates unique constraint "employees_pkey"

-- UNIQUE violation
INSERT INTO employees (employee_id, email, department_id, salary, hire_date)
VALUES (2, 'john@company.com', 10, 55000, '2024-01-16');
-- ERROR: duplicate key value violates unique constraint "employees_email_key"

-- NOT NULL violation
INSERT INTO employees (employee_id, email, department_id, salary, hire_date)
VALUES (2, NULL, 10, 55000, '2024-01-16');
-- ERROR: null value in column "email" violates not-null constraint

-- FOREIGN KEY violation
INSERT INTO employees (employee_id, email, department_id, salary, hire_date)
VALUES (2, 'jane@company.com', 999, 55000, '2024-01-16');
-- ERROR: insert or update on table "employees" violates foreign key constraint
-- Detail: Key (department_id)=(999) is not present in table "departments"

-- CHECK violation
INSERT INTO employees (employee_id, email, department_id, salary, hire_date)
VALUES (2, 'jane@company.com', 10, -5000, '2024-01-16');
-- ERROR: new row violates check constraint "employees_salary_check"
```

| Constraint | Violated By | Resolution Strategy |
|---|---|---|
| PRIMARY KEY | Duplicate key values | Use unique value or auto-generate |
| UNIQUE | Duplicate values in column(s) | Check existence before insert, or use UPSERT |
| NOT NULL | NULL in required column | Provide a value or set DEFAULT |
| FOREIGN KEY | Reference to non-existent parent | Insert parent first, or verify existence |
| CHECK | Value fails boolean expression | Validate data before insert |
In application code, catch constraint violation exceptions and handle them gracefully. For UNIQUE violations, you might want to update instead of insert (see MERGE/UPSERT). For FOREIGN KEY violations, you might queue the insert until the parent record exists.
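As a preview of the UPSERT approach mentioned above (the syntax is dialect-specific; the sketch below uses PostgreSQL's ON CONFLICT, while MySQL uses ON DUPLICATE KEY UPDATE and SQL Server uses MERGE):

```sql
-- PostgreSQL: turn a UNIQUE violation on email into an update instead of an error
INSERT INTO employees (employee_id, email, department_id, salary, hire_date)
VALUES (2, 'john@company.com', 10, 55000, '2024-01-16')
ON CONFLICT (email)
DO UPDATE SET
    department_id = EXCLUDED.department_id,   -- EXCLUDED holds the values the INSERT attempted
    salary        = EXCLUDED.salary;
```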
The INSERT statement is your gateway to populating relational databases. While conceptually simple, mastering its nuances leads to more robust, efficient, and maintainable data operations.
What's Next:
Now that you can add data to tables, the next page covers the UPDATE statement—how to modify existing data efficiently while maintaining referential integrity and data consistency.
You now have a comprehensive understanding of the INSERT statement in all its forms. From single-row insertion to bulk operations, from NULL handling to constraint management, you're equipped to write INSERTs that are correct, performant, and production-ready.