Loading content...
Every relational database begins with a table. Before you can store a single row of data, query any records, or build relationships between entities, you must first define the structure that will hold your data. This is the role of the CREATE TABLE statement—the most fundamental command in SQL's Data Definition Language (DDL).
The CREATE TABLE statement is more than just syntax to memorize. It represents a design decision that echoes throughout your entire system. The choices you make during table creation—column types, constraints, keys, and storage parameters—directly impact query performance, data integrity, application behavior, and maintenance burden for years to come.
In production systems serving millions of users, a poorly designed table schema can cost hundreds of thousands of dollars in compute resources, create nightmare debugging scenarios, and require painful migrations. Conversely, a well-designed table structure becomes the invisible foundation that makes everything else possible.
By the end of this page, you will understand the complete anatomy of CREATE TABLE statements, from basic syntax to advanced features. You'll learn how to define columns with appropriate data types, add constraints that enforce data integrity, and make storage and performance decisions that scale. Most importantly, you'll develop the judgment to create tables that are correct, performant, and maintainable.
At its core, the CREATE TABLE statement instructs the database engine to allocate storage structures and define metadata for a new table. When you execute this command, the database:
Let's begin with the fundamental syntax structure:
1234567891011121314151617
-- Basic CREATE TABLE SyntaxCREATE TABLE table_name ( column1_name data_type [column_constraints], column2_name data_type [column_constraints], ... [table_constraints]); -- Minimal Example: A Simple Employee TableCREATE TABLE employees ( employee_id INTEGER, first_name VARCHAR(50), last_name VARCHAR(50), email VARCHAR(100), hire_date DATE, salary DECIMAL(10, 2));Key components of the CREATE TABLE statement:
CREATE TABLE — The DDL keyword pair initiating table creationtable_name — A unique identifier for the table within its schemacolumn_name — Identifier for each column (field)data_type — The type of data the column can store (INTEGER, VARCHAR, DATE, etc.)column_constraints — Rules applied to individual columns (NOT NULL, UNIQUE, etc.)table_constraints — Rules spanning multiple columns or the entire table (composite keys, foreign keys)Some database systems (like PostgreSQL, Oracle) support CREATE OR REPLACE TABLE which replaces an existing table with the same name. However, standard SQL uses CREATE TABLE which fails if the table already exists. Many databases also support CREATE TABLE IF NOT EXISTS to prevent errors when the table may already exist—particularly useful in idempotent deployment scripts.
Every column definition consists of a name, a data type, and zero or more constraints. Understanding how to craft precise column definitions is essential because:
Let's examine each component in detail:
NOT NULL is specified.1234567891011121314151617181920212223242526272829303132333435363738
-- Comprehensive Column Definition ExamplesCREATE TABLE products ( -- Simple column with just name and type product_id INTEGER, -- Column with NOT NULL constraint product_name VARCHAR(200) NOT NULL, -- Column with default value status VARCHAR(20) DEFAULT 'active', -- Column with default using function created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- Column with inline PRIMARY KEY constraint sku VARCHAR(50) PRIMARY KEY, -- Column with inline UNIQUE constraint barcode VARCHAR(13) UNIQUE, -- Column with CHECK constraint unit_price DECIMAL(10, 2) CHECK (unit_price >= 0), -- Column with multiple constraints quantity_in_stock INTEGER NOT NULL DEFAULT 0 CHECK (quantity_in_stock >= 0), -- Column with inline FOREIGN KEY reference category_id INTEGER REFERENCES categories(category_id), -- Boolean with default is_active BOOLEAN DEFAULT TRUE, -- Computed/generated column (SQL:2003 standard) -- total_value AS (unit_price * quantity_in_stock) STORED -- Nullable column (explicit, though redundant) description TEXT NULL);Choosing between nullable and non-nullable columns is a critical design decision. NOT NULL constraints prevent data quality issues but require all inserts to provide values. Nullable columns offer flexibility but introduce three-valued logic complexity in queries. As a general rule: make columns NOT NULL unless you have a specific, valid reason why missing data makes semantic sense for that attribute.
Selecting appropriate data types is one of the most impactful decisions in table design. The wrong choice can waste storage, break queries, cause precision loss, or create application bugs. Let's explore the major data type categories with production considerations:
| Type Category | Common Types | Storage | Use Cases |
|---|---|---|---|
| Integer | TINYINT, SMALLINT, INTEGER, BIGINT | 1-8 bytes | IDs, counts, quantities, flags |
| Decimal | DECIMAL(p,s), NUMERIC(p,s) | Variable | Financial amounts, precise calculations |
| Floating Point | REAL, FLOAT, DOUBLE | 4-8 bytes | Scientific data (approximate values OK) |
| Fixed Character | CHAR(n) | n bytes (padded) | Fixed-format codes (state codes, country ISO) |
| Variable Character | VARCHAR(n) | Actual length + overhead | Names, titles, descriptions, most strings |
| Unbounded Text | TEXT, CLOB | Variable (may be stored externally) | Large documents, articles, JSON strings |
| Date | DATE | 3-4 bytes | Birth dates, scheduled dates (no time) |
| Time | TIME | 3-4 bytes | Daily schedules, duration without date |
| Timestamp | TIMESTAMP, DATETIME | 4-8 bytes | Event logs, created/modified timestamps |
| Boolean | BOOLEAN | 1 byte (typically) | Flags, toggles, binary states |
| Binary | BINARY, VARBINARY, BLOB | Variable | Files, images, encrypted data |
| UUID | UUID | 16 bytes | Distributed unique identifiers |
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253
-- Integer Types: Choose the Smallest That FitsCREATE TABLE user_metrics ( user_id BIGINT, -- IDs may exceed INTEGER range login_count INTEGER, -- Reasonable count range active_days SMALLINT, -- Max ~32K is plenty for days age TINYINT, -- 0-255 covers human ages flags SMALLINT -- Bit flags in small range); -- Decimal vs Float: Precision Matters for MoneyCREATE TABLE financial_transactions ( transaction_id BIGINT PRIMARY KEY, -- CORRECT: Use DECIMAL for money (exact precision) amount DECIMAL(15, 2) NOT NULL, -- Up to 13 digits + 2 decimals tax_rate DECIMAL(5, 4), -- e.g., 0.0825 for 8.25% -- WRONG for money: FLOAT has precision issues -- amount_wrong FLOAT -- 0.1 + 0.2 != 0.3 in floating point! -- OK for scientific/statistical data where approximation is acceptable statistical_score DOUBLE PRECISION); -- Character Types: CHAR for fixed, VARCHAR for variableCREATE TABLE locations ( location_id INTEGER PRIMARY KEY, country_code CHAR(2) NOT NULL, -- Always exactly 2 chars (US, UK, DE) state_code CHAR(2), -- Fixed format city_name VARCHAR(100) NOT NULL, -- Variable length address VARCHAR(500), -- Variable length description TEXT -- Potentially very long); -- Temporal Types: Be Precise About TimeCREATE TABLE events ( event_id BIGINT PRIMARY KEY, -- DATE: No time component (just year-month-day) event_date DATE NOT NULL, -- TIME: Just time, no date start_time TIME, -- TIMESTAMP: Full date and time created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- TIMESTAMP WITH TIME ZONE: When timezone matters scheduled_at TIMESTAMP WITH TIME ZONE, -- INTERVAL: Duration between times duration INTERVAL);Floating-point types (FLOAT, REAL, DOUBLE) use binary representation that cannot exactly represent many decimal fractions. The expression 0.1 + 0.2 in floating point does NOT equal 0.3. For financial applications, this leads to penny discrepancies that compound into significant errors. Always use DECIMAL or NUMERIC for monetary values.
Choosing the Right Size:
Every data type has storage implications. While modern databases handle most size differences efficiently, the right sizing matters at scale:
The principle: Choose the smallest type that safely accommodates your data's maximum expected value, with reasonable growth margin.
Constraints are rules the database enforces to maintain data integrity. They are your first line of defense against invalid data entering the system. Without proper constraints, applications must implement validation logic, which is error-prone and can be bypassed. With database constraints, invalid data is rejected regardless of which application or query attempts the insert.
Column constraints (also called inline constraints) are specified directly in the column definition:
CHECK (age >= 0)).12345678910111213141516171819202122232425262728293031323334353637383940
-- Comprehensive Constraint ExamplesCREATE TABLE orders ( -- PRIMARY KEY: Unique identifier, never null order_id BIGINT PRIMARY KEY, -- NOT NULL: Required field customer_id BIGINT NOT NULL, -- UNIQUE: No duplicate order numbers order_number VARCHAR(50) UNIQUE NOT NULL, -- DEFAULT: Auto-set if not provided status VARCHAR(20) DEFAULT 'pending', created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- CHECK: Validate data values total_amount DECIMAL(12, 2) CHECK (total_amount >= 0), discount_percent DECIMAL(5, 2) CHECK (discount_percent BETWEEN 0 AND 100), -- FOREIGN KEY (inline syntax) shipping_address_id BIGINT REFERENCES addresses(address_id), -- Multiple constraints on one column priority INTEGER NOT NULL DEFAULT 1 CHECK (priority BETWEEN 1 AND 5)); -- Demonstrating Constraint Behavior-- This INSERT succeeds:INSERT INTO orders (order_id, customer_id, order_number, total_amount)VALUES (1, 100, 'ORD-2024-0001', 99.99); -- This INSERT fails (negative total_amount violates CHECK):-- INSERT INTO orders (order_id, customer_id, order_number, total_amount)-- VALUES (2, 100, 'ORD-2024-0002', -50.00);-- ERROR: new row violates check constraint "orders_total_amount_check" -- This INSERT fails (missing customer_id violates NOT NULL):-- INSERT INTO orders (order_id, order_number)-- VALUES (3, 'ORD-2024-0003');-- ERROR: null value in column "customer_id" violates not-null constraintWhile databases auto-generate constraint names (like 'orders_pkey', 'orders_total_amount_check'), it's good practice to name constraints explicitly for clarity in error messages and migration scripts. Use a consistent pattern like {table}_{column}_{type} (e.g., orders_total_amount_chk, orders_customer_id_fk).
Constraint Enforcement Timing:
Constraints are enforced at specific points during DML operations:
Most constraints are immediate, meaning invalid data is rejected instantly. For complex scenarios involving mutual dependencies, deferred constraints allow temporary violations within a transaction as long as everything is consistent by commit time.
While column constraints apply to individual columns, table-level constraints (also called out-of-line constraints) can reference multiple columns and are defined after all column definitions. They're essential for:
The syntax places these constraints at the end of the column list, separated from column definitions:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354
-- Table-Level Constraints: The Complete SyntaxCREATE TABLE order_items ( order_id BIGINT NOT NULL, product_id BIGINT NOT NULL, line_number INTEGER NOT NULL, quantity INTEGER NOT NULL, unit_price DECIMAL(10, 2) NOT NULL, discount_amount DECIMAL(10, 2) DEFAULT 0, -- ===== TABLE-LEVEL CONSTRAINTS ===== -- Composite PRIMARY KEY (multiple columns form the unique identifier) CONSTRAINT order_items_pk PRIMARY KEY (order_id, line_number), -- Composite UNIQUE (unique combination of order and product) CONSTRAINT order_items_order_product_uq UNIQUE (order_id, product_id), -- FOREIGN KEY referencing parent tables CONSTRAINT order_items_order_fk FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE ON UPDATE CASCADE, CONSTRAINT order_items_product_fk FOREIGN KEY (product_id) REFERENCES products(product_id) ON DELETE RESTRICT ON UPDATE CASCADE, -- Multi-column CHECK constraint CONSTRAINT order_items_discount_chk CHECK (discount_amount >= 0 AND discount_amount <= unit_price * quantity), -- Another check involving multiple columns CONSTRAINT order_items_quantity_chk CHECK (quantity > 0)); -- Composite Foreign Key Example (when parent has composite PK)CREATE TABLE shipment_items ( shipment_id BIGINT NOT NULL, order_id BIGINT NOT NULL, line_number INTEGER NOT NULL, quantity_shipped INTEGER NOT NULL, PRIMARY KEY (shipment_id, order_id, line_number), -- Foreign key referencing composite primary key FOREIGN KEY (order_id, line_number) REFERENCES order_items(order_id, line_number));column_name INTEGER PRIMARY KEYCONSTRAINT pk PRIMARY KEY (col1, col2)Foreign key constraints can specify what happens when the referenced row is deleted or updated: CASCADE (propagate change to child rows), RESTRICT (prevent change if child rows exist), SET NULL (set foreign key to NULL), SET DEFAULT (set to default value), or NO ACTION (similar to RESTRICT but checked at different time). Choose based on your data model's requirements.
Beyond basic columns and constraints, modern database systems offer advanced features in CREATE TABLE that address performance, storage, and specialized use cases. While vendor-specific, understanding these capabilities helps you design tables for production requirements:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101
-- ============================================-- GENERATED COLUMNS (Computed/Virtual Columns)-- ============================================-- Automatically computed from other columnsCREATE TABLE products_with_computed ( product_id BIGINT PRIMARY KEY, product_name VARCHAR(200) NOT NULL, unit_price DECIMAL(10, 2) NOT NULL, tax_rate DECIMAL(5, 4) DEFAULT 0.0825, -- STORED: Computed at insert/update, stored on disk price_with_tax DECIMAL(10, 2) GENERATED ALWAYS AS (unit_price * (1 + tax_rate)) STORED, -- VIRTUAL: Computed at query time, not stored (PostgreSQL doesn't support, MySQL does) -- discount_price DECIMAL(10, 2) GENERATED ALWAYS AS (unit_price * 0.9) VIRTUAL); -- ============================================-- IDENTITY COLUMNS (Auto-Increment)-- ============================================-- SQL Standard syntax (PostgreSQL 10+, Oracle 12c+)CREATE TABLE employees_identity ( employee_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- Or: GENERATED BY DEFAULT AS IDENTITY (allows manual override) employee_name VARCHAR(100) NOT NULL); -- Alternative: Serial (PostgreSQL specific)-- CREATE TABLE employees_serial (-- employee_id SERIAL PRIMARY KEY, -- Shorthand for sequence + default-- employee_name VARCHAR(100) NOT NULL-- ); -- ============================================-- TEMPORARY TABLES-- ============================================-- Session-scoped: Exists only for current sessionCREATE TEMPORARY TABLE session_calculations ( calc_id SERIAL PRIMARY KEY, input_value DECIMAL(15, 2), result DECIMAL(15, 2)); -- Transaction-scoped (PostgreSQL with ON COMMIT)CREATE TEMPORARY TABLE temp_import_staging ( row_id SERIAL, raw_data TEXT) ON COMMIT DROP; -- Automatically dropped at transaction end -- ============================================-- TABLE PARTITIONING (PostgreSQL 10+ declarative syntax)-- ============================================-- Parent table definitionCREATE TABLE sales ( sale_id BIGINT, sale_date DATE NOT NULL, customer_id BIGINT, amount DECIMAL(12, 2)) PARTITION BY RANGE (sale_date); -- Create partitions for each yearCREATE TABLE sales_2023 PARTITION OF sales FOR VALUES FROM ('2023-01-01') TO ('2024-01-01'); CREATE TABLE sales_2024 PARTITION OF sales FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'); -- ============================================-- TABLE INHERITANCE (PostgreSQL specific)-- ============================================-- Parent tableCREATE TABLE vehicles ( vehicle_id BIGINT PRIMARY KEY, make VARCHAR(50), model VARCHAR(50), year INTEGER); -- Child table inherits all columnsCREATE TABLE cars ( number_of_doors INTEGER, trunk_size DECIMAL(5, 2)) INHERITS (vehicles); -- ============================================-- CREATE TABLE AS SELECT (CTAS)-- ============================================-- Create table from query results (copies structure AND data)CREATE TABLE high_value_customers ASSELECT customer_id, customer_name, total_orders, lifetime_valueFROM customersWHERE lifetime_value > 10000; -- Create structure only (no data)CREATE TABLE customer_archive (LIKE customers INCLUDING ALL);For idempotent scripts (scripts that can run multiple times safely), use CREATE TABLE IF NOT EXISTS. This prevents errors when the table already exists, making deployment scripts more robust. Example: CREATE TABLE IF NOT EXISTS logs (...);
Creating tables that perform well and maintain data integrity in production requires following proven best practices. These guidelines are distilled from real-world experience managing databases at scale:
{table}_{columns}_{type}) for readable error messages and easier migrations.created_at, updated_at, and created_by provide invaluable debugging and auditing capability.is_deleted flag or deleted_at timestamp instead of physical deletion for audit trails.12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455
-- Production-Grade Table Template-- Following all best practices for enterprise deployment CREATE TABLE customers ( -- Surrogate primary key with auto-generation customer_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- Natural key candidate (unique business identifier) customer_code VARCHAR(20) NOT NULL, -- Core business data with appropriate constraints email VARCHAR(255) NOT NULL, first_name VARCHAR(100) NOT NULL, last_name VARCHAR(100) NOT NULL, phone_number VARCHAR(20), -- NULL allowed (not all customers provide) -- Enumerated status with CHECK constraint status VARCHAR(20) NOT NULL DEFAULT 'active', -- Computed display name full_name VARCHAR(201) GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED, -- Audit columns created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP, created_by VARCHAR(100), updated_by VARCHAR(100), -- Soft delete support is_deleted BOOLEAN NOT NULL DEFAULT FALSE, deleted_at TIMESTAMP WITH TIME ZONE, -- ===== TABLE CONSTRAINTS with explicit names ===== CONSTRAINT customers_code_uq UNIQUE (customer_code), CONSTRAINT customers_email_uq UNIQUE (email), CONSTRAINT customers_status_chk CHECK (status IN ('active', 'inactive', 'suspended', 'pending')), CONSTRAINT customers_email_format_chk CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')); -- Add descriptive commentsCOMMENT ON TABLE customers IS 'Core customer master data table for all registered users';COMMENT ON COLUMN customers.customer_code IS 'Business-facing unique customer identifier (format: CUST-XXXXXX)';COMMENT ON COLUMN customers.status IS 'Account status: active, inactive, suspended, or pending verification'; -- Create indexes for common query patternsCREATE INDEX customers_email_idx ON customers(email) WHERE is_deleted = FALSE;CREATE INDEX customers_status_idx ON customers(status) WHERE is_deleted = FALSE;CREATE INDEX customers_created_at_idx ON customers(created_at);A well-designed table should make invalid states unrepresentable. If your constraints and types are correct, it should be impossible to INSERT data that violates business rules. The database becomes a fortress that protects data quality, freeing application code from defensive checks.
The CREATE TABLE statement is your primary tool for building the structural foundation of any relational database system. Let's consolidate the key concepts covered:
What's Next:
Tables evolve over time as requirements change. In the next page, we'll explore ALTER TABLE—the DDL command for modifying existing table structures. You'll learn how to add and remove columns, modify data types, add and drop constraints, and safely refactor schemas in production systems.
You now possess comprehensive knowledge of the CREATE TABLE statement. You understand column definitions, data types, constraints at both column and table level, and advanced features like generated columns and partitioning. Most importantly, you've learned the best practices that distinguish amateur schemas from production-grade designs.