Identifying 1NF violations is essential, but incomplete. The real value comes from systematically transforming non-compliant tables into proper First Normal Form. This transformation requires careful planning—you must preserve all data, maintain relationships, update dependent applications, and minimize downtime.
This page provides battle-tested algorithms and procedures for 1NF conversion. Whether you're dealing with comma-separated values, repeating column groups, embedded JSON, or any other violation type, you'll learn systematic approaches that produce correct results reliably.
By the end of this page, you will be able to:

- Apply the complete 1NF conversion algorithm with step-by-step procedures
- Use data migration techniques that preserve integrity
- Handle each violation type with a specific transformation pattern
- Test and validate conversions to ensure correctness
- Minimize application disruption during conversion
Converting a table to 1NF follows a systematic process. While specific techniques vary by violation type, the overall algorithm remains consistent.
Always create new tables and migrate data before touching the original structure. Keep the original tables intact until migration is verified and applications are updated. This provides rollback capability if issues emerge.
The transformation principle:
Every 1NF conversion follows a core principle: What was implicit becomes explicit, and what was embedded becomes extracted.
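The create-migrate-verify-retire sequence can be sketched end to end. The following is a minimal illustration using Python's sqlite3 with hypothetical table and column names, not a production script: the original table is only dropped after the migrated data is verified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Original non-1NF table: one row per employee, comma-separated skills
cur.execute("CREATE TABLE Employees_Old (EmpID INTEGER PRIMARY KEY, Name TEXT, Skills TEXT)")
cur.executemany("INSERT INTO Employees_Old VALUES (?, ?, ?)",
                [(1, "Alice", "Java, Python"), (2, "Bob", "Python, Docker")])

# Step 1: create the new structures alongside the old one
cur.execute("CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("CREATE TABLE EmployeeSkills (EmpID INTEGER, Skill TEXT, PRIMARY KEY (EmpID, Skill))")

# Step 2: migrate, splitting the multi-valued column into child rows
for emp_id, name, skills in cur.execute(
        "SELECT EmpID, Name, Skills FROM Employees_Old").fetchall():
    cur.execute("INSERT INTO Employees VALUES (?, ?)", (emp_id, name))
    for skill in skills.split(","):
        cur.execute("INSERT INTO EmployeeSkills VALUES (?, ?)", (emp_id, skill.strip()))

# Step 3: verify BEFORE touching the original (rollback stays possible)
old_count = cur.execute("SELECT COUNT(*) FROM Employees_Old").fetchone()[0]
new_count = cur.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
assert old_count == new_count, "row counts diverged -- do not drop the old table"

# Step 4: only now is it safe to retire the original structure
cur.execute("DROP TABLE Employees_Old")
conn.commit()
```

The point of the ordering is rollback capability: until step 4, the old table is untouched and the new tables can simply be dropped if verification fails.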
Multi-valued columns (comma-separated lists, etc.) require splitting values into separate rows in a new child table.
The conversion pattern:
```sql
-- ORIGINAL: Employees with comma-separated skills
CREATE TABLE Employees_Old (
    EmpID INT PRIMARY KEY,
    Name VARCHAR(100),
    Department VARCHAR(50),
    Skills VARCHAR(500)  -- 'Java, Python, SQL, Docker'
);

-- Sample data
INSERT INTO Employees_Old VALUES
    (1, 'Alice', 'Engineering', 'Java, Python, SQL'),
    (2, 'Bob', 'Engineering', 'Python, Docker, Kubernetes'),
    (3, 'Charlie', 'Data Science', 'Python, R, SQL, TensorFlow');

-- STEP 1: Create normalized tables
CREATE TABLE Employees (
    EmpID INT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Department VARCHAR(50)
);

CREATE TABLE Skills (
    SkillID INT PRIMARY KEY AUTO_INCREMENT,
    SkillName VARCHAR(50) UNIQUE NOT NULL
);

CREATE TABLE EmployeeSkills (
    EmpID INT NOT NULL,
    SkillID INT NOT NULL,
    PRIMARY KEY (EmpID, SkillID),
    FOREIGN KEY (EmpID) REFERENCES Employees(EmpID) ON DELETE CASCADE,
    FOREIGN KEY (SkillID) REFERENCES Skills(SkillID)
);

-- STEP 2: Migrate employee base data
INSERT INTO Employees (EmpID, Name, Department)
SELECT EmpID, Name, Department FROM Employees_Old;

-- STEP 3: Extract and populate distinct skills
-- PostgreSQL approach using UNNEST and STRING_TO_ARRAY
INSERT INTO Skills (SkillName)
SELECT DISTINCT TRIM(skill) AS SkillName
FROM Employees_Old,
     UNNEST(STRING_TO_ARRAY(Skills, ',')) AS skill
WHERE Skills IS NOT NULL;

-- STEP 4: Link employees to skills
INSERT INTO EmployeeSkills (EmpID, SkillID)
SELECT e.EmpID, s.SkillID
FROM Employees_Old e,
     UNNEST(STRING_TO_ARRAY(e.Skills, ',')) AS skill_name
JOIN Skills s ON TRIM(skill_name) = s.SkillName
WHERE e.Skills IS NOT NULL;
```

MySQL approach for string splitting:
MySQL lacks built-in array functions, requiring a different technique:
```sql
-- MySQL: Using a numbers table for string splitting
-- First, create a numbers table (one-time setup)
CREATE TABLE Numbers (n INT PRIMARY KEY);
-- Ten rows handle lists of up to ten items; extend as needed
INSERT INTO Numbers VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

-- Extract skills using SUBSTRING_INDEX
INSERT INTO Skills (SkillName)
SELECT DISTINCT TRIM(SUBSTRING_INDEX(
    SUBSTRING_INDEX(e.Skills, ',', n.n), ',', -1)) AS SkillName
FROM Employees_Old e
JOIN Numbers n
    ON n.n <= 1 + LENGTH(e.Skills) - LENGTH(REPLACE(e.Skills, ',', ''))
WHERE e.Skills IS NOT NULL AND e.Skills != '';

-- Link employees to skills
INSERT INTO EmployeeSkills (EmpID, SkillID)
SELECT DISTINCT e.EmpID, s.SkillID  -- DISTINCT guards against a skill repeated in one list
FROM Employees_Old e
JOIN Numbers n
    ON n.n <= 1 + LENGTH(e.Skills) - LENGTH(REPLACE(e.Skills, ',', ''))
JOIN Skills s ON s.SkillName = TRIM(SUBSTRING_INDEX(
    SUBSTRING_INDEX(e.Skills, ',', n.n), ',', -1))
WHERE e.Skills IS NOT NULL;
```

Real data often has inconsistent formatting: 'Java,Python' vs 'Java, Python' vs 'Java , Python'. Always TRIM() extracted values and consider normalizing case (LOWER/UPPER) before inserting into the skills table to prevent near-duplicates.
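If the migration is driven from application code rather than pure SQL, the trimming and case normalization can be done before the values ever reach the database. A small sketch of such a helper (the choice of Title Case as the canonical form is an assumption for illustration):

```python
def split_multivalued(raw):
    """Split a comma-separated field into clean, atomic values.

    Trims whitespace, drops empty entries, and normalizes case so that
    'Java,Python', 'Java, Python' and 'java , PYTHON' all yield the
    same values, preventing near-duplicate rows in the lookup table.
    """
    if raw is None:
        return []
    seen, result = set(), []
    for part in raw.split(","):
        value = part.strip().title()   # assumption: Title Case is canonical
        if value and value.lower() not in seen:
            seen.add(value.lower())    # dedupe case-insensitively
            result.append(value)
    return result

print(split_multivalued("java , PYTHON,sql,,Python"))  # ['Java', 'Python', 'Sql']
```

Note that naive case normalization can mangle acronyms ('SQL' becomes 'Sql'); a real migration would map known values through a lookup dictionary instead.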
Repeating column groups require transposing horizontal data into vertical rows. The UNION approach is most reliable across database systems.
The conversion pattern:
```sql
-- ORIGINAL: Orders with repeating item columns
CREATE TABLE Orders_Old (
    OrderID INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    OrderDate DATE,
    Item1_Name VARCHAR(100), Item1_Qty INT, Item1_Price DECIMAL(10,2),
    Item2_Name VARCHAR(100), Item2_Qty INT, Item2_Price DECIMAL(10,2),
    Item3_Name VARCHAR(100), Item3_Qty INT, Item3_Price DECIMAL(10,2),
    Item4_Name VARCHAR(100), Item4_Qty INT, Item4_Price DECIMAL(10,2),
    Item5_Name VARCHAR(100), Item5_Qty INT, Item5_Price DECIMAL(10,2)
);

-- Sample data
INSERT INTO Orders_Old VALUES (
    1001, 'Alice Johnson', '2024-03-15',
    'Laptop', 1, 999.99,
    'Mouse', 2, 29.99,
    'Keyboard', 1, 79.99,
    NULL, NULL, NULL,
    NULL, NULL, NULL
);

-- STEP 1: Create normalized tables
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerName VARCHAR(100) NOT NULL,
    OrderDate DATE NOT NULL
);

CREATE TABLE OrderItems (
    OrderItemID INT PRIMARY KEY AUTO_INCREMENT,
    OrderID INT NOT NULL,
    LineNumber INT NOT NULL,  -- Preserves original position
    ItemName VARCHAR(100) NOT NULL,
    Quantity INT NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) ON DELETE CASCADE,
    UNIQUE (OrderID, LineNumber)
);

-- STEP 2: Migrate order headers
INSERT INTO Orders (OrderID, CustomerName, OrderDate)
SELECT OrderID, CustomerName, OrderDate FROM Orders_Old;

-- STEP 3: Migrate order items using UNION ALL
INSERT INTO OrderItems (OrderID, LineNumber, ItemName, Quantity, UnitPrice)
SELECT OrderID, 1, Item1_Name, Item1_Qty, Item1_Price
FROM Orders_Old
WHERE Item1_Name IS NOT NULL

UNION ALL

SELECT OrderID, 2, Item2_Name, Item2_Qty, Item2_Price
FROM Orders_Old
WHERE Item2_Name IS NOT NULL

UNION ALL

SELECT OrderID, 3, Item3_Name, Item3_Qty, Item3_Price
FROM Orders_Old
WHERE Item3_Name IS NOT NULL

UNION ALL

SELECT OrderID, 4, Item4_Name, Item4_Qty, Item4_Price
FROM Orders_Old
WHERE Item4_Name IS NOT NULL

UNION ALL

SELECT OrderID, 5, Item5_Name, Item5_Qty, Item5_Price
FROM Orders_Old
WHERE Item5_Name IS NOT NULL;
```

Validation queries:
```sql
-- Verify row counts match
SELECT
    'Orders' AS Table_Name,
    (SELECT COUNT(*) FROM Orders_Old) AS OldCount,
    (SELECT COUNT(*) FROM Orders) AS NewCount;

-- Verify item counts match
WITH OldItemCounts AS (
    SELECT OrderID,
        (CASE WHEN Item1_Name IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN Item2_Name IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN Item3_Name IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN Item4_Name IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN Item5_Name IS NOT NULL THEN 1 ELSE 0 END) AS ItemCount
    FROM Orders_Old
),
NewItemCounts AS (
    SELECT OrderID, COUNT(*) AS ItemCount
    FROM OrderItems
    GROUP BY OrderID
)
SELECT
    COALESCE(o.OrderID, n.OrderID) AS OrderID,
    o.ItemCount AS OldItemCount,
    n.ItemCount AS NewItemCount,
    CASE WHEN o.ItemCount = n.ItemCount THEN 'MATCH' ELSE 'MISMATCH' END AS Status
FROM OldItemCounts o
FULL OUTER JOIN NewItemCounts n ON o.OrderID = n.OrderID
WHERE o.ItemCount != n.ItemCount
   OR o.OrderID IS NULL
   OR n.OrderID IS NULL;

-- Verify revenue totals match
SELECT
    'Revenue Check' AS Validation,
    (SELECT SUM(
        COALESCE(Item1_Qty * Item1_Price, 0) +
        COALESCE(Item2_Qty * Item2_Price, 0) +
        COALESCE(Item3_Qty * Item3_Price, 0) +
        COALESCE(Item4_Qty * Item4_Price, 0) +
        COALESCE(Item5_Qty * Item5_Price, 0)
    ) FROM Orders_Old) AS OldTotal,
    (SELECT SUM(Quantity * UnitPrice) FROM OrderItems) AS NewTotal;
```

JSON columns containing query-critical data must be decomposed into atomic columns or related tables. Modern databases provide JSON extraction functions for this purpose.
The conversion pattern:
```sql
-- ORIGINAL: Orders with JSON item data
CREATE TABLE Orders_Old (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    OrderData JSON
    /* Sample OrderData:
    {
        "shippingAddress": {
            "street": "123 Main St",
            "city": "Boston",
            "state": "MA",
            "zip": "02101"
        },
        "items": [
            {"sku": "LAPTOP", "qty": 1, "price": 999.99},
            {"sku": "MOUSE", "qty": 2, "price": 29.99}
        ],
        "notes": "Gift wrap requested"
    }
    */
);

-- STEP 1: Create normalized tables
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT NOT NULL,
    OrderDate DATE NOT NULL,
    -- Scalar JSON properties become columns
    ShippingStreet VARCHAR(200),
    ShippingCity VARCHAR(100),
    ShippingState VARCHAR(50),
    ShippingZip VARCHAR(20),
    Notes TEXT
);

CREATE TABLE OrderItems (
    OrderItemID INT PRIMARY KEY AUTO_INCREMENT,
    OrderID INT NOT NULL,
    SKU VARCHAR(50) NOT NULL,
    Quantity INT NOT NULL,
    Price DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) ON DELETE CASCADE
);

-- STEP 2: Migrate order data (PostgreSQL JSONB syntax)
INSERT INTO Orders (
    OrderID, CustomerID, OrderDate,
    ShippingStreet, ShippingCity, ShippingState, ShippingZip, Notes
)
SELECT
    OrderID,
    CustomerID,
    OrderDate,
    OrderData->'shippingAddress'->>'street',
    OrderData->'shippingAddress'->>'city',
    OrderData->'shippingAddress'->>'state',
    OrderData->'shippingAddress'->>'zip',
    OrderData->>'notes'
FROM Orders_Old;

-- STEP 3: Migrate items from JSON array (PostgreSQL)
INSERT INTO OrderItems (OrderID, SKU, Quantity, Price)
SELECT
    o.OrderID,
    item->>'sku',
    (item->>'qty')::INT,
    (item->>'price')::DECIMAL(10,2)
FROM Orders_Old o,
     JSONB_ARRAY_ELEMENTS(o.OrderData->'items') AS item;
```

MySQL JSON extraction syntax:
```sql
-- MySQL 8.0+ JSON extraction
INSERT INTO Orders (
    OrderID, CustomerID, OrderDate,
    ShippingStreet, ShippingCity, ShippingState, ShippingZip, Notes
)
SELECT
    OrderID,
    CustomerID,
    OrderDate,
    JSON_UNQUOTE(JSON_EXTRACT(OrderData, '$.shippingAddress.street')),
    JSON_UNQUOTE(JSON_EXTRACT(OrderData, '$.shippingAddress.city')),
    JSON_UNQUOTE(JSON_EXTRACT(OrderData, '$.shippingAddress.state')),
    JSON_UNQUOTE(JSON_EXTRACT(OrderData, '$.shippingAddress.zip')),
    JSON_UNQUOTE(JSON_EXTRACT(OrderData, '$.notes'))
FROM Orders_Old;

-- MySQL: Extract JSON array elements using JSON_TABLE (MySQL 8.0.4+)
INSERT INTO OrderItems (OrderID, SKU, Quantity, Price)
SELECT o.OrderID, jt.sku, jt.qty, jt.price
FROM Orders_Old o,
     JSON_TABLE(
         o.OrderData,
         '$.items[*]' COLUMNS (
             sku VARCHAR(50) PATH '$.sku',
             qty INT PATH '$.qty',
             price DECIMAL(10,2) PATH '$.price'
         )
     ) AS jt;
```

JSON data often has missing properties. Use COALESCE or NULLIF to handle cases where properties don't exist. Test extraction on sample data before running the full migration to catch path errors and type mismatches.
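The same defaulting logic is easy to prototype in application code before committing to SQL paths. A sketch using Python's standard json module, with field names taken from the sample payload above (the defaults chosen are assumptions for illustration): missing properties fall back to NULL or a default instead of raising, mirroring what COALESCE does on the SQL side.

```python
import json

def extract_order(order_json):
    """Flatten one order's JSON payload into atomic column values.

    dict.get() plays the role of COALESCE: absent properties yield
    None (or an explicit default) rather than an error.
    """
    data = json.loads(order_json)
    addr = data.get("shippingAddress", {})
    return {
        "ShippingStreet": addr.get("street"),     # None if absent
        "ShippingCity": addr.get("city"),
        "Notes": data.get("notes", ""),           # explicit default
        "Items": [
            (i.get("sku"), int(i.get("qty", 0)), float(i.get("price", 0.0)))
            for i in data.get("items", [])
        ],
    }

# Payload deliberately missing 'street', 'state', 'zip' and 'notes'
row = extract_order(
    '{"shippingAddress": {"city": "Boston"},'
    ' "items": [{"sku": "MOUSE", "qty": 2, "price": 29.99}]}'
)
print(row["ShippingStreet"], row["ShippingCity"], row["Items"])
# None Boston [('MOUSE', 2, 29.99)]
```

Running an extractor like this over a sample of rows surfaces path errors and type mismatches cheaply before the full migration.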
Encoded information must be decomposed into separate atomic columns. This typically involves string parsing and may require lookup tables to translate codes.
The conversion pattern:
```sql
-- ORIGINAL: Products with encoded ProductCode
-- Format: 'CATEGORY-YYYY-WAREHOUSE-SEQUENCE'
-- Example: 'ELEC-2024-NYC-00142'
CREATE TABLE Products_Old (
    ProductCode VARCHAR(25) PRIMARY KEY,
    Description VARCHAR(200),
    Price DECIMAL(10,2)
);

INSERT INTO Products_Old VALUES
    ('ELEC-2024-NYC-00142', 'Gaming Laptop', 1299.99),
    ('ELEC-2024-LAX-00089', 'Wireless Mouse', 49.99),
    ('FURN-2023-CHI-00331', 'Office Chair', 299.99);

-- STEP 1: Create normalized table with atomic columns
CREATE TABLE Products (
    ProductID INT PRIMARY KEY AUTO_INCREMENT,
    Category VARCHAR(20) NOT NULL,
    ProductionYear INT NOT NULL,
    WarehouseCode VARCHAR(10) NOT NULL,
    SequenceNumber INT NOT NULL,
    Description VARCHAR(200),
    Price DECIMAL(10,2),
    LegacyCode VARCHAR(25) UNIQUE  -- Keep for backward compatibility
);

-- STEP 2: Optional - Create lookup tables
CREATE TABLE Categories (
    CategoryCode VARCHAR(20) PRIMARY KEY,
    CategoryName VARCHAR(100) NOT NULL
);

INSERT INTO Categories VALUES
    ('ELEC', 'Electronics'),
    ('FURN', 'Furniture'),
    ('CLTH', 'Clothing');

CREATE TABLE Warehouses (
    WarehouseCode VARCHAR(10) PRIMARY KEY,
    City VARCHAR(100) NOT NULL,
    Region VARCHAR(50)
);

INSERT INTO Warehouses VALUES
    ('NYC', 'New York', 'East'),
    ('LAX', 'Los Angeles', 'West'),
    ('CHI', 'Chicago', 'Midwest');

-- STEP 3: Migrate with parsing (PostgreSQL)
INSERT INTO Products (
    Category, ProductionYear, WarehouseCode, SequenceNumber,
    Description, Price, LegacyCode
)
SELECT
    SPLIT_PART(ProductCode, '-', 1) AS Category,
    CAST(SPLIT_PART(ProductCode, '-', 2) AS INT) AS ProductionYear,
    SPLIT_PART(ProductCode, '-', 3) AS WarehouseCode,
    CAST(SPLIT_PART(ProductCode, '-', 4) AS INT) AS SequenceNumber,
    Description,
    Price,
    ProductCode  -- Preserve original
FROM Products_Old;

-- Alternative for MySQL (using SUBSTRING_INDEX)
INSERT INTO Products (
    Category, ProductionYear, WarehouseCode, SequenceNumber,
    Description, Price, LegacyCode
)
SELECT
    SUBSTRING_INDEX(ProductCode, '-', 1),
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(ProductCode, '-', 2), '-', -1) AS UNSIGNED),
    SUBSTRING_INDEX(SUBSTRING_INDEX(ProductCode, '-', 3), '-', -1),
    CAST(SUBSTRING_INDEX(ProductCode, '-', -1) AS UNSIGNED),
    Description,
    Price,
    ProductCode
FROM Products_Old;
```

Converting status flags:
```sql
-- ORIGINAL: Users with encoded status flags
-- StatusFlags: 'AVN' = Active, Verified, Notifications-on
-- Position 1: A=Active, I=Inactive
-- Position 2: V=Verified, U=Unverified
-- Position 3: N=Notifications-on, X=Notifications-off
CREATE TABLE Users_Old (
    UserID INT PRIMARY KEY,
    Username VARCHAR(50),
    StatusFlags VARCHAR(10)
);

INSERT INTO Users_Old VALUES
    (1, 'alice', 'AVN'),
    (2, 'bob', 'AUX'),
    (3, 'charlie', 'IVN');

-- STEP 1: Create table with boolean columns
CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    Username VARCHAR(50) NOT NULL,
    IsActive BOOLEAN NOT NULL DEFAULT TRUE,
    IsVerified BOOLEAN NOT NULL DEFAULT FALSE,
    NotificationsEnabled BOOLEAN NOT NULL DEFAULT TRUE
);

-- STEP 2: Migrate with flag parsing
INSERT INTO Users (UserID, Username, IsActive, IsVerified, NotificationsEnabled)
SELECT
    UserID,
    Username,
    CASE SUBSTRING(StatusFlags, 1, 1)
        WHEN 'A' THEN TRUE
        WHEN 'I' THEN FALSE
        ELSE TRUE  -- Default
    END AS IsActive,
    CASE SUBSTRING(StatusFlags, 2, 1)
        WHEN 'V' THEN TRUE
        WHEN 'U' THEN FALSE
        ELSE FALSE  -- Default
    END AS IsVerified,
    CASE SUBSTRING(StatusFlags, 3, 1)
        WHEN 'N' THEN TRUE
        WHEN 'X' THEN FALSE
        ELSE TRUE  -- Default
    END AS NotificationsEnabled
FROM Users_Old;
```

Tables without primary keys must have one added. The approach depends on whether a natural key can be identified or a surrogate key must be introduced.
The decision process:
```sql
-- ORIGINAL: Event logs without primary key
CREATE TABLE EventLogs_Old (
    EventTime TIMESTAMP,
    EventType VARCHAR(50),
    UserID INT,
    EventData TEXT
    -- No PRIMARY KEY!
);

-- STEP 1: Check for potential natural key
SELECT EventTime, EventType, UserID, COUNT(*) AS Occurrences
FROM EventLogs_Old
GROUP BY EventTime, EventType, UserID
HAVING COUNT(*) > 1;
-- If duplicates exist, we need a surrogate key

-- STEP 2: Handle duplicates (if needed)
-- Option A: Keep only the first occurrence (PostgreSQL DISTINCT ON)
CREATE TABLE EventLogs_Deduped AS
SELECT DISTINCT ON (EventTime, EventType, UserID) *
FROM EventLogs_Old
ORDER BY EventTime, EventType, UserID;

-- Option B: Keep all rows with a surrogate key (MySQL syntax)
CREATE TABLE EventLogs (
    EventLogID BIGINT PRIMARY KEY AUTO_INCREMENT,
    EventTime TIMESTAMP NOT NULL,
    EventType VARCHAR(50) NOT NULL,
    UserID INT,
    EventData TEXT,
    -- Add indexes on common query patterns
    INDEX idx_event_time (EventTime),
    INDEX idx_user_events (UserID, EventTime)
);

INSERT INTO EventLogs (EventTime, EventType, UserID, EventData)
SELECT EventTime, EventType, UserID, EventData
FROM EventLogs_Old;

-- STEP 3: If a natural key exists and is unique,
-- add the constraint directly
ALTER TABLE EventLogs_Old
ADD CONSTRAINT pk_events PRIMARY KEY (EventTime, EventType, UserID);

-- Or create a new table with the proper key
CREATE TABLE EventLogs (
    EventTime TIMESTAMP NOT NULL,
    EventType VARCHAR(50) NOT NULL,
    UserID INT NOT NULL,
    EventData TEXT,
    PRIMARY KEY (EventTime, EventType, UserID)
);
```

Use natural keys when the combination is stable, meaningful, and commonly used in queries. Use surrogate keys when no natural key exists, the natural key is large or composite, its values might change, or you need join performance. For event logs, surrogates are usually better since sequential IDs are efficient for indexing.
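The duplicate probe in STEP 1 drives the whole decision, so it is worth scripting. A minimal sketch using Python's sqlite3 with illustrative data: if the candidate natural key has duplicates, fall back to a surrogate key table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EventLogs_Old (EventTime TEXT, EventType TEXT, UserID INTEGER)")
cur.executemany("INSERT INTO EventLogs_Old VALUES (?, ?, ?)", [
    ("2024-03-15 10:00:00", "login", 1),
    ("2024-03-15 10:00:00", "login", 1),   # exact duplicate: natural key fails
    ("2024-03-15 10:05:00", "logout", 1),
])

# Same duplicate check as STEP 1 above
dupes = cur.execute("""
    SELECT EventTime, EventType, UserID, COUNT(*)
    FROM EventLogs_Old
    GROUP BY EventTime, EventType, UserID
    HAVING COUNT(*) > 1
""").fetchall()

if dupes:
    # Duplicates exist, so the natural key is unusable: use a surrogate
    cur.execute("""CREATE TABLE EventLogs (
        EventLogID INTEGER PRIMARY KEY AUTOINCREMENT,
        EventTime TEXT NOT NULL, EventType TEXT NOT NULL, UserID INTEGER)""")
    cur.execute("INSERT INTO EventLogs (EventTime, EventType, UserID) "
                "SELECT * FROM EventLogs_Old")

print(len(dupes))  # 1 duplicated combination
```

All three original rows survive under the surrogate key, whereas a natural-key approach would have forced deduplication.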
Schema changes require corresponding application updates. A phased transition minimizes risk and allows rollback if issues emerge.
```sql
-- Create a compatibility view that mimics the old structure
-- New tables: Employees (EmpID, Name, Department),
--             EmployeeSkills (EmpID, SkillID), Skills (SkillID, SkillName)

-- View that looks like the old comma-separated structure (PostgreSQL)
CREATE VIEW Employees_Legacy AS
SELECT
    e.EmpID,
    e.Name,
    e.Department,
    STRING_AGG(s.SkillName, ', ' ORDER BY s.SkillName) AS Skills
FROM Employees e
LEFT JOIN EmployeeSkills es ON e.EmpID = es.EmpID
LEFT JOIN Skills s ON es.SkillID = s.SkillID
GROUP BY e.EmpID, e.Name, e.Department;

-- MySQL equivalent using GROUP_CONCAT
CREATE VIEW Employees_Legacy AS
SELECT
    e.EmpID,
    e.Name,
    e.Department,
    GROUP_CONCAT(s.SkillName ORDER BY s.SkillName SEPARATOR ', ') AS Skills
FROM Employees e
LEFT JOIN EmployeeSkills es ON e.EmpID = es.EmpID
LEFT JOIN Skills s ON es.SkillID = s.SkillID
GROUP BY e.EmpID, e.Name, e.Department;

-- Legacy code can SELECT from Employees_Legacy and see the familiar structure
-- New code writes to the normalized tables
-- The view automatically reflects current data

-- For INSERTs, create a trigger or stored procedure
-- that accepts the old format and inserts into the new tables
CREATE PROCEDURE AddEmployeeWithSkills(
    IN p_EmpID INT,
    IN p_Name VARCHAR(100),
    IN p_Department VARCHAR(50),
    IN p_SkillsCSV VARCHAR(500)
)
BEGIN
    -- Insert employee
    INSERT INTO Employees (EmpID, Name, Department)
    VALUES (p_EmpID, p_Name, p_Department);

    -- Parse and insert skills
    -- (Implementation depends on DBMS - see earlier string splitting examples)
END;
```

During parallel operation, monitor both old and new query patterns. Track query performance, error rates, and data consistency. Set alerts for discrepancies between the old and new structures. This monitoring provides the confidence to proceed with deprecation.
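One cheap way to monitor data consistency during parallel operation is to compare simple invariants between the legacy-shaped view and the normalized tables. A minimal sketch using Python's sqlite3, where GROUP_CONCAT stands in for STRING_AGG (caveat: the comma-counting check assumes skill names themselves contain no commas):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Skills (SkillID INTEGER PRIMARY KEY, SkillName TEXT UNIQUE);
    CREATE TABLE EmployeeSkills (EmpID INTEGER, SkillID INTEGER,
                                 PRIMARY KEY (EmpID, SkillID));
    INSERT INTO Employees VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Skills VALUES (1, 'Java'), (2, 'Python');
    INSERT INTO EmployeeSkills VALUES (1, 1), (1, 2), (2, 2);
    -- SQLite's GROUP_CONCAT plays the role of STRING_AGG here
    CREATE VIEW Employees_Legacy AS
    SELECT e.EmpID, e.Name, GROUP_CONCAT(s.SkillName, ', ') AS Skills
    FROM Employees e
    LEFT JOIN EmployeeSkills es ON e.EmpID = es.EmpID
    LEFT JOIN Skills s ON es.SkillID = s.SkillID
    GROUP BY e.EmpID, e.Name;
""")

def consistency_report(cur):
    """One cheap invariant per structure; alert if any pair diverges."""
    legacy_rows = cur.execute("SELECT COUNT(*) FROM Employees_Legacy").fetchone()[0]
    new_rows = cur.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
    # Commas in the aggregated string imply how many child rows fed it
    legacy_links = cur.execute(
        "SELECT SUM(LENGTH(Skills) - LENGTH(REPLACE(Skills, ',', '')) + 1) "
        "FROM Employees_Legacy"
    ).fetchone()[0]
    new_links = cur.execute("SELECT COUNT(*) FROM EmployeeSkills").fetchone()[0]
    return {"rows_match": legacy_rows == new_rows,
            "links_match": legacy_links == new_links}

print(consistency_report(cur))  # {'rows_match': True, 'links_match': True}
```

Scheduling a report like this and alerting on any False gives an objective signal for when the legacy structure can be deprecated.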
Let's walk through a complete 1NF conversion for a table with multiple violation types.
```sql
-- ORIGINAL: Multiple 1NF violations in one table
CREATE TABLE StudentRecords_Old (
    -- Violation 1: Encoded student ID (DEPT-YEAR-SEQ)
    StudentCode VARCHAR(20) PRIMARY KEY,
    -- Violation 2: Combined name
    FullName VARCHAR(100),
    -- Violation 3: Multi-valued email addresses
    Emails VARCHAR(500),
    -- Violation 4: Repeating course groups
    Course1_Name VARCHAR(100), Course1_Grade VARCHAR(2),
    Course2_Name VARCHAR(100), Course2_Grade VARCHAR(2),
    Course3_Name VARCHAR(100), Course3_Grade VARCHAR(2)
);

INSERT INTO StudentRecords_Old VALUES (
    'CS-2024-0042',
    'John A. Smith',
    'john@university.edu, john.smith@gmail.com',
    'Database Systems', 'A',
    'Algorithms', 'B+',
    'Web Development', 'A-'
);

-----------------------------------------------------------------
-- STEP 1: DESIGN TARGET SCHEMA
-----------------------------------------------------------------

-- Departments reference table
CREATE TABLE Departments (
    DeptCode VARCHAR(10) PRIMARY KEY,
    DeptName VARCHAR(100) NOT NULL
);

-- Students table with atomic columns
CREATE TABLE Students (
    StudentID INT PRIMARY KEY AUTO_INCREMENT,
    DeptCode VARCHAR(10) NOT NULL,
    EnrollmentYear INT NOT NULL,
    SequenceNumber INT NOT NULL,
    FirstName VARCHAR(50) NOT NULL,
    MiddleName VARCHAR(50),
    LastName VARCHAR(50) NOT NULL,
    LegacyCode VARCHAR(20) UNIQUE,
    FOREIGN KEY (DeptCode) REFERENCES Departments(DeptCode),
    UNIQUE (DeptCode, EnrollmentYear, SequenceNumber)
);

-- Email addresses as separate table
CREATE TABLE StudentEmails (
    StudentID INT NOT NULL,
    Email VARCHAR(200) NOT NULL,
    IsPrimary BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (StudentID, Email),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID) ON DELETE CASCADE
);

-- Courses reference table
CREATE TABLE Courses (
    CourseID INT PRIMARY KEY AUTO_INCREMENT,
    CourseName VARCHAR(100) NOT NULL UNIQUE
);

-- Enrollments junction table
CREATE TABLE Enrollments (
    StudentID INT NOT NULL,
    CourseID INT NOT NULL,
    Grade VARCHAR(2),
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);

-----------------------------------------------------------------
-- STEP 2: POPULATE LOOKUP TABLES
-----------------------------------------------------------------

INSERT INTO Departments VALUES
    ('CS', 'Computer Science'),
    ('EE', 'Electrical Engineering');

-----------------------------------------------------------------
-- STEP 3: MIGRATE STUDENTS (parse encoded code and name)
-----------------------------------------------------------------

INSERT INTO Students (DeptCode, EnrollmentYear, SequenceNumber,
                      FirstName, MiddleName, LastName, LegacyCode)
SELECT
    SPLIT_PART(StudentCode, '-', 1),
    CAST(SPLIT_PART(StudentCode, '-', 2) AS INT),
    CAST(SPLIT_PART(StudentCode, '-', 3) AS INT),
    -- Parse name: assume "First Middle Last" format
    SPLIT_PART(FullName, ' ', 1),
    CASE WHEN array_length(string_to_array(FullName, ' '), 1) = 3
         THEN SPLIT_PART(FullName, ' ', 2)
         ELSE NULL END,
    SPLIT_PART(FullName, ' ', -1),  -- Last word (negative index: PostgreSQL 14+)
    StudentCode
FROM StudentRecords_Old;

-----------------------------------------------------------------
-- STEP 4: MIGRATE EMAILS (parse multi-valued column)
-----------------------------------------------------------------

INSERT INTO StudentEmails (StudentID, Email, IsPrimary)
SELECT
    s.StudentID,
    TRIM(email),
    ROW_NUMBER() OVER (PARTITION BY s.StudentID ORDER BY email) = 1
FROM StudentRecords_Old o
JOIN Students s ON s.LegacyCode = o.StudentCode,
     UNNEST(STRING_TO_ARRAY(o.Emails, ',')) AS email
WHERE o.Emails IS NOT NULL AND TRIM(email) != '';

-----------------------------------------------------------------
-- STEP 5: MIGRATE COURSES (extract distinct courses)
-----------------------------------------------------------------

INSERT INTO Courses (CourseName)
SELECT DISTINCT CourseName FROM (
    SELECT Course1_Name AS CourseName FROM StudentRecords_Old
    WHERE Course1_Name IS NOT NULL
    UNION
    SELECT Course2_Name FROM StudentRecords_Old
    WHERE Course2_Name IS NOT NULL
    UNION
    SELECT Course3_Name FROM StudentRecords_Old
    WHERE Course3_Name IS NOT NULL
) AllCourses;

-----------------------------------------------------------------
-- STEP 6: MIGRATE ENROLLMENTS (flatten repeating groups)
-----------------------------------------------------------------

INSERT INTO Enrollments (StudentID, CourseID, Grade)
SELECT s.StudentID, c.CourseID, o.Course1_Grade
FROM StudentRecords_Old o
JOIN Students s ON s.LegacyCode = o.StudentCode
JOIN Courses c ON c.CourseName = o.Course1_Name
WHERE o.Course1_Name IS NOT NULL

UNION ALL

SELECT s.StudentID, c.CourseID, o.Course2_Grade
FROM StudentRecords_Old o
JOIN Students s ON s.LegacyCode = o.StudentCode
JOIN Courses c ON c.CourseName = o.Course2_Name
WHERE o.Course2_Name IS NOT NULL

UNION ALL

SELECT s.StudentID, c.CourseID, o.Course3_Grade
FROM StudentRecords_Old o
JOIN Students s ON s.LegacyCode = o.StudentCode
JOIN Courses c ON c.CourseName = o.Course3_Name
WHERE o.Course3_Name IS NOT NULL;

-----------------------------------------------------------------
-- STEP 7: VALIDATE
-----------------------------------------------------------------

-- Check student count
SELECT
    'Students' AS Entity,
    (SELECT COUNT(*) FROM StudentRecords_Old) AS OldCount,
    (SELECT COUNT(*) FROM Students) AS NewCount;

-- Check email migration
SELECT s.FirstName, s.LastName, array_agg(e.Email) AS Emails
FROM Students s
LEFT JOIN StudentEmails e ON s.StudentID = e.StudentID
GROUP BY s.StudentID, s.FirstName, s.LastName;

-- Check enrollment counts match
SELECT
    o.StudentCode,
    (CASE WHEN o.Course1_Name IS NOT NULL THEN 1 ELSE 0 END +
     CASE WHEN o.Course2_Name IS NOT NULL THEN 1 ELSE 0 END +
     CASE WHEN o.Course3_Name IS NOT NULL THEN 1 ELSE 0 END) AS OldCourseCount,
    (SELECT COUNT(*)
     FROM Enrollments e
     JOIN Students s ON e.StudentID = s.StudentID
     WHERE s.LegacyCode = o.StudentCode) AS NewCourseCount
FROM StudentRecords_Old o;
```

You now have the complete toolkit for converting any table to First Normal Form.
Module Complete:
You have completed the First Normal Form module. You now understand what 1NF requires (atomicity, no repeating groups, unique rows), can recognize all types of violations, know how to flatten hierarchical data, and can systematically convert any table to 1NF compliance.
With 1NF as the foundation, subsequent modules will build on this base to address Second Normal Form (eliminating partial dependencies) and Third Normal Form (eliminating transitive dependencies).
Congratulations! You have mastered First Normal Form—the foundational level of database normalization. You can now identify violations, design compliant schemas, and execute systematic conversions. Continue to Second Normal Form to learn how to eliminate partial dependencies within your 1NF-compliant tables.