First Normal Form (1NF): Foundation of Normalization

Level: Beginner

Duration: 60 mins

Topic: First Normal Form (1NF)

Flattening Tables

From Hierarchy to Relations

Real-world data rarely arrives in neat relational form. It comes from spreadsheets with merged cells, JSON documents with nested objects, XML files with hierarchical elements, and legacy systems with pre-relational designs. Before this data can be properly normalized, it must be flattened into the regular, rectangular structure that the relational model requires.

Flattening is the process of transforming hierarchical, nested, or irregularly structured data into relations where:

Every row has the same columns
Every cell contains a single atomic value
Every row represents a single fact or entity instance
No structural nesting or embedding exists

This page provides systematic techniques for flattening various data structures into 1NF-compliant relations.

What You Will Learn

By the end of this page, you will understand:

What flattening means in the context of relational databases
How to flatten nested hierarchical data into multiple related tables
Techniques for handling variable-depth hierarchies
Strategies for transforming document-oriented data (JSON, XML) into relations
How to preserve data integrity and relationships during flattening
Common pitfalls that lead to data loss during transformation

Understanding Data Flattening

Flattening in database design refers to the process of converting data with irregular, nested, or hierarchical structure into the uniform, rectangular format required by the relational model.

Why flattening is necessary:

The relational model, as defined by E.F. Codd, requires that:

A relation is a set of tuples (rows)
Each tuple has the same attributes (columns)
Each attribute value comes from a simple domain (atomic values)
The order of tuples and attributes is insignificant

Hierarchical data violates these requirements because:

Nested structures mean cells contain complex objects, not atomic values
Variable-depth hierarchies mean different "rows" have different structures
Parent-child embedding conflates multiple facts into single rows
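
As a rough illustration of the first point, a few lines of Python can flag which fields of a parsed record hold nested structures rather than atomic values. The function name `non_atomic_fields` and the sample record are assumptions made for this sketch; they do not come from any library.

```python
def non_atomic_fields(record):
    """Return the names of fields whose values are nested (dict or list)."""
    return sorted(
        name for name, value in record.items()
        if isinstance(value, (dict, list))
    )

# Hypothetical record, shaped like a parsed JSON document
record = {
    "id": 1,
    "name": "Alice",
    "address": {"street": "123 Main", "city": "Boston"},
    "phones": ["555-1234", "555-5678"],
}

violations = non_atomic_fields(record)
print(violations)  # ['address', 'phones']
```

Each flagged field is a candidate for one of the flattening techniques described on this page.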

Hierarchical vs. Relational Structure Comparison
Characteristic	Hierarchical Data	Relational (Flat) Data
Structure	Trees with parent-child nesting	Flat tables with rows and columns
Cell contents	May contain objects, arrays, nested structures	Single atomic values only
Row uniformity	Different nodes may have different attributes	All rows have identical columns
Relationships	Embedded within structure	Explicit via foreign keys
Query approach	Path traversal (XPath, JSONPath)	Set operations (SQL)
Schema flexibility	Often schema-less or flexible	Strict schema enforcement

The flattening transformation:

Flattening is not merely about layout—it's about decomposing complex structures into simple, atomic components. The process typically involves:

Identifying entities — Each distinct "thing" in the hierarchy becomes its own table
Extracting attributes — Properties of each entity become columns in that table
Establishing relationships — Parent-child relationships become foreign key references
Handling multiplicity — One-to-many relationships are expressed through separate tables, not embedding

Flattening Is Decomposition

Think of flattening as the inverse of denormalization. Where denormalization combines related data for read performance, flattening separates combined data for write integrity and query flexibility. A deeply nested JSON document might become 5-10 related tables when properly flattened.

Flattening Embedded Objects

The simplest flattening scenario involves single embedded objects—complex attributes that should be decomposed into multiple atomic attributes or extracted into related tables.

Case 1: Embedded object with single occurrence

When a row contains an embedded object that occurs exactly once, you can often flatten by promoting the object's properties to top-level columns:

flatten-single-object.sql
-- BEFORE: Conceptual representation with embedded address
-- (Might come from JSON: {"id":1, "name":"Alice", "address":{"street":"123 Main", "city":"Boston", "zip":"02101"}})
 
-- Non-relational representation:
-- CustomerID | Name  | Address (embedded object)
-- 1          | Alice | {street: "123 Main", city: "Boston", zip: "02101"}
 
-- AFTER: Flattened into atomic columns
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    -- Address properties promoted to columns
    Street VARCHAR(200),
    City VARCHAR(100),
    State VARCHAR(50),
    ZipCode VARCHAR(20),
    Country VARCHAR(50)
);
 
-- This works when:
-- 1. Each customer has exactly one address
-- 2. The address components are accessed/queried individually
-- 3. No address sharing between customers is needed
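
The promotion step itself can be sketched in Python, assuming the source record arrives as JSON. The helper name `promote_embedded` is hypothetical, and the sketch only handles one level of nesting.

```python
import json

def promote_embedded(record, field, prefix=""):
    """Copy `record`, replacing the embedded object at `field` with top-level keys."""
    flat = {key: value for key, value in record.items() if key != field}
    for key, value in (record.get(field) or {}).items():
        flat[f"{prefix}{key}"] = value
    return flat

source = json.loads(
    '{"id": 1, "name": "Alice", '
    '"address": {"street": "123 Main", "city": "Boston", "zip": "02101"}}'
)

row = promote_embedded(source, "address")
print(row)
# {'id': 1, 'name': 'Alice', 'street': '123 Main', 'city': 'Boston', 'zip': '02101'}
```

Passing a `prefix` (for example `"Shipping"`) helps avoid column-name collisions when the parent already has a key with the same name as one in the embedded object.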

Case 2: Embedded object that should be a separate entity

When the embedded object represents a distinct entity with its own identity, or when it might be shared or referenced independently, extract it to a separate table:

flatten-separate-entity.sql
-- BEFORE: Orders with embedded product details
-- OrderID | CustomerID | Product (embedded)
-- 1       | 101        | {sku: "LAPTOP", name: "Pro Laptop", price: 999.99, category: "Electronics"}
 
-- This is wrong because:
-- - Products exist independently of orders
-- - Same product appears in multiple orders
-- - Product details shouldn't be duplicated
 
-- AFTER: Proper separation into related tables
CREATE TABLE Products (
    SKU VARCHAR(20) PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL,
    Price DECIMAL(10,2) NOT NULL,
    Category VARCHAR(50)
);
 
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT NOT NULL,
    OrderDate DATE NOT NULL
);
 
CREATE TABLE OrderItems (
    OrderID INT NOT NULL,
    SKU VARCHAR(20) NOT NULL,
    Quantity INT NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL,  -- Price at time of order
    PRIMARY KEY (OrderID, SKU),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
    FOREIGN KEY (SKU) REFERENCES Products(SKU)
);

When to Promote vs. Extract

Promote embedded attributes to columns when: the object occurs exactly once per parent, has no independent identity, and won't be shared. Extract to a separate table when: the object could exist independently, might be referenced by multiple parents, needs its own constraints, or represents a distinct business concept.
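
A minimal Python sketch of the extract case follows. It assumes each source order embeds exactly one product object, as in the "BEFORE" example; the second order is hypothetical and was added only to show that a shared product ends up as a single row. The function name `extract_products` is likewise an assumption.

```python
orders = [
    {"orderId": 1, "customerId": 101,
     "product": {"sku": "LAPTOP", "name": "Pro Laptop",
                 "price": 999.99, "category": "Electronics"}},
    # Hypothetical second order referencing the same product
    {"orderId": 2, "customerId": 102,
     "product": {"sku": "LAPTOP", "name": "Pro Laptop",
                 "price": 999.99, "category": "Electronics"}},
]

def extract_products(orders):
    """Split embedded products into distinct product rows plus order-item rows."""
    products = {}     # sku -> product row, so each product is stored once
    order_items = []  # (OrderID, SKU, UnitPrice at time of order)
    for order in orders:
        p = order["product"]
        products.setdefault(p["sku"], (p["sku"], p["name"], p["price"], p["category"]))
        order_items.append((order["orderId"], p["sku"], p["price"]))
    return list(products.values()), order_items

product_rows, order_item_rows = extract_products(orders)
print(product_rows)     # one row for LAPTOP
print(order_item_rows)  # one row per order
```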

Flattening Nested Arrays

Nested arrays represent one-to-many relationships embedded within a parent object. Flattening them requires creating child tables with foreign key references back to the parent.

The general pattern:

For each array in the source data:

Create a new table for the array elements
Add a foreign key column referencing the parent table
Add columns for each property of the array elements
If order matters, add a sequence number column

flatten-nested-array.sql
-- SOURCE: JSON document with nested arrays
/*
{
    "orderId": 1001,
    "customer": "Alice Johnson",
    "orderDate": "2024-03-15",
    "items": [
        {"sku": "LAPTOP", "qty": 1, "price": 999.99},
        {"sku": "MOUSE", "qty": 2, "price": 29.99},
        {"sku": "KEYBOARD", "qty": 1, "price": 79.99}
    ],
    "payments": [
        {"method": "CREDIT", "amount": 500.00, "date": "2024-03-15"},
        {"method": "CREDIT", "amount": 639.96, "date": "2024-03-20"}
    ]
}
*/
 
-- FLATTENED SCHEMA: Three related tables
 
-- Parent table for order header
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerName VARCHAR(100) NOT NULL,
    OrderDate DATE NOT NULL
);
 
-- Child table for items array
CREATE TABLE OrderItems (
    OrderID INT NOT NULL,
    ItemSequence INT NOT NULL,  -- Preserves array order
    SKU VARCHAR(20) NOT NULL,
    Quantity INT NOT NULL,
    Price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (OrderID, ItemSequence),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) ON DELETE CASCADE
);
 
-- Child table for payments array
CREATE TABLE OrderPayments (
    OrderID INT NOT NULL,
    PaymentSequence INT NOT NULL,
    PaymentMethod VARCHAR(20) NOT NULL,
    Amount DECIMAL(10,2) NOT NULL,
    PaymentDate DATE NOT NULL,
    PRIMARY KEY (OrderID, PaymentSequence),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) ON DELETE CASCADE
);
 
-- DATA INSERTION:
INSERT INTO Orders VALUES (1001, 'Alice Johnson', '2024-03-15');
 
INSERT INTO OrderItems VALUES 
    (1001, 1, 'LAPTOP', 1, 999.99),
    (1001, 2, 'MOUSE', 2, 29.99),
    (1001, 3, 'KEYBOARD', 1, 79.99);
 
INSERT INTO OrderPayments VALUES
    (1001, 1, 'CREDIT', 500.00, '2024-03-15'),
    (1001, 2, 'CREDIT', 639.96, '2024-03-20');
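
The step from the JSON document to the rows inserted above could be sketched in Python as follows. The function name `flatten_order` is an assumption; sequence numbers are derived from each element's position in its array.

```python
import json

doc = json.loads("""
{
    "orderId": 1001,
    "customer": "Alice Johnson",
    "orderDate": "2024-03-15",
    "items": [
        {"sku": "LAPTOP", "qty": 1, "price": 999.99},
        {"sku": "MOUSE", "qty": 2, "price": 29.99},
        {"sku": "KEYBOARD", "qty": 1, "price": 79.99}
    ],
    "payments": [
        {"method": "CREDIT", "amount": 500.00, "date": "2024-03-15"},
        {"method": "CREDIT", "amount": 639.96, "date": "2024-03-20"}
    ]
}
""")

def flatten_order(doc):
    """Return (order_row, item_rows, payment_rows) for one order document."""
    order_id = doc["orderId"]
    order_row = (order_id, doc["customer"], doc["orderDate"])
    # enumerate(..., start=1) turns array position into the sequence column
    item_rows = [
        (order_id, seq, item["sku"], item["qty"], item["price"])
        for seq, item in enumerate(doc["items"], start=1)
    ]
    payment_rows = [
        (order_id, seq, pay["method"], pay["amount"], pay["date"])
        for seq, pay in enumerate(doc["payments"], start=1)
    ]
    return order_row, item_rows, payment_rows

order_row, item_rows, payment_rows = flatten_order(doc)
```

Each returned tuple lines up with the column order of the corresponding table, so the rows can be passed to a parameterized INSERT.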

Cascade Deletes for Flattened Arrays

When flattening arrays that are truly dependent on their parent (like order items), use ON DELETE CASCADE on the foreign key. This preserves the semantics that child rows have no meaning without their parent: deleting an order automatically removes its items.
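
The cascade behavior can be tried out with Python's built-in `sqlite3` module. This is a sketch under the assumption that SQLite is an acceptable stand-in for your database: column types are adapted to SQLite, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL,
    OrderDate TEXT NOT NULL
);
CREATE TABLE OrderItems (
    OrderID INTEGER NOT NULL,
    ItemSequence INTEGER NOT NULL,
    SKU TEXT NOT NULL,
    Quantity INTEGER NOT NULL,
    Price NUMERIC NOT NULL,
    PRIMARY KEY (OrderID, ItemSequence),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) ON DELETE CASCADE
);
INSERT INTO Orders VALUES (1001, 'Alice Johnson', '2024-03-15');
INSERT INTO OrderItems VALUES
    (1001, 1, 'LAPTOP', 1, 999.99),
    (1001, 2, 'MOUSE', 2, 29.99);
""")

count_items = "SELECT COUNT(*) FROM OrderItems WHERE OrderID = 1001"
items_before = conn.execute(count_items).fetchone()[0]

# Deleting the parent row removes its dependent child rows
conn.execute("DELETE FROM Orders WHERE OrderID = 1001")
items_after = conn.execute(count_items).fetchone()[0]

print(items_before, items_after)  # 2 0
```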

Multi-Level Hierarchy Flattening

Complex data often contains multiple levels of nesting. Flattening these requires creating a table for each level, with foreign keys forming a reference chain.

Example: Corporate hierarchy with nested departments and employees

multi-level-flatten.sql
-- SOURCE: Three-level nested structure
/*
{
    "companyId": 1,
    "companyName": "TechCorp",
    "divisions": [
        {
            "divisionName": "Engineering",
            "departments": [
                {
                    "deptName": "Backend",
                    "employees": [
                        {"empId": 101, "name": "Alice", "role": "Senior Engineer"},
                        {"empId": 102, "name": "Bob", "role": "Engineer"}
                    ]
                },
                {
                    "deptName": "Frontend",
                    "employees": [
                        {"empId": 103, "name": "Charlie", "role": "Lead Developer"}
                    ]
                }
            ]
        },
        {
            "divisionName": "Sales",
            "departments": [
                {
                    "deptName": "Enterprise",
                    "employees": [
                        {"empId": 201, "name": "Diana", "role": "Account Executive"}
                    ]
                }
            ]
        }
    ]
}
*/
 
-- FLATTENED SCHEMA: Four tables with referential chain
 
CREATE TABLE Companies (
    CompanyID INT PRIMARY KEY,
    CompanyName VARCHAR(100) NOT NULL
);
 
CREATE TABLE Divisions (
    DivisionID INT PRIMARY KEY AUTO_INCREMENT,  -- Surrogate key (MySQL syntax; other dialects use IDENTITY or SERIAL)
    CompanyID INT NOT NULL,
    DivisionName VARCHAR(100) NOT NULL,
    FOREIGN KEY (CompanyID) REFERENCES Companies(CompanyID),
    UNIQUE (CompanyID, DivisionName)  -- Division names unique within company
);
 
CREATE TABLE Departments (
    DepartmentID INT PRIMARY KEY AUTO_INCREMENT,  -- Surrogate key (MySQL syntax; other dialects use IDENTITY or SERIAL)
    DivisionID INT NOT NULL,
    DepartmentName VARCHAR(100) NOT NULL,
    FOREIGN KEY (DivisionID) REFERENCES Divisions(DivisionID),
    UNIQUE (DivisionID, DepartmentName)  -- Dept names unique within division
);
 
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,  -- From source data
    DepartmentID INT NOT NULL,
    EmployeeName VARCHAR(100) NOT NULL,
    Role VARCHAR(50),
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);
 
-- To query all employees in the Engineering division
-- (the alias is dv rather than div because DIV is a reserved word in MySQL):
SELECT e.EmployeeName, e.Role, dep.DepartmentName
FROM Employees e
JOIN Departments dep ON e.DepartmentID = dep.DepartmentID
JOIN Divisions dv ON dep.DivisionID = dv.DivisionID
WHERE dv.DivisionName = 'Engineering';

Flattening decisions at each level:

For each level of nesting, decide:

Level	Source Structure	Resulting Table	Key Strategy
1	Root object	Companies	Natural or surrogate PK
2	`divisions[]`	Divisions	Surrogate PK, FK to level 1
3	`departments[]`	Departments	Surrogate PK, FK to level 2
4	`employees[]`	Employees	Natural PK (empId), FK to level 3
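
The key strategies in the table can be sketched as a Python walk over the nested document. This is an illustrative sketch, not a general-purpose loader: the function name `flatten_company` is an assumption, and surrogate keys are generated with simple in-memory counters rather than by the database.

```python
from itertools import count

company = {
    "companyId": 1,
    "companyName": "TechCorp",
    "divisions": [
        {"divisionName": "Engineering", "departments": [
            {"deptName": "Backend", "employees": [
                {"empId": 101, "name": "Alice", "role": "Senior Engineer"},
                {"empId": 102, "name": "Bob", "role": "Engineer"}]},
            {"deptName": "Frontend", "employees": [
                {"empId": 103, "name": "Charlie", "role": "Lead Developer"}]}]},
        {"divisionName": "Sales", "departments": [
            {"deptName": "Enterprise", "employees": [
                {"empId": 201, "name": "Diana", "role": "Account Executive"}]}]},
    ],
}

def flatten_company(company):
    """Walk the hierarchy top-down, emitting one row list per level."""
    division_ids, department_ids = count(1), count(1)  # surrogate key generators
    companies = [(company["companyId"], company["companyName"])]
    divisions, departments, employees = [], [], []
    for division in company["divisions"]:
        division_id = next(division_ids)
        divisions.append((division_id, company["companyId"], division["divisionName"]))
        for dept in division["departments"]:
            dept_id = next(department_ids)
            departments.append((dept_id, division_id, dept["deptName"]))
            for emp in dept["employees"]:
                # empId is a natural key taken from the source data
                employees.append((emp["empId"], dept_id, emp["name"], emp["role"]))
    return companies, divisions, departments, employees

companies, divisions, departments, employees = flatten_company(company)
```

Each child row carries the key of the parent that was being visited when it was emitted, which is what turns nesting into a foreign key chain.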

Deep Hierarchy Joins

Each level of flattened hierarchy requires an additional JOIN to traverse. A 5-level hierarchy needs 4 JOINs to get from leaf to root. Consider whether frequently-needed queries might benefit from denormalized columns (like storing CompanyID directly on Employees for reporting), balanced against the update anomalies this introduces.

Handling Variable-Depth Hierarchies

Some hierarchies don't have a fixed depth—the same type of node can nest arbitrarily deep. Examples include organizational charts, file systems, comment threads, and category trees. These require special modeling techniques.

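Common options include the adjacency list, path enumeration, nested sets, and closure table models. The adjacency list model stores each node with a reference to its parent. It is simple to implement but requires recursive queries to traverse.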

adjacency-list.sql
-- Adjacency List for variable-depth categories
CREATE TABLE Categories (
    CategoryID INT PRIMARY KEY,
    CategoryName VARCHAR(100) NOT NULL,
    ParentCategoryID INT NULL,  -- NULL for root nodes
    FOREIGN KEY (ParentCategoryID) REFERENCES Categories(CategoryID)
);
 
-- Sample data: Electronics > Computers > Laptops > Gaming Laptops
INSERT INTO Categories VALUES (1, 'Electronics', NULL);
INSERT INTO Categories VALUES (2, 'Computers', 1);
INSERT INTO Categories VALUES (3, 'Laptops', 2);
INSERT INTO Categories VALUES (4, 'Gaming Laptops', 3);
INSERT INTO Categories VALUES (5, 'Phones', 1);
 
-- Query ancestors using recursive CTE (SQL standard)
WITH RECURSIVE CategoryPath AS (
    -- Base case: start from the target category
    SELECT CategoryID, CategoryName, ParentCategoryID, 1 AS Depth
    FROM Categories
    WHERE CategoryID = 4  -- Gaming Laptops
    
    UNION ALL
    
    -- Recursive case: join with parent
    SELECT c.CategoryID, c.CategoryName, c.ParentCategoryID, cp.Depth + 1
    FROM Categories c
    JOIN CategoryPath cp ON c.CategoryID = cp.ParentCategoryID
)
SELECT * FROM CategoryPath;
 
-- Result: Gaming Laptops -> Laptops -> Computers -> Electronics
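
The same technique works in the other direction. The sketch below walks down the tree to collect a category's descendants instead of walking up to its ancestors. It runs against an in-memory SQLite database through Python's `sqlite3` module, which is an assumption made so the example is self-contained; column types are adapted to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (
    CategoryID INTEGER PRIMARY KEY,
    CategoryName TEXT NOT NULL,
    ParentCategoryID INTEGER REFERENCES Categories(CategoryID)
);
INSERT INTO Categories VALUES
    (1, 'Electronics', NULL),
    (2, 'Computers', 1),
    (3, 'Laptops', 2),
    (4, 'Gaming Laptops', 3),
    (5, 'Phones', 1);
""")

# Start at one node, then repeatedly join children onto the rows found so far
subtree_sql = """
WITH RECURSIVE Subtree AS (
    SELECT CategoryID, CategoryName, 0 AS Depth
    FROM Categories
    WHERE CategoryID = ?
    UNION ALL
    SELECT c.CategoryID, c.CategoryName, s.Depth + 1
    FROM Categories c
    JOIN Subtree s ON c.ParentCategoryID = s.CategoryID
)
SELECT CategoryName, Depth FROM Subtree ORDER BY Depth, CategoryID
"""

descendants = conn.execute(subtree_sql, (2,)).fetchall()  # 2 = Computers
print(descendants)
# [('Computers', 0), ('Laptops', 1), ('Gaming Laptops', 2)]
```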

Flattening JSON and XML Data

Modern data ingestion often involves JSON from APIs or XML from enterprise systems. Here's a systematic approach to flattening these formats.

JSON Flattening Process:

Analyze the JSON structure — Identify scalar properties, nested objects, and arrays
Create the primary table — Map top-level scalar properties to columns
Handle nested objects — Decide: promote to columns or extract to separate table
Handle arrays — Always create separate tables with foreign keys
Determine key strategy — Use existing IDs or generate surrogates

json-flattening-complete.sql
-- COMPLEX JSON SOURCE
/*
{
    "userId": 1001,
    "profile": {
        "firstName": "Alice",
        "lastName": "Smith",
        "email": "alice@example.com",
        "preferences": {
            "theme": "dark",
            "language": "en",
            "notifications": true
        }
    },
    "orders": [
        {
            "orderId": 5001,
            "date": "2024-03-15",
            "items": [
                {"sku": "A1", "qty": 2, "price": 29.99},
                {"sku": "B2", "qty": 1, "price": 49.99}
            ],
            "shipping": {
                "address": "123 Main St",
                "city": "Boston",
                "method": "EXPRESS"
            }
        }
    ],
    "tags": ["premium", "early-adopter", "referrer"]
}
*/
 
-- FLATTENED SCHEMA
 
-- Users table: combines profile scalars (promoted)
CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    Email VARCHAR(100) NOT NULL,
    -- Preferences promoted since 1:1 relationship
    PreferenceTheme VARCHAR(20) DEFAULT 'light',
    PreferenceLanguage VARCHAR(10) DEFAULT 'en',
    PreferenceNotifications BOOLEAN DEFAULT TRUE
);
 
-- Orders table: from orders array
CREATE TABLE UserOrders (
    OrderID INT PRIMARY KEY,
    UserID INT NOT NULL,
    OrderDate DATE NOT NULL,
    -- Shipping promoted since 1:1 with order
    ShippingAddress VARCHAR(200),
    ShippingCity VARCHAR(100),
    ShippingMethod VARCHAR(20),
    FOREIGN KEY (UserID) REFERENCES Users(UserID)
);
 
-- Order Items: from nested items array within orders
CREATE TABLE OrderItems (
    OrderID INT NOT NULL,
    SKU VARCHAR(20) NOT NULL,
    Quantity INT NOT NULL,
    Price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (OrderID, SKU),
    FOREIGN KEY (OrderID) REFERENCES UserOrders(OrderID)
);
 
-- User Tags: from tags array
CREATE TABLE UserTags (
    UserID INT NOT NULL,
    Tag VARCHAR(50) NOT NULL,
    PRIMARY KEY (UserID, Tag),
    FOREIGN KEY (UserID) REFERENCES Users(UserID)
);
 
-- DATA POPULATION
INSERT INTO Users VALUES (1001, 'Alice', 'Smith', 'alice@example.com', 'dark', 'en', TRUE);
INSERT INTO UserOrders VALUES (5001, 1001, '2024-03-15', '123 Main St', 'Boston', 'EXPRESS');
INSERT INTO OrderItems VALUES (5001, 'A1', 2, 29.99), (5001, 'B2', 1, 49.99);
INSERT INTO UserTags VALUES (1001, 'premium'), (1001, 'early-adopter'), (1001, 'referrer');
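
A minimal Python sketch of the transformation that produces those rows is shown here. The function name `flatten_user` is an assumption, and the sketch expects every field in the sample document to be present.

```python
doc = {
    "userId": 1001,
    "profile": {
        "firstName": "Alice", "lastName": "Smith", "email": "alice@example.com",
        "preferences": {"theme": "dark", "language": "en", "notifications": True},
    },
    "orders": [{
        "orderId": 5001,
        "date": "2024-03-15",
        "items": [
            {"sku": "A1", "qty": 2, "price": 29.99},
            {"sku": "B2", "qty": 1, "price": 49.99},
        ],
        "shipping": {"address": "123 Main St", "city": "Boston", "method": "EXPRESS"},
    }],
    "tags": ["premium", "early-adopter", "referrer"],
}

def flatten_user(doc):
    """Return (user_row, order_rows, item_rows, tag_rows) for one user document."""
    user_id = doc["userId"]
    profile = doc["profile"]
    prefs = profile["preferences"]
    # Nested 1:1 objects (profile, preferences) are promoted into the user row
    user_row = (user_id, profile["firstName"], profile["lastName"], profile["email"],
                prefs["theme"], prefs["language"], prefs["notifications"])
    order_rows, item_rows = [], []
    for order in doc["orders"]:
        ship = order["shipping"]  # 1:1 with the order, so promoted as well
        order_rows.append((order["orderId"], user_id, order["date"],
                           ship["address"], ship["city"], ship["method"]))
        for item in order["items"]:  # array nested inside an array element
            item_rows.append((order["orderId"], item["sku"], item["qty"], item["price"]))
    # An array of scalars becomes one row per value, keyed by the parent
    tag_rows = [(user_id, tag) for tag in doc["tags"]]
    return user_row, order_rows, item_rows, tag_rows

user_row, order_rows, item_rows, tag_rows = flatten_user(doc)
```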

Handling Optional JSON Fields

JSON is often schema-less, with optional fields. When flattening, make the columns for optional fields nullable, or use default values where a semantic default exists. Document which fields may be absent in your data dictionary.
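
In Python, one way to apply this is `dict.get`, which returns `None` (stored as NULL by most database drivers) or a chosen default when a key is absent. The helper names below are assumptions, and the defaults mirror the DEFAULT clauses of the Users table above.

```python
PREFERENCE_DEFAULTS = {"theme": "light", "language": "en", "notifications": True}

def preference_columns(doc):
    """Return (theme, language, notifications), filling gaps with defaults."""
    prefs = doc.get("profile", {}).get("preferences") or {}
    return tuple(prefs.get(key, default) for key, default in PREFERENCE_DEFAULTS.items())

def shipping_columns(order):
    """Return (address, city, method); absent values become None, i.e. NULL."""
    ship = order.get("shipping") or {}
    return (ship.get("address"), ship.get("city"), ship.get("method"))

# Hypothetical documents with missing optional fields
partial_user = {"userId": 1002, "profile": {"preferences": {"theme": "dark"}}}
order_without_shipping = {"orderId": 5002, "date": "2024-03-16"}

print(preference_columns(partial_user))          # ('dark', 'en', True)
print(shipping_columns(order_without_shipping))  # (None, None, None)
```

Required fields are better read with plain indexing (`doc["userId"]`), so that a missing key raises an error instead of silently producing a NULL.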

Flattening Spreadsheet Data

Spreadsheet data presents unique flattening challenges: merged cells, implicit hierarchies through indentation, repeating groups, and header rows embedded within data. Here's how to handle common patterns.

Challenge 1: Merged header cells indicating grouping

Spreadsheets often merge cells to create visual groups:

spreadsheet-merged-headers.txt
ORIGINAL SPREADSHEET FORMAT:
                
| Employee | Department        | Contact                            |
| Name     | Name    | Manager | Email            | Phone           |
|----------|---------|---------|------------------|-----------------|
| Alice    | Eng     | Bob     | alice@co.com     | 555-1234        |
| Charlie  | Sales   | Diana   | charlie@co.com   | 555-5678        |
 
The visual grouping ("Department" spans Name and Manager) is lost in CSV export.
 
FLATTENING APPROACH:
1. Identify that "Manager" is a Department property, not Employee property
2. Create separate tables if Departments can exist independently
3. Or flatten if it's purely denormalized data
 
CREATE TABLE EmployeeData (
    EmployeeName VARCHAR(100) PRIMARY KEY,
    DepartmentName VARCHAR(100),
    DepartmentManager VARCHAR(100),
    Email VARCHAR(100),
    Phone VARCHAR(20)
);
 
-- If departments should be entities:
CREATE TABLE Departments (
    DepartmentName VARCHAR(100) PRIMARY KEY,
    Manager VARCHAR(100)
);
 
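CREATE TABLE Employees (
    EmployeeName VARCHAR(100) PRIMARY KEY,
    DepartmentName VARCHAR(100),
    Email VARCHAR(100),
    Phone VARCHAR(20),
    -- Table-level constraint: some MySQL versions parse but do not enforce inline column REFERENCES
    FOREIGN KEY (DepartmentName) REFERENCES Departments(DepartmentName)
);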
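
Recovering the grouping from an exported file can be sketched in Python. The assumption here is that the export leaves a merged cell's text in its first column and blanks in the columns it spanned; the function name `combine_headers` is hypothetical.

```python
def combine_headers(group_row, name_row):
    """Carry each group label across its blank cells and prefix the sub-header."""
    columns = []
    current_group = ""
    for group, name in zip(group_row, name_row):
        if group.strip():
            current_group = group.strip()  # a new merged group starts here
        columns.append(f"{current_group}{name.strip()}")
    return columns

# The two header rows as they might appear after CSV export
group_row = ["Employee", "Department", "", "Contact", ""]
name_row = ["Name", "Name", "Manager", "Email", "Phone"]

columns = combine_headers(group_row, name_row)
print(columns)
# ['EmployeeName', 'DepartmentName', 'DepartmentManager', 'ContactEmail', 'ContactPhone']
```

The generated names keep the group prefix on every column. The schema above drops it for Email and Phone, which is a naming choice rather than a structural difference.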

Challenge 2: Implicit hierarchy through indentation

Some spreadsheets show hierarchy through indentation rather than explicit relationships:

spreadsheet-indentation.txt
ORIGINAL SPREADSHEET (indentation shows hierarchy):
 
| Category           | SKU    | Stock |
|--------------------|--------|-------|
| Electronics        |        |       |
|   Computers        |        |       |
|     Laptop X       | SKU001 | 50    |
|     Desktop Y      | SKU002 | 30    |
|   Phones           |        |       |
|     Phone A        | SKU003 | 100   |
| Clothing           |        |       |
|   Shirts           |        |       |
|     Blue Shirt     | SKU004 | 75    |
 
FLATTENING APPROACH:
1. Parse indentation levels to determine parent-child relationships
2. Create adjacency list or path enumeration structure
3. SKU rows are leaf products; non-SKU rows are categories
 
-- After parsing:
CREATE TABLE Categories (
    CategoryID INT PRIMARY KEY,
    CategoryName VARCHAR(100),
    ParentCategoryID INT,  -- NULL for top-level categories
    FOREIGN KEY (ParentCategoryID) REFERENCES Categories(CategoryID)
);
 
CREATE TABLE Products (
    SKU VARCHAR(20) PRIMARY KEY,
    ProductName VARCHAR(100),
    CategoryID INT,
    Stock INT,
    FOREIGN KEY (CategoryID) REFERENCES Categories(CategoryID)
);
 
INSERT INTO Categories VALUES (1, 'Electronics', NULL);
INSERT INTO Categories VALUES (2, 'Computers', 1);
INSERT INTO Categories VALUES (3, 'Phones', 1);
INSERT INTO Categories VALUES (4, 'Clothing', NULL);
INSERT INTO Categories VALUES (5, 'Shirts', 4);
 
INSERT INTO Products VALUES ('SKU001', 'Laptop X', 2, 50);
INSERT INTO Products VALUES ('SKU002', 'Desktop Y', 2, 30);
-- ... etc.
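
The parsing step can be sketched in Python with a stack of currently open categories. The sketch assumes two spaces of indentation per level and that every row with a SKU is a product; the function name `parse_indented_rows` is hypothetical.

```python
def parse_indented_rows(rows, indent_width=2):
    """Turn (label, sku, stock) rows into category and product row lists."""
    categories, products = [], []
    stack = []   # (level, CategoryID) for the chain of open ancestor categories
    next_id = 1
    for label, sku, stock in rows:
        level = (len(label) - len(label.lstrip(" "))) // indent_width
        # Close every category that is not an ancestor of this row
        while stack and stack[-1][0] >= level:
            stack.pop()
        parent_id = stack[-1][1] if stack else None
        if sku:  # rows with a SKU are leaf products
            products.append((sku, label.strip(), parent_id, int(stock)))
        else:    # rows without a SKU are categories
            categories.append((next_id, label.strip(), parent_id))
            stack.append((level, next_id))
            next_id += 1
    return categories, products

rows = [
    ("Electronics", "", ""),
    ("  Computers", "", ""),
    ("    Laptop X", "SKU001", "50"),
    ("    Desktop Y", "SKU002", "30"),
    ("  Phones", "", ""),
    ("    Phone A", "SKU003", "100"),
    ("Clothing", "", ""),
    ("  Shirts", "", ""),
    ("    Blue Shirt", "SKU004", "75"),
]

categories, products = parse_indented_rows(rows)
```

For this sample, the category IDs and parent references match the INSERT statements shown above.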

Spreadsheet Flattening Tools

For complex spreadsheet transformations, consider using Python pandas for data wrangling, Apache Spark for large datasets, or ETL tools with visual transformations. Parse the human-readable format into intermediate structures before generating SQL.

Summary: Flattening Data to Relations

Flattening transforms hierarchical, nested, or irregular data into the uniform relational structure that 1NF requires. It's often the first step when ingesting data from external sources or migrating from non-relational systems. Let's consolidate the key learnings:

Key Takeaways

• Flattening creates regular rectangular structure: Every row gets the same columns, every cell contains an atomic value, and structural nesting is eliminated in favor of foreign key relationships.
• Embedded objects become columns or separate tables: Promote 1:1 embedded objects to columns; extract objects with independent identity or multi-occurrence to separate tables with foreign keys.
• Nested arrays become child tables: Each array becomes a table with the parent's foreign key. Use sequence numbers to preserve order if it matters.
• Multi-level hierarchies create chains of tables: Each level becomes its own table, with foreign keys forming a reference chain from child to root.
• Variable-depth hierarchies need special models: Choose from adjacency list (simple but needs recursion), path enumeration (easy queries, hard updates), nested sets (fast reads, slow writes), or closure table (flexible but storage-heavy).
• JSON/XML flattening follows systematic decomposition: Analyze structure, identify entities, determine key strategies, and create the table graph before any data migration.

What's next:

With understanding of atomicity, repeating groups, and flattening, we're ready to look at what actually constitutes a 1NF violation. The next page provides a comprehensive taxonomy of 1NF violations, helping you identify issues in existing schemas and avoid them in new designs.

Page Complete

You can now transform hierarchical data into flat relational structures, handle embedded objects and nested arrays, model variable-depth hierarchies with appropriate techniques, and flatten JSON, XML, and spreadsheet data for relational storage. Next, we'll examine common 1NF violations in detail.

);
 
-- To query all employees in the Engineering division
-- (the alias is dv rather than div because DIV is a reserved word in MySQL):
SELECT e.EmployeeName, e.Role, dep.DepartmentName
FROM Employees e
JOIN Departments dep ON e.DepartmentID = dep.DepartmentID
JOIN Divisions dv ON dep.DivisionID = dv.DivisionID
WHERE dv.DivisionName = 'Engineering';

Flattening decisions at each level:

For each level of nesting, decide which table the level becomes and how its rows are keyed:

Level	Source Structure	Resulting Table	Key Strategy
1	Root object	Companies	Natural or surrogate PK
2	`divisions[]`	Divisions	Surrogate PK, FK to level 1
3	`departments[]`	Departments	Surrogate PK, FK to level 2
4	`employees[]`	Employees	Natural PK (empId), FK to level 3
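
The level-by-level decisions in the table can be sketched as a small loader that walks the nested document once and emits rows for each target table. This is a minimal illustration that assumes a source document shaped like the TechCorp sample; flatten_company is a hypothetical name, and the in-memory counters stand in for the database's AUTO_INCREMENT surrogate keys.

```python
from itertools import count


def flatten_company(doc):
    """Return a dict of table name -> list of row tuples for one company."""
    division_ids = count(1)    # surrogate keys (assumption: replaces AUTO_INCREMENT)
    department_ids = count(1)
    rows = {"Companies": [], "Divisions": [], "Departments": [], "Employees": []}

    company_id = doc["companyId"]               # natural key from the source
    rows["Companies"].append((company_id, doc["companyName"]))

    for division in doc.get("divisions", []):
        division_id = next(division_ids)
        rows["Divisions"].append((division_id, company_id, division["divisionName"]))
        for dept in division.get("departments", []):
            dept_id = next(department_ids)
            rows["Departments"].append((dept_id, division_id, dept["deptName"]))
            for emp in dept.get("employees", []):
                # Employees keep their natural key (empId); Role is nullable.
                rows["Employees"].append(
                    (emp["empId"], dept_id, emp["name"], emp.get("role"))
                )
    return rows


company = {
    "companyId": 1,
    "companyName": "TechCorp",
    "divisions": [
        {"divisionName": "Engineering", "departments": [
            {"deptName": "Backend", "employees": [
                {"empId": 101, "name": "Alice", "role": "Senior Engineer"},
                {"empId": 102, "name": "Bob", "role": "Engineer"}]},
            {"deptName": "Frontend", "employees": [
                {"empId": 103, "name": "Charlie", "role": "Lead Developer"}]}]},
        {"divisionName": "Sales", "departments": [
            {"deptName": "Enterprise", "employees": [
                {"empId": 201, "name": "Diana", "role": "Account Executive"}]}]},
    ],
}

rows = flatten_company(company)
for table, table_rows in rows.items():
    print(table, table_rows)
```

Each child row carries only the key of its immediate parent, which is what forms the reference chain from Employees back to Companies.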

Deep Hierarchy Joins

Handling Variable-Depth Hierarchies

The adjacency list model stores each node with a reference to its parent. It's simple to implement but requires recursive queries to traverse.

adjacency-list.sql
-- Adjacency List for variable-depth categories
CREATE TABLE Categories (
    CategoryID INT PRIMARY KEY,
    CategoryName VARCHAR(100) NOT NULL,
    ParentCategoryID INT NULL,  -- NULL for root nodes
    FOREIGN KEY (ParentCategoryID) REFERENCES Categories(CategoryID)
);
 
-- Sample data: Electronics > Computers > Laptops > Gaming Laptops
INSERT INTO Categories VALUES (1, 'Electronics', NULL);
INSERT INTO Categories VALUES (2, 'Computers', 1);
INSERT INTO Categories VALUES (3, 'Laptops', 2);
INSERT INTO Categories VALUES (4, 'Gaming Laptops', 3);
INSERT INTO Categories VALUES (5, 'Phones', 1);
 
-- Query ancestors using recursive CTE (SQL standard)
WITH RECURSIVE CategoryPath AS (
    -- Base case: start from the target category
    SELECT CategoryID, CategoryName, ParentCategoryID, 1 AS Depth
    FROM Categories
    WHERE CategoryID = 4  -- Gaming Laptops
    
    UNION ALL
    
    -- Recursive case: join with parent
    SELECT c.CategoryID, c.CategoryName, c.ParentCategoryID, cp.Depth + 1
    FROM Categories c
    JOIN CategoryPath cp ON c.CategoryID = cp.ParentCategoryID
)
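SELECT CategoryName, Depth
FROM CategoryPath
ORDER BY Depth;  -- Without ORDER BY, the row order is not guaranteed
 
-- Result (Depth 1 to 4): Gaming Laptops -> Laptops -> Computers -> Electronics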
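
The same recursive pattern also runs in the opposite direction, collecting every descendant of a category instead of its ancestors. The sketch below tries this with Python's built-in sqlite3 module, which supports WITH RECURSIVE; the Subtree CTE name and the subtree helper are hypothetical names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (
    CategoryID INTEGER PRIMARY KEY,
    CategoryName TEXT NOT NULL,
    ParentCategoryID INTEGER REFERENCES Categories(CategoryID)
);
INSERT INTO Categories VALUES
    (1, 'Electronics', NULL),
    (2, 'Computers', 1),
    (3, 'Laptops', 2),
    (4, 'Gaming Laptops', 3),
    (5, 'Phones', 1);
""")

SUBTREE_SQL = """
WITH RECURSIVE Subtree AS (
    -- Base case: the root of the subtree we want
    SELECT CategoryID, CategoryName, 0 AS Depth
    FROM Categories
    WHERE CategoryID = ?
    UNION ALL
    -- Recursive case: children of rows already collected
    SELECT c.CategoryID, c.CategoryName, s.Depth + 1
    FROM Categories c
    JOIN Subtree s ON c.ParentCategoryID = s.CategoryID
)
SELECT CategoryName, Depth FROM Subtree ORDER BY Depth, CategoryID
"""


def subtree(root_id):
    """Hypothetical helper: (name, depth) for root_id and all its descendants."""
    return conn.execute(SUBTREE_SQL, (root_id,)).fetchall()


descendants = subtree(2)
print(descendants)
# [('Computers', 0), ('Laptops', 1), ('Gaming Laptops', 2)]
```

The only change from the ancestor query is the join condition: descendants are found by matching a row's ParentCategoryID to an already collected CategoryID, rather than the reverse.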

Flattening JSON and XML Data

Modern data ingestion often involves JSON from APIs or XML from enterprise systems. Here's a systematic approach to flattening these formats.

JSON Flattening Process:

1. Analyze the JSON structure: identify scalar properties, nested objects, and arrays
2. Create the primary table: map top-level scalar properties to columns
3. Handle nested objects: decide whether to promote them to columns or extract them to a separate table
4. Handle arrays: always create separate tables with foreign keys
5. Determine key strategy: use existing IDs or generate surrogates
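
The first step of this process, analyzing the structure, can be partly automated. The sketch below sorts the properties of one JSON object into the three kinds the process distinguishes. classify_properties is a hypothetical helper, and it looks only at the top level of a single sample document, so a real analysis would repeat it for nested objects and across many documents.

```python
def classify_properties(obj):
    """Split one JSON object's properties into scalars, nested objects, and arrays."""
    kinds = {"scalars": [], "objects": [], "arrays": []}
    for key, value in obj.items():
        if isinstance(value, dict):
            kinds["objects"].append(key)    # candidate: promote or extract
        elif isinstance(value, list):
            kinds["arrays"].append(key)     # candidate: child table
        else:
            kinds["scalars"].append(key)    # candidate: column on the primary table
    return kinds


# Abbreviated sample document (assumption: already parsed with json.loads)
sample = {
    "userId": 1001,
    "profile": {"firstName": "Alice", "lastName": "Smith"},
    "orders": [{"orderId": 5001, "date": "2024-03-15"}],
    "tags": ["premium", "early-adopter"],
}

report = classify_properties(sample)
print(report)
# {'scalars': ['userId'], 'objects': ['profile'], 'arrays': ['orders', 'tags']}
```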

json-flattening-complete.sql
-- COMPLEX JSON SOURCE
/*
{
    "userId": 1001,
    "profile": {
        "firstName": "Alice",
        "lastName": "Smith",
        "email": "alice@example.com",
        "preferences": {
            "theme": "dark",
            "language": "en",
            "notifications": true
        }
    },
    "orders": [
        {
            "orderId": 5001,
            "date": "2024-03-15",
            "items": [
                {"sku": "A1", "qty": 2, "price": 29.99},
                {"sku": "B2", "qty": 1, "price": 49.99}
            ],
            "shipping": {
                "address": "123 Main St",
                "city": "Boston",
                "method": "EXPRESS"
            }
        }
    ],
    "tags": ["premium", "early-adopter", "referrer"]
}
*/
 
-- FLATTENED SCHEMA
 
-- Users table: combines profile scalars (promoted)
CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    Email VARCHAR(100) NOT NULL,
    -- Preferences promoted since 1:1 relationship
    PreferenceTheme VARCHAR(20) DEFAULT 'light',
    PreferenceLanguage VARCHAR(10) DEFAULT 'en',
    PreferenceNotifications BOOLEAN DEFAULT TRUE
);
 
-- Orders table: from orders array
CREATE TABLE UserOrders (
    OrderID INT PRIMARY KEY,
    UserID INT NOT NULL,
    OrderDate DATE NOT NULL,
    -- Shipping promoted since 1:1 with order
    ShippingAddress VARCHAR(200),
    ShippingCity VARCHAR(100),
    ShippingMethod VARCHAR(20),
    FOREIGN KEY (UserID) REFERENCES Users(UserID)
);
 
-- Order Items: from nested items array within orders
CREATE TABLE OrderItems (
    OrderID INT NOT NULL,
    SKU VARCHAR(20) NOT NULL,
    Quantity INT NOT NULL,
    Price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (OrderID, SKU),
    FOREIGN KEY (OrderID) REFERENCES UserOrders(OrderID)
);
 
-- User Tags: from tags array
CREATE TABLE UserTags (
    UserID INT NOT NULL,
    Tag VARCHAR(50) NOT NULL,
    PRIMARY KEY (UserID, Tag),
    FOREIGN KEY (UserID) REFERENCES Users(UserID)
);
 
-- DATA POPULATION
INSERT INTO Users VALUES (1001, 'Alice', 'Smith', 'alice@example.com', 'dark', 'en', TRUE);
INSERT INTO UserOrders VALUES (5001, 1001, '2024-03-15', '123 Main St', 'Boston', 'EXPRESS');
INSERT INTO OrderItems VALUES (5001, 'A1', 2, 29.99), (5001, 'B2', 1, 49.99);
INSERT INTO UserTags VALUES (1001, 'premium'), (1001, 'early-adopter'), (1001, 'referrer');

Handling Optional JSON Fields
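
JSON documents frequently omit properties that other documents include, so a loader should map a missing property to NULL rather than fail. The sketch below builds a UserOrders row in the column order used above and tolerates a missing or null shipping object. order_row is a hypothetical helper, and treating orderId and date as required is an assumption based on the NOT NULL columns in the schema.

```python
def order_row(user_id, order):
    """Build one UserOrders row; absent shipping fields become None (SQL NULL)."""
    # "shipping" may be missing entirely or explicitly null in the source JSON.
    shipping = order.get("shipping") or {}
    return (
        order["orderId"],           # required: a KeyError here signals bad input
        user_id,
        order["date"],              # required
        shipping.get("address"),    # optional -> None when absent
        shipping.get("city"),
        shipping.get("method"),
    )


complete = {
    "orderId": 5001,
    "date": "2024-03-15",
    "shipping": {"address": "123 Main St", "city": "Boston", "method": "EXPRESS"},
}
partial = {"orderId": 5002, "date": "2024-04-01"}

print(order_row(1001, complete))
print(order_row(1001, partial))
```

Required fields are read with indexing so that bad input fails loudly, while optional fields use get so that they load as NULL into the nullable shipping columns.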

Flattening Spreadsheet Data

Challenge 1: Merged header cells indicating grouping

Spreadsheets often merge cells to create visual groups:

spreadsheet-merged-headers.txt
ORIGINAL SPREADSHEET FORMAT:
                
| Employee | Department |         | Contact        |          |
| Name     | Name       | Manager | Email          | Phone    |
|----------|------------|---------|----------------|----------|
| Alice    | Eng        | Bob     | alice@co.com   | 555-1234 |
| Charlie  | Sales      | Diana   | charlie@co.com | 555-5678 |
 
The visual grouping ("Department" spans Name and Manager) is lost in CSV export.
 
FLATTENING APPROACH:
1. Identify that "Manager" is a property of the Department, not of the Employee
2. Create separate tables if Departments can exist independently of employees
3. Otherwise, keep a single flat table if the sheet is purely denormalized data
 
CREATE TABLE EmployeeData (
    EmployeeName VARCHAR(100) PRIMARY KEY,
    DepartmentName VARCHAR(100),
    DepartmentManager VARCHAR(100),
    Email VARCHAR(100),
    Phone VARCHAR(20)
);
 
-- If departments should be entities:
CREATE TABLE Departments (
    DepartmentName VARCHAR(100) PRIMARY KEY,
    Manager VARCHAR(100)
);
 
CREATE TABLE Employees (
    EmployeeName VARCHAR(100) PRIMARY KEY,
    DepartmentName VARCHAR(100) REFERENCES Departments(DepartmentName),
    Email VARCHAR(100),
    Phone VARCHAR(20)
);
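
Recovering the grouping that a CSV export drops usually comes down to forward-filling the merged header row and combining it with the field row beneath it. The sketch below shows one way to do that. composite_headers is a hypothetical helper, and it assumes the export leaves the cells to the right of a merged header empty, which is typical but worth checking for the tool in use.

```python
def composite_headers(group_row, field_row):
    """Combine a merged group-header row with the field row beneath it."""
    names = []
    current_group = ""
    for group, field in zip(group_row, field_row):
        if group.strip():
            # A non-empty cell starts a new merged group; empty cells inherit it.
            current_group = group.strip()
        names.append(f"{current_group}{field.strip()}")
    return names


# The two header rows from the sample sheet, as they would appear after export
group_row = ["Employee", "Department", "", "Contact", ""]
field_row = ["Name", "Name", "Manager", "Email", "Phone"]

columns = composite_headers(group_row, field_row)
print(columns)
# ['EmployeeName', 'DepartmentName', 'DepartmentManager', 'ContactEmail', 'ContactPhone']
```

The prefixed names make it explicit that Manager belongs to the department group, which is the information needed to decide whether Departments should become its own table.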

Challenge 2: Implicit hierarchy through indentation

Some spreadsheets show hierarchy through indentation rather than explicit relationships:

spreadsheet-indentation.txt
ORIGINAL SPREADSHEET (indentation shows hierarchy):
 
| Category           | SKU    | Stock |
|--------------------|--------|-------|
| Electronics        |        |       |
|   Computers        |        |       |
|     Laptop X       | SKU001 | 50    |
|     Desktop Y      | SKU002 | 30    |
|   Phones           |        |       |
|     Phone A        | SKU003 | 100   |
| Clothing           |        |       |
|   Shirts           |        |       |
|     Blue Shirt     | SKU004 | 75    |
 
FLATTENING APPROACH:
1. Parse indentation levels to determine parent-child relationships
2. Create adjacency list or path enumeration structure
3. SKU rows are leaf products; non-SKU rows are categories
 
-- After parsing:
CREATE TABLE Categories (
    CategoryID INT PRIMARY KEY,
    CategoryName VARCHAR(100),
    ParentCategoryID INT REFERENCES Categories(CategoryID)
);
 
CREATE TABLE Products (
    SKU VARCHAR(20) PRIMARY KEY,
    ProductName VARCHAR(100),
    CategoryID INT REFERENCES Categories(CategoryID),
    Stock INT
);
 
INSERT INTO Categories VALUES (1, 'Electronics', NULL);
INSERT INTO Categories VALUES (2, 'Computers', 1);
INSERT INTO Categories VALUES (3, 'Phones', 1);
INSERT INTO Categories VALUES (4, 'Clothing', NULL);
INSERT INTO Categories VALUES (5, 'Shirts', 4);
 
INSERT INTO Products VALUES ('SKU001', 'Laptop X', 2, 50);
INSERT INTO Products VALUES ('SKU002', 'Desktop Y', 2, 30);
-- ... etc.
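
The parsing step can be sketched with a stack that tracks the currently open category at each indentation level. This is a minimal illustration: parse_indented is a hypothetical helper, and it assumes two spaces per level, cell padding already trimmed, and that any row with a SKU is a product rather than a category.

```python
def parse_indented(rows, indent_width=2):
    """Turn (label, sku, stock) rows into adjacency-list categories and products."""
    categories = []   # (CategoryID, CategoryName, ParentCategoryID)
    products = []     # (SKU, ProductName, CategoryID, Stock)
    stack = []        # stack[i] is the CategoryID open at indentation level i
    next_id = 1
    for label, sku, stock in rows:
        level = (len(label) - len(label.lstrip(" "))) // indent_width
        name = label.strip()
        del stack[level:]                     # close this level and anything deeper
        parent = stack[-1] if stack else None
        if sku:
            products.append((sku, name, parent, int(stock)))
        else:
            categories.append((next_id, name, parent))
            stack.append(next_id)
            next_id += 1
    return categories, products


sheet = [
    ("Electronics", "", ""),
    ("  Computers", "", ""),
    ("    Laptop X", "SKU001", "50"),
    ("    Desktop Y", "SKU002", "30"),
    ("  Phones", "", ""),
    ("    Phone A", "SKU003", "100"),
    ("Clothing", "", ""),
    ("  Shirts", "", ""),
    ("    Blue Shirt", "SKU004", "75"),
]

categories, products = parse_indented(sheet)
print(categories)
print(products)
```

For the sample sheet this yields the same five category rows as the INSERT statements above, with each product attached to the nearest enclosing category.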

Spreadsheet Flattening Tools

Summary: Flattening Data to Relations

Key Takeaways

• Flattening creates a regular rectangular structure: every row gets the same columns, every cell contains an atomic value, and structural nesting is replaced by foreign key relationships.
• Embedded objects become columns or separate tables: promote 1:1 embedded objects to columns; extract objects that have independent identity or occur multiple times into separate tables with foreign keys.
• Nested arrays become child tables: each array becomes a table with a foreign key to its parent. Use sequence numbers to preserve order if it matters.
• Multi-level hierarchies create chains of tables: each level becomes its own table, with foreign keys forming a reference chain from child to root.
• Variable-depth hierarchies need special models: choose from adjacency list (simple but needs recursion), path enumeration (easy queries, hard updates), nested sets (fast reads, slow writes), or closure table (flexible but storage-heavy).
• JSON/XML flattening follows systematic decomposition: analyze the structure, identify entities, determine key strategies, and design the table graph before migrating any data.

