Database Management SystemsFirst Normal Form (1NF)

First Normal Form (1NF): Foundation of Normalization

LevelBeginner

Duration60 mins

TopicFirst Normal Form (1NF)

2 / 5

Eliminating Repeating Groups

The Hidden Structure Problem

While atomicity ensures that individual values are indivisible, there's another structural pattern that violates First Normal Form: repeating groups. This occurs when a table contains multiple columns (or sets of columns) that represent the same type of data—essentially embedding a list horizontally within a single row.

Repeating groups are seductive because they seem to simplify data entry and display. A spreadsheet user might naturally create columns like Product1, Product2, Product3 for an order. But this horizontal duplication creates fundamental problems for the relational model and represents one of the most common design flaws in databases created by developers without formal training.

Understanding and eliminating repeating groups is essential not just for 1NF compliance, but for creating databases that can evolve gracefully as requirements change.

What You Will Learn

By the end of this page, you will understand what repeating groups are and why they violate 1NF, how to identify repeating group patterns in existing schemas, the specific problems repeating groups cause for queries and maintenance, systematic techniques for transforming repeating groups into proper relational structures, and how to preserve data integrity during the transformation process.

Defining Repeating Groups

A repeating group is a set of columns that contains multiple instances of the same type of data. Unlike non-atomic values (which pack multiple values into a single cell), repeating groups spread related values across multiple columns in the same row.

Formal definition:

A repeating group exists when:

Multiple columns in a table represent the same logical attribute
These columns are typically numbered or ordered (e.g., Item1, Item2, Item3)
The number of columns limits the maximum occurrences of that data
The columns share a semantic relationship that should be modeled as a one-to-many relationship

The classic example:

repeating-group-example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- REPEATING GROUP VIOLATION
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    OrderDate DATE,
    -- Repeating group: Items 1-5
    Item1_Name VARCHAR(100),
    Item1_Qty INT,
    Item1_Price DECIMAL(10,2),
    Item2_Name VARCHAR(100),
    Item2_Qty INT,
    Item2_Price DECIMAL(10,2),
    Item3_Name VARCHAR(100),
    Item3_Qty INT,
    Item3_Price DECIMAL(10,2),
    Item4_Name VARCHAR(100),
    Item4_Qty INT,
    Item4_Price DECIMAL(10,2),
    Item5_Name VARCHAR(100),
    Item5_Qty INT,
    Item5_Price DECIMAL(10,2)
);
 
-- Sample data
INSERT INTO Orders VALUES (
    1001,
    'Alice Johnson',
    '2024-03-15',
    'Laptop', 1, 999.99,
    'Mouse', 2, 29.99,
    'Keyboard', 1, 79.99,
    NULL, NULL, NULL,  -- Unused slots
    NULL, NULL, NULL
);

In this example, the columns Item1_Name, Item1_Qty, Item1_Price through Item5_* form a repeating group. Each set of three columns represents the same concept (an order line item), repeated five times.

Why this happens:

Repeating groups typically arise from:

Spreadsheet thinking — Users comfortable with Excel naturally create column structures that look like spreadsheet layouts
Avoiding complexity — Developers may avoid creating separate tables to keep the schema "simple"
Fixed assumptions — "An order will never have more than 5 items" (until it does)
Legacy migration — Importing flat file data without restructuring
Report-driven design — Creating tables that match report layouts rather than data relationships

The Schema Ceiling Problem

A fundamental flaw of repeating groups is that they impose an arbitrary maximum on your data. With Item1 through Item5, you cannot have a 6-item order without schema modification. And when that modification comes, you must update all queries, application code, reports, and possibly migrate existing data.

Recognizing Repeating Group Patterns

Repeating groups can be subtle. Not all cases involve obviously numbered columns. Here are the common patterns to watch for:

The most obvious pattern: columns with numeric suffixes that represent the same logical attribute.

Examples:

numbered-columns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Pattern: Numbered column names
CREATE TABLE Employees (
    EmpID INT PRIMARY KEY,
    Name VARCHAR(100),
    Skill1 VARCHAR(50),     -- Repeating
    Skill2 VARCHAR(50),     -- group of
    Skill3 VARCHAR(50),     -- skills
    Skill4 VARCHAR(50),
    Dependent1_Name VARCHAR(100),   -- Repeating
    Dependent1_DOB DATE,            -- group of
    Dependent2_Name VARCHAR(100),   -- dependents
    Dependent2_DOB DATE,
    PreviousJob1_Title VARCHAR(100),    -- Repeating group
    PreviousJob1_Company VARCHAR(100),  -- of employment
    PreviousJob1_Years INT,             -- history
    PreviousJob2_Title VARCHAR(100),
    PreviousJob2_Company VARCHAR(100),
    PreviousJob2_Years INT
);
 
-- Each numbered set is a repeating group that should be
-- its own table with a foreign key to Employees

The Column Naming Test

If you find yourself naming columns with numbers (Skill1, Skill2) or with prefixes/suffixes that distinguish instances of the same thing (Phone_Home, Phone_Work, Phone_Mobile), you likely have a repeating group. Ask yourself: 'Are these columns all the same attribute, just for different instances?' If yes, it's a repeating group.

Problems Caused by Repeating Groups

Repeating groups cause systematic problems that compound as data volume and complexity grow. Let's examine each problem category with concrete examples.

Major Problems with Repeating Groups

•Fixed Capacity Limits — The schema determines maximum occurrences. 5 item columns means max 5 items. Changing this requires DDL modifications, application updates, and often data migration.
•Wasted Storage and Sparse Data — Most rows won't use all columns. An order with 2 items has 3 empty item slots. This multiplies across millions of rows.
•Complex and Inefficient Queries — Finding 'all orders containing product X' requires checking 5+ columns with OR conditions. Aggregations require UNION or complex CASE expressions.
•Maintenance Nightmares — Adding a 6th item slot requires: ALTER TABLE, updating all INSERT/UPDATE queries, modifying all SELECT queries, updating application code, and testing everything.
•Indexing Challenges — You cannot create a single index for 'all items.' You'd need 5 separate indexes, each partially useful, wasting space and slowing writes.
•Reporting Complexity — Reports that summarize 'all items' require unpivoting data in queries, making reports harder to write and slower to execute.

Query complexity demonstration:

Let's see how repeating groups complicate what should be simple queries:

repeating-group-queries.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
-- Given: Orders table with Item1_Name through Item5_Name
 
-- SIMPLE TASK: Find all orders containing 'Laptop'
-- With repeating groups (BAD):
SELECT OrderID, CustomerName 
FROM Orders
WHERE Item1_Name = 'Laptop'
   OR Item2_Name = 'Laptop'
   OR Item3_Name = 'Laptop'
   OR Item4_Name = 'Laptop'
   OR Item5_Name = 'Laptop';
 
-- With proper normalization (SIMPLE):
SELECT DISTINCT o.OrderID, o.CustomerName
FROM Orders o
JOIN OrderItems oi ON o.OrderID = oi.OrderID
WHERE oi.ItemName = 'Laptop';
 
 
-- SIMPLE TASK: Count total items per order
-- With repeating groups (BAD):
SELECT OrderID,
    (CASE WHEN Item1_Name IS NOT NULL THEN 1 ELSE 0 END +
     CASE WHEN Item2_Name IS NOT NULL THEN 1 ELSE 0 END +
     CASE WHEN Item3_Name IS NOT NULL THEN 1 ELSE 0 END +
     CASE WHEN Item4_Name IS NOT NULL THEN 1 ELSE 0 END +
     CASE WHEN Item5_Name IS NOT NULL THEN 1 ELSE 0 END) AS ItemCount
FROM Orders;
 
-- With proper normalization (SIMPLE):
SELECT OrderID, COUNT(*) AS ItemCount
FROM OrderItems
GROUP BY OrderID;
 
 
-- SIMPLE TASK: Calculate total revenue per order
-- With repeating groups (BAD):
SELECT OrderID,
    COALESCE(Item1_Qty * Item1_Price, 0) +
    COALESCE(Item2_Qty * Item2_Price, 0) +
    COALESCE(Item3_Qty * Item3_Price, 0) +
    COALESCE(Item4_Qty * Item4_Price, 0) +
    COALESCE(Item5_Qty * Item5_Price, 0) AS TotalRevenue
FROM Orders;
 
-- With proper normalization (SIMPLE):
SELECT OrderID, SUM(Quantity * Price) AS TotalRevenue
FROM OrderItems
GROUP BY OrderID;

The Maintenance Multiplier

Every query, stored procedure, application function, and report that touches a repeating group must be updated when columns are added. If you have 50 queries referencing Item1-Item5, adding Item6 means modifying 50 queries. In a properly normalized design, zero queries need modification.

The Transformation Process

Eliminating repeating groups follows a systematic process. The core idea is to create a new table where each occurrence of the repeating group becomes a separate row, linked back to the original entity by a foreign key.

The general algorithm:

Identify the repeating group — Determine which columns form the group and what they represent
Create a new table — Name it to represent what each row will contain (e.g., OrderItems)
Include a foreign key — Add a column for the primary key of the original table
Include group columns once — Add each distinct attribute from the group as a single column
Determine the primary key — Usually the foreign key plus a distinguishing attribute or sequence number
Migrate data — Transform horizontal data into vertical rows
Remove repeating columns — Delete the original repeating columns from the parent table

Complete transformation example:

Let's transform the Orders table with repeating items:

transformation-example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
-- BEFORE: Table with repeating groups
CREATE TABLE Orders_Old (
    OrderID INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    OrderDate DATE,
    Item1_Name VARCHAR(100),
    Item1_Qty INT,
    Item1_Price DECIMAL(10,2),
    Item2_Name VARCHAR(100),
    Item2_Qty INT,
    Item2_Price DECIMAL(10,2),
    Item3_Name VARCHAR(100),
    Item3_Qty INT,
    Item3_Price DECIMAL(10,2)
);
 
-- AFTER: Two normalized tables
 
-- Step 1: Create parent table (without repeating columns)
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    OrderDate DATE
);
 
-- Step 2: Create child table for repeating group
CREATE TABLE OrderItems (
    OrderItemID INT PRIMARY KEY AUTO_INCREMENT,  -- Surrogate key
    OrderID INT NOT NULL,                        -- Foreign key
    ItemName VARCHAR(100) NOT NULL,              -- Was Item#_Name
    Quantity INT NOT NULL,                       -- Was Item#_Qty
    Price DECIMAL(10,2) NOT NULL,                -- Was Item#_Price
    LineNumber INT,                              -- Optional: preserves order
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
        ON DELETE CASCADE
);
 
-- Alternative: Natural composite key
CREATE TABLE OrderItems_Alt (
    OrderID INT NOT NULL,
    LineNumber INT NOT NULL,      -- 1, 2, 3... for each order
    ItemName VARCHAR(100) NOT NULL,
    Quantity INT NOT NULL,
    Price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (OrderID, LineNumber),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

Surrogate vs. Natural Keys

For child tables created from repeating groups, you have two key options: (1) A surrogate key (auto-increment ID) with a separate foreign key column, or (2) A composite natural key using the foreign key plus a sequence/line number. Surrogate keys are simpler for joins; natural keys enforce uniqueness semantically. Choose based on your application patterns.

Data Migration Strategies

After creating the new schema, you must migrate existing data from the repeating group format to the normalized format. This involves "unpivoting" horizontal data into vertical rows.

Migration approach 1: UNION ALL

The most straightforward method uses UNION ALL to combine data from each repeating column position:

migration-union.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Migrate data using UNION ALL
INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)
SELECT OrderID, Item1_Name, Item1_Qty, Item1_Price, 1
FROM Orders_Old
WHERE Item1_Name IS NOT NULL
 
UNION ALL
 
SELECT OrderID, Item2_Name, Item2_Qty, Item2_Price, 2
FROM Orders_Old
WHERE Item2_Name IS NOT NULL
 
UNION ALL
 
SELECT OrderID, Item3_Name, Item3_Qty, Item3_Price, 3
FROM Orders_Old
WHERE Item3_Name IS NOT NULL;
 
-- Migrate parent table (simple copy)
INSERT INTO Orders (OrderID, CustomerName, OrderDate)
SELECT OrderID, CustomerName, OrderDate
FROM Orders_Old;

Migration approach 2: UNPIVOT (SQL Server/Oracle)

Some databases provide UNPIVOT syntax for this transformation:

migration-unpivot.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- SQL Server UNPIVOT approach
-- First, restructure to have matching column pairs
WITH Unpivoted AS (
    SELECT 
        OrderID,
        CASE Position
            WHEN 1 THEN Item1_Name
            WHEN 2 THEN Item2_Name
            WHEN 3 THEN Item3_Name
        END AS ItemName,
        CASE Position
            WHEN 1 THEN Item1_Qty
            WHEN 2 THEN Item2_Qty
            WHEN 3 THEN Item3_Qty
        END AS Quantity,
        CASE Position
            WHEN 1 THEN Item1_Price
            WHEN 2 THEN Item2_Price
            WHEN 3 THEN Item3_Price
        END AS Price,
        Position AS LineNumber
    FROM Orders_Old
    CROSS JOIN (SELECT 1 AS Position UNION SELECT 2 UNION SELECT 3) Positions
)
INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)
SELECT OrderID, ItemName, Quantity, Price, LineNumber
FROM Unpivoted
WHERE ItemName IS NOT NULL;

Migration approach 3: Procedural/ETL

For complex transformations or very large datasets, a procedural approach with batch processing:

migration-procedural.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- PostgreSQL PL/pgSQL procedural migration
DO $$
DECLARE
    order_rec RECORD;
BEGIN
    FOR order_rec IN SELECT * FROM Orders_Old LOOP
        -- Insert parent record
        INSERT INTO Orders (OrderID, CustomerName, OrderDate)
        VALUES (order_rec.OrderID, order_rec.CustomerName, order_rec.OrderDate);
        
        -- Insert each non-null item
        IF order_rec.Item1_Name IS NOT NULL THEN
            INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)
            VALUES (order_rec.OrderID, order_rec.Item1_Name, 
                    order_rec.Item1_Qty, order_rec.Item1_Price, 1);
        END IF;
        
        IF order_rec.Item2_Name IS NOT NULL THEN
            INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)
            VALUES (order_rec.OrderID, order_rec.Item2_Name, 
                    order_rec.Item2_Qty, order_rec.Item2_Price, 2);
        END IF;
        
        IF order_rec.Item3_Name IS NOT NULL THEN
            INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)
            VALUES (order_rec.OrderID, order_rec.Item3_Name, 
                    order_rec.Item3_Qty, order_rec.Item3_Price, 3);
        END IF;
        
        -- Commit in batches for large datasets
        -- IF order_rec.OrderID % 1000 = 0 THEN COMMIT; END IF;
    END LOOP;
END $$;

Migration Validation

After migration, validate that: (1) The count of non-null items matches the row count in the new table, (2) Sum totals match between old and new schemas, (3) Referential integrity is intact. Always perform migration in a transaction that can be rolled back, and test thoroughly in a non-production environment first.

Complex Repeating Group Patterns

Some repeating groups are more complex than simple numbered columns. These require more sophisticated transformations.

Pattern 1: Nested Repeating Groups

When repeating groups contain their own repeating groups:

nested-repeating.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- BEFORE: Double-nested repeating groups
CREATE TABLE Projects_Bad (
    ProjectID INT PRIMARY KEY,
    ProjectName VARCHAR(100),
    -- First level: Team members (repeating group)
    Member1_Name VARCHAR(100),
    Member1_Role VARCHAR(50),
    Member1_Skill1 VARCHAR(50),   -- Second level: Skills per member
    Member1_Skill2 VARCHAR(50),
    Member1_Skill3 VARCHAR(50),
    Member2_Name VARCHAR(100),
    Member2_Role VARCHAR(50),
    Member2_Skill1 VARCHAR(50),
    Member2_Skill2 VARCHAR(50),
    Member2_Skill3 VARCHAR(50)
);
 
-- AFTER: Three normalized tables
CREATE TABLE Projects (
    ProjectID INT PRIMARY KEY,
    ProjectName VARCHAR(100)
);
 
CREATE TABLE ProjectMembers (
    MemberAssignmentID INT PRIMARY KEY,
    ProjectID INT NOT NULL,
    MemberName VARCHAR(100) NOT NULL,
    Role VARCHAR(50),
    FOREIGN KEY (ProjectID) REFERENCES Projects(ProjectID)
);
 
CREATE TABLE MemberSkills (
    MemberAssignmentID INT NOT NULL,
    Skill VARCHAR(50) NOT NULL,
    PRIMARY KEY (MemberAssignmentID, Skill),
    FOREIGN KEY (MemberAssignmentID) REFERENCES ProjectMembers(MemberAssignmentID)
);

Pattern 2: Cross-product Repeating Groups

When two dimensions create a grid of repeating values:

cross-product-repeating.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
-- BEFORE: Product-by-Region sales matrix
CREATE TABLE SalesMatrix_Bad (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(100),
    Sales_North_Q1 DECIMAL(10,2),
    Sales_North_Q2 DECIMAL(10,2),
    Sales_North_Q3 DECIMAL(10,2),
    Sales_North_Q4 DECIMAL(10,2),
    Sales_South_Q1 DECIMAL(10,2),
    Sales_South_Q2 DECIMAL(10,2),
    Sales_South_Q3 DECIMAL(10,2),
    Sales_South_Q4 DECIMAL(10,2),
    Sales_East_Q1 DECIMAL(10,2),
    Sales_East_Q2 DECIMAL(10,2),
    Sales_East_Q3 DECIMAL(10,2),
    Sales_East_Q4 DECIMAL(10,2),
    Sales_West_Q1 DECIMAL(10,2),
    Sales_West_Q2 DECIMAL(10,2),
    Sales_West_Q3 DECIMAL(10,2),
    Sales_West_Q4 DECIMAL(10,2)
);
 
-- AFTER: Fact table with dimension references
CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(100)
);
 
CREATE TABLE Regions (
    RegionCode VARCHAR(10) PRIMARY KEY,
    RegionName VARCHAR(50)
);
 
CREATE TABLE Quarters (
    QuarterCode VARCHAR(10) PRIMARY KEY,  -- 'Q1', 'Q2', etc.
    StartMonth INT,
    EndMonth INT
);
 
CREATE TABLE Sales (
    ProductID INT NOT NULL,
    RegionCode VARCHAR(10) NOT NULL,
    QuarterCode VARCHAR(10) NOT NULL,
    SalesAmount DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (ProductID, RegionCode, QuarterCode),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID),
    FOREIGN KEY (RegionCode) REFERENCES Regions(RegionCode),
    FOREIGN KEY (QuarterCode) REFERENCES Quarters(QuarterCode)
);

Dimensional Modeling Connection

The normalized structure for cross-product repeating groups looks similar to dimensional modeling (star schema). This is not a coincidence—dimensional modeling and normalization share the same theoretical foundation. The Sales table above is a fact table with dimension foreign keys, a pattern you'll see in data warehousing.

Preserving Semantics During Transformation

When eliminating repeating groups, you must ensure that no information is lost and that the meaning of the data is preserved. This requires attention to several considerations:

Semantic Preservation Checklist

•Ordering information — If the order of items matters (Item1 before Item2), add a SequenceNumber or LineNumber column to the new table to preserve this ordering.
•Optional vs. Required — If some repeating positions are always filled (e.g., every order has at least one item), ensure NOT NULL constraints reflect this in the new table.
•Position-specific semantics — If positions have meaning beyond ordering (e.g., Primary Contact vs. Secondary Contact), add a ContactType column rather than just a sequence number.
•Default values — If missing values in repeating columns had implicit defaults, make these explicit in the new schema or data.
•Business rules — If rules like 'first item gets special discount' existed, ensure application logic is updated to work with the new structure.

semantic-preservation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- BEFORE: Labeled contact columns with semantic meaning
CREATE TABLE Companies_Old (
    CompanyID INT PRIMARY KEY,
    CompanyName VARCHAR(100),
    PrimaryContact_Name VARCHAR(100),
    PrimaryContact_Email VARCHAR(100),
    PrimaryContact_Phone VARCHAR(20),
    BillingContact_Name VARCHAR(100),
    BillingContact_Email VARCHAR(100),
    BillingContact_Phone VARCHAR(20),
    TechnicalContact_Name VARCHAR(100),
    TechnicalContact_Email VARCHAR(100),
    TechnicalContact_Phone VARCHAR(20)
);
 
-- AFTER: Preserve contact type semantics
CREATE TABLE Companies (
    CompanyID INT PRIMARY KEY,
    CompanyName VARCHAR(100)
);
 
CREATE TABLE CompanyContacts (
    ContactID INT PRIMARY KEY AUTO_INCREMENT,
    CompanyID INT NOT NULL,
    ContactType VARCHAR(20) NOT NULL,  -- 'Primary', 'Billing', 'Technical'
    ContactName VARCHAR(100) NOT NULL,
    Email VARCHAR(100),
    Phone VARCHAR(20),
    FOREIGN KEY (CompanyID) REFERENCES Companies(CompanyID),
    -- Ensure only one of each type per company
    UNIQUE (CompanyID, ContactType)
);
 
-- Now you can:
-- 1. Add new contact types without schema changes
-- 2. Query all contacts of a specific type across companies
-- 3. Enforce that each company has exactly one primary contact
-- 4. Allow companies with different contact configurations

Semantic Column Becomes Data

Notice how 'Primary', 'Billing', 'Technical'—which were embedded in column names—became values in the ContactType column. This is a key normalization pattern: semantic information hidden in schema becomes explicit data that can be queried, indexed, and extended without DDL changes.

Summary: Eliminating Repeating Groups

Repeating groups represent horizontal duplication of the same logical attribute across multiple columns. Eliminating them is essential for 1NF compliance and practical database manageability. Let's consolidate the key learnings:

Key Takeaways

•Repeating groups are columns representing the same attribute — Numbered columns (Item1, Item2, Item3), temporal columns (Jan, Feb, Mar), and categorical columns (Sales_North, Sales_South) are all forms of repeating groups.
•They impose arbitrary capacity limits — The schema determines maximum occurrences, and changing that limit requires DDL modifications plus query/application updates.
•They severely complicate queries — Simple operations like 'find all orders with laptops' require ORing across multiple columns. Aggregations require complex expressions.
•The transformation creates a new child table — Move the repeating columns to a new table with a foreign key to the parent table. Each repeating instance becomes a separate row.
•Preserve ordering and semantics — Add sequence numbers if order matters, and add type columns if positions had semantic meaning beyond sequence.
•Migration requires unpivoting — Use UNION ALL, UNPIVOT, or procedural code to transform horizontal data into vertical rows during migration.

What's next:

Sometimes existing tables have both repeating groups AND nested structures that need to be "flattened" before proper normalization can proceed. The next page explores table flattening techniques—how to take denormalized, hierarchical data and restructure it into clean relational tables.

Page Complete

You can now identify repeating group patterns in database designs, understand the problems they cause, and systematically transform them into properly normalized structures. Next, we'll explore flattening more complex table structures.

2 / 5

Loading learning content...

Database Management SystemsFirst Normal Form (1NF)

First Normal Form (1NF): Foundation of Normalization

LevelBeginner

Duration60 mins

TopicFirst Normal Form (1NF)

2 / 5

Eliminating Repeating Groups

The Hidden Structure Problem

Understanding and eliminating repeating groups is essential not just for 1NF compliance, but for creating databases that can evolve gracefully as requirements change.

What You Will Learn

Defining Repeating Groups

Formal definition:

A repeating group exists when:

Multiple columns in a table represent the same logical attribute
These columns are typically numbered or ordered (e.g., Item1, Item2, Item3)
The number of columns limits the maximum occurrences of that data
The columns share a semantic relationship that should be modeled as a one-to-many relationship

The classic example:

repeating-group-example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- REPEATING GROUP VIOLATION
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    OrderDate DATE,
    -- Repeating group: Items 1-5
    Item1_Name VARCHAR(100),
    Item1_Qty INT,
    Item1_Price DECIMAL(10,2),
    Item2_Name VARCHAR(100),
    Item2_Qty INT,
    Item2_Price DECIMAL(10,2),
    Item3_Name VARCHAR(100),
    Item3_Qty INT,
    Item3_Price DECIMAL(10,2),
    Item4_Name VARCHAR(100),
    Item4_Qty INT,
    Item4_Price DECIMAL(10,2),
    Item5_Name VARCHAR(100),
    Item5_Qty INT,
    Item5_Price DECIMAL(10,2)
);
 
-- Sample data
INSERT INTO Orders VALUES (
    1001,
    'Alice Johnson',
    '2024-03-15',
    'Laptop', 1, 999.99,
    'Mouse', 2, 29.99,
    'Keyboard', 1, 79.99,
    NULL, NULL, NULL,  -- Unused slots
    NULL, NULL, NULL
);

Why this happens:

Repeating groups typically arise from:

Spreadsheet thinking — Users comfortable with Excel naturally create column structures that look like spreadsheet layouts
Avoiding complexity — Developers may avoid creating separate tables to keep the schema "simple"
Fixed assumptions — "An order will never have more than 5 items" (until it does)
Legacy migration — Importing flat file data without restructuring
Report-driven design — Creating tables that match report layouts rather than data relationships

The Schema Ceiling Problem

Recognizing Repeating Group Patterns

Repeating groups can be subtle. Not all cases involve obviously numbered columns. Here are the common patterns to watch for:

The most obvious pattern: columns with numeric suffixes that represent the same logical attribute.

Examples:

numbered-columns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Pattern: Numbered column names
CREATE TABLE Employees (
    EmpID INT PRIMARY KEY,
    Name VARCHAR(100),
    Skill1 VARCHAR(50),     -- Repeating
    Skill2 VARCHAR(50),     -- group of
    Skill3 VARCHAR(50),     -- skills
    Skill4 VARCHAR(50),
    Dependent1_Name VARCHAR(100),   -- Repeating
    Dependent1_DOB DATE,            -- group of
    Dependent2_Name VARCHAR(100),   -- dependents
    Dependent2_DOB DATE,
    PreviousJob1_Title VARCHAR(100),    -- Repeating group
    PreviousJob1_Company VARCHAR(100),  -- of employment
    PreviousJob1_Years INT,             -- history
    PreviousJob2_Title VARCHAR(100),
    PreviousJob2_Company VARCHAR(100),
    PreviousJob2_Years INT
);
 
-- Each numbered set is a repeating group that should be
-- its own table with a foreign key to Employees

The Column Naming Test

Problems Caused by Repeating Groups

Repeating groups cause systematic problems that compound as data volume and complexity grow. Let's examine each problem category with concrete examples.

Major Problems with Repeating Groups

•Fixed Capacity Limits — The schema determines maximum occurrences. 5 item columns means max 5 items. Changing this requires DDL modifications, application updates, and often data migration.
•Wasted Storage and Sparse Data — Most rows won't use all columns. An order with 2 items has 3 empty item slots. This multiplies across millions of rows.
•Complex and Inefficient Queries — Finding 'all orders containing product X' requires checking 5+ columns with OR conditions. Aggregations require UNION or complex CASE expressions.
•Maintenance Nightmares — Adding a 6th item slot requires: ALTER TABLE, updating all INSERT/UPDATE queries, modifying all SELECT queries, updating application code, and testing everything.
•Indexing Challenges — You cannot create a single index for 'all items.' You'd need 5 separate indexes, each partially useful, wasting space and slowing writes.
•Reporting Complexity — Reports that summarize 'all items' require unpivoting data in queries, making reports harder to write and slower to execute.

Query complexity demonstration:

Let's see how repeating groups complicate what should be simple queries:

repeating-group-queries.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
-- Given: Orders table with Item1_Name through Item5_Name
 
-- SIMPLE TASK: Find all orders containing 'Laptop'
-- With repeating groups (BAD):
SELECT OrderID, CustomerName 
FROM Orders
WHERE Item1_Name = 'Laptop'
   OR Item2_Name = 'Laptop'
   OR Item3_Name = 'Laptop'
   OR Item4_Name = 'Laptop'
   OR Item5_Name = 'Laptop';
 
-- With proper normalization (SIMPLE):
SELECT DISTINCT o.OrderID, o.CustomerName
FROM Orders o
JOIN OrderItems oi ON o.OrderID = oi.OrderID
WHERE oi.ItemName = 'Laptop';
 
 
-- SIMPLE TASK: Count total items per order
-- With repeating groups (BAD):
SELECT OrderID,
    (CASE WHEN Item1_Name IS NOT NULL THEN 1 ELSE 0 END +
     CASE WHEN Item2_Name IS NOT NULL THEN 1 ELSE 0 END +
     CASE WHEN Item3_Name IS NOT NULL THEN 1 ELSE 0 END +
     CASE WHEN Item4_Name IS NOT NULL THEN 1 ELSE 0 END +
     CASE WHEN Item5_Name IS NOT NULL THEN 1 ELSE 0 END) AS ItemCount
FROM Orders;
 
-- With proper normalization (SIMPLE):
SELECT OrderID, COUNT(*) AS ItemCount
FROM OrderItems
GROUP BY OrderID;
 
 
-- SIMPLE TASK: Calculate total revenue per order
-- With repeating groups (BAD):
SELECT OrderID,
    COALESCE(Item1_Qty * Item1_Price, 0) +
    COALESCE(Item2_Qty * Item2_Price, 0) +
    COALESCE(Item3_Qty * Item3_Price, 0) +
    COALESCE(Item4_Qty * Item4_Price, 0) +
    COALESCE(Item5_Qty * Item5_Price, 0) AS TotalRevenue
FROM Orders;
 
-- With proper normalization (SIMPLE):
SELECT OrderID, SUM(Quantity * Price) AS TotalRevenue
FROM OrderItems
GROUP BY OrderID;

The Maintenance Multiplier

The Transformation Process

The general algorithm:

Identify the repeating group — Determine which columns form the group and what they represent
Create a new table — Name it to represent what each row will contain (e.g., OrderItems)
Include a foreign key — Add a column for the primary key of the original table
Include group columns once — Add each distinct attribute from the group as a single column
Determine the primary key — Usually the foreign key plus a distinguishing attribute or sequence number
Migrate data — Transform horizontal data into vertical rows
Remove repeating columns — Delete the original repeating columns from the parent table

Complete transformation example:

Let's transform the Orders table with repeating items:

transformation-example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
-- BEFORE: Table with repeating groups
CREATE TABLE Orders_Old (
    OrderID INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    OrderDate DATE,
    Item1_Name VARCHAR(100),
    Item1_Qty INT,
    Item1_Price DECIMAL(10,2),
    Item2_Name VARCHAR(100),
    Item2_Qty INT,
    Item2_Price DECIMAL(10,2),
    Item3_Name VARCHAR(100),
    Item3_Qty INT,
    Item3_Price DECIMAL(10,2)
);
 
-- AFTER: Two normalized tables
 
-- Step 1: Create parent table (without repeating columns)
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    OrderDate DATE
);
 
-- Step 2: Create child table for repeating group
CREATE TABLE OrderItems (
    OrderItemID INT PRIMARY KEY AUTO_INCREMENT,  -- Surrogate key
    OrderID INT NOT NULL,                        -- Foreign key
    ItemName VARCHAR(100) NOT NULL,              -- Was Item#_Name
    Quantity INT NOT NULL,                       -- Was Item#_Qty
    Price DECIMAL(10,2) NOT NULL,                -- Was Item#_Price
    LineNumber INT,                              -- Optional: preserves order
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
        ON DELETE CASCADE
);
 
-- Alternative: Natural composite key
CREATE TABLE OrderItems_Alt (
    OrderID INT NOT NULL,
    LineNumber INT NOT NULL,      -- 1, 2, 3... for each order
    ItemName VARCHAR(100) NOT NULL,
    Quantity INT NOT NULL,
    Price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (OrderID, LineNumber),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

Surrogate vs. Natural Keys

Data Migration Strategies

After creating the new schema, you must migrate existing data from the repeating group format to the normalized format. This involves "unpivoting" horizontal data into vertical rows.

Migration approach 1: UNION ALL

The most straightforward method uses UNION ALL to combine data from each repeating column position:

migration-union.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Migrate data using UNION ALL
INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)
SELECT OrderID, Item1_Name, Item1_Qty, Item1_Price, 1
FROM Orders_Old
WHERE Item1_Name IS NOT NULL
 
UNION ALL
 
SELECT OrderID, Item2_Name, Item2_Qty, Item2_Price, 2
FROM Orders_Old
WHERE Item2_Name IS NOT NULL
 
UNION ALL
 
SELECT OrderID, Item3_Name, Item3_Qty, Item3_Price, 3
FROM Orders_Old
WHERE Item3_Name IS NOT NULL;
 
-- Migrate parent table (simple copy)
INSERT INTO Orders (OrderID, CustomerName, OrderDate)
SELECT OrderID, CustomerName, OrderDate
FROM Orders_Old;

Migration approach 2: UNPIVOT (SQL Server/Oracle)

Some databases provide UNPIVOT syntax for this transformation:

migration-unpivot.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- SQL Server UNPIVOT approach
-- First, restructure to have matching column pairs
WITH Unpivoted AS (
    SELECT 
        OrderID,
        CASE Position
            WHEN 1 THEN Item1_Name
            WHEN 2 THEN Item2_Name
            WHEN 3 THEN Item3_Name
        END AS ItemName,
        CASE Position
            WHEN 1 THEN Item1_Qty
            WHEN 2 THEN Item2_Qty
            WHEN 3 THEN Item3_Qty
        END AS Quantity,
        CASE Position
            WHEN 1 THEN Item1_Price
            WHEN 2 THEN Item2_Price
            WHEN 3 THEN Item3_Price
        END AS Price,
        Position AS LineNumber
    FROM Orders_Old
    CROSS JOIN (SELECT 1 AS Position UNION SELECT 2 UNION SELECT 3) Positions
)
INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)
SELECT OrderID, ItemName, Quantity, Price, LineNumber
FROM Unpivoted
WHERE ItemName IS NOT NULL;

Migration approach 3: Procedural/ETL

For complex transformations or very large datasets, a procedural approach with batch processing:

migration-procedural.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- PostgreSQL PL/pgSQL procedural migration
DO $$
DECLARE
    order_rec RECORD;
BEGIN
    FOR order_rec IN SELECT * FROM Orders_Old LOOP
        -- Insert parent record
        INSERT INTO Orders (OrderID, CustomerName, OrderDate)
        VALUES (order_rec.OrderID, order_rec.CustomerName, order_rec.OrderDate);
        
        -- Insert each non-null item
        IF order_rec.Item1_Name IS NOT NULL THEN
            INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)
            VALUES (order_rec.OrderID, order_rec.Item1_Name, 
                    order_rec.Item1_Qty, order_rec.Item1_Price, 1);
        END IF;
        
        IF order_rec.Item2_Name IS NOT NULL THEN
            INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)
            VALUES (order_rec.OrderID, order_rec.Item2_Name, 
                    order_rec.Item2_Qty, order_rec.Item2_Price, 2);
        END IF;
        
        IF order_rec.Item3_Name IS NOT NULL THEN
            INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)
            VALUES (order_rec.OrderID, order_rec.Item3_Name, 
                    order_rec.Item3_Qty, order_rec.Item3_Price, 3);
        END IF;
        
        -- Commit in batches for large datasets
        -- IF order_rec.OrderID % 1000 = 0 THEN COMMIT; END IF;
    END LOOP;
END $$;

Migration Validation

Complex Repeating Group Patterns

Some repeating groups are more complex than simple numbered columns. These require more sophisticated transformations.

Pattern 1: Nested Repeating Groups

When repeating groups contain their own repeating groups:

nested-repeating.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- BEFORE: Double-nested repeating groups
CREATE TABLE Projects_Bad (
    ProjectID INT PRIMARY KEY,
    ProjectName VARCHAR(100),
    -- First level: Team members (repeating group)
    Member1_Name VARCHAR(100),
    Member1_Role VARCHAR(50),
    Member1_Skill1 VARCHAR(50),   -- Second level: Skills per member
    Member1_Skill2 VARCHAR(50),
    Member1_Skill3 VARCHAR(50),
    Member2_Name VARCHAR(100),
    Member2_Role VARCHAR(50),
    Member2_Skill1 VARCHAR(50),
    Member2_Skill2 VARCHAR(50),
    Member2_Skill3 VARCHAR(50)
);
 
-- AFTER: Three normalized tables
CREATE TABLE Projects (
    ProjectID INT PRIMARY KEY,
    ProjectName VARCHAR(100)
);
 
CREATE TABLE ProjectMembers (
    MemberAssignmentID INT PRIMARY KEY,
    ProjectID INT NOT NULL,
    MemberName VARCHAR(100) NOT NULL,
    Role VARCHAR(50),
    FOREIGN KEY (ProjectID) REFERENCES Projects(ProjectID)
);
 
CREATE TABLE MemberSkills (
    MemberAssignmentID INT NOT NULL,
    Skill VARCHAR(50) NOT NULL,
    PRIMARY KEY (MemberAssignmentID, Skill),
    FOREIGN KEY (MemberAssignmentID) REFERENCES ProjectMembers(MemberAssignmentID)
);

Pattern 2: Cross-product Repeating Groups

When two dimensions create a grid of repeating values:

cross-product-repeating.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
-- BEFORE: Product-by-Region sales matrix
CREATE TABLE SalesMatrix_Bad (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(100),
    Sales_North_Q1 DECIMAL(10,2),
    Sales_North_Q2 DECIMAL(10,2),
    Sales_North_Q3 DECIMAL(10,2),
    Sales_North_Q4 DECIMAL(10,2),
    Sales_South_Q1 DECIMAL(10,2),
    Sales_South_Q2 DECIMAL(10,2),
    Sales_South_Q3 DECIMAL(10,2),
    Sales_South_Q4 DECIMAL(10,2),
    Sales_East_Q1 DECIMAL(10,2),
    Sales_East_Q2 DECIMAL(10,2),
    Sales_East_Q3 DECIMAL(10,2),
    Sales_East_Q4 DECIMAL(10,2),
    Sales_West_Q1 DECIMAL(10,2),
    Sales_West_Q2 DECIMAL(10,2),
    Sales_West_Q3 DECIMAL(10,2),
    Sales_West_Q4 DECIMAL(10,2)
);
 
-- AFTER: Fact table with dimension references
CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(100)
);
 
CREATE TABLE Regions (
    RegionCode VARCHAR(10) PRIMARY KEY,
    RegionName VARCHAR(50)
);
 
CREATE TABLE Quarters (
    QuarterCode VARCHAR(10) PRIMARY KEY,  -- 'Q1', 'Q2', etc.
    StartMonth INT,
    EndMonth INT
);
 
CREATE TABLE Sales (
    ProductID INT NOT NULL,
    RegionCode VARCHAR(10) NOT NULL,
    QuarterCode VARCHAR(10) NOT NULL,
    SalesAmount DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (ProductID, RegionCode, QuarterCode),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID),
    FOREIGN KEY (RegionCode) REFERENCES Regions(RegionCode),
    FOREIGN KEY (QuarterCode) REFERENCES Quarters(QuarterCode)
);

Dimensional Modeling Connection

Preserving Semantics During Transformation

When eliminating repeating groups, you must ensure that no information is lost and that the meaning of the data is preserved. This requires attention to several considerations:

Semantic Preservation Checklist

•Ordering information — If the order of items matters (Item1 before Item2), add a SequenceNumber or LineNumber column to the new table to preserve this ordering.
•Optional vs. Required — If some repeating positions are always filled (e.g., every order has at least one item), ensure NOT NULL constraints reflect this in the new table.
•Position-specific semantics — If positions have meaning beyond ordering (e.g., Primary Contact vs. Secondary Contact), add a ContactType column rather than just a sequence number.
•Default values — If missing values in repeating columns had implicit defaults, make these explicit in the new schema or data.
•Business rules — If rules like 'first item gets special discount' existed, ensure application logic is updated to work with the new structure.

semantic-preservation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- BEFORE: Labeled contact columns with semantic meaning
CREATE TABLE Companies_Old (
    CompanyID INT PRIMARY KEY,
    CompanyName VARCHAR(100),
    PrimaryContact_Name VARCHAR(100),
    PrimaryContact_Email VARCHAR(100),
    PrimaryContact_Phone VARCHAR(20),
    BillingContact_Name VARCHAR(100),
    BillingContact_Email VARCHAR(100),
    BillingContact_Phone VARCHAR(20),
    TechnicalContact_Name VARCHAR(100),
    TechnicalContact_Email VARCHAR(100),
    TechnicalContact_Phone VARCHAR(20)
);
 
-- AFTER: Preserve contact type semantics
CREATE TABLE Companies (
    CompanyID INT PRIMARY KEY,
    CompanyName VARCHAR(100)
);
 
CREATE TABLE CompanyContacts (
    ContactID INT PRIMARY KEY AUTO_INCREMENT,
    CompanyID INT NOT NULL,
    ContactType VARCHAR(20) NOT NULL,  -- 'Primary', 'Billing', 'Technical'
    ContactName VARCHAR(100) NOT NULL,
    Email VARCHAR(100),
    Phone VARCHAR(20),
    FOREIGN KEY (CompanyID) REFERENCES Companies(CompanyID),
    -- Ensure only one of each type per company
    UNIQUE (CompanyID, ContactType)
);
 
-- Now you can:
-- 1. Add new contact types without schema changes
-- 2. Query all contacts of a specific type across companies
-- 3. Enforce that each company has exactly one primary contact
-- 4. Allow companies with different contact configurations

Semantic Column Becomes Data

Summary: Eliminating Repeating Groups

Key Takeaways

•Repeating groups are columns representing the same attribute — Numbered columns (Item1, Item2, Item3), temporal columns (Jan, Feb, Mar), and categorical columns (Sales_North, Sales_South) are all forms of repeating groups.
•They impose arbitrary capacity limits — The schema determines maximum occurrences, and changing that limit requires DDL modifications plus query/application updates.
•They severely complicate queries — Simple operations like 'find all orders with laptops' require ORing across multiple columns. Aggregations require complex expressions.
•The transformation creates a new child table — Move the repeating columns to a new table with a foreign key to the parent table. Each repeating instance becomes a separate row.
•Preserve ordering and semantics — Add sequence numbers if order matters, and add type columns if positions had semantic meaning beyond sequence.
•Migration requires unpivoting — Use UNION ALL, UNPIVOT, or procedural code to transform horizontal data into vertical rows during migration.

What's next:

Page Complete

2 / 5