Loading learning content...
While atomicity ensures that individual values are indivisible, there's another structural pattern that violates First Normal Form: repeating groups. This occurs when a table contains multiple columns (or sets of columns) that represent the same type of data—essentially embedding a list horizontally within a single row.
Repeating groups are seductive because they seem to simplify data entry and display. A spreadsheet user might naturally create columns like Product1, Product2, Product3 for an order. But this horizontal duplication creates fundamental problems for the relational model and represents one of the most common design flaws in databases created by developers without formal training.
Understanding and eliminating repeating groups is essential not just for 1NF compliance, but for creating databases that can evolve gracefully as requirements change.
By the end of this page, you will understand what repeating groups are and why they violate 1NF, how to identify repeating group patterns in existing schemas, the specific problems repeating groups cause for queries and maintenance, systematic techniques for transforming repeating groups into proper relational structures, and how to preserve data integrity during the transformation process.
A repeating group is a set of columns that contains multiple instances of the same type of data. Unlike non-atomic values (which pack multiple values into a single cell), repeating groups spread related values across multiple columns in the same row.
Formal definition:
A repeating group exists when:
Item1, Item2, Item3)The classic example:
12345678910111213141516171819202122232425262728293031323334
-- REPEATING GROUP VIOLATIONCREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerName VARCHAR(100), OrderDate DATE, -- Repeating group: Items 1-5 Item1_Name VARCHAR(100), Item1_Qty INT, Item1_Price DECIMAL(10,2), Item2_Name VARCHAR(100), Item2_Qty INT, Item2_Price DECIMAL(10,2), Item3_Name VARCHAR(100), Item3_Qty INT, Item3_Price DECIMAL(10,2), Item4_Name VARCHAR(100), Item4_Qty INT, Item4_Price DECIMAL(10,2), Item5_Name VARCHAR(100), Item5_Qty INT, Item5_Price DECIMAL(10,2)); -- Sample dataINSERT INTO Orders VALUES ( 1001, 'Alice Johnson', '2024-03-15', 'Laptop', 1, 999.99, 'Mouse', 2, 29.99, 'Keyboard', 1, 79.99, NULL, NULL, NULL, -- Unused slots NULL, NULL, NULL);In this example, the columns Item1_Name, Item1_Qty, Item1_Price through Item5_* form a repeating group. Each set of three columns represents the same concept (an order line item), repeated five times.
Why this happens:
Repeating groups typically arise from:
A fundamental flaw of repeating groups is that they impose an arbitrary maximum on your data. With Item1 through Item5, you cannot have a 6-item order without schema modification. And when that modification comes, you must update all queries, application code, reports, and possibly migrate existing data.
Repeating groups can be subtle. Not all cases involve obviously numbered columns. Here are the common patterns to watch for:
The most obvious pattern: columns with numeric suffixes that represent the same logical attribute.
Examples:
12345678910111213141516171819202122
-- Pattern: Numbered column namesCREATE TABLE Employees ( EmpID INT PRIMARY KEY, Name VARCHAR(100), Skill1 VARCHAR(50), -- Repeating Skill2 VARCHAR(50), -- group of Skill3 VARCHAR(50), -- skills Skill4 VARCHAR(50), Dependent1_Name VARCHAR(100), -- Repeating Dependent1_DOB DATE, -- group of Dependent2_Name VARCHAR(100), -- dependents Dependent2_DOB DATE, PreviousJob1_Title VARCHAR(100), -- Repeating group PreviousJob1_Company VARCHAR(100), -- of employment PreviousJob1_Years INT, -- history PreviousJob2_Title VARCHAR(100), PreviousJob2_Company VARCHAR(100), PreviousJob2_Years INT); -- Each numbered set is a repeating group that should be-- its own table with a foreign key to EmployeesIf you find yourself naming columns with numbers (Skill1, Skill2) or with prefixes/suffixes that distinguish instances of the same thing (Phone_Home, Phone_Work, Phone_Mobile), you likely have a repeating group. Ask yourself: 'Are these columns all the same attribute, just for different instances?' If yes, it's a repeating group.
Repeating groups cause systematic problems that compound as data volume and complexity grow. Let's examine each problem category with concrete examples.
Query complexity demonstration:
Let's see how repeating groups complicate what should be simple queries:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849
-- Given: Orders table with Item1_Name through Item5_Name -- SIMPLE TASK: Find all orders containing 'Laptop'-- With repeating groups (BAD):SELECT OrderID, CustomerName FROM OrdersWHERE Item1_Name = 'Laptop' OR Item2_Name = 'Laptop' OR Item3_Name = 'Laptop' OR Item4_Name = 'Laptop' OR Item5_Name = 'Laptop'; -- With proper normalization (SIMPLE):SELECT DISTINCT o.OrderID, o.CustomerNameFROM Orders oJOIN OrderItems oi ON o.OrderID = oi.OrderIDWHERE oi.ItemName = 'Laptop'; -- SIMPLE TASK: Count total items per order-- With repeating groups (BAD):SELECT OrderID, (CASE WHEN Item1_Name IS NOT NULL THEN 1 ELSE 0 END + CASE WHEN Item2_Name IS NOT NULL THEN 1 ELSE 0 END + CASE WHEN Item3_Name IS NOT NULL THEN 1 ELSE 0 END + CASE WHEN Item4_Name IS NOT NULL THEN 1 ELSE 0 END + CASE WHEN Item5_Name IS NOT NULL THEN 1 ELSE 0 END) AS ItemCountFROM Orders; -- With proper normalization (SIMPLE):SELECT OrderID, COUNT(*) AS ItemCountFROM OrderItemsGROUP BY OrderID; -- SIMPLE TASK: Calculate total revenue per order-- With repeating groups (BAD):SELECT OrderID, COALESCE(Item1_Qty * Item1_Price, 0) + COALESCE(Item2_Qty * Item2_Price, 0) + COALESCE(Item3_Qty * Item3_Price, 0) + COALESCE(Item4_Qty * Item4_Price, 0) + COALESCE(Item5_Qty * Item5_Price, 0) AS TotalRevenueFROM Orders; -- With proper normalization (SIMPLE):SELECT OrderID, SUM(Quantity * Price) AS TotalRevenueFROM OrderItemsGROUP BY OrderID;Every query, stored procedure, application function, and report that touches a repeating group must be updated when columns are added. If you have 50 queries referencing Item1-Item5, adding Item6 means modifying 50 queries. In a properly normalized design, zero queries need modification.
Eliminating repeating groups follows a systematic process. The core idea is to create a new table where each occurrence of the repeating group becomes a separate row, linked back to the original entity by a foreign key.
The general algorithm:
OrderItems)Complete transformation example:
Let's transform the Orders table with repeating items:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647
-- BEFORE: Table with repeating groupsCREATE TABLE Orders_Old ( OrderID INT PRIMARY KEY, CustomerName VARCHAR(100), OrderDate DATE, Item1_Name VARCHAR(100), Item1_Qty INT, Item1_Price DECIMAL(10,2), Item2_Name VARCHAR(100), Item2_Qty INT, Item2_Price DECIMAL(10,2), Item3_Name VARCHAR(100), Item3_Qty INT, Item3_Price DECIMAL(10,2)); -- AFTER: Two normalized tables -- Step 1: Create parent table (without repeating columns)CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerName VARCHAR(100), OrderDate DATE); -- Step 2: Create child table for repeating groupCREATE TABLE OrderItems ( OrderItemID INT PRIMARY KEY AUTO_INCREMENT, -- Surrogate key OrderID INT NOT NULL, -- Foreign key ItemName VARCHAR(100) NOT NULL, -- Was Item#_Name Quantity INT NOT NULL, -- Was Item#_Qty Price DECIMAL(10,2) NOT NULL, -- Was Item#_Price LineNumber INT, -- Optional: preserves order FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) ON DELETE CASCADE); -- Alternative: Natural composite keyCREATE TABLE OrderItems_Alt ( OrderID INT NOT NULL, LineNumber INT NOT NULL, -- 1, 2, 3... for each order ItemName VARCHAR(100) NOT NULL, Quantity INT NOT NULL, Price DECIMAL(10,2) NOT NULL, PRIMARY KEY (OrderID, LineNumber), FOREIGN KEY (OrderID) REFERENCES Orders(OrderID));For child tables created from repeating groups, you have two key options: (1) A surrogate key (auto-increment ID) with a separate foreign key column, or (2) A composite natural key using the foreign key plus a sequence/line number. Surrogate keys are simpler for joins; natural keys enforce uniqueness semantically. Choose based on your application patterns.
After creating the new schema, you must migrate existing data from the repeating group format to the normalized format. This involves "unpivoting" horizontal data into vertical rows.
Migration approach 1: UNION ALL
The most straightforward method uses UNION ALL to combine data from each repeating column position:
12345678910111213141516171819202122
-- Migrate data using UNION ALLINSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)SELECT OrderID, Item1_Name, Item1_Qty, Item1_Price, 1FROM Orders_OldWHERE Item1_Name IS NOT NULL UNION ALL SELECT OrderID, Item2_Name, Item2_Qty, Item2_Price, 2FROM Orders_OldWHERE Item2_Name IS NOT NULL UNION ALL SELECT OrderID, Item3_Name, Item3_Qty, Item3_Price, 3FROM Orders_OldWHERE Item3_Name IS NOT NULL; -- Migrate parent table (simple copy)INSERT INTO Orders (OrderID, CustomerName, OrderDate)SELECT OrderID, CustomerName, OrderDateFROM Orders_Old;Migration approach 2: UNPIVOT (SQL Server/Oracle)
Some databases provide UNPIVOT syntax for this transformation:
12345678910111213141516171819202122232425262728
-- SQL Server UNPIVOT approach-- First, restructure to have matching column pairsWITH Unpivoted AS ( SELECT OrderID, CASE Position WHEN 1 THEN Item1_Name WHEN 2 THEN Item2_Name WHEN 3 THEN Item3_Name END AS ItemName, CASE Position WHEN 1 THEN Item1_Qty WHEN 2 THEN Item2_Qty WHEN 3 THEN Item3_Qty END AS Quantity, CASE Position WHEN 1 THEN Item1_Price WHEN 2 THEN Item2_Price WHEN 3 THEN Item3_Price END AS Price, Position AS LineNumber FROM Orders_Old CROSS JOIN (SELECT 1 AS Position UNION SELECT 2 UNION SELECT 3) Positions)INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber)SELECT OrderID, ItemName, Quantity, Price, LineNumberFROM UnpivotedWHERE ItemName IS NOT NULL;Migration approach 3: Procedural/ETL
For complex transformations or very large datasets, a procedural approach with batch processing:
123456789101112131415161718192021222324252627282930313233
-- PostgreSQL PL/pgSQL procedural migrationDO $$DECLARE order_rec RECORD;BEGIN FOR order_rec IN SELECT * FROM Orders_Old LOOP -- Insert parent record INSERT INTO Orders (OrderID, CustomerName, OrderDate) VALUES (order_rec.OrderID, order_rec.CustomerName, order_rec.OrderDate); -- Insert each non-null item IF order_rec.Item1_Name IS NOT NULL THEN INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber) VALUES (order_rec.OrderID, order_rec.Item1_Name, order_rec.Item1_Qty, order_rec.Item1_Price, 1); END IF; IF order_rec.Item2_Name IS NOT NULL THEN INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber) VALUES (order_rec.OrderID, order_rec.Item2_Name, order_rec.Item2_Qty, order_rec.Item2_Price, 2); END IF; IF order_rec.Item3_Name IS NOT NULL THEN INSERT INTO OrderItems (OrderID, ItemName, Quantity, Price, LineNumber) VALUES (order_rec.OrderID, order_rec.Item3_Name, order_rec.Item3_Qty, order_rec.Item3_Price, 3); END IF; -- Commit in batches for large datasets -- IF order_rec.OrderID % 1000 = 0 THEN COMMIT; END IF; END LOOP;END $$;After migration, validate that: (1) The count of non-null items matches the row count in the new table, (2) Sum totals match between old and new schemas, (3) Referential integrity is intact. Always perform migration in a transaction that can be rolled back, and test thoroughly in a non-production environment first.
Some repeating groups are more complex than simple numbered columns. These require more sophisticated transformations.
Pattern 1: Nested Repeating Groups
When repeating groups contain their own repeating groups:
12345678910111213141516171819202122232425262728293031323334353637
-- BEFORE: Double-nested repeating groupsCREATE TABLE Projects_Bad ( ProjectID INT PRIMARY KEY, ProjectName VARCHAR(100), -- First level: Team members (repeating group) Member1_Name VARCHAR(100), Member1_Role VARCHAR(50), Member1_Skill1 VARCHAR(50), -- Second level: Skills per member Member1_Skill2 VARCHAR(50), Member1_Skill3 VARCHAR(50), Member2_Name VARCHAR(100), Member2_Role VARCHAR(50), Member2_Skill1 VARCHAR(50), Member2_Skill2 VARCHAR(50), Member2_Skill3 VARCHAR(50)); -- AFTER: Three normalized tablesCREATE TABLE Projects ( ProjectID INT PRIMARY KEY, ProjectName VARCHAR(100)); CREATE TABLE ProjectMembers ( MemberAssignmentID INT PRIMARY KEY, ProjectID INT NOT NULL, MemberName VARCHAR(100) NOT NULL, Role VARCHAR(50), FOREIGN KEY (ProjectID) REFERENCES Projects(ProjectID)); CREATE TABLE MemberSkills ( MemberAssignmentID INT NOT NULL, Skill VARCHAR(50) NOT NULL, PRIMARY KEY (MemberAssignmentID, Skill), FOREIGN KEY (MemberAssignmentID) REFERENCES ProjectMembers(MemberAssignmentID));Pattern 2: Cross-product Repeating Groups
When two dimensions create a grid of repeating values:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849
-- BEFORE: Product-by-Region sales matrixCREATE TABLE SalesMatrix_Bad ( ProductID INT PRIMARY KEY, ProductName VARCHAR(100), Sales_North_Q1 DECIMAL(10,2), Sales_North_Q2 DECIMAL(10,2), Sales_North_Q3 DECIMAL(10,2), Sales_North_Q4 DECIMAL(10,2), Sales_South_Q1 DECIMAL(10,2), Sales_South_Q2 DECIMAL(10,2), Sales_South_Q3 DECIMAL(10,2), Sales_South_Q4 DECIMAL(10,2), Sales_East_Q1 DECIMAL(10,2), Sales_East_Q2 DECIMAL(10,2), Sales_East_Q3 DECIMAL(10,2), Sales_East_Q4 DECIMAL(10,2), Sales_West_Q1 DECIMAL(10,2), Sales_West_Q2 DECIMAL(10,2), Sales_West_Q3 DECIMAL(10,2), Sales_West_Q4 DECIMAL(10,2)); -- AFTER: Fact table with dimension referencesCREATE TABLE Products ( ProductID INT PRIMARY KEY, ProductName VARCHAR(100)); CREATE TABLE Regions ( RegionCode VARCHAR(10) PRIMARY KEY, RegionName VARCHAR(50)); CREATE TABLE Quarters ( QuarterCode VARCHAR(10) PRIMARY KEY, -- 'Q1', 'Q2', etc. StartMonth INT, EndMonth INT); CREATE TABLE Sales ( ProductID INT NOT NULL, RegionCode VARCHAR(10) NOT NULL, QuarterCode VARCHAR(10) NOT NULL, SalesAmount DECIMAL(10,2) NOT NULL, PRIMARY KEY (ProductID, RegionCode, QuarterCode), FOREIGN KEY (ProductID) REFERENCES Products(ProductID), FOREIGN KEY (RegionCode) REFERENCES Regions(RegionCode), FOREIGN KEY (QuarterCode) REFERENCES Quarters(QuarterCode));The normalized structure for cross-product repeating groups looks similar to dimensional modeling (star schema). This is not a coincidence—dimensional modeling and normalization share the same theoretical foundation. The Sales table above is a fact table with dimension foreign keys, a pattern you'll see in data warehousing.
When eliminating repeating groups, you must ensure that no information is lost and that the meaning of the data is preserved. This requires attention to several considerations:
Item1 before Item2), add a SequenceNumber or LineNumber column to the new table to preserve this ordering.Primary Contact vs. Secondary Contact), add a ContactType column rather than just a sequence number.1234567891011121314151617181920212223242526272829303132333435363738
-- BEFORE: Labeled contact columns with semantic meaningCREATE TABLE Companies_Old ( CompanyID INT PRIMARY KEY, CompanyName VARCHAR(100), PrimaryContact_Name VARCHAR(100), PrimaryContact_Email VARCHAR(100), PrimaryContact_Phone VARCHAR(20), BillingContact_Name VARCHAR(100), BillingContact_Email VARCHAR(100), BillingContact_Phone VARCHAR(20), TechnicalContact_Name VARCHAR(100), TechnicalContact_Email VARCHAR(100), TechnicalContact_Phone VARCHAR(20)); -- AFTER: Preserve contact type semanticsCREATE TABLE Companies ( CompanyID INT PRIMARY KEY, CompanyName VARCHAR(100)); CREATE TABLE CompanyContacts ( ContactID INT PRIMARY KEY AUTO_INCREMENT, CompanyID INT NOT NULL, ContactType VARCHAR(20) NOT NULL, -- 'Primary', 'Billing', 'Technical' ContactName VARCHAR(100) NOT NULL, Email VARCHAR(100), Phone VARCHAR(20), FOREIGN KEY (CompanyID) REFERENCES Companies(CompanyID), -- Ensure only one of each type per company UNIQUE (CompanyID, ContactType)); -- Now you can:-- 1. Add new contact types without schema changes-- 2. Query all contacts of a specific type across companies-- 3. Enforce that each company has exactly one primary contact-- 4. Allow companies with different contact configurationsNotice how 'Primary', 'Billing', 'Technical'—which were embedded in column names—became values in the ContactType column. This is a key normalization pattern: semantic information hidden in schema becomes explicit data that can be queried, indexed, and extended without DDL changes.
Repeating groups represent horizontal duplication of the same logical attribute across multiple columns. Eliminating them is essential for 1NF compliance and practical database manageability. Let's consolidate the key learnings:
What's next:
Sometimes existing tables have both repeating groups AND nested structures that need to be "flattened" before proper normalization can proceed. The next page explores table flattening techniques—how to take denormalized, hierarchical data and restructure it into clean relational tables.
You can now identify repeating group patterns in database designs, understand the problems they cause, and systematically transform them into properly normalized structures. Next, we'll explore flattening more complex table structures.