Database Management SystemsFirst Normal Form (1NF)

First Normal Form (1NF): Foundation of Normalization

LevelBeginner

Duration60 mins

TopicFirst Normal Form (1NF)

1 / 5

Atomicity Requirement

The Principle That Underpins All Normalization

Before a database can be normalized to Second Normal Form, Third Normal Form, or any higher normal forms, it must first satisfy a deceptively simple yet profoundly important requirement: every attribute must contain only atomic (indivisible) values.

This is the essence of First Normal Form (1NF)—the gateway to systematic database normalization. While the concept appears straightforward, its correct application requires deep understanding of what constitutes an atomic value, why atomicity matters for data integrity and query efficiency, and how violations manifest in real-world database designs.

The atomicity requirement isn't merely a theoretical constraint—it's the practical foundation that enables SQL's relational operations to work correctly and efficiently. Without atomicity, fundamental operations like SELECT, JOIN, and WHERE become ambiguous, unpredictable, or impossible.

What You Will Learn

By the end of this page, you will understand what atomic values are and why they're essential, how to recognize non-atomic attributes in database designs, the relationship between atomicity and the relational model, practical strategies for achieving atomicity in your tables, and common pitfalls that lead developers to violate atomicity.

Understanding Atomic Values

An atomic value (from the Greek atomos, meaning "indivisible") is a value that cannot be meaningfully subdivided within the context of the database application. The key phrase here is "within the context"—atomicity is not absolute; it depends on the requirements and operations your database must support.

The formal definition:

A relation is in First Normal Form (1NF) if and only if:

All attributes contain only atomic (scalar) values
There are no repeating groups or arrays within a single row
Each row is unique (identifiable by a primary key)
The order of rows and columns is insignificant

Let's focus on the first requirement—atomic values—which is the conceptual heart of 1NF.

Atomicity Is Context-Dependent

Consider a 'FullName' attribute storing 'John Smith'. Is this atomic? If your application never needs to sort by last name, filter by first name, or extract name components, then 'John Smith' is atomic for your purposes. But if you need any of these operations, then 'FullName' should be split into 'FirstName' and 'LastName'. Atomicity is determined by how you will use the data.

Examples of atomic vs. non-atomic values:

Attribute	Value	Atomic?	Reason
`Age`	`25`	✅ Yes	Single integer value
`Email`	`user@example.com`	✅ Yes	Single string, used as a unit
`PhoneNumbers`	`555-1234, 555-5678`	❌ No	Multiple values in one cell
`Address`	`123 Main St, NYC, NY 10001`	❓ Depends	Atomic if used as a unit; non-atomic if components are queried separately
`Skills`	`Java, Python, SQL`	❌ No	List of values requiring individual access
`OrderItems`	`{Widget: 3, Gadget: 2}`	❌ No	Composite/structured data

The relational model's requirement for atomicity:

E.F. Codd, the inventor of the relational model, specified that relation attributes must come from "simple domains"—domains whose elements are atomic. This wasn't an arbitrary choice; it was essential for the mathematical foundations of relational algebra.

When attributes contain atomic values:

Comparison operations are well-defined (e.g., WHERE Age > 25)
Equality testing is unambiguous (e.g., WHERE Email = 'user@example.com')
Aggregation works correctly (e.g., AVG(Age))
Indexing is straightforward and efficient
Join operations produce predictable results

When attributes contain non-atomic values, all of these operations become problematic or impossible using standard SQL.

Why Atomicity Matters

The atomicity requirement isn't bureaucratic overhead—it directly enables the capabilities that make relational databases powerful. Let's examine the concrete problems that arise when atomicity is violated.

Problems Caused by Non-Atomic Values

•Query Complexity and Ambiguity — Searching within non-atomic values requires string manipulation functions like LIKE '%value%' or SUBSTRING(), which are inefficient, error-prone, and cannot use indexes effectively.
•Data Integrity Loss — Without atomicity, you cannot enforce constraints on individual values within a composite. If a column contains 'Java, Python, SQL', how do you ensure 'Python' isn't duplicated or validate each skill against a master list?
•Update Anomalies — Modifying one value within a non-atomic attribute requires parsing, modifying, and reconstructing the entire string—a process prone to errors and race conditions.
•Aggregation Failures — Aggregate functions like COUNT(), AVG(), and SUM() cannot operate on individual values within a non-atomic attribute. How do you count employees with 'Python' skill if skills are stored as comma-separated lists?
•Index Inefficiency — B-tree indexes are designed for atomic values. A non-atomic value like 'NYC, LA, Chicago' cannot be efficiently indexed for queries like 'find all records containing LA'.
•Join Operation Breakdown — Joins rely on comparing atomic values for equality. Joining on non-atomic columns produces incorrect or meaningless results.

Illustrating the problem with a concrete example:

Consider an Employees table with a non-atomic Skills column:

non-atomic-example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Non-atomic design (VIOLATES 1NF)
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    Name VARCHAR(100),
    Skills VARCHAR(500)  -- Stores: 'Java, Python, SQL'
);
 
INSERT INTO Employees VALUES
(1, 'Alice', 'Java, Python, SQL'),
(2, 'Bob', 'Python, JavaScript'),
(3, 'Charlie', 'Java, C++, Python');
 
-- PROBLEM 1: Finding all employees with Python skill
-- This query is inefficient and error-prone:
SELECT * FROM Employees WHERE Skills LIKE '%Python%';
-- Also matches 'PythonScript', 'IronPython3', etc.
 
-- PROBLEM 2: Counting employees per skill
-- Impossible with standard SQL! Would need complex string parsing.
 
-- PROBLEM 3: Adding a new skill
UPDATE Employees 
SET Skills = Skills || ', Go'  -- Concatenation varies by DBMS
WHERE EmployeeID = 1;
-- Risk of double-adding, inconsistent separators, trailing commas
 
-- PROBLEM 4: Removing a skill
-- Extremely complex string manipulation required
-- What if 'Python' appears multiple times? What about whitespace?

The Performance Penalty

The query WHERE Skills LIKE '%Python%' forces a full table scan on every execution. It cannot use indexes, and as the table grows, query time grows linearly. For 1 million rows, this could take seconds instead of milliseconds. At scale, non-atomic designs cause system failures.

The Domain Perspective on Atomicity

Understanding atomicity requires thinking about domains—the sets of possible values for an attribute. In relational theory, every attribute has a domain, and the domain determines what values are valid and how they can be compared.

Formal domain definition:

A domain D is a named set of scalar (atomic) values. When we define an attribute A over domain D, we're saying:

Every value in attribute A must be a member of D
A can hold exactly one value from D in any given tuple
Values in A can be compared using the operations defined on D

Simple domains vs. composite domains:

Domain Classification Examples
Domain Name	Sample Values	Type	Notes
`EMPLOYEE_ID`	`1, 2, 3, ...`	Simple (Atomic)	Integer identifiers, directly comparable
`SALARY`	`50000.00, 75000.00`	Simple (Atomic)	Decimal values supporting arithmetic
`EMAIL`	`user@domain.com`	Simple (Atomic)	String values used as units
`FULL_NAME`	`'John Smith'`	Potentially Composite	May need splitting if components accessed separately
`ADDRESS`	`'123 Main, NYC'`	Composite	Usually needs decomposition into street, city, state, zip
`PHONE_NUMBERS`	`{555-1234, 555-5678}`	Set-valued	Violates 1NF—needs separate table

The semantic criterion for atomicity:

The definitive test for atomicity is semantic: Will the application ever need to access, search, or manipulate parts of this value independently?

If no: The value is atomic for your application
If yes: The value should be decomposed into atomic components

Example: Is a date atomic?

A date like 2024-03-15 is stored as a single value. But is it atomic?

If you only compare whole dates (WHERE OrderDate > '2024-01-01'): Atomic
If you need month-based reports (WHERE MONTH(OrderDate) = 3): The database handles this with functions, so still Atomic (the database provides operations on the domain)
If you store dates as strings and parse components manually: This is a design smell, though not technically a 1NF violation if using proper date types

The relational model considers date, time, and datetime as atomic because the DBMS provides a complete set of operations on these domains.

Domain Closure

A domain is 'closed' under certain operations if applying those operations to domain values produces other domain values. Integer domains are closed under addition (int + int = int). This closure property is what makes atomic domains work—operations have predictable types. Non-atomic 'domains' lack this property, which is why they break relational operations.

Types of Atomicity Violations

Atomicity violations in database design typically fall into several recognizable categories. Understanding these patterns helps you identify and prevent 1NF violations in your own work.

Multi-valued attributes store multiple values of the same type in a single cell, typically as comma-separated lists, JSON arrays, or delimited strings.

Example violations:

multi-valued-violations.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- Violation: Multiple phone numbers in one cell
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100),
    PhoneNumbers VARCHAR(500)  -- '555-1234, 555-5678, 555-9012'
);
 
-- Violation: Multiple categories for products
CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(100),
    Categories VARCHAR(200)  -- 'Electronics, Gadgets, Gifts'
);
 
-- Violation: Multiple email addresses
CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    Username VARCHAR(50),
    EmailAddresses VARCHAR(1000)  -- 'work@co.com; personal@mail.com'
);

The solution: Create a separate table to represent the one-to-many relationship:

-- 1NF-compliant design
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE CustomerPhones (
    CustomerID INT,
    PhoneNumber VARCHAR(20),
    PhoneType VARCHAR(20),
    PRIMARY KEY (CustomerID, PhoneNumber),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

Testing for Atomicity

Given that atomicity is context-dependent, how do you determine whether an attribute is atomic for your specific application? Here is a systematic approach:

The Atomicity Test Checklist

•The Query Test: Will you ever need to query by parts of this value? If WHERE clauses need string functions like LIKE, SUBSTRING, or SPLIT, the attribute isn't atomic enough.
•The Update Test: Will you ever need to modify part of this value while keeping the rest? If yes, the attribute should be decomposed.
•The Constraint Test: Do you need to enforce constraints on parts of this value? Referential integrity, uniqueness, or check constraints on components require atomic attributes.
•The Aggregation Test: Will you need to count, sum, or group by parts of this value? Aggregation on embedded values is a strong signal for decomposition.
•The Index Test: Would you benefit from indexing parts of this value? If component searches need to be fast, those components should be separate, indexable columns.
•The Sort Test: Will you need to sort by parts of this value? Sorting by last name when storing full names requires decomposition.
•The Display Test (Inverse): If you only ever display the whole value as-is without modification, it might genuinely be atomic for your needs.

Practical example: Analyzing an Address attribute

Let's apply the checklist to a FullAddress column storing values like '123 Main Street, Suite 100, San Francisco, CA 94105':

Atomicity Analysis for FullAddress
Test	Question	Answer	Conclusion
Query Test	Will we filter by state or zip?	Yes—marketing targets by region	❌ Not atomic
Update Test	Will we update just the suite number?	Yes—offices relocate within buildings	❌ Not atomic
Constraint Test	Do we validate state codes?	Yes—must be valid US states	❌ Not atomic
Aggregation Test	Will we count customers per city?	Yes—sales reports need this	❌ Not atomic
Index Test	Do we need fast zip code lookups?	Yes—shipping rate calculations	❌ Not atomic
Sort Test	Will we sort by city?	Yes—alphabetical customer lists	❌ Not atomic

Verdict: The FullAddress attribute fails multiple tests and should be decomposed into StreetAddress, Suite, City, State, and ZipCode columns.

When in Doubt, Decompose

If you're uncertain whether an attribute is atomic, err on the side of decomposition. It's much easier to concatenate atomic values for display than to parse non-atomic values for queries. You can always add a computed column or view that combines atomic values for convenience.

Atomicity in Modern Databases

Modern database systems offer features that seem to blur the lines of atomicity—JSON columns, array types, and XML storage. How do these fit into the 1NF framework?

Acceptable Uses of Complex Types

•Audit/Log Data — Raw JSON from external systems stored for compliance, not queried by components
•Document Storage — Schema-less data where structure varies per record and is processed externally
•Temporary Processing — Staging data that will be normalized before production use
•Read-Only Snapshots — Historical records where original structure must be preserved
•External API Responses — Third-party data consumed as-is by applications

Problematic Uses of Complex Types

•Avoiding Schema Design — Using JSON to skip proper table modeling for core business entities
•Query-Critical Data — Data that appears in WHERE clauses, JOINs, or GROUP BY
•Constrained Relationships — Data requiring referential integrity to other tables
•Indexed Lookups — Data that needs performant searching by components
•Frequently Updated Components — Data where individual fields change independently

PostgreSQL array example—when is it acceptable?

postgres-arrays.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- ACCEPTABLE: Tags for full-text search (read-mostly, no joins)
CREATE TABLE Articles (
    ArticleID INT PRIMARY KEY,
    Title VARCHAR(200),
    Content TEXT,
    Tags TEXT[]  -- ['database', 'normalization', 'tutorial']
);
 
-- PostgreSQL provides efficient array operators
SELECT * FROM Articles WHERE 'database' = ANY(Tags);
 
-- This works because:
-- 1. Tags are only for filtering/searching, not joining to a Tags table
-- 2. We don't need to count articles per tag with full accuracy
-- 3. No referential integrity to enforce
-- 4. GIN indexes make array searches performant
 
-- PROBLEMATIC: Using arrays for what should be a relationship
CREATE TABLE Courses (
    CourseID INT PRIMARY KEY,
    CourseName VARCHAR(100),
    StudentIDs INT[]  -- [1, 5, 23, 47] -- References Students table
);
 
-- This is problematic because:
-- 1. No foreign key enforcement—StudentIDs could contain invalid IDs
-- 2. Cannot easily query "all courses for student 23" with index use
-- 3. Cannot join with Students table for student details
-- 4. Cascade operations (delete student) are not automatic

Pragmatic Normalization

The goal of 1NF isn't purity for its own sake—it's to enable the relational operations your application needs. If a JSON column is never queried, joined, or constrained by components, it's effectively atomic from your application's perspective. But be honest about your future requirements; 'never' often becomes 'next quarter' as applications evolve.

Achieving Atomicity: Practical Strategies

When you identify non-atomic attributes in a design, how do you fix them? Here are the primary strategies, each suited to different types of violations.

Atomization Strategies

•Column Splitting — For composite attributes, split into multiple columns. FullName → FirstName, MiddleName, LastName. Address → Street, City, State, Zip.
•Table Creation — For multi-valued attributes, create a separate table with a foreign key relationship. Skills column → EmployeeSkills table with EmployeeID and SkillID columns.
•Domain Refinement — Replace encoded values with explicit columns. StatusCode 'A1P' → IsActive, Level, IsPremium boolean/integer columns.
•Relationship Modeling — When multiple entities are conflated, separate them into distinct tables with proper relationships. This often reveals hidden many-to-many relationships.

Complete transformation example:

Let's transform a non-1NF design to 1NF compliance:

1nf-transformation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
-- BEFORE: Non-1NF Design
CREATE TABLE StudentRecords (
    StudentID INT PRIMARY KEY,
    FullName VARCHAR(100),           -- 'John A. Smith'
    Address VARCHAR(500),            -- '123 Oak St, Boston, MA 02101'
    PhoneNumbers VARCHAR(200),       -- '617-555-1234, 617-555-5678'
    EnrolledCourses VARCHAR(500),    -- 'CS101, MATH201, PHYS101'
    Grades VARCHAR(200)              -- 'A, B+, A-' (corresponds to courses)
);
 
-- AFTER: 1NF-Compliant Design
 
-- 1. Main entity with atomic attributes only
CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    FirstName VARCHAR(50) NOT NULL,
    MiddleName VARCHAR(50),
    LastName VARCHAR(50) NOT NULL,
    StreetAddress VARCHAR(200),
    City VARCHAR(100),
    State CHAR(2),
    ZipCode VARCHAR(10)
);
 
-- 2. Phone numbers in separate table (one-to-many)
CREATE TABLE StudentPhones (
    PhoneID INT PRIMARY KEY,
    StudentID INT NOT NULL,
    PhoneNumber VARCHAR(20) NOT NULL,
    PhoneType VARCHAR(20),  -- 'Mobile', 'Home', 'Work'
    IsPrimary BOOLEAN DEFAULT FALSE,
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    UNIQUE (StudentID, PhoneNumber)
);
 
-- 3. Courses in reference table
CREATE TABLE Courses (
    CourseID VARCHAR(20) PRIMARY KEY,
    CourseName VARCHAR(100) NOT NULL,
    Credits INT NOT NULL
);
 
-- 4. Enrollments as junction table (many-to-many with attributes)
CREATE TABLE Enrollments (
    StudentID INT NOT NULL,
    CourseID VARCHAR(20) NOT NULL,
    EnrollmentDate DATE NOT NULL,
    Grade VARCHAR(2),  -- NULL until graded
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);

Benefits of the Transformation

The 1NF-compliant design enables: filtering students by state (WHERE State = 'MA'), finding all students in a course (JOIN Enrollments), enforcing valid course codes (foreign key to Courses), updating one phone number without affecting others, and proper grade-to-course association without positional ambiguity.

Summary: The Atomicity Foundation

Atomicity is the gateway to database normalization. Without it, the higher normal forms are meaningless because the relational operations they depend on cannot function correctly. Let's consolidate what we've learned:

Key Takeaways

•Atomic values are indivisible within your application context — What's atomic depends on how you use the data. A value is atomic if you never need to access, query, update, or constrain its parts independently.
•Non-atomic values break relational operations — Queries require string parsing, constraints cannot be enforced on components, indexes become useless, and joins produce incorrect results.
•Common violations include multi-valued attributes, composite fields, embedded JSON, and encoded information — Learn to recognize these patterns and refactor them systematically.
•The atomicity test is semantic, not syntactic — Apply the query, update, constraint, aggregation, index, and sort tests to determine if an attribute is truly atomic for your needs.
•Modern database features don't eliminate the need for atomicity — JSON columns and array types are tools, not substitutes for proper design. Use them for genuinely unstructured data, not to avoid normalization.
•Achieving atomicity requires column splitting, table creation, and relationship modeling — Transform non-atomic designs systematically, creating separate columns for composite values and separate tables for multi-valued attributes.

What's next:

Atomicity addresses individual values within cells. But there's another dimension to 1NF violations: repeating groups—where the same type of information appears in multiple columns of a single row. The next page explores this pattern and how to eliminate it.

Page Complete

You now understand the atomicity requirement—the foundational principle of First Normal Form. You can identify non-atomic attributes, apply systematic tests for atomicity, and transform designs to achieve atomic values. Next, we'll tackle the complementary 1NF requirement: eliminating repeating groups.

1 / 5

Loading learning content...

Database Management SystemsFirst Normal Form (1NF)

First Normal Form (1NF): Foundation of Normalization

LevelBeginner

Duration60 mins

TopicFirst Normal Form (1NF)

1 / 5

Atomicity Requirement

The Principle That Underpins All Normalization

What You Will Learn

Understanding Atomic Values

The formal definition:

A relation is in First Normal Form (1NF) if and only if:

All attributes contain only atomic (scalar) values
There are no repeating groups or arrays within a single row
Each row is unique (identifiable by a primary key)
The order of rows and columns is insignificant

Let's focus on the first requirement—atomic values—which is the conceptual heart of 1NF.

Atomicity Is Context-Dependent

Examples of atomic vs. non-atomic values:

Attribute	Value	Atomic?	Reason
`Age`	`25`	✅ Yes	Single integer value
`Email`	`user@example.com`	✅ Yes	Single string, used as a unit
`PhoneNumbers`	`555-1234, 555-5678`	❌ No	Multiple values in one cell
`Address`	`123 Main St, NYC, NY 10001`	❓ Depends	Atomic if used as a unit; non-atomic if components are queried separately
`Skills`	`Java, Python, SQL`	❌ No	List of values requiring individual access
`OrderItems`	`{Widget: 3, Gadget: 2}`	❌ No	Composite/structured data

The relational model's requirement for atomicity:

When attributes contain atomic values:

Comparison operations are well-defined (e.g., WHERE Age > 25)
Equality testing is unambiguous (e.g., WHERE Email = 'user@example.com')
Aggregation works correctly (e.g., AVG(Age))
Indexing is straightforward and efficient
Join operations produce predictable results

When attributes contain non-atomic values, all of these operations become problematic or impossible using standard SQL.

Why Atomicity Matters

Problems Caused by Non-Atomic Values

•Query Complexity and Ambiguity — Searching within non-atomic values requires string manipulation functions like LIKE '%value%' or SUBSTRING(), which are inefficient, error-prone, and cannot use indexes effectively.
•Data Integrity Loss — Without atomicity, you cannot enforce constraints on individual values within a composite. If a column contains 'Java, Python, SQL', how do you ensure 'Python' isn't duplicated or validate each skill against a master list?
•Update Anomalies — Modifying one value within a non-atomic attribute requires parsing, modifying, and reconstructing the entire string—a process prone to errors and race conditions.
•Aggregation Failures — Aggregate functions like COUNT(), AVG(), and SUM() cannot operate on individual values within a non-atomic attribute. How do you count employees with 'Python' skill if skills are stored as comma-separated lists?
•Index Inefficiency — B-tree indexes are designed for atomic values. A non-atomic value like 'NYC, LA, Chicago' cannot be efficiently indexed for queries like 'find all records containing LA'.
•Join Operation Breakdown — Joins rely on comparing atomic values for equality. Joining on non-atomic columns produces incorrect or meaningless results.

Illustrating the problem with a concrete example:

Consider an Employees table with a non-atomic Skills column:

non-atomic-example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Non-atomic design (VIOLATES 1NF)
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    Name VARCHAR(100),
    Skills VARCHAR(500)  -- Stores: 'Java, Python, SQL'
);
 
INSERT INTO Employees VALUES
(1, 'Alice', 'Java, Python, SQL'),
(2, 'Bob', 'Python, JavaScript'),
(3, 'Charlie', 'Java, C++, Python');
 
-- PROBLEM 1: Finding all employees with Python skill
-- This query is inefficient and error-prone:
SELECT * FROM Employees WHERE Skills LIKE '%Python%';
-- Also matches 'PythonScript', 'IronPython3', etc.
 
-- PROBLEM 2: Counting employees per skill
-- Impossible with standard SQL! Would need complex string parsing.
 
-- PROBLEM 3: Adding a new skill
UPDATE Employees 
SET Skills = Skills || ', Go'  -- Concatenation varies by DBMS
WHERE EmployeeID = 1;
-- Risk of double-adding, inconsistent separators, trailing commas
 
-- PROBLEM 4: Removing a skill
-- Extremely complex string manipulation required
-- What if 'Python' appears multiple times? What about whitespace?

The Performance Penalty

The Domain Perspective on Atomicity

Formal domain definition:

A domain D is a named set of scalar (atomic) values. When we define an attribute A over domain D, we're saying:

Every value in attribute A must be a member of D
A can hold exactly one value from D in any given tuple
Values in A can be compared using the operations defined on D

Simple domains vs. composite domains:

Domain Classification Examples
Domain Name	Sample Values	Type	Notes
`EMPLOYEE_ID`	`1, 2, 3, ...`	Simple (Atomic)	Integer identifiers, directly comparable
`SALARY`	`50000.00, 75000.00`	Simple (Atomic)	Decimal values supporting arithmetic
`EMAIL`	`user@domain.com`	Simple (Atomic)	String values used as units
`FULL_NAME`	`'John Smith'`	Potentially Composite	May need splitting if components accessed separately
`ADDRESS`	`'123 Main, NYC'`	Composite	Usually needs decomposition into street, city, state, zip
`PHONE_NUMBERS`	`{555-1234, 555-5678}`	Set-valued	Violates 1NF—needs separate table

The semantic criterion for atomicity:

The definitive test for atomicity is semantic: Will the application ever need to access, search, or manipulate parts of this value independently?

If no: The value is atomic for your application
If yes: The value should be decomposed into atomic components

Example: Is a date atomic?

A date like 2024-03-15 is stored as a single value. But is it atomic?

If you only compare whole dates (WHERE OrderDate > '2024-01-01'): Atomic
If you need month-based reports (WHERE MONTH(OrderDate) = 3): The database handles this with functions, so still Atomic (the database provides operations on the domain)
If you store dates as strings and parse components manually: This is a design smell, though not technically a 1NF violation if using proper date types

The relational model considers date, time, and datetime as atomic because the DBMS provides a complete set of operations on these domains.

Domain Closure

Types of Atomicity Violations

Atomicity violations in database design typically fall into several recognizable categories. Understanding these patterns helps you identify and prevent 1NF violations in your own work.

Multi-valued attributes store multiple values of the same type in a single cell, typically as comma-separated lists, JSON arrays, or delimited strings.

Example violations:

multi-valued-violations.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- Violation: Multiple phone numbers in one cell
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100),
    PhoneNumbers VARCHAR(500)  -- '555-1234, 555-5678, 555-9012'
);
 
-- Violation: Multiple categories for products
CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(100),
    Categories VARCHAR(200)  -- 'Electronics, Gadgets, Gifts'
);
 
-- Violation: Multiple email addresses
CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    Username VARCHAR(50),
    EmailAddresses VARCHAR(1000)  -- 'work@co.com; personal@mail.com'
);

The solution: Create a separate table to represent the one-to-many relationship:

-- 1NF-compliant design
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE CustomerPhones (
    CustomerID INT,
    PhoneNumber VARCHAR(20),
    PhoneType VARCHAR(20),
    PRIMARY KEY (CustomerID, PhoneNumber),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

Testing for Atomicity

Given that atomicity is context-dependent, how do you determine whether an attribute is atomic for your specific application? Here is a systematic approach:

The Atomicity Test Checklist

•The Query Test: Will you ever need to query by parts of this value? If WHERE clauses need string functions like LIKE, SUBSTRING, or SPLIT, the attribute isn't atomic enough.
•The Update Test: Will you ever need to modify part of this value while keeping the rest? If yes, the attribute should be decomposed.
•The Constraint Test: Do you need to enforce constraints on parts of this value? Referential integrity, uniqueness, or check constraints on components require atomic attributes.
•The Aggregation Test: Will you need to count, sum, or group by parts of this value? Aggregation on embedded values is a strong signal for decomposition.
•The Index Test: Would you benefit from indexing parts of this value? If component searches need to be fast, those components should be separate, indexable columns.
•The Sort Test: Will you need to sort by parts of this value? Sorting by last name when storing full names requires decomposition.
•The Display Test (Inverse): If you only ever display the whole value as-is without modification, it might genuinely be atomic for your needs.

Practical example: Analyzing an Address attribute

Let's apply the checklist to a FullAddress column storing values like '123 Main Street, Suite 100, San Francisco, CA 94105':

Atomicity Analysis for FullAddress
Test	Question	Answer	Conclusion
Query Test	Will we filter by state or zip?	Yes—marketing targets by region	❌ Not atomic
Update Test	Will we update just the suite number?	Yes—offices relocate within buildings	❌ Not atomic
Constraint Test	Do we validate state codes?	Yes—must be valid US states	❌ Not atomic
Aggregation Test	Will we count customers per city?	Yes—sales reports need this	❌ Not atomic
Index Test	Do we need fast zip code lookups?	Yes—shipping rate calculations	❌ Not atomic
Sort Test	Will we sort by city?	Yes—alphabetical customer lists	❌ Not atomic

Verdict: The FullAddress attribute fails multiple tests and should be decomposed into StreetAddress, Suite, City, State, and ZipCode columns.

When in Doubt, Decompose

Atomicity in Modern Databases

Modern database systems offer features that seem to blur the lines of atomicity—JSON columns, array types, and XML storage. How do these fit into the 1NF framework?

Acceptable Uses of Complex Types

•Audit/Log Data — Raw JSON from external systems stored for compliance, not queried by components
•Document Storage — Schema-less data where structure varies per record and is processed externally
•Temporary Processing — Staging data that will be normalized before production use
•Read-Only Snapshots — Historical records where original structure must be preserved
•External API Responses — Third-party data consumed as-is by applications

Problematic Uses of Complex Types

•Avoiding Schema Design — Using JSON to skip proper table modeling for core business entities
•Query-Critical Data — Data that appears in WHERE clauses, JOINs, or GROUP BY
•Constrained Relationships — Data requiring referential integrity to other tables
•Indexed Lookups — Data that needs performant searching by components
•Frequently Updated Components — Data where individual fields change independently

PostgreSQL array example—when is it acceptable?

postgres-arrays.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- ACCEPTABLE: Tags for full-text search (read-mostly, no joins)
CREATE TABLE Articles (
    ArticleID INT PRIMARY KEY,
    Title VARCHAR(200),
    Content TEXT,
    Tags TEXT[]  -- ['database', 'normalization', 'tutorial']
);
 
-- PostgreSQL provides efficient array operators
SELECT * FROM Articles WHERE 'database' = ANY(Tags);
 
-- This works because:
-- 1. Tags are only for filtering/searching, not joining to a Tags table
-- 2. We don't need to count articles per tag with full accuracy
-- 3. No referential integrity to enforce
-- 4. GIN indexes make array searches performant
 
-- PROBLEMATIC: Using arrays for what should be a relationship
CREATE TABLE Courses (
    CourseID INT PRIMARY KEY,
    CourseName VARCHAR(100),
    StudentIDs INT[]  -- [1, 5, 23, 47] -- References Students table
);
 
-- This is problematic because:
-- 1. No foreign key enforcement—StudentIDs could contain invalid IDs
-- 2. Cannot easily query "all courses for student 23" with index use
-- 3. Cannot join with Students table for student details
-- 4. Cascade operations (delete student) are not automatic

Pragmatic Normalization

Achieving Atomicity: Practical Strategies

When you identify non-atomic attributes in a design, how do you fix them? Here are the primary strategies, each suited to different types of violations.

Atomization Strategies

•Column Splitting — For composite attributes, split into multiple columns. FullName → FirstName, MiddleName, LastName. Address → Street, City, State, Zip.
•Table Creation — For multi-valued attributes, create a separate table with a foreign key relationship. Skills column → EmployeeSkills table with EmployeeID and SkillID columns.
•Domain Refinement — Replace encoded values with explicit columns. StatusCode 'A1P' → IsActive, Level, IsPremium boolean/integer columns.
•Relationship Modeling — When multiple entities are conflated, separate them into distinct tables with proper relationships. This often reveals hidden many-to-many relationships.

Complete transformation example:

Let's transform a non-1NF design to 1NF compliance:

1nf-transformation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
-- BEFORE: Non-1NF Design
CREATE TABLE StudentRecords (
    StudentID INT PRIMARY KEY,
    FullName VARCHAR(100),           -- 'John A. Smith'
    Address VARCHAR(500),            -- '123 Oak St, Boston, MA 02101'
    PhoneNumbers VARCHAR(200),       -- '617-555-1234, 617-555-5678'
    EnrolledCourses VARCHAR(500),    -- 'CS101, MATH201, PHYS101'
    Grades VARCHAR(200)              -- 'A, B+, A-' (corresponds to courses)
);
 
-- AFTER: 1NF-Compliant Design
 
-- 1. Main entity with atomic attributes only
CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    FirstName VARCHAR(50) NOT NULL,
    MiddleName VARCHAR(50),
    LastName VARCHAR(50) NOT NULL,
    StreetAddress VARCHAR(200),
    City VARCHAR(100),
    State CHAR(2),
    ZipCode VARCHAR(10)
);
 
-- 2. Phone numbers in separate table (one-to-many)
CREATE TABLE StudentPhones (
    PhoneID INT PRIMARY KEY,
    StudentID INT NOT NULL,
    PhoneNumber VARCHAR(20) NOT NULL,
    PhoneType VARCHAR(20),  -- 'Mobile', 'Home', 'Work'
    IsPrimary BOOLEAN DEFAULT FALSE,
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    UNIQUE (StudentID, PhoneNumber)
);
 
-- 3. Courses in reference table
CREATE TABLE Courses (
    CourseID VARCHAR(20) PRIMARY KEY,
    CourseName VARCHAR(100) NOT NULL,
    Credits INT NOT NULL
);
 
-- 4. Enrollments as junction table (many-to-many with attributes)
CREATE TABLE Enrollments (
    StudentID INT NOT NULL,
    CourseID VARCHAR(20) NOT NULL,
    EnrollmentDate DATE NOT NULL,
    Grade VARCHAR(2),  -- NULL until graded
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);

Benefits of the Transformation

Summary: The Atomicity Foundation

Key Takeaways

•Atomic values are indivisible within your application context — What's atomic depends on how you use the data. A value is atomic if you never need to access, query, update, or constrain its parts independently.
•Non-atomic values break relational operations — Queries require string parsing, constraints cannot be enforced on components, indexes become useless, and joins produce incorrect results.
•Common violations include multi-valued attributes, composite fields, embedded JSON, and encoded information — Learn to recognize these patterns and refactor them systematically.
•The atomicity test is semantic, not syntactic — Apply the query, update, constraint, aggregation, index, and sort tests to determine if an attribute is truly atomic for your needs.
•Modern database features don't eliminate the need for atomicity — JSON columns and array types are tools, not substitutes for proper design. Use them for genuinely unstructured data, not to avoid normalization.
•Achieving atomicity requires column splitting, table creation, and relationship modeling — Transform non-atomic designs systematically, creating separate columns for composite values and separate tables for multi-valued attributes.

What's next:

Page Complete

1 / 5