Attributes And Domains - Learning Module



Atomic Values

The Indivisible Grain of Relational Data

The word 'atomic' derives from the Greek atomos, meaning 'uncuttable' or 'indivisible.' In physics, the atom was once believed to be the smallest possible unit of matter—a belief disproved when subatomic particles were discovered. In the relational model, however, atomicity remains a defining requirement: every value in a relation must be indivisible within the database context.

This principle, formalized as part of First Normal Form (1NF), is not merely a constraint—it is foundational to the entire relational algebra. Without atomicity, operations like projection, selection, and join lose their clean semantics. Without atomicity, queries become ambiguous, updates risky, and indexing ineffective.

Understanding atomicity deeply—what it means, where the boundaries lie, and how to enforce it—separates database designers who create clean, queryable schemas from those who create tangled data structures that fight every operation.

What You Will Learn

By the end of this page, you will understand the formal definition of atomic values, why atomicity is essential for relational operations, how to identify atomicity violations, the nuanced boundary between atomic and composite in practice, and strategies for transforming non-atomic data into atomic form.

Formal Definition of Atomicity

Definition:

A value is atomic if it cannot be meaningfully decomposed into smaller parts within the context of the database schema and its operations. An atomic value is treated as a single, indivisible unit by all database operations.

The First Normal Form Requirement:

A relation is in First Normal Form (1NF) if and only if:

•Every attribute value is atomic (no multi-valued attributes)
•Every attribute value comes from a single domain
•There are no repeating groups (no arrays or lists in cells)

This requirement has profound implications:

$$\forall r \in R, \forall A \in \text{attrs}(R): r.A \text{ is atomic and } r.A \in \text{dom}(A)$$

For every tuple r in relation R, for every attribute A, the value r.A is atomic and belongs to the domain of A.

Context Dependence of Atomicity

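Atomicity is context-dependent, not absolute. A date like '2024-03-15' is atomic if you never need to query year, month, or day separately. Once a query needs a condition such as "month = 3", something must expose the month. Either the DBMS provides extraction functions for a native DATE type (for example, EXTRACT(MONTH FROM event_date) = 3 in PostgreSQL), or, for a date stored as free-form text, the value must be converted or decomposed. The key question: will any operation ever need to 'see inside' this value?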
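The sketch below shows the 'see inside' question in practice. It uses Python's built-in sqlite3 module only so the example runs on its own. SQLite has no dedicated DATE type, so the dates here are stored as ISO-8601 text, and its strftime() function plays a role similar to EXTRACT in PostgreSQL. The event table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One column holds the whole date; it is never split into year/month/day columns.
conn.execute("CREATE TABLE event (event_name TEXT NOT NULL, event_date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO event VALUES (?, ?)",
    [("Launch", "2024-03-15"), ("Review", "2024-03-28"), ("Holiday party", "2024-12-20")],
)

# strftime('%m', ...) reaches inside the value only when this query needs the month.
march = [row[0] for row in conn.execute(
    "SELECT event_name FROM event "
    "WHERE CAST(strftime('%m', event_date) AS INTEGER) = 3 "
    "ORDER BY event_date"
)]
conn.close()

print(march)  # ['Launch', 'Review']
```

The date stays a single column, and the month is reached through a function only when a query needs it. Without such a function, the month would have to be stored in its own column.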

Atomic vs Non-Atomic Values
Value	Context	Atomic?	Reason
`'John Smith'`	Full name never queried by parts	Yes	Treated as single unit
`'John Smith'`	Need to sort by last name	No	Must decompose to `first_name`, `last_name`
`'555-1234,555-5678'`	Multiple phone numbers	No	Contains list—violates 1NF
`'2024-03-15'`	Date used as single value	Yes	Native DATE type is atomic
`'red,blue,green'`	Multiple colors	No	Contains list—violates 1NF
`3.14159`	Pi approximation	Yes	Single numeric value
`'{"a":1,"b":2}'`	JSON accessed as single blob	Yes	If never queried inside
`'{"a":1,"b":2}'`	Need to filter by inner fields	No	Structure must be queryable

Why Atomicity Matters: Operational Integrity

Atomicity is not just a theoretical nicety—it directly enables the core operations of relational algebra and protects data integrity. Let's examine why non-atomic values break fundamental guarantees.

Problems with Non-Atomic Values

•Query Ambiguity — How do you find records where color includes 'blue'? WHERE color = 'blue' fails for 'red,blue,green'. WHERE color LIKE '%blue%' matches 'blueberry'. No clean solution.
•Indexing Failure — B-tree indexes work on atomic values. An index on 'red,blue,green' doesn't help find 'blue'. Full-text or specialized indexes are expensive workarounds.
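•Update Anomalies — Changing one phone number in a list requires parsing the string, modifying it, and writing the whole list back. When this is done as a read-modify-write cycle in application code, two concurrent edits can silently overwrite each other (a lost update).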
•Referential Integrity Impossible — Foreign keys reference single values. You cannot create a FK constraint on one element within a comma-separated list.
•Aggregate Confusion — How many colors does a product have? COUNT(*) counts rows, not embedded list elements, so you must fall back on string-manipulation tricks or application code to parse and count.
•Join Semantics Break — Natural joins compare atomic values. Joining on a multi-valued attribute produces semantic garbage.
•Domain Constraint Failure — CHECK constraints validate single values. You cannot constrain each element of an embedded list.
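The update anomaly is easy to reproduce. The following sketch uses Python's sqlite3 module, an in-memory database, and hypothetical product data modeled on the colors example. It compares a naive string-surgery UPDATE with a DELETE on a table that stores one value per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Non-atomic: all colors packed into one string
    CREATE TABLE product_bad (product_id INTEGER PRIMARY KEY, colors TEXT);
    INSERT INTO product_bad VALUES (1, 'red,blue,green'), (2, 'black,blue'),
                                   (3, 'light-blue,white');
    -- Atomic: one row per (product, color)
    CREATE TABLE product_color (product_id INTEGER, color TEXT,
                                PRIMARY KEY (product_id, color));
    INSERT INTO product_color VALUES (1, 'red'), (1, 'blue'), (1, 'green'),
                                     (2, 'black'), (2, 'blue'),
                                     (3, 'light-blue'), (3, 'white');
""")

# Naive attempt to remove 'blue': delete every occurrence of the substring 'blue,'.
conn.execute("UPDATE product_bad SET colors = REPLACE(colors, 'blue,', '')")
after_naive = dict(conn.execute("SELECT product_id, colors FROM product_bad"))

# Atomic design: removing one value is a plain DELETE on matching rows.
conn.execute("DELETE FROM product_color WHERE color = 'blue'")
after_delete = list(conn.execute(
    "SELECT product_id, color FROM product_color ORDER BY product_id, color"
))
conn.close()

print(after_naive)   # {1: 'red,green', 2: 'black,blue', 3: 'light-white'}
print(after_delete)
```

The single REPLACE fixed product 1 but missed product 2, where 'blue' is the last element and has no trailing comma. It also corrupted product 3, turning 'light-blue,white' into 'light-white'. A correct string version must handle first, middle, last, and sole elements separately. The normalized DELETE has none of those cases.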

atomicity-problems.sql
SQL
-- VIOLATION: Non-atomic values in a product table
CREATE TABLE Product_Bad (
    product_id      INT PRIMARY KEY,
    name            VARCHAR(200),
    colors          VARCHAR(500),   -- 'red,blue,green' - NON-ATOMIC!
    sizes           VARCHAR(200),   -- 'S,M,L,XL' - NON-ATOMIC!
    phone_numbers   VARCHAR(500)    -- '555-1234,555-5678' - NON-ATOMIC!
);
 
INSERT INTO Product_Bad VALUES
    (1, 'T-Shirt', 'red,blue,green', 'S,M,L,XL', '555-1234,555-5678'),
    (2, 'Jeans', 'blue,black', 'S,M,L', '555-9999');
 
-- PROBLEM 1: Query ambiguity
-- Find products available in blue
SELECT * FROM Product_Bad WHERE colors = 'blue';  -- Misses products 1 and 2!
SELECT * FROM Product_Bad WHERE colors LIKE '%blue%';  -- Works, but...
-- What about a product with color 'navy-blue-gray'? 'light-blue'? 
 
-- PROBLEM 2: Count aggregation
-- How many color options does each product have?
-- SQL has no built-in way to count comma-separated elements
SELECT 
    product_id,
    name,
    colors,
    -- This is a hack and varies by database
    LENGTH(colors) - LENGTH(REPLACE(colors, ',', '')) + 1 AS color_count
FROM Product_Bad;
 
-- PROBLEM 3: Update anomaly
-- Remove 'blue' from product 1's colors
-- Requires: parse, modify, reconstruct, handle edge cases...
UPDATE Product_Bad
SET colors = 'red,green'  -- Manual reconstruction!
WHERE product_id = 1;
 
-- PROBLEM 4: Cannot create foreign key to individual color
-- This is IMPOSSIBLE:
-- ALTER TABLE Product_Bad ADD CONSTRAINT fk_color 
--     FOREIGN KEY (EACH_ELEMENT_IN colors) REFERENCES Color(color_name);
 
-- PROBLEM 5: Index is useless for element lookup
CREATE INDEX idx_colors ON Product_Bad(colors);
-- This index helps find 'red,blue,green' exactly
-- Useless for finding products with 'blue'

The CSV Anti-Pattern

Storing comma-separated values (CSV) in database columns is one of the most common and damaging anti-patterns. It seems convenient—'just one column!'—but it violates 1NF and creates cascading problems. If you find yourself writing WHERE col LIKE '%value%' or parsing strings in queries, you have an atomicity violation.
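Here is a small sketch of the LIKE trap, using Python's sqlite3 module with an in-memory database. The product_bad rows follow Product_Bad above. The 'light-blue' product and the product_color table (one row per product and color) are hypothetical additions for illustration.

```python
import sqlite3


def compare_queries():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Non-atomic design: colors packed into one comma-separated string
    cur.execute("CREATE TABLE product_bad (product_id INTEGER PRIMARY KEY, colors TEXT)")
    cur.executemany(
        "INSERT INTO product_bad VALUES (?, ?)",
        [(1, "red,blue,green"), (2, "blue,black"), (3, "light-blue,white")],
    )
    # Atomic design: one row per (product, color) pair
    cur.execute(
        "CREATE TABLE product_color ("
        " product_id INTEGER, color TEXT,"
        " PRIMARY KEY (product_id, color))"
    )
    cur.executemany(
        "INSERT INTO product_color VALUES (?, ?)",
        [(1, "red"), (1, "blue"), (1, "green"),
         (2, "blue"), (2, "black"),
         (3, "light-blue"), (3, "white")],
    )
    exact = [r[0] for r in cur.execute(
        "SELECT product_id FROM product_bad WHERE colors = 'blue' ORDER BY product_id")]
    like = [r[0] for r in cur.execute(
        "SELECT product_id FROM product_bad WHERE colors LIKE '%blue%' ORDER BY product_id")]
    normalized = [r[0] for r in cur.execute(
        "SELECT product_id FROM product_color WHERE color = 'blue' ORDER BY product_id")]
    conn.close()
    return exact, like, normalized


exact, like, normalized = compare_queries()
print(exact)       # []
print(like)        # [1, 2, 3]  (product 3 matches only through 'light-blue')
print(normalized)  # [1, 2]
```

Equality misses every multi-color product. LIKE picks up product 3 through 'light-blue'. Only the one-value-per-row table answers the question exactly.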

The Atomicity Boundary: Where to Draw the Line

Determining what is 'atomic' requires analyzing access patterns—how will the data be queried, updated, and reported? The atomicity boundary should be drawn based on operational requirements, not physical structure.

The Access Pattern Principle:

If any database operation ever needs to access, compare, or manipulate a sub-component of a value, that value is not atomic for your schema.

Common Atomicity Decisions:

Data Element	Atomic If...	Decompose If...
Full Name	Always displayed as single unit	Sorting by last name, searching by first name
Address	Mailed as-is, never searched	Filtering by city/state, spatial queries
Phone Number	Displayed only	Area code analysis, country-based routing
Date	Compared as whole dates	Month-over-month reports, weekday analysis
URL	Stored as reference link	Protocol filtering, domain analysis
Email	Identity verification only	Domain-based segmentation, local-part matching
JSON Blob	Opaque storage, passed to apps	Queries filter on JSON properties

Keep It Atomic When

•Value is always used as a whole unit
•No filtering or sorting on sub-parts
•No aggregation on elements
•Value is opaque to the database
•Components have no independent meaning
•Schema simplicity outweighs query flexibility

Decompose When

•Queries filter on sub-parts
•Reports aggregate by component
•Sorting requires component access
•Components have independent updates
•Components need separate validation
•Foreign keys needed to components

atomicity-decisions.sql
SQL
-- EXAMPLE 1: Date - Atomic in most cases
CREATE TABLE event (
    event_id        SERIAL PRIMARY KEY,
    event_name      VARCHAR(200) NOT NULL,
    event_date      DATE NOT NULL           -- Atomic: DATE is a native type
);
 
-- Date is atomic, but database functions extract parts when needed
SELECT event_name, event_date,
       EXTRACT(YEAR FROM event_date) AS year,
       EXTRACT(MONTH FROM event_date) AS month,
       EXTRACT(DOW FROM event_date) AS day_of_week
FROM event
WHERE EXTRACT(MONTH FROM event_date) = 12;  -- December events
 
-- EXAMPLE 2: Address - Should typically be decomposed
-- BAD (if you filter or group by city or state): one opaque address string
CREATE TABLE customer_bad (
    customer_id     SERIAL PRIMARY KEY,
    full_address    TEXT    -- '123 Main St, Anytown, CA 90210'
);
 
-- GOOD: Decomposed address for querying
CREATE TABLE customer_good (
    customer_id     SERIAL PRIMARY KEY,
    street_line_1   VARCHAR(100) NOT NULL,
    street_line_2   VARCHAR(100),
    city            VARCHAR(50) NOT NULL,
    state_code      CHAR(2) NOT NULL,
    postal_code     VARCHAR(10) NOT NULL,
    country_code    CHAR(2) NOT NULL DEFAULT 'US'
);
 
-- Now we can query efficiently
SELECT * FROM customer_good WHERE state_code = 'CA';
SELECT city, COUNT(*) FROM customer_good GROUP BY city;
 
-- EXAMPLE 3: Full name - Depends on access patterns
-- Scenario A: CRM system that sorts/searches by last name
CREATE TABLE contact_decomposed (
    contact_id      SERIAL PRIMARY KEY,
    first_name      VARCHAR(50) NOT NULL,
    middle_name     VARCHAR(50),
    last_name       VARCHAR(50) NOT NULL,
    suffix          VARCHAR(10)  -- Jr., III, PhD
);
 
SELECT * FROM contact_decomposed ORDER BY last_name, first_name;
 
-- Scenario B: Display-only system where name is never searched
CREATE TABLE profile_atomic (
    profile_id      SERIAL PRIMARY KEY,
    display_name    VARCHAR(150) NOT NULL,  -- Atomic: never parsed
    bio             TEXT
);

When in Doubt, Decompose

If you're uncertain whether you'll need to access sub-parts, decompose. It's much easier to concatenate atomic parts into a composite view than to parse composite data into parts. You can always create a computed column or view like full_name = first_name || ' ' || last_name, but you cannot reliably split 'Mary Jane Watson-Parker' into components.
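The following minimal sketch shows that asymmetry. It assumes SQLite via Python's sqlite3 module and a hypothetical contact table whose given name happens to be 'Mary Jane'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (first_name TEXT NOT NULL, last_name TEXT NOT NULL)")
conn.execute("INSERT INTO contact VALUES ('Mary Jane', 'Watson-Parker')")

# Combining atomic parts: a view derives the composite display value.
conn.execute(
    "CREATE VIEW contact_display AS "
    "SELECT first_name || ' ' || last_name AS full_name FROM contact"
)
full_name = conn.execute("SELECT full_name FROM contact_display").fetchone()[0]

# Splitting the composite back apart: a naive 'first space' rule
# has no way to know that 'Mary Jane' is the given name.
naive_first, naive_last = full_name.split(" ", 1)
conn.close()

print(full_name)    # Mary Jane Watson-Parker
print(naive_first)  # Mary
print(naive_last)   # Jane Watson-Parker
```

The view rebuilds the composite string reliably. The split rule, by contrast, returns 'Mary' and 'Jane Watson-Parker', and nothing in the string says which reading is correct. That is why the parts should be stored and the whole derived.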

Multi-Valued Attributes: The 1NF Violation

The most egregious atomicity violation occurs with multi-valued attributes—storing multiple independent values in a single cell. This directly violates First Normal Form and creates severe operational problems.

Common Multi-Value Anti-Patterns:

•Comma-Separated Lists: tags = 'electronics,gadgets,gifts'
•Delimited Strings: phones = '555-1234|555-5678|555-9999'
•Numbered Columns: phone1, phone2, phone3, phone4 (repeating groups)
•Embedded Arrays: Using database array types to store multiple values
•JSON Arrays: skills = '["Python", "SQL", "Java"]'

The Normalization Solution:

For any multi-valued attribute, create a separate relation that links back to the parent:

Parent(parent_id, ...other_attributes...)
        |
        | 1:N
        |
Child(parent_id, value, ...additional_attributes...)

This structure allows:

•Individual value querying
•Proper indexing
•Constraint enforcement
•Clean aggregation
•Referential integrity
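The Parent/Child shape can be exercised end to end in a few lines. This sketch uses SQLite through Python's sqlite3 module and reuses the placeholder names parent, child, and value from the diagram, with hypothetical sample rows. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, so the sketch sets it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE parent (
        parent_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE child (
        parent_id INTEGER NOT NULL REFERENCES parent(parent_id),
        value     TEXT NOT NULL,
        PRIMARY KEY (parent_id, value)
    );
    -- Proper indexing: look up parents by an individual value
    CREATE INDEX idx_child_value ON child(value);
    INSERT INTO parent VALUES (1, 'T-Shirt'), (2, 'Jeans');
    INSERT INTO child VALUES (1, 'red'), (1, 'blue'), (1, 'green'),
                             (2, 'blue'), (2, 'black');
""")

# Individual value querying: a plain equality filter on one element
blue = [r[0] for r in conn.execute(
    "SELECT parent_id FROM child WHERE value = 'blue' ORDER BY parent_id")]

# Clean aggregation: count elements per parent with GROUP BY
counts = dict(conn.execute("SELECT parent_id, COUNT(*) FROM child GROUP BY parent_id"))

# Referential integrity: a child row pointing at a missing parent is rejected
try:
    conn.execute("INSERT INTO child VALUES (99, 'purple')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
conn.close()

print(blue, counts, orphan_rejected)  # [1, 2] {1: 3, 2: 2} True
```

Each benefit listed above maps to one ordinary statement: an equality filter, an index on value, a GROUP BY, and a foreign key that rejects the orphan row.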

normalize-multivalued.sql
SQL
-- ANTI-PATTERN 1: Comma-separated list
-- BAD
CREATE TABLE product_tags_bad (
    product_id      INT PRIMARY KEY,
    product_name    VARCHAR(200),
    tags            VARCHAR(1000)   -- 'electronics,gadgets,gifts'
);
 
-- GOOD: Normalize into separate relation
CREATE TABLE product (
    product_id      SERIAL PRIMARY KEY,
    product_name    VARCHAR(200) NOT NULL
);
 
CREATE TABLE tag (
    tag_id          SERIAL PRIMARY KEY,
    tag_name        VARCHAR(50) NOT NULL UNIQUE
);
 
CREATE TABLE product_tag (
    product_id      INT REFERENCES product(product_id),
    tag_id          INT REFERENCES tag(tag_id),
    PRIMARY KEY (product_id, tag_id)
);
 
-- Query: Find products tagged 'electronics'
SELECT p.product_name
FROM product p
JOIN product_tag pt ON p.product_id = pt.product_id
JOIN tag t ON pt.tag_id = t.tag_id
WHERE t.tag_name = 'electronics';
 
-- ANTI-PATTERN 2: Numbered columns (repeating groups)
-- BAD: Fixed number of phone slots
CREATE TABLE contact_bad (
    contact_id      INT PRIMARY KEY,
    name            VARCHAR(100),
    phone1          VARCHAR(20),
    phone2          VARCHAR(20),
    phone3          VARCHAR(20)
    -- What if someone has 4 phones? 0 phones?
);
 
-- GOOD: Flexible phone storage
CREATE TABLE contact (
    contact_id      SERIAL PRIMARY KEY,
    name            VARCHAR(100) NOT NULL
);
 
CREATE TABLE contact_phone (
    contact_id      INT REFERENCES contact(contact_id),
    phone_type      VARCHAR(20) NOT NULL,  -- 'mobile', 'home', 'work'
    phone_number    VARCHAR(20) NOT NULL,
    is_primary      BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (contact_id, phone_type, phone_number)
);
 
-- Query: Find contact's primary phone
SELECT c.name, cp.phone_number
FROM contact c
JOIN contact_phone cp ON c.contact_id = cp.contact_id
WHERE cp.is_primary = TRUE;
 
-- ANTI-PATTERN 3: JSON arrays for structured data
-- BAD (if querying inside is needed)
CREATE TABLE employee_skills_bad (
    employee_id     INT PRIMARY KEY,
    skills          JSONB   -- '["Python", "SQL", "Java"]'
);
 
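-- GOOD: Proper normalization with metadata
-- Minimal employee table (assumed; not defined elsewhere) so the FK below resolves
CREATE TABLE employee (
    employee_id     SERIAL PRIMARY KEY,
    employee_name   VARCHAR(100) NOT NULL
);
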
CREATE TABLE skill (
    skill_id        SERIAL PRIMARY KEY,
    skill_name      VARCHAR(100) NOT NULL UNIQUE,
    skill_category  VARCHAR(50)
);
 
CREATE TABLE employee_skill (
    employee_id     INT REFERENCES employee(employee_id),
    skill_id        INT REFERENCES skill(skill_id),
    proficiency     VARCHAR(20),
    years_experience DECIMAL(4,1),
    certified       BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (employee_id, skill_id)
);

When JSON Arrays Are Acceptable

JSON arrays don't always violate atomicity; it depends on usage. If the JSON blob is opaque (stored, retrieved whole, never queried inside), it's effectively atomic. If you need WHERE skills ? 'Python', you're querying inside, and normalization is better. PostgreSQL's JSONB with GIN indexes offers a middle ground, but normalized tables remain cleaner for complex queries.

Repeating Groups: Another 1NF Violation

Repeating groups occur when a set of related attributes appears multiple times in a single tuple. This is a structural violation of atomicity that creates severe maintenance and query problems.

Forms of Repeating Groups:

•Numbered Columns: item1_name, item1_qty, item1_price, item2_name, item2_qty, item2_price, ...
•Date-Based Columns: sales_jan, sales_feb, sales_mar, ...
•Category Columns: score_math, score_science, score_english, ...

Problems with Repeating Groups:

Problem	Example
Fixed cardinality	What if an order has 25 items but schema only allows 10?
Sparse data	Most orders have 3 items, so 7 of the 10 item slots sit empty (NULL)
Query complexity	`WHERE item1_name = 'Widget' OR item2_name = 'Widget' OR ...`
Aggregation nightmare	Summing all item prices requires naming every column
Schema rigidity	Adding item11 requires ALTER TABLE
Wide rows	Every row carries the columns for all 10 slots, used or not
Code maintenance	Every new slot requires code changes

repeating-groups.sql
SQL
-- ANTI-PATTERN: Repeating groups in order items
CREATE TABLE order_bad (
    order_id        INT PRIMARY KEY,
    customer_id     INT,
    order_date      DATE,
    
    -- Repeating group: can only have 5 items max!
    item1_product   INT,
    item1_quantity  INT,
    item1_price     DECIMAL(10,2),
    
    item2_product   INT,
    item2_quantity  INT,
    item2_price     DECIMAL(10,2),
    
    item3_product   INT,
    item3_quantity  INT,
    item3_price     DECIMAL(10,2),
    
    item4_product   INT,
    item4_quantity  INT,
    item4_price     DECIMAL(10,2),
    
    item5_product   INT,
    item5_quantity  INT,
    item5_price     DECIMAL(10,2)
);
 
-- NIGHTMARE QUERY: Find orders containing product 42
SELECT * FROM order_bad
WHERE item1_product = 42
   OR item2_product = 42
   OR item3_product = 42
   OR item4_product = 42
   OR item5_product = 42;
 
-- NIGHTMARE CALCULATION: Total order value
SELECT order_id,
       COALESCE(item1_quantity * item1_price, 0) +
       COALESCE(item2_quantity * item2_price, 0) +
       COALESCE(item3_quantity * item3_price, 0) +
       COALESCE(item4_quantity * item4_price, 0) +
       COALESCE(item5_quantity * item5_price, 0) AS total
FROM order_bad;
 
-- CORRECT: Normalized structure
CREATE TABLE orders (
    order_id        SERIAL PRIMARY KEY,
    customer_id     INT NOT NULL,
    order_date      DATE NOT NULL DEFAULT CURRENT_DATE
);
 
CREATE TABLE order_item (
    order_id        INT REFERENCES orders(order_id),
    line_number     INT,
    product_id      INT NOT NULL,
    quantity        INT NOT NULL CHECK (quantity > 0),
    unit_price      DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (order_id, line_number)
);
 
-- CLEAN QUERY: Find orders containing product 42
SELECT DISTINCT o.order_id
FROM orders o
JOIN order_item oi ON o.order_id = oi.order_id
WHERE oi.product_id = 42;
 
-- CLEAN CALCULATION: Total order value
SELECT o.order_id,
       SUM(oi.quantity * oi.unit_price) AS total
FROM orders o
JOIN order_item oi ON o.order_id = oi.order_id
GROUP BY o.order_id;
 
-- FLEXIBLE: Any number of items per order
-- SPARSE-FREE: No wasted null columns
-- MAINTAINABLE: Supporting more items per order requires no schema change

The Spreadsheet Trap

Repeating groups often originate from spreadsheets converted to databases. Spreadsheets naturally use Jan, Feb, Mar... or Item1, Item2, Item3 columns. When migrating to databases, resist copying this structure—normalize into rows. What seems convenient in Excel becomes a maintenance nightmare in a database.
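Turning a spreadsheet-shaped table into rows is usually an unpivot step. The sketch below shows one way to do it in Python before loading the rows into SQLite. The store_sales table, the sales_<month> column names, and the figures are all hypothetical.

```python
import sqlite3

# Spreadsheet-style rows: one column per month (a repeating group)
wide_rows = [
    {"store_id": 1, "sales_jan": 100, "sales_feb": 120, "sales_mar": None},
    {"store_id": 2, "sales_jan": 80, "sales_feb": None, "sales_mar": 95},
]


def unpivot(rows, key, prefix):
    """Turn each prefix<label> column into its own (key, label, value) row,
    skipping empty slots instead of storing NULL placeholders."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(prefix) and value is not None:
                long_rows.append((row[key], column[len(prefix):], value))
    return long_rows


monthly = unpivot(wide_rows, "store_id", "sales_")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store_sales ("
    " store_id INTEGER NOT NULL, month TEXT NOT NULL, amount INTEGER NOT NULL,"
    " PRIMARY KEY (store_id, month))"
)
conn.executemany("INSERT INTO store_sales VALUES (?, ?, ?)", monthly)
# One GROUP BY replaces sales_jan + sales_feb + sales_mar + ...
totals = dict(conn.execute("SELECT store_id, SUM(amount) FROM store_sales GROUP BY store_id"))
conn.close()

print(monthly)  # [(1, 'jan', 100), (1, 'feb', 120), (2, 'jan', 80), (2, 'mar', 95)]
print(totals)   # {1: 220, 2: 175}
```

Empty months simply produce no row, and adding April means inserting rows rather than altering the table.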

Complex Data Types and Atomicity

Modern databases support complex data types: arrays, JSON/JSONB, XML, geometric types, and more. These types exist in a gray area regarding atomicity. Whether a given column is compatible with 1NF depends on whether queries need to reach inside it.

Complex Data Types: Atomicity Analysis
Data Type	Atomic If...	Not Atomic If...	Recommendation
JSON/JSONB	Opaque blob, never queried inside	Filtered/indexed on properties	Prefer normalized tables for queryable data
Array	Fixed-size, homogeneous, never searched	Variable-size, searched by element	Avoid for variable multi-value; use for fixed tuples
XML	Stored for transfer, not queried	XPath queries needed	Parse to tables if queried frequently
Geometry	Spatial queries treat as unit	Need component access	Usually atomic—spatial functions handle internals
HSTORE	Key-value blob, rare access	Frequently filtered by key	Consider normalization or JSONB with indexes
Composite Type	Single logical unit	Components queried separately	Decompose into columns if queried

complex-types-atomicity.sql
SQL (PostgreSQL)
-- CASE 1: JSON as atomic blob (acceptable)
-- When JSON is opaque—stored and retrieved whole
 
CREATE TABLE api_response_log (
    log_id          SERIAL PRIMARY KEY,
    request_url     TEXT NOT NULL,
    response_body   JSONB NOT NULL,  -- Atomic: never queried inside
    logged_at       TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
-- We just store and retrieve the whole response
-- No WHERE clauses filtering on JSON properties
 
-- CASE 2: JSON with internal queries (consider normalizing)
CREATE TABLE product_with_metadata (
    product_id      SERIAL PRIMARY KEY,
    name            VARCHAR(200),
    metadata        JSONB   -- {"color": "red", "size": "L", "weight_kg": 2.5}
);
 
-- If you frequently do this...
SELECT * FROM product_with_metadata
WHERE metadata->>'color' = 'red';
 
-- Consider normalizing:
CREATE TABLE product_normalized (
    product_id      SERIAL PRIMARY KEY,
    name            VARCHAR(200),
    color           VARCHAR(50),
    size            VARCHAR(20),
    weight_kg       DECIMAL(10,2)
);
 
-- CASE 3: Arrays for fixed tuples (acceptable)
-- RGB color: always exactly 3 values, never searched by element
 
CREATE TABLE pixel (
    x               INT,
    y               INT,
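    -- PostgreSQL does not enforce a declared array size such as SMALLINT[3],
    -- so the CHECK enforces "exactly 3 values", e.g. ARRAY[255, 128, 0]
    rgb             SMALLINT[] NOT NULL CHECK (array_length(rgb, 1) = 3),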
    PRIMARY KEY (x, y)
);
 
-- CASE 4: Arrays for variable lists (avoid)
-- Tags: variable count, searched by element
 
-- Avoid:
CREATE TABLE article_bad (
    article_id      SERIAL PRIMARY KEY,
    title           TEXT,
    tags            TEXT[]  -- ARRAY['news', 'politics', 'breaking']
);
 
-- Prefer:
CREATE TABLE article (
    article_id      SERIAL PRIMARY KEY,
    title           TEXT NOT NULL
);
 
CREATE TABLE article_tag (
    article_id      INT REFERENCES article(article_id),
    tag             VARCHAR(50) NOT NULL,
    PRIMARY KEY (article_id, tag)
);
 
-- CASE 5: Composite types (use carefully)
CREATE TYPE address AS (
    street      VARCHAR(100),
    city        VARCHAR(50),
    state       CHAR(2),
    postal_code VARCHAR(10)
);
 
CREATE TABLE customer (
    customer_id         SERIAL PRIMARY KEY,
    name                VARCHAR(100),
    shipping_address    address,
    billing_address     address
);
 
-- This is atomic if addresses are always handled as units
-- If you need: WHERE (shipping_address).state = 'CA'
-- Consider decomposing into separate columns

The Queryability Test

Apply the queryability test: If you ever need a WHERE, ORDER BY, GROUP BY, or JOIN on the internal structure of a complex type, it's not truly atomic for your use case. Either limit your queries to treat it atomically, or normalize into proper relational structures.

Transforming Non-Atomic Data

When inheriting or migrating databases with atomicity violations, systematic transformation is required. This process—part of normalization—converts non-atomic structures into 1NF-compliant schemas.

Transformation Steps:

1. Identify Non-Atomic Attributes:
   - Look for comma-separated values
   - Find numbered columns (col1, col2, col3...)
   - Check for array or JSON columns with variable structure
   - Find columns with embedded structure (addresses, names)
2. Analyze Access Patterns:
   - Do queries filter on components?
   - Are components updated independently?
   - Do components have their own constraints?
3. Design Target Schema:
   - One relation per entity
   - One column per atomic attribute
   - Separate relation for multi-valued attributes
4. Create Migration Scripts:
   - Extract and insert individual values
   - Maintain foreign key relationships
   - Validate data integrity post-migration
5. Update Application Code:
   - Rewrite queries for normalized structure
   - Update insert/update logic
   - Test thoroughly

transform-atomicity.sql
SQL (PostgreSQL)
-- MIGRATION EXAMPLE: Transform comma-separated tags to normalized form
 
-- Source: Non-atomic structure
CREATE TABLE article_old (
    article_id      INT PRIMARY KEY,
    title           VARCHAR(500),
    tags            VARCHAR(1000)   -- 'technology,programming,database'
);
 
-- Sample data
INSERT INTO article_old VALUES
    (1, 'Introduction to SQL', 'database,sql,tutorial'),
    (2, 'Python Best Practices', 'python,programming,best-practices'),
    (3, 'Web Security Guide', 'security,web,programming');
 
-- Target: Normalized structure
CREATE TABLE article_new (
    article_id      INT PRIMARY KEY,
    title           VARCHAR(500) NOT NULL
);
 
CREATE TABLE tag (
    tag_name        VARCHAR(50) PRIMARY KEY
);
 
CREATE TABLE article_tag (
    article_id      INT REFERENCES article_new(article_id),
    tag_name        VARCHAR(50) REFERENCES tag(tag_name),
    PRIMARY KEY (article_id, tag_name)
);
 
-- MIGRATION: Step 1 - Copy article base data
INSERT INTO article_new (article_id, title)
SELECT article_id, title FROM article_old;
 
-- MIGRATION: Step 2 - Extract unique tags
INSERT INTO tag (tag_name)
SELECT DISTINCT UNNEST(STRING_TO_ARRAY(tags, ',')) AS tag_name
FROM article_old
WHERE tags IS NOT NULL AND tags != '';
 
-- MIGRATION: Step 3 - Create tag associations
INSERT INTO article_tag (article_id, tag_name)
SELECT article_id, UNNEST(STRING_TO_ARRAY(tags, ',')) AS tag_name
FROM article_old
WHERE tags IS NOT NULL AND tags != '';
 
-- VERIFICATION: Check migration integrity
SELECT 
    'Articles' AS entity,
    COUNT(*) AS old_count,
    (SELECT COUNT(*) FROM article_new) AS new_count
FROM article_old
UNION ALL
SELECT 
    'Total tag associations',
    SUM(LENGTH(tags) - LENGTH(REPLACE(tags, ',', '')) + 1),
    (SELECT COUNT(*) FROM article_tag)
FROM article_old
WHERE tags IS NOT NULL AND tags != '';
 
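-- Cleanup: drop the source table only after the verification counts match
-- and the new structure has been proven in use
-- DROP TABLE article_old;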
 
-- NOW: Clean queries work!
SELECT a.title, ARRAY_AGG(at.tag_name) AS tags
FROM article_new a
JOIN article_tag at ON a.article_id = at.article_id
GROUP BY a.article_id, a.title;

Migration Complexity

Real-world migrations are messy. Comma-separated values may have inconsistent delimiters, extra whitespace, or embedded commas. Numbered columns may have gaps (col1 filled, col2 empty, col3 filled). Plan for edge cases, validate extensively, and consider keeping the old table until the new structure is proven.
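A cleaning step before the INSERTs can absorb the whitespace, empty-element, and duplicate problems described above. The following Python sketch is illustrative only. parse_tags is a hypothetical helper, and lowercasing is an assumed normalization policy. It also does not handle quoted values that contain the delimiter; those call for a real CSV parser such as Python's csv module.

```python
def parse_tags(raw, delimiter=","):
    """Split a delimited tag string into clean, unique atomic values.

    Handles NULL/empty input, stray whitespace, empty elements from doubled
    or trailing delimiters, inconsistent case, and duplicates.
    Order of first appearance is preserved.
    """
    if raw is None:
        return []
    seen = set()
    tags = []
    for part in raw.split(delimiter):
        tag = part.strip().lower()  # assumed policy: tags are case-insensitive
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


# Hypothetical source rows in (article_id, tags) form
rows = [(1, "database, SQL,tutorial"), (2, "python,,Programming ,python")]
associations = [(article_id, tag) for article_id, raw in rows for tag in parse_tags(raw)]

print(parse_tags(" SQL, database,,sql ,tutorial, "))  # ['sql', 'database', 'tutorial']
print(associations)
```

Deduplicating within each article also keeps the (article_id, tag_name) primary key in article_tag from rejecting a list such as 'python,python'.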

Summary: Atomic Values Mastery

Atomicity is fundamental to the relational model—without it, the mathematical foundations of relational algebra crumble. Let's consolidate what we've learned:

Key Takeaways

•Atomic values are contextually indivisible — What's atomic depends on access patterns, not physical structure. If you query inside it, it's not atomic.
•1NF requires atomicity — Every cell contains exactly one value from the attribute's domain. No lists, no repeating groups, no embedded structures.
•Non-atomic values break operations — Querying, indexing, constraints, joins, aggregation—all become problematic with non-atomic data.
•Multi-valued attributes require normalization — Create a separate relation linking back to the parent entity.
•Repeating groups are structural violations — Numbered columns (item1, item2, item3) must be normalized into rows.
•Complex types need careful analysis — JSON, arrays, and composites can be atomic if treated as opaque blobs, but violate atomicity if queried internally.
•When in doubt, decompose — It's easier to combine atomic parts than to parse composite values.

What's Next:

With atomic values understood, we now examine NULL values—the special marker that indicates the absence of a value. NULLs are perhaps the most misunderstood aspect of relational databases, and mastering their behavior is essential for writing correct queries.


You now understand atomicity rigorously—as a foundational requirement of the relational model that enables clean operations, proper indexing, and data integrity. Recognizing and fixing atomicity violations is a core skill that will serve you throughout your database career.