Fd And Keys - Learning Module

Loading content...

0/241

Non-Prime Attributes

The Dependent Attributes

While prime attributes define identity, non-prime attributes carry the descriptive payload of our relations. They represent the information we actually want to store about each entity—yet paradoxically, they are also the source of most normalization problems.

Non-prime attributes are where redundancy hides, where update anomalies manifest, and where transitive dependencies create subtle data integrity issues. Understanding them is essential for diagnosing schema problems and applying normalization techniques effectively.

What You Will Learn

By the end of this page, you will understand the precise definition of non-prime attributes, master techniques for identifying them, comprehend why they are the focus of normalization rules, and recognize the common dependency patterns that cause schema problems.

Formal Definition of Non-Prime Attributes

Definition:

An attribute A in a relation schema R is a non-prime attribute (also called non-key attribute) if A is NOT a member of ANY candidate key of R.

Formally, given the set of candidate keys CK = {K₁, K₂, ..., Kₙ}, an attribute A is non-prime if:

A ∉ (K₁ ∪ K₂ ∪ ... ∪ Kₙ) = A ∉ Prime(R)

In other words:

Non-Prime(R) = U - Prime(R)

where U is the set of all attributes in R.

Key Observations:

Complete exclusion required — An attribute must be absent from ALL candidate keys to be non-prime; appearing in even one key makes it prime
The complement of prime — Non-prime and prime partition the attribute set; every attribute is exactly one or the other
Typically the majority — In most relations, non-prime attributes outnumber prime attributes (we store more descriptive data than identifying data)
The focus of normalization — 2NF and 3NF are primarily concerned with how non-prime attributes depend on keys

Attribute Classification Summary
Attribute Type	In Candidate Key?	Count Typically	Normalization Focus
Prime	Yes (at least one)	Few	Less scrutinized (with exceptions)
Non-Prime	No (none)	Many	Primary focus of 2NF and 3NF
Primary Key	Yes (the chosen one)	Exactly 1	Syntactic designation
Alternate Key	Yes (unchosen candidates)	0 or more	Same status as primary key attributes

Why Non-Prime Matters

While prime attributes establish identity, non-prime attributes hold the actual business data. Names, descriptions, quantities, prices, dates—these are typically non-prime. The normalization rules protect these valuable attributes from the anomalies that arise from improper schema design.

Identifying Non-Prime Attributes

The process for identifying non-prime attributes builds directly on the prime attribute identification:

Step 1: Find all candidate keys Step 2: Compute Prime = union of all candidate keys
Step 3: Compute Non-Prime = All_Attributes - Prime

Let's work through detailed examples:

Relation: OrderLine(OrderID, LineNumber, ProductID, ProductName, Quantity, UnitPrice, LineTotal)

Functional Dependencies:

F = {
    OrderID, LineNumber → ProductID, Quantity, LineTotal
    ProductID → ProductName, UnitPrice
    Quantity, UnitPrice → LineTotal
}

Step 1: Find candidate keys

Attribute classification:

L-only: {OrderID, LineNumber} (never on RHS)
R-only: {ProductName, LineTotal}
Both: {ProductID, Quantity, UnitPrice}
Neither: {}

Core = {OrderID, LineNumber}

{OrderID, LineNumber}⁺:

Start: {OrderID, LineNumber}
Apply FD1: {OrderID, LineNumber, ProductID, Quantity, LineTotal}
Apply FD2: {OrderID, LineNumber, ProductID, ProductName, Quantity, UnitPrice, LineTotal}
All attributes included! ✓

Is it minimal? Yes—neither {OrderID} nor {LineNumber} alone determines all attributes.

Candidate Key: {{OrderID, LineNumber}}

Step 2: Compute Prime

Prime = {OrderID, LineNumber}

Step 3: Compute Non-Prime

Non-Prime = U - Prime = {ProductID, ProductName, Quantity, UnitPrice, LineTotal}

Attribute	Classification	Why
OrderID	Prime	Part of the key
LineNumber	Prime	Part of the key
ProductID	Non-Prime	Derived from key
ProductName	Non-Prime	Derived via ProductID
Quantity	Non-Prime	Derived from key
UnitPrice	Non-Prime	Derived via ProductID
LineTotal	Non-Prime	Derived via calculation

Non-Prime Attributes and Data Anomalies

Non-prime attributes are where data anomalies manifest. Understanding why requires examining how improper dependencies on non-prime attributes create redundancy and inconsistency.

The Core Problem:

When a non-prime attribute is determined by something less than the full candidate key, or transitively through another non-prime attribute, the same piece of information gets stored multiple times. This redundancy is the root cause of anomalies.

Example: The Classic Anomalies Scenario

Relation: EmployeeProject(
    EmpID,         -- Prime (key component)
    ProjectID,     -- Prime (key component)
    EmpName,       -- Non-Prime
    EmpDept,       -- Non-Prime
    DeptHead,      -- Non-Prime
    ProjectName,   -- Non-Prime
    HoursWorked    -- Non-Prime
)

Candidate Key: {EmpID, ProjectID}

FDs:
    EmpID, ProjectID → HoursWorked
    EmpID → EmpName, EmpDept
    EmpDept → DeptHead
    ProjectID → ProjectName

Sample Data Showing Redundancy
EmpID	ProjectID	EmpName	EmpDept	DeptHead	ProjectName	HoursWorked
E1	P1	Alice	IT	Bob	Alpha	40
E1	P2	Alice	IT	Bob	Beta	20
E2	P1	Carol	IT	Bob	Alpha	30
E3	P2	Dave	HR	Eve	Beta	35

Anomalies Present

•Update Anomaly: Changing 'Bob' to 'Frank' as IT DeptHead requires updating multiple rows (potentially missing some)
•Insert Anomaly: Can't add a new employee until they're assigned to a project
•Delete Anomaly: Removing Dave from P2 loses the information that Dave works in HR with head Eve

Root Causes

•Partial Dependency: EmpName, EmpDept depend on EmpID alone (partial key)
•Partial Dependency: ProjectName depends on ProjectID alone (partial key)
•Transitive Dependency: DeptHead depends on EmpDept, which depends on EmpID

Non-Prime Attributes Carry the Redundancy

Notice that all redundant data is in non-prime attributes. Alice's name appears twice, 'IT' appears thrice, 'Bob' appears thrice, and 'Alpha' appears twice. The prime attributes (EmpID, ProjectID) are naturally distinct—that's what keys do. Normalization's job is to ensure non-prime attributes are also stored without redundancy.

Non-Prime Attributes in 2NF Analysis

Second Normal Form directly targets partial dependencies on non-prime attributes. The definition explicitly calls out non-prime attributes:

2NF Definition: A relation R is in 2NF if:

R is in 1NF, AND

Every non-prime attribute is fully functionally dependent on every candidate key

What "Fully Functionally Dependent" Means:

An attribute A is fully functionally dependent on a key K if:

K → A, AND
No proper subset of K determines A

If a proper subset of K determines A, then A is partially dependent on K—a 2NF violation.

Worked Example: 2NF Analysis

Relation: OrderItem(OrderID, ProductID, Quantity, ProductName, UnitPrice, CustomerName)

FDs:
    OrderID, ProductID → Quantity
    ProductID → ProductName, UnitPrice
    OrderID → CustomerName

Candidate Key: {OrderID, ProductID}
Prime: {OrderID, ProductID}
Non-Prime: {Quantity, ProductName, UnitPrice, CustomerName}

2NF Check for each non-prime attribute:

Non-Prime	Determined By	Full Key?	2NF Violation?
Quantity	{OrderID, ProductID}	Yes	No
ProductName	{ProductID}	No (subset)	YES
UnitPrice	{ProductID}	No (subset)	YES
CustomerName	{OrderID}	No (subset)	YES

Conclusion: Not in 2NF. Three non-prime attributes have partial dependencies.

2NF Violation Detection
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
def find_2nf_violations(all_attrs: set, fds: list, candidate_keys: list, 
                          prime_attrs: set) -> list:
    """
    Find all 2NF violations in a relation.
    
    A 2NF violation occurs when a non-prime attribute is partially
    dependent on a candidate key (determined by a proper subset).
    
    Returns: List of (non_prime_attr, partial_key, full_key) violations
    """
    non_prime = all_attrs - prime_attrs
    violations = []
    
    for attr in non_prime:
        # Find what determines this attribute
        for lhs, rhs in fds:
            if attr in rhs:
                # Check if lhs is a proper subset of any candidate key
                for key in candidate_keys:
                    if lhs < key:  # Proper subset
                        violations.append({
                            'attribute': attr,
                            'partial_determinant': lhs,
                            'full_key': key,
                            'explanation': f"{attr} depends on {lhs}, which is a proper subset of key {key}"
                        })
    
    return violations
 
 
def decompose_for_2nf(all_attrs: set, fds: list, candidate_keys: list,
                       prime_attrs: set) -> list:
    """
    Decompose a relation to achieve 2NF.
    
    Strategy: Create separate relations for each partial dependency,
    plus a main relation with the full key and fully dependent attributes.
    """
    non_prime = all_attrs - prime_attrs
    
    # Group: attributes fully dependent on key
    full_dependencies = {}
    partial_dependencies = {}
    
    for key in candidate_keys:
        for attr in non_prime:
            # Find the minimal determinant for this attribute
            min_det = None
            for lhs, rhs in fds:
                if attr in rhs:
                    if min_det is None or len(lhs) < len(min_det):
                        min_det = lhs
            
            if min_det and min_det < key:
                # Partial dependency
                det_tuple = frozenset(min_det)
                if det_tuple not in partial_dependencies:
                    partial_dependencies[det_tuple] = set()
                partial_dependencies[det_tuple].add(attr)
            else:
                # Full dependency
                key_tuple = frozenset(key)
                if key_tuple not in full_dependencies:
                    full_dependencies[key_tuple] = set()
                full_dependencies[key_tuple].add(attr)
    
    # Build decomposed relations
    decomposition = []
    
    # Main relation: key + fully dependent non-primes
    for key, attrs in full_dependencies.items():
        decomposition.append({
            'name': 'Main',
            'attributes': set(key) | attrs,
            'key': set(key)
        })
    
    # Partial dependency relations
    for det, attrs in partial_dependencies.items():
        decomposition.append({
            'name': f'Partial_{sorted(det)}',
            'attributes': set(det) | attrs,
            'key': set(det)
        })
    
    return decomposition
 
 
# Example usage
if __name__ == "__main__":
    all_attrs = {'OrderID', 'ProductID', 'Quantity', 'ProductName', 
                 'UnitPrice', 'CustomerName'}
    fds = [
        ({'OrderID', 'ProductID'}, {'Quantity'}),
        ({'ProductID'}, {'ProductName', 'UnitPrice'}),
        ({'OrderID'}, {'CustomerName'})
    ]
    candidate_keys = [{'OrderID', 'ProductID'}]
    prime_attrs = {'OrderID', 'ProductID'}
    
    violations = find_2nf_violations(all_attrs, fds, candidate_keys, prime_attrs)
    
    print("2NF Violations:")
    for v in violations:
        print(f"  {v['explanation']}")
    
    print("\nProposed Decomposition:")
    decomp = decompose_for_2nf(all_attrs, fds, candidate_keys, prime_attrs)
    for rel in decomp:
        print(f"  {rel['name']}: {sorted(rel['attributes'])} [Key: {sorted(rel['key'])}]")

Non-Prime Attributes in 3NF Analysis

Third Normal Form extends the analysis to transitive dependencies among non-prime attributes:

3NF Definition: A relation R is in 3NF if, for every non-trivial FD X → A:

X is a superkey, OR

A is a prime attribute

The second condition provides an escape clause for prime attributes, but non-prime attributes receive full scrutiny: if a non-prime attribute is determined by a non-superkey, that's a 3NF violation.

Understanding Transitive Dependencies:

A transitive dependency occurs when:

Key → B → C

where B and C are non-prime, and B is not a superkey. The non-prime attribute C is transitively dependent on the key through B.

Worked Example: 3NF Analysis

Relation: Employee(EmpID, EmpName, DeptID, DeptName, DeptLocation)

FDs:
    EmpID → EmpName, DeptID, DeptName, DeptLocation
    DeptID → DeptName, DeptLocation

Candidate Key: {EmpID}
Prime: {EmpID}
Non-Prime: {EmpName, DeptID, DeptName, DeptLocation}

3NF Check for each FD:

FD	X is Superkey?	A is Prime?	3NF Violation?
EmpID → EmpName	Yes	—	No
EmpID → DeptID	Yes	—	No
EmpID → DeptName	Yes	—	No
EmpID → DeptLocation	Yes	—	No
DeptID → DeptName	No	No	YES
DeptID → DeptLocation	No	No	YES

Conclusion: Not in 3NF. DeptID (non-prime) determines DeptName and DeptLocation (also non-prime).

The Transitive Chain:

EmpID → DeptID → DeptName, DeptLocation

DeptName and DeptLocation are transitively dependent on EmpID through DeptID.

The 3NF Intuition

3NF violations involving non-prime attributes indicate that we're storing information about two different entities in one table. In the example, we're mixing Employee facts (name, department assignment) with Department facts (name, location). These should be separated.

3NF Violation Detection
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
def is_superkey(attrs: set, all_attrs: set, fds: list) -> bool:
    """Check if an attribute set is a superkey."""
    closure = compute_closure(attrs, fds)
    return closure == all_attrs
 
 
def find_3nf_violations(all_attrs: set, fds: list, prime_attrs: set) -> list:
    """
    Find all 3NF violations in a relation.
    
    A 3NF violation occurs when:
    - X → A is non-trivial
    - X is not a superkey
    - A is not a prime attribute
    
    Returns: List of violation details
    """
    violations = []
    
    for lhs, rhs in fds:
        # Check each attribute on the right side
        for attr in rhs:
            # Skip trivial dependencies (A → A)
            if attr in lhs:
                continue
            
            # Check the conditions
            x_is_superkey = is_superkey(lhs, all_attrs, fds)
            a_is_prime = attr in prime_attrs
            
            if not x_is_superkey and not a_is_prime:
                violations.append({
                    'fd': f"{lhs} → {attr}",
                    'determinant': lhs,
                    'dependent': attr,
                    'reason': f"'{lhs}' is not a superkey and '{attr}' is non-prime"
                })
    
    return violations
 
 
def decompose_for_3nf(all_attrs: set, fds: list, candidate_keys: list) -> list:
    """
    Decompose a relation to achieve 3NF using the synthesis algorithm.
    
    The 3NF synthesis algorithm:
    1. Find a minimal cover of FDs
    2. Create a relation for each FD in the minimal cover
    3. If no relation contains a candidate key, add one
    """
    # Step 1: Compute minimal cover (simplified - assumes input is already minimal)
    minimal_cover = list(fds)
    
    # Step 2: Create relations from FDs
    relations = []
    for lhs, rhs in minimal_cover:
        relations.append({
            'attributes': set(lhs) | set(rhs),
            'key': set(lhs)
        })
    
    # Step 3: Ensure a candidate key is included
    key_included = False
    for key in candidate_keys:
        for rel in relations:
            if key.issubset(rel['attributes']):
                key_included = True
                break
        if key_included:
            break
    
    if not key_included and candidate_keys:
        relations.append({
            'attributes': set(candidate_keys[0]),
            'key': set(candidate_keys[0])
        })
    
    return relations
 
 
# Example usage
if __name__ == "__main__":
    all_attrs = {'EmpID', 'EmpName', 'DeptID', 'DeptName', 'DeptLocation'}
    fds = [
        ({'EmpID'}, {'EmpName', 'DeptID'}),
        ({'DeptID'}, {'DeptName', 'DeptLocation'})
    ]
    prime_attrs = {'EmpID'}
    
    violations = find_3nf_violations(all_attrs, fds, prime_attrs)
    
    print("3NF Violations:")
    for v in violations:
        print(f"  {v['fd']}: {v['reason']}")
    
    print("\nDecomposition for 3NF:")
    decomp = decompose_for_3nf(all_attrs, fds, [{'EmpID'}])
    for i, rel in enumerate(decomp):
        print(f"  R{i+1}: {sorted(rel['attributes'])} [Key: {sorted(rel['key'])}]")

Special Cases and Edge Conditions

Several special cases merit attention when analyzing non-prime attributes:

Case 1: All Attributes Are Prime

In rare scenarios, every attribute participates in some candidate key:

Relation: AB(A, B)
FDs: A → B, B → A

Candidate Keys: {A}, {B}
Prime: {A, B}
Non-Prime: {} (empty!)

Implications:

Automatically in 2NF and 3NF (no non-prime attributes to violate them)
May still violate BCNF (determinants that aren't superkeys)
In the example: A → B violates BCNF because A is a superkey, but B → A also makes B a superkey. Actually, this example IS in BCNF.

A true BCNF violation with all-prime:

Relation: ABC(A, B, C)
FDs: AB → C, C → B

Candidate Keys: {A, B}, {A, C}
Prime: {A, B, C}  (all prime)
Non-Prime: {} (empty)

3NF Check: C → B — C is not a superkey, but B is prime. ✓ (passes 3NF)
BCNF Check: C → B — C is not a superkey. ✗ (violates BCNF)

Case 2: Single-Attribute Key

When the candidate key is a single attribute, 2NF is automatically satisfied:

Relation: Product(ProductID, Name, Category, CategoryManager)
FDs: ProductID → Name, Category, CategoryManager
     Category → CategoryManager

Candidate Key: {ProductID}
Prime: {ProductID}
Non-Prime: {Name, Category, CategoryManager}

2NF Check: With a single-attribute key, there are no proper subsets of the key. Partial dependencies are impossible. Automatically 2NF! ✓

3NF Check: Category → CategoryManager. Category is not a superkey, CategoryManager is non-prime. ✗ Violates 3NF.

Lesson: Single-attribute keys guarantee 2NF but not 3NF.

Case 3: All Attributes Are Non-Prime (No Proper Key)

If no FDs exist or all FDs are trivial, the only candidate key is the full set of attributes:

Relation: Log(Timestamp, UserID, Action, Details)
FDs: {} (none meaningful)

Candidate Key: {Timestamp, UserID, Action, Details}
Prime: {Timestamp, UserID, Action, Details}
Non-Prime: {} (everything is in the key!)

This is often appropriate for logging tables where each row is unique and no attributes determine others. It's automatically in BCNF (the only determinant is the full key, which is a superkey).

The Non-Prime Sweet Spot

In typical well-designed schemas, there's a balance: one or a few prime attributes (the key) and several non-prime attributes (the data about the entity). The goal of normalization is to ensure each non-prime attribute depends on 'the key, the whole key, and nothing but the key.'

Summary: Non-Prime Attributes

Let's consolidate our understanding of non-prime attributes and their significance:

Key Takeaways

•Definition clarity — An attribute is non-prime if it appears in NO candidate key (complete exclusion required)
•The locus of anomalies — Non-prime attributes are where redundancy and data anomalies manifest
•2NF focus — 2NF specifically checks non-prime attributes for partial dependencies on composite keys
•3NF focus — 3NF specifically checks non-prime attributes for transitive dependencies
•Single-key guarantee — Single-attribute keys automatically satisfy 2NF (no proper subsets to cause partial dependencies)

What's Next:

With a solid understanding of both prime and non-prime attributes, we're now ready to tackle the challenge of finding ALL candidate keys for a given relation. This comprehensive process is essential for accurate attribute classification and complete normalization analysis.

Page Complete

You now understand non-prime attributes—their definition, identification process, and central role in recognizing normalization violations. This knowledge is essential for diagnosing schema problems and applying normalization techniques effectively.