Loading content...
While prime attributes define identity, non-prime attributes carry the descriptive payload of our relations. They represent the information we actually want to store about each entity—yet paradoxically, they are also the source of most normalization problems.
Non-prime attributes are where redundancy hides, where update anomalies manifest, and where transitive dependencies create subtle data integrity issues. Understanding them is essential for diagnosing schema problems and applying normalization techniques effectively.
By the end of this page, you will understand the precise definition of non-prime attributes, master techniques for identifying them, comprehend why they are the focus of normalization rules, and recognize the common dependency patterns that cause schema problems.
Definition:
An attribute A in a relation schema R is a non-prime attribute (also called non-key attribute) if A is NOT a member of ANY candidate key of R.
Formally, given the set of candidate keys CK = {K₁, K₂, ..., Kₙ}, an attribute A is non-prime if:
A ∉ (K₁ ∪ K₂ ∪ ... ∪ Kₙ) = A ∉ Prime(R)
In other words:
Non-Prime(R) = U - Prime(R)
where U is the set of all attributes in R.
Key Observations:
Complete exclusion required — An attribute must be absent from ALL candidate keys to be non-prime; appearing in even one key makes it prime
The complement of prime — Non-prime and prime partition the attribute set; every attribute is exactly one or the other
Typically the majority — In most relations, non-prime attributes outnumber prime attributes (we store more descriptive data than identifying data)
The focus of normalization — 2NF and 3NF are primarily concerned with how non-prime attributes depend on keys
| Attribute Type | In Candidate Key? | Count Typically | Normalization Focus |
|---|---|---|---|
| Prime | Yes (at least one) | Few | Less scrutinized (with exceptions) |
| Non-Prime | No (none) | Many | Primary focus of 2NF and 3NF |
| Primary Key | Yes (the chosen one) | Exactly 1 | Syntactic designation |
| Alternate Key | Yes (unchosen candidates) | 0 or more | Same status as primary key attributes |
While prime attributes establish identity, non-prime attributes hold the actual business data. Names, descriptions, quantities, prices, dates—these are typically non-prime. The normalization rules protect these valuable attributes from the anomalies that arise from improper schema design.
The process for identifying non-prime attributes builds directly on the prime attribute identification:
Step 1: Find all candidate keys
Step 2: Compute Prime = union of all candidate keys
Step 3: Compute Non-Prime = All_Attributes - Prime
Let's work through detailed examples:
Relation: OrderLine(OrderID, LineNumber, ProductID, ProductName, Quantity, UnitPrice, LineTotal)
Functional Dependencies:
F = {
OrderID, LineNumber → ProductID, Quantity, LineTotal
ProductID → ProductName, UnitPrice
Quantity, UnitPrice → LineTotal
}
Step 1: Find candidate keys
Attribute classification:
Core = {OrderID, LineNumber}
{OrderID, LineNumber}⁺:
Is it minimal? Yes—neither {OrderID} nor {LineNumber} alone determines all attributes.
Candidate Key: {{OrderID, LineNumber}}
Step 2: Compute Prime
Prime = {OrderID, LineNumber}
Step 3: Compute Non-Prime
Non-Prime = U - Prime = {ProductID, ProductName, Quantity, UnitPrice, LineTotal}
| Attribute | Classification | Why |
|---|---|---|
| OrderID | Prime | Part of the key |
| LineNumber | Prime | Part of the key |
| ProductID | Non-Prime | Derived from key |
| ProductName | Non-Prime | Derived via ProductID |
| Quantity | Non-Prime | Derived from key |
| UnitPrice | Non-Prime | Derived via ProductID |
| LineTotal | Non-Prime | Derived via calculation |
Non-prime attributes are where data anomalies manifest. Understanding why requires examining how improper dependencies on non-prime attributes create redundancy and inconsistency.
The Core Problem:
When a non-prime attribute is determined by something less than the full candidate key, or transitively through another non-prime attribute, the same piece of information gets stored multiple times. This redundancy is the root cause of anomalies.
Example: The Classic Anomalies Scenario
Relation: EmployeeProject(
EmpID, -- Prime (key component)
ProjectID, -- Prime (key component)
EmpName, -- Non-Prime
EmpDept, -- Non-Prime
DeptHead, -- Non-Prime
ProjectName, -- Non-Prime
HoursWorked -- Non-Prime
)
Candidate Key: {EmpID, ProjectID}
FDs:
EmpID, ProjectID → HoursWorked
EmpID → EmpName, EmpDept
EmpDept → DeptHead
ProjectID → ProjectName
| EmpID | ProjectID | EmpName | EmpDept | DeptHead | ProjectName | HoursWorked |
|---|---|---|---|---|---|---|
| E1 | P1 | Alice | IT | Bob | Alpha | 40 |
| E1 | P2 | Alice | IT | Bob | Beta | 20 |
| E2 | P1 | Carol | IT | Bob | Alpha | 30 |
| E3 | P2 | Dave | HR | Eve | Beta | 35 |
Notice that all redundant data is in non-prime attributes. Alice's name appears twice, 'IT' appears thrice, 'Bob' appears thrice, and 'Alpha' appears twice. The prime attributes (EmpID, ProjectID) are naturally distinct—that's what keys do. Normalization's job is to ensure non-prime attributes are also stored without redundancy.
Second Normal Form directly targets partial dependencies on non-prime attributes. The definition explicitly calls out non-prime attributes:
2NF Definition: A relation R is in 2NF if:
- R is in 1NF, AND
- Every non-prime attribute is fully functionally dependent on every candidate key
What "Fully Functionally Dependent" Means:
An attribute A is fully functionally dependent on a key K if:
If a proper subset of K determines A, then A is partially dependent on K—a 2NF violation.
Worked Example: 2NF Analysis
Relation: OrderItem(OrderID, ProductID, Quantity, ProductName, UnitPrice, CustomerName)
FDs:
OrderID, ProductID → Quantity
ProductID → ProductName, UnitPrice
OrderID → CustomerName
Candidate Key: {OrderID, ProductID}
Prime: {OrderID, ProductID}
Non-Prime: {Quantity, ProductName, UnitPrice, CustomerName}
2NF Check for each non-prime attribute:
| Non-Prime | Determined By | Full Key? | 2NF Violation? |
|---|---|---|---|
| Quantity | {OrderID, ProductID} | Yes | No |
| ProductName | {ProductID} | No (subset) | YES |
| UnitPrice | {ProductID} | No (subset) | YES |
| CustomerName | {OrderID} | No (subset) | YES |
Conclusion: Not in 2NF. Three non-prime attributes have partial dependencies.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110
def find_2nf_violations(all_attrs: set, fds: list, candidate_keys: list, prime_attrs: set) -> list: """ Find all 2NF violations in a relation. A 2NF violation occurs when a non-prime attribute is partially dependent on a candidate key (determined by a proper subset). Returns: List of (non_prime_attr, partial_key, full_key) violations """ non_prime = all_attrs - prime_attrs violations = [] for attr in non_prime: # Find what determines this attribute for lhs, rhs in fds: if attr in rhs: # Check if lhs is a proper subset of any candidate key for key in candidate_keys: if lhs < key: # Proper subset violations.append({ 'attribute': attr, 'partial_determinant': lhs, 'full_key': key, 'explanation': f"{attr} depends on {lhs}, which is a proper subset of key {key}" }) return violations def decompose_for_2nf(all_attrs: set, fds: list, candidate_keys: list, prime_attrs: set) -> list: """ Decompose a relation to achieve 2NF. Strategy: Create separate relations for each partial dependency, plus a main relation with the full key and fully dependent attributes. """ non_prime = all_attrs - prime_attrs # Group: attributes fully dependent on key full_dependencies = {} partial_dependencies = {} for key in candidate_keys: for attr in non_prime: # Find the minimal determinant for this attribute min_det = None for lhs, rhs in fds: if attr in rhs: if min_det is None or len(lhs) < len(min_det): min_det = lhs if min_det and min_det < key: # Partial dependency det_tuple = frozenset(min_det) if det_tuple not in partial_dependencies: partial_dependencies[det_tuple] = set() partial_dependencies[det_tuple].add(attr) else: # Full dependency key_tuple = frozenset(key) if key_tuple not in full_dependencies: full_dependencies[key_tuple] = set() full_dependencies[key_tuple].add(attr) # Build decomposed relations decomposition = [] # Main relation: key + fully dependent non-primes for key, attrs in full_dependencies.items(): decomposition.append({ 'name': 'Main', 'attributes': set(key) | attrs, 'key': set(key) }) # Partial dependency relations for det, attrs in partial_dependencies.items(): decomposition.append({ 'name': f'Partial_{sorted(det)}', 'attributes': set(det) | attrs, 'key': set(det) }) return decomposition # Example usageif __name__ == "__main__": all_attrs = {'OrderID', 'ProductID', 'Quantity', 'ProductName', 'UnitPrice', 'CustomerName'} fds = [ ({'OrderID', 'ProductID'}, {'Quantity'}), ({'ProductID'}, {'ProductName', 'UnitPrice'}), ({'OrderID'}, {'CustomerName'}) ] candidate_keys = [{'OrderID', 'ProductID'}] prime_attrs = {'OrderID', 'ProductID'} violations = find_2nf_violations(all_attrs, fds, candidate_keys, prime_attrs) print("2NF Violations:") for v in violations: print(f" {v['explanation']}") print("\nProposed Decomposition:") decomp = decompose_for_2nf(all_attrs, fds, candidate_keys, prime_attrs) for rel in decomp: print(f" {rel['name']}: {sorted(rel['attributes'])} [Key: {sorted(rel['key'])}]")Third Normal Form extends the analysis to transitive dependencies among non-prime attributes:
3NF Definition: A relation R is in 3NF if, for every non-trivial FD X → A:
- X is a superkey, OR
- A is a prime attribute
The second condition provides an escape clause for prime attributes, but non-prime attributes receive full scrutiny: if a non-prime attribute is determined by a non-superkey, that's a 3NF violation.
Understanding Transitive Dependencies:
A transitive dependency occurs when:
Key → B → C
where B and C are non-prime, and B is not a superkey. The non-prime attribute C is transitively dependent on the key through B.
Worked Example: 3NF Analysis
Relation: Employee(EmpID, EmpName, DeptID, DeptName, DeptLocation)
FDs:
EmpID → EmpName, DeptID, DeptName, DeptLocation
DeptID → DeptName, DeptLocation
Candidate Key: {EmpID}
Prime: {EmpID}
Non-Prime: {EmpName, DeptID, DeptName, DeptLocation}
3NF Check for each FD:
| FD | X is Superkey? | A is Prime? | 3NF Violation? |
|---|---|---|---|
| EmpID → EmpName | Yes | — | No |
| EmpID → DeptID | Yes | — | No |
| EmpID → DeptName | Yes | — | No |
| EmpID → DeptLocation | Yes | — | No |
| DeptID → DeptName | No | No | YES |
| DeptID → DeptLocation | No | No | YES |
Conclusion: Not in 3NF. DeptID (non-prime) determines DeptName and DeptLocation (also non-prime).
The Transitive Chain:
EmpID → DeptID → DeptName, DeptLocation
DeptName and DeptLocation are transitively dependent on EmpID through DeptID.
3NF violations involving non-prime attributes indicate that we're storing information about two different entities in one table. In the example, we're mixing Employee facts (name, department assignment) with Department facts (name, location). These should be separated.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899
def is_superkey(attrs: set, all_attrs: set, fds: list) -> bool: """Check if an attribute set is a superkey.""" closure = compute_closure(attrs, fds) return closure == all_attrs def find_3nf_violations(all_attrs: set, fds: list, prime_attrs: set) -> list: """ Find all 3NF violations in a relation. A 3NF violation occurs when: - X → A is non-trivial - X is not a superkey - A is not a prime attribute Returns: List of violation details """ violations = [] for lhs, rhs in fds: # Check each attribute on the right side for attr in rhs: # Skip trivial dependencies (A → A) if attr in lhs: continue # Check the conditions x_is_superkey = is_superkey(lhs, all_attrs, fds) a_is_prime = attr in prime_attrs if not x_is_superkey and not a_is_prime: violations.append({ 'fd': f"{lhs} → {attr}", 'determinant': lhs, 'dependent': attr, 'reason': f"'{lhs}' is not a superkey and '{attr}' is non-prime" }) return violations def decompose_for_3nf(all_attrs: set, fds: list, candidate_keys: list) -> list: """ Decompose a relation to achieve 3NF using the synthesis algorithm. The 3NF synthesis algorithm: 1. Find a minimal cover of FDs 2. Create a relation for each FD in the minimal cover 3. If no relation contains a candidate key, add one """ # Step 1: Compute minimal cover (simplified - assumes input is already minimal) minimal_cover = list(fds) # Step 2: Create relations from FDs relations = [] for lhs, rhs in minimal_cover: relations.append({ 'attributes': set(lhs) | set(rhs), 'key': set(lhs) }) # Step 3: Ensure a candidate key is included key_included = False for key in candidate_keys: for rel in relations: if key.issubset(rel['attributes']): key_included = True break if key_included: break if not key_included and candidate_keys: relations.append({ 'attributes': set(candidate_keys[0]), 'key': set(candidate_keys[0]) }) return relations # Example usageif __name__ == "__main__": all_attrs = {'EmpID', 'EmpName', 'DeptID', 'DeptName', 'DeptLocation'} fds = [ ({'EmpID'}, {'EmpName', 'DeptID'}), ({'DeptID'}, {'DeptName', 'DeptLocation'}) ] prime_attrs = {'EmpID'} violations = find_3nf_violations(all_attrs, fds, prime_attrs) print("3NF Violations:") for v in violations: print(f" {v['fd']}: {v['reason']}") print("\nDecomposition for 3NF:") decomp = decompose_for_3nf(all_attrs, fds, [{'EmpID'}]) for i, rel in enumerate(decomp): print(f" R{i+1}: {sorted(rel['attributes'])} [Key: {sorted(rel['key'])}]")Several special cases merit attention when analyzing non-prime attributes:
Case 1: All Attributes Are Prime
In rare scenarios, every attribute participates in some candidate key:
Relation: AB(A, B)
FDs: A → B, B → A
Candidate Keys: {A}, {B}
Prime: {A, B}
Non-Prime: {} (empty!)
Implications:
A true BCNF violation with all-prime:
Relation: ABC(A, B, C)
FDs: AB → C, C → B
Candidate Keys: {A, B}, {A, C}
Prime: {A, B, C} (all prime)
Non-Prime: {} (empty)
3NF Check: C → B — C is not a superkey, but B is prime. ✓ (passes 3NF)
BCNF Check: C → B — C is not a superkey. ✗ (violates BCNF)
Case 2: Single-Attribute Key
When the candidate key is a single attribute, 2NF is automatically satisfied:
Relation: Product(ProductID, Name, Category, CategoryManager)
FDs: ProductID → Name, Category, CategoryManager
Category → CategoryManager
Candidate Key: {ProductID}
Prime: {ProductID}
Non-Prime: {Name, Category, CategoryManager}
2NF Check: With a single-attribute key, there are no proper subsets of the key. Partial dependencies are impossible. Automatically 2NF! ✓
3NF Check: Category → CategoryManager. Category is not a superkey, CategoryManager is non-prime. ✗ Violates 3NF.
Lesson: Single-attribute keys guarantee 2NF but not 3NF.
Case 3: All Attributes Are Non-Prime (No Proper Key)
If no FDs exist or all FDs are trivial, the only candidate key is the full set of attributes:
Relation: Log(Timestamp, UserID, Action, Details)
FDs: {} (none meaningful)
Candidate Key: {Timestamp, UserID, Action, Details}
Prime: {Timestamp, UserID, Action, Details}
Non-Prime: {} (everything is in the key!)
This is often appropriate for logging tables where each row is unique and no attributes determine others. It's automatically in BCNF (the only determinant is the full key, which is a superkey).
In typical well-designed schemas, there's a balance: one or a few prime attributes (the key) and several non-prime attributes (the data about the entity). The goal of normalization is to ensure each non-prime attribute depends on 'the key, the whole key, and nothing but the key.'
Let's consolidate our understanding of non-prime attributes and their significance:
What's Next:
With a solid understanding of both prime and non-prime attributes, we're now ready to tackle the challenge of finding ALL candidate keys for a given relation. This comprehensive process is essential for accurate attribute classification and complete normalization analysis.
You now understand non-prime attributes—their definition, identification process, and central role in recognizing normalization violations. This knowledge is essential for diagnosing schema problems and applying normalization techniques effectively.