Loading learning content...
In the previous page, we learned to derive candidate keys from functional dependencies. Now we turn our attention to a critical classification: which attributes participate in these keys, and which do not?
This distinction is far from academic. Prime attributes—those appearing in at least one candidate key—enjoy special status in normalization theory. They are protected from certain anomalies and treated differently by normal form definitions. Understanding prime attributes is essential for making informed decisions about schema design and decomposition.
By the end of this page, you will understand the formal definition of prime attributes, master the techniques for identifying them from candidate keys, and appreciate their pivotal role in normalization theory—particularly in the definitions of 2NF, 3NF, and BCNF.
Definition:
An attribute A in a relation schema R is a prime attribute if A is a member of at least one candidate key of R.
Equivalently, given the set of candidate keys CK = {K₁, K₂, ..., Kₙ}, the set of prime attributes is:
Prime(R) = K₁ ∪ K₂ ∪ ... ∪ Kₙ = ⋃ Kᵢ
Key Observations:
Membership in ONE key suffices — An attribute need not appear in every candidate key; appearing in just one is enough to be prime
The union over all candidate keys — This means we must first find ALL candidate keys before identifying prime attributes
Primary key members are prime — Since the primary key is a candidate key, all primary key attributes are prime
Alternate key members are also prime — Attributes in non-primary candidate keys are equally prime
| Term | Definition | Symbol Notation | Example |
|---|---|---|---|
| Prime Attribute | Member of at least one candidate key | A ∈ ⋃CK | StudentID in Student(StudentID, Email, Name) |
| Non-Prime Attribute | Not a member of any candidate key | A ∉ ⋃CK | Name in the same relation |
| Key Attribute | Synonym for prime attribute | Same as prime | Used interchangeably in literature |
| Non-Key Attribute | Synonym for non-prime | Same as non-prime | Used interchangeably in literature |
Different textbooks use different terminology. 'Prime attribute' and 'key attribute' are synonymous, as are 'non-prime attribute' and 'non-key attribute.' Some sources also use 'secondary attribute' for non-prime. Be aware of these variations when reading database literature.
Identifying prime attributes requires first finding all candidate keys, then taking their union. The process is:
Step 1: Find all candidate keys (using techniques from the previous page)
Step 2: Compute the union of all candidate keys
Step 3: All attributes in this union are prime; all others are non-prime
Let's see this in action with a detailed example.
Relation: Employee(EmpID, Name, Email, Salary, DeptID)
Functional Dependencies:
F = {
EmpID → Name, Email, Salary, DeptID
}
Step 1: Find all candidate keys
Applying attribute classification:
Core = {EmpID} {EmpID}⁺ = {EmpID, Name, Email, Salary, DeptID} = U ✓
Since core is sufficient and there are no 'Both' attributes to try, there is exactly one candidate key: {EmpID}
Step 2: Union of all candidate keys
⋃CK = {EmpID}
Step 3: Classify attributes
| Attribute | In ⋃CK? | Classification |
|---|---|---|
| EmpID | Yes | PRIME |
| Name | No | Non-prime |
| No | Non-prime | |
| Salary | No | Non-prime |
| DeptID | No | Non-prime |
Result: Prime attributes = {EmpID}, Non-prime attributes = {Name, Email, Salary, DeptID}
Given the candidate key finding algorithm from the previous page, computing prime attributes is straightforward. Here's a complete implementation:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132
from itertools import combinations def compute_closure(attributes: set, fds: list[tuple[set, set]]) -> set: """Compute attribute closure X+ under FDs F.""" closure = set(attributes) changed = True while changed: changed = False for lhs, rhs in fds: if lhs.issubset(closure) and not rhs.issubset(closure): closure = closure.union(rhs) changed = True return closure def find_all_candidate_keys(all_attrs: set, fds: list[tuple[set, set]]) -> list[set]: """Find all candidate keys using attribute classification.""" # Classify attributes left_side = set() right_side = set() for lhs, rhs in fds: left_side.update(lhs) right_side.update(rhs) left_only = left_side - right_side right_only = right_side - left_side both = left_side & right_side neither = all_attrs - left_side - right_side core = left_only | neither if compute_closure(core, fds) == all_attrs: return [core] if core else [all_attrs] candidate_keys = [] for size in range(1, len(both) + 1): for subset in combinations(both, size): candidate = core | set(subset) if compute_closure(candidate, fds) != all_attrs: continue # Check minimality is_minimal = True for existing in candidate_keys: if existing < candidate: is_minimal = False break if is_minimal: for attr in subset: smaller = core | (set(subset) - {attr}) if compute_closure(smaller, fds) == all_attrs: is_minimal = False break if is_minimal: candidate_keys.append(candidate) return candidate_keys if candidate_keys else [all_attrs] def find_prime_attributes(all_attrs: set, fds: list[tuple[set, set]]) -> tuple[set, set]: """ Find prime and non-prime attributes. Returns: Tuple of (prime_attributes, non_prime_attributes) """ candidate_keys = find_all_candidate_keys(all_attrs, fds) # Prime = union of all candidate keys prime = set() for key in candidate_keys: prime.update(key) # Non-prime = all attributes minus prime non_prime = all_attrs - prime return prime, non_prime def analyze_relation(name: str, all_attrs: set, fds: list[tuple[set, set]]): """Complete analysis of a relation's key structure.""" print(f"\n{'='*60}") print(f"Relation: {name}") print(f"Attributes: {sorted(all_attrs)}") print(f"\nFunctional Dependencies:") for lhs, rhs in fds: print(f" {sorted(lhs)} → {sorted(rhs)}") candidate_keys = find_all_candidate_keys(all_attrs, fds) print(f"\nCandidate Keys: {[sorted(k) for k in candidate_keys]}") prime, non_prime = find_prime_attributes(all_attrs, fds) print(f"\nPrime Attributes: {sorted(prime)}") print(f"Non-Prime Attributes: {sorted(non_prime)}") print(f"{'='*60}") # Example analysesif __name__ == "__main__": # Example 1: Single key analyze_relation( "Employee", {"EmpID", "Name", "Email", "Salary", "DeptID"}, [ ({"EmpID"}, {"Name", "Email", "Salary", "DeptID"}) ] ) # Example 2: Multiple single-attribute keys analyze_relation( "Student", {"StudentID", "SSN", "Email", "Name", "Major"}, [ ({"StudentID"}, {"SSN", "Email", "Name", "Major"}), ({"SSN"}, {"StudentID", "Email", "Name", "Major"}), ({"Email"}, {"StudentID", "SSN", "Name", "Major"}) ] ) # Example 3: Composite key analyze_relation( "Enrollment", {"StudentID", "CourseID", "Semester", "Grade", "InstructorID"}, [ ({"StudentID", "CourseID", "Semester"}, {"Grade", "InstructorID"}), ({"CourseID", "Semester"}, {"InstructorID"}) ] )The prime/non-prime distinction is not merely a classification—it's central to the very definitions of normal forms. Let's examine how this classification shapes normalization theory:
Second Normal Form (2NF):
A relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on every candidate key.
This definition explicitly treats prime and non-prime attributes differently:
Example of 2NF Analysis:
Relation: OrderItem(OrderID, ProductID, Quantity, ProductName, UnitPrice)
FDs:
{OrderID, ProductID} → Quantity, ProductName, UnitPrice
ProductID → ProductName, UnitPrice
Candidate Key: {OrderID, ProductID}
Prime: {OrderID, ProductID}
Non-Prime: {Quantity, ProductName, UnitPrice}
2NF Check: Is every non-prime fully dependent on the key?
Not in 2NF — ProductName and UnitPrice partially depend on the key.
Prime attributes are part of keys themselves, so "partial dependency on a key" becomes logically awkward for them. If attribute A is part of key K, then certainly A depends on K—but this is a trivial dependency. The 2NF definition focuses on meaningful dependency violations involving non-prime attributes.
Third Normal Form (3NF):
A relation is in 3NF if, for every non-trivial FD X → A, at least one of the following holds:
- X is a superkey, OR
- A is a prime attribute
Again, prime attributes receive special treatment—they can appear on the right side of non-superkey FDs without violating 3NF!
Example of 3NF Analysis:
Relation: Employee(EmpID, DeptID, DeptName, DeptHead)
FDs:
EmpID → DeptID, DeptName, DeptHead
DeptID → DeptName, DeptHead
Candidate Key: {EmpID}
Prime: {EmpID}
Non-Prime: {DeptID, DeptName, DeptHead}
3NF Check: For each non-trivial FD X → A:
Not in 3NF — The FDs DeptID → DeptName and DeptID → DeptHead violate 3NF because DeptID is not a superkey and DeptName/DeptHead are not prime.
| Normal Form | Role of Prime Attributes | Role of Non-Prime Attributes |
|---|---|---|
| 2NF | Exempt from partial dependency check | Must be fully dependent on all candidate keys |
| 3NF | Can be on RHS of non-superkey FDs | Cannot be transitively dependent on key |
| BCNF | No special treatment—must follow superkey rule | No special treatment—must follow superkey rule |
Boyce-Codd Normal Form (BCNF) provides no special exemption for prime attributes. Every determinant must be a superkey, period. This is why BCNF is sometimes called a 'stronger' form than 3NF—it closes the loophole that 3NF provides for prime attributes.
Prime attributes participate in interesting dependency patterns that warrant careful study:
1. Dependencies Among Prime Attributes
When multiple candidate keys exist, there are often dependencies among prime attributes:
Relation: Student(StudentID, SSN, Email, Name)
Candidate Keys: {StudentID}, {SSN}, {Email}
Prime: {StudentID, SSN, Email}
Non-Prime: {Name}
Implied FDs among primes:
StudentID → SSN
StudentID → Email
SSN → StudentID
SSN → Email
Email → StudentID
Email → SSN
These inter-key dependencies are valid but don't violate any normal form because the determinants are superkeys (single-attribute keys in this case).
2. Prime Attributes as Determinants
Prime attributes that are part of one key but not alone sufficient to be a key can be determinants of non-prime attributes:
Relation: Enrollment(StudentID, CourseID, Semester, Grade, RoomNumber)
FDs:
{StudentID, CourseID, Semester} → Grade, RoomNumber
{CourseID, Semester} → RoomNumber
Candidate Key: {StudentID, CourseID, Semester}
Prime: {StudentID, CourseID, Semester}
Non-Prime: {Grade, RoomNumber}
Here, {CourseID, Semester} determines RoomNumber. This is interesting:
The fact that the determinant consists entirely of prime attributes doesn't exempt it from 3NF analysis!
Don't confuse 'composed of prime attributes' with 'is a superkey.' A proper subset of a candidate key is made entirely of prime attributes, yet by definition it's NOT a superkey. Prime-ness is a property of individual attributes; superkey-ness is a property of attribute sets.
3. Overlapping Candidate Keys
When candidate keys overlap, the shared attributes are particularly interesting:
Relation: Assignment(ProjectID, EmployeeID, RoleType, StartDate)
FDs:
{ProjectID, EmployeeID} → RoleType, StartDate
{ProjectID, RoleType} → EmployeeID, StartDate
Candidate Keys: {ProjectID, EmployeeID}, {ProjectID, RoleType}
Prime: {ProjectID, EmployeeID, RoleType} (union)
Non-Prime: {StartDate}
Note: ProjectID appears in both keys — it's "doubly prime"
The overlapping attribute (ProjectID) is the anchor—any way to identify an assignment must include the project. The choice between using EmployeeID or RoleType to complete the key represents two different views of the data.
Understanding prime attributes has direct practical implications for database design and development:
Query Design Impact:
Knowing which attributes are prime helps optimize queries:
-- If StudentID, SSN, and Email are all prime (candidate keys)
-- These are equally efficient for exact-match lookups:
SELECT * FROM Student WHERE StudentID = 12345;
SELECT * FROM Student WHERE SSN = '123-45-6789';
SELECT * FROM Student WHERE Email = 'student@university.edu';
-- But for range queries, the primary key's index is typically preferred:
SELECT * FROM Student WHERE StudentID BETWEEN 10000 AND 20000;
-- Non-prime attribute queries may require table scans or secondary indexes:
SELECT * FROM Student WHERE Name LIKE 'John%'; -- Depends on index
Document all candidate keys, not just the primary key. This makes the prime/non-prime classification explicit and helps developers understand which attributes have key significance even if they're not the designated primary key.
Let's consolidate our understanding of prime attributes and their significance:
What's Next:
Having understood prime attributes in depth, we now turn to their complement: non-prime attributes. Understanding non-prime attributes is equally essential, as they are the focus of many normalization rules and the source of common design problems like transitive dependencies.
You now understand prime attributes—their formal definition, identification process, and crucial role in normalization theory. This classification is fundamental to evaluating normal forms and making informed schema design decisions.