Entities And Entity Sets - Learning Module

Entity Type

The Blueprint of Data

In the previous pages, we learned that entities are 'things with independent existence' and that entity sets collect similar entities together. But what defines 'similar'? What determines which attributes an entity possesses? What constraints govern valid entity values?

The answer lies in the entity type—the formal specification that defines the structure, properties, and rules for all entities of a particular kind. If an entity is a specific employee named Alice Chen, and an entity set is the collection of all employees, then the entity type is the definition of what it means to be an Employee: what information we store, what rules must be followed, what makes each employee unique.

Entity types are to entities what classes are to objects in programming, what blueprints are to buildings in architecture, what recipes are to dishes in cooking. They define the template from which actual instances emerge.

This page explores entity types in depth—their structure, their role in data modeling, their constraints, and how they enable powerful features like inheritance and specialization. Mastering entity types is essential for creating data models that are both expressive enough to capture complex domains and constrained enough to maintain data quality.

What You Will Learn

By the end of this page, you will understand entity types as structural definitions, distinguish entity types from entity sets confidently, specify entity type components (attributes, keys, constraints), understand type hierarchies and inheritance, and apply best practices for entity type design.

Entity Type Definition

An entity type is formally defined as:

A category or class of entities that share a common structure (same attributes) and semantic meaning within the domain.

Let's analyze this definition component by component:

'A category or class' — An entity type represents a classification. Just as biological taxonomy classifies organisms into species, data modeling classifies real-world things into entity types. The Customer entity type represents the classification 'customer' as understood in the business domain.

'That share a common structure' — All entities of a given type have the same set of attributes. Every Customer has name, email, and registration_date. The structure is defined once at the type level and applies to all instances.

'Same attributes' — Attributes are the properties that describe entities. Entity types specify which attributes exist and what values they can hold. This is the schema aspect.

'Semantic meaning within the domain' — Entity types aren't arbitrary groupings. They reflect meaningful classifications in the problem domain. A bank has Customers and Accounts because those concepts are meaningful in banking, not because a programmer thought they'd be convenient.

Components of an Entity Type:

Name — The identifier for the entity type (e.g., Employee, Product, Order)
Attributes — The properties that describe entities of this type
- Simple attributes: atomic, not divisible into sub-parts (e.g., first_name)
- Composite attributes: structured (e.g., address with street, city, state)
- Derived attributes: computed from other attributes (e.g., age from birth_date)
- Multivalued attributes: can have multiple values (e.g., phone_numbers)
Key — Attribute(s) that uniquely identify each entity instance
- Candidate keys: minimal sets of attributes that uniquely identify
- Primary key: the chosen candidate key for identification
- Alternate keys: candidate keys not chosen as primary
Constraints — Rules that valid entities must satisfy
- Domain constraints: valid values for attributes
- Uniqueness constraints: keys and unique attributes
- Not-null constraints: required attributes
- Business rules: semantic constraints

The Power of Type Definitions

Entity type definitions are powerful precisely because they apply uniformly. Define the Customer type once, and every customer instance automatically has the right structure, obeys the constraints, and can participate in relationships defined for Customers. Changes to the type definition affect all instances consistently.
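One way to see this uniformity in practice is to map an entity type to a table definition. The following is a minimal sketch using Python's built-in sqlite3 module. The column names follow the Customer example above, while the table layout and the helper name try_insert are assumptions made for illustration.

```python
import sqlite3

# In-memory database; the columns mirror the Customer attributes from the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id       INTEGER PRIMARY KEY,   -- key
        name              TEXT NOT NULL,         -- required attribute
        email             TEXT NOT NULL UNIQUE,  -- alternate key
        registration_date TEXT NOT NULL          -- ISO-8601 date string
    )
    """
)


def try_insert(row):
    """Insert one customer; return True if the type's constraints accept it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (name, email, registration_date) "
                "VALUES (?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(("Alice Chen", "alice@example.com", "2024-01-15"))
duplicate_email = try_insert(("Bob Lee", "alice@example.com", "2024-02-01"))
missing_name = try_insert((None, "carol@example.com", "2024-03-01"))
```

The type is declared once, and every attempted instance is checked against it: only the first row is stored, because the other two violate the uniqueness and not-null rules.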

Strong vs. Weak Entity Types

Not all entity types are created equal. A fundamental distinction exists between strong (or regular) entity types and weak entity types, based on how entities are identified.

Strong Entity Types:

A strong entity type can identify its instances using only its own attributes. It is independent—it doesn't rely on other entity types for its key.

Characteristics:

Has a primary key composed only of its own attributes
Entities exist independently of other entities
Most entity types in a typical model are strong
Represented in ER diagrams by a single-line rectangle

Examples:

Employee — Identified by employee_id
Product — Identified by product_code
Customer — Identified by customer_number
Building — Identified by building_id

Weak Entity Types:

A weak entity type cannot identify its instances using only its own attributes. It depends on an owning (or identifying) entity type for part of its identification.

Characteristics:

Has a partial key (or discriminator) that distinguishes entities within one owner
Full identification requires owner's key + partial key
Existence depends on the owning entity
Represented in ER diagrams by a double-line rectangle

Strong Entity Types

•Self-sufficient key from own attributes
•Independent existence
•Single-line rectangle notation
•Examples: Employee, Product, Customer
•Most common in typical models
•No identifying relationship required

Weak Entity Types

•Partial key only; needs owner's key
•Dependent on owner entity
•Double-line rectangle notation
•Examples: Dependent, OrderItem, Room
•Used for component/part-of relationships
•Always has identifying relationship

Classic Weak Entity Example:

Building and Room:

Building (strong): building_id (e.g., 'Main', 'Annex', 'Tower')
Room (weak): room_number (e.g., '101', '102', '201')

Room 101 in the Main Building and Room 101 in the Tower are different rooms. The room_number alone (partial key) isn't unique globally—it's unique only within a building. Full identification requires:

(building_id, room_number) → ('Main', '101')
(building_id, room_number) → ('Tower', '101')

Employee and Dependent:

Employee (strong): employee_id
Dependent (weak): dependent_name (partial key)

An employee's dependents (spouse, children) might share names with other employees' dependents. 'John Smith' as a dependent is meaningful only in the context of a specific employee.

(employee_id, dependent_name) → ('E101', 'John Smith')
(employee_id, dependent_name) → ('E205', 'John Smith')  -- different person

Existence Dependency

Weak entities have an existence dependency on their owner. If an employee is deleted, their dependents no longer exist. If a building is demolished, its rooms don't exist independently. In a relational database, this cascading behavior is typically implemented with a foreign key constraint declared ON DELETE CASCADE.
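The Building and Room example can be sketched with Python's built-in sqlite3 module. This is an illustrative mapping under stated assumptions, not the only possible one: the table names are hypothetical, the weak entity's primary key combines the owner's key with the partial key, and the foreign key cascades deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE building (
        building_id TEXT PRIMARY KEY              -- strong entity: own key
    );
    CREATE TABLE room (
        building_id TEXT NOT NULL
            REFERENCES building(building_id) ON DELETE CASCADE,
        room_number TEXT NOT NULL,                -- partial key
        PRIMARY KEY (building_id, room_number)    -- owner key + partial key
    );
    INSERT INTO building VALUES ('Main'), ('Tower');
    INSERT INTO room VALUES ('Main', '101'), ('Tower', '101'), ('Tower', '201');
    """
)


def room_count():
    return conn.execute("SELECT COUNT(*) FROM room").fetchone()[0]


rooms_before = room_count()
# Deleting the owner removes its dependent rooms as well.
conn.execute("DELETE FROM building WHERE building_id = 'Tower'")
rooms_after = room_count()
remaining = conn.execute("SELECT building_id, room_number FROM room").fetchall()
```

Both rooms numbered '101' coexist because the composite key distinguishes them, and after the Tower is deleted only the Main building's room remains.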

Entity Type Attributes in Depth

Attributes are the descriptive properties defined within an entity type. The attribute structure of an entity type determines what information can be stored and how it can be queried. Let's examine attribute classifications in detail.

1. Simple (Atomic) Attributes:

Attributes that cannot be meaningfully subdivided. They hold single, indivisible values.

employee_id: 'E12345'
first_name: 'Alice'
salary: 85000
hire_date: '2020-03-15'
is_active: true

Simple attributes map directly to table columns with appropriate data types.

2. Composite Attributes:

Attributes that can be divided into smaller, meaningful sub-parts. They represent structured values.

Example — Full Address:

address:
  ├── street: '123 Main Street'
  ├── city: 'San Francisco'
  ├── state: 'CA'
  ├── postal_code: '94102'
  └── country: 'USA'

Example — Full Name:

name:
  ├── first_name: 'Alice'
  ├── middle_name: 'Marie'
  └── last_name: 'Chen'

Composite attributes can be flattened into multiple simple columns or, in some databases, stored as structured types.
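As a small illustration of the flattening option, the sketch below turns a composite address into prefixed simple columns. The Address class, the flatten helper, and the address_ prefix convention are assumptions made for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class Address:
    """Composite attribute: a structured value with named sub-parts."""
    street: str
    city: str
    state: str
    postal_code: str
    country: str


def flatten(prefix, composite):
    """Map each sub-part to a simple column name such as address_city."""
    return {
        f"{prefix}_{f.name}": getattr(composite, f.name)
        for f in fields(composite)
    }


columns = flatten(
    "address",
    Address("123 Main Street", "San Francisco", "CA", "94102", "USA"),
)
```

Each key of columns would become one simple column in the Employee or Customer table, so the composite structure survives only in the naming convention.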

3. Single-Valued vs. Multivalued Attributes:

Single-valued: Hold exactly one value per entity.

birth_date: one date
email: one email (if we model it that way)

Multivalued: Can hold multiple values per entity.

phone_numbers: ['415-555-1234', '415-555-5678']
skills: ['Python', 'SQL', 'Machine Learning']
degrees: ['BS Computer Science', 'MS Data Science']

Multivalued attributes create modeling challenges. In relational implementation, they typically become:

A separate table (recommended for true set semantics)
An array column (in databases supporting arrays)
A denormalized string (problematic for querying)
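The separate-table option can be sketched as follows, again with Python's sqlite3 module. The table name employee_phone and the helper phones_for are hypothetical; the composite primary key is what gives the attribute set semantics (no duplicate number per employee).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE employee (
        employee_id TEXT PRIMARY KEY,
        first_name  TEXT NOT NULL
    );
    -- One row per (employee, phone number) pair.
    CREATE TABLE employee_phone (
        employee_id  TEXT NOT NULL REFERENCES employee(employee_id),
        phone_number TEXT NOT NULL,
        PRIMARY KEY (employee_id, phone_number)
    );
    INSERT INTO employee VALUES ('E12345', 'Alice');
    INSERT INTO employee_phone VALUES
        ('E12345', '415-555-1234'),
        ('E12345', '415-555-5678');
    """
)


def phones_for(employee_id):
    """Return all phone numbers stored for one employee, sorted."""
    rows = conn.execute(
        "SELECT phone_number FROM employee_phone "
        "WHERE employee_id = ? ORDER BY phone_number",
        (employee_id,),
    )
    return [r[0] for r in rows]


alice_phones = phones_for("E12345")
```

Unlike a denormalized string, each value is an ordinary row, so it can be indexed, joined, and constrained individually.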

4. Stored vs. Derived Attributes:

Stored: Captured and persisted in the database.

birth_date: stored
hire_date: stored

Derived: Computed from other attributes or from related entities rather than entered directly.

age: derived from birth_date and current date
years_of_service: derived from hire_date
full_name: derived by concatenating name components
total_order_value: derived by summing line items

Derived attributes may be:

Computed dynamically (always current)
Computed and cached (faster but potentially stale)
Materialized (stored and maintained via triggers)
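The dynamically computed option can be sketched in a few lines of Python. The function names are assumptions, and the current date is passed in explicitly so the result is reproducible.

```python
from datetime import date


def age_on(birth_date: date, today: date) -> int:
    """Whole years elapsed between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def years_of_service(hire_date: date, today: date) -> int:
    """Derived from hire_date using the same whole-year rule as age."""
    return age_on(hire_date, today)
```

Because only birth_date and hire_date are stored, the derived values can never drift out of date; the cost is recomputing them on every read.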

Attribute Classification Summary
Classification	Examples	Implementation Considerations
Simple/Atomic	id, name, price, date	Direct column mapping
Composite	address, full_name	Flatten to columns or use structured types
Single-valued	birth_date, ssn, primary_email	Standard column
Multivalued	phone_numbers, skills, hobbies	Separate table or array type
Stored	name, hire_date, salary	Persisted in database
Derived	age, tenure, full_name	Computed or materialized
Key attribute	employee_id, ssn, email	Primary/unique constraint; indexed
Nullable	middle_name, fax_number	NULL allowed; optional
Required	name, email, created_at	NOT NULL constraint; mandatory

Attribute Domain

Every attribute has a domain: the set of permissible values. 'salary' has a numeric domain (perhaps DECIMAL(10,2) with a minimum of 0). 'status' might have an enumeration domain {'active', 'inactive', 'pending'}. 'email' has a string domain with format constraints. Proper domain specification prevents invalid data.
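The three domains mentioned above can be expressed as CHECK constraints. This is a rough sketch in Python with sqlite3; the LIKE pattern is a deliberately loose stand-in for real email validation, and the helper is_valid is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_id TEXT PRIMARY KEY,
        salary      NUMERIC NOT NULL CHECK (salary >= 0),
        status      TEXT NOT NULL
                    CHECK (status IN ('active', 'inactive', 'pending')),
        email       TEXT NOT NULL CHECK (email LIKE '%_@_%._%')
    )
    """
)


def is_valid(employee_id, salary, status, email):
    """Return True if the row falls inside every attribute's domain."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO employee VALUES (?, ?, ?, ?)",
                (employee_id, salary, status, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = is_valid("E1", 85000, "active", "alice@example.com")
negative_salary = is_valid("E2", -1, "active", "bob@example.com")
unknown_status = is_valid("E3", 100, "retired", "carol@example.com")
bad_email = is_valid("E4", 100, "pending", "not-an-email")
```

Only the first row is accepted; each of the others falls outside exactly one attribute's domain.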

Key Attributes and Identification

Every entity type must have a mechanism for uniquely identifying each entity instance. This is accomplished through key attributes—attributes whose values distinguish one entity from another within the entity set.

Superkey:

A superkey is any set of attributes that, taken together, uniquely identifies each entity. A superkey may contain more attributes than necessary.

For Employee:

{employee_id} — superkey
{employee_id, name} — superkey (includes unnecessary attribute)
{ssn} — superkey
{email} — superkey (if unique)
{employee_id, name, email} — superkey (redundant attributes)

Candidate Key:

A candidate key is a minimal superkey—no proper subset of it is also a superkey. Removing any attribute would make it non-unique.

For Employee:

{employee_id} — candidate key
{ssn} — candidate key
{email} — candidate key (if business guarantees uniqueness)
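The superkey and candidate key definitions can be checked mechanically against sample data. The sketch below is illustrative only: the rows and the department attribute are invented, and a sample can only refute a key, never prove it, because uniqueness is a rule about every possible entity, not just the ones currently stored.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """True if attrs is a superkey and no proper subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Invented sample rows; two employees deliberately share a name.
employees = [
    {"employee_id": "E101", "name": "Alice Chen", "department": "Sales"},
    {"employee_id": "E205", "name": "Alice Chen", "department": "Sales"},
    {"employee_id": "E309", "name": "Raj Patel", "department": "IT"},
]
```

Here is_superkey(employees, ["employee_id", "name"]) holds, but is_candidate_key rejects the same set because employee_id alone already identifies every row.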

Primary Key:

From among candidate keys, one is selected as the primary key. This becomes the official identification mechanism used in relationships.

Selection criteria:

Simplicity: Fewer attributes preferred
Stability: Values shouldn't change
Familiarity: Meaningful to users when appropriate
Performance: Efficient for indexing

Alternate Key:

Candidate keys not chosen as primary key become alternate keys. They're still unique and often get unique constraints/indices.

Natural Keys vs. Surrogate Keys:

A major design decision is whether to use natural or surrogate keys:

Natural Key:

Uses real-world attributes
Meaningful to users
Examples: SSN, ISBN, VIN, email address

Pros: Already exists; no generation needed; meaningful in queries
Cons: May change; may have format issues; potential privacy concerns

Surrogate Key:

System-generated artificial identifier
No real-world meaning
Examples: auto-increment integer, UUID, GUID

Pros: Stable; compact; simple joins; no business-rule coupling
Cons: Requires lookup for meaning; extra attribute to maintain

Industry Recommendation:

Most modern systems prefer surrogate primary keys with natural keys as alternate/unique:

CREATE TABLE Customer (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,  -- surrogate
    email VARCHAR(255) UNIQUE NOT NULL,          -- natural alternate key
    ssn CHAR(11) UNIQUE,                          -- natural alternate key
    name VARCHAR(100) NOT NULL,
    ...
);

This provides stability (surrogate never changes) and business usability (natural keys for queries and constraints).

Key Stability Is Critical

Primary keys propagate through foreign keys to other tables. If a primary key changes, all referencing foreign keys must also change—a risky, expensive operation. This is why surrogate keys (which never have business reasons to change) are preferred. Choose keys wisely; changing them later is painful.
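A short sketch of why this matters, using Python's sqlite3 module. The customer_order table is a hypothetical referencing table; the point is that a natural alternate key can change without any referencing row being touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,     -- surrogate key
        email       TEXT NOT NULL UNIQUE     -- natural alternate key
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer (customer_id, email) VALUES (1, 'alice@example.com');
    INSERT INTO customer_order (order_id, customer_id) VALUES (10, 1), (11, 1);
    """
)

# The natural key changes; only the one customer row is updated.
conn.execute(
    "UPDATE customer SET email = 'alice.chen@example.com' WHERE customer_id = 1"
)

orders_for_new_email = conn.execute(
    """
    SELECT COUNT(*)
    FROM customer_order o
    JOIN customer c ON c.customer_id = o.customer_id
    WHERE c.email = 'alice.chen@example.com'
    """
).fetchone()[0]
```

Had email been the primary key, both order rows would also have needed updating, and every other referencing table along with them.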

Entity Type Constraints

Beyond structural definitions (attributes and keys), entity types include constraints—rules that valid entity instances must satisfy. Constraints encode business rules and maintain data quality.

1. Domain Constraints:

Specify valid values for each attribute:

Data type: salary is DECIMAL(10,2), not TEXT
Range: salary >= 0, temperature between -50 and 150
Format: email matches pattern, phone has 10 digits
Enumeration: status IN ('active', 'inactive', 'pending')

2. Key Constraints:

Enforce uniqueness for key attributes:

Primary key: exactly one per entity type; its value is unique and never null for every entity
Unique constraints: alternate keys and other unique attributes

3. Not-Null Constraints:

Require values for certain attributes:

Required attributes: name NOT NULL, email NOT NULL
Optional attributes: middle_name (nullable), fax_number (nullable)

4. Entity Integrity Constraint:

The primary key of an entity can never be null. This fundamental rule ensures every entity is identifiable.

5. Business Rule Constraints:

Domain-specific rules beyond basic data validation:

"End date must be after start date"
"Manager must be in the same department"
"Discount cannot exceed 50%"
"At least one phone number required if customer_type is 'premium'"

Constraint Types and Enforcement Mechanisms
Constraint Type	Example	Enforcement Level
Domain (data type)	salary DECIMAL(10,2)	DDL / Column definition
Domain (range)	quantity >= 0	CHECK constraint
Domain (enumeration)	status IN ('A','I','P')	CHECK or ENUM type
Domain (format)	email LIKE '%@%.%'	CHECK with LIKE or regex
Key (primary)	employee_id PRIMARY KEY	PRIMARY KEY constraint
Key (unique)	email UNIQUE	UNIQUE constraint
Not-null	name NOT NULL	NOT NULL constraint
Entity integrity	PK NOT NULL	Implicit in PRIMARY KEY
Business rule (simple)	end_date > start_date	CHECK constraint
Business rule (complex)	Manager same department	Trigger or application logic
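The last two rows of the table can be contrasted in a small sqlite3 sketch. The project table and the trigger name are assumptions; the trigger covers inserts only, so a real schema would likely need a matching rule for updates as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE project (
        project_id TEXT PRIMARY KEY,
        start_date TEXT NOT NULL,        -- ISO-8601 strings sort chronologically
        end_date   TEXT NOT NULL,
        CHECK (end_date > start_date)    -- simple rule: involves a single row
    );
    CREATE TABLE employee (
        employee_id TEXT PRIMARY KEY,
        department  TEXT NOT NULL,
        manager_id  TEXT REFERENCES employee(employee_id)
    );
    -- Complex rule: must consult another row, so a trigger is used.
    CREATE TRIGGER manager_same_department
    BEFORE INSERT ON employee
    WHEN NEW.manager_id IS NOT NULL
         AND (SELECT department FROM employee
              WHERE employee_id = NEW.manager_id) IS NOT NEW.department
    BEGIN
        SELECT RAISE(ABORT, 'manager must be in the same department');
    END;
    """
)


def accepts(sql, params):
    """Run one INSERT; return True if every rule allows it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        return False


good_project = accepts("INSERT INTO project VALUES (?, ?, ?)", ("P1", "2024-01-01", "2024-06-30"))
bad_project = accepts("INSERT INTO project VALUES (?, ?, ?)", ("P2", "2024-06-30", "2024-01-01"))
manager = accepts("INSERT INTO employee VALUES (?, ?, ?)", ("E1", "Sales", None))
same_dept = accepts("INSERT INTO employee VALUES (?, ?, ?)", ("E2", "Sales", "E1"))
other_dept = accepts("INSERT INTO employee VALUES (?, ?, ?)", ("E3", "IT", "E1"))
```

The CHECK constraint is declarative and sees only the row being written, while the trigger has to query the manager's row, which is why the two rules end up at different enforcement levels.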

Constraints in Conceptual vs. Physical Modeling

In conceptual models (ER diagrams), we note constraints informally—text annotations like 'unique' or 'required'. In logical/physical design, these become formal DDL: PRIMARY KEY, UNIQUE, NOT NULL, CHECK. The conceptual model captures the rules; the physical model enforces them.

Entity Type Hierarchies

Entity types can be organized into hierarchies through specialization (top-down) and generalization (bottom-up). This enables inheritance of attributes and more precise domain modeling.

Supertype and Subtype:

A supertype is a general entity type that can be specialized into more specific subtypes:

Person (supertype)
├── Employee (subtype)
├── Customer (subtype)
└── Vendor (subtype)

Subtypes:

Inherit all attributes of the supertype
Add their own local attributes
May have their own relationships
May have their own subtypes (multi-level hierarchies)

Example — Account Hierarchy:

Account (supertype)
├── account_number
├── balance
├── open_date
├── customer_id (FK)
├── CheckingAccount (subtype)
│   ├── overdraft_limit
│   └── check_fee
└── SavingsAccount (subtype)
    ├── interest_rate
    └── min_balance

Every CheckingAccount has account_number, balance, open_date, and customer_id (inherited) plus overdraft_limit and check_fee (local).

Specialization vs. Generalization:

Specialization (Top-Down):

Start with general type
Identify subgroups with distinct attributes
Create subtypes for each
"What kinds of Accounts exist?"

Generalization (Bottom-Up):

Start with specific types
Recognize common attributes
Create supertype that captures commonality
"What do Car, Truck, and Motorcycle have in common?"

Both processes result in the same hierarchy structure; they differ in how you arrive at it.

Constraints on Hierarchies:

Disjoint vs. Overlapping:

Disjoint: Entity can be in only one subtype
- A vehicle is either a Car OR a Truck, not both
Overlapping: Entity can be in multiple subtypes
- A person can be both Employee AND Customer

Total vs. Partial:

Total: Every supertype instance must be in some subtype
- Every Payment is either Cash, Credit, or Debit
Partial: Supertype instances may not be in any subtype
- A Person might be neither Employee nor Customer (just a contact)
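The two constraint pairs can be stated precisely as checks over sets of entity identifiers. The function below is a sketch with hypothetical names; it reports violations for a given population rather than enforcing anything in a database.

```python
def check_specialization(supertype_ids, subtypes, *, disjoint, total):
    """Return violation messages for one supertype/subtype population.

    supertype_ids: set of entity ids in the supertype
    subtypes: dict mapping subtype name -> set of member ids
    """
    violations = []
    membership = {}
    for name, members in subtypes.items():
        for entity_id in members:
            membership.setdefault(entity_id, []).append(name)

    for entity_id, names in sorted(membership.items()):
        if entity_id not in supertype_ids:
            violations.append(f"{entity_id}: in a subtype but not in the supertype")
        if disjoint and len(names) > 1:
            violations.append(f"{entity_id}: in multiple subtypes {sorted(names)}")

    if total:
        for entity_id in sorted(supertype_ids - set(membership)):
            violations.append(f"{entity_id}: in no subtype")
    return violations


# p2 is both Employee and Customer; p3 is neither (just a contact).
people = {"p1", "p2", "p3"}
roles = {"Employee": {"p1", "p2"}, "Customer": {"p2"}}

overlapping_partial = check_specialization(people, roles, disjoint=False, total=False)
disjoint_total = check_specialization(people, roles, disjoint=True, total=True)
```

The same population is valid under overlapping and partial rules but produces two violations under disjoint and total rules, one for p2 and one for p3.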

Inheritance in Data Modeling

Entity type inheritance resembles class inheritance in OOP: subtypes inherit attributes much as subclasses inherit fields. However, most relational databases don't support inheritance directly, so mapping a hierarchy to tables (single-table, class-table, or concrete-table strategies) is a major design decision covered in ER-to-Relational mapping.
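As a preview of one such strategy, the sketch below maps part of the Account hierarchy using the class-table approach: one table per type, with the subtype's primary key doubling as a foreign key to the supertype. The table names are assumptions, and customer_id is omitted to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        balance        NUMERIC NOT NULL,
        open_date      TEXT NOT NULL
    );
    -- The subtype row shares its key with the supertype row it extends.
    CREATE TABLE checking_account (
        account_number  TEXT PRIMARY KEY
            REFERENCES account(account_number) ON DELETE CASCADE,
        overdraft_limit NUMERIC NOT NULL,
        check_fee       NUMERIC NOT NULL
    );
    INSERT INTO account VALUES ('A-100', 2500, '2023-05-01');
    INSERT INTO checking_account VALUES ('A-100', 500, 1);
    """
)

# Inherited attributes come from account, local ones from checking_account.
checking_row = conn.execute(
    """
    SELECT a.account_number, a.balance, a.open_date,
           c.overdraft_limit, c.check_fee
    FROM checking_account c
    JOIN account a ON a.account_number = c.account_number
    """
).fetchone()
```

Inheritance here is reconstructed by a join rather than provided by the database, which is the trade-off that distinguishes this strategy from the single-table and concrete-table alternatives.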

Entity Type Design Best Practices

Designing entity types well is crucial for creating maintainable, flexible data models. Here are established best practices:

1. One Entity Type, One Concept:

Each entity type should represent exactly one concept from the domain. Don't combine unrelated concepts, and don't split a single concept across multiple types unnecessarily.

Bad: PersonOrOrganization (mixed concepts)
Good: Person and Organization as separate types (possibly with a Party supertype)

2. Meaningful Names:

Entity type names should reflect domain vocabulary, not implementation details:

Bad: tbl_cust, record_type_01
Good: Customer, Employee, Order

Use singular nouns (Customer, not Customers). Be specific when needed (SalesOrder vs PurchaseOrder).

3. Complete Attribute Sets:

Identify all relevant attributes during design. Missing attributes create problems later:

Consider:

What information describes this entity?
What questions will users ask about it?
What reports will include it?
What calculations need it?

4. Appropriate Key Selection:

Choose keys carefully:

Use surrogate keys for stability
Define natural alternate keys for business queries
Avoid composite keys when simple keys suffice
Never use mutable attributes as keys

Entity Type Design Checklist

•Single responsibility — Type represents one coherent concept
•Meaningful name — Clear, domain-appropriate, singular noun
•Complete attributes — All descriptive information captured
•Defined primary key — Stable, simple, surrogate preferred
•Alternate keys — Natural keys for business use
•Specified constraints — Domain, null, business rules documented
•Appropriate granularity — Not too broad (mixed concepts) or too narrow (forced splits)
•Future extensibility — Room for additional attributes without redesign
•Relationship clarity — Clear role in domain relationships
•Specialization consideration — Subtypes identified if warranted

The 'Describe It' Test

Can you describe a typical entity of this type in a natural sentence? 'A Customer is a person or organization that purchases products from us, identified by customer_id, described by name, email, and registration_date, with constraints that email must be unique and name is required.' If the description is awkward, the type design might need work.

Summary: Understanding Entity Types

We've explored entity types—the structural blueprints that define what entities are, what attributes they possess, and what rules govern their instances. Entity types are central to ER modeling, enabling precise, enforceable data definitions.

Key Takeaways

•Entity types define structure — Categories of entities sharing attributes and semantic meaning
•Strong vs. weak entity types — Strong types are self-identifying; weak types depend on owners
•Attributes have classifications — Simple/composite, single/multi-valued, stored/derived
•Keys establish identity — Superkeys contain candidate keys; primary key is chosen identifier
•Natural vs. surrogate keys — Industry favors surrogate primary with natural alternates
•Constraints encode rules — Domain, key, not-null, and business rules maintain quality
•Type hierarchies enable inheritance — Specialization and generalization with overlap/disjoint, total/partial constraints
•Best practices guide design — Single concept, meaningful names, complete attributes, appropriate keys

What's next:

Now that we understand entity types as structural definitions, we'll examine entity instances—the actual entities that exist in the database at runtime. Understanding the distinction between types and instances deepens our grasp of how abstract definitions translate to concrete data.

Entity Type

The Blueprint of Data

What You Will Learn

Entity Type Definition

An entity type is formally defined as:

A category or class of entities that share a common structure (same attributes) and semantic meaning within the domain.

Let's analyze this definition component by component:

'Same attributes' — Attributes are the properties that describe entities. Entity types specify which attributes exist and what values they can hold. This is the schema aspect.

Components of an Entity Type:

Name — The identifier for the entity type (e.g., Employee, Product, Order)
Attributes — The properties that describe entities of this type
- Simple attributes: single-valued, atomic (e.g., first_name)
- Composite attributes: structured (e.g., address with street, city, state)
- Derived attributes: computed from other attributes (e.g., age from birth_date)
- Multivalued attributes: can have multiple values (e.g., phone_numbers)
Key — Attribute(s) that uniquely identify each entity instance
- Candidate keys: minimal sets of attributes that uniquely identify
- Primary key: the chosen candidate key for identification
- Alternate keys: candidate keys not chosen as primary
Constraints — Rules that valid entities must satisfy
- Domain constraints: valid values for attributes
- Uniqueness constraints: keys and unique attributes
- Not-null constraints: required attributes
- Business rules: semantic constraints

The Power of Type Definitions

Strong vs. Weak Entity Types

Not all entity types are created equal. A fundamental distinction exists between strong (or regular) entity types and weak entity types, based on how entities are identified.

Strong Entity Types:

A strong entity type can identify its instances using only its own attributes. It is independent—it doesn't rely on other entity types for its key.

Characteristics:

Has a primary key composed only of its own attributes
Entities exist independently of other entities
Most entity types in a typical model are strong
Represented in ER diagrams with single-line rectangle

Examples:

Employee — Identified by employee_id
Product — Identified by product_code
Customer — Identified by customer_number
Building — Identified by building_id

Weak Entity Types:

A weak entity type cannot identify its instances using only its own attributes. It depends on an owning (or identifying) entity type for part of its identification.

Characteristics:

Has a partial key (or discriminator) that distinguishes entities within one owner
Full identification requires owner's key + partial key
Existence depends on the owning entity
Represented in ER diagrams with double-line rectangle

Strong Entity Types

•Self-sufficient key from own attributes
•Independent existence
•Single-line rectangle notation
•Examples: Employee, Product, Customer
•Most common in typical models
•No identifying relationship required

Weak Entity Types

•Partial key only; needs owner's key
•Dependent on owner entity
•Double-line rectangle notation
•Examples: Dependent, OrderItem, RoomNumber
•Used for component/part-of relationships
•Always has identifying relationship

Classic Weak Entity Example:

Building and Room:

Building (strong): building_id (e.g., 'Main', 'Annex', 'Tower')
Room (weak): room_number (e.g., '101', '102', '201')

(building_id, room_number) → ('Main', '101')
(building_id, room_number) → ('Tower', '101')

Employee and Dependent:

Employee (strong): employee_id
Dependent (weak): dependent_name (partial key)

An employee's dependents (spouse, children) might share names with other employees' dependents. 'John Smith' as a dependent is meaningful only in the context of a specific employee.

(employee_id, dependent_name) → ('E101', 'John Smith')
(employee_id, dependent_name) → ('E205', 'John Smith')  -- different person

Existence Dependency

Entity Type Attributes in Depth

1. Simple (Atomic) Attributes:

Attributes that cannot be meaningfully subdivided. They hold single, indivisible values.

employee_id: 'E12345'
first_name: 'Alice'
salary: 85000
hire_date: '2020-03-15'
is_active: true

Simple attributes map directly to table columns with appropriate data types.

2. Composite Attributes:

Attributes that can be divided into smaller, meaningful sub-parts. They represent structured values.

Example — Full Address:

address:
  ├── street: '123 Main Street'
  ├── city: 'San Francisco'
  ├── state: 'CA'
  ├── postal_code: '94102'
  └── country: 'USA'

Example — Full Name:

name:
  ├── first_name: 'Alice'
  ├── middle_name: 'Marie'
  └── last_name: 'Chen'

Composite attributes can be flattened into multiple simple columns or, in some databases, stored as structured types.

3. Single-Valued vs. Multivalued Attributes:

Single-valued: Hold exactly one value per entity.

birth_date: one date
email: one email (if we model it that way)

Multivalued: Can hold multiple values per entity.

phone_numbers: ['415-555-1234', '415-555-5678']
skills: ['Python', 'SQL', 'Machine Learning']
degrees: ['BS Computer Science', 'MS Data Science']

Multivalued attributes create modeling challenges. In relational implementation, they typically become:

A separate table (recommended for true set semantics)
An array column (in databases supporting arrays)
A denormalized string (problematic for querying)

4. Stored vs. Derived Attributes:

Stored: Captured and persisted in the database.

birth_date: stored
hire_date: stored

Derived: Computed from other attributes on demand.

age: derived from birth_date and current date
years_of_service: derived from hire_date
full_name: derived by concatenating name components
total_order_value: derived by summing line items

Derived attributes may be:

Computed dynamically (always current)
Computed and cached (faster but potentially stale)
Materialized (stored and maintained via triggers)

Attribute Classification Summary
Classification	Examples	Implementation Considerations
Simple/Atomic	id, name, price, date	Direct column mapping
Composite	address, full_name	Flatten to columns or use structured types
Single-valued	birth_date, ssn, primary_email	Standard column
Multivalued	phone_numbers, skills, hobbies	Separate table or array type
Stored	name, hire_date, salary	Persisted in database
Derived	age, tenure, full_name	Computed or materialized
Key attribute	employee_id, ssn, email	Primary/unique constraint; indexed
Nullable	middle_name, fax_number	NULL allowed; optional
Required	name, email, created_at	NOT NULL constraint; mandatory

Attribute Domain

Key Attributes and Identification

Superkey:

A superkey is any set of attributes that, taken together, uniquely identifies each entity. A superkey may contain more attributes than necessary.

For Employee:

{employee_id} — superkey
{employee_id, name} — superkey (includes unnecessary attribute)
{ssn} — superkey
{email} — superkey (if unique)
{employee_id, name, email} — superkey (redundant attributes)

Candidate Key:

A candidate key is a minimal superkey—no proper subset of it is also a superkey. Removing any attribute would make it non-unique.

For Employee:

{employee_id} — candidate key
{ssn} — candidate key
{email} — candidate key (if business guarantees uniqueness)

Primary Key:

From among candidate keys, one is selected as the primary key. This becomes the official identification mechanism used in relationships.

Selection criteria:

Simplicity: Fewer attributes preferred
Stability: Values shouldn't change
Familiarity: Meaningful to users when appropriate
Performance: Efficient for indexing

Alternate Key:

Candidate keys not chosen as primary key become alternate keys. They're still unique and often get unique constraints/indices.

Natural Keys vs. Surrogate Keys:

A major design decision is whether to use natural or surrogate keys:

Natural Key:

Uses real-world attributes
Meaningful to users
Examples: SSN, ISBN, VIN, email address

Pros: Already exists; no generation needed; meaningful in queries Cons: May change; may have format issues; potential privacy concerns

Surrogate Key:

System-generated artificial identifier
No real-world meaning
Examples: auto-increment integer, UUID, GUID

Pros: Stable; compact; simple joins; no business-rule coupling Cons: Requires lookup for meaning; extra attribute to maintain

Industry Recommendation:

Most modern systems prefer surrogate primary keys with natural keys as alternate/unique:

CREATE TABLE Customer (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,  -- surrogate
    email VARCHAR(255) UNIQUE NOT NULL,          -- natural alternate key
    ssn CHAR(11) UNIQUE,                          -- natural alternate key
    name VARCHAR(100) NOT NULL,
    ...
);

This provides stability (surrogate never changes) and business usability (natural keys for queries and constraints).

Key Stability Is Critical

Entity Type Constraints

1. Domain Constraints:

Specify valid values for each attribute:

Data type: salary is DECIMAL(10,2), not TEXT
Range: salary >= 0, temperature between -50 and 150
Format: email matches pattern, phone has 10 digits
Enumeration: status IN ('active', 'inactive', 'pending')

2. Key Constraints:

Enforce uniqueness for key attributes:

Primary key: exactly one per entity, never null, unique
Unique constraints: alternate keys and other unique attributes

3. Not-Null Constraints:

Require values for certain attributes:

Required attributes: name NOT NULL, email NOT NULL
Optional attributes: middle_name (nullable), fax_number (nullable)

4. Entity Integrity Constraint:

The primary key of an entity can never be null. This fundamental rule ensures every entity is identifiable.

5. Business Rule Constraints:

Domain-specific rules beyond basic data validation:

"End date must be after start date"
"Manager must be in the same department"
"Discount cannot exceed 50%"
"At least one phone number required if customer_type is 'premium'"

Constraint Types and Enforcement Mechanisms
Constraint Type	Example	Enforcement Level
Domain (data type)	salary DECIMAL(10,2)	DDL / Column definition
Domain (range)	quantity >= 0	CHECK constraint
Domain (enumeration)	status IN ('A','I','P')	CHECK or ENUM type
Domain (format)	email LIKE '%@%.%'	CHECK with LIKE or regex
Key (primary)	employee_id PRIMARY KEY	PRIMARY KEY constraint
Key (unique)	email UNIQUE	UNIQUE constraint
Not-null	name NOT NULL	NOT NULL constraint
Entity integrity	PK NOT NULL	Implicit in PRIMARY KEY
Business rule (simple)	end_date > start_date	CHECK constraint
Business rule (complex)	Manager same department	Trigger or application logic
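The last two rows of the table contrast rules that a CHECK constraint can express with rules that need a trigger or application logic. The sketch below shows one of each, assuming SQLite through Python's sqlite3 module. The Employee columns, the trigger name, and the helper `add_employee` are invented for the example, and only INSERT is covered; a real schema would need a matching trigger for UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    employee_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER NOT NULL,
    manager_id    INTEGER REFERENCES Employee(employee_id),
    start_date    TEXT NOT NULL,   -- ISO-8601 text, so string comparison orders dates
    end_date      TEXT,
    -- Simple business rule: only looks at columns of the same row.
    CHECK (end_date IS NULL OR end_date > start_date)
);

-- Complex business rule: has to look at a different row, so a trigger is used.
CREATE TRIGGER employee_manager_same_department
BEFORE INSERT ON Employee
WHEN NEW.manager_id IS NOT NULL
BEGIN
    SELECT RAISE(ABORT, 'manager must be in the same department')
    WHERE NOT EXISTS (
        SELECT 1 FROM Employee AS m
        WHERE m.employee_id = NEW.manager_id
          AND m.department_id = NEW.department_id
    );
END;
""")


def add_employee(name, department_id, manager_id=None,
                 start_date="2024-01-01", end_date=None):
    """Return True if the row is stored, False if a rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Employee "
                "(name, department_id, manager_id, start_date, end_date) "
                "VALUES (?, ?, ?, ?, ?)",
                (name, department_id, manager_id, start_date, end_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False


manager_added = add_employee("Alice Chen", department_id=10)          # gets employee_id 1
same_dept = add_employee("Bob", department_id=10, manager_id=1)       # allowed
other_dept = add_employee("Carol", department_id=20, manager_id=1)    # trigger rejects
bad_dates = add_employee("Dave", department_id=10, end_date="2023-12-31")  # CHECK rejects
```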

Constraints in Conceptual vs. Physical Modeling

Entity Type Hierarchies

Entity types can be organized into hierarchies through specialization (top-down) and generalization (bottom-up). This enables inheritance of attributes and more precise domain modeling.

Supertype and Subtype:

A supertype is a general entity type that can be specialized into more specific subtypes:

Person (supertype)
├── Employee (subtype)
├── Customer (subtype)
└── Vendor (subtype)

Subtypes:

Inherit all attributes of the supertype
Add their own local attributes
May have their own relationships
May have their own subtypes (multi-level hierarchies)

Example — Account Hierarchy:

Account (supertype)
├── account_number
├── balance
├── open_date
├── customer_id (FK)
├── CheckingAccount (subtype)
│   ├── overdraft_limit
│   └── check_fee
└── SavingsAccount (subtype)
    ├── interest_rate
    └── min_balance

Every CheckingAccount has account_number, balance, open_date, and customer_id (inherited) plus overdraft_limit and check_fee (local).
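Because entity types resemble classes, the Account hierarchy can be mirrored with Python dataclasses to see which attributes are inherited and which are local. This is only an analogy at the programming level, not a database mapping; the field types and the two helper functions are assumptions chosen for the example.

```python
from dataclasses import dataclass, fields
from datetime import date
from decimal import Decimal


@dataclass
class Account:                      # supertype
    account_number: str
    balance: Decimal
    open_date: date
    customer_id: int


@dataclass
class CheckingAccount(Account):     # subtype: inherits the four Account attributes
    overdraft_limit: Decimal
    check_fee: Decimal


@dataclass
class SavingsAccount(Account):      # subtype
    interest_rate: Decimal
    min_balance: Decimal


def attribute_names(entity_type):
    """All attributes of a type, inherited ones first."""
    return [f.name for f in fields(entity_type)]


def local_attribute_names(subtype, supertype):
    """Attributes the subtype adds on top of its supertype."""
    inherited = set(attribute_names(supertype))
    return [n for n in attribute_names(subtype) if n not in inherited]


checking = CheckingAccount(
    account_number="CHK-1001",
    balance=Decimal("250.00"),
    open_date=date(2024, 1, 15),
    customer_id=42,
    overdraft_limit=Decimal("500.00"),
    check_fee=Decimal("0.25"),
)
```

Here `local_attribute_names(CheckingAccount, Account)` yields only overdraft_limit and check_fee, while `isinstance(checking, Account)` holds because every CheckingAccount is also an Account.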

Specialization vs. Generalization:

Specialization (Top-Down):

Start with general type
Identify subgroups with distinct attributes
Create subtypes for each
"What kinds of Accounts exist?"

Generalization (Bottom-Up):

Start with specific types
Recognize common attributes
Create supertype that captures commonality
"What do Car, Truck, and Motorcycle have in common?"

Both processes result in the same hierarchy structure; they differ in how you arrive at it.

Constraints on Hierarchies:

Disjoint vs. Overlapping:

Disjoint: An entity can belong to at most one subtype
- A vehicle is either a Car OR a Truck, not both
Overlapping: An entity can belong to more than one subtype
- A person can be both Employee AND Customer

Total vs. Partial:

Total: Every supertype instance must belong to at least one subtype
- Every Payment is either Cash, Credit, or Debit
Partial: A supertype instance need not belong to any subtype
- A Person might be neither Employee nor Customer (just a contact)
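The two constraint pairs combine independently, which a small checker makes concrete. The function below is a hypothetical helper written for this illustration: given the set of subtypes one supertype instance belongs to, it reports whether that membership is allowed under the chosen disjoint/overlapping and total/partial settings.

```python
from typing import AbstractSet


def satisfies_hierarchy_constraints(
    memberships: AbstractSet[str], disjoint: bool, total: bool
) -> bool:
    """Check one supertype instance against the hierarchy constraints.

    memberships: names of the subtypes this instance belongs to.
    disjoint:    True for disjoint, False for overlapping.
    total:       True for total, False for partial.
    """
    if disjoint and len(memberships) > 1:
        return False  # disjoint: at most one subtype
    if total and len(memberships) == 0:
        return False  # total: at least one subtype
    return True


# Vehicle hierarchy treated as disjoint: Car and Truck together is invalid.
car_and_truck = satisfies_hierarchy_constraints({"Car", "Truck"}, disjoint=True, total=False)

# Person hierarchy treated as overlapping and partial.
employee_customer = satisfies_hierarchy_constraints(
    {"Employee", "Customer"}, disjoint=False, total=False
)
plain_contact = satisfies_hierarchy_constraints(set(), disjoint=False, total=False)

# Payment hierarchy treated as total: a payment with no subtype is invalid.
untyped_payment = satisfies_hierarchy_constraints(set(), disjoint=True, total=True)
```

Disjoint and total together mean exactly one subtype per instance, which is why a single discriminator attribute is a common way to record it.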

Inheritance in Data Modeling

Entity Type Design Best Practices

Designing entity types well is crucial for creating maintainable, flexible data models. Here are established best practices:

1. One Entity Type, One Concept:

Each entity type should represent exactly one concept from the domain. Don't combine unrelated concepts, and don't split a single concept across multiple types unnecessarily.

Bad: PersonOrOrganization (mixed concepts)
Good: Person and Organization as separate types (possibly with Party supertype)

2. Meaningful Names:

Entity type names should reflect domain vocabulary, not implementation details:

Bad: tbl_cust, record_type_01
Good: Customer, Employee, Order

Use singular nouns (Customer, not Customers). Be specific when needed (SalesOrder vs PurchaseOrder).

3. Complete Attribute Sets:

Identify all relevant attributes during design. Attributes that are missed at this stage are harder to add once data and applications depend on the model.

Consider:

What information describes this entity?
What questions will users ask about it?
What reports will include it?
What calculations need it?

4. Appropriate Key Selection:

Choose keys carefully:

Use surrogate keys for stability
Define natural alternate keys for business queries
Avoid composite keys when simple keys suffice
Never use mutable attributes as keys

Entity Type Design Checklist

• Single responsibility: Type represents one coherent concept
• Meaningful name: Clear, domain-appropriate, singular noun
• Complete attributes: All descriptive information captured
• Defined primary key: Stable, simple, surrogate preferred
• Alternate keys: Natural keys for business use
• Specified constraints: Domain, not-null, and business rules documented
• Appropriate granularity: Not too broad (mixed concepts) or too narrow (forced splits)
• Future extensibility: Room for additional attributes without redesign
• Relationship clarity: Clear role in domain relationships
• Specialization consideration: Subtypes identified if warranted

The 'Describe It' Test

Summary: Understanding Entity Types

Key Takeaways

• Entity types define structure: Categories of entities sharing attributes and semantic meaning
• Strong vs. weak entity types: Strong types are self-identifying; weak types depend on owners
• Attributes have classifications: Simple/composite, single/multi-valued, stored/derived
• Keys establish identity: Candidate keys are minimal superkeys; the primary key is the chosen candidate key
• Natural vs. surrogate keys: Common practice favors a surrogate primary key with natural alternate keys
• Constraints encode rules: Domain, key, not-null, and business rules maintain quality
• Type hierarchies enable inheritance: Specialization and generalization with overlapping/disjoint and total/partial constraints
• Best practices guide design: Single concept, meaningful names, complete attributes, appropriate keys
