In the previous pages, we learned that entities are 'things with independent existence' and that entity sets collect similar entities together. But what defines 'similar'? What determines which attributes an entity possesses? What constraints govern valid entity values?
The answer lies in the entity type—the formal specification that defines the structure, properties, and rules for all entities of a particular kind. If an entity is a specific employee named Alice Chen, and an entity set is the collection of all employees, then the entity type is the definition of what it means to be an Employee: what information we store, what rules must be followed, what makes each employee unique.
Entity types are to entities what classes are to objects in programming, what blueprints are to buildings in architecture, what recipes are to dishes in cooking. They define the template from which actual instances emerge.
This page explores entity types in depth—their structure, their role in data modeling, their constraints, and how they enable powerful features like inheritance and specialization. Mastering entity types is essential for creating data models that are both expressive enough to capture complex domains and constrained enough to maintain data quality.
By the end of this page, you will understand entity types as structural definitions, distinguish entity types from entity sets confidently, specify entity type components (attributes, keys, constraints), understand type hierarchies and inheritance, and apply best practices for entity type design.
An entity type is formally defined as:
A category or class of entities that share a common structure (same attributes) and semantic meaning within the domain.
Let's analyze this definition component by component:
'A category or class' — An entity type represents a classification. Just as biological taxonomy classifies organisms into species, data modeling classifies real-world things into entity types. The Customer entity type represents the classification 'customer' as understood in the business domain.
'That share a common structure' — All entities of a given type have the same set of attributes. Every Customer has name, email, and registration_date. The structure is defined once at the type level and applies to all instances.
'Same attributes' — Attributes are the properties that describe entities. Entity types specify which attributes exist and what values they can hold. This is the schema aspect.
'Semantic meaning within the domain' — Entity types aren't arbitrary groupings. They reflect meaningful classifications in the problem domain. A bank has Customers and Accounts because those concepts are meaningful in banking, not because a programmer thought they'd be convenient.
Components of an Entity Type:
Name — The identifier for the entity type (e.g., Employee, Product, Order)
Attributes — The properties that describe entities of this type
Key — Attribute(s) that uniquely identify each entity instance
Constraints — Rules that valid entities must satisfy
Entity type definitions are powerful precisely because they apply uniformly. Define the Customer type once, and every customer instance automatically has the right structure, obeys the constraints, and can participate in relationships defined for Customers. Changes to the type definition affect all instances consistently.
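As a rough illustration of the four components, the sketch below declares a Customer entity type once and lets the database apply it to every instance. It uses Python's built-in sqlite3 module as a stand-in for a real DBMS; the helper names (build_customer_db, try_insert) and the sample values are assumptions made for this example, not part of the original model.

```python
import sqlite3

# Hypothetical Customer entity type: name, attributes, key, and constraints
# are declared once, at the type level.
CUSTOMER_TYPE = """
CREATE TABLE Customer (                       -- name
    customer_id       INTEGER PRIMARY KEY,    -- key
    name              TEXT NOT NULL,          -- attribute + constraint
    email             TEXT NOT NULL UNIQUE,   -- attribute + constraint
    registration_date TEXT NOT NULL           -- attribute (ISO-8601 date)
)
"""

def build_customer_db():
    """Create an in-memory database containing only the Customer type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(CUSTOMER_TYPE)
    return conn

def try_insert(conn, name, email, registration_date):
    """Return True if the candidate entity satisfies the type definition."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Customer (name, email, registration_date) "
                "VALUES (?, ?, ?)",
                (name, email, registration_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Every call to try_insert is checked against the same definition, so a second customer with an already-used email or a missing name is rejected without any per-instance logic.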
Not all entity types are created equal. A fundamental distinction exists between strong (or regular) entity types and weak entity types, based on how entities are identified.
Strong Entity Types:
A strong entity type can identify its instances using only its own attributes. It is independent—it doesn't rely on other entity types for its key.
Characteristics:
Examples:
Weak Entity Types:
A weak entity type cannot identify its instances using only its own attributes. It depends on an owning (or identifying) entity type for part of its identification.
Characteristics:
Classic Weak Entity Example:
Building and Room:
Room 101 in the Main Building and Room 101 in the Tower are different rooms. The room_number alone (partial key) isn't unique globally—it's unique only within a building. Full identification requires:
(building_id, room_number) → ('Main', '101')
(building_id, room_number) → ('Tower', '101')
Employee and Dependent:
An employee's dependents (spouse, children) might share names with other employees' dependents. 'John Smith' as a dependent is meaningful only in the context of a specific employee.
(employee_id, dependent_name) → ('E101', 'John Smith')
(employee_id, dependent_name) → ('E205', 'John Smith') -- different person
Weak entities have existence dependency on their owner. If an employee is deleted, their dependents no longer exist. If a building is demolished, its rooms don't exist independently. This cascading behavior must be carefully implemented in the database, typically through foreign key constraints declared with ON DELETE CASCADE.
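The Building and Room example can be sketched with a composite primary key and a cascading foreign key. This is a minimal sketch using Python's sqlite3 module; the capacity column and the function names are assumptions added for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

def build_campus_db():
    """Building is the owner type; Room is the weak type it identifies."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE Building (
            building_id TEXT PRIMARY KEY NOT NULL
        );
        CREATE TABLE Room (
            building_id TEXT NOT NULL
                REFERENCES Building(building_id) ON DELETE CASCADE,
            room_number TEXT NOT NULL,       -- partial key
            capacity    INTEGER,             -- hypothetical attribute
            PRIMARY KEY (building_id, room_number)
        );
        INSERT INTO Building VALUES ('Main'), ('Tower');
        INSERT INTO Room VALUES ('Main', '101', 30), ('Tower', '101', 12);
    """)
    return conn

def rooms(conn):
    """List every room by its full (owner key, partial key) identifier."""
    return conn.execute(
        "SELECT building_id, room_number FROM Room ORDER BY building_id"
    ).fetchall()

def demolish(conn, building_id):
    """Delete the owner; its rooms are removed by the cascade."""
    with conn:
        conn.execute(
            "DELETE FROM Building WHERE building_id = ?", (building_id,)
        )
```

Both rooms numbered '101' coexist because the key is the pair, and calling demolish(conn, 'Main') leaves only ('Tower', '101') behind.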
Attributes are the descriptive properties defined within an entity type. The attribute structure of an entity type determines what information can be stored and how it can be queried. Let's examine attribute classifications in detail.
1. Simple (Atomic) Attributes:
Attributes that cannot be meaningfully subdivided. They hold single, indivisible values.
Simple attributes map directly to table columns with appropriate data types.
2. Composite Attributes:
Attributes that can be divided into smaller, meaningful sub-parts. They represent structured values.
Example — Full Address:
address:
├── street: '123 Main Street'
├── city: 'San Francisco'
├── state: 'CA'
├── postal_code: '94102'
└── country: 'USA'
Example — Full Name:
name:
├── first_name: 'Alice'
├── middle_name: 'Marie'
└── last_name: 'Chen'
Composite attributes can be flattened into multiple simple columns or, in some databases, stored as structured types.
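The flattening option might look like the following sketch, where each sub-part of name and address becomes its own column. It uses Python's sqlite3 module; the Person table, its person_id key, and the people_in_city helper are assumptions introduced for the example.

```python
import sqlite3

def build_person_db():
    """Flatten the composite name and address attributes into columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Person (
            person_id   INTEGER PRIMARY KEY,
            first_name  TEXT NOT NULL,   -- name.first_name
            middle_name TEXT,            -- name.middle_name (optional)
            last_name   TEXT NOT NULL,   -- name.last_name
            street      TEXT,            -- address.street
            city        TEXT,            -- address.city
            state       TEXT,            -- address.state
            postal_code TEXT,            -- address.postal_code
            country     TEXT             -- address.country
        )
    """)
    conn.execute(
        "INSERT INTO Person (first_name, middle_name, last_name, street, "
        "city, state, postal_code, country) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        ("Alice", "Marie", "Chen", "123 Main Street",
         "San Francisco", "CA", "94102", "USA"),
    )
    return conn

def people_in_city(conn, city):
    """Query on one sub-part of the composite address attribute."""
    return conn.execute(
        "SELECT first_name, last_name FROM Person "
        "WHERE city = ? ORDER BY person_id",
        (city,),
    ).fetchall()
```

The benefit of flattening is that each sub-part can be filtered or indexed on its own; the cost is that the grouping into "address" now lives only in naming conventions and documentation.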
3. Single-Valued vs. Multivalued Attributes:
Single-valued: Hold exactly one value per entity.
Multivalued: Can hold multiple values per entity.
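Multivalued attributes create modeling challenges. In relational implementation, they typically become a separate table keyed by the owning entity's key plus the value, or an array-typed column where the database supports one.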
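One common relational treatment is a separate table whose key combines the owner's key with the value itself. The sketch below applies that to a phone_numbers attribute using Python's sqlite3 module; the EmployeePhone table name, the sample numbers, and the helper functions are assumptions for illustration.

```python
import sqlite3

def build_contact_db():
    """Store the multivalued phone_numbers attribute in its own table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Employee (
            employee_id TEXT PRIMARY KEY NOT NULL,
            name        TEXT NOT NULL
        );
        CREATE TABLE EmployeePhone (
            employee_id  TEXT NOT NULL
                REFERENCES Employee(employee_id) ON DELETE CASCADE,
            phone_number TEXT NOT NULL,
            PRIMARY KEY (employee_id, phone_number)  -- one row per value
        );
        INSERT INTO Employee VALUES ('E101', 'Alice Chen');
        INSERT INTO EmployeePhone VALUES
            ('E101', '555-0100'),
            ('E101', '555-0199');
    """)
    return conn

def phone_numbers(conn, employee_id):
    """Reassemble the multivalued attribute for one entity."""
    rows = conn.execute(
        "SELECT phone_number FROM EmployeePhone "
        "WHERE employee_id = ? ORDER BY phone_number",
        (employee_id,),
    )
    return [row[0] for row in rows]
```

Here phone_numbers(conn, 'E101') returns both numbers as a list, while an employee with no rows in EmployeePhone simply yields an empty list.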
4. Stored vs. Derived Attributes:
Stored: Captured and persisted in the database.
Derived: Computed from other attributes on demand.
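Derived attributes may be computed at query time or materialized, that is, stored and refreshed whenever the attributes they depend on change.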
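The compute-at-query-time option can be sketched as follows: only hire_date is stored, and tenure in whole years is derived when asked for. This uses Python's sqlite3 module; the table contents and the tenure_years function are assumptions for the example, and the reference date is passed in explicitly so the result is reproducible.

```python
import sqlite3

def build_tenure_db():
    """Only the stored attribute hire_date is persisted."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employee (
            employee_id TEXT PRIMARY KEY NOT NULL,
            name        TEXT NOT NULL,
            hire_date   TEXT NOT NULL      -- stored, ISO-8601
        );
        INSERT INTO Employee VALUES
            ('E101', 'Alice Chen', '2015-03-01'),
            ('E205', 'Bob Diaz',   '2023-09-01');
    """)
    return conn

def tenure_years(conn, as_of):
    """Derive whole years of tenure as of the given ISO date."""
    return conn.execute(
        """
        SELECT name,
               CAST(strftime('%Y', :as_of) AS INTEGER)
                 - CAST(strftime('%Y', hire_date) AS INTEGER)
                 -- subtract 1 if the anniversary has not yet occurred
                 - (strftime('%m-%d', :as_of) < strftime('%m-%d', hire_date))
               AS tenure_years
        FROM Employee
        ORDER BY name
        """,
        {"as_of": as_of},
    ).fetchall()
```

Because tenure is never stored, it cannot drift out of sync with hire_date; the trade-off is that it is recomputed on every query.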
| Classification | Examples | Implementation Considerations |
|---|---|---|
| Simple/Atomic | id, name, price, date | Direct column mapping |
| Composite | address, full_name | Flatten to columns or use structured types |
| Single-valued | birth_date, ssn, primary_email | Standard column |
| Multivalued | phone_numbers, skills, hobbies | Separate table or array type |
| Stored | name, hire_date, salary | Persisted in database |
| Derived | age, tenure, full_name | Computed or materialized |
| Key attribute | employee_id, ssn, email | Primary/unique constraint; indexed |
| Nullable | middle_name, fax_number | NULL allowed; optional |
| Required | name, email, created_at | NOT NULL constraint; mandatory |
Every attribute has a domain—the set of permissible values. 'salary' has a numeric domain (perhaps Decimal(10,2) with min 0). 'status' might have an enumeration domain {'active', 'inactive', 'pending'}. 'email' has a string domain with format constraints. Proper domain specification prevents invalid data.
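Domains like these can be approximated with CHECK constraints. The following is a minimal sketch in Python's sqlite3 module; the table layout and the in_domain helper are assumptions for illustration. SQLite is loosely typed and does not enforce a precision such as Decimal(10,2), so only the range, enumeration, and a crude format pattern are checked here.

```python
import sqlite3

def build_hr_db():
    """Declare each attribute's domain as part of the entity type."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Employee (
            employee_id INTEGER PRIMARY KEY,
            -- format domain: something@something.something
            email  TEXT    NOT NULL CHECK (email LIKE '%_@_%._%'),
            -- numeric domain with a minimum of 0
            salary NUMERIC NOT NULL CHECK (salary >= 0),
            -- enumeration domain
            status TEXT    NOT NULL
                   CHECK (status IN ('active', 'inactive', 'pending'))
        )
    """)
    return conn

def in_domain(conn, email, salary, status):
    """Return True if all three values fall inside their domains."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Employee (email, salary, status) "
                "VALUES (?, ?, ?)",
                (email, salary, status),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A negative salary, a status outside the enumeration, or an email without an '@' are each rejected at insert time rather than discovered later in application code.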
Every entity type must have a mechanism for uniquely identifying each entity instance. This is accomplished through key attributes—attributes whose values distinguish one entity from another within the entity set.
Superkey:
A superkey is any set of attributes that, taken together, uniquely identifies each entity. A superkey may contain more attributes than necessary.
For Employee:
Candidate Key:
A candidate key is a minimal superkey: no proper subset of it is also a superkey. Removing any attribute leaves a set that no longer guarantees uniqueness.
For Employee:
Primary Key:
From among candidate keys, one is selected as the primary key. This becomes the official identification mechanism used in relationships.
Selection criteria:
Alternate Key:
Candidate keys not chosen as primary key become alternate keys. They're still unique and often get unique constraints/indices.
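The superkey and candidate key definitions can be made concrete with a small check over sample data. The sketch below is plain Python; the sample rows and function names are assumptions for illustration. One important caveat: a key is a rule about every possible instance of the type, so a sample can only refute a proposed key, never prove one.

```python
from itertools import combinations

# Hypothetical sample of the Employee entity set.
EMPLOYEES = [
    {"employee_id": "E101", "ssn": "111-11-1111",
     "name": "Alice Chen", "department": "Sales"},
    {"employee_id": "E205", "ssn": "222-22-2222",
     "name": "John Smith", "department": "Sales"},
    {"employee_id": "E310", "ssn": "333-33-3333",
     "name": "John Smith", "department": "IT"},
    {"employee_id": "E412", "ssn": "444-44-4444",
     "name": "John Smith", "department": "IT"},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal superkeys, judged against the sample rows only."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(key) <= set(attrs) for key in found):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found
```

On this sample, ("employee_id", "name") is a superkey but not minimal, ("name", "department") is not a superkey at all, and the candidate keys are ("employee_id",) and ("ssn",). Picking one of those as primary leaves the other as an alternate key.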
Natural Keys vs. Surrogate Keys:
A major design decision is whether to use natural or surrogate keys:
Natural Key:
Pros: Already exists; no generation needed; meaningful in queries
Cons: May change; may have format issues; potential privacy concerns
Surrogate Key:
Pros: Stable; compact; simple joins; no business-rule coupling
Cons: Requires lookup for meaning; extra attribute to maintain
Industry Recommendation:
A common recommendation is to use a surrogate primary key and declare the natural keys as alternate (unique) keys:
CREATE TABLE Customer (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,  -- surrogate
    email       VARCHAR(255) UNIQUE NOT NULL,    -- natural alternate key
    ssn         CHAR(11) UNIQUE,                 -- natural alternate key
    name        VARCHAR(100) NOT NULL,
    ...
);
This provides stability (surrogate never changes) and business usability (natural keys for queries and constraints).
Primary keys propagate through foreign keys to other tables. If a primary key changes, all referencing foreign keys must also change—a risky, expensive operation. This is why surrogate keys (which never have business reasons to change) are preferred. Choose keys wisely; changing them later is painful.
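To see why this matters, the sketch below changes a customer's email (a natural alternate key) while orders keep referencing the unchanged surrogate key. It uses Python's sqlite3 module, where INTEGER PRIMARY KEY plays the role of an auto-generated surrogate; the CustomerOrder table, the sample data, and the helper names are assumptions for illustration.

```python
import sqlite3

def build_shop_db():
    """Orders reference the surrogate key, never the email."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Customer (
            customer_id INTEGER PRIMARY KEY,     -- surrogate
            email       TEXT NOT NULL UNIQUE,    -- natural alternate key
            name        TEXT NOT NULL
        );
        CREATE TABLE CustomerOrder (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                REFERENCES Customer(customer_id),
            total       NUMERIC NOT NULL
        );
        INSERT INTO Customer (customer_id, email, name)
            VALUES (1, 'alice@old.example', 'Alice Chen');
        INSERT INTO CustomerOrder (customer_id, total)
            VALUES (1, 25), (1, 40);
    """)
    return conn

def change_email(conn, customer_id, new_email):
    """Update the natural key; no referencing row has to change."""
    with conn:
        conn.execute(
            "UPDATE Customer SET email = ? WHERE customer_id = ?",
            (new_email, customer_id),
        )

def order_count_by_email(conn, email):
    """Look up orders through the natural key via a join."""
    return conn.execute(
        "SELECT COUNT(*) FROM CustomerOrder o "
        "JOIN Customer c ON c.customer_id = o.customer_id "
        "WHERE c.email = ?",
        (email,),
    ).fetchone()[0]
```

After change_email(conn, 1, 'alice@new.example'), both orders are found under the new address and none under the old one, and not a single CustomerOrder row was touched. Had email been the primary key, every referencing row would have needed the same update.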
Beyond structural definitions (attributes and keys), entity types include constraints—rules that valid entity instances must satisfy. Constraints encode business rules and maintain data quality.
1. Domain Constraints:
Specify valid values for each attribute:
2. Key Constraints:
Enforce uniqueness for key attributes:
3. Not-Null Constraints:
Require values for certain attributes:
4. Entity Integrity Constraint:
The primary key of an entity can never be null. This fundamental rule ensures every entity is identifiable.
5. Business Rule Constraints:
Domain-specific rules beyond basic data validation:
| Constraint Type | Example | Enforcement Level |
|---|---|---|
| Domain (data type) | salary DECIMAL(10,2) | DDL / Column definition |
| Domain (range) | quantity >= 0 | CHECK constraint |
| Domain (enumeration) | status IN ('A','I','P') | CHECK or ENUM type |
| Domain (format) | email LIKE '%@%.%' | CHECK with LIKE or regex |
| Key (primary) | employee_id PRIMARY KEY | PRIMARY KEY constraint |
| Key (unique) | email UNIQUE | UNIQUE constraint |
| Not-null | name NOT NULL | NOT NULL constraint |
| Entity integrity | PK NOT NULL | Implicit in PRIMARY KEY |
| Business rule (simple) | end_date > start_date | CHECK constraint |
| Business rule (complex) | Manager same department | Trigger or application logic |
In conceptual models (ER diagrams), we note constraints informally—text annotations like 'unique' or 'required'. In logical/physical design, these become formal DDL: PRIMARY KEY, UNIQUE, NOT NULL, CHECK. The conceptual model captures the rules; the physical model enforces them.
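The last two rows of the table can be sketched side by side: a simple business rule as a CHECK constraint and a cross-row rule as a trigger. This is a minimal sketch in Python's sqlite3 module; the column layout, the trigger name, and the add_employee helper are assumptions for illustration, and other databases use different trigger syntax. SQLite also needs NOT NULL spelled out on a non-INTEGER primary key to enforce entity integrity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Employee (
    employee_id TEXT PRIMARY KEY NOT NULL,   -- entity integrity
    name        TEXT NOT NULL,
    department  TEXT NOT NULL,
    manager_id  TEXT REFERENCES Employee(employee_id),
    start_date  TEXT NOT NULL,
    end_date    TEXT,
    -- simple business rule: an end date must follow the start date
    CHECK (end_date IS NULL OR end_date > start_date)
);
-- complex business rule: a manager must be in the same department
CREATE TRIGGER employee_manager_same_department
BEFORE INSERT ON Employee
FOR EACH ROW
WHEN NEW.manager_id IS NOT NULL
BEGIN
    SELECT CASE
        WHEN (SELECT department FROM Employee
              WHERE employee_id = NEW.manager_id) IS NOT NEW.department
        THEN RAISE(ABORT, 'manager must belong to the same department')
    END;
END;
"""

def build_rules_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def add_employee(conn, employee_id, name, department,
                 manager_id, start_date, end_date=None):
    """Return True if the new entity satisfies every declared rule."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Employee VALUES (?, ?, ?, ?, ?, ?)",
                (employee_id, name, department,
                 manager_id, start_date, end_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The trigger here only guards inserts; a fuller version would also cover updates to department and manager_id, which is part of why complex rules are often pushed into application logic instead.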
Entity types can be organized into hierarchies through specialization (top-down) and generalization (bottom-up). This enables inheritance of attributes and more precise domain modeling.
Supertype and Subtype:
A supertype is a general entity type that can be specialized into more specific subtypes:
Person (supertype)
├── Employee (subtype)
├── Customer (subtype)
└── Vendor (subtype)
Subtypes:
Example — Account Hierarchy:
Account (supertype)
├── account_number
├── balance
├── open_date
└── customer_id (FK)
├── CheckingAccount (subtype)
│ ├── overdraft_limit
│ └── check_fee
└── SavingsAccount (subtype)
├── interest_rate
└── min_balance
Every CheckingAccount has account_number, balance, open_date (inherited) PLUS overdraft_limit, check_fee (local).
Specialization vs. Generalization:
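Specialization (Top-Down): Start from a general entity type and identify subtypes that carry additional attributes or relationships.
Generalization (Bottom-Up): Start from several specific entity types and factor their shared attributes into a common supertype.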
Both processes result in the same hierarchy structure; they differ in how you arrive at it.
Constraints on Hierarchies:
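Disjoint vs. Overlapping: Under a disjoint constraint, an entity belongs to at most one subtype; under an overlapping constraint, it may belong to several at once.
Total vs. Partial: Under total participation, every supertype entity must belong to at least one subtype; under partial participation, some may belong to none.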
Entity type inheritance resembles class inheritance in OOP. Subtypes inherit attributes like subclasses inherit fields. However, standard relational tables have no built-in notion of inheritance, so the hierarchy must be mapped to tables explicitly. Choosing a mapping (single table, class table, or concrete table) is a major design decision covered in ER-to-Relational mapping.
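As one possible illustration, the Account hierarchy can be mapped with the class table strategy: one table for the supertype and one per subtype, each subtype sharing the supertype's key. The sketch uses Python's sqlite3 module; the sample rows and the checking_account helper are assumptions, and the customer_id foreign key is omitted to keep the example short.

```python
import sqlite3

def build_bank_db():
    """Class table mapping: subtype rows share the supertype's key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Account (                       -- supertype
            account_number TEXT PRIMARY KEY NOT NULL,
            balance        NUMERIC NOT NULL,
            open_date      TEXT NOT NULL
        );
        CREATE TABLE CheckingAccount (               -- subtype
            account_number  TEXT PRIMARY KEY NOT NULL
                REFERENCES Account(account_number) ON DELETE CASCADE,
            overdraft_limit NUMERIC NOT NULL,
            check_fee       NUMERIC NOT NULL
        );
        CREATE TABLE SavingsAccount (                -- subtype
            account_number TEXT PRIMARY KEY NOT NULL
                REFERENCES Account(account_number) ON DELETE CASCADE,
            interest_rate  NUMERIC NOT NULL,
            min_balance    NUMERIC NOT NULL
        );
        INSERT INTO Account VALUES
            ('A-1', 1200, '2023-04-01'),
            ('A-2', 5000, '2022-11-15');
        INSERT INTO CheckingAccount VALUES ('A-1', 500, 1);
        INSERT INTO SavingsAccount  VALUES ('A-2', 0.03, 100);
    """)
    return conn

def checking_account(conn, account_number):
    """Return inherited plus local attributes, or None if not checking."""
    return conn.execute(
        "SELECT a.account_number, a.balance, a.open_date, "
        "       c.overdraft_limit, c.check_fee "
        "FROM Account a "
        "JOIN CheckingAccount c ON c.account_number = a.account_number "
        "WHERE a.account_number = ?",
        (account_number,),
    ).fetchone()
```

The join is what "inheritance" amounts to in this mapping: a checking account's full attribute set is its Account row combined with its CheckingAccount row. Nothing in this schema prevents the same account_number from appearing in both subtype tables, so a disjoint constraint would need extra enforcement.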
Designing entity types well is crucial for creating maintainable, flexible data models. Here are established best practices:
1. One Entity Type, One Concept:
Each entity type should represent exactly one concept from the domain. Don't combine unrelated concepts, and don't split a single concept across multiple types unnecessarily.
Bad: PersonOrOrganization (mixed concepts)
Good: Person and Organization as separate types (possibly with Party supertype)
2. Meaningful Names:
Entity type names should reflect domain vocabulary, not implementation details:
Bad: tbl_cust, record_type_01
Good: Customer, Employee, Order
Use singular nouns (Customer, not Customers). Be specific when needed (SalesOrder vs PurchaseOrder).
3. Complete Attribute Sets:
Identify all relevant attributes during design. Missing attributes create problems later:
Consider:
4. Appropriate Key Selection:
Choose keys carefully:
Can you describe a typical entity of this type in a natural sentence? 'A Customer is a person or organization that purchases products from us, identified by customer_id, described by name, email, and registration_date, with constraints that email must be unique and name is required.' If the description is awkward, the type design might need work.
We've explored entity types—the structural blueprints that define what entities are, what attributes they possess, and what rules govern their instances. Entity types are central to ER modeling, enabling precise, enforceable data definitions.
What's next:
Now that we understand entity types as structural definitions, we'll examine entity instances—the actual entities that exist in the database at runtime. Understanding the distinction between types and instances deepens our grasp of how abstract definitions translate to concrete data.
You now understand entity types as the structural blueprints of data modeling—defining attributes, keys, and constraints for all entities of a kind. This knowledge enables you to create precise, maintainable data models. Next, we'll explore entity instances and how types manifest as actual data.