Entity Mapping - Learning Module

Regular Entity Mapping

From Conceptual Vision to Physical Reality

The Entity-Relationship diagram represents the conceptual blueprint of your database—a visual articulation of the real-world domain you're modeling. But an ER diagram, no matter how elegant, cannot store a single byte of data. To bridge the gap between conceptual design and operational database, we must master entity mapping: the systematic transformation of ER entities into relational tables.

This transformation is not merely a mechanical translation. It requires deep understanding of both the ER model's semantics and the relational model's capabilities and constraints. A poorly executed mapping can introduce anomalies, violate normalization principles, and create maintenance nightmares. Conversely, a well-executed mapping preserves all semantic information while creating efficient, maintainable database structures.

Regular entities—also called strong entities—form the backbone of any ER diagram. They exist independently, have their own identifying attributes, and serve as the foundation upon which relationships and weak entities depend. Mastering regular entity mapping is therefore the essential first step in the ER-to-relational transformation process.

What You Will Learn

By the end of this page, you will understand the complete theory and practice of regular entity mapping. You'll learn the formal algorithm, common patterns, edge cases, and how mapping decisions affect database quality. You'll be equipped to transform any regular entity from an ER diagram into a properly structured relational table.

Understanding Regular Entities

Before we can map entities to relations, we must precisely define what constitutes a regular entity (synonymously called a strong entity). This distinction is fundamental because different entity types require different mapping strategies.

Definition: A regular entity is an entity type that has a key attribute capable of uniquely identifying each entity instance. Regular entities do not depend on other entities for their identification: they exist independently and are identified by their own attributes.

Contrast with Weak Entities: Weak entities lack a complete key attribute. They depend on an owner entity (or identifying entity) for their identification. For example, a Room entity might be identified by its room number only in combination with the Building it belongs to. Weak entities require a different mapping strategy, covered in a later module.

Characteristics of Regular Entities:

Defining Characteristics

•Self-Identifying: Possesses one or more key attributes that uniquely identify each instance without reference to other entities
•Existentially Independent: Can exist in the database without requiring another entity to exist first
•Represented by Rectangle: In standard ER notation, depicted as a single-bordered rectangle (weak entities use double borders)
•Foundation for Relationships: Serves as a participant in relationships with other entities
•Contains All Attribute Types: May include simple, composite, multivalued, and derived attributes

Entity vs. Entity Type

Terminology matters. An entity type (or entity set) is the schema—the definition. An entity (or entity instance) is a specific occurrence. When we map entities to relations, we're mapping entity types to relation schemas. The actual entity instances become tuples (rows) in the resulting relation (table).
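
The distinction can be sketched with Python's built-in sqlite3 module and an in-memory database. The product table and its two sample rows are illustrative assumptions, not part of the module's running example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The entity TYPE becomes a relation schema (the table definition).
conn.execute("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       VARCHAR(100) NOT NULL,
        category   VARCHAR(50)
    )
""")

# Each entity INSTANCE becomes one tuple (row) of that relation.
conn.executemany(
    "INSERT INTO product (product_id, name, category) VALUES (?, ?, ?)",
    [(1, "Keyboard", "Peripherals"), (2, "Monitor", "Displays")],
)

# Column names describe the type; the row count describes the instances.
schema_columns = [row[1] for row in conn.execute("PRAGMA table_info(product)")]
instance_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(schema_columns, instance_count)
```

The schema is defined once, while the number of rows grows and shrinks as entity instances are added and removed.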

Examples of Regular Entities in Common Domains:

Domain	Entity Type	Key Attribute	Sample Attributes
University	Student	student_id	name, email, enrollment_date
E-commerce	Product	product_id	name, price, category
Healthcare	Patient	patient_ssn	name, dob, address
Banking	Account	account_number	balance, type, open_date
HR	Employee	employee_id	name, hire_date, department

Each of these entities is identifiable by its own attribute(s) and doesn't require another entity to establish its identity. This independence is what makes them "strong" or "regular" entities.

The Mapping Algorithm

The transformation of a regular entity into a relation follows a precise algorithm. While the basic case is straightforward, the algorithm must account for various attribute types and edge cases. Here is the comprehensive, step-by-step procedure:

Regular Entity Mapping Algorithm

•Create a Relation: For each regular entity type E, create a new relation R. The relation name typically matches the entity name, though naming conventions may be applied (e.g., pluralization, case style).
•Map Simple Attributes: Include each simple (atomic) attribute of E as a column in R. The attribute name becomes the column name, and an appropriate SQL data type is assigned based on the attribute's domain.
•Flatten Composite Attributes: For each composite attribute, include only the simple component attributes as columns. The composite attribute itself does not become a column (though it may influence naming).
•Defer Multivalued Attributes: Multivalued attributes cannot be directly represented in a relation (violates 1NF). These require separate handling—typically a new relation. (Covered in detail on Page 3.)
•Document Derived Attributes: Derived attributes (those computable from other attributes) are typically not stored. Instead, they're calculated at query time or implemented as computed columns. (Covered on Page 4.)
•Designate Primary Key: Choose one candidate key as the primary key of R. The key attribute(s) from the ER diagram become the PRIMARY KEY constraint in the relation.
•Define Domains/Types: Specify appropriate data types for each column based on the attribute's semantic domain (e.g., INTEGER, VARCHAR, DATE, DECIMAL).
•Add Constraints: Include NOT NULL constraints for mandatory attributes, UNIQUE constraints for alternate keys, and CHECK constraints for domain restrictions.

Why Order Matters

The algorithm processes attribute types in a specific order for good reason. Simple attributes are handled first because they map directly. Composite attributes are flattened before we consider storage. Multivalued and derived attributes are deferred because they require special handling that may involve creating additional tables. Following this order prevents confusion and ensures complete coverage.

Formal Notation:

Given an entity type E with:

Key attribute(s): K
Simple attributes: A₁, A₂, ..., Aₙ
Composite attributes: C₁(c₁₁, c₁₂, ...), C₂(c₂₁, c₂₂, ...), ...

The resulting relation R is:

R(K, A₁, A₂, ..., Aₙ, c₁₁, c₁₂, ..., c₂₁, c₂₂, ...)

with PRIMARY KEY(K)

Note that composite attributes contribute their component attributes, not themselves.
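
As a rough sketch, the formal rule can be written as a small Python function. The EntityType class and the map_regular_entity function are hypothetical helpers invented for this illustration, and the attribute names are sample values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EntityType:
    """Hypothetical in-memory description of a regular entity type."""
    name: str
    key: List[str]
    simple: List[str] = field(default_factory=list)
    composite: Dict[str, List[str]] = field(default_factory=dict)
    multivalued: List[str] = field(default_factory=list)
    derived: List[str] = field(default_factory=list)


def map_regular_entity(entity: EntityType) -> dict:
    """Build R(K, A1..An, c11, c12, ...) with PRIMARY KEY(K) for entity E."""
    columns = list(entity.key) + list(entity.simple)
    # Composite attributes contribute their components, not themselves.
    for components in entity.composite.values():
        columns.extend(components)
    return {
        "relation": entity.name,
        "columns": columns,
        "primary_key": list(entity.key),
        # Multivalued and derived attributes are deferred, not stored in R.
        "deferred": list(entity.multivalued) + list(entity.derived),
    }


student = EntityType(
    name="Student",
    key=["student_id"],
    simple=["first_name", "last_name", "email"],
    composite={"address": ["street", "city", "zip_code"]},
    multivalued=["phone_numbers"],
    derived=["age"],
)
schema = map_regular_entity(student)
print(schema["columns"])
```

The composite name address never appears among the columns; only street, city, and zip_code do, which mirrors the notation above.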

Detailed Mapping Example

Let's work through a comprehensive example that illustrates each step of the mapping algorithm. Consider a university database with the following Student entity:

ER Diagram Description:

Entity: Student (regular entity, single-bordered rectangle)
Key Attribute: student_id (underlined in ER diagram)
Simple Attributes: first_name, last_name, email, date_of_birth, enrollment_date
Composite Attribute: address (comprised of street, city, state, zip_code, country)
Multivalued Attribute: phone_numbers (can have multiple)
Derived Attribute: age (derived from date_of_birth)

Step-by-Step Mapping:

Step 1: Create Relation

Create a new relation named Student (or Students if following pluralization convention).

Step 2: Map Simple Attributes

Include the simple attributes as columns:

student_id → VARCHAR or CHAR (key attribute)
first_name → VARCHAR
last_name → VARCHAR
email → VARCHAR
date_of_birth → DATE
enrollment_date → DATE

Step 3: Flatten Composite Attributes

The address composite attribute is decomposed into its components:

street → VARCHAR
city → VARCHAR
state → VARCHAR or CHAR(2)
zip_code → VARCHAR or CHAR
country → VARCHAR

Step 4: Defer Multivalued Attribute

The phone_numbers attribute cannot be included directly. A separate table Student_Phone will be created (covered in the multivalued attributes page).
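
A minimal preview of what such a table might look like, run against an in-memory SQLite database from Python. The column names of Student_Phone, the composite primary key, and the sample data are assumptions made for this sketch. The full treatment belongs to the multivalued attributes page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
    CREATE TABLE Student (
        student_id VARCHAR(20) PRIMARY KEY,
        last_name  VARCHAR(50) NOT NULL
    );

    -- One row per (student, phone number) pair instead of a list-valued column.
    CREATE TABLE Student_Phone (
        student_id   VARCHAR(20) NOT NULL REFERENCES Student(student_id),
        phone_number VARCHAR(20) NOT NULL,
        PRIMARY KEY (student_id, phone_number)
    );

    INSERT INTO Student VALUES ('S001', 'Okafor');
    INSERT INTO Student_Phone VALUES ('S001', '+1-555-0100');
    INSERT INTO Student_Phone VALUES ('S001', '+1-555-0101');
""")

phones = [row[0] for row in conn.execute(
    "SELECT phone_number FROM Student_Phone "
    "WHERE student_id = ? ORDER BY phone_number",
    ("S001",),
)]
print(phones)
```

Each phone number stays atomic, so it can be indexed, constrained, and queried on its own.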

Step 5: Document Derived Attribute

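The age attribute is not stored. It is computed as needed as the number of completed years between date_of_birth and CURRENT_DATE. Note that a plain CURRENT_DATE - date_of_birth subtraction yields a number of days or an interval (depending on the DBMS), not an age in years.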
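
The derivation itself is simple arithmetic. The sketch below shows one way to compute completed years in application code. The function name age_in_years is an assumption for illustration, and the dates are made-up sample values.

```python
from datetime import date


def age_in_years(date_of_birth: date, today: date) -> int:
    """Completed years between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


example_age = age_in_years(date(2004, 9, 15), date(2025, 3, 1))
print(example_age)  # 20: the 21st birthday falls later in 2025
```

Passing today as a parameter, rather than reading the clock inside the function, keeps the calculation deterministic and easy to test.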

Step 6: Designate Primary Key

student_id becomes the PRIMARY KEY.

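Steps 7 and 8: Define Types and Constraints

Assign a concrete SQL type to every column, then add the NOT NULL, UNIQUE, DEFAULT, and CHECK constraints that the attribute semantics require.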

student_table.sql
-- Regular Entity Mapping: Student
-- Derived from ER diagram following the standard mapping algorithm
 
CREATE TABLE Student (
    -- Primary Key (from ER key attribute)
    student_id      VARCHAR(20)     PRIMARY KEY,
    
    -- Simple Attributes
    first_name      VARCHAR(50)     NOT NULL,
    last_name       VARCHAR(50)     NOT NULL,
    email           VARCHAR(100)    NOT NULL UNIQUE,
    date_of_birth   DATE            NOT NULL,
    enrollment_date DATE            NOT NULL DEFAULT CURRENT_DATE,
    
    -- Flattened Composite Attribute: address
    street          VARCHAR(100),
    city            VARCHAR(50),
    state           VARCHAR(50),
    zip_code        VARCHAR(20),
    country         VARCHAR(50)     DEFAULT 'USA',
    
    -- Constraints
    CONSTRAINT chk_enrollment_after_birth 
        CHECK (enrollment_date > date_of_birth),
    CONSTRAINT chk_valid_email 
        CHECK (email LIKE '%@%.%')
);
 
-- Note: Multivalued attribute 'phone_numbers' requires separate table
-- Note: Derived attribute 'age' is computed, not stored
 
-- Index for common queries
CREATE INDEX idx_student_name ON Student(last_name, first_name);
CREATE INDEX idx_student_enrollment ON Student(enrollment_date);
 
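-- Computed column for derived attribute (SQL Server syntax)
-- Caveat: DATEDIFF(YEAR, ...) counts calendar-year boundaries crossed, so it
-- overstates age until the birthday has passed in the current year.
-- ALTER TABLE Student ADD age AS DATEDIFF(YEAR, date_of_birth, GETDATE());
 
-- Or as a VIEW (PostgreSQL syntax; AGE() is not available in most other systems)
CREATE VIEW Student_With_Age AS
SELECT 
    s.*,
    EXTRACT(YEAR FROM AGE(CURRENT_DATE, s.date_of_birth)) AS age
FROM Student AS s;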

Mapping Complete

The Student entity has been successfully mapped to a relational table. Note how the composite 'address' attribute was flattened into individual columns, and how derived and multivalued attributes were handled separately. This table is in at least 1NF (First Normal Form) since all attributes are atomic.
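
To see the constraints at work, the sketch below loads a reduced version of the Student table into an in-memory SQLite database and attempts three inserts. The try_insert helper and the sample rows are assumptions for illustration, and other database systems word their error messages differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        student_id      VARCHAR(20)  PRIMARY KEY,
        email           VARCHAR(100) NOT NULL UNIQUE,
        date_of_birth   DATE         NOT NULL,
        enrollment_date DATE         NOT NULL,
        CONSTRAINT chk_enrollment_after_birth
            CHECK (enrollment_date > date_of_birth)
    )
""")


def try_insert(row: tuple) -> str:
    """Insert one row and report whether the constraints accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Student VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert(("S001", "ada@example.edu", "2004-09-15", "2023-09-01")),
    # Same email as S001: violates the UNIQUE constraint.
    try_insert(("S002", "ada@example.edu", "2005-01-20", "2023-09-01")),
    # Enrollment before birth: violates the CHECK constraint.
    try_insert(("S003", "lin@example.edu", "2005-01-20", "2001-09-01")),
]
stored = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
for line in results:
    print(line)
```

Only the first row is stored. The database itself refuses the other two, so no application code has to remember these rules.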

Key Attribute Considerations

The selection and mapping of key attributes deserves special attention. In the ER model, an entity may have multiple candidate keys—minimal sets of attributes that uniquely identify each entity instance. When mapping to the relational model, we must make critical decisions about key representation.

Key Mapping Considerations

•Primary Key Selection: When multiple candidate keys exist, choose one as PRIMARY KEY based on stability (unlikely to change), simplicity (fewer attributes), and usage patterns (frequently used for lookups/joins).
•Alternate Keys: Non-primary candidate keys become UNIQUE constraints. They still enforce uniqueness but aren't the primary identifier. Example: email might be UNIQUE but student_id is PRIMARY KEY.
•Composite Keys: If the ER key consists of multiple attributes, the PRIMARY KEY in the relation will be composite. Example: PRIMARY KEY(department_code, course_number) for a Course entity.
•Natural vs. Surrogate Keys: ER diagrams typically show natural keys (meaningful values like SSN or email). During physical mapping, consider whether to introduce surrogate keys (system-generated integers or UUIDs).
•Key Inheritance: When entities participate in relationships (covered later), the primary key may need to migrate to other tables as a foreign key. Choose keys that work well in this role.

Natural Keys

•Meaningful and human-readable
•Already unique in business domain
•No additional storage for artificial IDs
•Direct correlation with source documents
•May be composite (multiple columns)

Surrogate Keys

•Stable: carries no business meaning, so it rarely needs to change
•Compact storage and fast indexing (for integer keys; UUIDs are larger)
•Simple (single column)
•Database-managed (auto-increment/UUID)
•Abstracts from business changes

Natural Key Pitfalls

Natural keys can change: emails get updated, SSNs are occasionally corrected or reissued, product codes get reformatted. When a natural key changes, every foreign key reference to it must be updated, potentially across dozens of tables. A surrogate primary key combined with UNIQUE constraints on the natural keys often provides the best of both worlds: referential stability plus business-meaningful uniqueness enforcement.
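
The effect can be sketched in an in-memory SQLite database. The enrollment table, its column names, and the sample data are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,           -- surrogate key
        email VARCHAR(254) NOT NULL UNIQUE   -- natural key kept as alternate key
    );
    CREATE TABLE enrollment (
        student_fk  INTEGER NOT NULL REFERENCES student(id),
        course_code VARCHAR(10) NOT NULL,
        PRIMARY KEY (student_fk, course_code)
    );
    INSERT INTO student (id, email) VALUES (1, 'ada@old-domain.example');
    INSERT INTO enrollment VALUES (1, 'CS101');
    INSERT INTO enrollment VALUES (1, 'MA201');
""")

# The natural value changes; only the single student row is touched.
cursor = conn.execute(
    "UPDATE student SET email = ? WHERE id = ?",
    ("ada@new-domain.example", 1),
)
rows_updated = cursor.rowcount

# The enrollment rows still resolve, because they reference the surrogate id.
still_linked = conn.execute(
    "SELECT COUNT(*) FROM enrollment e "
    "JOIN student s ON s.id = e.student_fk WHERE s.email = ?",
    ("ada@new-domain.example",),
).fetchone()[0]
print(rows_updated, still_linked)
```

Had email been the primary key, both enrollment rows would also have needed rewriting, either by hand or through ON UPDATE CASCADE.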

key_examples.sql
-- Example 1: Natural Key (as shown in ER diagram)
CREATE TABLE Student_Natural (
    student_id      VARCHAR(20)     PRIMARY KEY,  -- University-assigned ID
    email           VARCHAR(100)    NOT NULL UNIQUE  -- Alternate key
    -- ... other attributes (each preceded by a comma)
);
 
-- Example 2: Surrogate Key (enhanced for physical design)
CREATE TABLE Student_Surrogate (
    id              BIGINT          GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    student_id      VARCHAR(20)     NOT NULL UNIQUE, -- Natural key preserved
    email           VARCHAR(100)    NOT NULL UNIQUE   -- Also unique
    -- ... other attributes (each preceded by a comma)
);
 
-- Example 3: Composite Key (when entity has multi-attribute key)
CREATE TABLE Course_Section (
    department_code VARCHAR(10),
    course_number   VARCHAR(10),
    section_id      VARCHAR(5),
    semester        VARCHAR(10),
    year            INTEGER,
    instructor_id   BIGINT,
    room_id         BIGINT,
    
    PRIMARY KEY (department_code, course_number, section_id, semester, year)
);
 
-- Note: Composite keys work correctly but:
--   - Make foreign key references verbose
--   - Increase index size
--   - Complicate application code
-- Consider surrogate key if this table is heavily referenced.
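
To make the verbosity concrete, the sketch below declares a referencing table in an in-memory SQLite database. The section_enrollment table is a hypothetical example and not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course_section (
        department_code VARCHAR(10),
        course_number   VARCHAR(10),
        section_id      VARCHAR(5),
        semester        VARCHAR(10),
        year            INTEGER,
        PRIMARY KEY (department_code, course_number, section_id, semester, year)
    );

    -- Hypothetical referencing table: it must repeat all five key columns.
    CREATE TABLE section_enrollment (
        student_id      VARCHAR(20),
        department_code VARCHAR(10),
        course_number   VARCHAR(10),
        section_id      VARCHAR(5),
        semester        VARCHAR(10),
        year            INTEGER,
        PRIMARY KEY (student_id, department_code, course_number,
                     section_id, semester, year),
        FOREIGN KEY (department_code, course_number, section_id, semester, year)
            REFERENCES course_section
                (department_code, course_number, section_id, semester, year)
    );
""")

# PRAGMA foreign_key_list returns one row per column in the foreign key.
fk_columns = [row[3] for row in
              conn.execute("PRAGMA foreign_key_list(section_enrollment)")]
print(len(fk_columns))  # 5
```

With a single-column surrogate key on course_section, the same reference would need one column instead of five.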

Data Type Mapping

While ER diagrams focus on conceptual attributes, relational tables require concrete data types. The mapping from conceptual domain to SQL type is a critical design decision that affects storage efficiency, query performance, and data integrity.

ER Domain to SQL Type Mapping Guide
Conceptual Domain	SQL Type Options	Considerations
Identifier (short)	CHAR(n), VARCHAR(n)	Fixed vs. variable length depends on whether all values have same length
Identifier (numeric)	INTEGER, BIGINT, SERIAL	Use SERIAL/IDENTITY for auto-generated keys
Text (short)	VARCHAR(50-255)	Specify a maximum length as a domain constraint; any performance effect depends on the DBMS
Text (long)	TEXT, VARCHAR(MAX)	For descriptions, notes, content fields
Whole number	SMALLINT, INTEGER, BIGINT	Choose based on expected value range
Decimal number	DECIMAL(p,s), NUMERIC(p,s)	Critical for financial values—avoid FLOAT/DOUBLE
Boolean	BOOLEAN, CHAR(1), TINYINT	Native BOOLEAN if supported; else CHAR(1) holding 'Y'/'N' or TINYINT holding 0/1, ideally with a CHECK constraint
Date only	DATE	No time component—birthdays, hire dates
Time only	TIME	No date component—store hours, schedules
Date and time	TIMESTAMP, DATETIME	Include timezone handling for global applications
Currency	DECIMAL(19,4), MONEY	Never use FLOAT for money—precision loss
Email	VARCHAR(254)	RFC 5321 max length; add CHECK constraint for format
Phone	VARCHAR(20)	Store as string to preserve formatting, international codes
URL	VARCHAR(2048)	Common max URL length; consider TEXT for very long URLs
Binary/Blob	BYTEA, BLOB, VARBINARY	For files, images—consider external storage with reference

Precision Matters

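For DECIMAL types, precision (p) is the total number of digits and scale (s) is the number of digits after the decimal point. DECIMAL(10,2) stores values such as 12345678.90. Always specify both explicitly, because database defaults vary and can cause subtle bugs. For conventional currencies, DECIMAL(19,4) is a widely used, safe choice. Cryptocurrency amounts need a larger scale (Bitcoin, for example, is divisible to 8 decimal places), so size the scale to the smallest unit you must represent.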
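
The difference between binary floating point and fixed-scale decimals is easy to reproduce. The sketch below uses Python's decimal module as a stand-in for a SQL DECIMAL column; the values are arbitrary samples.

```python
from decimal import Decimal

# Binary floating point cannot represent most decimal fractions exactly.
float_total = 0.1 + 0.2
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)                # False
print(decimal_total == Decimal("0.30"))  # True

# Analogue of DECIMAL(10,2): the scale is fixed at 2 digits after the point.
price = Decimal("12345678.9").quantize(Decimal("0.01"))
print(price)  # 12345678.90
```

The same reasoning explains why FLOAT and DOUBLE are a poor fit for monetary columns.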

Vendor-Specific Considerations:

While SQL standards define core types, implementations vary:

PostgreSQL: Rich type system with UUID, JSONB, array types, custom domains
MySQL: TINYINT for boolean, different TEXT size variants (TINYTEXT, MEDIUMTEXT, LONGTEXT)
SQL Server: NVARCHAR for Unicode, UNIQUEIDENTIFIER for UUIDs, MONEY type
Oracle: NUMBER(p,s) for all numerics, VARCHAR2 instead of VARCHAR, CLOB for large text
SQLite: Dynamic typing—declared types are hints only

When designing for portability, stick to widely-supported types (INTEGER, VARCHAR, DATE, DECIMAL) or create abstraction layers.
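
SQLite's dynamic typing is the easiest of these differences to observe directly. In the sketch below, the measurement table is an assumption for illustration, and the table is an ordinary one rather than a STRICT table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (reading INTEGER)")

# The declared type is INTEGER, yet a non-numeric string is accepted as-is.
conn.execute("INSERT INTO measurement VALUES (?)", (42,))
conn.execute("INSERT INTO measurement VALUES (?)", ("not a number",))

stored_types = [row[0] for row in conn.execute(
    "SELECT typeof(reading) FROM measurement ORDER BY rowid")]
print(stored_types)  # ['integer', 'text']
```

A schema that relies on the column type alone to reject bad values therefore behaves differently on SQLite than on a strictly typed system; explicit CHECK constraints narrow that gap.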

Naming Conventions

Consistent naming conventions enhance maintainability, reduce confusion, and prevent errors. While ER diagrams may use informal names, the relational implementation should follow strict conventions. There's no single "correct" convention, but there is value in consistency.

snake_case: Words separated by underscores, all lowercase.

Examples:

Table: student, course_enrollment, order_item
Column: student_id, first_name, date_of_birth
Constraint: pk_student, fk_enrollment_student, uq_student_email

Advantages:

Highly readable
Works consistently across case-sensitive and case-insensitive databases
Common in PostgreSQL, MySQL ecosystems
Maps well to many programming language conventions

Reserved Words

Avoid SQL reserved words as identifiers: ORDER, USER, TABLE, INDEX, GROUP, COMMENT, etc. If you must use one, quote the identifier (standard "ORDER", SQL Server [ORDER], MySQL `ORDER`), but every query must then repeat the quoting, which creates maintenance headaches. Better to choose different names: customer_order, app_user, table_name.
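
A quick experiment in SQLite shows the problem. The exact set of reserved words differs between database systems, so treat this as one data point rather than a general rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unquoted reserved word: the parser rejects the statement.
try:
    conn.execute("CREATE TABLE order (order_id INTEGER PRIMARY KEY)")
    unquoted_ok = True
except sqlite3.OperationalError:
    unquoted_ok = False

# Quoted: accepted, but every later query must quote the name as well.
conn.execute('CREATE TABLE "order" (order_id INTEGER PRIMARY KEY)')

# Renamed: no quoting needed anywhere.
conn.execute("CREATE TABLE customer_order (order_id INTEGER PRIMARY KEY)")

tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(unquoted_ok, tables)
```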

Common Mapping Mistakes

Even experienced designers make mapping errors. Understanding these common mistakes helps you avoid them and produce higher-quality database schemas.

Mapping Anti-Patterns

•Storing Multivalued Attributes Directly: Creating a column like phone_numbers VARCHAR(500) to store comma-separated values. This violates 1NF, prevents proper indexing, and complicates queries. Solution: Separate table with foreign key.
•Storing Composite Attributes as Single Column: Putting the full address in one address TEXT column. This makes it impossible to query or sort by city, state, or zip code individually. Solution: Flatten to component columns.
•Storing Derived Attributes: Including age INTEGER as a stored column that must be updated whenever date_of_birth changes (or worse, annually for everyone). Solution: Compute at query time or use computed columns.
•Omitting Constraints: Creating tables without PRIMARY KEY, NOT NULL, UNIQUE, or CHECK constraints. This allows invalid data and makes the database unreliable. Solution: Apply all appropriate constraints during initial mapping.
•Using Wrong Data Types: Storing dates as strings, prices as FLOAT, or phone numbers as integers. These choices cause incorrect sorting and comparison (string dates), rounding errors (FLOAT prices), and lost leading zeros or '+' prefixes (integer phone numbers). Solution: Choose types based on semantic meaning, not convenience.
•Ignoring NULL Semantics: Not specifying whether columns allow NULL. Some attributes are truly optional (middle_name), others are mandatory (email for users who must be contactable). Solution: Explicitly decide and specify NOT NULL where appropriate.
•Overly Generic Column Names: Using name, value, type without context. These become ambiguous in joins. Solution: Prefix with entity context: student_name, config_value, account_type.
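
The first anti-pattern is worth seeing in action. The sketch below stores comma-separated phone numbers in an in-memory SQLite table and then looks one number up; the student_csv table and its data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_csv (
        student_id INTEGER PRIMARY KEY,
        phones     VARCHAR(500)   -- comma-separated values (anti-pattern)
    );
    INSERT INTO student_csv VALUES (1, '555-0100,555-0199');
    INSERT INTO student_csv VALUES (2, '555-01007');
""")

# Finding one number requires substring matching, which over-matches.
matches = [row[0] for row in conn.execute(
    "SELECT student_id FROM student_csv WHERE phones LIKE ? ORDER BY student_id",
    ("%555-0100%",),
)]
print(matches)  # [1, 2]: student 2 never had the number 555-0100
```

A leading-wildcard LIKE pattern also generally cannot use an ordinary index, so the query is both wrong and slow. With one phone number per row, an exact equality match avoids both problems.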

bad_mapping.sql
-- ❌ Anti-Pattern Example
CREATE TABLE Student (
    id INT,
    name VARCHAR(100),
    address TEXT,
    phones VARCHAR(500),  -- CSV values!
    age INT,              -- Derived, stored
    enrolled CHAR(1)      -- 'Y' or 'N' or ???
    -- No constraints!
);

good_mapping.sql
-- ✓ Correct Mapping
CREATE TABLE Student (
    student_id   BIGINT PRIMARY KEY,
    first_name   VARCHAR(50) NOT NULL,
    last_name    VARCHAR(50) NOT NULL,
    street       VARCHAR(100),
    city         VARCHAR(50),
    state        VARCHAR(50),
    date_of_birth DATE NOT NULL,
    is_enrolled  BOOLEAN NOT NULL DEFAULT TRUE
);

Summary: Regular Entity Mapping

Regular entity mapping is the foundation of ER-to-relational transformation. Master this process, and you're equipped to handle the most common mapping scenarios. Let's consolidate what we've learned:

Key Takeaways

•One Entity → One Table: Each regular entity type maps to a distinct relation (table) with the same conceptual meaning.
•Simple Attributes → Columns: Direct mapping with appropriate SQL data types based on semantic domains.
•Composite Attributes → Component Columns: Flatten to atomic components; the composite itself is not stored.
•Keys Stay Keys: ER key attributes become PRIMARY KEY constraints; alternate candidates become UNIQUE constraints.
•Defer Special Cases: Multivalued and derived attributes require special handling (covered in subsequent pages).
•Apply Constraints: NOT NULL, UNIQUE, CHECK, and DEFAULT constraints enforce data integrity from day one.
•Follow Conventions: Consistent naming conventions prevent confusion and reduce errors across the development lifecycle.
•Avoid Anti-Patterns: Recognize and avoid common mistakes like storing comma-separated values or using wrong data types.

What's Next:

The basic mapping process covers simple attributes and the overall entity-to-table transformation. However, real-world entities frequently include composite attributes that deserve deeper examination. On the next page, we'll explore Composite Attribute Mapping in comprehensive detail—including naming strategies, optional component handling, and complex nesting scenarios.

Page Complete

You now understand how to transform regular entities from ER diagrams into well-structured relational tables. This foundational skill applies to every database design project. Next, we'll examine composite attributes in greater depth to handle more complex entity structures.