Database Management SystemsSchema Design Problems

Schema Design Problems

LevelAdvanced

Duration90 mins

TopicSchema Design Problems

2 / 5

ER Modeling

The Visual Language of Database Design

Entity-Relationship (ER) modeling is the visual bridge between business requirements and database schemas. It translates the messy, narrative-form requirements gathered from stakeholders into precise, unambiguous diagrams that capture what data exists and how it relates—without yet specifying how it will be stored.

This separation of concerns is deliberate and powerful. ER diagrams allow you to validate your understanding with non-technical stakeholders, explore design alternatives rapidly, and identify problems before writing a single line of DDL. In interviews, a well-constructed ER diagram demonstrates your ability to think systematically about data relationships.

What You Will Learn

By the end of this page, you will master the ER modeling notation and concepts: entity identification, attribute classification, relationship types, cardinality and participation constraints, weak entities, and advanced constructs. You'll learn to construct ER diagrams systematically and avoid common modeling pitfalls that trip up even experienced designers.

The ER Model Philosophy

The Entity-Relationship model, introduced by Peter Chen in 1976, is based on three fundamental concepts:

Entities: The 'things' or 'objects' in the domain we want to store information about. Entities represent real-world objects (Customer, Product) or conceptual objects (Order, Appointment).

Attributes: The properties or characteristics that describe entities. Each entity has attributes that capture relevant information (Customer has name, email, phone).

Relationships: The associations between entities that capture how they interact or connect (Customer places Order, Order contains Product).

This triad—entity, attribute, relationship—forms a complete vocabulary for describing any data domain at a conceptual level.

ER Model Abstraction Levels
Level	Focus	Output	Audience
Conceptual	What data exists and how it relates	High-level ER diagram	Business stakeholders, analysts
Logical	Detailed entities, attributes, constraints	Detailed ER/relational schema	Database designers, developers
Physical	How data is stored (tables, indexes, partitions)	DDL, storage specifications	DBAs, infrastructure teams

Interview Context

In interviews, you often work at the conceptual level first (box-and-line diagrams) and then translate to logical/physical as needed. Starting at the physical level (CREATE TABLE) before establishing the conceptual model is a common mistake that leads to design flaws.

Entity Identification

Identifying entities is the first step in ER modeling. An entity should represent a distinct, identifiable concept in the domain that:

Has multiple instances that need to be stored (not a singleton)
Has attributes that describe it
Can be uniquely identified among all instances
Is relevant to the system's purpose

Entity Identification Heuristics

•Nouns in requirements — Look for nouns in requirement documents. 'The system tracks customers, orders, and products' reveals three candidate entities.
•Persistent things — Entities are things that persist over time and about which you store information. Transient concepts (like a calculation result) are usually not entities.
•Multiple instances — If there's only ever one instance (e.g., 'the company'), it's probably not an entity—it's a configuration or context.
•Independent existence — Entities should exist independently, not as mere attributes of other entities. 'Address components' might be attributes, while 'Shipping Addresses' might be an entity.
•Domain significance — Entities should be meaningful in the business domain. Don't create entities for technical artifacts unless required.

Strong vs. Weak Entities

Entities are classified based on their ability to exist independently:

Strong Entity: Has its own unique identifier and can exist independently. Customer, Product, and Employee are typically strong entities.

Weak Entity: Cannot be uniquely identified by its own attributes alone; depends on a 'parent' (identifying) entity for identification. Examples:

OrderItem depends on Order (OrderItem 1 in Order 100 vs OrderItem 1 in Order 200)
Room depends on Building (Room 101 in Building A vs Room 101 in Building B)
Dependent depends on Employee (for insurance purposes)

entity_analysis
Entity Analysis Example
Domain: University Course Registration System
 
STRONG ENTITIES (independently identifiable):
├── Student         → student_id uniquely identifies
├── Course          → course_code uniquely identifies
├── Instructor      → instructor_id uniquely identifies  
├── Department      → dept_code uniquely identifies
└── Semester        → semester_id (e.g., "Fall2024") uniquely identifies
 
WEAK ENTITIES (require parent for identification):
├── CourseSection   → section_number + course_code + semester_id
│                     (Section 001 of CS101 in Fall2024)
├── Assignment      → assignment_number + section_id
│                     (Assignment 1 of a specific section)
└── GradeComponent  → component_name + section_id
                      (Midterm grade for a specific section)
 
NOT ENTITIES (attributes or derived concepts):
├── Student Name    → Attribute of Student
├── Course Credits  → Attribute of Course
├── GPA             → Derived/calculated value
└── Age             → Derived from birth_date

Common Entity Identification Mistakes

Over-entityfication: Creating entities for attributes (making 'City' an entity when it's just an attribute of Address). Under-entityfication: Embedding what should be entities as complex attributes (storing order items as a JSON array instead of a separate entity). Both extremes cause problems.

Attribute Classification

Attributes describe the properties of entities. In ER modeling, we classify attributes to understand their nature and how they'll be represented in the physical schema.

Attribute Types and Their Characteristics
Attribute Type	Description	Example	Schema Implication
Simple (Atomic)	Cannot be divided further	first_name, price, age	Single column
Composite	Can be divided into sub-parts	Address (street, city, zip)	Multiple columns or embedded
Single-valued	One value per entity instance	birth_date, SSN	Single column, NOT NULL optional
Multi-valued	Multiple values per instance	phone_numbers, skills	Separate table or array type
Derived	Computed from other attributes	age (from birth_date)	Usually not stored; computed
Key (Identifier)	Uniquely identifies entity	customer_id, email	PRIMARY KEY or UNIQUE constraint
Nullable	May have no value	middle_name, fax_number	NULL allowed
Required	Must have a value	name, email	NOT NULL constraint

Key Attribute Decisions

Every entity needs a key—an attribute or combination of attributes that uniquely identifies each instance. Key decisions profoundly impact the schema.

Natural Keys

•Derived from real-world attributes (email, SSN, ISBN)
•Pros: Meaningful, no extra storage, natural lookups
•Cons: May change, may be reused, may have format issues
•Best for: Stable, universal identifiers (currency codes, country codes)

Surrogate Keys

•System-generated identifiers (auto-increment, UUID)
•Pros: Stable, guaranteed unique, efficient for joins
•Cons: No business meaning, extra storage, potential for gaps
•Best for: Most entities, especially when natural key is complex or unstable

The Composite Key Question

In interviews, you'll often face the question: 'Should I use a composite key or a surrogate?' The general guidance: Use surrogate keys for entities that other entities reference (easier foreign keys). Use composite keys for junction tables in many-to-many relationships where the key naturally consists of the two foreign keys.

Relationship Modeling

Relationships capture how entities are associated with each other. A well-defined relationship specifies:

Which entities participate — The entities being connected
The degree — How many entities participate (binary, ternary, etc.)
The cardinality — How many instances of each entity can relate
The participation — Whether participation is mandatory or optional
The relationship attributes — Data that belongs to the relationship itself

Cardinality Constraints

Cardinality specifies the number of relationship instances an entity can participate in. The three fundamental cardinalities are:

Cardinality Types with Examples
Cardinality	Description	Example	Schema Strategy
One-to-One (1:1)	Each A relates to at most one B, and vice versa	Person ↔ Passport	FK in either table, or merge tables
One-to-Many (1:N)	Each A relates to many B's; each B relates to one A	Department → Employees	FK in the 'many' side table
Many-to-Many (M:N)	Each A relates to many B's; each B relates to many A's	Students ↔ Courses	Junction/bridge table required

Participation Constraints

Participation specifies whether an entity must participate in a relationship or may participate.

Total Participation (mandatory): Every instance of the entity must participate in the relationship.

Every Employee must belong to a Department
Represented by double line in ER diagrams
Translates to NOT NULL foreign key

Partial Participation (optional): Some instances may not participate.

An Employee may or may not manage other employees
Represented by single line in ER diagrams
Translates to nullable foreign key

cardinality_examples
Cardinality Analysis
RELATIONSHIP ANALYSIS TEMPLATE:
 
Relationship: Customer PLACES Order
├── Cardinality: One-to-Many (1:N)
│   └── One customer can place many orders
│   └── Each order is placed by exactly one customer
├── Participation (Customer side): Partial
│   └── A customer may exist without any orders
├── Participation (Order side): Total
│   └── Every order must have a customer
└── Schema: order.customer_id REFERENCES customer(id) NOT NULL
 
Relationship: Student ENROLLS_IN Course
├── Cardinality: Many-to-Many (M:N)
│   └── One student enrolls in many courses
│   └── One course has many students
├── Participation: Partial on both sides
│   └── Student may have no enrollments (new student)
│   └── Course may have no students (not offered)
└── Schema: Junction table enrollment(student_id, course_id, grade, enrollment_date)
 
Relationship: Employee MANAGES Employees
├── Cardinality: One-to-Many (1:N) — recursive/self-referencing
│   └── One manager manages many employees
│   └── Each employee has at most one direct manager
├── Participation: Partial on both sides
│   └── Not all employees are managers
│   └── CEO has no manager
└── Schema: employee.manager_id REFERENCES employee(id) NULL

Reading Cardinality Notation

Different notations exist (Chen, Crow's Foot, UML, IE). In interviews, clarify which you're using or use a simple (min,max) notation: (0,1) means optional single, (1,1) means mandatory single, (0,N) means optional many, (1,N) means mandatory many. '(0,N)Customer—(1,1)Order' means: Customer has zero or more Orders; Order has exactly one Customer.

ER Diagram Notation

Several notation systems exist for ER diagrams. Understanding the major ones ensures you can communicate effectively regardless of context.

Major ER Notation Systems
Notation	Entities	Relationships	Cardinality	Common Use
Chen (Original)	Rectangle	Diamond with lines	Labels (1, N, M)	Academic, textbooks
Crow's Foot (IE)	Rectangle	Line connecting	Crow's foot symbols	Industry standard, tools
UML Class Diagram	Rectangle with sections	Lines with labels	Multiplicity notation	Software engineering
IDEF1X	Rounded rectangle	Lines with dots	Dots and bars	Government, legacy

Crow's Foot Notation (Industry Standard)

Crow's foot notation is the most widely used in industry tools (ERwin, MySQL Workbench, Lucidchart). Understanding this notation is essential.

Symbols:

Ring (○): Zero (optional)
Dash (|): One (and only one)
Crow's foot (⊳): Many

Combined meanings:

○| — Zero or one (optional single)
|| — Exactly one (mandatory single)
○⊳ — Zero or more (optional many)
|⊳ — One or more (mandatory many)

crows_foot_examples
Crow's Foot Examples
READING CROW'S FOOT DIAGRAMS:
 
Customer ||——○⊳ Order
        │        │
        │        └── Order side: zero or more (customer can have 0+ orders)
        └── Customer side: exactly one (each order has one customer)
Reading: "Each Customer has zero or more Orders; each Order has exactly one Customer"
 
 
Student ⊳○——○⊳ Course  (through Enrollment junction)
       │       │
       │       └── Course side: zero or more (student enrolls in 0+ courses)
       └── Student side: zero or more (course has 0+ students)
Reading: "Each Student enrolls in zero or more Courses; each Course has zero or more Students"
 
 
Employee ○|——||⊳ Department
        │         │
        │         └── Department side: one or more (department has 1+ employees)
        └── Employee side: exactly one (employee belongs to one department)
Reading: "Each Employee belongs to exactly one Department; each Department has one or more Employees"
 
 
Person ||——○| Passport
      │       │
      │       └── Passport side: zero or one (person may have 0 or 1 passport)
      └── Person side: exactly one (passport belongs to one person)
Reading: "Each Person has zero or one Passport; each Passport belongs to exactly one Person"

Interview Tip

In whiteboard interviews, simple box-and-line diagrams with labeled cardinality work fine. Don't worry about perfect notation—focus on clearly communicating entities, relationships, and constraints. State your cardinality in words if notation is ambiguous: 'One to many from Customer to Order.'

Advanced ER Concepts

Beyond basic entities and relationships, several advanced concepts allow for more precise modeling of complex domains.

Generalization and Specialization (IS-A Relationships)

When entities share common attributes but have specific differences, we use generalization/specialization hierarchies.

Generalization: Bottom-up process of identifying common attributes across entities and creating a supertype. (Student, Faculty, Staff → Person)

Specialization: Top-down process of identifying subtypes with specific attributes. (Account → Savings Account, Checking Account)

Inheritance: Subtypes inherit all attributes from supertypes. A SavingsAccount has all Account attributes plus interest_rate.

specialization_example
Specialization Hierarchy
ENTITY HIERARCHY: Payment Methods
 
Payment (SUPERTYPE)
├── Attributes: payment_id, amount, payment_date, status
├── Common to ALL payment types
│
├── CreditCardPayment (SUBTYPE)
│   └── Additional: card_number, expiry_date, cvv_hash, card_type
│
├── BankTransferPayment (SUBTYPE)
│   └── Additional: bank_name, account_number, routing_number, transfer_reference
│
├── DigitalWalletPayment (SUBTYPE)
│   └── Additional: wallet_provider, wallet_id, transaction_reference
│
└── CryptoPayment (SUBTYPE)
    └── Additional: currency_type, wallet_address, blockchain_tx_id
 
CONSTRAINTS:
├── Disjoint (d): Payment can only be ONE subtype
├── Overlapping (o): Payment could be multiple subtypes (rare)
├── Total: Every Payment must be a subtype (no generic payments)
└── Partial: Some Payments may not be specialized

Ternary and N-ary Relationships

While most relationships are binary (two entities), some scenarios require relationships among three or more entities simultaneously.

Example: A Supplier supplies a Part to a Project. This isn't two binary relationships—it's a single ternary relationship where all three must be specified together.

Ternary Relationship Trap

Be cautious with ternary relationships. Many apparent ternary relationships are actually multiple binary relationships. Ask: 'Does this relationship require all three entities to be meaningful, or can I express it as separate binary relationships?' If Doctor treats Patient at Hospital can be decomposed into 'Doctor treats Patient' and 'Treatment occurs at Hospital,' it may be better modeled as binary relationships.

Relationship Attributes

Sometimes, attributes belong to the relationship rather than to either participating entity.

Example: In a many-to-many relationship between Student and Course, the grade attribute doesn't belong to Student (different grades for different courses) or Course (different grades for different students). The grade belongs to the specific enrollment relationship—this indicates a relationship attribute.

Schema Impact: Relationship attributes are stored in the junction table that represents the many-to-many relationship.

relationship_attributes
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Relationship attributes belong to the junction table
CREATE TABLE Enrollment (
    student_id      INT REFERENCES Student(id),
    course_id       INT REFERENCES Course(id),
    
    -- These are RELATIONSHIP ATTRIBUTES
    -- They describe the enrollment, not the student or course
    enrollment_date DATE NOT NULL,
    grade           CHAR(2),
    status          VARCHAR(20) DEFAULT 'active',
    credits_earned  DECIMAL(3,1),
    
    PRIMARY KEY (student_id, course_id)
);
 
-- Similarly for a Supplier-Part-Project ternary relationship
CREATE TABLE Supply (
    supplier_id  INT REFERENCES Supplier(id),
    part_id      INT REFERENCES Part(id),
    project_id   INT REFERENCES Project(id),
    
    -- Relationship attributes
    quantity     INT NOT NULL,
    unit_price   DECIMAL(10,2),
    delivery_date DATE,
    
    PRIMARY KEY (supplier_id, part_id, project_id)
);

The ER Modeling Process

A systematic approach to ER modeling ensures comprehensive coverage and prevents common mistakes. Follow this structured process:

Step-by-Step ER Modeling

•Identify all entities — List all nouns from requirements. Filter to distinct, persistent concepts with multiple instances.
•Define primary keys — Determine how each entity is uniquely identified. Decide between natural and surrogate keys.
•Identify attributes — List properties of each entity. Classify as simple/composite, single/multi-valued, derived.
•Identify relationships — For each pair of entities, ask: 'Is there a meaningful association?' Name relationships with verbs.
•Determine cardinality — For each relationship, specify the cardinality constraints on both sides.
•Specify participation — For each relationship, determine if participation is total or partial on each side.
•Identify relationship attributes — Determine if any attributes belong to relationships rather than entities.
•Look for specialization — Identify IS-A hierarchies where entities share common attributes.
•Identify weak entities — Find entities that cannot be uniquely identified without a parent entity.
•Validate and refine — Walk through use cases to ensure the model supports all required operations.

er_modeling_walkthrough
ER Modeling Example
SCENARIO: Hotel Reservation System
 
STEP 1: IDENTIFY ENTITIES
├── Hotel
├── Room
├── Guest
├── Reservation
├── RoomType (potentially)
└── Payment (potentially)
 
STEP 2: DEFINE PRIMARY KEYS
├── Hotel: hotel_id (surrogate)
├── Room: room_id (surrogate) or (hotel_id, room_number) composite
├── Guest: guest_id (surrogate)
├── Reservation: reservation_id (surrogate)
└── RoomType: room_type_code (natural: "STD", "DLX", "STE")
 
STEP 3: IDENTIFY ATTRIBUTES
Hotel:
├── name, address, city, country, star_rating, phone, email
 
Room:
├── room_number, floor, status, is_smoking, has_view
 
Guest:
├── first_name, last_name, email, phone, address, 
├── loyalty_number, id_type, id_number
 
Reservation:
├── check_in_date, check_out_date, status, 
├── num_guests, special_requests, total_amount
 
STEP 4-6: RELATIONSHIPS WITH CARDINALITY & PARTICIPATION
 
Hotel ||——|⊳ Room
├── Cardinality: 1:N (hotel has many rooms; room in one hotel)
├── Hotel participation: Partial (hotel could temporarily have no rooms)
└── Room participation: Total (room must belong to a hotel)
 
Guest ⊳○——○⊳ Room (via Reservation)
├── Cardinality: M:N (guest reserves many rooms over time; room reserved by many guests)
├── Guest participation: Partial (guest may have no reservations)
└── Room participation: Partial (room may be unreserved)
 
STEP 7: RELATIONSHIP ATTRIBUTES
Reservation (the M:N junction) has attributes:
├── check_in_date, check_out_date — specific to this reservation
├── rate_charged — price at time of booking (not current room rate)
└── status — specific to this reservation instance
 
STEP 8: SPECIALIZATION
Room could specialize:
├── StandardRoom
├── Suite (has living_area_sqft, num_bedrooms)
└── AccessibleRoom (has accessibility_features)
 
STEP 9: WEAK ENTITIES
Room might be weak entity:
└── room_number only unique within a hotel
└── Identified by (hotel_id, room_number)

Iteration is Expected

ER modeling is inherently iterative. Your first diagram will have issues. As you model relationships, you'll discover missing entities. As you validate with use cases, you'll find missing attributes. Expect to revise 3-5 times before stabilizing.

Practice: Online Bookstore ER Model

Let's work through a complete ER modeling exercise. Given these requirements, construct an ER model:

Requirements:

Store information about books: ISBN, title, publication year, price, description
Books have one or more authors; authors write multiple books
Books belong to categories; a book can be in multiple categories
Track publishers: each book has one publisher
Customers can register and place orders
Orders contain multiple books with quantities
Customers can write reviews for books they've purchased
Track inventory: how many copies are in stock

entities
Entity Identification
ENTITIES IDENTIFIED:
 
1. Book (Strong Entity)
   ├── isbn (PK - natural key, globally unique)
   ├── title
   ├── publication_year
   ├── price
   ├── description
   └── stock_quantity (or separate Inventory entity?)
 
2. Author (Strong Entity)
   ├── author_id (PK - surrogate)
   ├── name
   ├── bio
   └── nationality
 
3. Category (Strong Entity)
   ├── category_id (PK - surrogate)
   ├── name
   ├── description
   └── parent_category_id (self-reference for hierarchy)
 
4. Publisher (Strong Entity)
   ├── publisher_id (PK - surrogate)
   ├── name
   ├── address
   └── website
 
5. Customer (Strong Entity)
   ├── customer_id (PK - surrogate)
   ├── email (candidate key)
   ├── name
   ├── password_hash
   └── address
 
6. Order (Strong Entity)
   ├── order_id (PK - surrogate)
   ├── order_date
   ├── status
   └── total_amount
 
7. OrderItem (Weak Entity - depends on Order)
   ├── (order_id, book_isbn) composite PK
   ├── quantity
   └── price_at_purchase
 
8. Review (Could be weak or strong)
   ├── review_id (PK) or (customer_id, book_isbn)
   ├── rating
   ├── review_text
   └── review_date

Summary and Next Steps

Key Takeaways

•ER modeling is the bridge between requirements and physical schema—work at the conceptual level first.
•Entities are distinct, identifiable concepts with attributes and keys. Distinguish strong from weak entities.
•Relationships have cardinality and participation constraints that directly impact schema design.
•Master Crow's Foot notation for industry tools, but focus on clear communication in interviews.
•Advanced concepts (specialization, ternary relationships, relationship attributes) handle complex domains.
•Follow a systematic process and expect iteration—ER modeling is inherently exploratory.

What's Next:

With entities and relationships defined, we must decide how much to normalize. The next page covers Normalization Decisions—when to normalize for data integrity, when to denormalize for performance, and how to make these trade-offs systematically in design and interview contexts.

Page Complete

You now understand ER modeling from entity identification through relationship specification. You can read and create ER diagrams in multiple notations and apply a systematic process to model complex domains. Next, we'll explore normalization decisions that refine this conceptual model into an efficient physical schema.

2 / 5

Loading learning content...

Database Management SystemsSchema Design Problems

Schema Design Problems

LevelAdvanced

Duration90 mins

TopicSchema Design Problems

2 / 5

ER Modeling

The Visual Language of Database Design

What You Will Learn

The ER Model Philosophy

The Entity-Relationship model, introduced by Peter Chen in 1976, is based on three fundamental concepts:

Entities: The 'things' or 'objects' in the domain we want to store information about. Entities represent real-world objects (Customer, Product) or conceptual objects (Order, Appointment).

Attributes: The properties or characteristics that describe entities. Each entity has attributes that capture relevant information (Customer has name, email, phone).

Relationships: The associations between entities that capture how they interact or connect (Customer places Order, Order contains Product).

This triad—entity, attribute, relationship—forms a complete vocabulary for describing any data domain at a conceptual level.

ER Model Abstraction Levels
Level	Focus	Output	Audience
Conceptual	What data exists and how it relates	High-level ER diagram	Business stakeholders, analysts
Logical	Detailed entities, attributes, constraints	Detailed ER/relational schema	Database designers, developers
Physical	How data is stored (tables, indexes, partitions)	DDL, storage specifications	DBAs, infrastructure teams

Interview Context

Entity Identification

Identifying entities is the first step in ER modeling. An entity should represent a distinct, identifiable concept in the domain that:

Has multiple instances that need to be stored (not a singleton)
Has attributes that describe it
Can be uniquely identified among all instances
Is relevant to the system's purpose

Entity Identification Heuristics

•Nouns in requirements — Look for nouns in requirement documents. 'The system tracks customers, orders, and products' reveals three candidate entities.
•Persistent things — Entities are things that persist over time and about which you store information. Transient concepts (like a calculation result) are usually not entities.
•Multiple instances — If there's only ever one instance (e.g., 'the company'), it's probably not an entity—it's a configuration or context.
•Independent existence — Entities should exist independently, not as mere attributes of other entities. 'Address components' might be attributes, while 'Shipping Addresses' might be an entity.
•Domain significance — Entities should be meaningful in the business domain. Don't create entities for technical artifacts unless required.

Strong vs. Weak Entities

Entities are classified based on their ability to exist independently:

Strong Entity: Has its own unique identifier and can exist independently. Customer, Product, and Employee are typically strong entities.

Weak Entity: Cannot be uniquely identified by its own attributes alone; depends on a 'parent' (identifying) entity for identification. Examples:

OrderItem depends on Order (OrderItem 1 in Order 100 vs OrderItem 1 in Order 200)
Room depends on Building (Room 101 in Building A vs Room 101 in Building B)
Dependent depends on Employee (for insurance purposes)

entity_analysis
Entity Analysis Example
Domain: University Course Registration System
 
STRONG ENTITIES (independently identifiable):
├── Student         → student_id uniquely identifies
├── Course          → course_code uniquely identifies
├── Instructor      → instructor_id uniquely identifies  
├── Department      → dept_code uniquely identifies
└── Semester        → semester_id (e.g., "Fall2024") uniquely identifies
 
WEAK ENTITIES (require parent for identification):
├── CourseSection   → section_number + course_code + semester_id
│                     (Section 001 of CS101 in Fall2024)
├── Assignment      → assignment_number + section_id
│                     (Assignment 1 of a specific section)
└── GradeComponent  → component_name + section_id
                      (Midterm grade for a specific section)
 
NOT ENTITIES (attributes or derived concepts):
├── Student Name    → Attribute of Student
├── Course Credits  → Attribute of Course
├── GPA             → Derived/calculated value
└── Age             → Derived from birth_date

Common Entity Identification Mistakes

Attribute Classification

Attributes describe the properties of entities. In ER modeling, we classify attributes to understand their nature and how they'll be represented in the physical schema.

Attribute Types and Their Characteristics
Attribute Type	Description	Example	Schema Implication
Simple (Atomic)	Cannot be divided further	first_name, price, age	Single column
Composite	Can be divided into sub-parts	Address (street, city, zip)	Multiple columns or embedded
Single-valued	One value per entity instance	birth_date, SSN	Single column, NOT NULL optional
Multi-valued	Multiple values per instance	phone_numbers, skills	Separate table or array type
Derived	Computed from other attributes	age (from birth_date)	Usually not stored; computed
Key (Identifier)	Uniquely identifies entity	customer_id, email	PRIMARY KEY or UNIQUE constraint
Nullable	May have no value	middle_name, fax_number	NULL allowed
Required	Must have a value	name, email	NOT NULL constraint

Key Attribute Decisions

Every entity needs a key—an attribute or combination of attributes that uniquely identifies each instance. Key decisions profoundly impact the schema.

Natural Keys

•Derived from real-world attributes (email, SSN, ISBN)
•Pros: Meaningful, no extra storage, natural lookups
•Cons: May change, may be reused, may have format issues
•Best for: Stable, universal identifiers (currency codes, country codes)

Surrogate Keys

•System-generated identifiers (auto-increment, UUID)
•Pros: Stable, guaranteed unique, efficient for joins
•Cons: No business meaning, extra storage, potential for gaps
•Best for: Most entities, especially when natural key is complex or unstable

The Composite Key Question

Relationship Modeling

Relationships capture how entities are associated with each other. A well-defined relationship specifies:

Which entities participate — The entities being connected
The degree — How many entities participate (binary, ternary, etc.)
The cardinality — How many instances of each entity can relate
The participation — Whether participation is mandatory or optional
The relationship attributes — Data that belongs to the relationship itself

Cardinality Constraints

Cardinality specifies the number of relationship instances an entity can participate in. The three fundamental cardinalities are:

Cardinality Types with Examples
Cardinality	Description	Example	Schema Strategy
One-to-One (1:1)	Each A relates to at most one B, and vice versa	Person ↔ Passport	FK in either table, or merge tables
One-to-Many (1:N)	Each A relates to many B's; each B relates to one A	Department → Employees	FK in the 'many' side table
Many-to-Many (M:N)	Each A relates to many B's; each B relates to many A's	Students ↔ Courses	Junction/bridge table required

Participation Constraints

Participation specifies whether an entity must participate in a relationship or may participate.

Total Participation (mandatory): Every instance of the entity must participate in the relationship.

Every Employee must belong to a Department
Represented by double line in ER diagrams
Translates to NOT NULL foreign key

Partial Participation (optional): Some instances may not participate.

An Employee may or may not manage other employees
Represented by single line in ER diagrams
Translates to nullable foreign key

cardinality_examples
Cardinality Analysis
RELATIONSHIP ANALYSIS TEMPLATE:
 
Relationship: Customer PLACES Order
├── Cardinality: One-to-Many (1:N)
│   └── One customer can place many orders
│   └── Each order is placed by exactly one customer
├── Participation (Customer side): Partial
│   └── A customer may exist without any orders
├── Participation (Order side): Total
│   └── Every order must have a customer
└── Schema: order.customer_id REFERENCES customer(id) NOT NULL
 
Relationship: Student ENROLLS_IN Course
├── Cardinality: Many-to-Many (M:N)
│   └── One student enrolls in many courses
│   └── One course has many students
├── Participation: Partial on both sides
│   └── Student may have no enrollments (new student)
│   └── Course may have no students (not offered)
└── Schema: Junction table enrollment(student_id, course_id, grade, enrollment_date)
 
Relationship: Employee MANAGES Employees
├── Cardinality: One-to-Many (1:N) — recursive/self-referencing
│   └── One manager manages many employees
│   └── Each employee has at most one direct manager
├── Participation: Partial on both sides
│   └── Not all employees are managers
│   └── CEO has no manager
└── Schema: employee.manager_id REFERENCES employee(id) NULL

Reading Cardinality Notation

ER Diagram Notation

Several notation systems exist for ER diagrams. Understanding the major ones ensures you can communicate effectively regardless of context.

Major ER Notation Systems
Notation	Entities	Relationships	Cardinality	Common Use
Chen (Original)	Rectangle	Diamond with lines	Labels (1, N, M)	Academic, textbooks
Crow's Foot (IE)	Rectangle	Line connecting	Crow's foot symbols	Industry standard, tools
UML Class Diagram	Rectangle with sections	Lines with labels	Multiplicity notation	Software engineering
IDEF1X	Rounded rectangle	Lines with dots	Dots and bars	Government, legacy

Crow's Foot Notation (Industry Standard)

Crow's foot notation is the most widely used in industry tools (ERwin, MySQL Workbench, Lucidchart). Understanding this notation is essential.

Symbols:

Ring (○): Zero (optional)
Dash (|): One (and only one)
Crow's foot (⊳): Many

Combined meanings:

○| — Zero or one (optional single)
|| — Exactly one (mandatory single)
○⊳ — Zero or more (optional many)
|⊳ — One or more (mandatory many)

crows_foot_examples
Crow's Foot Examples
READING CROW'S FOOT DIAGRAMS:
 
Customer ||——○⊳ Order
        │        │
        │        └── Order side: zero or more (customer can have 0+ orders)
        └── Customer side: exactly one (each order has one customer)
Reading: "Each Customer has zero or more Orders; each Order has exactly one Customer"
 
 
Student ⊳○——○⊳ Course  (through Enrollment junction)
       │       │
       │       └── Course side: zero or more (student enrolls in 0+ courses)
       └── Student side: zero or more (course has 0+ students)
Reading: "Each Student enrolls in zero or more Courses; each Course has zero or more Students"
 
 
Employee ○|——||⊳ Department
        │         │
        │         └── Department side: one or more (department has 1+ employees)
        └── Employee side: exactly one (employee belongs to one department)
Reading: "Each Employee belongs to exactly one Department; each Department has one or more Employees"
 
 
Person ||——○| Passport
      │       │
      │       └── Passport side: zero or one (person may have 0 or 1 passport)
      └── Person side: exactly one (passport belongs to one person)
Reading: "Each Person has zero or one Passport; each Passport belongs to exactly one Person"

Interview Tip

Advanced ER Concepts

Beyond basic entities and relationships, several advanced concepts allow for more precise modeling of complex domains.

Generalization and Specialization (IS-A Relationships)

When entities share common attributes but have specific differences, we use generalization/specialization hierarchies.

Generalization: Bottom-up process of identifying common attributes across entities and creating a supertype. (Student, Faculty, Staff → Person)

Specialization: Top-down process of identifying subtypes with specific attributes. (Account → Savings Account, Checking Account)

Inheritance: Subtypes inherit all attributes from supertypes. A SavingsAccount has all Account attributes plus interest_rate.

specialization_example
Specialization Hierarchy
ENTITY HIERARCHY: Payment Methods
 
Payment (SUPERTYPE)
├── Attributes: payment_id, amount, payment_date, status
├── Common to ALL payment types
│
├── CreditCardPayment (SUBTYPE)
│   └── Additional: card_number, expiry_date, cvv_hash, card_type
│
├── BankTransferPayment (SUBTYPE)
│   └── Additional: bank_name, account_number, routing_number, transfer_reference
│
├── DigitalWalletPayment (SUBTYPE)
│   └── Additional: wallet_provider, wallet_id, transaction_reference
│
└── CryptoPayment (SUBTYPE)
    └── Additional: currency_type, wallet_address, blockchain_tx_id
 
CONSTRAINTS:
├── Disjoint (d): Payment can only be ONE subtype
├── Overlapping (o): Payment could be multiple subtypes (rare)
├── Total: Every Payment must be a subtype (no generic payments)
└── Partial: Some Payments may not be specialized

Ternary and N-ary Relationships

While most relationships are binary (two entities), some scenarios require relationships among three or more entities simultaneously.

Example: A Supplier supplies a Part to a Project. This isn't two binary relationships—it's a single ternary relationship where all three must be specified together.

Ternary Relationship Trap

Relationship Attributes

Sometimes, attributes belong to the relationship rather than to either participating entity.

Schema Impact: Relationship attributes are stored in the junction table that represents the many-to-many relationship.

relationship_attributes
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Relationship attributes belong to the junction table
CREATE TABLE Enrollment (
    student_id      INT REFERENCES Student(id),
    course_id       INT REFERENCES Course(id),
    
    -- These are RELATIONSHIP ATTRIBUTES
    -- They describe the enrollment, not the student or course
    enrollment_date DATE NOT NULL,
    grade           CHAR(2),
    status          VARCHAR(20) DEFAULT 'active',
    credits_earned  DECIMAL(3,1),
    
    PRIMARY KEY (student_id, course_id)
);
 
-- Similarly for a Supplier-Part-Project ternary relationship
CREATE TABLE Supply (
    supplier_id  INT REFERENCES Supplier(id),
    part_id      INT REFERENCES Part(id),
    project_id   INT REFERENCES Project(id),
    
    -- Relationship attributes
    quantity     INT NOT NULL,
    unit_price   DECIMAL(10,2),
    delivery_date DATE,
    
    PRIMARY KEY (supplier_id, part_id, project_id)
);

The ER Modeling Process

A systematic approach to ER modeling ensures comprehensive coverage and prevents common mistakes. Follow this structured process:

Step-by-Step ER Modeling

•Identify all entities — List all nouns from requirements. Filter to distinct, persistent concepts with multiple instances.
•Define primary keys — Determine how each entity is uniquely identified. Decide between natural and surrogate keys.
•Identify attributes — List properties of each entity. Classify as simple/composite, single/multi-valued, derived.
•Identify relationships — For each pair of entities, ask: 'Is there a meaningful association?' Name relationships with verbs.
•Determine cardinality — For each relationship, specify the cardinality constraints on both sides.
•Specify participation — For each relationship, determine if participation is total or partial on each side.
•Identify relationship attributes — Determine if any attributes belong to relationships rather than entities.
•Look for specialization — Identify IS-A hierarchies where entities share common attributes.
•Identify weak entities — Find entities that cannot be uniquely identified without a parent entity.
•Validate and refine — Walk through use cases to ensure the model supports all required operations.

er_modeling_walkthrough
ER Modeling Example
SCENARIO: Hotel Reservation System
 
STEP 1: IDENTIFY ENTITIES
├── Hotel
├── Room
├── Guest
├── Reservation
├── RoomType (potentially)
└── Payment (potentially)
 
STEP 2: DEFINE PRIMARY KEYS
├── Hotel: hotel_id (surrogate)
├── Room: room_id (surrogate) or (hotel_id, room_number) composite
├── Guest: guest_id (surrogate)
├── Reservation: reservation_id (surrogate)
└── RoomType: room_type_code (natural: "STD", "DLX", "STE")
 
STEP 3: IDENTIFY ATTRIBUTES
Hotel:
├── name, address, city, country, star_rating, phone, email
 
Room:
├── room_number, floor, status, is_smoking, has_view
 
Guest:
├── first_name, last_name, email, phone, address, 
├── loyalty_number, id_type, id_number
 
Reservation:
├── check_in_date, check_out_date, status, 
├── num_guests, special_requests, total_amount
 
STEP 4-6: RELATIONSHIPS WITH CARDINALITY & PARTICIPATION
 
Hotel ||——|⊳ Room
├── Cardinality: 1:N (hotel has many rooms; room in one hotel)
├── Hotel participation: Partial (hotel could temporarily have no rooms)
└── Room participation: Total (room must belong to a hotel)
 
Guest ⊳○——○⊳ Room (via Reservation)
├── Cardinality: M:N (guest reserves many rooms over time; room reserved by many guests)
├── Guest participation: Partial (guest may have no reservations)
└── Room participation: Partial (room may be unreserved)
 
STEP 7: RELATIONSHIP ATTRIBUTES
Reservation (the M:N junction) has attributes:
├── check_in_date, check_out_date — specific to this reservation
├── rate_charged — price at time of booking (not current room rate)
└── status — specific to this reservation instance
 
STEP 8: SPECIALIZATION
Room could specialize:
├── StandardRoom
├── Suite (has living_area_sqft, num_bedrooms)
└── AccessibleRoom (has accessibility_features)
 
STEP 9: WEAK ENTITIES
Room might be weak entity:
└── room_number only unique within a hotel
└── Identified by (hotel_id, room_number)

Iteration is Expected

Practice: Online Bookstore ER Model

Let's work through a complete ER modeling exercise. Given these requirements, construct an ER model:

Requirements:

Store information about books: ISBN, title, publication year, price, description
Books have one or more authors; authors write multiple books
Books belong to categories; a book can be in multiple categories
Track publishers: each book has one publisher
Customers can register and place orders
Orders contain multiple books with quantities
Customers can write reviews for books they've purchased
Track inventory: how many copies are in stock

entities
Entity Identification
ENTITIES IDENTIFIED:
 
1. Book (Strong Entity)
   ├── isbn (PK - natural key, globally unique)
   ├── title
   ├── publication_year
   ├── price
   ├── description
   └── stock_quantity (or separate Inventory entity?)
 
2. Author (Strong Entity)
   ├── author_id (PK - surrogate)
   ├── name
   ├── bio
   └── nationality
 
3. Category (Strong Entity)
   ├── category_id (PK - surrogate)
   ├── name
   ├── description
   └── parent_category_id (self-reference for hierarchy)
 
4. Publisher (Strong Entity)
   ├── publisher_id (PK - surrogate)
   ├── name
   ├── address
   └── website
 
5. Customer (Strong Entity)
   ├── customer_id (PK - surrogate)
   ├── email (candidate key)
   ├── name
   ├── password_hash
   └── address
 
6. Order (Strong Entity)
   ├── order_id (PK - surrogate)
   ├── order_date
   ├── status
   └── total_amount
 
7. OrderItem (Weak Entity - depends on Order)
   ├── (order_id, book_isbn) composite PK
   ├── quantity
   └── price_at_purchase
 
8. Review (Could be weak or strong)
   ├── review_id (PK) or (customer_id, book_isbn)
   ├── rating
   ├── review_text
   └── review_date

Summary and Next Steps

Key Takeaways

•ER modeling is the bridge between requirements and physical schema—work at the conceptual level first.
•Entities are distinct, identifiable concepts with attributes and keys. Distinguish strong from weak entities.
•Relationships have cardinality and participation constraints that directly impact schema design.
•Master Crow's Foot notation for industry tools, but focus on clear communication in interviews.
•Advanced concepts (specialization, ternary relationships, relationship attributes) handle complex domains.
•Follow a systematic process and expect iteration—ER modeling is inherently exploratory.

What's Next:

Page Complete

2 / 5