Loading learning content...
Entity-Relationship (ER) modeling is the visual bridge between business requirements and database schemas. It translates the messy, narrative-form requirements gathered from stakeholders into precise, unambiguous diagrams that capture what data exists and how it relates—without yet specifying how it will be stored.
This separation of concerns is deliberate and powerful. ER diagrams allow you to validate your understanding with non-technical stakeholders, explore design alternatives rapidly, and identify problems before writing a single line of DDL. In interviews, a well-constructed ER diagram demonstrates your ability to think systematically about data relationships.
By the end of this page, you will master the ER modeling notation and concepts: entity identification, attribute classification, relationship types, cardinality and participation constraints, weak entities, and advanced constructs. You'll learn to construct ER diagrams systematically and avoid common modeling pitfalls that trip up even experienced designers.
The Entity-Relationship model, introduced by Peter Chen in 1976, is based on three fundamental concepts:
Entities: The 'things' or 'objects' in the domain we want to store information about. Entities represent real-world objects (Customer, Product) or conceptual objects (Order, Appointment).
Attributes: The properties or characteristics that describe entities. Each entity has attributes that capture relevant information (Customer has name, email, phone).
Relationships: The associations between entities that capture how they interact or connect (Customer places Order, Order contains Product).
This triad—entity, attribute, relationship—forms a complete vocabulary for describing any data domain at a conceptual level.
| Level | Focus | Output | Audience |
|---|---|---|---|
| Conceptual | What data exists and how it relates | High-level ER diagram | Business stakeholders, analysts |
| Logical | Detailed entities, attributes, constraints | Detailed ER/relational schema | Database designers, developers |
| Physical | How data is stored (tables, indexes, partitions) | DDL, storage specifications | DBAs, infrastructure teams |
In interviews, you often work at the conceptual level first (box-and-line diagrams) and then translate to logical/physical as needed. Starting at the physical level (CREATE TABLE) before establishing the conceptual model is a common mistake that leads to design flaws.
Identifying entities is the first step in ER modeling. An entity should represent a distinct, identifiable concept in the domain that:
Entities are classified based on their ability to exist independently:
Strong Entity: Has its own unique identifier and can exist independently. Customer, Product, and Employee are typically strong entities.
Weak Entity: Cannot be uniquely identified by its own attributes alone; depends on a 'parent' (identifying) entity for identification. Examples:
Domain: University Course Registration System STRONG ENTITIES (independently identifiable):├── Student → student_id uniquely identifies├── Course → course_code uniquely identifies├── Instructor → instructor_id uniquely identifies ├── Department → dept_code uniquely identifies└── Semester → semester_id (e.g., "Fall2024") uniquely identifies WEAK ENTITIES (require parent for identification):├── CourseSection → section_number + course_code + semester_id│ (Section 001 of CS101 in Fall2024)├── Assignment → assignment_number + section_id│ (Assignment 1 of a specific section)└── GradeComponent → component_name + section_id (Midterm grade for a specific section) NOT ENTITIES (attributes or derived concepts):├── Student Name → Attribute of Student├── Course Credits → Attribute of Course├── GPA → Derived/calculated value└── Age → Derived from birth_dateOver-entityfication: Creating entities for attributes (making 'City' an entity when it's just an attribute of Address). Under-entityfication: Embedding what should be entities as complex attributes (storing order items as a JSON array instead of a separate entity). Both extremes cause problems.
Attributes describe the properties of entities. In ER modeling, we classify attributes to understand their nature and how they'll be represented in the physical schema.
| Attribute Type | Description | Example | Schema Implication |
|---|---|---|---|
| Simple (Atomic) | Cannot be divided further | first_name, price, age | Single column |
| Composite | Can be divided into sub-parts | Address (street, city, zip) | Multiple columns or embedded |
| Single-valued | One value per entity instance | birth_date, SSN | Single column, NOT NULL optional |
| Multi-valued | Multiple values per instance | phone_numbers, skills | Separate table or array type |
| Derived | Computed from other attributes | age (from birth_date) | Usually not stored; computed |
| Key (Identifier) | Uniquely identifies entity | customer_id, email | PRIMARY KEY or UNIQUE constraint |
| Nullable | May have no value | middle_name, fax_number | NULL allowed |
| Required | Must have a value | name, email | NOT NULL constraint |
Every entity needs a key—an attribute or combination of attributes that uniquely identifies each instance. Key decisions profoundly impact the schema.
In interviews, you'll often face the question: 'Should I use a composite key or a surrogate?' The general guidance: Use surrogate keys for entities that other entities reference (easier foreign keys). Use composite keys for junction tables in many-to-many relationships where the key naturally consists of the two foreign keys.
Relationships capture how entities are associated with each other. A well-defined relationship specifies:
Cardinality specifies the number of relationship instances an entity can participate in. The three fundamental cardinalities are:
| Cardinality | Description | Example | Schema Strategy |
|---|---|---|---|
| One-to-One (1:1) | Each A relates to at most one B, and vice versa | Person ↔ Passport | FK in either table, or merge tables |
| One-to-Many (1:N) | Each A relates to many B's; each B relates to one A | Department → Employees | FK in the 'many' side table |
| Many-to-Many (M:N) | Each A relates to many B's; each B relates to many A's | Students ↔ Courses | Junction/bridge table required |
Participation specifies whether an entity must participate in a relationship or may participate.
Total Participation (mandatory): Every instance of the entity must participate in the relationship.
Partial Participation (optional): Some instances may not participate.
RELATIONSHIP ANALYSIS TEMPLATE: Relationship: Customer PLACES Order├── Cardinality: One-to-Many (1:N)│ └── One customer can place many orders│ └── Each order is placed by exactly one customer├── Participation (Customer side): Partial│ └── A customer may exist without any orders├── Participation (Order side): Total│ └── Every order must have a customer└── Schema: order.customer_id REFERENCES customer(id) NOT NULL Relationship: Student ENROLLS_IN Course├── Cardinality: Many-to-Many (M:N)│ └── One student enrolls in many courses│ └── One course has many students├── Participation: Partial on both sides│ └── Student may have no enrollments (new student)│ └── Course may have no students (not offered)└── Schema: Junction table enrollment(student_id, course_id, grade, enrollment_date) Relationship: Employee MANAGES Employees├── Cardinality: One-to-Many (1:N) — recursive/self-referencing│ └── One manager manages many employees│ └── Each employee has at most one direct manager├── Participation: Partial on both sides│ └── Not all employees are managers│ └── CEO has no manager└── Schema: employee.manager_id REFERENCES employee(id) NULLDifferent notations exist (Chen, Crow's Foot, UML, IE). In interviews, clarify which you're using or use a simple (min,max) notation: (0,1) means optional single, (1,1) means mandatory single, (0,N) means optional many, (1,N) means mandatory many. '(0,N)Customer—(1,1)Order' means: Customer has zero or more Orders; Order has exactly one Customer.
Several notation systems exist for ER diagrams. Understanding the major ones ensures you can communicate effectively regardless of context.
| Notation | Entities | Relationships | Cardinality | Common Use |
|---|---|---|---|---|
| Chen (Original) | Rectangle | Diamond with lines | Labels (1, N, M) | Academic, textbooks |
| Crow's Foot (IE) | Rectangle | Line connecting | Crow's foot symbols | Industry standard, tools |
| UML Class Diagram | Rectangle with sections | Lines with labels | Multiplicity notation | Software engineering |
| IDEF1X | Rounded rectangle | Lines with dots | Dots and bars | Government, legacy |
Crow's foot notation is the most widely used in industry tools (ERwin, MySQL Workbench, Lucidchart). Understanding this notation is essential.
Symbols:
Combined meanings:
○| — Zero or one (optional single)|| — Exactly one (mandatory single)○⊳ — Zero or more (optional many)|⊳ — One or more (mandatory many)READING CROW'S FOOT DIAGRAMS: Customer ||——○⊳ Order │ │ │ └── Order side: zero or more (customer can have 0+ orders) └── Customer side: exactly one (each order has one customer)Reading: "Each Customer has zero or more Orders; each Order has exactly one Customer" Student ⊳○——○⊳ Course (through Enrollment junction) │ │ │ └── Course side: zero or more (student enrolls in 0+ courses) └── Student side: zero or more (course has 0+ students)Reading: "Each Student enrolls in zero or more Courses; each Course has zero or more Students" Employee ○|——||⊳ Department │ │ │ └── Department side: one or more (department has 1+ employees) └── Employee side: exactly one (employee belongs to one department)Reading: "Each Employee belongs to exactly one Department; each Department has one or more Employees" Person ||——○| Passport │ │ │ └── Passport side: zero or one (person may have 0 or 1 passport) └── Person side: exactly one (passport belongs to one person)Reading: "Each Person has zero or one Passport; each Passport belongs to exactly one Person"In whiteboard interviews, simple box-and-line diagrams with labeled cardinality work fine. Don't worry about perfect notation—focus on clearly communicating entities, relationships, and constraints. State your cardinality in words if notation is ambiguous: 'One to many from Customer to Order.'
Beyond basic entities and relationships, several advanced concepts allow for more precise modeling of complex domains.
When entities share common attributes but have specific differences, we use generalization/specialization hierarchies.
Generalization: Bottom-up process of identifying common attributes across entities and creating a supertype. (Student, Faculty, Staff → Person)
Specialization: Top-down process of identifying subtypes with specific attributes. (Account → Savings Account, Checking Account)
Inheritance: Subtypes inherit all attributes from supertypes. A SavingsAccount has all Account attributes plus interest_rate.
ENTITY HIERARCHY: Payment Methods Payment (SUPERTYPE)├── Attributes: payment_id, amount, payment_date, status├── Common to ALL payment types│├── CreditCardPayment (SUBTYPE)│ └── Additional: card_number, expiry_date, cvv_hash, card_type│├── BankTransferPayment (SUBTYPE)│ └── Additional: bank_name, account_number, routing_number, transfer_reference│├── DigitalWalletPayment (SUBTYPE)│ └── Additional: wallet_provider, wallet_id, transaction_reference│└── CryptoPayment (SUBTYPE) └── Additional: currency_type, wallet_address, blockchain_tx_id CONSTRAINTS:├── Disjoint (d): Payment can only be ONE subtype├── Overlapping (o): Payment could be multiple subtypes (rare)├── Total: Every Payment must be a subtype (no generic payments)└── Partial: Some Payments may not be specializedWhile most relationships are binary (two entities), some scenarios require relationships among three or more entities simultaneously.
Example: A Supplier supplies a Part to a Project. This isn't two binary relationships—it's a single ternary relationship where all three must be specified together.
Be cautious with ternary relationships. Many apparent ternary relationships are actually multiple binary relationships. Ask: 'Does this relationship require all three entities to be meaningful, or can I express it as separate binary relationships?' If Doctor treats Patient at Hospital can be decomposed into 'Doctor treats Patient' and 'Treatment occurs at Hospital,' it may be better modeled as binary relationships.
Sometimes, attributes belong to the relationship rather than to either participating entity.
Example: In a many-to-many relationship between Student and Course, the grade attribute doesn't belong to Student (different grades for different courses) or Course (different grades for different students). The grade belongs to the specific enrollment relationship—this indicates a relationship attribute.
Schema Impact: Relationship attributes are stored in the junction table that represents the many-to-many relationship.
12345678910111213141516171819202122232425262728
-- Relationship attributes belong to the junction tableCREATE TABLE Enrollment ( student_id INT REFERENCES Student(id), course_id INT REFERENCES Course(id), -- These are RELATIONSHIP ATTRIBUTES -- They describe the enrollment, not the student or course enrollment_date DATE NOT NULL, grade CHAR(2), status VARCHAR(20) DEFAULT 'active', credits_earned DECIMAL(3,1), PRIMARY KEY (student_id, course_id)); -- Similarly for a Supplier-Part-Project ternary relationshipCREATE TABLE Supply ( supplier_id INT REFERENCES Supplier(id), part_id INT REFERENCES Part(id), project_id INT REFERENCES Project(id), -- Relationship attributes quantity INT NOT NULL, unit_price DECIMAL(10,2), delivery_date DATE, PRIMARY KEY (supplier_id, part_id, project_id));A systematic approach to ER modeling ensures comprehensive coverage and prevents common mistakes. Follow this structured process:
SCENARIO: Hotel Reservation System STEP 1: IDENTIFY ENTITIES├── Hotel├── Room├── Guest├── Reservation├── RoomType (potentially)└── Payment (potentially) STEP 2: DEFINE PRIMARY KEYS├── Hotel: hotel_id (surrogate)├── Room: room_id (surrogate) or (hotel_id, room_number) composite├── Guest: guest_id (surrogate)├── Reservation: reservation_id (surrogate)└── RoomType: room_type_code (natural: "STD", "DLX", "STE") STEP 3: IDENTIFY ATTRIBUTESHotel:├── name, address, city, country, star_rating, phone, email Room:├── room_number, floor, status, is_smoking, has_view Guest:├── first_name, last_name, email, phone, address, ├── loyalty_number, id_type, id_number Reservation:├── check_in_date, check_out_date, status, ├── num_guests, special_requests, total_amount STEP 4-6: RELATIONSHIPS WITH CARDINALITY & PARTICIPATION Hotel ||——|⊳ Room├── Cardinality: 1:N (hotel has many rooms; room in one hotel)├── Hotel participation: Partial (hotel could temporarily have no rooms)└── Room participation: Total (room must belong to a hotel) Guest ⊳○——○⊳ Room (via Reservation)├── Cardinality: M:N (guest reserves many rooms over time; room reserved by many guests)├── Guest participation: Partial (guest may have no reservations)└── Room participation: Partial (room may be unreserved) STEP 7: RELATIONSHIP ATTRIBUTESReservation (the M:N junction) has attributes:├── check_in_date, check_out_date — specific to this reservation├── rate_charged — price at time of booking (not current room rate)└── status — specific to this reservation instance STEP 8: SPECIALIZATIONRoom could specialize:├── StandardRoom├── Suite (has living_area_sqft, num_bedrooms)└── AccessibleRoom (has accessibility_features) STEP 9: WEAK ENTITIESRoom might be weak entity:└── room_number only unique within a hotel└── Identified by (hotel_id, room_number)ER modeling is inherently iterative. Your first diagram will have issues. As you model relationships, you'll discover missing entities. As you validate with use cases, you'll find missing attributes. Expect to revise 3-5 times before stabilizing.
Let's work through a complete ER modeling exercise. Given these requirements, construct an ER model:
Requirements:
ENTITIES IDENTIFIED: 1. Book (Strong Entity) ├── isbn (PK - natural key, globally unique) ├── title ├── publication_year ├── price ├── description └── stock_quantity (or separate Inventory entity?) 2. Author (Strong Entity) ├── author_id (PK - surrogate) ├── name ├── bio └── nationality 3. Category (Strong Entity) ├── category_id (PK - surrogate) ├── name ├── description └── parent_category_id (self-reference for hierarchy) 4. Publisher (Strong Entity) ├── publisher_id (PK - surrogate) ├── name ├── address └── website 5. Customer (Strong Entity) ├── customer_id (PK - surrogate) ├── email (candidate key) ├── name ├── password_hash └── address 6. Order (Strong Entity) ├── order_id (PK - surrogate) ├── order_date ├── status └── total_amount 7. OrderItem (Weak Entity - depends on Order) ├── (order_id, book_isbn) composite PK ├── quantity └── price_at_purchase 8. Review (Could be weak or strong) ├── review_id (PK) or (customer_id, book_isbn) ├── rating ├── review_text └── review_dateWhat's Next:
With entities and relationships defined, we must decide how much to normalize. The next page covers Normalization Decisions—when to normalize for data integrity, when to denormalize for performance, and how to make these trade-offs systematically in design and interview contexts.
You now understand ER modeling from entity identification through relationship specification. You can read and create ER diagrams in multiple notations and apply a systematic process to model complex domains. Next, we'll explore normalization decisions that refine this conceptual model into an efficient physical schema.