Loading content...
With requirements thoroughly documented, we now face a profound transformation: converting the messy, often contradictory, always incomplete language of business requirements into a clean, precise, mathematically rigorous model of data. This is conceptual design—the phase where database architects exercise their most creative and intellectually demanding skills.
Conceptual design answers the question: What is the structure of information in this domain? Not how it will be stored, not how it will be accessed, not what tables or indexes will exist—but fundamentally, what things exist in the business domain, what properties describe them, and how they relate to one another.
This abstraction is not merely academic elegance. The conceptual model serves as a communication bridge between business stakeholders (who think in customers, orders, and products) and database implementers (who think in tables, keys, and constraints). It captures the essential structure of the data domain without premature commitment to any particular database technology.
By the end of this page, you will master the conceptual design process: extracting entities from requirements, identifying attributes and their properties, discovering and classifying relationships, applying ER modeling conventions, creating comprehensive ER diagrams, and validating conceptual models against requirements. You will gain the ability to transform any well-defined domain into a precise conceptual schema.
A conceptual model is a high-level description of a data domain that is independent of any particular database management system, storage technology, or access method. It represents the semantic content of data—its meaning and structure in the context of the business domain.
Definition (Formal):
A conceptual data model is a representation of organizationally relevant concepts found in the real world, the relationships between those concepts, and the constraints on the relationships and concepts. It describes the semantics of the organization and represents a series of assertions about its nature.
Why Conceptual Models Matter:
Levels of Abstraction:
The conceptual model occupies a specific position in the hierarchy of database models:
| Level | Focus | Examples | Audience |
|---|---|---|---|
| Conceptual | What exists in the domain | Entities, attributes, relationships | Business stakeholders, analysts |
| Logical | How data is structured | Tables, columns, keys, constraints | Database designers, developers |
| Physical | How data is stored | Files, indexes, partitions, tablespaces | DBAs, system architects |
The Entity-Relationship Model:
The most widely used approach for conceptual modeling is the Entity-Relationship (ER) model, introduced by Peter Chen in 1976. The ER model provides:
We will use ER modeling throughout this page, as it remains the industry standard for conceptual database design.
A good conceptual model can be explained to a domain expert in under 30 seconds, and they should be able to point to entities and say 'yes, that's a real thing we work with.' If your model requires extensive technical explanation to stakeholders, you may have descended prematurely into logical design territory.
Entities are the fundamental building blocks of conceptual models. An entity represents a class of things—real or abstract—that are of interest to the organization and about which data must be stored.
Formal Definition:
An entity is a distinguishable object about which information is kept. An entity type (or entity set) is a collection of entities that share common properties. Individual members of an entity type are called entity instances.
Examples:
Techniques for Identifying Entities:
Textual Analysis Method:
Systematically extract nouns from requirements documents:
Example from requirements:
"The system shall allow customers to place orders for products. Each order contains multiple line items specifying the quantity of each product. Customers have accounts with billing addresses and payment methods."
Noun extraction:
Caution: Noun analysis is a starting point, not a complete technique. Domain understanding is essential for correct interpretation.
Entity Classification:
Entities are classified based on their characteristics:
| Classification | Description | Example | Design Implication |
|---|---|---|---|
| Strong Entity | Exists independently, has its own primary key | Customer, Product, Employee | Becomes independent table |
| Weak Entity | Depends on another entity for identification | OrderLineItem (depends on Order) | Includes partial key + owner's key |
| Associative Entity | Models a relationship with attributes | Enrollment (between Student and Course) | Junction table with attributes |
| Composite Entity | Aggregates multiple entities | Department (contains Employees) | May require hierarchy design |
A common modeling error is representing as an attribute what should be an entity. Ask: Does this concept have multiple instances? Does it have its own attributes? Could it participate in relationships independently? If yes to any, consider making it an entity. 'Country' may be an attribute if just storing names, but an entity if tracking country codes, currencies, and tax rules.
Attributes are properties that describe entity instances. They represent the specific information that must be captured and stored for each entity. Proper attribute specification requires not just identifying what data to store, but understanding its nature, constraints, and behavior.
Formal Definition:
An attribute is a property or characteristic of an entity type. Each attribute has a domain—the set of allowable values. For each entity instance, each attribute takes a specific value from its domain (or NULL if permitted).
Attribute Classifications:
Comprehensive Attribute Specification:
Each attribute should be documented with complete specifications:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465
# Entity: Customer# Attribute Specifications customer_id: description: Unique identifier for each customer type: Key attribute domain: Integer, positive, auto-generated nullability: NOT NULL uniqueness: UNIQUE (primary key) example_values: [1001, 1002, 1003] full_name: description: Customer's legal full name type: Composite attribute components: - first_name: VARCHAR(50), NOT NULL - middle_name: VARCHAR(50), NULL allowed - last_name: VARCHAR(50), NOT NULL domain: Alphabetic characters, spaces, hyphens, apostrophes example_values: ["John Michael Smith", "María García"] email: description: Primary contact email address type: Simple, single-valued domain: Valid email format (RFC 5322 compliant) nullability: NOT NULL uniqueness: UNIQUE (alternate key) validation: Must contain @ and valid domain example_values: ["john.smith@example.com"] phone_numbers: description: Customer contact phone numbers type: Composite, multi-valued components: - number: VARCHAR(20) - type: ENUM('home', 'work', 'mobile', 'fax') - is_primary: BOOLEAN cardinality: 0..N (may have zero or more) constraint: At most one phone may be marked primary example_values: - {number: "+1-555-123-4567", type: "mobile", is_primary: true} - {number: "+1-555-987-6543", type: "work", is_primary: false} date_of_birth: description: Customer's date of birth type: Simple, single-valued domain: DATE, must be in past, >= 1900-01-01 nullability: NULL allowed (may be unknown) validation: Must be valid date, not in future use: Age verification, marketing segmentation age: description: Customer's current age in years type: Derived attribute derivation: FLOOR((CURRENT_DATE - date_of_birth) / 365.25) storage: Not stored, computed on query note: NULL if date_of_birth is NULL registration_date: description: Date and time customer account was created type: Simple, single-valued domain: TIMESTAMP WITH TIME ZONE nullability: NOT NULL default: CURRENT_TIMESTAMP at insertion mutability: Immutable after creationKey Attributes and Entity Identification:
Every entity type must have one or more attributes that uniquely identify instances:
Candidate Key: A minimal set of attributes that uniquely identifies each entity instance. An entity may have multiple candidate keys.
Primary Key: The candidate key selected as the principal identifier. Properties:
Examples of key decisions:
| Entity | Natural Key Options | Surrogate Key | Recommendation |
|---|---|---|---|
| Customer | SSN, email, phone | customer_id (auto) | Surrogate (SSN has privacy issues) |
| Product | SKU, UPC | product_id (auto) | Natural (SKU) if stable, else surrogate |
| Order | — (no natural key) | order_id (auto) | Surrogate required |
| Country | ISO code, name | country_id (auto) | Natural (ISO_code) — stable standard |
| Log Entry | — (no natural key) | log_id (auto) | Surrogate required |
At the conceptual level, prefer natural keys that have business meaning. Surrogate keys (arbitrary identifiers) are an implementation decision that should typically wait until logical design. However, document when natural keys don't exist—this forces surrogate key creation in logical design.
Relationships model the associations between entity types. They capture how entities interact, connect, and depend upon each other in the business domain. Proper relationship modeling is crucial—missing or incorrect relationships lead to databases that cannot represent real-world scenarios.
Formal Definition:
A relationship is an association among two or more entities that represents a fact about the domain. A relationship type defines a set of associations between entity types. Individual associations are relationship instances.
Discovering Relationships:
Verb Analysis: Verbs in requirements often indicate relationships
Stakeholder Questions:
Process Analysis:
Relationship Classification:
| Pattern | Notation | Example | Description |
|---|---|---|---|
| One-to-One (1:1) | 1 — 1 | Employee — ParkingSpace | Each employee has at most one space; each space assigned to at most one employee |
| One-to-Many (1:N) | 1 — N | Department — Employee | One department has many employees; each employee belongs to one department |
| Many-to-Many (M:N) | M — N | Student — Course | Students enroll in many courses; courses have many students |
Participation Constraints (Cardinality Bounds):
Beyond the basic pattern, relationships have precise cardinality constraints:
(min, max) notation:
Example with full cardinality:
Department (1,N) ———employs——— (1,1) Employee
Reading:
Participation Types:
Relationship Attributes:
Relationships can have their own attributes—data that describes the relationship itself, not the participating entities:
Example: Enrollment relationship
The grade doesn't belong to Student (a student has many grades) or Course (a course has many grades). It belongs to the specific relationship between a particular student and a particular course.
Higher-Degree Relationships:
While most relationships are binary (between two entities), higher-degree relationships exist:
Decision: Ternary vs. Multiple Binary
Consider: Does the relationship require all participants simultaneously, or can it be decomposed?
An entity can have a relationship with itself (recursive or unary relationship). Example: Employee 'supervises' Employee. The same entity type appears twice in the relationship with different roles (supervisor, supervisee). These require special attention in ER diagrams and logical design, as role names become essential for disambiguation.
Entity-Relationship diagrams provide a visual representation of the conceptual model. They serve as the primary communication tool between database designers, developers, and business stakeholders.
Standard ER Notation Elements:
| Symbol | Meaning | Usage |
|---|---|---|
| Rectangle | Entity type | Strong entities have single border; weak entities have double border |
| Ellipse | Attribute | Connected to entity; key attributes underlined; derived attributes dashed |
| Diamond | Relationship | Placed between related entities; labeled with relationship name |
| Line (single) | Partial participation | Entity may not participate in relationship |
| Line (double) | Total participation | Entity must participate in relationship |
| 1, N, M notations | Cardinality | Placed near entity showing cardinality constraints |
Step-by-Step ER Diagram Construction:
Step 1: Draw Entity Rectangles
Step 2: Add Key Attributes
Step 3: Add Non-Key Attributes
Step 4: Draw Relationships
Step 5: Add Cardinality
Step 6: Mark Participation
Step 7: Identify Weak Entities
Alternative Notations:
While Chen notation is historically important, several alternatives are widely used:
| Notation | Strengths | Common Usage |
|---|---|---|
| Chen | Educational clarity, explicit attributes | Academic settings, training |
| Crow's Foot (IE) | Compact, cardinality at a glance | Industry standard, ERD tools |
| UML Class Diagrams | OO integration, detailed operations | OO environments, unified modeling |
| IDEF1X | Precise, government standard | Defense, aerospace, formal environments |
| Barker (Oracle) | Compact, Oracle-centric | Oracle development environments |
Practical Recommendation: Use Crow's Foot notation for most professional work—it's compact, widely understood, and supported by most tools. Use Chen notation for educational purposes where explicit attribute shapes aid understanding.
An ER diagram with more than 15-20 entities becomes difficult to read. For large systems, create multiple diagrams: a high-level overview showing only entities and key relationships, plus detailed diagrams for subsystems. Use consistent layout patterns—entities that are tightly related should cluster together visually.
The basic ER model is extended with additional constructs that capture more sophisticated semantic structures found in real-world domains.
Generalization/Specialization Hierarchies:
Generalization defines a supertype-subtype relationship among entity types. The supertype entity contains common attributes shared by all entities, while subtypes contain attributes specific to particular categories.
Example: Person Generalization
PERSON (supertype)
│
┌─────┴─────┐
│ │
EMPLOYEE CUSTOMER (subtypes)
Specialization Constraints:
| Constraint | Description | Example | Notation |
|---|---|---|---|
| Disjoint (d) | Entity can belong to at most one subtype | Person is EITHER Employee OR Customer | Circle with 'd' |
| Overlapping (o) | Entity can belong to multiple subtypes | Person can be BOTH Employee AND Customer | Circle with 'o' |
| Total | Every supertype instance must be in some subtype | Every Vehicle must be Car, Truck, or Motorcycle | Double line to circle |
| Partial | Supertype instances need not be in any subtype | Some Persons are neither Employee nor Customer | Single line to circle |
Aggregation:
Aggregation treats a relationship (with its participating entities) as a higher-level entity that can participate in other relationships.
Use case: When you need to model a relationship about another relationship.
Example:
Consider: "Employees work on Projects" and "Managers supervise Employee-Project assignments"
SUPERVISES
│
┌───────────────┼───────────────┐
│ │ │
MANAGER [WORKS_ON] (aggregated)
│
┌───────┴───────┐
│ │
EMPLOYEE PROJECT
The SUPERVISES relationship connects MANAGER to the aggregated WORKS_ON relationship, not directly to EMPLOYEE or PROJECT.
When to use aggregation:
A category (or union type) models a collection of entities from different entity types. Example: VEHICLE_OWNER can be either PERSON or COMPANY (not subtypes of a common supertype). The category has its own attributes and relationships. This is the inverse of generalization—a subtype with multiple unrelated supertypes.
Before proceeding to logical design, the conceptual model must be validated against the original requirements and with stakeholders. A flawed conceptual model produces a flawed database, regardless of how well the subsequent phases are executed.
Validation Checklist:
Validation Techniques:
1. Transaction Pathway Analysis:
For each major transaction identified in requirements:
Example: Transaction: "Create new order for existing customer with multiple products"
2. Query Pathway Analysis:
For each major query requirement:
Example: Query: "List all orders for customer X with their products and quantities"
Path: CUSTOMER → ORDER → ORDER_LINE → PRODUCT ✓
3. Stakeholder Walkthrough:
Present the ER diagram to business stakeholders:
Conceptual design transforms the messy world of business requirements into a precise, abstract model of information structure. This model serves as the foundation for all subsequent design decisions.
What's Next:
With a validated conceptual model, we are ready to enter the logical design phase. Here, we transform the abstract ER model into a concrete logical schema—typically relational tables with columns, keys, and constraints. This translation follows systematic rules that preserve the semantics of the conceptual model while adding implementation-ready precision.
The next page explores logical design: table creation, key selection, constraint specification, normalization, and the mapping algorithms that transform ER diagrams into relational schemas.
You now understand conceptual design as the bridge between business requirements and database implementation. You can identify entities, specify attributes, discover relationships, construct ER diagrams, apply enhanced ER features, and validate conceptual models. Next, we'll transform these conceptual structures into logical database schemas.