Conceptual Design - Learning Module


Entity Identification: Discovering the Core of Your Data Model

The Foundation of All That Follows

In conceptual database design, entity identification is the foundational activity upon which everything else depends. Get the entities right, and relationships and attributes flow naturally. Get them wrong, and the entire model—and everything built upon it—will struggle to represent reality accurately.

Yet entity identification is far from mechanical. It requires judgment, domain understanding, and the ability to distinguish between surface-level descriptions and fundamental data concepts. A requirements document might mention "customer information, order tracking, and inventory management"—but how do we translate these loose phrases into crisp entities?

This page equips you with systematic techniques for discovering entities, criteria for evaluating candidates, and the wisdom to avoid common identification mistakes.

What You Will Learn

By the end of this page, you will be able to systematically identify entities from requirements documents, interviews, and existing systems. You'll understand what makes something an entity versus an attribute, how to handle ambiguous cases, and common patterns that help guide entity discovery across diverse domains.

What Makes Something an Entity

Before we can identify entities, we need clear criteria for what qualifies as one. An entity is something about which the organization wishes to store data—but that definition alone is insufficient for practical decision-making.

The Four Essential Characteristics

An entity typically exhibits four characteristics:

1. Independent Existence

An entity has meaning on its own, independent of other constructs. We can discuss "a customer" or "an order" without necessarily referencing other entities. Contrast this with an attribute like "order date"—we can't meaningfully discuss an order date without reference to the order it belongs to.

Question to ask: Can I describe an instance of this thing without first describing something else?

2. Distinct Identity

We can distinguish different instances of an entity from each other. There might be thousands of customers, but each is distinguishable—typically by some identifier.

Question to ask: Can I tell one instance from another? What makes them different?

3. Multiple Instances Expected

Entities generally have multiple instances in the domain. If only one instance ever exists, it might be data about the organization itself (configuration, settings) rather than an entity to model.

Question to ask: Will there be many of these? Do stakeholders refer to 'an X' and 'another X'?

4. Relevant Attributes

Entities have properties worth recording. An entity with no meaningful attributes (other than perhaps an identifier) may not be worth modeling as separate.

Question to ask: What information do we need to store about this thing?

Entity Characteristic Evaluation Examples
Candidate	Independent Existence?	Distinct Identity?	Multiple Instances?	Has Attributes?	Likely Entity?
Customer	Yes	Yes (ID, Email)	Yes (many customers)	Yes (Name, Address, etc.)	✓ Yes
Order	Yes	Yes (Order Number)	Yes (many orders)	Yes (Date, Status, Total)	✓ Yes
OrderDate	No (belongs to Order)	N/A	N/A	No	✗ Attribute
ShippingAddress	Debatable	Possibly	Yes (repeated)	Yes (Street, City, etc.)	? Needs analysis
Country	Yes	Yes (Code or Name)	Yes (many countries)	Yes (Name, Code)	✓ Yes (reference data)
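
The evaluation in the table above can be sketched as a small checklist in code. This is only an illustration: the `Candidate` class, the `evaluate` function, and the verdict rules are assumptions made for this example, not a standard algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate from requirements, scored against the four characteristics."""
    name: str
    independent_existence: bool
    distinct_identity: bool
    multiple_instances: bool
    has_attributes: bool


def evaluate(candidate: Candidate) -> str:
    """Return a rough verdict; the rules below are illustrative assumptions."""
    checks = [
        candidate.independent_existence,
        candidate.distinct_identity,
        candidate.multiple_instances,
        candidate.has_attributes,
    ]
    if all(checks):
        return "likely entity"
    if not candidate.independent_existence and not candidate.has_attributes:
        return "likely attribute"
    return "needs analysis"


candidates = [
    Candidate("Customer", True, True, True, True),
    Candidate("OrderDate", False, False, False, False),
    # "Debatable" independence is recorded as False so the case gets reviewed.
    Candidate("ShippingAddress", False, True, True, True),
]
verdicts = {c.name: evaluate(c) for c in candidates}
for name, verdict in verdicts.items():
    print(f"{name}: {verdict}")
```

A mixed result is a prompt for discussion with stakeholders, not a final answer.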

The 'Real Thing' Test

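A practical test: Can you point at a real-world instance and say 'That's a [entity name]'? 'That's a customer' (the person John Smith). 'That's an order' (the paper in my hand or record on screen). If you can't point at an instance, it might not be an entity. Treat this as a heuristic rather than a rule: abstract things such as an agreement or a policy can still be entities when the business tracks individual instances of them.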

Entity Discovery Techniques

Entity identification is both art and discipline. Experienced modelers develop intuition, but systematic techniques provide reliable starting points.

Technique 1: Noun Analysis

The most direct approach is extracting nouns from requirements documents, interview transcripts, and business documents.

Process:

Collect all written requirements and domain documentation
Highlight every noun and noun phrase
Categorize nouns: things, people, places, events, concepts
Filter candidates against entity criteria
Group synonyms ("client" and "customer" might be the same entity)

Not every noun becomes an entity. Filter out:

Nouns describing the system itself ("database", "screen", "report")
Abstract nouns unlikely to have stored instances ("functionality", "process")
Nouns that are clearly attributes ("name", "date", "amount")
Nouns outside the system scope
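
The filtering and synonym-grouping steps above might look like the following minimal sketch. It assumes the nouns have already been highlighted by hand; the word lists and the `filter_candidates` function are hypothetical, and the scope filter is left out because it depends on the project.

```python
# Hypothetical filter lists; a real project would build these with stakeholders.
SYSTEM_NOUNS = {"database", "screen", "report"}
ABSTRACT_NOUNS = {"functionality", "process"}
ATTRIBUTE_NOUNS = {"name", "date", "amount"}
SYNONYMS = {"client": "customer"}  # alias -> canonical term


def filter_candidates(nouns: list[str]) -> list[str]:
    """Normalize, merge synonyms, and drop nouns that fail the filters."""
    rejected = SYSTEM_NOUNS | ABSTRACT_NOUNS | ATTRIBUTE_NOUNS
    kept: list[str] = []
    for noun in nouns:
        term = noun.strip().lower()
        term = SYNONYMS.get(term, term)  # map aliases to the canonical term
        if term in rejected:
            continue
        if term not in kept:  # keep first occurrence, preserve order
            kept.append(term)
    return kept


highlighted = ["Customer", "client", "order", "screen", "name", "Product", "process"]
entity_candidates = filter_candidates(highlighted)
print(entity_candidates)  # ['customer', 'order', 'product']
```

The surviving terms are still only candidates; each one goes through the entity criteria before it enters the model.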

Technique 2: Form/Report Mining

Existing forms, reports, and documents reveal what data the organization already tracks.

For each form/report:

Identify the main subject (usually an entity)
List all fields (potential attributes or related entities)
Find groupings of related fields (might indicate embedded entities)
Note headers/sections (often corresponding to entities)

For example, a "Customer Order Form" has Customer as its main subject with customer fields, Order as another entity with order fields, and Line Items as a group of repeated fields (probably an entity of its own).
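
One way to record what a form reveals is to capture its sections and fields as data. The sketch below assumes a hypothetical `FormSection` structure; the field names are invented for illustration and do not come from a real form.

```python
from dataclasses import dataclass, field


@dataclass
class FormSection:
    """A header or field grouping found on a form (structure is an assumption)."""
    header: str
    fields: list[str] = field(default_factory=list)
    repeats: bool = False  # True when the group appears once per row


def candidate_entities(sections: list[FormSection]) -> dict[str, list[str]]:
    """Treat each section header as a candidate entity, its fields as attributes."""
    return {s.header: s.fields for s in sections}


order_form = [
    FormSection("Customer", ["Name", "Email", "Phone"]),
    FormSection("Order", ["OrderNumber", "OrderDate", "Total"]),
    FormSection("LineItem", ["LineNumber", "Product", "Quantity"], repeats=True),
]
candidates = candidate_entities(order_form)
# Repeating groups are strong hints of a separate entity.
repeating_groups = [s.header for s in order_form if s.repeats]
print(repeating_groups)  # ['LineItem']
```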

Technique 3: Process/Workflow Analysis

Business processes involve things. Tracing workflows reveals what entities are created, modified, or consumed.

For each business process:

What triggers the process? (Event/Transaction entity?)
What information is gathered? (Data entities)
What is created or modified? (Output entities)
Who is involved? (Actor/Role entities)
What decisions are made? (Policy/Rule entities?)

Entity Discovery Question Bank

•About what things does the organization keep records? — Direct indication of entities
•What do people in this domain talk about tracking? — Nouns in conversation
•What appears as headers in existing reports? — Organizational structure of data
•What do forms in this domain collect information about? — Data entry subjects
•What events occur that need to be recorded? — Transactional entities
•Who are the actors that interact with this system? — People/organization entities
•What physical or logical things are referenced? — Objects and concepts
•What would rows represent if we were building spreadsheets? — Instance types

Technique 4: Stakeholder Role Analysis

Different stakeholders see different entities based on their roles:

Operations staff see transactional entities (Orders, Shipments)
Finance staff see monetary entities (Invoices, Payments, Accounts)
HR staff see people entities (Employees, Departments, Positions)
Executives see aggregate entities (Regions, Performance Metrics)

Interview stakeholders from different roles to get comprehensive entity coverage.

Technique 5: Existing System Reverse Engineering

If replacing an existing system, its data structures reveal entities (though perhaps imperfectly):

Database tables are likely entities (though denormalized tables might contain multiple)
Spreadsheets used in daily work show what's important
Configuration files might reveal reference data entities
API endpoints often correspond to entities

Caution: Don't blindly copy old structures. Old systems contain mistakes and workarounds. Treat them as one input, and verify what they suggest against fresh domain analysis.

Triangulation

No single technique is complete. Use multiple techniques and compare results. Entities that appear from multiple sources are likely correct. Entities appearing from only one source need careful validation.
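
Triangulation can be sketched as a simple tally of how many techniques produced each candidate. The findings below are hypothetical, and the cut-off of two sources is an assumption.

```python
from collections import Counter

# Hypothetical findings from three discovery techniques.
findings = {
    "noun_analysis": {"Customer", "Order", "Product", "Report"},
    "form_mining": {"Customer", "Order", "LineItem"},
    "reverse_engineering": {"Customer", "Order", "Product", "LineItem"},
}

# Count how many techniques surfaced each candidate.
support = Counter(name for names in findings.values() for name in names)
corroborated = sorted(n for n, count in support.items() if count >= 2)
needs_validation = sorted(n for n, count in support.items() if count == 1)

print(corroborated)      # ['Customer', 'LineItem', 'Order', 'Product']
print(needs_validation)  # ['Report']
```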

Entity vs. Attribute: The Critical Distinction

One of the most challenging decisions in conceptual modeling is determining whether something is an entity or an attribute. The distinction has profound implications for how the database will represent and evolve the domain.

The Fundamental Difference

Entities are things with independent existence that we track as first-class objects.
Attributes are properties of entities that have no independent existence.

Customer is an entity. CustomerName is an attribute of Customer—we don't track "names" independent of who has that name.

Decision Criteria

When in doubt, consider these questions:

Does it have attributes of its own?

If "Department" has DepartmentName, DepartmentLocation, DepartmentBudget, and DepartmentHead, it's probably an entity. If you just need to store a department name as a text field for employees, it might be an attribute.

Will there be relationships to it?

If other entities need to relate to this thing, it's probably an entity. If multiple employees work in the same department and we need to reference that shared department, Department is an entity.

Is it referenced by name/value multiple times?

If the same value appears in multiple places (same department name for multiple employees), consider making it an entity to avoid redundancy.

Will queries/reports group by it?

If business users want to see "orders by country" or "sales by product category," Country and ProductCategory are likely entities (or at least reference tables).

Could it have multiple values for one entity instance?

If a customer can have multiple phone numbers, PhoneNumber might need to be a separate entity (or multivalued attribute) rather than a simple attribute.

Signs It's an Entity

•Has attributes of its own beyond just a name
•Other entities need relationships to it
•Multiple instances of other entities will share the same reference
•Users want to query, report, or filter by it
•Has a lifecycle (created, modified, deleted)
•Appears as a foreign key reference pattern
•Requires enforced domain values

Signs It's an Attribute

•Only a name/value with no other properties
•Belongs to exactly one entity instance
•Free-form text without controlled values
•No need to query independently
•Changes when parent entity changes
•Never appears in relationship descriptions
•Simple descriptive property
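
The decision criteria and signs above could be turned into a rough tally, as in this sketch. The sign names, the `suggest_classification` function, and the cut-off values are all assumptions chosen for illustration; the real decision still needs judgment.

```python
ENTITY_SIGNS = {
    "has_own_attributes",
    "referenced_by_other_entities",
    "shared_by_multiple_instances",
    "queried_or_grouped_by",
    "can_have_multiple_values",
    "has_lifecycle",
    "needs_enforced_values",
}


def suggest_classification(observed: set[str]) -> str:
    """Count observed entity signs; the cut-offs are arbitrary assumptions."""
    unknown = observed - ENTITY_SIGNS
    if unknown:
        raise ValueError(f"unknown signs: {sorted(unknown)}")
    if len(observed) >= 3:
        return "entity"
    if not observed:
        return "attribute"
    return "review with stakeholders"


department = suggest_classification(
    {"has_own_attributes", "referenced_by_other_entities", "shared_by_multiple_instances"}
)
middle_name = suggest_classification(set())
phone_number = suggest_classification({"can_have_multiple_values"})
print(department, middle_name, phone_number, sep=" | ")
```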

The Evolving Boundary

The entity/attribute boundary isn't always fixed. As requirements evolve, attributes sometimes need to become entities:

Scenario: Initially, Employee has an attribute "Department" storing a text value.

Evolution: Later, requirements emerge to track department budgets, locations, and managers. Department must become an entity.

This is normal and expected. The goal during initial modeling is to make the best decision given current knowledge, while designing in a way that makes evolution feasible.
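
To make the Department scenario concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical, and a physical schema is used only to illustrate the conceptual change. The sketch stops after creating the new entity; turning `employee.department` into a foreign key would be a further step that is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_number TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT NOT NULL      -- initial design: plain text attribute
    );
    INSERT INTO employee VALUES
        ('E1', 'Ana', 'Sales'),
        ('E2', 'Ben', 'Sales'),
        ('E3', 'Chi', 'Finance');
""")

# Evolution: Department gains its own attributes, so it becomes an entity.
conn.executescript("""
    CREATE TABLE department (
        department_name TEXT PRIMARY KEY,
        budget REAL,
        location TEXT
    );
    INSERT INTO department (department_name)
        SELECT DISTINCT department FROM employee;
""")

departments = [row[0] for row in conn.execute(
    "SELECT department_name FROM department ORDER BY department_name")]
print(departments)  # ['Finance', 'Sales']
```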

Common Modeling Choice: Reference Entities

For controlled vocabularies (Status values, Country codes, Product categories), you have a choice:

Attribute with constraint: Store as text with a CHECK constraint limiting values
Reference entity (lookup table): Create a small entity containing valid values

Reference entities offer advantages:

Easier to add new values (insert row vs. alter constraint)
Can store additional information (full name, description)
Can be used in foreign key relationships (enforced integrity)
Appear in the ER model, making constraints visible

For simple, stable domains (Yes/No, Active/Inactive), attributes suffice. For richer or evolving domains (Countries, Status codes, Categories), entities are often better.
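
The two options can be compared side by side in a small sqlite3 sketch. All table names, codes, and the `rejected` helper are hypothetical. Note that SQLite enforces foreign keys only when `PRAGMA foreign_keys` is turned on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- Option 1: attribute with a CHECK constraint
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        state TEXT NOT NULL CHECK (state IN ('Active', 'Inactive'))
    );

    -- Option 2: reference entity (lookup table)
    CREATE TABLE order_status (
        code TEXT PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE sales_order (
        order_number TEXT PRIMARY KEY,
        status_code TEXT NOT NULL REFERENCES order_status(code)
    );
    INSERT INTO order_status VALUES ('NEW', 'Received'), ('SHIP', 'Shipped');
""")

# Adding a value to the reference entity is an ordinary insert.
conn.execute("INSERT INTO order_status VALUES ('HOLD', 'On hold')")
conn.execute("INSERT INTO sales_order VALUES ('SO-1', 'HOLD')")


def rejected(sql: str) -> bool:
    """Return True when the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


bad_check = rejected("INSERT INTO account VALUES ('A-1', 'Paused')")
bad_fk = rejected("INSERT INTO sales_order VALUES ('SO-2', 'LOST')")
print(bad_check, bad_fk)  # True True
```

Adding 'Paused' to the first option would mean changing the table definition, while the second option needs only another row.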

When in Doubt, Start as Entity

If you're genuinely uncertain whether something is an entity or attribute, tentatively model it as an entity. It's easier to demote an entity to an attribute during refinement than to promote an attribute to an entity after the model has evolved. Entities force you to think about the concept more carefully.

Weak Entities and Identifying Relationships

Not all entities are created equal. Some entities depend on other entities for their existence and identity. Understanding weak entities is essential for accurate modeling.

What is a Weak Entity?

A weak entity is an entity that:

Cannot exist independently—it depends on an owner entity (or strong entity)
Cannot be uniquely identified by its own attributes alone—it requires the key of its owner, forming a composite key

The relationship between a weak entity and its owner is called an identifying relationship.

Classic Example: Room and Building

"Room 101" is ambiguous—Room 101 in which building?
Room is identified by (BuildingID, RoomNumber)
A Room cannot exist without a Building
If a Building is deleted, its Rooms should be deleted

Room is a weak entity; Building is a strong entity.

Another Example: Order and LineItem

A LineItem (line 1, line 2, etc.) only makes sense within an Order
LineItem is identified by (OrderID, LineNumber)
LineItems don't transfer between orders

LineItem is weak; Order is strong.
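
The Room and Building example can be sketched with a composite key in sqlite3. The schema and sample data are hypothetical; the point is that the room number alone is not unique, while the owner's key combined with the partial key is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE building (
        building_id TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE room (
        building_id TEXT NOT NULL REFERENCES building(building_id),
        room_number TEXT NOT NULL,              -- partial key (discriminator)
        capacity INTEGER,
        PRIMARY KEY (building_id, room_number)  -- owner key + partial key
    );
    INSERT INTO building VALUES ('SCI', 'Science Hall'), ('LIB', 'Library');
    INSERT INTO room VALUES ('SCI', '101', 40), ('LIB', '101', 12);
""")

# "Room 101" exists in two buildings, so the number alone is ambiguous.
rooms_101 = conn.execute(
    "SELECT building_id FROM room WHERE room_number = '101' ORDER BY building_id"
).fetchall()
print(rooms_101)  # [('LIB',), ('SCI',)]

# The same room number within the same building is rejected.
try:
    conn.execute("INSERT INTO room VALUES ('SCI', '101', 25)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```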

Notation

In Chen-style ER diagrams:

Weak entities are shown with double-bordered rectangles
Identifying relationships are shown with double-bordered diamonds
The weak entity's partial key (discriminator) is shown with a dashed underline

Strong Entity vs. Weak Entity Comparison
Characteristic	Strong Entity	Weak Entity
Existence	Independent—can exist on its own	Dependent—requires owner entity
Identification	Complete key from own attributes	Partial key + owner's key
Deletion	Can be deleted independently	Deleted if owner is deleted
Notation	Single-bordered rectangle	Double-bordered rectangle
Relationship to owner	Regular relationship	Identifying relationship
Example	Customer, Product, Employee	Room (of Building), Dependent (of Employee)

Recognizing Weak Entities

Look for these patterns:

Composite natural keys: If the most natural identifier includes another entity's key, it's likely weak. (CourseSection identified by CourseID + SectionNumber)
Parent-child structures with no independent child identity: Line items, dependents, room numbers, version numbers.
Existence dependency: If deleting X should cascade to delete Y, Y might be weak. (Delete Order → Delete all its LineItems)
Scope-limited uniqueness: "Account number is unique within a bank"—Account might be weak (though often modeled as strong with composite key).

When Not to Create Weak Entities

Some guidelines:

If the entity can logically transfer between owners (a Product can change Categories), it's not weak.
If the entity has a globally unique identifier (ISBN, SSN, UUID), it's strong even if associated with another entity.
If deletion shouldn't cascade (removing a Department doesn't remove Employees), the dependent entity is strong.

Practical Modeling Decision

Weak entities add complexity to the model. Consider whether the weak entity designation is truly necessary:

If you give weak entities their own synthetic keys (auto-increment IDs), they become strong entities with foreign keys to their "former" owners.
This simplifies implementation but loses the semantic signal that these entities are conceptually dependent.

Choose based on whether the dependency is a fundamental aspect of the domain or an implementation convenience.

Cascading Implications

Weak entities imply cascading delete constraints in the physical schema. When the owner is deleted, all weak entity instances belonging to it are automatically deleted. Ensure this matches business requirements—some domains want dependent data preserved even when the 'parent' is removed.
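
A minimal sqlite3 sketch of the cascade, assuming a hypothetical Order and LineItem schema. If the business wants dependent data preserved, a different delete action would be chosen instead of `ON DELETE CASCADE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
conn.executescript("""
    CREATE TABLE sales_order (order_id TEXT PRIMARY KEY);
    CREATE TABLE line_item (
        order_id TEXT NOT NULL
            REFERENCES sales_order(order_id) ON DELETE CASCADE,
        line_number INTEGER NOT NULL,
        PRIMARY KEY (order_id, line_number)
    );
    INSERT INTO sales_order VALUES ('SO-1'), ('SO-2');
    INSERT INTO line_item VALUES ('SO-1', 1), ('SO-1', 2), ('SO-2', 1);
""")

# Deleting the owner removes its line items automatically.
conn.execute("DELETE FROM sales_order WHERE order_id = 'SO-1'")
remaining = conn.execute(
    "SELECT order_id, line_number FROM line_item ORDER BY order_id, line_number"
).fetchall()
print(remaining)  # [('SO-2', 1)]
```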

Common Entity Patterns Across Domains

While every domain is unique, certain entity patterns recur across industries. Recognizing these patterns accelerates entity discovery and ensures important constructs aren't overlooked.

Party Pattern

Many systems deal with parties—people, organizations, or other agents that can take on roles:

Person — Individual humans
Organization — Companies, departments, government bodies
Party — Generalization of Person and Organization
PartyRole — A party acting in a specific role (Customer, Supplier, Employee)

This pattern is powerful because the same party can have multiple roles (a company might be both Customer and Supplier), and roles can change without losing entity identity.
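
A compact sketch of the Party pattern using dataclasses. For brevity, roles are held as a set of strings rather than as a full PartyRole entity with its own attributes; that simplification, and all names below, are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    """Generalization of Person and Organization."""
    name: str
    roles: set[str] = field(default_factory=set)

    def add_role(self, role: str) -> None:
        self.roles.add(role)

    def remove_role(self, role: str) -> None:
        self.roles.discard(role)


@dataclass
class Person(Party):
    """An individual human."""


@dataclass
class Organization(Party):
    """A company, department, or government body."""


acme = Organization("Acme Ltd")
acme.add_role("Customer")
acme.add_role("Supplier")   # the same party now holds two roles
acme.remove_role("Supplier")  # a role ends; the party's identity is unchanged
print(acme.name, sorted(acme.roles))  # Acme Ltd ['Customer']
```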

Transaction/Event Pattern

Systems typically record business events:

Transaction — A meaningful business event (Order, Payment, Shipment)
TransactionLine — Details of a transaction (OrderLine, PaymentApplication)
TransactionStatus — State tracking for transaction lifecycle
TransactionType — Classification of transactions

Transactions are often central entities with many relationships, forming the "verbs" of the business model.

Product/Service Pattern

Product — Something sold or manufactured
ProductCategory — Classification hierarchy
ProductVariant — Size/color/configuration variations
Service — Intangible offerings
Offering — Generalization of Product/Service

Common Cross-Domain Entity Patterns
Pattern	Core Entities	Domain Examples	Key Insight
Party/Role	Party, Person, Organization, Role	CRM, ERP, HR systems	Separate identity from role—same person can be employee and customer
Product Hierarchy	Product, Category, Variant, SKU	Retail, Manufacturing	Products exist at multiple levels of specificity
Transaction/Event	Transaction, TransactionLine, Status	Financial, Order Management	Events are first-class entities with their own lifecycle
Location	Location, Address, Facility, Region	Logistics, Retail, Real Estate	Locations are often referenced by multiple entities
Document	Document, Version, Attachment	Legal, Medical, Publishing	Documents evolve—track versions explicitly
Schedule/Calendar	Event, Recurrence, TimeSlot, Booking	Hospitality, Healthcare, HR	Time-based reservation and availability

Location/Geographic Pattern

Address — Physical address (may be composite)
Location — Abstract location that might not have address
Facility — Physical structure (Warehouse, Store, Office)
GeoRegion — Named geographic area (Country, State, City)

Locations are often referenced by multiple entities and benefit from normalization.

Document/Content Pattern

Document — A container for content
Version — Specific revision of document
Attachment — Document associated with other entities
Template — Reusable document patterns

Temporal Pattern

When history matters:

EffectiveDated entities — Entities with ValidFrom and ValidTo dates
History/Audit entities — Shadow entities tracking changes
Snapshot entities — Point-in-time copies of data
Temporal relationships — Relationships that change over time
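
An effective-dated entity can be sketched as follows. The `PriceVersion` class, the sample prices, and the half-open interval convention (ValidFrom inclusive, ValidTo exclusive) are assumptions; other conventions are equally valid as long as they are applied consistently.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PriceVersion:
    product: str
    price: float
    valid_from: date
    valid_to: Optional[date] = None  # None means still in effect


def price_as_of(versions: list[PriceVersion], when: date) -> Optional[PriceVersion]:
    """Return the version whose [valid_from, valid_to) interval contains `when`."""
    for v in versions:
        if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
            return v
    return None


history = [
    PriceVersion("Widget", 9.50, date(2023, 1, 1), date(2024, 1, 1)),
    PriceVersion("Widget", 10.25, date(2024, 1, 1)),
]
mid_2023 = price_as_of(history, date(2023, 6, 15))
new_year = price_as_of(history, date(2024, 1, 1))
before_any = price_as_of(history, date(2022, 12, 31))
print(mid_2023.price, new_year.price, before_any)  # 9.5 10.25 None
```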

Status Pattern

Entities with lifecycle:

StatusCode — Reference entity with valid statuses
StatusHistory — Tracking status changes over time
Workflow state — Current processing stage

Using Patterns During Discovery

When analyzing a new domain:

Identify which high-level patterns apply
Check for the common entities from those patterns
Adapt pattern entities to domain-specific terminology
Validate with stakeholders that these entities matter in their context

Patterns as Starting Points

Don't force patterns onto domains where they don't fit. Patterns are starting points for discovery, not prescriptions. If stakeholders don't recognize pattern entities as relevant, the pattern may not apply. Always validate against actual domain requirements.

Entity Naming Conventions

Good naming is crucial for model clarity. Entity names should communicate precisely what the entity represents, using vocabulary that stakeholders recognize.

Core Naming Principles

Use Domain Vocabulary

Name entities using terms the business actually uses. If stakeholders call it a "Client," don't model it as "Customer" because that's more familiar to you. Consistent vocabulary reduces confusion.

Use Singular Nouns

Entity names should be singular: Customer (not Customers), Order (not Orders). We name the type, not the collection. An entity type has many instances, but the type itself is singular.

Be Specific

Avoid vague names. "Item" could mean anything—ProductItem, LineItem, InventoryItem? Choose names that distinguish entities clearly.

Avoid Technical Jargon

Names like "CustomerMaster", "OrderHeader", or "TransactionRecord" reveal implementation thinking. At the conceptual level, use business terms: Customer, Order, Transaction.

Be Consistent

Pick a naming pattern and stick with it:

PascalCase: CustomerOrder, ProductCategory
Underscore: Customer_Order, Product_Category
With or without prefixes: tblCustomer (avoid) vs Customer

Inconsistency creates confusion.

Entity Naming Checklist

•Singular noun: Customer, Order, Product (not plurals)
•Business vocabulary: Use terms stakeholders recognize
•Self-explanatory: Name should convey meaning without context
•Distinct: No two entities with easily confused names
•Easily pronounced: Avoid acronyms unless universally understood
•Consistent case: One style throughout (PascalCase common)
•No technical prefixes: No tbl, rec, or type indicators
•No reserved words: Avoid SQL keywords (Order vs SalesOrder)
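
Parts of this checklist can be checked mechanically. The sketch below is a rough heuristic only: the reserved-word and prefix lists are small illustrative subsets, the plural test is naive, and domain-standard acronyms such as SKU would be flagged by the PascalCase rule and need a manual exception.

```python
import re

# Small illustrative subsets, not complete lists.
RESERVED_WORDS = {"order", "user", "transaction", "group", "table"}
TECHNICAL_PREFIX = re.compile(r"^(tbl|rec)[A-Z]")
PASCAL_CASE = re.compile(r"^(?:[A-Z][a-z0-9]+)+$")


def naming_issues(name: str) -> list[str]:
    """Return checklist violations found by simple pattern checks."""
    issues = []
    if not PASCAL_CASE.match(name):
        issues.append("not PascalCase")
    if TECHNICAL_PREFIX.match(name):
        issues.append("technical prefix")
    if name.lower() in RESERVED_WORDS:
        issues.append("SQL reserved word")
    lowered = name.lower()
    # Naive plural check: ends in "s" but not "ss" or "us" (Address, Status).
    if lowered.endswith("s") and not lowered.endswith(("ss", "us")):
        issues.append("possibly plural")
    return issues


for candidate in ["SalesOrder", "Customers", "Order", "tblCustomer", "Address"]:
    print(candidate, naming_issues(candidate))
```

Checks like these catch mechanical slips; whether a name uses the right business vocabulary still has to be confirmed with stakeholders.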

Handling Naming Challenges

Same Concept, Different Names

Different departments may use different terms for the same thing:

Sales says "Client"
Support says "User"
Billing says "Customer"

Solution: Choose one canonical name for the entity and document the aliases. Create a glossary that maps terms to entities.

Ambiguous Terms

"Product" might mean different things:

A product definition (template)
A physical inventory item
A sellable unit (SKU)

Solution: Use qualified names to disambiguate: ProductDefinition, InventoryItem, SellableUnit.

Compound Names

Some entities naturally have compound names:

OrderLineItem (not just LineItem if context is unclear)
CustomerAddress (if Address means something different elsewhere)
EmployeeSkill (if modeling the assignment of skills to employees)

Keep compound names readable—usually 2-3 words maximum.

Abbreviations

Avoid abbreviations unless universally understood in the domain:

Cust instead of Customer — avoid
SKU (Stock Keeping Unit) — acceptable if domain-standard
HR (Human Resources) — acceptable in HR systems

Reserved Word Conflicts

Some good entity names conflict with SQL keywords:

Order → SalesOrder, CustomerOrder, PurchaseOrder
User → SystemUser, ApplicationUser
Transaction → BusinessTransaction

Don't let SQL conflicts force bad names—qualify appropriately.

Names Shape Understanding

Entity names aren't just labels—they shape how everyone thinks about the data. Poor names cause permanent confusion. An entity named 'Data' tells no one anything. An entity named 'CustomerInteractionRecord' tells everyone exactly what it represents. Invest time in naming.

Entity Identification Pitfalls

Even experienced modelers make entity identification mistakes. Awareness of common pitfalls helps avoid them.

Pitfall 1: Conflating Entities with Attributes

Symptom: An entity has almost no attributes except its identifier.

Example: Modeling "Color" as an entity with only ColorID and ColorName, when colors are just attribute values of Products.

Fix: If something is just a constrained set of values for another entity, it's either an attribute with constraints or a simple reference table—not a conceptual entity worth prominent modeling.

Pitfall 2: Missing Abstract Entities

Symptom: Model has only concrete, physical entities.

Problem: Missing entities like Agreement, Policy, Preference, Configuration—abstract things that aren't physical but still need tracking.

Fix: Ask stakeholders about rules, preferences, and policies. These often become entities.

Pitfall 3: Mega-Entities (Fat Entities)

Symptom: One entity has 50+ attributes covering many concerns.

Example: "Customer" with fields for contact info, billing, shipping, preferences, support history, and more.

Problem: The entity is doing too much. It's actually multiple entities bundled together.

Fix: Analyze attribute groupings. Split into Customer, CustomerContact, CustomerAddress, CustomerPreferences, etc.

Common Entity Identification Errors

•Too many entities — Every noun becomes an entity, creating clutter
•Too few entities — Mega-entities that bundle unrelated concepts
•Modeling the UI — "Screen," "Form," "Report" are not domain entities
•Modeling the system — "Interface," "Module," "Component" are implementation
•Actions as entities — "Create," "Delete," "Update" are operations, not entities (unless the business needs to record them as events)
•Duplicate entities — Same concept modeled twice with different names
•Missing event entities — Transactions, requests, sessions often overlooked
•Implementation leakage — "AuditLog," "BackupTable" are physical, not conceptual

Pitfall 4: Modeling the Application, Not the Domain

Symptom: Entities match application screens or modules rather than business concepts.

Example: "MaintenanceScreen" or "ReportModule" as entities.

Problem: You're modeling the software, not the data. Software changes; domain truths persist.

Fix: Ask "What business thing does this screen display?" Model that thing.

Pitfall 5: Ignoring Future Evolution

Symptom: Model handles current use cases but can't accommodate obvious future needs.

Example: Modeling PhoneNumber as a single field when a customer might obviously have multiple numbers.

Fix: Consider likely evolution when making entity/attribute decisions. Model multivalued attributes appropriately.

Pitfall 6: Premature Synthetic Keys

Symptom: Every entity immediately gets an "ID" attribute.

Problem: At the conceptual level, the focus should be on natural identifiers that the business actually uses, such as CustomerEmail or CustomerNumber, rather than a generic CustomerID.

Fix: Model what identifies things in the business domain. Synthetic keys are implementation decisions.

Pitfall 7: Failing to Model Events/Transactions

Symptom: Model has static entities (Customer, Product) but no event entities.

Problem: Most systems need to track events—orders, payments, requests, sessions.

Fix: Ask "What happens in this system?" not just "What things exist?" Events are entities too.

Review Against Requirements

The ultimate test: Can every piece of data mentioned in the requirements be traced to an entity and attribute in the model? If requirements mention 'customer phone numbers' and you can't find where that's stored, you have a gap. Map requirements to model elements as a verification step.
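
The mapping step could be sketched as a simple lookup. The model contents, the trace entries, and the `find_gaps` function are hypothetical; in practice the trace would be built from the actual requirements document.

```python
# Hypothetical model: entity name -> attribute names.
model = {
    "Customer": {"Name", "Email"},
    "Order": {"OrderNumber", "OrderDate", "Total"},
}

# Hypothetical trace: each data item in the requirements -> (entity, attribute).
requirement_trace = {
    "customer name": ("Customer", "Name"),
    "order total": ("Order", "Total"),
    "customer phone numbers": ("Customer", "PhoneNumber"),
}


def find_gaps(model: dict[str, set[str]],
              trace: dict[str, tuple[str, str]]) -> list[str]:
    """Return requirement items whose target attribute is missing from the model."""
    return sorted(
        item
        for item, (entity, attribute) in trace.items()
        if attribute not in model.get(entity, set())
    )


gaps = find_gaps(model, requirement_trace)
print(gaps)  # ['customer phone numbers']
```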

Summary: Entity Identification Mastery

Entity identification is perhaps the most judgment-intensive aspect of conceptual design. We've explored systematic techniques, decision criteria, patterns, and pitfalls. Let's consolidate these insights:

Key Takeaways

•Entities have four characteristics — Independent existence, distinct identity, multiple instances, and relevant attributes.
•Multiple discovery techniques exist — Noun analysis, form mining, process analysis, stakeholder interviews, system reverse engineering.
•Entity vs. attribute is a key decision — Consider relationships, attributes of its own, query needs, and multiplicity.
•Weak entities depend on owner entities — They can't exist or be identified independently; use double-bordered notation.
•Cross-domain patterns accelerate discovery — Party, Transaction, Product, Location patterns recur across industries.
•Naming requires discipline — Singular, domain-appropriate, consistent, and unambiguous names.
•Common pitfalls await the unwary — Mega-entities, missing events, modeling the application, premature implementation details.

What's Next:

With entities identified, we must now discover and model relationships—the associations that connect entities into a coherent data model. Relationship identification has its own techniques, patterns, and pitfalls. In the next page, we'll examine how to systematically discover relationships, determine their cardinality, and capture the semantic meaning of how entities interact.

You now have a comprehensive toolkit for entity identification—from initial discovery through refinement and validation. You understand the entity/attribute distinction, can recognize weak entities, leverage common patterns, and avoid frequent mistakes. Next, we'll apply equal rigor to relationship identification.