In conceptual database design, entity identification is the foundational activity upon which everything else depends. Get the entities right, and relationships and attributes flow naturally. Get them wrong, and the entire model—and everything built upon it—will struggle to represent reality accurately.
Yet entity identification is far from mechanical. It requires judgment, domain understanding, and the ability to distinguish between surface-level descriptions and fundamental data concepts. A requirements document might mention "customer information, order tracking, and inventory management"—but how do we translate these loose phrases into crisp entities?
This page equips you with systematic techniques for discovering entities, criteria for evaluating candidates, and the wisdom to avoid common identification mistakes.
By the end of this page, you will be able to systematically identify entities from requirements documents, interviews, and existing systems. You'll understand what makes something an entity versus an attribute, how to handle ambiguous cases, and common patterns that help guide entity discovery across diverse domains.
Before we can identify entities, we need clear criteria for what qualifies as one. An entity is something about which the organization wishes to store data—but that definition alone is insufficient for practical decision-making.
The Four Essential Characteristics
An entity typically exhibits four characteristics:
1. Independent Existence
An entity has meaning on its own, independent of other constructs. We can discuss "a customer" or "an order" without necessarily referencing other entities. Contrast this with an attribute like "order date"—we can't meaningfully discuss an order date without reference to the order it belongs to.
Question to ask: Can I describe an instance of this thing without first describing something else?
2. Distinct Identity
We can distinguish different instances of an entity from each other. There might be thousands of customers, but each is distinguishable—typically by some identifier.
Question to ask: Can I tell one instance from another? What makes them different?
3. Multiple Instances Expected
Entities generally have multiple instances in the domain. If only one instance ever exists, it might be data about the organization itself (configuration, settings) rather than an entity to model.
Question to ask: Will there be many of these? Do stakeholders refer to 'an X' and 'another X'?
4. Relevant Attributes
Entities have properties worth recording. An entity with no meaningful attributes (other than perhaps an identifier) may not be worth modeling as a separate entity.
Question to ask: What information do we need to store about this thing?
| Candidate | Independent Existence? | Distinct Identity? | Multiple Instances? | Has Attributes? | Likely Entity? |
|---|---|---|---|---|---|
| Customer | Yes | Yes (ID, Email) | Yes (many customers) | Yes (Name, Address, etc.) | ✓ Yes |
| Order | Yes | Yes (Order Number) | Yes (many orders) | Yes (Date, Status, Total) | ✓ Yes |
| OrderDate | No (belongs to Order) | N/A | N/A | No | ✗ Attribute |
| ShippingAddress | Debatable | Possibly | Yes (repeated) | Yes (Street, City, etc.) | ? Needs analysis |
| Country | Yes | Yes (Code or Name) | Yes (many countries) | Yes (Name, Code) | ✓ Yes (reference data) |
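The reasoning in the table can be written down as a small checklist function. This is a minimal sketch, assuming each answer is recorded as yes (`True`), no (`False`), or debatable (`None`). The `Candidate` class and `classify` function are hypothetical names, and the rule that any non-yes answer means "needs analysis" is our simplification of the table, not a definitive algorithm.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    """Answers to the four questions; None means debatable or not yet known."""
    name: str
    independent: Optional[bool]
    distinct_identity: Optional[bool]
    multiple_instances: Optional[bool]
    has_attributes: Optional[bool]

def classify(c: Candidate) -> str:
    """Apply the four-characteristic test (simplified, hypothetical rule)."""
    answers = [c.independent, c.distinct_identity,
               c.multiple_instances, c.has_attributes]
    if c.independent is False:
        # Cannot be described without its owner: treat it as an attribute.
        return "attribute"
    if all(a is True for a in answers):
        return "entity"
    # Any debatable or negative answer means a human has to look closer.
    return "needs analysis"

candidates = [
    Candidate("Customer", True, True, True, True),
    Candidate("OrderDate", False, None, None, False),
    Candidate("ShippingAddress", None, None, True, True),
    Candidate("Country", True, True, True, True),
]
for c in candidates:
    print(c.name, "->", classify(c))
```

The function deliberately returns "needs analysis" rather than guessing for rows like ShippingAddress; the point is to surface the cases that need a conversation with stakeholders.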
A practical test: Can you point at a real-world instance and say 'That's a [entity name]'? 'That's a customer' (the person John Smith). 'That's an order' (the paper in my hand or record on screen). If you can't point at an instance, it might not be an entity.
Entity identification is both art and discipline. Experienced modelers develop intuition, but systematic techniques provide reliable starting points.
Technique 1: Noun Analysis
The most direct approach is extracting nouns from requirements documents, interview transcripts, and business documents.
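A crude first pass at the extraction step can be automated by counting frequent words in a requirements text. The sketch below is only an approximation of noun analysis: it uses a tiny, hypothetical stopword list instead of a real part-of-speech tagger, so verbs can slip through. `candidate_terms` and `min_count` are illustrative names, not part of any tool.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (an assumption for this sketch). A real
# pass would use a part-of-speech tagger so that only nouns survive.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "for", "each",
             "one", "more", "with", "is", "are", "can", "must", "be"}

def candidate_terms(text: str, min_count: int = 2) -> list[tuple[str, int]]:
    """Return words seen at least min_count times, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [(w, n) for w, n in counts.most_common() if n >= min_count]

requirements = (
    "A customer places an order. Each order lists one or more products. "
    "The customer pays for the order with a payment."
)
print(candidate_terms(requirements))  # [('order', 3), ('customer', 2)]
```

With `min_count=1`, verbs such as "places" and "pays" would also appear, and "products" (mentioned once) would only show up at that threshold. The output is a list of candidates for human review, never a final list of entities.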
Process:
Not every noun becomes an entity. Filter out:
Technique 2: Form/Report Mining
Existing forms, reports, and documents reveal what data the organization already tracks.
For each form/report:
For example, a "Customer Order Form" has Customer as its main subject (the block of customer fields), Order as a second entity (the block of order fields), and Line Items as a group of fields repeated once per product ordered (probably an entity in its own right).
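The shape of a form's fields is itself a clue, and that clue can be checked mechanically. This minimal sketch assumes a form has been transcribed into a nested dictionary (the `customer_order_form` contents and the `classify_form_fields` helper are hypothetical): a named block of fields suggests a candidate entity, a repeated group suggests an entity, and a lone value suggests an attribute.

```python
from typing import Any

def classify_form_fields(form: dict[str, Any]) -> dict[str, str]:
    """Label each top-level section of a form by how its fields are shaped."""
    labels: dict[str, str] = {}
    for section, value in form.items():
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # The same group of fields repeats: probably its own entity.
            labels[section] = "repeated group (probably an entity)"
        elif isinstance(value, dict):
            # A named block of related fields: candidate entity.
            labels[section] = "field group (candidate entity)"
        else:
            # A lone value: most likely an attribute of something else.
            labels[section] = "single field (likely attribute)"
    return labels

# Hypothetical transcription of a "Customer Order Form".
customer_order_form = {
    "customer": {"name": "John Smith", "email": "john@example.com"},
    "order": {"order_number": "SO-1001", "order_date": "2024-03-01"},
    "line_items": [
        {"product": "Widget", "quantity": 2},
        {"product": "Gadget", "quantity": 1},
    ],
    "notes": "Leave at reception",
}
for section, label in classify_form_fields(customer_order_form).items():
    print(f"{section}: {label}")
```

A lone field such as `notes` still has to be assigned to an owner (here, most plausibly Order); the heuristic only says it is unlikely to be an entity.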
Technique 3: Process/Workflow Analysis
Business processes involve things. Tracing workflows reveals what entities are created, modified, or consumed.
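One common way to record this tracing is a create/read/update/delete (CRUD) matrix. The sketch below assumes a hypothetical set of processes and their entity usage; `crud_matrix` pivots it into an entity-by-process view, and `never_created` flags entities that no process creates, which usually means a process (or a data source) has not been discussed yet.

```python
# Hypothetical process descriptions: for each business process, the entities it
# Creates (C), Reads (R), Updates (U), or Deletes (D).
processes: dict[str, dict[str, str]] = {
    "Register customer": {"Customer": "C"},
    "Place order": {"Order": "C", "LineItem": "C", "Customer": "R", "Product": "R"},
    "Ship order": {"Order": "U", "Shipment": "C", "Product": "U"},
}

def crud_matrix(processes: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Pivot process -> entity usage into entity -> process usage."""
    matrix: dict[str, dict[str, str]] = {}
    for process, uses in processes.items():
        for entity, operation in uses.items():
            matrix.setdefault(entity, {})[process] = operation
    return matrix

def never_created(matrix: dict[str, dict[str, str]]) -> list[str]:
    """Entities that no process creates: their origin is still unexplained."""
    return sorted(e for e, uses in matrix.items() if "C" not in uses.values())

matrix = crud_matrix(processes)
for entity, uses in sorted(matrix.items()):
    print(f"{entity:10} {uses}")
print("never created:", never_created(matrix))
```

Here Product is read and updated but never created, which prompts the question "who adds products to the catalog?" and may uncover a process and further entities.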
For each business process:
Technique 4: Stakeholder Role Analysis
Different stakeholders see different entities based on their roles:
Interview stakeholders from different roles to get comprehensive entity coverage.
Technique 5: Existing System Reverse Engineering
If you are replacing an existing system, its data structures reveal entities (though perhaps imperfectly):
Caution: Don't blindly copy old structures. Old systems contain mistakes and workarounds. Use them as input to verify against fresh domain analysis.
No single technique is complete. Use multiple techniques and compare results. Entities that appear from multiple sources are likely correct. Entities appearing from only one source need careful validation.
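Comparing results across techniques is simple bookkeeping. This minimal sketch assumes each technique has produced a set of candidate names (the `sources` contents are hypothetical) and treats "found by fewer than two sources" as the trigger for extra validation; the threshold is an illustrative choice.

```python
from collections import defaultdict

# Hypothetical candidate lists produced by three of the techniques above.
sources: dict[str, set[str]] = {
    "noun analysis": {"Customer", "Order", "Product", "Warehouse"},
    "form mining": {"Customer", "Order", "LineItem", "Product"},
    "process analysis": {"Customer", "Order", "Shipment"},
}

def tally_candidates(sources: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each candidate entity to the sorted list of sources that found it."""
    found_in: dict[str, list[str]] = defaultdict(list)
    for source, candidates in sources.items():
        for name in candidates:
            found_in[name].append(source)
    return {name: sorted(srcs) for name, srcs in found_in.items()}

def needs_validation(tally: dict[str, list[str]], min_sources: int = 2) -> list[str]:
    """Candidates seen by fewer than min_sources techniques."""
    return sorted(name for name, srcs in tally.items() if len(srcs) < min_sources)

tally = tally_candidates(sources)
print(needs_validation(tally))  # ['LineItem', 'Shipment', 'Warehouse']
```

A single-source candidate is not wrong by default; LineItem, for instance, is easy to miss in interviews but obvious on forms. The tally only tells you where to spend validation effort.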
One of the most challenging decisions in conceptual modeling is determining whether something is an entity or an attribute. The distinction has profound implications for how the database will represent and evolve the domain.
The Fundamental Difference
Customer is an entity. CustomerName is an attribute of Customer—we don't track "names" independent of who has that name.
Decision Criteria
When in doubt, consider these questions:
Does it have attributes of its own?
If "Department" has DepartmentName, DepartmentLocation, DepartmentBudget, and DepartmentHead, it's probably an entity. If you just need to store a department name as a text field for employees, it might be an attribute.
Will there be relationships to it?
If other entities need to relate to this thing, it's probably an entity. If multiple employees work in the same department and we need to reference that shared department, Department is an entity.
Is it referenced by name/value multiple times?
If the same value appears in multiple places (same department name for multiple employees), consider making it an entity to avoid redundancy.
Will queries/reports group by it?
If business users want to see "orders by country" or "sales by product category," Country and ProductCategory are likely entities (or at least reference tables).
Could it have multiple values for one entity instance?
If a customer can have multiple phone numbers, PhoneNumber might need to be a separate entity (or multivalued attribute) rather than a simple attribute.
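Because these are judgment prompts rather than a scoring rule, a sketch that encodes them should return reasons, not a verdict. The field names of `ConceptAnswers` and the `entity_signals` helper are hypothetical; an empty result means nothing argues for an entity and a plain attribute is probably enough.

```python
from dataclasses import dataclass

@dataclass
class ConceptAnswers:
    """Answers to the five decision questions for one concept."""
    has_own_attributes: bool
    is_related_to: bool
    value_repeats: bool
    is_grouped_by: bool
    multivalued: bool

REASONS = {
    "has_own_attributes": "has attributes of its own",
    "is_related_to": "other entities relate to it",
    "value_repeats": "same value repeated in many places",
    "is_grouped_by": "queries/reports group by it",
    "multivalued": "one instance can have several values",
}

def entity_signals(answers: ConceptAnswers) -> list[str]:
    """List the reasons that point toward modeling the concept as an entity."""
    return [text for name, text in REASONS.items() if getattr(answers, name)]

# Department with a budget, a location, and many employees referencing it.
print(entity_signals(ConceptAnswers(True, True, True, False, False)))
# Department as a bare label nobody queries or shares.
print(entity_signals(ConceptAnswers(False, False, False, False, False)))
```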
The Evolving Boundary
The entity/attribute boundary isn't always fixed. As requirements evolve, attributes sometimes need to become entities:
Scenario: Initially, Employee has an attribute "Department" storing a text value.
Evolution: Later, requirements emerge to track department budgets, locations, and managers. Department must become an entity.
This is normal and expected. The goal during initial modeling is to make the best decision given current knowledge, while designing in a way that makes evolution feasible.
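The Department scenario can be sketched as a before/after pair of record types plus a migration that promotes the text value into a shared entity. All class and function names here (`EmployeeV1`, `EmployeeV2`, `promote_department`) are hypothetical; the sketch shows why the promotion also removes the redundancy of repeating the same department name on many employees.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EmployeeV1:
    """Before: the department is only a text value on each employee."""
    name: str
    department: str

@dataclass
class Department:
    """After: Department is an entity with attributes of its own."""
    name: str
    budget: Optional[float] = None   # new attributes start unknown
    location: Optional[str] = None

@dataclass
class EmployeeV2:
    name: str
    department: Department           # reference to a shared Department

def promote_department(employees: list[EmployeeV1]) -> list[EmployeeV2]:
    """Replace repeated department names with one shared Department each."""
    departments: dict[str, Department] = {}
    promoted: list[EmployeeV2] = []
    for e in employees:
        if e.department not in departments:
            departments[e.department] = Department(e.department)
        promoted.append(EmployeeV2(e.name, departments[e.department]))
    return promoted

staff = [EmployeeV1("Ana", "Sales"), EmployeeV1("Ben", "Sales"), EmployeeV1("Cy", "IT")]
migrated = promote_department(staff)
migrated[0].department.budget = 250_000.0   # recorded once, visible to both Sales staff
print(migrated[1].department)
```

In practice, the hard part of such a migration is reconciling spelling variants ("Sales" vs. "sales dept"), which this sketch would turn into separate departments.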
Common Modeling Choice: Reference Entities
For controlled vocabularies (Status values, Country codes, Product categories), you have a choice:
Reference entities offer advantages:
For simple, stable domains (Yes/No, Active/Inactive), attributes suffice. For richer or evolving domains (Countries, Status codes, Categories), entities are often better.
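Both options can be seen side by side in a small schema. This sketch uses Python's built-in `sqlite3` module with hypothetical table and column names: customer status is a constrained attribute (a `CHECK` list), while country is a reference entity that customers point to with a foreign key. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
-- Reference entity: a country has attributes of its own and the list can grow.
CREATE TABLE country (
    code TEXT PRIMARY KEY,   -- e.g. an ISO 3166-1 alpha-2 code
    name TEXT NOT NULL
);
CREATE TABLE customer (
    customer_number TEXT PRIMARY KEY,
    -- Simple, stable domain: a constrained attribute is enough.
    status TEXT NOT NULL CHECK (status IN ('Active', 'Inactive')),
    -- Richer domain: reference the Country entity instead of free text.
    country_code TEXT NOT NULL REFERENCES country(code)
);
""")
conn.executemany("INSERT INTO country VALUES (?, ?)",
                 [("DE", "Germany"), ("JP", "Japan")])
conn.execute("INSERT INTO customer VALUES ('C-1', 'Active', 'DE')")

def accepts(sql: str) -> bool:
    """True if the statement succeeds, False if it violates a constraint."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

print(accepts("INSERT INTO customer VALUES ('C-2', 'Dormant', 'DE')"))  # False: bad status
print(accepts("INSERT INTO customer VALUES ('C-3', 'Active', 'XX')"))   # False: unknown country

# Reports can group by a real entity, e.g. customers per country name.
rows = conn.execute("""
    SELECT country.name, COUNT(*) FROM customer
    JOIN country ON country.code = customer.country_code
    GROUP BY country.name
""").fetchall()
print(rows)
```

Adding a new country is a data change (one `INSERT`), while adding a new status value requires a schema change to the `CHECK` list; that asymmetry is the practical argument for reference entities in evolving domains.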
If you're genuinely uncertain whether something is an entity or attribute, tentatively model it as an entity. It's easier to demote an entity to an attribute during refinement than to promote an attribute to an entity after the model has evolved. Entities force you to think about the concept more carefully.
Not all entities are created equal. Some entities depend on other entities for their existence and identity. Understanding weak entities is essential for accurate modeling.
What is a Weak Entity?
A weak entity is an entity that:
The relationship between a weak entity and its owner is called an identifying relationship.
Classic Example: Room and Building
Room is a weak entity; Building is a strong entity.
Another Example: Order and LineItem
LineItem is weak; Order is strong.
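In a relational schema, a weak entity's partial key is combined with its owner's key to form the primary key. The sketch below uses Python's built-in `sqlite3` with hypothetical table names: room number "101" may exist in two buildings, but not twice in the same building.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE building (
    building_code TEXT PRIMARY KEY
);
-- Room is weak: room_number is only a partial key, unique within a building.
CREATE TABLE room (
    building_code TEXT NOT NULL REFERENCES building(building_code),
    room_number   TEXT NOT NULL,
    capacity      INTEGER,
    PRIMARY KEY (building_code, room_number)
);
""")
conn.executemany("INSERT INTO building VALUES (?)", [("SCI",), ("ART",)])

# The same room number may appear in different buildings...
conn.executemany("INSERT INTO room VALUES (?, ?, ?)",
                 [("SCI", "101", 40), ("ART", "101", 25)])

# ...but not twice in the same building.
try:
    conn.execute("INSERT INTO room VALUES ('SCI', '101', 30)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```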
Notation
In ER diagrams:
| Characteristic | Strong Entity | Weak Entity |
|---|---|---|
| Existence | Independent—can exist on its own | Dependent—requires owner entity |
| Identification | Complete key from own attributes | Partial key + owner's key |
| Deletion | Can be deleted independently | Deleted if owner is deleted |
| Notation | Single-bordered rectangle | Double-bordered rectangle |
| Relationship to owner | Regular relationship | Identifying relationship |
| Example | Customer, Product, Employee | Room (of Building), Dependent (of Employee) |
Recognizing Weak Entities
Look for these patterns:
Composite natural keys: If the most natural identifier includes another entity's key, it's likely weak. (CourseSection identified by CourseID + SectionNumber)
Parent-child structures with no independent child identity: Line items, dependents, room numbers, version numbers.
Existence dependency: If deleting X should cascade to delete Y, Y might be weak. (Delete Order → Delete all its LineItems)
Scope-limited uniqueness: "Account number is unique within a bank"—Account might be weak (though often modeled as strong with composite key).
When Not to Create Weak Entities
Some guidelines:
If the entity can logically transfer between owners (a Product can change Categories), it's not weak.
If the entity has a globally unique identifier (ISBN, SSN, UUID), it's strong even if associated with another entity.
If deletion shouldn't cascade (removing a Department doesn't remove Employees), the dependent entity is strong.
Practical Modeling Decision
Weak entities add complexity to the model. Consider whether the weak entity designation is truly necessary:
If you give weak entities their own synthetic keys (auto-increment IDs), they become strong entities with foreign keys to their "former" owners.
This simplifies implementation but loses the semantic signal that these entities are conceptually dependent.
Choose based on whether the dependency is a fundamental aspect of the domain or an implementation convenience.
Weak entities imply cascading delete constraints in the physical schema. When the owner is deleted, all weak entity instances belonging to it are automatically deleted. Ensure this matches business requirements—some domains want dependent data preserved even when the 'parent' is removed.
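The difference between a weak entity and a merely associated strong entity shows up directly in the delete rule. This sketch uses Python's built-in `sqlite3` with hypothetical table names: line items cascade away with their order, while an employee's foreign key uses `ON DELETE RESTRICT` so that deleting a department is blocked rather than silently removing employees. The order table is named `customer_order` because `ORDER` is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE actions in SQLite
conn.executescript("""
-- "order" is an SQL keyword, so the table name is qualified.
CREATE TABLE customer_order (
    order_number TEXT PRIMARY KEY
);
-- Weak entity: identified by its order plus a line number; dies with the order.
CREATE TABLE line_item (
    order_number TEXT NOT NULL
        REFERENCES customer_order(order_number) ON DELETE CASCADE,
    line_number  INTEGER NOT NULL,
    quantity     INTEGER NOT NULL,
    PRIMARY KEY (order_number, line_number)
);
CREATE TABLE department (
    dept_code TEXT PRIMARY KEY
);
-- Strong entity: employees must not disappear when a department is removed.
CREATE TABLE employee (
    employee_number TEXT PRIMARY KEY,
    dept_code TEXT REFERENCES department(dept_code) ON DELETE RESTRICT
);
""")
conn.execute("INSERT INTO customer_order VALUES ('SO-1')")
conn.executemany("INSERT INTO line_item VALUES ('SO-1', ?, ?)", [(1, 2), (2, 5)])
conn.execute("INSERT INTO department VALUES ('HR')")
conn.execute("INSERT INTO employee VALUES ('E-1', 'HR')")

conn.execute("DELETE FROM customer_order WHERE order_number = 'SO-1'")
print(conn.execute("SELECT COUNT(*) FROM line_item").fetchone()[0])  # 0: cascaded

try:
    conn.execute("DELETE FROM department WHERE dept_code = 'HR'")
except sqlite3.IntegrityError as err:
    print("department delete blocked:", err)
```

If the business wants dependent data preserved, `RESTRICT` (or reassigning the children first) is the safer default; `CASCADE` belongs only where the dependency is genuinely existential.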
While every domain is unique, certain entity patterns recur across industries. Recognizing these patterns accelerates entity discovery and ensures important constructs aren't overlooked.
Party Pattern
Many systems deal with parties—people, organizations, or other agents that can take on roles:
This pattern is powerful because the same party can have multiple roles (a company might be both Customer and Supplier), and roles can change without losing entity identity.
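A minimal sketch of the idea, assuming roles are a simple set on each party. In a fuller model, a role assignment is often an entity of its own (with start and end dates, for example); the `Party` and `Role` names and the `party_number` identifier are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

class Role(Enum):
    CUSTOMER = "customer"
    SUPPLIER = "supplier"
    EMPLOYEE = "employee"

@dataclass
class Party:
    """A person or organization; identity is independent of any role."""
    party_number: str
    name: str
    kind: str                              # "person" or "organization"
    roles: set[Role] = field(default_factory=set)

acme = Party("P-1", "Acme Ltd", "organization")
acme.roles.add(Role.CUSTOMER)
acme.roles.add(Role.SUPPLIER)       # same party, second role
acme.roles.discard(Role.CUSTOMER)   # role ends; the party itself is unchanged

jane = Party("P-2", "Jane Doe", "person", {Role.EMPLOYEE, Role.CUSTOMER})
parties = [acme, jane]
suppliers = [p.name for p in parties if Role.SUPPLIER in p.roles]
print(suppliers)
```

Because Acme's identity lives in `Party`, ending its customer role does not disturb anything that refers to Acme as a supplier.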
Transaction/Event Pattern
Systems typically record business events:
Transactions are often central entities with many relationships, forming the "verbs" of the business model.
Product/Service Pattern
| Pattern | Core Entities | Domain Examples | Key Insight |
|---|---|---|---|
| Party/Role | Party, Person, Organization, Role | CRM, ERP, HR systems | Separate identity from role—same person can be employee and customer |
| Product Hierarchy | Product, Category, Variant, SKU | Retail, Manufacturing | Products exist at multiple levels of specificity |
| Transaction/Event | Transaction, TransactionLine, Status | Financial, Order Management | Events are first-class entities with their own lifecycle |
| Location | Location, Address, Facility, Region | Logistics, Retail, Real Estate | Locations are often referenced by multiple entities |
| Document | Document, Version, Attachment | Legal, Medical, Publishing | Documents evolve—track versions explicitly |
| Schedule/Calendar | Event, Recurrence, TimeSlot, Booking | Hospitality, Healthcare, HR | Time-based reservation and availability |
Location/Geographic Pattern
Locations are often referenced by multiple entities and benefit from normalization.
Document/Content Pattern
Temporal Pattern
When history matters:
Status Pattern
Entities with lifecycle:
Using Patterns During Discovery
When analyzing a new domain:
Don't force patterns onto domains where they don't fit. Patterns are starting points for discovery, not prescriptions. If stakeholders don't recognize pattern entities as relevant, the pattern may not apply. Always validate against actual domain requirements.
Good naming is crucial for model clarity. Entity names should communicate precisely what the entity represents, using vocabulary that stakeholders recognize.
Core Naming Principles
Use Domain Vocabulary
Name entities using terms the business actually uses. If stakeholders call it a "Client," don't model it as "Customer" because that's more familiar to you. Consistent vocabulary reduces confusion.
Use Singular Nouns
Entity names should be singular: Customer (not Customers), Order (not Orders). We name the type, not the collection. An entity type has many instances, but the type itself is singular.
Be Specific
Avoid vague names. "Item" could mean anything—ProductItem, LineItem, InventoryItem? Choose names that distinguish entities clearly.
Avoid Technical Jargon
Names like "CustomerMaster", "OrderHeader", or "TransactionRecord" reveal implementation thinking. At the conceptual level, use business terms: Customer, Order, Transaction.
Be Consistent
Pick a naming pattern and stick with it:
Inconsistency creates confusion.
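Once a pattern is chosen, simple checks can catch drift in a growing model. This sketch assumes the chosen pattern is singular PascalCase (matching names such as CustomerName used in this page) and flags a small, illustrative list of technical suffixes taken from the examples in this section and the pitfalls below. The plural check is a heuristic: it will wrongly flag names like "Status".

```python
import re

# Illustrative suffixes that suggest implementation thinking (an assumption).
TECHNICAL_SUFFIXES = ("Master", "Header", "Screen", "Module")

def naming_issues(name: str) -> list[str]:
    """Return heuristic warnings for a proposed entity name."""
    issues: list[str] = []
    if not re.fullmatch(r"(?:[A-Z][a-z0-9]*)+", name):
        issues.append("not PascalCase")
    if name.endswith("s") and not name.endswith("ss"):
        issues.append("may be plural")
    for suffix in TECHNICAL_SUFFIXES:
        if name.endswith(suffix) and name != suffix:
            issues.append(f"technical suffix '{suffix}'")
    return issues

for proposed in ["Customer", "Customers", "OrderHeader", "customer_order", "Address"]:
    print(proposed, naming_issues(proposed))
```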
Handling Naming Challenges
Same Concept, Different Names
Different departments may use different terms for the same thing:
Solution: Choose one canonical name for the entity and document the aliases. Create a glossary that maps terms to entities.
Ambiguous Terms
"Product" might mean different things:
Solution: Use qualified names to disambiguate: ProductDefinition, InventoryItem, SellableUnit.
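A glossary that maps stakeholder terms to entities also exposes ambiguity automatically: any term mapped to more than one entity needs a qualified name. The entries below are hypothetical ("Account" as a customer alias is only an example), and `build_glossary` and `ambiguous_terms` are illustrative helpers.

```python
from collections import defaultdict

# Hypothetical glossary entries: (term used by stakeholders, canonical entity).
entries = [
    ("Client", "Customer"),
    ("Account", "Customer"),
    ("Customer", "Customer"),
    ("Product", "ProductDefinition"),
    ("Product", "InventoryItem"),
    ("Product", "SellableUnit"),
    ("SKU", "SellableUnit"),
]

def build_glossary(entries: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each term to the set of entities it is used for."""
    glossary: dict[str, set[str]] = defaultdict(set)
    for term, entity in entries:
        glossary[term].add(entity)
    return glossary

def ambiguous_terms(glossary: dict[str, set[str]]) -> dict[str, list[str]]:
    """Terms that map to more than one entity and need qualified names."""
    return {t: sorted(es) for t, es in glossary.items() if len(es) > 1}

glossary = build_glossary(entries)
print(ambiguous_terms(glossary))
```

Aliases such as "Client" resolve cleanly to one entity; "Product" resolves to three, which is exactly the case the qualified names above are meant to settle.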
Compound Names
Some entities naturally have compound names:
Keep compound names readable—usually 2-3 words maximum.
Abbreviations
Avoid abbreviations unless universally understood in the domain:
Reserved Word Conflicts
Some good entity names conflict with SQL keywords:
Don't let SQL conflicts force bad names—qualify appropriately.
Entity names aren't just labels; they shape how everyone thinks about the data. Poor names cause permanent confusion. An entity named 'Data' tells no one anything. An entity named 'CustomerInteraction' tells everyone exactly what it represents. Invest time in naming.
Even experienced modelers make entity identification mistakes. Awareness of common pitfalls helps avoid them.
Pitfall 1: Conflating Entities with Attributes
Symptom: An entity has almost no attributes except its identifier.
Example: Modeling "Color" as an entity with only ColorID and ColorName, when colors are just attribute values of Products.
Fix: If something is just a constrained set of values for another entity, it's either an attribute with constraints or a simple reference table—not a conceptual entity worth prominent modeling.
Pitfall 2: Missing Abstract Entities
Symptom: Model has only concrete, physical entities.
Problem: Missing entities like Agreement, Policy, Preference, Configuration—abstract things that aren't physical but still need tracking.
Fix: Ask stakeholders about rules, preferences, and policies. These often become entities.
Pitfall 3: Mega-Entities (Fat Entities)
Symptom: One entity has 50+ attributes covering many concerns.
Example: "Customer" with fields for contact info, billing, shipping, preferences, support history, and more.
Problem: The entity is doing too much. It's actually multiple entities bundled together.
Fix: Analyze attribute groupings. Split into Customer, CustomerContact, CustomerAddress, CustomerPreferences, etc.
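Attribute groupings are often visible in the names themselves. This sketch assumes snake_case attribute names where the first word marks the concern (a hypothetical convention); groups with several members are candidates for splitting out.

```python
from collections import defaultdict

def group_by_prefix(attributes: list[str]) -> dict[str, list[str]]:
    """Group snake_case attribute names by their first word."""
    groups: dict[str, list[str]] = defaultdict(list)
    for attr in attributes:
        prefix = attr.split("_", 1)[0]
        groups[prefix].append(attr)
    return dict(groups)

# Hypothetical attributes of an over-stuffed Customer entity.
customer_attributes = [
    "name", "email",
    "billing_street", "billing_city", "billing_postcode",
    "shipping_street", "shipping_city", "shipping_postcode",
    "pref_newsletter", "pref_language",
]
for prefix, attrs in group_by_prefix(customer_attributes).items():
    if len(attrs) > 1:
        print(f"{prefix}: {attrs}  <- candidate for a separate entity")
```

The billing and shipping groups have identical shapes, which suggests a single CustomerAddress entity with an address type rather than two separate entities; that judgment still belongs to the modeler.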
Pitfall 4: Modeling the Application, Not the Domain
Symptom: Entities match application screens or modules rather than business concepts.
Example: "MaintenanceScreen" or "ReportModule" as entities.
Problem: You're modeling the software, not the data. Software changes; domain truths persist.
Fix: Ask "What business thing does this screen display?" Model that thing.
Pitfall 5: Ignoring Future Evolution
Symptom: Model handles current use cases but can't accommodate obvious future needs.
Example: Modeling PhoneNumber as a single field when a customer might obviously have multiple numbers.
Fix: Consider likely evolution when making entity/attribute decisions. Model multivalued attributes appropriately.
Pitfall 6: Premature Synthetic Keys
Symptom: Every entity immediately gets an "ID" attribute.
Problem: At the conceptual level, focus on natural identifiers that the business uses. CustomerEmail or CustomerNumber—not CustomerID.
Fix: Model what identifies things in the business domain. Synthetic keys are implementation decisions.
Pitfall 7: Failing to Model Events/Transactions
Symptom: Model has static entities (Customer, Product) but no event entities.
Problem: Most systems need to track events—orders, payments, requests, sessions.
Fix: Ask "What happens in this system?" not just "What things exist?" Events are entities too.
The ultimate test: can every piece of data mentioned in the requirements be traced to an entity and attribute in the model? If the requirements mention 'customer phone numbers' and you can't find where they are stored, you have a gap. Map requirements to model elements as a verification step.
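The mapping step can be kept as a simple traceability check. This sketch assumes the model is recorded as entity-to-attribute sets and each requirement's data item has been tagged with the entity and attribute the analyst expects to hold it; the model contents, requirement IDs, and `trace_gaps` helper are all hypothetical.

```python
# Hypothetical conceptual model: entity -> set of attributes.
model: dict[str, set[str]] = {
    "Customer": {"CustomerNumber", "Name", "Email"},
    "Order": {"OrderNumber", "OrderDate", "Status"},
}

# Hypothetical data items from the requirements: (requirement, entity, attribute).
requirement_items = [
    ("R1", "Customer", "Email"),
    ("R2", "Order", "OrderDate"),
    ("R3", "Customer", "PhoneNumber"),
    ("R4", "Shipment", "TrackingNumber"),
]

def trace_gaps(model: dict[str, set[str]],
               items: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Report requirement data items that have no home in the model."""
    gaps: list[tuple[str, str]] = []
    for req_id, entity, attribute in items:
        if entity not in model:
            gaps.append((req_id, f"missing entity {entity}"))
        elif attribute not in model[entity]:
            gaps.append((req_id, f"missing attribute {entity}.{attribute}"))
    return gaps

for req_id, problem in trace_gaps(model, requirement_items):
    print(req_id, problem)
```

A missing attribute such as PhoneNumber may turn into a new entity once multiplicity is considered; a missing entity such as Shipment sends you back to the discovery techniques.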
Entity identification is perhaps the most judgment-intensive aspect of conceptual design. We've explored systematic techniques, decision criteria, patterns, and pitfalls. Let's consolidate these insights:
What's Next:
With entities identified, we must now discover and model relationships—the associations that connect entities into a coherent data model. Relationship identification has its own techniques, patterns, and pitfalls. In the next page, we'll examine how to systematically discover relationships, determine their cardinality, and capture the semantic meaning of how entities interact.
You now have a comprehensive toolkit for entity identification—from initial discovery through refinement and validation. You understand the entity/attribute distinction, can recognize weak entities, leverage common patterns, and avoid frequent mistakes. Next, we'll apply equal rigor to relationship identification.