Every database begins with a question: What are we storing data about? The answer to this question shapes every subsequent design decision—from table structures to query patterns, from performance characteristics to maintainability. In the Entity-Relationship model, this fundamental concept is captured by the notion of an entity.
The concept of an entity seems deceptively simple at first glance. A customer is an entity. A product is an entity. A transaction is an entity. But as we delve deeper, we discover that entity identification is both an art and a science—requiring careful analysis, domain expertise, and an understanding of how data will be used throughout the system's lifecycle.
This page explores the rigorous definition of entities, their philosophical foundations, and the practical techniques for identifying them from real-world requirements. Mastering entity recognition is the foundational skill upon which all other Entity-Relationship modeling capabilities are built.
By the end of this page, you will understand the formal definition of an entity, distinguish entities from non-entities, recognize the properties that make something 'entity-worthy,' and apply systematic techniques for identifying entities from natural language requirements. This knowledge forms the bedrock of all subsequent ER modeling concepts.
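In the Entity-Relationship model, introduced by Peter Chen in his seminal 1976 paper "The Entity-Relationship Model—Toward a Unified View of Data," an entity is commonly defined (in a textbook formulation that builds on Chen's own description of an entity as a 'thing' which can be distinctly identified) as: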
An entity is a 'thing' in the real world with an independent existence.
This definition, while elegant in its simplicity, carries profound implications. Let us unpack each component:
'A thing in the real world' — Entities represent objects, concepts, or phenomena that exist in the domain being modeled. They can be tangible (a person, a building, a product) or intangible (a course, an account, a reservation). The key requirement is that they have meaning within the problem domain.
'With an independent existence' — This is the critical distinguishing factor. An entity exists on its own merits, not merely as a property or characteristic of something else. A customer's name is not an entity—it's an attribute of the customer entity. But the customer themselves? That's an entity.
This independence criterion becomes particularly important when distinguishing entities from attributes (which describe entities) and from relationships (which connect entities).
Peter Chen's genius was recognizing that data modeling should begin with understanding the real world, not with implementation details like tables or pointers. The ER model abstracts away storage concerns to focus on what data represents—a conceptual approach that has proven remarkably durable across five decades of database evolution.
The Independence Test:
To determine if something qualifies as an entity, apply the independence test:
If the answer to all three questions is 'yes,' you likely have an entity. If any answer is 'no,' reconsider—it may be an attribute, a relationship, or perhaps not database-relevant at all.
One of the most enlightening distinctions in entity classification is between tangible (physical) and intangible (conceptual) entities. Understanding this distinction prevents the common mistake of limiting entities only to physical objects.
Tangible Entities:
These are entities that correspond to physical objects in the real world. You can touch them, see them, or interact with them physically. They occupy space and have material existence.
Examples include:
Intangible Entities:
These entities have no physical form but have independent conceptual existence. They represent abstractions, agreements, events, or intellectual constructs that the organization tracks and manages.
Examples include:
| Characteristic | Tangible Entities | Intangible Entities |
|---|---|---|
| Physical existence | Yes—can be touched, seen | No—exists as concept only |
| Examples | Person, Product, Building | Account, Course, Policy |
| Identity source | Often physical characteristics | Assigned identifiers, agreements |
| Creation | Manufacturing, birth, construction | Agreement, decision, event |
| Destruction | Physical destruction, death | Cancellation, expiration, closure |
| Modeling challenge | Straightforward identification | Requires domain knowledge |
Beginning modelers often miss intangible entities because they focus on 'things you can see.' In business applications, intangible entities frequently outnumber tangible ones. A bank's core entities—accounts, transactions, loans, investments—are all intangible. Missing these leads to incomplete models that fail to capture essential business concepts.
The Reality Check:
Consider a university database. The tangible entities are obvious: Students, Professors, Buildings, Books. But the intangible entities are often more numerous and more central to the university's operations:
A model that captures only the tangible entities (students, professors, buildings) would be fundamentally incomplete. The intangible entities carry the business logic and represent the core value the university provides.
Identifying entities from natural language requirements or domain analysis is a skill that improves with practice. Several systematic techniques help ensure comprehensive entity discovery:
1. The Noun Analysis Technique:
This classic approach involves identifying nouns in requirements documents, user stories, or domain descriptions. Nouns often represent either entities or attributes; further analysis determines which.
Example requirement: "The hospital system must track patients, their appointments, the doctors who treat them, and the medications prescribed during each visit."
Nouns identified: hospital, system, patients, appointments, doctors, medications, visit
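As a rough illustration of the follow-up analysis, the sketch below records a decision for each candidate noun and groups the nouns by that decision. The labels in `decisions` are hypothetical modeler judgments made for this example, not conclusions stated in the requirement; in a real project, "visit" and "appointments" might turn out to be the same entity, and the hospital might become an entity if the system served several hospitals.

```python
# Candidate nouns taken from the example requirement above.
candidates = ["hospital", "system", "patients", "appointments",
              "doctors", "medications", "visit"]

# Hypothetical modeler decisions; assumptions for illustration only.
decisions = {
    "hospital": "context",   # the organization itself, a single instance here
    "system": "context",     # the software being built, not data to store
    "patients": "entity",
    "appointments": "entity",
    "doctors": "entity",
    "medications": "entity",
    "visit": "entity",
}


def triage(nouns, decisions):
    """Group candidate nouns by the recorded decision, keeping input order."""
    groups = {}
    for noun in nouns:
        label = decisions.get(noun, "undecided")
        groups.setdefault(label, []).append(noun)
    return groups


result = triage(candidates, decisions)
print(result["entity"])
print(result["context"])
```

Any noun without a recorded decision lands in an "undecided" group, which gives a simple worklist for the next review with domain experts.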
2. The Questions Technique:
Ask yourself what questions the system needs to answer. The subjects of these questions suggest entities.
3. The Domain Expert Interview Technique:
Engage with people who understand the business domain. Ask open-ended questions:
Domain experts naturally speak in terms of the entities relevant to their work. A warehouse manager talks about shipments, inventory items, suppliers, and orders. A hospital administrator discusses patients, appointments, procedures, and insurance claims.
4. The Document Analysis Technique:
Examine existing forms, reports, spreadsheets, and paper records. The sections and headers often correspond to entities, and the fields represent attributes.
A customer invoice contains:
No single technique guarantees complete entity discovery. Professional data modelers apply multiple techniques and iterate. Start with noun analysis for breadth, refine with the questions technique for business relevance, validate with domain experts for accuracy, and confirm with document analysis for completeness.
While every domain has unique entities, certain entity patterns recur across virtually all business applications. Recognizing these patterns accelerates the modeling process and ensures you don't overlook fundamental entities.
Party Entities:
Almost every system deals with people and organizations:
Product/Service Entities:
Most systems track what is bought, sold, or provided:
Transaction Entities:
Records of activities and events:
| Domain | Typical Entities |
|---|---|
| Retail/E-commerce | Customer, Product, Order, Payment, Review, Cart, Wishlist, Category, Promotion |
| Healthcare | Patient, Doctor, Appointment, Prescription, Diagnosis, Treatment, Insurance, Claim |
| Education | Student, Course, Instructor, Enrollment, Grade, Degree, Department, Semester |
| Finance/Banking | Account, Transaction, Customer, Loan, Investment, Branch, Statement, Interest Rate |
| Manufacturing | Product, Component, Supplier, Order, Inventory, Machine, Quality Check, Bill of Materials |
| Human Resources | Employee, Department, Position, Skill, Training, Performance Review, Benefit, Payroll |
| Logistics | Shipment, Vehicle, Route, Warehouse, Package, Driver, Delivery, Tracking Event |
| Real Estate | Property, Agent, Listing, Client, Showing, Contract, Transaction, Commission |
Use domain patterns as a starting point, not a rigid template. Every business has unique aspects that require custom entities. An e-commerce platform selling digital goods has different entities than one selling physical products. Healthcare systems vary by specialty. Start with common patterns, then refine based on specific requirements.
One of the most challenging decisions in data modeling is determining whether something should be modeled as an entity or as an attribute. This decision has lasting implications for flexibility, query complexity, and maintenance.
The Fundamental Difference:
The Color Example:
Consider modeling products in an e-commerce system where each product has a color.
As an attribute:
Product(product_id, name, price, color)
This works when:
As an entity:
Product(product_id, name, price, color_id)
Color(color_id, name, hex_code, rgb_values, pantone_reference, is_metallic)
This works when:
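To make the entity variant concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a trimmed-down `color` table (only `color_id`, `name`, and `hex_code` from the schema above), and the sample rows are invented for illustration. It shows two consequences of treating Color as an entity: a change to a color is made once and seen by every product, and a product cannot reference a color that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE color (
    color_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,
    hex_code TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    price      REAL NOT NULL,
    color_id   INTEGER NOT NULL REFERENCES color(color_id)
);
""")

# Illustrative data: two products share one color row.
conn.execute("INSERT INTO color VALUES (1, 'Crimson', '#DC143C')")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [(1, "Mug", 9.5, 1), (2, "Scarf", 24.0, 1)],
)

# Correct the color once; every product that references it sees the change.
conn.execute("UPDATE color SET hex_code = '#D0102F' WHERE color_id = 1")
rows = conn.execute("""
    SELECT p.name, c.hex_code
    FROM product AS p JOIN color AS c ON c.color_id = p.color_id
    ORDER BY p.product_id
""").fetchall()
print(rows)

# A product pointing at a nonexistent color is rejected by the foreign key.
try:
    conn.execute("INSERT INTO product VALUES (3, 'Hat', 15.0, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

With color as a plain text attribute, the same correction would have to touch every product row, and nothing would prevent inconsistent spellings of the same color.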
The Address Conundrum:
Addresses illustrate this distinction perfectly. A customer's address could be:
Simple attribute approach:
Customer(id, name, address)
Structured attribute approach:
Customer(id, name, street, city, state, zip, country)
Entity approach:
Customer(id, name, primary_address_id)
Address(id, street, city, state, zip, country, lat, lng, validated)
The Right Choice Depends on Requirements:
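For a simple contact list, a single address attribute is fine. For an e-commerce platform with shipping logistics, modeling the address as an entity is usually the better fit, because addresses are validated, reused, and referenced by orders and shipments. There is no universally correct answer, only the answer that fits your requirements.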
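The entity approach can be sketched in plain Python as follows. This is an illustrative model only: the field names follow the schemas above (omitting `lat` and `lng` for brevity), while the sample customers, the address data, and the `primary_address` helper are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Address:
    id: int
    street: str
    city: str
    state: str
    zip: str
    country: str
    validated: bool = False


@dataclass
class Customer:
    id: int
    name: str
    primary_address_id: int  # a reference to Address.id, not an embedded copy


# Illustrative data: two customers in the same household share one address.
addresses = {10: Address(10, "1 Main St", "Springfield", "IL", "62701", "US")}
customers = [Customer(1, "Ana", 10), Customer(2, "Ben", 10)]


def primary_address(customer):
    """Resolve a customer's address through its identifier."""
    return addresses[customer.primary_address_id]


# Validating the address once is visible through every customer that uses it.
addresses[10].validated = True
print([primary_address(c).validated for c in customers])
```

Because each customer stores only an identifier, the address keeps its own identity and can carry data such as the `validated` flag that belongs to the address rather than to any one customer.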
Promoting an attribute to an entity later requires schema changes, data migration, and application code updates. Demoting an entity to an attribute can lose data and break relationships. This is why upfront analysis matters: the cost of correction grows as more data, queries, and application code come to depend on the original choice.
Just as entities and attributes must be distinguished, so must entities and relationships. Sometimes what appears to be a simple relationship between entities warrants promotion to an entity in its own right, a technique called reifying the relationship.
The Basic Scenario:
Consider students enrolling in courses:
As a simple relationship:
Student --enrolls_in-- Course
This captures the basic fact: "Student X is enrolled in Course Y."
As a reified entity (Enrollment):
Student --has-- [Enrollment] --for-- Course
The Enrollment is now an entity with its own properties:
When to Reify Relationships:
Promote a relationship to an entity when:
The relationship has attributes: If data describes the relationship itself (not just the participants), consider an entity
The relationship has relationships: If the relationship connects to other things, it needs independent existence
History matters: If you need to track changes over time (when did enrollment start? when did it end?)
The same participants can be related more than once: A student can have multiple enrollments in the same course (in different semesters), so each occurrence needs its own identity
Business significance: The relationship is so important it has its own lifecycle and business rules
| Simple Relationship | Reified Entity | Why Promote? |
|---|---|---|
| Employee works_in Department | Employment/Assignment | Start date, end date, is_primary, job_title |
| Student enrolls_in Course | Enrollment | Grade, semester, section, attendance |
| Actor appears_in Movie | Role/Credit | Character name, billing order, dates |
| Customer purchases Product | Order/Purchase | Quantity, price, date, shipping |
| Person follows Person | Follow/Connection | Since date, notification settings |
| Doctor treats Patient | Visit/Consultation | Date, diagnosis, notes, billing |
Reified relationships are sometimes called 'associative entities,' 'junction entities,' or 'intersection entities.' They sit at the junction of a many-to-many relationship but have earned entity status through their own attributes or relationships. In implementation, these become the 'bridge tables' that resolve M:N relationships.
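A bridge table for the Student and Course example might look like the following sketch, written with Python's built-in `sqlite3` module. The column names, the composite key, and the sample rows are assumptions chosen for illustration. Including `semester` in the key is one way to let the same student take the same course more than once while still rejecting duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- The reified relationship: one row per student, course, and semester.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    semester   TEXT    NOT NULL,
    grade      TEXT,            -- describes the enrollment, not either participant
    PRIMARY KEY (student_id, course_id, semester)
);

INSERT INTO student VALUES (1, 'Dana');
INSERT INTO course  VALUES (100, 'Databases');
INSERT INTO enrollment VALUES (1, 100, '2023-Fall',   'F');
INSERT INTO enrollment VALUES (1, 100, '2024-Spring', 'B');
""")

# The same student and course appear twice, once per semester.
history = conn.execute("""
    SELECT semester, grade FROM enrollment
    WHERE student_id = 1 AND course_id = 100
    ORDER BY semester
""").fetchall()
print(history)

# A second enrollment for the same semester violates the composite key.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 100, '2024-Spring', 'A')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

An alternative design gives `enrollment` its own surrogate identifier and adds a uniqueness constraint on the three columns; which one fits better depends on whether other tables need to reference individual enrollments.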
The Evolution Pattern:
Many models evolve through these stages:
Stage 1 — Simple relationship: "Employees work in departments."
Stage 2 — Relationship with attributes: "We need to track when each employee started in their department and their role."
Stage 3 — Full entity: "Employees can have multiple assignments over time, each with start/end dates, supervision relationships, and budget allocations. Assignment history must be preserved for audit."
Recognizing when you've reached Stage 3 is crucial. The signals are:
At this point, the relationship has become an entity with its own identity and lifecycle.
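A Stage 3 model can be sketched as an `Assignment` record with its own identifier and date range. The class, its field names, the `department_on` helper, and the sample history below are hypothetical, chosen to show how preserved history answers "where did this employee work on a given day?"

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    assignment_id: int            # the assignment has its own identity
    employee_id: int
    department: str
    start_date: date
    end_date: Optional[date] = None  # None means the assignment is still open

    def active_on(self, day: date) -> bool:
        """True if this assignment covers the given day (end date inclusive)."""
        return self.start_date <= day and (
            self.end_date is None or day <= self.end_date
        )


# Illustrative history: old rows are closed with an end date, never overwritten.
history = [
    Assignment(1, 7, "Support", date(2021, 1, 4), date(2022, 6, 30)),
    Assignment(2, 7, "Engineering", date(2022, 7, 1)),
]


def department_on(employee_id: int, day: date) -> list:
    """Return the departments an employee was assigned to on a given day."""
    return [
        a.department
        for a in history
        if a.employee_id == employee_id and a.active_on(day)
    ]


print(department_on(7, date(2022, 1, 1)))
print(department_on(7, date(2023, 1, 1)))
```

With a single `department` attribute on the employee, the first query could not be answered at all, because the earlier value would have been overwritten.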
The concept of entities in database modeling draws from centuries of philosophical inquiry into the nature of objects, identity, and classification. Understanding these foundations deepens our modeling intuition.
Ontology and Data Modeling:
Ontology—the philosophical study of being—asks fundamental questions that parallel data modeling challenges:
Data modeling is, in essence, applied ontology for a specific domain.
Identity and Individuation:
Ancient philosophers grappled with the Ship of Theseus: if every plank of a ship is gradually replaced, is it still the same ship? Database designers face the same question:
This is why entities have keys—stable identifiers that persist regardless of attribute changes. The entity's identity is defined by its key, not its current attributes.
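The idea that identity follows the key rather than the current attribute values can be sketched in Python. The `Customer` class and its sample values are illustrative assumptions; equality and hashing are deliberately defined on `customer_id` alone.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # disable attribute-by-attribute equality
class Customer:
    customer_id: int
    name: str
    email: str

    def __eq__(self, other):
        # Two records denote the same entity when their keys match.
        if not isinstance(other, Customer):
            return NotImplemented
        return self.customer_id == other.customer_id

    def __hash__(self):
        return hash(self.customer_id)


# Same key, every other attribute changed: still the same customer.
before = Customer(12345, "A. Smith", "a.smith@example.com")
after = Customer(12345, "A. Jones", "a.jones@example.org")

# Different key, identical attributes: a different customer.
lookalike = Customer(67890, "A. Smith", "a.smith@example.com")

print(before == after)
print(before == lookalike)
print(len({before, after, lookalike}))
```

This mirrors what a primary key does in a database: the row for customer 12345 remains the same entity after its name and email are updated.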
Plato's theory of Forms distinguished between ideal types and particular instances. Every chair is an imperfect instantiation of the ideal Chair. Similarly, entity types (Customer) define the ideal structure, while entity instances (Customer #12345) are particular realizations. This type-instance distinction is fundamental to both philosophy and data modeling.
Natural Kinds vs. Nominal Kinds:
Philosophers distinguish:
Some entities correspond to natural kinds—a Person is a Person due to biological reality. Others are nominal—a 'VIP Customer' is defined by business rules that could change.
This distinction matters for modeling stability:
The Reality Criterion:
Philosopher W.V.O. Quine proposed we are committed to the existence of whatever our best theories quantify over. In data modeling terms: entities are the things your business logic makes claims about.
If business rules say "customers must have valid addresses" or "orders must be shipped within 3 days," then Customer and Order are real entities in your domain—things your system makes formal commitments about.
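As a small illustration, the two example rules can be written as checks over Customer and Order records. The dictionary shapes, the field names (`address_valid`, `ordered_on`, `shipped_on`), and the function names are assumptions made for this sketch; the point is only that each rule needs an entity to be about.

```python
from datetime import date, timedelta

MAX_SHIPPING_DELAY = timedelta(days=3)


def customer_violations(customer):
    """Check the rule 'customers must have valid addresses'."""
    if not customer.get("address_valid", False):
        return ["customer must have a valid address"]
    return []


def order_violations(order):
    """Check the rule 'orders must be shipped within 3 days'."""
    shipped = order.get("shipped_on")
    if shipped is not None and shipped - order["ordered_on"] > MAX_SHIPPING_DELAY:
        return ["order must be shipped within 3 days"]
    return []


# Illustrative records.
good_customer = {"id": 1, "address_valid": True}
bad_customer = {"id": 2, "address_valid": False}
on_time = {"id": 10, "ordered_on": date(2024, 3, 1), "shipped_on": date(2024, 3, 4)}
late = {"id": 11, "ordered_on": date(2024, 3, 1), "shipped_on": date(2024, 3, 5)}

print(customer_violations(bad_customer))
print(order_violations(late))
```

This sketch treats an order that has not shipped yet as not violating the rule; a production check would also need to compare unshipped orders against the current date.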
Practical Implications:
These philosophical considerations lead to practical guidelines:
We've explored the foundational concept of entities in the Entity-Relationship model—the building blocks upon which all database designs are constructed. Let's consolidate our understanding:
What's next:
Now that we understand what entities are individually, we'll explore how entities are organized into entity sets—collections of similar entities that share common attributes and behaviors. This collective view enables us to model not just individual things, but categories of things, laying the groundwork for database tables and their schemas.
You now understand the fundamental definition of entities—what they are, how to identify them, and how they differ from attributes and relationships. This foundational concept will inform every subsequent modeling decision. Next, we'll examine entity sets and how entities are grouped and classified.