Loading learning content...
Interviews reveal needs. Documents provide evidence. But neither interviews nor documents are directly usable for database design. The raw material of requirements gathering must be transformed into structured data requirements specifications—formal, precise descriptions of what data the database must store.
Data requirements are the bridge between the problem domain and the technical solution. They answer the fundamental question: What information must the database capture, organize, and preserve?
A well-specified data requirement is:
By the end of this page, you will understand how to specify formal data requirements for database design. You'll learn to identify and document entities, define attributes with precise domains, specify relationships and cardinalities, and create data requirements documentation that guides conceptual and logical database design.
Data requirements define what information the system must persistently store to support organizational operations, decision-making, and compliance. They specify the types of facts, objects, and relationships that the database must represent.
Data Requirements vs. Other Requirement Types:
Database projects involve multiple requirement categories. Data requirements are distinct from:
Data requirements focus specifically on the persistent information model that underlies all other functionality. You cannot process orders without storing order data. You cannot generate reports without the data they report on.
Data requirements must be precise enough to guide design but abstract enough to avoid premature implementation decisions. You're describing what data exists, not how it will be stored. The difference between 'Customer has Address' (requirement) and 'Create column CustomerAddress VARCHAR(200)' (implementation) is crucial.
Entities are the fundamental objects about which the organization needs to store information. They represent things that have independent existence and identity within the business domain.
What Makes Something an Entity?
An entity typically:
Not everything mentioned in requirements is an entity. Some are attributes (descriptions of entities), some are relationships, and some are values.
| Item | Entity or Not? | Reasoning |
|---|---|---|
| Customer | Entity | Independent existence, unique identity (CustomerID), multiple attributes, participates in relationships |
| Customer Name | Attribute | Describes a Customer; has no independent existence |
| Order | Entity | Distinct business object, unique identity (OrderNumber), multiple attributes |
| Order Status | Attribute (or Lookup) | Describes an Order; might be a lookup table if status values are complex |
| Employee manages Employee | Relationship | Describes a connection between Employee instances, not a thing itself |
| Product Category | Entity (often) | If categories have their own attributes (description, parent category), they're entities |
| Quantity Ordered | Attribute | A single value describing an order item |
| Invoice | Entity | Legal document with identity, dates, amounts, and relationships to orders/customers |
Entity Discovery Techniques:
Noun Analysis: Review interview transcripts and documents, highlighting nouns. Nouns that recur frequently and have associated facts are entity candidates.
Subject-Object Analysis: In statements like "Customers place Orders" or "Employees work in Departments," the subjects and objects are typically entities.
Form Field Analysis: Groups of related fields on forms often represent entities. A "Customer Information" section suggests a Customer entity with those fields as attributes.
Report Column Analysis: Column headers in reports often correspond to entity attributes. Grouping columns by their subject reveals entities.
Existing System Tables: Tables in legacy databases typically represent entities (or denormalized combinations thereof).
Event/Transaction Analysis: Business events that must be recorded (Order Placed, Payment Received, Shipment Sent) often become entities or relate to entities.
When uncertain whether something is an entity, ask: 'Will we have multiple instances of this? Does each instance have a unique identity? Do we need to store multiple facts about each instance?' If all answers are yes, it's likely an entity.
Attributes are the specific pieces of information that describe entities. Each attribute represents a single fact about an entity instance.
Attribute Specification Requirements:
Every attribute should be specified with:
| Specification Element | Description | Example |
|---|---|---|
| Attribute Name | Standard name following naming conventions | CustomerEmailAddress |
| Business Definition | Clear, unambiguous meaning | The primary email address for business communications with the customer |
| Data Type | Conceptual type (Text, Number, Date, Boolean) | Text |
| Length/Precision | Size constraints | Max 255 characters |
| Format Pattern | Required structure | Standard email format (x@y.z) |
| Valid Values/Range | Enumeration or range constraints | Must be valid email format; unique across customers |
| Optionality | Required, Optional, Conditional | Required for online customers; Optional for walk-in customers |
| Default Value | Value if not provided | None |
| Source | How attribute is populated | Customer provides during registration |
| Update Rules | Who/when can modify | Customer can update via profile; CSR can update via phone verification |
Types of Attributes:
Simple vs. Composite Attributes:
Design Decision: Whether to store composite attributes as single values or separate components depends on how they'll be used. If you need to query by City, store City separately.
Single-Valued vs. Multi-Valued Attributes:
Normalization Principle: Multi-valued attributes typically indicate that a separate entity is needed (e.g., CustomerPhone).
Stored vs. Derived Attributes:
Design Decision: Whether to store derived values depends on calculation complexity, performance needs, and update frequency.
Insufficient attribute granularity leads to problems. If you store 'Full Name' as a single attribute, you can't easily sort by last name or search by first name. During requirements, identify how each piece of information will be used. If it needs to be queried, sorted, or reported independently, it should be a separate attribute.
A domain is the set of all legal values that an attribute can assume. Domain specification is critical for data integrity—it defines the boundaries of valid data.
Types of Domains:
| Domain Type | Description | Examples |
|---|---|---|
| Enumeration | Fixed list of valid values | OrderStatus: {Pending, Confirmed, Shipped, Delivered, Cancelled} |
| Range | Continuous values within bounds | Quantity: Integer, 1-10000; Temperature: Decimal, -50 to +150 |
| Pattern | Values matching a format specification | PhoneNumber: (NNN) NNN-NNNN; SSN: NNN-NN-NNNN |
| Semantic Type | Complex validation beyond simple patterns | EmailAddress: valid email format; URL: valid web address |
| Referential | Values from another entity | CustomerID must exist in Customer table |
| Calculated | Values computed by formula | Age: derived from BirthDate and current date |
| Temporal | Date/time with constraints | StartDate must be before EndDate; HireDate cannot be future |
Domain Specification Best Practices:
Be Explicit: Don't assume everyone understands that 'Date' means a date without time, or that 'Amount' allows negative values. Specify precisely:
Future-Proof: Consider future needs when defining domains:
Capture Special Values: Document how to handle:
Many attributes share the same domain. Define domains once and reference them by name: 'EmailAddress', 'MonetaryAmount', 'USPhoneNumber'. This ensures consistency and simplifies maintenance when domain rules change.
Relationships define how entities are connected. They are as important as entities themselves—in fact, the relational model is named for its focus on relationships.
Relationship Specification Elements:
Every relationship should specify:
| Cardinality | Meaning | Example | Database Implication |
|---|---|---|---|
| One-to-One (1:1) | Each entity on one side relates to at most one on the other | Employee has one Passport; Passport belongs to one Employee | Foreign key in either table, or merge tables |
| One-to-Many (1:N) | One entity relates to multiple instances of another | Customer places many Orders; each Order has one Customer | Foreign key in the 'many' side table |
| Many-to-Many (M:N) | Multiple instances relate to multiple instances | Students enroll in Courses; Courses have many Students | Junction/bridge table with foreign keys to both |
Participation Constraints:
Total (Mandatory) Participation: Every instance of the entity must participate in the relationship.
Partial (Optional) Participation: Instances may or may not participate in the relationship.
Minimum and Maximum Cardinality: More precisely, we can specify:
Example: "Each Project must have at least 2 and at most 10 Team Members" → (2,10)
Every relationship can be read in two directions. 'Customer places Order' and 'Order is placed by Customer' are the same relationship. Document both perspectives to ensure complete understanding. The verb phrase helps clarify the business meaning.
Every entity must have a way to uniquely identify each instance. Identifiers are attributes (or combinations of attributes) that distinguish one entity instance from another.
Types of Identifiers:
| Identifier Type | Description | Examples | Characteristics |
|---|---|---|---|
| Natural Key | Real-world attribute that is inherently unique | Social Security Number, ISBN, VIN | Meaningful, but may change or have exceptions |
| Surrogate Key | Artificial identifier with no business meaning | CustomerID (auto-increment), GUID | Stable, guaranteed unique, but meaningless to users |
| Composite Key | Multiple attributes together forming uniqueness | OrderID + LineNumber, DepartmentCode + EmployeeNumber | Reflects natural relationships, but more complex |
| Alternate Key | Another unique attribute that could be the primary key | Email (when CustomerID is primary) | Provides additional uniqueness constraints |
Identifier Selection Criteria:
When identifying keys during requirements, evaluate:
Uniqueness: Is the identifier truly unique across all instances, past, present, and future? SSN is unique for US individuals, but what about international employees?
Stability: Will the identifier value change? Email addresses change. Names change. Assigned numbers typically don't.
Minimality: Is any part of a composite key redundant? Use the minimum attributes necessary for uniqueness.
Existence: Is the identifier available when the entity is created? You can't use InvoiceNumber as a key if invoices are created before numbers are assigned.
Format: Is the identifier format consistent and well-defined? Free-form text identifiers are problematic.
Privacy: Are there privacy concerns with using this identifier? Using SSN as a customer key creates security risks.
Most modern database designs use surrogate keys as primary keys for stability and simplicity, while enforcing uniqueness constraints on natural keys (alternate keys) to preserve business meaning. This 'belt and suspenders' approach provides both stability and business validation.
Data requirements findings must be documented in a structured, usable format. The documentation serves multiple purposes:
Core Data Requirements Document Sections:
Entity Specification Example:
ENTITY: Customer
Description:
A person or organization that purchases products or services
from the company.
Aliases: Client, Buyer, Account Holder
Primary Identifier: CustomerID (surrogate)
Alternate Identifiers: EmailAddress (for individuals),
TaxID (for organizations)
Estimated Volume: 500,000 current; 50,000 new per year
Retention: Active customers indefinitely; Inactive >5 years archived
ATTRIBUTES:
CustomerID
Definition: System-assigned unique identifier
Domain: PositiveInteger, Auto-generated
Optionality: Required
Source: System-generated at registration
CustomerType
Definition: Classification of customer
Domain: {Individual, Business, Government}
Optionality: Required
Source: Customer selection during registration
EmailAddress
Definition: Primary email for communications
Domain: EmailAddress (see Domain Definitions)
Optionality: Required for Individual; Optional for Business
Source: Customer provides
Constraint: Must be unique across all customers
RegistrationDate
Definition: Date customer account was created
Domain: Date, not future
Optionality: Required
Source: System-captured at registration
...(additional attributes)
Review data requirements for: Completeness (all entities from interviews/documents captured?), Consistency (no contradictory constraints?), Clarity (no ambiguous definitions?), Traceability (every requirement linked to a source?), and Verifiability (can we test whether implementation meets requirements?).
Data requirements documentation must be validated before proceeding to design. Validation ensures that documented requirements accurately reflect stakeholder needs and organizational realities.
Validation Techniques:
Common Validation Issues:
Terminology Conflicts: Different stakeholders use the same term differently. "Account" might mean customer account to Sales but financial account to Accounting. Establish and document canonical definitions.
Missing Entities: "Wait—you don't have Warranty anywhere? We need to track warranties!" Validation reveals entities overlooked during initial gathering.
Cardinality Errors: "Actually, a product can belong to multiple categories, not just one." Stakeholder review catches relationship misunderstandings.
Constraint Conflicts: "You show OrderDate as required, but for phone orders, we don't always capture the exact date." Validation reveals requirements that conflict with operational reality.
Domain Issues: "Status needs a 'Hold' option that isn't in your list." Enumeration domains frequently need expansion.
The temptation to skip formal validation is strong—everyone is eager to start 'real' design. Resist this. Errors discovered during design cost 10x more to fix than errors caught during validation. Errors discovered during implementation cost 100x more. Validation is investment, not overhead.
Data requirements transform raw requirements gathering output into structured specifications that directly inform database design. They answer the fundamental question of what information must be stored.
What's Next:
Data requirements define what to store; Functional Requirements define what the system must do with that data. The next page explores functional requirements—the operations, queries, updates, and reports that the database must support. Together, data and functional requirements provide a complete specification for database design.
You now understand how to specify formal data requirements for database design. This structured approach transforms raw requirements input into precise specifications that guide conceptual and logical modeling. Next, we'll examine functional requirements—what the database must do with the data it stores.