Loading content...
Every organization accumulates documents—forms, reports, manuals, emails, spreadsheets, and records—that silently encode decades of operational knowledge. These artifacts are invaluable for database design because they represent formalized, tested, and validated views of organizational data.
Document analysis is the systematic examination of existing written materials to extract data requirements. Where interviews capture what people think they do and believe they need, documents reveal what they actually do and formally require.
Consider a simple invoice form: it defines precisely what data is captured during a transaction, in what format, with what validations. That single document tells you more about the transactional data model than an hour of abstract discussion. When combined with interviews, document analysis creates a requirements base that is both comprehensive and verifiable.
By the end of this page, you will understand how to systematically analyze organizational documents to extract database requirements. You'll learn to identify valuable document sources, interpret different document types, extract data elements, and synthesize findings into actionable requirements specifications.
Document analysis complements user interviews by providing concrete, verifiable evidence of organizational data practices. It serves multiple strategic purposes in requirements gathering:
Verification and Validation: Documents confirm or contradict interview findings. When a manager says "we track customer preferences," the customer intake form either contains preference fields or it doesn't. Documents ground abstract descriptions in reality.
Discovery of Implicit Requirements: Documents often contain requirements that users take for granted. The date format on every form, the mandatory fields marked with asterisks, the approval signatures required—these are requirements users no longer consciously remember because they became automatic.
Historical Context: Old versions of documents reveal how requirements evolved. Understanding what changed and why provides insight into organizational priorities and pain points.
Scope Definition: The volume and variety of documents help define project boundaries. If an organization produces 500 distinct report types, that's a scaling factor that affects the design effort.
While documents are valuable, they have limitations. Documents show the formal view of operations, not necessarily the actual practice. Outdated documents may no longer reflect reality. Manual workarounds and exceptions often bypass documented procedures. Document analysis must be combined with interviews to get the complete picture.
Organizations produce diverse document types, each valuable for different aspects of requirements gathering. A comprehensive analysis examines multiple document categories to build a complete picture.
| Document Type | Examples | Requirements Value |
|---|---|---|
| Input Forms | Application forms, Order forms, Registration documents | Define data capture points, mandatory fields, valid values, and data formats |
| Output Reports | Financial statements, Sales reports, Status dashboards | Identify derived data, aggregations, reporting periods, and presentation requirements |
| Policy Documents | HR policies, Compliance manuals, Security guidelines | Business rules, constraints, approval workflows, and authorization levels |
| Procedure Manuals | Standard Operating Procedures, Training guides | Workflow sequences, decision points, exception handling, and user roles |
| Correspondence | Emails, Letters, Memos | Communication data requirements, relationship tracking, historical records |
| Contracts/Agreements | Service agreements, Vendor contracts, Employment contracts | Legal constraints, data retention requirements, reporting obligations |
| Legacy System Documentation | Database schemas, ERDs, Data dictionaries | Existing data models, field definitions, relationships, constraints |
| Regulatory Documents | Compliance requirements, Industry standards, Legal mandates | Mandatory data elements, retention periods, audit requirements |
Forms: The Richest Requirements Source
Forms deserve special attention because they explicitly define data collection points. Every form represents a deliberate decision about:
A well-designed database should be able to store every piece of information that appears on organizational forms—plus any contextual data (who filled it, when, why) needed for audit and analysis.
Reports: Defining Information Needs
Reports reveal what the organization considers valuable output. From reports, you can work backward to determine:
Pay special attention to handwritten additions on forms, crossed-out fields, and supplementary notes. These often indicate requirements that the current forms don't accommodate—gaps that the new database can address.
Gathering documents is itself a structured process. Documents may be scattered across departments, archived in different formats, or guarded by protective stakeholders. A systematic approach ensures comprehensive coverage.
Dealing with Document Access Challenges:
Not all documents are readily available:
Confidential Documents: Some documents (financial, HR, legal) may be restricted. Work with project sponsors to obtain appropriate access. Sign NDAs if required. In some cases, redacted samples suffice for requirements purposes.
Informal Documents: Many organizations operate with informal documents—spreadsheets passed between colleagues, personal checklists, undocumented macros. These often contain critical operational knowledge. Ask: "Beyond official forms, what tools do you use daily?"
Digital Documents: Modern organizations have much documentation in electronic formats—PDFs, online forms, database screens. Screenshots and exports capture these artifacts.
Legacy Documents: Old paper forms and outdated system printouts may exist only in physical archives. These are valuable for understanding historical decisions and data that must be migrated.
Organizations often run 'shadow systems'—spreadsheets, Access databases, or informal tracking documents that supplement official systems. These shadows indicate requirements the current systems don't meet. They're gold mines for requirements discovery.
Forms are the most direct source of data requirements. Each form field maps to potential database attributes. Systematic form analysis extracts maximum requirements value.
Form Analysis Procedure:
| Analysis Item | What to Record | Example |
|---|---|---|
| Field Name | Exact label from the form | Customer Name |
| Data Type | Text, Number, Date, Boolean, etc. | Text |
| Length/Format | Character limits, formatting patterns | 50 characters, allows apostrophes |
| Required? | Mandatory or optional indicator | Yes (marked with asterisk) |
| Valid Values | Dropdown options, checkboxes, constrained ranges | Dropdown: Mr/Ms/Mrs/Dr |
| Default Value | Pre-filled values | Today's date |
| Source | Where the data originates | Customer provides during registration |
| Dependencies | Fields that affect this field's behavior | State dropdown depends on Country selection |
| Validation Rules | Format patterns, range checks, cross-field validations | Zip code format: NNNNN or NNNNN-NNNN |
Recognizing Data Patterns:
Experienced analysts recognize common data patterns in forms:
Identifiers: Fields like "Order Number," "Employee ID," "Account #" are entity identifiers—candidates for primary keys.
References: Fields prefixed with another entity name ("Customer Number" on an order form) indicate foreign key relationships.
Repeating Groups: Sections with lines for multiple items (order line items, family members, skills) indicate one-to-many relationships requiring separate tables.
Code Fields: Dropdowns, radio buttons, and checkboxes often represent coded values that might become lookup tables.
Calculated Fields: Totals, rates, and derived dates suggest what the database must compute or store.
Date Patterns: Multiple date fields (created, modified, approved, expired) reveal lifecycle requirements.
If you obtain completed forms, pay attention to what's left blank. Consistently empty fields may indicate obsolete requirements or fields that are mandatory in theory but optional in practice. Also note fields where users have written additional information in margins—these indicate data the form doesn't accommodate but users need to capture.
Reports represent the organization's information output—what they consider valuable to know. Report analysis works backward: from presentation to the underlying data that must be stored to produce the report.
Report Analysis Approach:
| Report Element | Question to Ask | Database Implication |
|---|---|---|
| 'Monthly Sales by Region' | What defines regions? How are sales dated? | Need: Sale.SaleDate, Region.RegionID, Sale.Amount |
| 'Year-over-Year Growth %' | How far back must data be stored? | Retention: Minimum 2 years of transaction data |
| 'Top 10 Customers by Revenue' | How is revenue calculated per customer? | Need: Customer-Order-OrderItem relationship with aggregation |
| 'Average Days to Ship' | What dates are captured? | Need: Order.OrderDate, Shipment.ShipDate |
| 'Active vs Inactive Accounts' | What defines 'active'? | Need: Account.Status or Account.LastActivityDate |
Report Requirements Hierarchy:
Reports exist at different organizational levels with different implications:
| Report Level | Characteristics | Database Implications |
|---|---|---|
| Operational | Daily/real-time, detailed, transactional | Requires current, granular data with fast access |
| Tactical | Weekly/monthly, summarized, departmental | Requires aggregation capabilities, period-end snapshots |
| Strategic | Quarterly/annual, trends, executive | Requires historical data, trend calculation, drill-down capability |
| Ad-hoc | On-demand, unpredictable structure | Requires flexible query capability, denormalized access |
| Regulatory | Compliance-mandated, specific format | Requires exact data elements as specified, audit trails |
Dashboards and Visualizations:
Modern organizations use dashboards with charts, gauges, and KPIs. These are reports in visual form. For each visualization element, apply the same analysis: what data produces this chart? What time range? What filters?
Many organizations have accumulated hundreds of reports—many redundant, outdated, or rarely used. Part of requirements gathering is rationalizing reports: which are essential, which could be consolidated, which should be retired. Don't design a database to support obsolete reports.
Procedure manuals and policy documents encode business rules—the constraints and logic that govern how data is created, modified, and used. These rules must be enforced by the database or the applications that use it.
Extracting Business Rules from Documents:
Business rules appear in various forms within documentation:
| Rule Type | Document Indicator | Database Implementation |
|---|---|---|
| Existence Rules | "Every order must have a customer" | NOT NULL constraint, foreign key |
| Cardinality Rules | "Each project has one lead" | Unique constraint, check constraint |
| Domain Rules | "Status must be Active, Pending, or Closed" | CHECK constraint, lookup table |
| Derivation Rules | "Total = Quantity × Unit Price" | Computed column or trigger |
| Temporal Rules | "Invoices due 30 days from issue" | Date arithmetic, derived column |
| Authorization Rules | "Managers approve expenses over $1000" | Application logic, audit fields |
| Sequence Rules | "Shipping occurs after payment confirmation" | Status field, workflow state machine |
| Uniqueness Rules | "No two employees share the same SSN" | UNIQUE constraint |
Procedure Analysis Technique:
Procedures describe sequences of activities. From these, you can extract:
States and Transitions: Procedures often describe lifecycle stages ("pending → approved → closed"). These become status fields with valid transition rules.
Roles and Permissions: Who performs each step? This maps to user roles and access control requirements.
Triggering Events: What initiates the procedure? Events often correspond to record creation or status changes.
Decision Points: Branching logic in procedures becomes business rules. "If order > $10,000, require manager approval" is a constraint.
Time Constraints: Deadlines and SLAs revealed in procedures become temporal requirements.
Exception Handling: What happens when standard procedures fail? Exceptions often require special data structures (reversal records, exception codes, escalation logs).
Procedures describe intended behavior; actual practice may differ. During interviews, ask: 'Does the actual workflow match this documented procedure?' Organizations often evolve faster than their documentation. Use documents as a starting point, but validate against real-world practice.
When replacing or enhancing existing systems, legacy documentation is invaluable. It reveals what the organization invested engineering effort to build—a strong signal of what matters.
Types of Legacy System Documentation:
Leveraging Legacy Schemas:
Existing database schemas are requirements in themselves—even if poorly designed, they encode decisions made for reasons. When analyzing legacy schemas:
Data Profiling:
Beyond documentation, profiling actual data in legacy systems reveals:
| Aspect | Question | Implication |
|---|---|---|
| Data Volume | How much data exists? Growth rate? | Storage sizing, archival strategy |
| Data Quality | How clean is existing data? | Migration complexity, cleansing requirements |
| Active vs. Historical | What percentage is actively used? | Archival opportunities, partition strategies |
| Unused Elements | Are there unused tables/columns? | May exclude from new design |
| Missing Elements | What's tracked manually outside the system? | New requirements to address |
| Performance Issues | What queries are slow? | Index requirements, denormalization candidates |
Legacy systems often contain accumulated design debt—columns added expediently, relationships never formalized, constraints never enforced. Use legacy documentation as input, not gospel. The goal is to learn what was needed, not to replicate old mistakes.
After collecting and analyzing documents, the next step is synthesis—integrating findings into a coherent requirements specification that can guide database design.
Creating a Data Element Inventory:
A data element inventory catalogs every identified data element with standardized documentation:
| Data Element | Definition | Source Documents | Data Type | Constraints | Related Entities |
|---|---|---|---|---|---|
| CustomerID | Unique identifier assigned to each customer | Customer Form, Order Form | Integer | Primary Key, Auto-generated | Customer |
| OrderDate | Date when order was placed | Order Form | Date | Required, Not Future | Order |
| UnitPrice | Selling price of one unit | Product Catalog, Price List | Decimal(10,2) | > 0 | OrderItem, Product |
This inventory becomes a foundation for the conceptual data model.
Cross-Referencing with Interviews:
Document analysis findings should be validated against interview results:
Thorough document analysis significantly accelerates conceptual design. When you begin ER modeling, you're not guessing at entities and attributes—you have a documented inventory derived from actual organizational artifacts, validated against stakeholder input.
Document analysis transforms organizational artifacts into actionable database requirements. It provides concrete, verifiable evidence that grounds abstract interview findings in documented reality.
What's Next:
With interview findings and document analysis complete, we now have a rich understanding of organizational data. The next page focuses on Data Requirements—the systematic specification of what data the database must store, including entities, attributes, domains, and relationships. This moves us from raw requirements input toward structured requirements specification.
You now understand how to systematically analyze organizational documents to extract database requirements. Combined with interview skills, you can build a comprehensive requirements base. Next, we'll formalize these findings into structured data requirements specifications.