Loading learning content...
Consider how data lives in a typical enterprise. The Order Management System knows about orders—their line items, quantities, and fulfillment status. The CRM knows about customer interactions—support tickets, sales calls, and campaign responses. The Inventory System tracks product locations, stock levels, and reorder points.
Each system is optimized for its operational function. The CRM doesn't care about inventory levels; the Order system doesn't track customer satisfaction scores. Data is fragmented by application boundary, not organized by business meaning.
Now imagine a business analyst asking: "Which of our high-value customers in the Northeast region experienced delayed shipments last quarter, and how did that affect their subsequent purchase behavior?"
This single question spans customers (CRM), orders (OMS), inventory (WMS), and geography (master data). In an operational environment, answering it requires heroic effort—querying multiple systems, reconciling different identifiers, and manually assembling the puzzle. In a subject-oriented data warehouse, the answer flows naturally because data is organized around what the business cares about, not which system captured it.
By the end of this page, you will deeply understand subject-orientation: what it means conceptually, how it manifests in schema design, why it dramatically improves analytical capability, and how to identify and model business subjects correctly. You'll see the contrast between application-centric and subject-centric data organization with concrete examples.
Subject-orientation means that data in a warehouse is organized around the major subjects of the enterprise—the core business entities that matter for decision-making—rather than around the applications or processes that generate the data.
"A data warehouse is organized around the major subject areas of the enterprise... The major subject areas are broadly recognizable—customer, product, vendor, transaction, account, etc." — Bill Inmon
What is a "Subject"?
A subject is a fundamental business entity that:
Ask: 'Would this concept exist and matter to the business even if all our current applications were replaced?' If yes, it's likely a core subject. Customers exist independent of the CRM. Products exist independent of the inventory system. But 'CRM_Contact_Record' is application-specific—it's a representation of Customer, not a subject itself.
The difference between application-centric and subject-centric data organization is profound. Let's examine a concrete example.
Scenario: A retail company has three operational systems:
Application-Centric View
Data remains siloed by source system:
1234567891011121314151617
-- POS System Tablespos_transactionspos_transaction_itemspos_storespos_registers -- E-commerce Tablesecom_ordersecom_order_linesecom_cart_itemsecom_sessions -- Loyalty Tablesloyalty_membersloyalty_pointsloyalty_tiersloyalty_redemptionsProblems:
Subject-Centric View
Data unified around business subjects:
123456789101112131415161718
-- CUSTOMER Subjectdim_customer-- unified customer identity -- PRODUCT Subjectdim_product-- single product master -- SALES Subject (unified transactions)fact_sales-- all channels in one place -- LOCATION Subjectdim_storedim_geography -- TIME Subjectdim_dateBenefits:
The Transformation
Moving from application-centric to subject-centric requires:
Dimensional modeling is the primary technique for implementing subject-orientation in data warehouses. Developed and popularized by Ralph Kimball, it organizes data into fact tables (measurements of business processes) and dimension tables (context for those measurements).
Dimension tables represent the subjects—the nouns of the business. Fact tables represent events and measurements—the verbs that connect subjects in time.
Correctly identifying business subjects is a foundational design exercise. Get it wrong, and your warehouse will be confusing to use and painful to maintain. Here's a systematic approach:
| Candidate | Is It a Subject? | Reasoning |
|---|---|---|
| Customer | Yes | Fundamental entity across sales, marketing, service, finance. Universal interest. |
| Product | Yes | Core entity for revenue, inventory, purchasing, marketing. Cross-functional relevance. |
| CRM Contact | No | Application-specific term. The subject is 'Customer'; CRM Contact is a source representation. |
| Order Header | Depends | May be a subject (core to business analysis) or may be an event/fact. Usually modeled as a fact with Order as a degenerate dimension. |
| Website Session | Typically No | Usually an event/fact, not a subject. The subject is 'Customer' or 'Visitor'. |
| Employee | Yes (usually) | HR subject. Also appears in Sales (sales rep), Service (agent), etc. Cross-functional. |
| Invoice Line Item | No | Too granular. Part of the Sales or Billing fact, not a standalone subject. |
Watch out for: (1) Confusing source system entities with business subjects—'SAP Material' isn't a subject; Product is. (2) Over-granularity—'Product Color' isn't a subject; it's an attribute of Product. (3) Process confusion—'Shipment Processing' isn't a subject; Shipment or Order is. (4) Technical jargon—if users don't recognize the term, it's not a user-facing subject.
A subject area in a data warehouse encompasses everything related to a major business subject: its dimension tables, related facts, hierarchies, and attributes. Let's examine the Customer subject area in depth:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061
-- ==============================================-- CUSTOMER SUBJECT AREA-- ============================================== -- Core Customer DimensionCREATE TABLE dim_customer ( customer_key INT PRIMARY KEY, -- Surrogate key customer_id VARCHAR(50), -- Business key (unified) -- Identity Attributes first_name VARCHAR(100), last_name VARCHAR(100), full_name VARCHAR(200), email VARCHAR(255), phone VARCHAR(50), -- Demographic Attributes birth_date DATE, age INT, -- Derived age_band VARCHAR(20), -- '18-24', '25-34', etc. gender VARCHAR(20), -- Geographic Attributes (denormalized from dim_geography) address_line1 VARCHAR(200), city VARCHAR(100), state VARCHAR(50), postal_code VARCHAR(20), country VARCHAR(50), region VARCHAR(50), -- 'Northeast', 'West', etc. -- Behavioral Segmentation customer_segment VARCHAR(50), -- 'High Value', 'At Risk', etc. acquisition_channel VARCHAR(50), -- 'Web', 'Store', 'Partner' first_purchase_date DATE, last_purchase_date DATE, lifetime_value DECIMAL(12,2), -- Aggregated metric -- Loyalty Program Attributes loyalty_member_flag BOOLEAN, loyalty_tier VARCHAR(20), -- 'Gold', 'Silver', 'Bronze' loyalty_points INT, -- SCD Type 2 Tracking effective_date DATE, expiration_date DATE, is_current BOOLEAN, -- Audit source_system VARCHAR(50), -- Primary source etl_batch_id INT, loaded_at TIMESTAMP); -- Customer Hierarchy (for roll-up analysis)-- Customer → Segment → Region → Enterprise -- Related Fact Tables:-- fact_sales (customer purchases)-- fact_customer_service (support interactions) -- fact_campaign_response (marketing engagement)-- fact_web_activity (digital behavior)Subject-orientation isn't just conceptually elegant—it delivers concrete, measurable benefits to the organization.
Subject-orientation creates a semantic layer between raw source data and business users. This layer translates technical complexity into business meaning. Modern BI tools (Looker, dbt metrics layer, Tableau semantic models) formalize this pattern, but it originates in subject-oriented warehouse design.
Subject-orientation, while powerful, introduces its own challenges that architects must address:
| Challenge | Description | Mitigation Strategy |
|---|---|---|
| Entity Resolution | Determining that records in different systems represent the same subject instance (e.g., same customer) | Master Data Management (MDM), probabilistic matching, golden record creation |
| Semantic Conflict Resolution | Different systems define the same concept differently ('revenue' includes or excludes returns) | Enterprise data governance, business glossary, data stewardship |
| Subject Granularity Decisions | Determining the right level—is 'Household' a subject separate from 'Customer'? | Stakeholder consensus, analysis requirements review, flexible hierarchies |
| Cross-Subject Dependencies | Subjects relate to each other—Customer to Account, Product to Supplier | Clear relationship modeling, bridge tables where needed, conformed dimensions |
| Evolution Over Time | Business subjects evolve—new attributes, new segmentations, new relationships | Extensible schema design, SCD strategies, dimension versioning |
| Performance vs. Denormalization | Rich subject dimensions can become wide and impact load times | Selective denormalization, outrigger dimensions, mini-dimensions for volatile attributes |
Entity resolution—determining that disparate records represent the same real-world entity—is often the hardest part of achieving subject-orientation. A customer might exist as three records in CRM, two in the order system, and one in the loyalty program. Merging these without losing information or creating false matches requires sophisticated matching logic and often dedicated MDM systems.
Subject-orientation fundamentally reshapes how data is organized and accessed. Let's consolidate the key insights:
What's Next:
Subject-orientation addresses how data is organized. The next characteristic, Integration, addresses how data from disparate sources is unified. We'll explore data integration challenges, standardization requirements, and the ETL processes that transform fragmented operational data into consistent warehouse content.
You now understand subject-orientation—organizing data around business subjects rather than applications. This principle shapes schema design, enables intuitive analysis, and provides the conceptual framework for a business-friendly data warehouse. Next, we explore Integration: unifying data from heterogeneous sources into a consistent whole.