Loading content...
Consider a global telecommunications company storing call detail records (CDRs). The table has 100+ attributes and billions of rows. Different teams access different slices:
Neither pure horizontal fragmentation (by region or time) nor pure vertical fragmentation (by attribute groups) satisfies all access patterns optimally. The solution is hybrid fragmentation—applying both horizontal and vertical partitioning in combination.
Hybrid fragmentation (also called mixed fragmentation or nested fragmentation) creates fragments that are subsets of rows AND subsets of columns simultaneously. This page explores the theoretical foundations, design methodologies, reconstruction algorithms, and practical considerations for hybrid fragmentation strategies.
By the end of this page, you will understand: (1) The formal definition and correctness properties of hybrid fragmentation, (2) HV (horizontal-then-vertical) and VH (vertical-then-horizontal) nesting strategies, (3) How to design hybrid fragmentation schemas from access pattern analysis, (4) Reconstruction algorithms for hybrid fragments using union and join combinations, (5) Complexity trade-offs and when hybrid fragmentation is justified, and (6) Real-world implementation patterns.
Hybrid fragmentation emerges from applying horizontal and vertical fragmentation in sequence or combination. Understanding the formal structure is essential for correct design.
Definition:
Given a relation R, a hybrid fragmentation produces fragments F₁, F₂, ..., Fₙ where each fragment is defined by both:
Formally:
Fᵢ = π(K ∪ Aᵢ)(σ(pᵢ)(R))
Where σ applies the selection predicate pᵢ, and π projects the key K plus attribute set Aᵢ.
Correctness Properties:
Nesting Approaches:
Hybrid fragmentation can be achieved through two nesting patterns:
HV (Horizontal-then-Vertical):
VH (Vertical-then-Horizontal):
Both approaches can produce equivalent final fragmentation, but the design process differs.
| Strategy | Fragment Definition | Reconstruction | Fragment Count |
|---|---|---|---|
| Horizontal Only | σ(pᵢ)(R) | ∪ᵢ Fᵢ | Depends on predicates |
| Vertical Only | π(K ∪ Aᵢ)(R) | ⋈ᵢ Fᵢ (on K) | Depends on attributes |
| Hybrid (HV) | π(K ∪ Aⱼ)(σ(pᵢ)(R)) | ⋈ⱼ(∪ᵢ Fᵢⱼ) | Up to m × k |
| Hybrid (VH) | σ(pᵢ)(π(K ∪ Aⱼ)(R)) | ∪ᵢ(⋈ⱼ Fⱼᵢ) | Up to m × k |
For the same final partition, σ(p)(π(A)(R)) = π(A)(σ(p)(R)) when p references only attributes in A ∪ K. This commutativity means the nesting order affects the design process but not the final fragments—choose the order that matches your primary design driver (rows first or columns first).
HV nesting starts with horizontal fragmentation as the primary partitioning dimension, then applies vertical fragmentation within each horizontal fragment.
When to Use HV:
Design Process:
Define Horizontal Predicates:
Analyze Attribute Access per Fragment:
Apply Vertical Fragmentation to Each Horizontal Fragment:
Example: Call Detail Records
Step 1: Horizontal fragmentation by region
CDR → CDR_NorthAmerica (region IN ('US', 'CA', 'MX'))
→ CDR_Europe (region IN ('UK', 'DE', 'FR', ...))
→ CDR_AsiaPacific (region IN ('JP', 'CN', 'AU', ...))
Step 2: Vertical fragmentation of each horizontal fragment
CDR_NorthAmerica → CDR_NA_Billing (tid, call_id, duration, cost, taxes)
→ CDR_NA_Technical (tid, call_id, codec, network_type, latency)
→ CDR_NA_Location (tid, call_id, origin_tower, dest_tower)
(Similarly for other regions)
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879
-- Hybrid HV Fragmentation: Horizontal (by region) then Vertical (by function) -- Original CDR table (conceptual - not created directly)-- CDR(cdr_id, call_time, region, -- caller_id, callee_id, duration, cost, taxes, -- Billing-- codec, network_type, latency, jitter, -- Technical-- origin_tower, dest_tower, handovers) -- Location -- Step 1: Create partitioned table (horizontal by region)CREATE TABLE cdr_records ( tid BIGINT NOT NULL, cdr_id UUID NOT NULL, call_time TIMESTAMP NOT NULL, region VARCHAR(20) NOT NULL, -- Billing attributes caller_id BIGINT, callee_id BIGINT, duration_secs INTEGER, cost_cents INTEGER, tax_cents INTEGER, -- Technical attributes codec VARCHAR(20), network_type VARCHAR(20), latency_ms INTEGER, jitter_ms INTEGER, -- Location attributes origin_tower_id BIGINT, dest_tower_id BIGINT, handover_count INTEGER, PRIMARY KEY (tid, region)) PARTITION BY LIST (region); -- Horizontal partitions by regionCREATE TABLE cdr_records_na PARTITION OF cdr_records FOR VALUES IN ('US', 'CA', 'MX');CREATE TABLE cdr_records_eu PARTITION OF cdr_records FOR VALUES IN ('UK', 'DE', 'FR', 'IT', 'ES');CREATE TABLE cdr_records_apac PARTITION OF cdr_records FOR VALUES IN ('JP', 'CN', 'AU', 'IN', 'SG'); -- Step 2: Create vertical fragment views (or materialized tables) per region -- North America - Billing fragmentCREATE VIEW cdr_na_billing ASSELECT tid, cdr_id, call_time, region, caller_id, callee_id, duration_secs, cost_cents, tax_centsFROM cdr_records_na; -- North America - Technical fragment CREATE VIEW cdr_na_technical ASSELECT tid, cdr_id, call_time, region, codec, network_type, latency_ms, jitter_msFROM cdr_records_na; -- North America - Location fragmentCREATE VIEW cdr_na_location ASSELECT tid, cdr_id, call_time, region, origin_tower_id, dest_tower_id, handover_countFROM cdr_records_na; -- Similar views for EU and APAC regions... -- For physical separation, use materialized tables with triggers for syncCREATE TABLE cdr_na_billing_physical AS SELECT * FROM cdr_na_billing; CREATE TABLE cdr_na_technical_physical AS SELECT * FROM cdr_na_technical; -- Reconstruction: Union horizontal first, then join vertical-- Full table for North AmericaCREATE VIEW cdr_na_full ASSELECT b.tid, b.cdr_id, b.call_time, b.region, b.caller_id, b.callee_id, b.duration_secs, b.cost_cents, b.tax_cents, t.codec, t.network_type, t.latency_ms, t.jitter_ms, l.origin_tower_id, l.dest_tower_id, l.handover_countFROM cdr_na_billing bJOIN cdr_na_technical t ON b.tid = t.tidJOIN cdr_na_location l ON b.tid = l.tid;HV nesting allows different vertical fragmentations per horizontal partition. If the EU region has regulatory requirements for separating customer identity from location data, CDR_EU can have a different vertical schema than CDR_NA. This flexibility is a key advantage of HV over uniform global fragmentation.
VH nesting starts with vertical fragmentation as the primary dimension, then applies horizontal partitioning within each vertical fragment.
When to Use VH:
Design Process:
Perform Attribute Affinity Analysis:
Analyze Row Distribution per Vertical Fragment:
Apply Horizontal Fragmentation to Each Vertical Fragment:
Example: E-commerce Product Catalog
Step 1: Vertical fragmentation by function
Products → Products_Catalog (tid, name, description, images)
→ Products_Inventory (tid, stock, warehouse, reorder_level)
→ Products_Pricing (tid, base_price, currency, discounts)
→ Products_Analytics (tid, view_count, buy_count, return_rate)
Step 2: Horizontal fragmentation per vertical fragment
Products_Catalog → Products_Catalog_Electronics (category IN ('TV', 'Phone', 'Laptop'))
→ Products_Catalog_Fashion (category IN ('Clothing', 'Shoes'))
→ Products_Catalog_Home (category IN ('Furniture', 'Kitchen'))
Products_Inventory → Products_Inventory_Warehouse1 (warehouse_id = 1)
→ Products_Inventory_Warehouse2 (warehouse_id = 2)
... (by warehouse for locality)
Products_Analytics → Products_Analytics_2024 (year = 2024)
→ Products_Analytics_2023 (year = 2023)
... (by year for archival)
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980
-- Hybrid VH Fragmentation: Vertical (by domain) then Horizontal (by criteria) -- Step 1: Vertical fragments (separate tables per attribute group) -- Products Core (shared TID and natural key)CREATE TABLE products_core ( tid BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, product_id UUID NOT NULL UNIQUE, category VARCHAR(50) NOT NULL, created_at TIMESTAMP DEFAULT NOW()); -- Catalog Fragment (vertically separated) - partitioned by categoryCREATE TABLE products_catalog ( tid BIGINT NOT NULL, category VARCHAR(50) NOT NULL, name VARCHAR(200) NOT NULL, description TEXT, image_urls TEXT[], PRIMARY KEY (tid, category), FOREIGN KEY (tid) REFERENCES products_core(tid)) PARTITION BY LIST (category); CREATE TABLE products_catalog_electronics PARTITION OF products_catalog FOR VALUES IN ('TV', 'Phone', 'Laptop', 'Tablet', 'Camera');CREATE TABLE products_catalog_fashion PARTITION OF products_catalog FOR VALUES IN ('Clothing', 'Shoes', 'Accessories', 'Jewelry');CREATE TABLE products_catalog_home PARTITION OF products_catalog FOR VALUES IN ('Furniture', 'Kitchen', 'Garden', 'Bedding'); -- Inventory Fragment - partitioned by warehouseCREATE TABLE products_inventory ( tid BIGINT NOT NULL, warehouse_id INTEGER NOT NULL, stock_qty INTEGER NOT NULL DEFAULT 0, reorder_level INTEGER, last_restock TIMESTAMP, PRIMARY KEY (tid, warehouse_id), FOREIGN KEY (tid) REFERENCES products_core(tid)) PARTITION BY HASH (warehouse_id); CREATE TABLE products_inventory_wh0 PARTITION OF products_inventory FOR VALUES WITH (MODULUS 4, REMAINDER 0);CREATE TABLE products_inventory_wh1 PARTITION OF products_inventory FOR VALUES WITH (MODULUS 4, REMAINDER 1);CREATE TABLE products_inventory_wh2 PARTITION OF products_inventory FOR VALUES WITH (MODULUS 4, REMAINDER 2);CREATE TABLE products_inventory_wh3 PARTITION OF products_inventory FOR VALUES WITH (MODULUS 4, REMAINDER 3); -- Analytics Fragment - partitioned by time (year)CREATE TABLE products_analytics ( tid BIGINT NOT NULL, year INTEGER NOT NULL, view_count BIGINT DEFAULT 0, add_to_cart BIGINT DEFAULT 0, purchase_count BIGINT DEFAULT 0, return_count BIGINT DEFAULT 0, revenue_cents BIGINT DEFAULT 0, PRIMARY KEY (tid, year), FOREIGN KEY (tid) REFERENCES products_core(tid)) PARTITION BY RANGE (year); CREATE TABLE products_analytics_2024 PARTITION OF products_analytics FOR VALUES FROM (2024) TO (2025);CREATE TABLE products_analytics_2023 PARTITION OF products_analytics FOR VALUES FROM (2023) TO (2024);CREATE TABLE products_analytics_historical PARTITION OF products_analytics FOR VALUES FROM (MINVALUE) TO (2023); -- Reconstruction: Join verticals, union horizontals-- Full product view for electronics category, all warehouses, all yearsCREATE VIEW products_electronics_full ASSELECT c.tid, c.name, c.description, c.image_urls, i.warehouse_id, i.stock_qty, i.reorder_level, a.year, a.view_count, a.purchase_count, a.revenue_centsFROM products_catalog_electronics cJOIN products_inventory i ON c.tid = i.tidJOIN products_analytics a ON c.tid = a.tid;Reconstructing the original relation from hybrid fragments requires combining union (for horizontal) and join (for vertical) operations in the correct order.
Reconstruction Formula for HV:
If R was fragmented horizontally into {H₁, ..., Hₘ} and each Hᵢ vertically into {Vᵢ₁, ..., Vᵢₖ}:
R = ⋈ⱼ₌₁ᵏ (∪ᵢ₌₁ᵐ Fᵢⱼ)
Where Fᵢⱼ is the fragment for horizontal partition i, vertical partition j.
Translation: For each vertical attribute group j, union all horizontal fragments together. Then join all vertical results on the tuple identifier.
Reconstruction Formula for VH:
If R was fragmented vertically into {V₁, ..., Vₖ} and each Vⱼ horizontally into {Hⱼ₁, ..., Hⱼₘ}:
R = ∪ᵢ₌₁ᵐ (⋈ⱼ₌₁ᵏ Fⱼᵢ)
Where Fⱼᵢ is the fragment for vertical partition j, horizontal partition i.
Translation: For each horizontal partition i, join all vertical fragments together. Then union all horizontal results.
Equivalence: Both formulas produce the same result when all fragments exist. The difference is in execution strategy and optimization opportunities.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495
-- Reconstruction Examples for Hybrid Fragmentation -- ============================================-- HV Reconstruction: Union horizontals, then join verticals-- ============================================ -- Given: CDR table with HV fragmentation-- Horizontal: region (NA, EU, APAC)-- Vertical: billing, technical, location -- Step 1: Union horizontal fragments for each vertical group -- All billing data (across regions)CREATE VIEW cdr_billing_all ASSELECT * FROM cdr_na_billingUNION ALLSELECT * FROM cdr_eu_billingUNION ALLSELECT * FROM cdr_apac_billing; -- All technical data (across regions)CREATE VIEW cdr_technical_all ASSELECT * FROM cdr_na_technicalUNION ALLSELECT * FROM cdr_eu_technicalUNION ALLSELECT * FROM cdr_apac_technical; -- All location data (across regions)CREATE VIEW cdr_location_all ASSELECT * FROM cdr_na_locationUNION ALLSELECT * FROM cdr_eu_locationUNION ALLSELECT * FROM cdr_apac_location; -- Step 2: Join vertical groupsCREATE VIEW cdr_reconstructed_hv ASSELECT b.tid, b.cdr_id, b.call_time, b.region, b.caller_id, b.callee_id, b.duration_secs, b.cost_cents, b.tax_cents, t.codec, t.network_type, t.latency_ms, t.jitter_ms, l.origin_tower_id, l.dest_tower_id, l.handover_countFROM cdr_billing_all bJOIN cdr_technical_all t ON b.tid = t.tidJOIN cdr_location_all l ON b.tid = l.tid; -- ============================================-- VH Reconstruction: Join verticals, then union horizontals-- ============================================ -- Given: Products table with VH fragmentation-- Vertical: catalog, inventory, analytics-- Horizontal (per vertical): category, warehouse, year -- Step 1: For each horizontal partition, join vertical fragments -- Electronics products (all verticals)CREATE VIEW products_electronics_complete ASSELECT cat.tid, cat.name, cat.description, inv.warehouse_id, inv.stock_qty, anl.year, anl.view_count, anl.purchase_countFROM products_catalog_electronics catJOIN products_inventory inv ON cat.tid = inv.tidJOIN products_analytics anl ON cat.tid = anl.tid; -- Fashion products (all verticals) CREATE VIEW products_fashion_complete ASSELECT cat.tid, cat.name, cat.description, inv.warehouse_id, inv.stock_qty, anl.year, anl.view_count, anl.purchase_countFROM products_catalog_fashion catJOIN products_inventory inv ON cat.tid = inv.tidJOIN products_analytics anl ON cat.tid = anl.tid; -- Home products (all verticals)CREATE VIEW products_home_complete ASSELECT cat.tid, cat.name, cat.description, inv.warehouse_id, inv.stock_qty, anl.year, anl.view_count, anl.purchase_countFROM products_catalog_home catJOIN products_inventory inv ON cat.tid = inv.tidJOIN products_analytics anl ON cat.tid = anl.tid; -- Step 2: Union horizontal partitionsCREATE VIEW products_reconstructed_vh ASSELECT * FROM products_electronics_completeUNION ALLSELECT * FROM products_fashion_completeUNION ALLSELECT * FROM products_home_complete;Full reconstruction is expensive—it requires joining all vertical fragments and unioning all horizontal fragments. In practice, queries should access only necessary fragments. The query optimizer should push predicates into fragments to minimize data movement. Design your hybrid schema so common queries touch minimal fragments.
Designing hybrid fragmentation requires systematic analysis of access patterns, distribution requirements, and reconstruction costs.
Step-by-Step Design Process:
Step 1: Workload Analysis
Document all significant queries with:
Step 2: Identify Primary Partitioning Dimension
Decide whether horizontal or vertical should be the outer fragmentation:
| Criterion | Favor HV | Favor VH |
|---|---|---|
| Data sovereignty | Row-based locality required | Attribute-based security |
| Primary query pattern | Regional/temporal filtering | Attribute group access |
| Scaling requirement | Rows grow faster | Attributes vary widely |
| Team organization | Regional operations | Functional domains |
Step 3: Design Outer Fragmentation
Apply either:
Step 4: Design Inner Fragmentation Per Outer Fragment
For each outer fragment:
Step 5: Validate Against Critical Queries
For top 10-20 queries, trace fragment access:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108
from dataclasses import dataclassfrom typing import List, Set, Dict, Tuplefrom enum import Enum class FragmentType(Enum): HORIZONTAL = "horizontal" VERTICAL = "vertical" HYBRID_HV = "hybrid_hv" HYBRID_VH = "hybrid_vh" @dataclassclass Query: name: str attributes: Set[str] predicates: Dict[str, str] # {attribute: condition} frequency: int # per hour max_latency_ms: int source_region: str @dataclassclass FragmentationAnalysis: """Analyze queries to recommend fragmentation strategy.""" def __init__(self, all_attributes: Set[str], queries: List[Query]): self.attributes = all_attributes self.queries = queries def compute_attribute_coverage(self) -> Dict[str, float]: """What fraction of queries access each attribute?""" coverage = {attr: 0.0 for attr in self.attributes} total_freq = sum(q.frequency for q in self.queries) for query in self.queries: for attr in query.attributes: coverage[attr] += query.frequency / total_freq return coverage def compute_predicate_frequency(self) -> Dict[str, float]: """How often is each attribute used in predicates?""" freq = {} total_freq = sum(q.frequency for q in self.queries) for query in self.queries: for attr in query.predicates: freq[attr] = freq.get(attr, 0) + query.frequency / total_freq return freq def compute_attribute_affinity(self) -> Dict[Tuple[str, str], float]: """How often are attribute pairs accessed together?""" affinity = {} for query in self.queries: attrs = list(query.attributes) for i, a1 in enumerate(attrs): for a2 in attrs[i+1:]: key = (min(a1, a2), max(a1, a2)) affinity[key] = affinity.get(key, 0) + query.frequency return affinity def recommend_strategy(self) -> FragmentType: """Recommend fragmentation type based on analysis.""" pred_freq = self.compute_predicate_frequency() attr_coverage = self.compute_attribute_coverage() # Check for strong horizontal partitioning candidates horizontal_score = 0 for attr, freq in pred_freq.items(): if freq > 0.5: # Attribute frequently in predicates horizontal_score += freq # Check for vertical partitioning potential coverage_values = list(attr_coverage.values()) vertical_score = 1 - (sum(coverage_values) / len(coverage_values)) print(f"Horizontal score: {horizontal_score:.2f}") print(f"Vertical score: {vertical_score:.2f}") if horizontal_score > 0.7 and vertical_score > 0.5: return FragmentType.HYBRID_HV elif vertical_score > 0.7 and horizontal_score > 0.5: return FragmentType.HYBRID_VH elif horizontal_score > vertical_score: return FragmentType.HORIZONTAL else: return FragmentType.VERTICAL # Example usagequeries = [ Query("NA_billing_report", {"cdr_id", "region", "caller_id", "cost", "duration"}, {"region": "IN ('US', 'CA')", "call_time": ">= '2024-01-01'"}, 1000, 500, "us-east"), Query("EU_technical_analysis", {"cdr_id", "region", "codec", "latency"}, {"region": "IN ('DE', 'FR')", "latency": "> 100"}, 500, 1000, "eu-central"), Query("Global_location_tracking", {"cdr_id", "origin_tower", "dest_tower"}, {"call_time": ">= CURRENT_DATE"}, 2000, 200, "global"), # ... more queries] all_attrs = {"cdr_id", "call_time", "region", "caller_id", "callee_id", "duration", "cost", "codec", "latency", "origin_tower", "dest_tower"} analysis = FragmentationAnalysis(all_attrs, queries)recommendation = analysis.recommend_strategy()print(f"Recommended strategy: {recommendation.value}")Production systems employ hybrid fragmentation patterns tailored to specific domains. Understanding these patterns provides templates for your designs.
Pattern 1: Multi-Tenant SaaS
Outer (Horizontal): Tenant ID
Inner (Vertical): Feature modules
Customer_Tenant_A_Core (tid, name, email, status)
Customer_Tenant_A_Billing (tid, plan, payment_method, last_payment)
Customer_Tenant_A_Usage (tid, api_calls, storage_gb, last_active)
Customer_Tenant_B_Core (...)
...
Rationale: Tenant isolation is primary (compliance, performance isolation). Within tenant, different services access different attribute groups.
Pattern 2: Time-Series IoT Data
Outer (Horizontal): Time range (month/week)
Inner (Vertical): Metric type
Sensor_2024_01_Temperature (tid, device_id, timestamp, value, unit)
Sensor_2024_01_Humidity (tid, device_id, timestamp, value, unit)
Sensor_2024_01_Pressure (tid, device_id, timestamp, value, unit)
Sensor_2024_02_Temperature (...)
...
Rationale: Time-based partitioning enables archival and lifecycle management. Metric separation allows different retention policies and optimized storage.
Pattern 3: Content Delivery Network
Outer (Vertical): Content type (metadata, binary, analytics)
Inner (Horizontal): Geographic region
Content_Metadata_NA (content_id, title, format, duration, cdn_url)
Content_Metadata_EU (...)
Content_Binary_NA (content_id, binary_data, encoding)
Content_Binary_EU (...)
Content_Analytics_2024 (content_id, region, views, bandwidth, errors)
Rationale: Metadata queried differently than binaries. Geographic distribution for latency. Analytics partitioned by time for aggregation.
| Industry | Outer Fragmentation | Inner Fragmentation | Key Driver |
|---|---|---|---|
| E-commerce | Product category (H) | Catalog / Inventory / Pricing (V) | Category-specific attributes vary significantly |
| Healthcare | Patient region (H) | Clinical / Administrative / Billing (V) | HIPAA, data residency, role-based access |
| Financial Services | Account type (H) | Profile / Transactions / Risk (V) | Regulatory compartmentalization |
| Gaming | Game server region (H) | Player stats / Inventory / Social (V) | Latency-critical separation |
| Telecommunications | Network region (H) | Billing / Network / Location (V) | Regulatory, operational team alignment |
Production systems often start with single-dimension fragmentation and evolve to hybrid as scale demands. Begin with the most impactful dimension (usually horizontal for scale, vertical for access patterns). Add the second dimension only when evidence shows clear benefit—hybrid complexity should be justified by measurable performance or operational improvements.
Hybrid fragmentation combines horizontal and vertical partitioning to address complex access patterns that neither approach satisfies alone. Let's consolidate the key concepts:
What's Next:
With fragmentation strategies understood, we now turn to the design process itself. The next page explores Fragmentation Design—systematic methodologies for choosing fragmentation strategies, validating correctness properties, and evolving schemas as requirements change.
You now understand hybrid fragmentation at a deep, practical level—from formal definitions through real-world patterns. This knowledge enables designing sophisticated distribution schemas that optimize for multi-dimensional access patterns while maintaining reconstruction guarantees.