Fragmentation - Learning Module

Loading content...

0/252

Hybrid Fragmentation

When One Dimension Isn't Enough

Consider a global telecommunications company storing call detail records (CDRs). The table has 100+ attributes and billions of rows. Different teams access different slices:

Regional Operations (NA, EU, APAC): Need rows for their region, specific operational attributes
Finance: Need billing attributes across all regions
Network Engineering: Need technical metrics for recent calls only
Legal/Compliance: Need full records for specific customers under investigation

Neither pure horizontal fragmentation (by region or time) nor pure vertical fragmentation (by attribute groups) satisfies all access patterns optimally. The solution is hybrid fragmentation—applying both horizontal and vertical partitioning in combination.

Hybrid fragmentation (also called mixed fragmentation or nested fragmentation) creates fragments that are subsets of rows AND subsets of columns simultaneously. This page explores the theoretical foundations, design methodologies, reconstruction algorithms, and practical considerations for hybrid fragmentation strategies.

What You Will Learn

By the end of this page, you will understand: (1) The formal definition and correctness properties of hybrid fragmentation, (2) HV (horizontal-then-vertical) and VH (vertical-then-horizontal) nesting strategies, (3) How to design hybrid fragmentation schemas from access pattern analysis, (4) Reconstruction algorithms for hybrid fragments using union and join combinations, (5) Complexity trade-offs and when hybrid fragmentation is justified, and (6) Real-world implementation patterns.

Formal Definition of Hybrid Fragmentation

Hybrid fragmentation emerges from applying horizontal and vertical fragmentation in sequence or combination. Understanding the formal structure is essential for correct design.

Definition:

Given a relation R, a hybrid fragmentation produces fragments F₁, F₂, ..., Fₙ where each fragment is defined by both:

A selection predicate pᵢ (determining row membership)
An attribute set Aᵢ (determining column membership)

Formally:

Fᵢ = π(K ∪ Aᵢ)(σ(pᵢ)(R))

Where σ applies the selection predicate pᵢ, and π projects the key K plus attribute set Aᵢ.

Correctness Properties:

Completeness: Every (tuple, attribute) pair in R must be representable in at least one fragment
Disjointness: Each (tuple, attribute) pair should appear in exactly one fragment (for non-key attributes)
Reconstruction: R can be reconstructed from fragments using appropriate union and join operations

Nesting Approaches:

Hybrid fragmentation can be achieved through two nesting patterns:

HV (Horizontal-then-Vertical):
- First partition horizontally: R → {H₁, H₂, ..., Hₘ}
- Then partition each Hᵢ vertically: Hᵢ → {Vᵢ₁, Vᵢ₂, ..., Vᵢₖ}
- Result: m × k fragments (maximum)
VH (Vertical-then-Horizontal):
- First partition vertically: R → {V₁, V₂, ..., Vₖ}
- Then partition each Vⱼ horizontally: Vⱼ → {Hⱼ₁, Hⱼ₂, ..., Hⱼₘ}
- Result: k × m fragments (maximum)

Both approaches can produce equivalent final fragmentation, but the design process differs.

Fragmentation Strategy Comparison
Strategy	Fragment Definition	Reconstruction	Fragment Count
Horizontal Only	σ(pᵢ)(R)	∪ᵢ Fᵢ	Depends on predicates
Vertical Only	π(K ∪ Aᵢ)(R)	⋈ᵢ Fᵢ (on K)	Depends on attributes
Hybrid (HV)	π(K ∪ Aⱼ)(σ(pᵢ)(R))	⋈ⱼ(∪ᵢ Fᵢⱼ)	Up to m × k
Hybrid (VH)	σ(pᵢ)(π(K ∪ Aⱼ)(R))	∪ᵢ(⋈ⱼ Fⱼᵢ)	Up to m × k

Equivalence of HV and VH

For the same final partition, σ(p)(π(A)(R)) = π(A)(σ(p)(R)) when p references only attributes in A ∪ K. This commutativity means the nesting order affects the design process but not the final fragments—choose the order that matches your primary design driver (rows first or columns first).

Horizontal-then-Vertical (HV) Nesting

HV nesting starts with horizontal fragmentation as the primary partitioning dimension, then applies vertical fragmentation within each horizontal fragment.

When to Use HV:

Geographic or temporal partitioning is the primary distribution driver
Different regions/time periods have different vertical access patterns
Site allocation is determined by horizontal predicates first
Regulatory requirements dictate row-based data residency

Design Process:

Define Horizontal Predicates:
- Identify primary partitioning criteria (region, date range, customer segment)
- Ensure predicates are complete and disjoint
- Create horizontal fragments H₁, H₂, ..., Hₘ
Analyze Attribute Access per Fragment:
- Different horizontal fragments may have different access patterns
- Build site-specific attribute affinity matrices
- Identify attribute groups relevant to each horizontal partition
Apply Vertical Fragmentation to Each Horizontal Fragment:
- May result in different vertical schemas per horizontal fragment
- Or apply uniform vertical fragmentation globally

Example: Call Detail Records

Step 1: Horizontal fragmentation by region

CDR → CDR_NorthAmerica (region IN ('US', 'CA', 'MX'))
    → CDR_Europe       (region IN ('UK', 'DE', 'FR', ...))
    → CDR_AsiaPacific  (region IN ('JP', 'CN', 'AU', ...))

Step 2: Vertical fragmentation of each horizontal fragment

CDR_NorthAmerica → CDR_NA_Billing   (tid, call_id, duration, cost, taxes)
                 → CDR_NA_Technical (tid, call_id, codec, network_type, latency)
                 → CDR_NA_Location  (tid, call_id, origin_tower, dest_tower)

(Similarly for other regions)

hv_fragmentation_example.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
-- Hybrid HV Fragmentation: Horizontal (by region) then Vertical (by function)
 
-- Original CDR table (conceptual - not created directly)
-- CDR(cdr_id, call_time, region, 
--     caller_id, callee_id, duration, cost, taxes,  -- Billing
--     codec, network_type, latency, jitter,          -- Technical
--     origin_tower, dest_tower, handovers)           -- Location
 
-- Step 1: Create partitioned table (horizontal by region)
CREATE TABLE cdr_records (
    tid             BIGINT NOT NULL,
    cdr_id          UUID NOT NULL,
    call_time       TIMESTAMP NOT NULL,
    region          VARCHAR(20) NOT NULL,
    -- Billing attributes
    caller_id       BIGINT,
    callee_id       BIGINT,
    duration_secs   INTEGER,
    cost_cents      INTEGER,
    tax_cents       INTEGER,
    -- Technical attributes
    codec           VARCHAR(20),
    network_type    VARCHAR(20),
    latency_ms      INTEGER,
    jitter_ms       INTEGER,
    -- Location attributes
    origin_tower_id BIGINT,
    dest_tower_id   BIGINT,
    handover_count  INTEGER,
    PRIMARY KEY (tid, region)
) PARTITION BY LIST (region);
 
-- Horizontal partitions by region
CREATE TABLE cdr_records_na PARTITION OF cdr_records
    FOR VALUES IN ('US', 'CA', 'MX');
CREATE TABLE cdr_records_eu PARTITION OF cdr_records
    FOR VALUES IN ('UK', 'DE', 'FR', 'IT', 'ES');
CREATE TABLE cdr_records_apac PARTITION OF cdr_records
    FOR VALUES IN ('JP', 'CN', 'AU', 'IN', 'SG');
 
-- Step 2: Create vertical fragment views (or materialized tables) per region
 
-- North America - Billing fragment
CREATE VIEW cdr_na_billing AS
SELECT tid, cdr_id, call_time, region, caller_id, callee_id, 
       duration_secs, cost_cents, tax_cents
FROM cdr_records_na;
 
-- North America - Technical fragment  
CREATE VIEW cdr_na_technical AS
SELECT tid, cdr_id, call_time, region, codec, network_type,
       latency_ms, jitter_ms
FROM cdr_records_na;
 
-- North America - Location fragment
CREATE VIEW cdr_na_location AS
SELECT tid, cdr_id, call_time, region, origin_tower_id, 
       dest_tower_id, handover_count
FROM cdr_records_na;
 
-- Similar views for EU and APAC regions...
 
-- For physical separation, use materialized tables with triggers for sync
CREATE TABLE cdr_na_billing_physical AS 
SELECT * FROM cdr_na_billing;
 
CREATE TABLE cdr_na_technical_physical AS 
SELECT * FROM cdr_na_technical;
 
-- Reconstruction: Union horizontal first, then join vertical
-- Full table for North America
CREATE VIEW cdr_na_full AS
SELECT b.tid, b.cdr_id, b.call_time, b.region,
       b.caller_id, b.callee_id, b.duration_secs, b.cost_cents, b.tax_cents,
       t.codec, t.network_type, t.latency_ms, t.jitter_ms,
       l.origin_tower_id, l.dest_tower_id, l.handover_count
FROM cdr_na_billing b
JOIN cdr_na_technical t ON b.tid = t.tid
JOIN cdr_na_location l ON b.tid = l.tid;

Non-Uniform Vertical Fragmentation

HV nesting allows different vertical fragmentations per horizontal partition. If the EU region has regulatory requirements for separating customer identity from location data, CDR_EU can have a different vertical schema than CDR_NA. This flexibility is a key advantage of HV over uniform global fragmentation.

Vertical-then-Horizontal (VH) Nesting

VH nesting starts with vertical fragmentation as the primary dimension, then applies horizontal partitioning within each vertical fragment.

When to Use VH:

Attribute access patterns are the primary optimization target
Different attribute groups have different scaling requirements
Columns have vastly different storage/processing needs
Security boundaries are attribute-based before row-based

Design Process:

Perform Attribute Affinity Analysis:
- Build global usage matrix and affinity matrix
- Apply clustering algorithm (BEA, hierarchical)
- Define vertical fragments V₁, V₂, ..., Vₖ
Analyze Row Distribution per Vertical Fragment:
- Each vertical fragment may have different horizontal scaling needs
- Billing data might need temporal partitioning
- Location data might need geographic partitioning
Apply Horizontal Fragmentation to Each Vertical Fragment:
- Different horizontal strategies per vertical fragment are valid
- Or apply uniform horizontal partitioning globally

Example: E-commerce Product Catalog

Step 1: Vertical fragmentation by function

Products → Products_Catalog    (tid, name, description, images)
        → Products_Inventory   (tid, stock, warehouse, reorder_level)
        → Products_Pricing     (tid, base_price, currency, discounts)
        → Products_Analytics   (tid, view_count, buy_count, return_rate)

Step 2: Horizontal fragmentation per vertical fragment

Products_Catalog → Products_Catalog_Electronics (category IN ('TV', 'Phone', 'Laptop'))
                 → Products_Catalog_Fashion    (category IN ('Clothing', 'Shoes'))
                 → Products_Catalog_Home       (category IN ('Furniture', 'Kitchen'))

Products_Inventory → Products_Inventory_Warehouse1 (warehouse_id = 1)
                   → Products_Inventory_Warehouse2 (warehouse_id = 2)
                   ... (by warehouse for locality)

Products_Analytics → Products_Analytics_2024  (year = 2024)  
                   → Products_Analytics_2023  (year = 2023)
                   ... (by year for archival)

vh_fragmentation_example.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
-- Hybrid VH Fragmentation: Vertical (by domain) then Horizontal (by criteria)
 
-- Step 1: Vertical fragments (separate tables per attribute group)
 
-- Products Core (shared TID and natural key)
CREATE TABLE products_core (
    tid             BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    product_id      UUID NOT NULL UNIQUE,
    category        VARCHAR(50) NOT NULL,
    created_at      TIMESTAMP DEFAULT NOW()
);
 
-- Catalog Fragment (vertically separated) - partitioned by category
CREATE TABLE products_catalog (
    tid             BIGINT NOT NULL,
    category        VARCHAR(50) NOT NULL,
    name            VARCHAR(200) NOT NULL,
    description     TEXT,
    image_urls      TEXT[],
    PRIMARY KEY (tid, category),
    FOREIGN KEY (tid) REFERENCES products_core(tid)
) PARTITION BY LIST (category);
 
CREATE TABLE products_catalog_electronics PARTITION OF products_catalog
    FOR VALUES IN ('TV', 'Phone', 'Laptop', 'Tablet', 'Camera');
CREATE TABLE products_catalog_fashion PARTITION OF products_catalog
    FOR VALUES IN ('Clothing', 'Shoes', 'Accessories', 'Jewelry');
CREATE TABLE products_catalog_home PARTITION OF products_catalog
    FOR VALUES IN ('Furniture', 'Kitchen', 'Garden', 'Bedding');
 
-- Inventory Fragment - partitioned by warehouse
CREATE TABLE products_inventory (
    tid             BIGINT NOT NULL,
    warehouse_id    INTEGER NOT NULL,
    stock_qty       INTEGER NOT NULL DEFAULT 0,
    reorder_level   INTEGER,
    last_restock    TIMESTAMP,
    PRIMARY KEY (tid, warehouse_id),
    FOREIGN KEY (tid) REFERENCES products_core(tid)
) PARTITION BY HASH (warehouse_id);
 
CREATE TABLE products_inventory_wh0 PARTITION OF products_inventory
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE products_inventory_wh1 PARTITION OF products_inventory
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE products_inventory_wh2 PARTITION OF products_inventory
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE products_inventory_wh3 PARTITION OF products_inventory
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
 
-- Analytics Fragment - partitioned by time (year)
CREATE TABLE products_analytics (
    tid             BIGINT NOT NULL,
    year            INTEGER NOT NULL,
    view_count      BIGINT DEFAULT 0,
    add_to_cart     BIGINT DEFAULT 0,
    purchase_count  BIGINT DEFAULT 0,
    return_count    BIGINT DEFAULT 0,
    revenue_cents   BIGINT DEFAULT 0,
    PRIMARY KEY (tid, year),
    FOREIGN KEY (tid) REFERENCES products_core(tid)
) PARTITION BY RANGE (year);
 
CREATE TABLE products_analytics_2024 PARTITION OF products_analytics
    FOR VALUES FROM (2024) TO (2025);
CREATE TABLE products_analytics_2023 PARTITION OF products_analytics
    FOR VALUES FROM (2023) TO (2024);
CREATE TABLE products_analytics_historical PARTITION OF products_analytics
    FOR VALUES FROM (MINVALUE) TO (2023);
 
-- Reconstruction: Join verticals, union horizontals
-- Full product view for electronics category, all warehouses, all years
CREATE VIEW products_electronics_full AS
SELECT 
    c.tid, c.name, c.description, c.image_urls,
    i.warehouse_id, i.stock_qty, i.reorder_level,
    a.year, a.view_count, a.purchase_count, a.revenue_cents
FROM products_catalog_electronics c
JOIN products_inventory i ON c.tid = i.tid
JOIN products_analytics a ON c.tid = a.tid;

HV Nesting Strengths

•Natural for geographic/regulatory partitioning first
•Each region can have customized vertical schema
•Aligns with organizational structure (regional teams)
•Simpler when horizontal predicates are obvious
•Reconstruction follows human mental model

VH Nesting Strengths

•Optimizes for attribute-centric access patterns first
•Each attribute group can scale independently
•Different horizontal strategies per vertical fragment
•Microservices alignment (service owns attribute group)
•Attribute-level security boundaries clearer

Reconstruction Algorithms

Reconstructing the original relation from hybrid fragments requires combining union (for horizontal) and join (for vertical) operations in the correct order.

Reconstruction Formula for HV:

If R was fragmented horizontally into {H₁, ..., Hₘ} and each Hᵢ vertically into {Vᵢ₁, ..., Vᵢₖ}:

R = ⋈ⱼ₌₁ᵏ (∪ᵢ₌₁ᵐ Fᵢⱼ)

Where Fᵢⱼ is the fragment for horizontal partition i, vertical partition j.

Translation: For each vertical attribute group j, union all horizontal fragments together. Then join all vertical results on the tuple identifier.

Reconstruction Formula for VH:

If R was fragmented vertically into {V₁, ..., Vₖ} and each Vⱼ horizontally into {Hⱼ₁, ..., Hⱼₘ}:

R = ∪ᵢ₌₁ᵐ (⋈ⱼ₌₁ᵏ Fⱼᵢ)

Where Fⱼᵢ is the fragment for vertical partition j, horizontal partition i.

Translation: For each horizontal partition i, join all vertical fragments together. Then union all horizontal results.

Equivalence: Both formulas produce the same result when all fragments exist. The difference is in execution strategy and optimization opportunities.

reconstruction_queries.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
-- Reconstruction Examples for Hybrid Fragmentation
 
-- ============================================
-- HV Reconstruction: Union horizontals, then join verticals
-- ============================================
 
-- Given: CDR table with HV fragmentation
-- Horizontal: region (NA, EU, APAC)
-- Vertical: billing, technical, location
 
-- Step 1: Union horizontal fragments for each vertical group
 
-- All billing data (across regions)
CREATE VIEW cdr_billing_all AS
SELECT * FROM cdr_na_billing
UNION ALL
SELECT * FROM cdr_eu_billing
UNION ALL
SELECT * FROM cdr_apac_billing;
 
-- All technical data (across regions)
CREATE VIEW cdr_technical_all AS
SELECT * FROM cdr_na_technical
UNION ALL
SELECT * FROM cdr_eu_technical
UNION ALL
SELECT * FROM cdr_apac_technical;
 
-- All location data (across regions)
CREATE VIEW cdr_location_all AS
SELECT * FROM cdr_na_location
UNION ALL
SELECT * FROM cdr_eu_location
UNION ALL
SELECT * FROM cdr_apac_location;
 
-- Step 2: Join vertical groups
CREATE VIEW cdr_reconstructed_hv AS
SELECT 
    b.tid, b.cdr_id, b.call_time, b.region,
    b.caller_id, b.callee_id, b.duration_secs, b.cost_cents, b.tax_cents,
    t.codec, t.network_type, t.latency_ms, t.jitter_ms,
    l.origin_tower_id, l.dest_tower_id, l.handover_count
FROM cdr_billing_all b
JOIN cdr_technical_all t ON b.tid = t.tid
JOIN cdr_location_all l ON b.tid = l.tid;
 
 
-- ============================================
-- VH Reconstruction: Join verticals, then union horizontals
-- ============================================
 
-- Given: Products table with VH fragmentation
-- Vertical: catalog, inventory, analytics
-- Horizontal (per vertical): category, warehouse, year
 
-- Step 1: For each horizontal partition, join vertical fragments
 
-- Electronics products (all verticals)
CREATE VIEW products_electronics_complete AS
SELECT 
    cat.tid, cat.name, cat.description,
    inv.warehouse_id, inv.stock_qty,
    anl.year, anl.view_count, anl.purchase_count
FROM products_catalog_electronics cat
JOIN products_inventory inv ON cat.tid = inv.tid
JOIN products_analytics anl ON cat.tid = anl.tid;
 
-- Fashion products (all verticals)  
CREATE VIEW products_fashion_complete AS
SELECT 
    cat.tid, cat.name, cat.description,
    inv.warehouse_id, inv.stock_qty,
    anl.year, anl.view_count, anl.purchase_count
FROM products_catalog_fashion cat
JOIN products_inventory inv ON cat.tid = inv.tid
JOIN products_analytics anl ON cat.tid = anl.tid;
 
-- Home products (all verticals)
CREATE VIEW products_home_complete AS
SELECT 
    cat.tid, cat.name, cat.description,
    inv.warehouse_id, inv.stock_qty,
    anl.year, anl.view_count, anl.purchase_count
FROM products_catalog_home cat
JOIN products_inventory inv ON cat.tid = inv.tid
JOIN products_analytics anl ON cat.tid = anl.tid;
 
-- Step 2: Union horizontal partitions
CREATE VIEW products_reconstructed_vh AS
SELECT * FROM products_electronics_complete
UNION ALL
SELECT * FROM products_fashion_complete
UNION ALL
SELECT * FROM products_home_complete;

Reconstruction Performance Considerations

Full reconstruction is expensive—it requires joining all vertical fragments and unioning all horizontal fragments. In practice, queries should access only necessary fragments. The query optimizer should push predicates into fragments to minimize data movement. Design your hybrid schema so common queries touch minimal fragments.

Hybrid Fragmentation Design Methodology

Designing hybrid fragmentation requires systematic analysis of access patterns, distribution requirements, and reconstruction costs.

Step-by-Step Design Process:

Step 1: Workload Analysis

Document all significant queries with:

Attributes accessed (for vertical decisions)
Predicates used (for horizontal decisions)
Execution frequency and latency requirements
Source location/application (for allocation)

Step 2: Identify Primary Partitioning Dimension

Decide whether horizontal or vertical should be the outer fragmentation:

Criterion	Favor HV	Favor VH
Data sovereignty	Row-based locality required	Attribute-based security
Primary query pattern	Regional/temporal filtering	Attribute group access
Scaling requirement	Rows grow faster	Attributes vary widely
Team organization	Regional operations	Functional domains

Step 3: Design Outer Fragmentation

Apply either:

Horizontal: Define predicates, verify completeness/disjointness
Vertical: Perform affinity analysis, cluster attributes

Step 4: Design Inner Fragmentation Per Outer Fragment

For each outer fragment:

Analyze specific access patterns
May apply different inner fragmentation strategies
Ensure reconstruction correctness

Step 5: Validate Against Critical Queries

For top 10-20 queries, trace fragment access:

How many fragments accessed?
What joins/unions required?
Does fragment elimination work?
Estimate response time

fragmentation_design_analysis.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
from dataclasses import dataclass
from typing import List, Set, Dict, Tuple
from enum import Enum
 
class FragmentType(Enum):
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"
    HYBRID_HV = "hybrid_hv"
    HYBRID_VH = "hybrid_vh"
 
@dataclass
class Query:
    name: str
    attributes: Set[str]
    predicates: Dict[str, str]  # {attribute: condition}
    frequency: int  # per hour
    max_latency_ms: int
    source_region: str
 
@dataclass
class FragmentationAnalysis:
    """Analyze queries to recommend fragmentation strategy."""
    
    def __init__(self, all_attributes: Set[str], queries: List[Query]):
        self.attributes = all_attributes
        self.queries = queries
        
    def compute_attribute_coverage(self) -> Dict[str, float]:
        """What fraction of queries access each attribute?"""
        coverage = {attr: 0.0 for attr in self.attributes}
        total_freq = sum(q.frequency for q in self.queries)
        
        for query in self.queries:
            for attr in query.attributes:
                coverage[attr] += query.frequency / total_freq
        return coverage
    
    def compute_predicate_frequency(self) -> Dict[str, float]:
        """How often is each attribute used in predicates?"""
        freq = {}
        total_freq = sum(q.frequency for q in self.queries)
        
        for query in self.queries:
            for attr in query.predicates:
                freq[attr] = freq.get(attr, 0) + query.frequency / total_freq
        return freq
    
    def compute_attribute_affinity(self) -> Dict[Tuple[str, str], float]:
        """How often are attribute pairs accessed together?"""
        affinity = {}
        
        for query in self.queries:
            attrs = list(query.attributes)
            for i, a1 in enumerate(attrs):
                for a2 in attrs[i+1:]:
                    key = (min(a1, a2), max(a1, a2))
                    affinity[key] = affinity.get(key, 0) + query.frequency
        return affinity
    
    def recommend_strategy(self) -> FragmentType:
        """Recommend fragmentation type based on analysis."""
        
        pred_freq = self.compute_predicate_frequency()
        attr_coverage = self.compute_attribute_coverage()
        
        # Check for strong horizontal partitioning candidates
        horizontal_score = 0
        for attr, freq in pred_freq.items():
            if freq > 0.5:  # Attribute frequently in predicates
                horizontal_score += freq
        
        # Check for vertical partitioning potential
        coverage_values = list(attr_coverage.values())
        vertical_score = 1 - (sum(coverage_values) / len(coverage_values))
        
        print(f"Horizontal score: {horizontal_score:.2f}")
        print(f"Vertical score: {vertical_score:.2f}")
        
        if horizontal_score > 0.7 and vertical_score > 0.5:
            return FragmentType.HYBRID_HV
        elif vertical_score > 0.7 and horizontal_score > 0.5:
            return FragmentType.HYBRID_VH
        elif horizontal_score > vertical_score:
            return FragmentType.HORIZONTAL
        else:
            return FragmentType.VERTICAL
 
# Example usage
queries = [
    Query("NA_billing_report", {"cdr_id", "region", "caller_id", "cost", "duration"},
          {"region": "IN ('US', 'CA')", "call_time": ">= '2024-01-01'"},
          1000, 500, "us-east"),
    Query("EU_technical_analysis", {"cdr_id", "region", "codec", "latency"},
          {"region": "IN ('DE', 'FR')", "latency": "> 100"},
          500, 1000, "eu-central"),
    Query("Global_location_tracking", {"cdr_id", "origin_tower", "dest_tower"},
          {"call_time": ">= CURRENT_DATE"},
          2000, 200, "global"),
    # ... more queries
]
 
all_attrs = {"cdr_id", "call_time", "region", "caller_id", "callee_id", 
             "duration", "cost", "codec", "latency", "origin_tower", "dest_tower"}
 
analysis = FragmentationAnalysis(all_attrs, queries)
recommendation = analysis.recommend_strategy()
print(f"
Recommended strategy: {recommendation.value}")

Common Design Pitfalls

•Over-fragmentation — Too many fragments create management overhead, complex reconstruction, and potential optimizer confusion. Start coarse, refine based on evidence.
•Ignoring Cross-Fragment Queries — Designing for 90% of queries while making 10% catastrophically slow. Validate all significant queries.
•Asymmetric Fragment Sizes — Skewed data distribution creates hotspots. Monitor and rebalance or use hash-based distribution.
•Forgetting Evolution — Workloads change. Build monitoring and migration capabilities from day one.
•Incomplete Predicates — Missing catch-all predicates cause data loss. Always verify completeness property.

Real-World Hybrid Fragmentation Patterns

Production systems employ hybrid fragmentation patterns tailored to specific domains. Understanding these patterns provides templates for your designs.

Pattern 1: Multi-Tenant SaaS

Outer (Horizontal): Tenant ID
Inner (Vertical): Feature modules

Customer_Tenant_A_Core (tid, name, email, status)
Customer_Tenant_A_Billing (tid, plan, payment_method, last_payment)
Customer_Tenant_A_Usage (tid, api_calls, storage_gb, last_active)

Customer_Tenant_B_Core (...)
...

Rationale: Tenant isolation is primary (compliance, performance isolation). Within tenant, different services access different attribute groups.

Pattern 2: Time-Series IoT Data

Outer (Horizontal): Time range (month/week)
Inner (Vertical): Metric type

Sensor_2024_01_Temperature (tid, device_id, timestamp, value, unit)
Sensor_2024_01_Humidity (tid, device_id, timestamp, value, unit)
Sensor_2024_01_Pressure (tid, device_id, timestamp, value, unit)

Sensor_2024_02_Temperature (...)
...

Rationale: Time-based partitioning enables archival and lifecycle management. Metric separation allows different retention policies and optimized storage.

Pattern 3: Content Delivery Network

Outer (Vertical): Content type (metadata, binary, analytics)
Inner (Horizontal): Geographic region

Content_Metadata_NA (content_id, title, format, duration, cdn_url)
Content_Metadata_EU (...)
Content_Binary_NA (content_id, binary_data, encoding)
Content_Binary_EU (...)
Content_Analytics_2024 (content_id, region, views, bandwidth, errors)

Rationale: Metadata queried differently than binaries. Geographic distribution for latency. Analytics partitioned by time for aggregation.

Industry Hybrid Fragmentation Examples
Industry	Outer Fragmentation	Inner Fragmentation	Key Driver
E-commerce	Product category (H)	Catalog / Inventory / Pricing (V)	Category-specific attributes vary significantly
Healthcare	Patient region (H)	Clinical / Administrative / Billing (V)	HIPAA, data residency, role-based access
Financial Services	Account type (H)	Profile / Transactions / Risk (V)	Regulatory compartmentalization
Gaming	Game server region (H)	Player stats / Inventory / Social (V)	Latency-critical separation
Telecommunications	Network region (H)	Billing / Network / Location (V)	Regulatory, operational team alignment

Start Simple, Evolve Carefully

Production systems often start with single-dimension fragmentation and evolve to hybrid as scale demands. Begin with the most impactful dimension (usually horizontal for scale, vertical for access patterns). Add the second dimension only when evidence shows clear benefit—hybrid complexity should be justified by measurable performance or operational improvements.

Summary: Hybrid Fragmentation Mastery

Hybrid fragmentation combines horizontal and vertical partitioning to address complex access patterns that neither approach satisfies alone. Let's consolidate the key concepts:

Key Takeaways

•Formal Definition — Hybrid fragments are defined by both selection predicates (rows) and attribute sets (columns), creating a two-dimensional partition space.
•HV vs. VH Nesting — Choose HV when geographic/temporal partitioning is primary; choose VH when attribute access patterns drive design. Both can produce equivalent results.
•Reconstruction — Use union for horizontal fragments, join for vertical. HV: join(union per vertical). VH: union(join per horizontal).
•Design Methodology — Analyze workload, determine primary dimension, design outer fragmentation, then inner per outer fragment. Validate against critical queries.
•Non-Uniform Inner Fragmentation — Different outer fragments can have different inner fragmentations, enabling customization per region/category/etc.
•Complexity Trade-off — Hybrid fragmentation adds design and operational complexity. Justify with clear performance or compliance benefits over simpler approaches.

What's Next:

With fragmentation strategies understood, we now turn to the design process itself. The next page explores Fragmentation Design—systematic methodologies for choosing fragmentation strategies, validating correctness properties, and evolving schemas as requirements change.

Page Complete

You now understand hybrid fragmentation at a deep, practical level—from formal definitions through real-world patterns. This knowledge enables designing sophisticated distribution schemas that optimize for multi-dimensional access patterns while maintaining reconstruction guarantees.

Hybrid Fragmentation

When One Dimension Isn't Enough

Consider a global telecommunications company storing call detail records (CDRs). The table has 100+ attributes and billions of rows. Different teams access different slices:

Regional Operations (NA, EU, APAC): Need rows for their region, specific operational attributes
Finance: Need billing attributes across all regions
Network Engineering: Need technical metrics for recent calls only
Legal/Compliance: Need full records for specific customers under investigation

What You Will Learn

Formal Definition of Hybrid Fragmentation

Hybrid fragmentation emerges from applying horizontal and vertical fragmentation in sequence or combination. Understanding the formal structure is essential for correct design.

Definition:

Given a relation R, a hybrid fragmentation produces fragments F₁, F₂, ..., Fₙ where each fragment is defined by both:

A selection predicate pᵢ (determining row membership)
An attribute set Aᵢ (determining column membership)

Formally:

Fᵢ = π(K ∪ Aᵢ)(σ(pᵢ)(R))

Where σ applies the selection predicate pᵢ, and π projects the key K plus attribute set Aᵢ.

Correctness Properties:

Completeness: Every (tuple, attribute) pair in R must be representable in at least one fragment
Disjointness: Each (tuple, attribute) pair should appear in exactly one fragment (for non-key attributes)
Reconstruction: R can be reconstructed from fragments using appropriate union and join operations

Nesting Approaches:

Hybrid fragmentation can be achieved through two nesting patterns:

HV (Horizontal-then-Vertical):
- First partition horizontally: R → {H₁, H₂, ..., Hₘ}
- Then partition each Hᵢ vertically: Hᵢ → {Vᵢ₁, Vᵢ₂, ..., Vᵢₖ}
- Result: m × k fragments (maximum)
VH (Vertical-then-Horizontal):
- First partition vertically: R → {V₁, V₂, ..., Vₖ}
- Then partition each Vⱼ horizontally: Vⱼ → {Hⱼ₁, Hⱼ₂, ..., Hⱼₘ}
- Result: k × m fragments (maximum)

Both approaches can produce equivalent final fragmentation, but the design process differs.

Fragmentation Strategy Comparison
Strategy	Fragment Definition	Reconstruction	Fragment Count
Horizontal Only	σ(pᵢ)(R)	∪ᵢ Fᵢ	Depends on predicates
Vertical Only	π(K ∪ Aᵢ)(R)	⋈ᵢ Fᵢ (on K)	Depends on attributes
Hybrid (HV)	π(K ∪ Aⱼ)(σ(pᵢ)(R))	⋈ⱼ(∪ᵢ Fᵢⱼ)	Up to m × k
Hybrid (VH)	σ(pᵢ)(π(K ∪ Aⱼ)(R))	∪ᵢ(⋈ⱼ Fⱼᵢ)	Up to m × k

Equivalence of HV and VH

Horizontal-then-Vertical (HV) Nesting

HV nesting starts with horizontal fragmentation as the primary partitioning dimension, then applies vertical fragmentation within each horizontal fragment.

When to Use HV:

Geographic or temporal partitioning is the primary distribution driver
Different regions/time periods have different vertical access patterns
Site allocation is determined by horizontal predicates first
Regulatory requirements dictate row-based data residency

Design Process:

Define Horizontal Predicates:
- Identify primary partitioning criteria (region, date range, customer segment)
- Ensure predicates are complete and disjoint
- Create horizontal fragments H₁, H₂, ..., Hₘ
Analyze Attribute Access per Fragment:
- Different horizontal fragments may have different access patterns
- Build site-specific attribute affinity matrices
- Identify attribute groups relevant to each horizontal partition
Apply Vertical Fragmentation to Each Horizontal Fragment:
- May result in different vertical schemas per horizontal fragment
- Or apply uniform vertical fragmentation globally

Example: Call Detail Records

Step 1: Horizontal fragmentation by region

CDR → CDR_NorthAmerica (region IN ('US', 'CA', 'MX'))
    → CDR_Europe       (region IN ('UK', 'DE', 'FR', ...))
    → CDR_AsiaPacific  (region IN ('JP', 'CN', 'AU', ...))

Step 2: Vertical fragmentation of each horizontal fragment

CDR_NorthAmerica → CDR_NA_Billing   (tid, call_id, duration, cost, taxes)
                 → CDR_NA_Technical (tid, call_id, codec, network_type, latency)
                 → CDR_NA_Location  (tid, call_id, origin_tower, dest_tower)

(Similarly for other regions)

hv_fragmentation_example.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
-- Hybrid HV Fragmentation: Horizontal (by region) then Vertical (by function)
 
-- Original CDR table (conceptual - not created directly)
-- CDR(cdr_id, call_time, region, 
--     caller_id, callee_id, duration, cost, taxes,  -- Billing
--     codec, network_type, latency, jitter,          -- Technical
--     origin_tower, dest_tower, handovers)           -- Location
 
-- Step 1: Create partitioned table (horizontal by region)
CREATE TABLE cdr_records (
    tid             BIGINT NOT NULL,
    cdr_id          UUID NOT NULL,
    call_time       TIMESTAMP NOT NULL,
    region          VARCHAR(20) NOT NULL,
    -- Billing attributes
    caller_id       BIGINT,
    callee_id       BIGINT,
    duration_secs   INTEGER,
    cost_cents      INTEGER,
    tax_cents       INTEGER,
    -- Technical attributes
    codec           VARCHAR(20),
    network_type    VARCHAR(20),
    latency_ms      INTEGER,
    jitter_ms       INTEGER,
    -- Location attributes
    origin_tower_id BIGINT,
    dest_tower_id   BIGINT,
    handover_count  INTEGER,
    PRIMARY KEY (tid, region)
) PARTITION BY LIST (region);
 
-- Horizontal partitions by region
CREATE TABLE cdr_records_na PARTITION OF cdr_records
    FOR VALUES IN ('US', 'CA', 'MX');
CREATE TABLE cdr_records_eu PARTITION OF cdr_records
    FOR VALUES IN ('UK', 'DE', 'FR', 'IT', 'ES');
CREATE TABLE cdr_records_apac PARTITION OF cdr_records
    FOR VALUES IN ('JP', 'CN', 'AU', 'IN', 'SG');
 
-- Step 2: Create vertical fragment views (or materialized tables) per region
 
-- North America - Billing fragment
CREATE VIEW cdr_na_billing AS
SELECT tid, cdr_id, call_time, region, caller_id, callee_id, 
       duration_secs, cost_cents, tax_cents
FROM cdr_records_na;
 
-- North America - Technical fragment  
CREATE VIEW cdr_na_technical AS
SELECT tid, cdr_id, call_time, region, codec, network_type,
       latency_ms, jitter_ms
FROM cdr_records_na;
 
-- North America - Location fragment
CREATE VIEW cdr_na_location AS
SELECT tid, cdr_id, call_time, region, origin_tower_id, 
       dest_tower_id, handover_count
FROM cdr_records_na;
 
-- Similar views for EU and APAC regions...
 
-- For physical separation, use materialized tables with triggers for sync
CREATE TABLE cdr_na_billing_physical AS 
SELECT * FROM cdr_na_billing;
 
CREATE TABLE cdr_na_technical_physical AS 
SELECT * FROM cdr_na_technical;
 
-- Reconstruction: Union horizontal first, then join vertical
-- Full table for North America
CREATE VIEW cdr_na_full AS
SELECT b.tid, b.cdr_id, b.call_time, b.region,
       b.caller_id, b.callee_id, b.duration_secs, b.cost_cents, b.tax_cents,
       t.codec, t.network_type, t.latency_ms, t.jitter_ms,
       l.origin_tower_id, l.dest_tower_id, l.handover_count
FROM cdr_na_billing b
JOIN cdr_na_technical t ON b.tid = t.tid
JOIN cdr_na_location l ON b.tid = l.tid;

Non-Uniform Vertical Fragmentation

Vertical-then-Horizontal (VH) Nesting

VH nesting starts with vertical fragmentation as the primary dimension, then applies horizontal partitioning within each vertical fragment.

When to Use VH:

Attribute access patterns are the primary optimization target
Different attribute groups have different scaling requirements
Columns have vastly different storage/processing needs
Security boundaries are attribute-based before row-based

Design Process:

Perform Attribute Affinity Analysis:
- Build global usage matrix and affinity matrix
- Apply clustering algorithm (BEA, hierarchical)
- Define vertical fragments V₁, V₂, ..., Vₖ
Analyze Row Distribution per Vertical Fragment:
- Each vertical fragment may have different horizontal scaling needs
- Billing data might need temporal partitioning
- Location data might need geographic partitioning
Apply Horizontal Fragmentation to Each Vertical Fragment:
- Different horizontal strategies per vertical fragment are valid
- Or apply uniform horizontal partitioning globally

Example: E-commerce Product Catalog

Step 1: Vertical fragmentation by function

Products → Products_Catalog    (tid, name, description, images)
        → Products_Inventory   (tid, stock, warehouse, reorder_level)
        → Products_Pricing     (tid, base_price, currency, discounts)
        → Products_Analytics   (tid, view_count, buy_count, return_rate)

Step 2: Horizontal fragmentation per vertical fragment

Products_Catalog → Products_Catalog_Electronics (category IN ('TV', 'Phone', 'Laptop'))
                 → Products_Catalog_Fashion    (category IN ('Clothing', 'Shoes'))
                 → Products_Catalog_Home       (category IN ('Furniture', 'Kitchen'))

Products_Inventory → Products_Inventory_Warehouse1 (warehouse_id = 1)
                   → Products_Inventory_Warehouse2 (warehouse_id = 2)
                   ... (by warehouse for locality)

Products_Analytics → Products_Analytics_2024  (year = 2024)  
                   → Products_Analytics_2023  (year = 2023)
                   ... (by year for archival)

vh_fragmentation_example.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
-- Hybrid VH Fragmentation: Vertical (by domain) then Horizontal (by criteria)
 
-- Step 1: Vertical fragments (separate tables per attribute group)
 
-- Products Core (shared TID and natural key)
CREATE TABLE products_core (
    tid             BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    product_id      UUID NOT NULL UNIQUE,
    category        VARCHAR(50) NOT NULL,
    created_at      TIMESTAMP DEFAULT NOW()
);
 
-- Catalog Fragment (vertically separated) - partitioned by category
CREATE TABLE products_catalog (
    tid             BIGINT NOT NULL,
    category        VARCHAR(50) NOT NULL,
    name            VARCHAR(200) NOT NULL,
    description     TEXT,
    image_urls      TEXT[],
    PRIMARY KEY (tid, category),
    FOREIGN KEY (tid) REFERENCES products_core(tid)
) PARTITION BY LIST (category);
 
CREATE TABLE products_catalog_electronics PARTITION OF products_catalog
    FOR VALUES IN ('TV', 'Phone', 'Laptop', 'Tablet', 'Camera');
CREATE TABLE products_catalog_fashion PARTITION OF products_catalog
    FOR VALUES IN ('Clothing', 'Shoes', 'Accessories', 'Jewelry');
CREATE TABLE products_catalog_home PARTITION OF products_catalog
    FOR VALUES IN ('Furniture', 'Kitchen', 'Garden', 'Bedding');
 
-- Inventory Fragment - partitioned by warehouse
CREATE TABLE products_inventory (
    tid             BIGINT NOT NULL,
    warehouse_id    INTEGER NOT NULL,
    stock_qty       INTEGER NOT NULL DEFAULT 0,
    reorder_level   INTEGER,
    last_restock    TIMESTAMP,
    PRIMARY KEY (tid, warehouse_id),
    FOREIGN KEY (tid) REFERENCES products_core(tid)
) PARTITION BY HASH (warehouse_id);
 
CREATE TABLE products_inventory_wh0 PARTITION OF products_inventory
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE products_inventory_wh1 PARTITION OF products_inventory
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE products_inventory_wh2 PARTITION OF products_inventory
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE products_inventory_wh3 PARTITION OF products_inventory
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
 
-- Analytics Fragment - partitioned by time (year)
CREATE TABLE products_analytics (
    tid             BIGINT NOT NULL,
    year            INTEGER NOT NULL,
    view_count      BIGINT DEFAULT 0,
    add_to_cart     BIGINT DEFAULT 0,
    purchase_count  BIGINT DEFAULT 0,
    return_count    BIGINT DEFAULT 0,
    revenue_cents   BIGINT DEFAULT 0,
    PRIMARY KEY (tid, year),
    FOREIGN KEY (tid) REFERENCES products_core(tid)
) PARTITION BY RANGE (year);
 
CREATE TABLE products_analytics_2024 PARTITION OF products_analytics
    FOR VALUES FROM (2024) TO (2025);
CREATE TABLE products_analytics_2023 PARTITION OF products_analytics
    FOR VALUES FROM (2023) TO (2024);
CREATE TABLE products_analytics_historical PARTITION OF products_analytics
    FOR VALUES FROM (MINVALUE) TO (2023);
 
-- Reconstruction: Join verticals, union horizontals
-- Full product view for electronics category, all warehouses, all years
CREATE VIEW products_electronics_full AS
SELECT 
    c.tid, c.name, c.description, c.image_urls,
    i.warehouse_id, i.stock_qty, i.reorder_level,
    a.year, a.view_count, a.purchase_count, a.revenue_cents
FROM products_catalog_electronics c
JOIN products_inventory i ON c.tid = i.tid
JOIN products_analytics a ON c.tid = a.tid;

HV Nesting Strengths

•Natural for geographic/regulatory partitioning first
•Each region can have customized vertical schema
•Aligns with organizational structure (regional teams)
•Simpler when horizontal predicates are obvious
•Reconstruction follows human mental model

VH Nesting Strengths

•Optimizes for attribute-centric access patterns first
•Each attribute group can scale independently
•Different horizontal strategies per vertical fragment
•Microservices alignment (service owns attribute group)
•Attribute-level security boundaries clearer

Reconstruction Algorithms

Reconstructing the original relation from hybrid fragments requires combining union (for horizontal) and join (for vertical) operations in the correct order.

Reconstruction Formula for HV:

If R was fragmented horizontally into {H₁, ..., Hₘ} and each Hᵢ vertically into {Vᵢ₁, ..., Vᵢₖ}:

R = ⋈ⱼ₌₁ᵏ (∪ᵢ₌₁ᵐ Fᵢⱼ)

Where Fᵢⱼ is the fragment for horizontal partition i, vertical partition j.

Translation: For each vertical attribute group j, union all horizontal fragments together. Then join all vertical results on the tuple identifier.

Reconstruction Formula for VH:

If R was fragmented vertically into {V₁, ..., Vₖ} and each Vⱼ horizontally into {Hⱼ₁, ..., Hⱼₘ}:

R = ∪ᵢ₌₁ᵐ (⋈ⱼ₌₁ᵏ Fⱼᵢ)

Where Fⱼᵢ is the fragment for vertical partition j, horizontal partition i.

Translation: For each horizontal partition i, join all vertical fragments together. Then union all horizontal results.

Equivalence: Both formulas produce the same result when all fragments exist. The difference is in execution strategy and optimization opportunities.

reconstruction_queries.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
-- Reconstruction Examples for Hybrid Fragmentation
 
-- ============================================
-- HV Reconstruction: Union horizontals, then join verticals
-- ============================================
 
-- Given: CDR table with HV fragmentation
-- Horizontal: region (NA, EU, APAC)
-- Vertical: billing, technical, location
 
-- Step 1: Union horizontal fragments for each vertical group
 
-- All billing data (across regions)
CREATE VIEW cdr_billing_all AS
SELECT * FROM cdr_na_billing
UNION ALL
SELECT * FROM cdr_eu_billing
UNION ALL
SELECT * FROM cdr_apac_billing;
 
-- All technical data (across regions)
CREATE VIEW cdr_technical_all AS
SELECT * FROM cdr_na_technical
UNION ALL
SELECT * FROM cdr_eu_technical
UNION ALL
SELECT * FROM cdr_apac_technical;
 
-- All location data (across regions)
CREATE VIEW cdr_location_all AS
SELECT * FROM cdr_na_location
UNION ALL
SELECT * FROM cdr_eu_location
UNION ALL
SELECT * FROM cdr_apac_location;
 
-- Step 2: Join vertical groups
CREATE VIEW cdr_reconstructed_hv AS
SELECT 
    b.tid, b.cdr_id, b.call_time, b.region,
    b.caller_id, b.callee_id, b.duration_secs, b.cost_cents, b.tax_cents,
    t.codec, t.network_type, t.latency_ms, t.jitter_ms,
    l.origin_tower_id, l.dest_tower_id, l.handover_count
FROM cdr_billing_all b
JOIN cdr_technical_all t ON b.tid = t.tid
JOIN cdr_location_all l ON b.tid = l.tid;
 
 
-- ============================================
-- VH Reconstruction: Join verticals, then union horizontals
-- ============================================
 
-- Given: Products table with VH fragmentation
-- Vertical: catalog, inventory, analytics
-- Horizontal (per vertical): category, warehouse, year
 
-- Step 1: For each horizontal partition, join vertical fragments
 
-- Electronics products (all verticals)
CREATE VIEW products_electronics_complete AS
SELECT 
    cat.tid, cat.name, cat.description,
    inv.warehouse_id, inv.stock_qty,
    anl.year, anl.view_count, anl.purchase_count
FROM products_catalog_electronics cat
JOIN products_inventory inv ON cat.tid = inv.tid
JOIN products_analytics anl ON cat.tid = anl.tid;
 
-- Fashion products (all verticals)  
CREATE VIEW products_fashion_complete AS
SELECT 
    cat.tid, cat.name, cat.description,
    inv.warehouse_id, inv.stock_qty,
    anl.year, anl.view_count, anl.purchase_count
FROM products_catalog_fashion cat
JOIN products_inventory inv ON cat.tid = inv.tid
JOIN products_analytics anl ON cat.tid = anl.tid;
 
-- Home products (all verticals)
CREATE VIEW products_home_complete AS
SELECT 
    cat.tid, cat.name, cat.description,
    inv.warehouse_id, inv.stock_qty,
    anl.year, anl.view_count, anl.purchase_count
FROM products_catalog_home cat
JOIN products_inventory inv ON cat.tid = inv.tid
JOIN products_analytics anl ON cat.tid = anl.tid;
 
-- Step 2: Union horizontal partitions
CREATE VIEW products_reconstructed_vh AS
SELECT * FROM products_electronics_complete
UNION ALL
SELECT * FROM products_fashion_complete
UNION ALL
SELECT * FROM products_home_complete;

Reconstruction Performance Considerations

Hybrid Fragmentation Design Methodology

Designing hybrid fragmentation requires systematic analysis of access patterns, distribution requirements, and reconstruction costs.

Step-by-Step Design Process:

Step 1: Workload Analysis

Document all significant queries with:

Attributes accessed (for vertical decisions)
Predicates used (for horizontal decisions)
Execution frequency and latency requirements
Source location/application (for allocation)

Step 2: Identify Primary Partitioning Dimension

Decide whether horizontal or vertical should be the outer fragmentation:

Criterion	Favor HV	Favor VH
Data sovereignty	Row-based locality required	Attribute-based security
Primary query pattern	Regional/temporal filtering	Attribute group access
Scaling requirement	Rows grow faster	Attributes vary widely
Team organization	Regional operations	Functional domains

Step 3: Design Outer Fragmentation

Apply either:

Horizontal: Define predicates, verify completeness/disjointness
Vertical: Perform affinity analysis, cluster attributes

Step 4: Design Inner Fragmentation Per Outer Fragment

For each outer fragment:

Analyze specific access patterns
May apply different inner fragmentation strategies
Ensure reconstruction correctness

Step 5: Validate Against Critical Queries

For top 10-20 queries, trace fragment access:

How many fragments accessed?
What joins/unions required?
Does fragment elimination work?
Estimate response time

fragmentation_design_analysis.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
from dataclasses import dataclass
from typing import List, Set, Dict, Tuple
from enum import Enum
 
class FragmentType(Enum):
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"
    HYBRID_HV = "hybrid_hv"
    HYBRID_VH = "hybrid_vh"
 
@dataclass
class Query:
    name: str
    attributes: Set[str]
    predicates: Dict[str, str]  # {attribute: condition}
    frequency: int  # per hour
    max_latency_ms: int
    source_region: str
 
@dataclass
class FragmentationAnalysis:
    """Analyze queries to recommend fragmentation strategy."""
    
    def __init__(self, all_attributes: Set[str], queries: List[Query]):
        self.attributes = all_attributes
        self.queries = queries
        
    def compute_attribute_coverage(self) -> Dict[str, float]:
        """What fraction of queries access each attribute?"""
        coverage = {attr: 0.0 for attr in self.attributes}
        total_freq = sum(q.frequency for q in self.queries)
        
        for query in self.queries:
            for attr in query.attributes:
                coverage[attr] += query.frequency / total_freq
        return coverage
    
    def compute_predicate_frequency(self) -> Dict[str, float]:
        """How often is each attribute used in predicates?"""
        freq = {}
        total_freq = sum(q.frequency for q in self.queries)
        
        for query in self.queries:
            for attr in query.predicates:
                freq[attr] = freq.get(attr, 0) + query.frequency / total_freq
        return freq
    
    def compute_attribute_affinity(self) -> Dict[Tuple[str, str], float]:
        """How often are attribute pairs accessed together?"""
        affinity = {}
        
        for query in self.queries:
            attrs = list(query.attributes)
            for i, a1 in enumerate(attrs):
                for a2 in attrs[i+1:]:
                    key = (min(a1, a2), max(a1, a2))
                    affinity[key] = affinity.get(key, 0) + query.frequency
        return affinity
    
    def recommend_strategy(self) -> FragmentType:
        """Recommend fragmentation type based on analysis."""
        
        pred_freq = self.compute_predicate_frequency()
        attr_coverage = self.compute_attribute_coverage()
        
        # Check for strong horizontal partitioning candidates
        horizontal_score = 0
        for attr, freq in pred_freq.items():
            if freq > 0.5:  # Attribute frequently in predicates
                horizontal_score += freq
        
        # Check for vertical partitioning potential
        coverage_values = list(attr_coverage.values())
        vertical_score = 1 - (sum(coverage_values) / len(coverage_values))
        
        print(f"Horizontal score: {horizontal_score:.2f}")
        print(f"Vertical score: {vertical_score:.2f}")
        
        if horizontal_score > 0.7 and vertical_score > 0.5:
            return FragmentType.HYBRID_HV
        elif vertical_score > 0.7 and horizontal_score > 0.5:
            return FragmentType.HYBRID_VH
        elif horizontal_score > vertical_score:
            return FragmentType.HORIZONTAL
        else:
            return FragmentType.VERTICAL
 
# Example usage
queries = [
    Query("NA_billing_report", {"cdr_id", "region", "caller_id", "cost", "duration"},
          {"region": "IN ('US', 'CA')", "call_time": ">= '2024-01-01'"},
          1000, 500, "us-east"),
    Query("EU_technical_analysis", {"cdr_id", "region", "codec", "latency"},
          {"region": "IN ('DE', 'FR')", "latency": "> 100"},
          500, 1000, "eu-central"),
    Query("Global_location_tracking", {"cdr_id", "origin_tower", "dest_tower"},
          {"call_time": ">= CURRENT_DATE"},
          2000, 200, "global"),
    # ... more queries
]
 
all_attrs = {"cdr_id", "call_time", "region", "caller_id", "callee_id", 
             "duration", "cost", "codec", "latency", "origin_tower", "dest_tower"}
 
analysis = FragmentationAnalysis(all_attrs, queries)
recommendation = analysis.recommend_strategy()
print(f"
Recommended strategy: {recommendation.value}")

Common Design Pitfalls

•Over-fragmentation — Too many fragments create management overhead, complex reconstruction, and potential optimizer confusion. Start coarse, refine based on evidence.
•Ignoring Cross-Fragment Queries — Designing for 90% of queries while making 10% catastrophically slow. Validate all significant queries.
•Asymmetric Fragment Sizes — Skewed data distribution creates hotspots. Monitor and rebalance or use hash-based distribution.
•Forgetting Evolution — Workloads change. Build monitoring and migration capabilities from day one.
•Incomplete Predicates — Missing catch-all predicates cause data loss. Always verify completeness property.

Real-World Hybrid Fragmentation Patterns

Production systems employ hybrid fragmentation patterns tailored to specific domains. Understanding these patterns provides templates for your designs.

Pattern 1: Multi-Tenant SaaS

Outer (Horizontal): Tenant ID
Inner (Vertical): Feature modules

Customer_Tenant_A_Core (tid, name, email, status)
Customer_Tenant_A_Billing (tid, plan, payment_method, last_payment)
Customer_Tenant_A_Usage (tid, api_calls, storage_gb, last_active)

Customer_Tenant_B_Core (...)
...

Rationale: Tenant isolation is primary (compliance, performance isolation). Within tenant, different services access different attribute groups.

Pattern 2: Time-Series IoT Data

Outer (Horizontal): Time range (month/week)
Inner (Vertical): Metric type

Sensor_2024_01_Temperature (tid, device_id, timestamp, value, unit)
Sensor_2024_01_Humidity (tid, device_id, timestamp, value, unit)
Sensor_2024_01_Pressure (tid, device_id, timestamp, value, unit)

Sensor_2024_02_Temperature (...)
...

Rationale: Time-based partitioning enables archival and lifecycle management. Metric separation allows different retention policies and optimized storage.

Pattern 3: Content Delivery Network

Outer (Vertical): Content type (metadata, binary, analytics)
Inner (Horizontal): Geographic region

Content_Metadata_NA (content_id, title, format, duration, cdn_url)
Content_Metadata_EU (...)
Content_Binary_NA (content_id, binary_data, encoding)
Content_Binary_EU (...)
Content_Analytics_2024 (content_id, region, views, bandwidth, errors)

Rationale: Metadata queried differently than binaries. Geographic distribution for latency. Analytics partitioned by time for aggregation.

Industry Hybrid Fragmentation Examples
Industry	Outer Fragmentation	Inner Fragmentation	Key Driver
E-commerce	Product category (H)	Catalog / Inventory / Pricing (V)	Category-specific attributes vary significantly
Healthcare	Patient region (H)	Clinical / Administrative / Billing (V)	HIPAA, data residency, role-based access
Financial Services	Account type (H)	Profile / Transactions / Risk (V)	Regulatory compartmentalization
Gaming	Game server region (H)	Player stats / Inventory / Social (V)	Latency-critical separation
Telecommunications	Network region (H)	Billing / Network / Location (V)	Regulatory, operational team alignment

Start Simple, Evolve Carefully

Summary: Hybrid Fragmentation Mastery

Hybrid fragmentation combines horizontal and vertical partitioning to address complex access patterns that neither approach satisfies alone. Let's consolidate the key concepts:

Key Takeaways

•Formal Definition — Hybrid fragments are defined by both selection predicates (rows) and attribute sets (columns), creating a two-dimensional partition space.
•HV vs. VH Nesting — Choose HV when geographic/temporal partitioning is primary; choose VH when attribute access patterns drive design. Both can produce equivalent results.
•Reconstruction — Use union for horizontal fragments, join for vertical. HV: join(union per vertical). VH: union(join per horizontal).
•Design Methodology — Analyze workload, determine primary dimension, design outer fragmentation, then inner per outer fragment. Validate against critical queries.
•Non-Uniform Inner Fragmentation — Different outer fragments can have different inner fragmentations, enabling customization per region/category/etc.
•Complexity Trade-off — Hybrid fragmentation adds design and operational complexity. Justify with clear performance or compliance benefits over simpler approaches.

What's Next:

Page Complete