You've fragmented your data across continents—customer records split by region, order history partitioned by year, product catalogs divided by category. The fragmentation serves its purpose: locality, performance, compliance. But now a quarterly business review requires a comprehensive report spanning all regions, all years, all categories.
Reconstruction is the process of reassembling the original relation (or a query result) from distributed fragments. It's the fundamental guarantee that fragmentation doesn't lose information—that despite physical distribution, the logical view remains coherent.
This page explores reconstruction in rigorous depth: the formal operators for different fragmentation types, optimization techniques to minimize reconstruction cost, query processing strategies that avoid unnecessary reconstruction, and practical implementation patterns for production systems. Understanding reconstruction completes your mastery of fragmentation—you'll know not just how to split data, but how to seamlessly reunite it when needed.
By the end of this page, you will understand: (1) Formal reconstruction operators for horizontal, vertical, and hybrid fragmentation, (2) The role of localization in distributed query processing, (3) Optimization techniques to minimize reconstruction overhead, (4) Semi-join and bloom filter strategies for reducing data transfer, (5) Materialized views for pre-computed reconstruction, and (6) Consistency considerations during reconstruction.
Reconstruction operators mirror fragmentation operators. Understanding the formal relationship ensures correct and complete data retrieval.
Horizontal Fragmentation Reconstruction:
Given horizontal fragments R₁, R₂, ..., Rₙ where each Rᵢ = σ(pᵢ)(R):
Reconstruction: R = R₁ ∪ R₂ ∪ ... ∪ Rₙ
The union operation combines all tuples from all fragments. Because fragments are disjoint (pᵢ ∧ pⱼ = FALSE for i≠j), there are no duplicates to eliminate—UNION ALL is sufficient and more efficient than UNION DISTINCT.
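As a toy illustration (hypothetical in-memory data, not the chapter's SQL schema), horizontal reconstruction is plain concatenation because the fragment predicates are disjoint:

```python
# Hypothetical toy relation; fragments are defined by disjoint region predicates.
orders = [
    {"id": 1, "region": "NA", "amount": 500},
    {"id": 2, "region": "EU", "amount": 900},
    {"id": 3, "region": "APAC", "amount": 1200},
]

# Fragmentation: R_i = sigma(region = r)(R)
fragments = {r: [t for t in orders if t["region"] == r]
             for r in ("NA", "EU", "APAC")}

# Reconstruction: R = R1 ∪ R2 ∪ R3. UNION ALL is just concatenation,
# since disjointness guarantees there are no duplicates to eliminate.
reconstructed = [t for frag in fragments.values() for t in frag]

assert sorted(t["id"] for t in reconstructed) == [1, 2, 3]
```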
Vertical Fragmentation Reconstruction:
Given vertical fragments R₁, R₂, ..., Rₙ where each Rᵢ = π(K ∪ Aᵢ)(R) and K is the tuple identifier:
Reconstruction: R = R₁ ⋈ R₂ ⋈ ... ⋈ Rₙ (natural join on K)
The join operation matches tuples across fragments using the common tuple identifier. Because each tuple has exactly one entry in each fragment (assuming correct fragmentation), this is a lossless join with no spurious tuples.
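The same idea in miniature (hypothetical data, fragments modeled as dicts keyed by TID): each fragment carries the key plus its own attribute subset, and reconstruction merges them per TID.

```python
# Hypothetical vertical fragments keyed by TID.
contact = {1: {"name": "Ada", "email": "ada@example.com"},
           2: {"name": "Bo", "email": "bo@example.com"}}
demographics = {1: {"age": 38}, 2: {"age": 51}}

# Reconstruction: R = R1 ⋈ R2 (natural join on the TID). Lossless because
# every TID appears exactly once in each fragment.
reconstructed = {tid: {**contact[tid], **demographics[tid]} for tid in contact}

assert reconstructed[1] == {"name": "Ada", "email": "ada@example.com", "age": 38}
```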
Hybrid Fragmentation Reconstruction:
For HV (horizontal-then-vertical) fragmentation with fragments Fᵢⱼ:
Reconstruction: R = ⋈ⱼ (∪ᵢ Fᵢⱼ)
First union all horizontal fragments within each vertical group, then join the vertical results.
For VH (vertical-then-horizontal), the order reverses:
Reconstruction: R = ∪ᵢ (⋈ⱼ Fⱼᵢ)
First join all vertical fragments within each horizontal partition, then union the horizontal results.
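A small HV sketch (hypothetical data) makes the operator ordering concrete: union within each vertical group first, then join the groups on TID.

```python
# Hypothetical HV fragments F[h][v]: horizontal partition h, vertical group v,
# rows keyed by TID.
F = {
    "NA": {"billing": {1: {"amount": 100}},
           "shipping": {1: {"carrier": "DHL"}}},
    "EU": {"billing": {2: {"amount": 200}},
           "shipping": {2: {"carrier": "UPS"}}},
}

# Step 1: union the horizontal pieces within each vertical group (∪ᵢ Fᵢⱼ)
vertical = {}
for h in F:
    for v, rows in F[h].items():
        vertical.setdefault(v, {}).update(rows)

# Step 2: join the vertical groups on TID (⋈ⱼ)
reconstructed = {tid: {**vertical["billing"][tid], **vertical["shipping"][tid]}
                 for tid in vertical["billing"]}

assert reconstructed == {1: {"amount": 100, "carrier": "DHL"},
                         2: {"amount": 200, "carrier": "UPS"}}
```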
| Fragmentation Type | Reconstruction Operator | Complexity Factor | Key Requirement |
|---|---|---|---|
| Horizontal | Union (∪) | O(n) fragments | Disjoint predicates (no duplicates) |
| Vertical | Natural Join (⋈ on TID) | O(n) joins | TID present in all fragments |
| Hybrid HV | Join(Union per vertical) | O(h × v) | Both properties above |
| Hybrid VH | Union(Join per horizontal) | O(h × v) | Both properties above |
The correct reconstruction formula guarantees that the reconstructed relation equals the original: R_reconstructed ≡ R_original. This equivalence is the formal statement of the fragmentation correctness property. Any violation indicates a bug in fragmentation definition, a data corruption event, or a consistency anomaly in the distributed system.
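The equivalence can be checked mechanically with a round-trip test (a toy sketch with hypothetical data): fragment, reconstruct, and compare with the original.

```python
# Round-trip correctness check: R_reconstructed must equal R_original.
original = {1: {"amount": 50}, 2: {"amount": 500}, 3: {"amount": 5000}}

# Horizontal fragmentation with a disjoint, complete pair of predicates
small = {k: v for k, v in original.items() if v["amount"] < 1000}
large = {k: v for k, v in original.items() if v["amount"] >= 1000}

# Reconstruction by union; any mismatch would signal a fragmentation bug,
# corruption, or a consistency anomaly.
reconstructed = {**small, **large}
assert reconstructed == original
```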
The goal of query processing in distributed databases is to avoid full reconstruction whenever possible. Query localization transforms global queries into fragment-specific subqueries, accessing only necessary fragments.
The Localization Program:
Transforming a query on global relation R into queries on fragments:
Example: Horizontal Fragment Localization
Global query:
SELECT * FROM Orders WHERE region = 'US' AND amount > 1000
Fragments:
Orders_NA = σ(region IN ('US','CA'))(Orders)
Orders_EU = σ(region IN ('UK','DE'))(Orders)
Orders_APAC = σ(region IN ('JP','AU'))(Orders)
Localization steps:
1. Substitute the reconstruction expression for the global relation: Orders → Orders_NA ∪ Orders_EU ∪ Orders_APAC
2. Distribute the selection over the union: σ(region='US' ∧ amount>1000) is applied to each fragment.
3. Eliminate fragments whose predicate contradicts the query: region='US' rules out Orders_EU and Orders_APAC, leaving σ(region='US' ∧ amount>1000)(Orders_NA).
The query accesses only one fragment instead of three—significant savings in distributed systems.
```sql
-- Query Localization Examples

-- ================================================================
-- HORIZONTAL FRAGMENT LOCALIZATION
-- ================================================================

-- Original global query
-- SELECT customer_name, total_amount
-- FROM Orders
-- WHERE region = 'DE' AND order_date >= '2024-01-01';

-- After localization (only EU fragment accessed):
SELECT customer_name, total_amount
FROM Orders_EU
WHERE region = 'DE' AND order_date >= '2024-01-01';
-- Orders_NA and Orders_APAC eliminated

-- ================================================================
-- VERTICAL FRAGMENT LOCALIZATION
-- ================================================================

-- Given vertical fragments:
-- Customers_Contact: tid, customer_id, name, email, phone
-- Customers_Demographics: tid, age, gender, income_bracket
-- Customers_Preferences: tid, newsletter, language, timezone

-- Original query only needs contact info:
-- SELECT name, email FROM Customers WHERE customer_id = 12345;

-- After localization (only Contact fragment accessed):
SELECT name, email
FROM Customers_Contact
WHERE customer_id = 12345;
-- No join needed! Demographics and Preferences fragments not accessed

-- Query needing multiple vertical fragments:
-- SELECT name, age, newsletter FROM Customers WHERE customer_id = 12345;

-- After localization (three fragments, minimal attributes):
SELECT c.name, d.age, p.newsletter
FROM Customers_Contact c
JOIN Customers_Demographics d ON c.tid = d.tid
JOIN Customers_Preferences p ON c.tid = p.tid
WHERE c.customer_id = 12345;
-- Only necessary fragments joined

-- ================================================================
-- HYBRID FRAGMENT LOCALIZATION
-- ================================================================

-- Given HV fragmentation:
-- Orders_NA_Billing: region IN (NA), billing attributes
-- Orders_NA_Shipping: region IN (NA), shipping attributes
-- Orders_EU_Billing: region IN (EU), billing attributes
-- Orders_EU_Shipping: region IN (EU), shipping attributes

-- Query: Get billing info for US orders
-- SELECT order_id, amount, tax FROM Orders WHERE region = 'US';

-- After localization:
-- 1. Eliminate EU fragments (region mismatch)
-- 2. Eliminate Shipping fragments (attributes not needed)

SELECT order_id, amount, tax
FROM Orders_NA_Billing
WHERE region = 'US';
-- Only one fragment accessed out of four!

-- ================================================================
-- LOCALIZATION OPTIMIZATION VIEW
-- ================================================================

-- Create a view that the optimizer can localize
CREATE VIEW global_orders AS
SELECT * FROM Orders_NA
UNION ALL
SELECT * FROM Orders_EU
UNION ALL
SELECT * FROM Orders_APAC;

-- PostgreSQL optimizer will perform constraint exclusion
-- when querying this view with partition-compatible predicates
EXPLAIN (COSTS OFF)
SELECT * FROM global_orders WHERE region = 'JP';

-- OUTPUT (ideal):
-- Append
--   ->  Seq Scan on orders_apac
--         Filter: (region = 'JP')
-- (NA and EU partitions excluded)
```

For automatic localization: (1) Use declarative partitioning with CHECK constraints that match fragment definitions, (2) Enable constraint_exclusion or equivalent optimizer settings, (3) Use partition-compatible data types (avoid functions on partition keys), (4) Keep statistics updated with regular ANALYZE. The optimizer can only eliminate fragments when it can prove predicate contradiction.
Vertical fragment reconstruction requires joins. In distributed systems, naive join execution transfers massive data volumes. Optimization techniques minimize this overhead.
The Problem:
Vertical fragments V₁ (at Site A) and V₂ (at Site B) must be joined. Suppose V₂ holds 10M rows of about 200 bytes each, roughly 2 GB in total:
Naive approach: Ship V₂ to Site A (2 GB transfer), perform local join.
But if the query selects only 1% of rows, we transferred 99% unnecessary data!
Semi-Join Reduction:
The semi-join technique reduces transfer by sending only the relevant join keys:
1. At Site A, evaluate the local predicate and collect the TIDs of qualifying rows.
2. Ship only this key set to Site B (small: keys, not full rows).
3. At Site B, restrict V₂ to rows whose TID appears in the received set.
4. Ship the filtered rows back to Site A and complete the join locally.
Cost Comparison:
Let selectivity s = 0.01 (1% of rows qualify)
| Approach | Data Transfer | Calculation |
|---|---|---|
| Naive | 2 GB | Full V₂ |
| Semi-join | ~0.8 MB + 20 MB | 10M × 0.01 × 8 B keys + 10M × 0.01 × 200 B rows |
Semi-join reduces transfer by ~100x for selective queries!
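The figures in the table can be checked with a quick back-of-envelope calculation (sizes taken from the scenario above):

```python
# Cost model for the semi-join example:
# |V2| = 10M rows, 200 bytes/row, 8-byte join keys, selectivity s = 0.01.
rows, row_bytes, key_bytes, s = 10_000_000, 200, 8, 0.01

naive_transfer = rows * row_bytes                 # ship all of V2
semi_join_transfer = (rows * s * key_bytes        # qualifying keys out
                      + rows * s * row_bytes)     # matching rows back

assert naive_transfer == 2_000_000_000            # 2 GB
assert semi_join_transfer == 800_000 + 20_000_000 # ~0.8 MB + 20 MB
assert naive_transfer / semi_join_transfer > 90   # roughly two orders of magnitude
```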
```sql
-- Semi-Join Reduction Strategy

-- Scenario: Reconstruct customer data from vertical fragments
-- Customers_Contact (Site A): 10M rows, contains email for filtering
-- Customers_Financial (Site B): 10M rows, contains salary data

-- Query: Get names and salaries for customers with @gmail.com email
-- SELECT c.name, f.salary
-- FROM Customers_Contact c
-- JOIN Customers_Financial f ON c.tid = f.tid
-- WHERE c.email LIKE '%@gmail.com';

-- ================================================================
-- NAIVE APPROACH (what NOT to do)
-- ================================================================

-- Step 1: Ship all of Customers_Financial (10M rows × 200B = 2GB) to Site A
-- Step 2: Join locally
-- Step 3: Filter by email
-- Result: Most transferred data is discarded!

-- ================================================================
-- SEMI-JOIN APPROACH (optimized)
-- ================================================================

-- Step 1: At Site A, find qualifying TIDs (site-local query)
CREATE TEMP TABLE qualifying_tids AS
SELECT tid
FROM Customers_Contact
WHERE email LIKE '%@gmail.com';
-- Result: ~100K TIDs (1% selectivity, 8 bytes each = 800KB)

-- Step 2: Send TID set to Site B (800KB transfer)
-- (In practice, this might be a federated query or explicit data transfer)

-- Step 3: At Site B, filter Customers_Financial using received TIDs
CREATE TEMP TABLE filtered_financial AS
SELECT f.tid, f.salary
FROM Customers_Financial f
WHERE f.tid IN (SELECT tid FROM qualifying_tids);
-- Result: ~100K rows (1% of table, 100K × 200B = 20MB)

-- Step 4: Send filtered result back to Site A (20MB transfer)

-- Step 5: At Site A, complete the join locally
SELECT c.name, f.salary
FROM Customers_Contact c
JOIN filtered_financial f ON c.tid = f.tid
WHERE c.email LIKE '%@gmail.com';

-- Total transfer: 800KB + 20MB ≈ 21MB
-- vs. Naive: 2GB
-- Savings: ~99%!

-- ================================================================
-- BLOOM FILTER OPTIMIZATION
-- ================================================================

-- For very large TID sets, sending the actual set is expensive
-- Use a Bloom filter: probabilistic set membership with false positives

-- Build Bloom filter for qualifying TIDs (fixed size, e.g., 1MB)
-- Send Bloom filter instead of TID list

-- At Site B, filter using Bloom filter
-- Some false positives will be included, but join will filter them out

-- PostgreSQL extension for Bloom filters
CREATE EXTENSION bloom;

-- Create bloom index on TID
CREATE INDEX idx_financial_tid_bloom ON Customers_Financial
    USING bloom (tid) WITH (length=1024);

-- With proper integration, the optimizer can use Bloom filters
-- for semi-join reduction in distributed queries
```

For frequently-executed reconstruction queries, materialized views pre-compute and store the reconstructed data, trading storage for query performance.
Materialized View Strategy:
Full Reconstruction Materialization: pre-join (or pre-union) all fragments into a single stored relation, refreshed on a schedule.
Partial Reconstruction Materialization: materialize only the hot subset (specific fragments, attributes, or rows) that frequent queries actually need.
Incremental Materialization: keep the materialized result continuously up to date as fragments change (via triggers or change capture) instead of periodic full refreshes.
Trade-off Analysis:
| Factor | On-Demand Reconstruction | Materialized View |
|---|---|---|
| Query Latency | Higher (compute at query time) | Lower (pre-computed) |
| Storage | None (use fragments) | Full copy of reconstructed data |
| Freshness | Always current | Depends on refresh strategy |
| Maintenance | None | Refresh costs (incremental or full) |
| Flexibility | Any query pattern | Only pre-defined patterns |
```sql
-- Materialized View Reconstruction Strategies

-- ================================================================
-- FULL RECONSTRUCTION MATERIALIZATION
-- ================================================================

-- Vertical fragments:
-- Customers_Contact: tid, customer_id, name, email
-- Customers_Demographics: tid, age, income_bracket
-- Customers_Activity: tid, last_login, purchase_count

-- Materialized view for complete customer record
CREATE MATERIALIZED VIEW customers_full AS
SELECT
    c.tid, c.customer_id, c.name, c.email,
    d.age, d.income_bracket,
    a.last_login, a.purchase_count
FROM Customers_Contact c
JOIN Customers_Demographics d ON c.tid = d.tid
JOIN Customers_Activity a ON c.tid = a.tid
WITH DATA;

-- Create indexes for common query patterns
CREATE INDEX idx_customers_full_email ON customers_full(email);
CREATE INDEX idx_customers_full_income ON customers_full(income_bracket);
CREATE INDEX idx_customers_full_active ON customers_full(last_login)
    WHERE last_login > CURRENT_DATE - INTERVAL '30 days';

-- Refresh strategy: full refresh nightly
REFRESH MATERIALIZED VIEW customers_full;

-- ================================================================
-- PARTIAL RECONSTRUCTION FOR HOT QUERIES
-- ================================================================

-- Horizontal fragments by region (NA, EU, APAC)
-- Common report needs only NA data with specific attributes

CREATE MATERIALIZED VIEW customers_na_report AS
SELECT customer_id, name, email, state, revenue_ytd, last_order_date
FROM Customers_NA
WHERE status = 'active'
WITH DATA;

-- Refresh more frequently for operational data
-- Using CONCURRENTLY for zero-downtime refresh
CREATE UNIQUE INDEX idx_customers_na_report_pk
    ON customers_na_report(customer_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY customers_na_report;

-- ================================================================
-- INCREMENTAL MATERIALIZATION
-- ================================================================

-- For real-time or near-real-time requirements
-- Use triggers to maintain materialized view

-- Create the materialized table manually
CREATE TABLE customers_realtime (
    tid BIGINT PRIMARY KEY,
    customer_id BIGINT UNIQUE,
    name VARCHAR(100),
    email VARCHAR(255),
    age INTEGER,
    last_login TIMESTAMP,
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Trigger function to propagate changes
CREATE OR REPLACE FUNCTION sync_customers_realtime()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_TABLE_NAME = 'customers_contact' THEN
        INSERT INTO customers_realtime (tid, customer_id, name, email)
        VALUES (NEW.tid, NEW.customer_id, NEW.name, NEW.email)
        ON CONFLICT (tid) DO UPDATE SET
            customer_id = EXCLUDED.customer_id,
            name = EXCLUDED.name,
            email = EXCLUDED.email,
            updated_at = NOW();
    ELSIF TG_TABLE_NAME = 'customers_demographics' THEN
        UPDATE customers_realtime
        SET age = NEW.age, updated_at = NOW()
        WHERE tid = NEW.tid;
    ELSIF TG_TABLE_NAME = 'customers_activity' THEN
        UPDATE customers_realtime
        SET last_login = NEW.last_login, updated_at = NOW()
        WHERE tid = NEW.tid;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach triggers to vertical fragments
CREATE TRIGGER tr_sync_contact
AFTER INSERT OR UPDATE ON customers_contact
FOR EACH ROW EXECUTE FUNCTION sync_customers_realtime();

CREATE TRIGGER tr_sync_demographics
AFTER INSERT OR UPDATE ON customers_demographics
FOR EACH ROW EXECUTE FUNCTION sync_customers_realtime();

CREATE TRIGGER tr_sync_activity
AFTER INSERT OR UPDATE ON customers_activity
FOR EACH ROW EXECUTE FUNCTION sync_customers_realtime();

-- ================================================================
-- AGGREGATED RECONSTRUCTION
-- ================================================================

-- Sometimes full reconstruction isn't needed—only aggregates

CREATE MATERIALIZED VIEW regional_sales_summary AS
SELECT
    region,
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS order_count,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value
FROM (
    SELECT * FROM Orders_NA
    UNION ALL
    SELECT * FROM Orders_EU
    UNION ALL
    SELECT * FROM Orders_APAC
) all_orders
GROUP BY region, DATE_TRUNC('month', order_date)
WITH DATA;

-- Much smaller than full reconstruction!
-- Refresh daily for reporting freshness
REFRESH MATERIALIZED VIEW regional_sales_summary;
```

Materialized views can become stale between refreshes. For decision-critical queries, verify freshness or use real-time reconstruction. Document the expected staleness for each materialized view. Consider 'last refreshed' metadata visible to users. For financial or compliance data, staleness may be unacceptable.
In distributed systems, fragments may be updated independently. Reconstruction must address potential inconsistencies between fragments.
Consistency Challenges:
Temporal Inconsistency: fragments read at different moments may reflect different database states, so a reconstructed result can mix old and new values.
Partial Updates: a logical update spanning multiple fragments may become visible in one fragment before another, yielding a reconstructed tuple that never existed.
Replication Lag: when fragments are served from replicas, each replica may trail its primary by a different amount, skewing the joined result.
Consistency Levels for Reconstruction:
| Level | Definition | Implementation |
|---|---|---|
| Strong | Reconstruction reflects a single consistent snapshot | Distributed transactions, snapshot isolation |
| Session | Each session sees its own writes consistently | Read-your-writes guarantee |
| Eventual | Reconstruction eventually converges | Tolerate temporary inconsistency |
| Causal | Causally related writes seen in order | Vector clocks, causal+ consistency |
Achieving Strong Consistency:
Distributed Snapshot Isolation: read every fragment within a single distributed snapshot (a shared read timestamp) so reconstruction observes one consistent state.
Two-Phase Locking: acquire read locks on all participating fragments before reading and release them only after the result is assembled.
Coordination Services: use a transaction coordinator or consensus service to agree on the snapshot or lock order across sites.
```sql
-- Consistent Reconstruction Strategies

-- ================================================================
-- SNAPSHOT ISOLATION
-- ================================================================

-- Use a single transaction with REPEATABLE READ for consistent reconstruction
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- All these queries see the same snapshot
SELECT c.name, c.email
FROM Customers_Contact c
WHERE c.tid IN (SELECT tid FROM active_customers);

SELECT d.age, d.income_bracket
FROM Customers_Demographics d
WHERE d.tid IN (SELECT tid FROM active_customers);

SELECT a.last_login, a.purchase_count
FROM Customers_Activity a
WHERE a.tid IN (SELECT tid FROM active_customers);

COMMIT;
-- All three queries see database as of transaction start

-- ================================================================
-- VERSIONED RECONSTRUCTION
-- ================================================================

-- Add version/timestamp metadata to fragments
ALTER TABLE Customers_Contact ADD COLUMN version_ts TIMESTAMP DEFAULT NOW();
ALTER TABLE Customers_Demographics ADD COLUMN version_ts TIMESTAMP DEFAULT NOW();
ALTER TABLE Customers_Activity ADD COLUMN version_ts TIMESTAMP DEFAULT NOW();

-- Reconstruct as of a specific point in time
-- (Requires temporal tables or versioning extension)

-- PostgreSQL temporal tables approach
CREATE EXTENSION temporal_tables;

-- Create history tables for versioning
CREATE TABLE Customers_Contact_History (LIKE Customers_Contact);
ALTER TABLE Customers_Contact_History ADD COLUMN valid_from TIMESTAMP;
ALTER TABLE Customers_Contact_History ADD COLUMN valid_until TIMESTAMP;

-- Query as of specific timestamp
CREATE OR REPLACE FUNCTION reconstruct_customer_at(
    p_tid BIGINT,
    p_as_of TIMESTAMP
) RETURNS TABLE (
    name VARCHAR,
    email VARCHAR,
    age INTEGER,
    last_login TIMESTAMP
) AS $$
BEGIN
    RETURN QUERY
    SELECT c.name, c.email, d.age, a.last_login
    FROM (SELECT * FROM Customers_Contact_History
          WHERE tid = p_tid
            AND valid_from <= p_as_of
            AND (valid_until IS NULL OR valid_until > p_as_of)) c
    JOIN (SELECT * FROM Customers_Demographics_History
          WHERE tid = p_tid
            AND valid_from <= p_as_of
            AND (valid_until IS NULL OR valid_until > p_as_of)) d
      ON c.tid = d.tid
    JOIN (SELECT * FROM Customers_Activity_History
          WHERE tid = p_tid
            AND valid_from <= p_as_of
            AND (valid_until IS NULL OR valid_until > p_as_of)) a
      ON c.tid = a.tid;
END;
$$ LANGUAGE plpgsql;

-- ================================================================
-- LAG DETECTION
-- ================================================================

-- Monitor reconstruction consistency by tracking fragment timestamps

CREATE TABLE fragment_metadata (
    fragment_name VARCHAR PRIMARY KEY,
    last_write_ts TIMESTAMP,
    replica_lag_ms INTEGER
);

-- Update metadata on each write (via triggers or application)

-- Before reconstruction, check for unacceptable lag
CREATE OR REPLACE FUNCTION check_reconstruction_consistency(
    max_lag_ms INTEGER DEFAULT 1000
) RETURNS BOOLEAN AS $$
DECLARE
    max_observed_lag INTEGER;
BEGIN
    SELECT MAX(replica_lag_ms) INTO max_observed_lag
    FROM fragment_metadata
    WHERE fragment_name IN
        ('Customers_Contact', 'Customers_Demographics', 'Customers_Activity');

    IF max_observed_lag > max_lag_ms THEN
        RAISE WARNING 'Reconstruction may be inconsistent. Max lag: % ms',
            max_observed_lag;
        RETURN FALSE;
    END IF;
    RETURN TRUE;
END;
$$ LANGUAGE plpgsql;

-- Usage before reconstruction queries
DO $$
BEGIN
    IF NOT check_reconstruction_consistency(500) THEN
        RAISE EXCEPTION 'Fragments too inconsistent for reconstruction';
    END IF;
END $$;
```

Per the CAP theorem, strong consistency during reconstruction may reduce availability. If a fragment's primary is unreachable, you can either: (1) Wait until available (sacrifice availability), (2) Read from a stale replica (sacrifice consistency), or (3) Return partial results with warnings (application-level trade-off). The right choice depends on your application's requirements.
Production systems employ various patterns to optimize reconstruction performance.
Pattern 1: Parallel Fragment Access
For horizontal reconstruction, access all fragments concurrently:
Fork:
Thread 1: Query Fragment_NA
Thread 2: Query Fragment_EU
Thread 3: Query Fragment_APAC
Join:
Union results as they complete
Latency = max(fragment latencies) not sum(fragment latencies)
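A minimal asyncio sketch (simulated latencies and hypothetical fragment names, not real database calls) shows the fork/join shape and the max-not-sum latency property:

```python
import asyncio
import time

async def query_fragment(name: str, latency: float) -> list:
    await asyncio.sleep(latency)  # stand-in for a remote fragment query
    return [f"{name}-row"]

async def parallel_union() -> list:
    # Fork: all fragment queries start concurrently
    results = await asyncio.gather(
        query_fragment("NA", 0.3),
        query_fragment("EU", 0.5),
        query_fragment("APAC", 0.2),
    )
    # Join: union the per-fragment results
    return [row for rows in results for row in rows]

start = time.monotonic()
rows = asyncio.run(parallel_union())
elapsed = time.monotonic() - start

assert rows == ["NA-row", "EU-row", "APAC-row"]
assert elapsed < 0.3 + 0.5 + 0.2  # total ≈ max(latencies), not the sum
```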
Pattern 2: Pipelined Reconstruction
Begin processing results as they arrive instead of waiting for complete reconstruction:
1. Start Fragment_1 query
2. As rows arrive, begin filtering/projecting
3. Stream results to client
4. Fragment_2, _3 results processed similarly
5. No full materialization of reconstructed table
Reduces memory usage and time-to-first-result.
Pattern 3: Predicate Distribution
For queries with multiple predicates, distribute predicates to eliminate fragments early:
Query: WHERE region = 'US' AND income > 100000
Region predicate → eliminates non-NA fragments
Income predicate → applied only to NA fragment
No cross-fragment communication for eliminated fragments
Pattern 4: Batched Reconstruction
For bulk reconstruction (reports, exports), batch processing reduces per-record overhead:
1. Collect TIDs in batches of 10,000
2. Reconstruct batch via single join query
3. Process batch to output
4. Repeat for next batch
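The batching loop above can be sketched as follows; `fetch_batch` is a hypothetical stand-in for one join query over a batch of TIDs, not a real database API.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

def batches(tids: List[int], size: int) -> Iterator[List[int]]:
    """Split the TID list into fixed-size chunks."""
    for i in range(0, len(tids), size):
        yield tids[i:i + size]

def reconstruct_batched(
    tids: List[int],
    fetch_batch: Callable[[List[int]], Iterable[Tuple[int, str]]],
    size: int = 10_000,
) -> Iterator[Tuple[int, str]]:
    # One reconstruction query per batch instead of one per record
    for batch in batches(tids, size):
        yield from fetch_batch(batch)

# Toy usage: a fake fetch that returns (tid, payload) pairs
rows = list(reconstruct_batched(
    list(range(25)),
    lambda batch: [(t, f"row-{t}") for t in batch],
    size=10,
))
assert len(rows) == 25 and rows[0] == (0, "row-0")
```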
```python
import asyncio
import asyncpg
from typing import List, AsyncIterator
from dataclasses import dataclass


@dataclass
class FragmentConfig:
    name: str
    connection_string: str
    predicate: str


class DistributedReconstructor:
    """Parallel and pipelined reconstruction from distributed fragments."""

    def __init__(self, fragments: List[FragmentConfig]):
        self.fragments = fragments
        self.pools = {}

    async def initialize(self):
        """Create connection pools for each fragment site."""
        for frag in self.fragments:
            self.pools[frag.name] = await asyncpg.create_pool(
                frag.connection_string, min_size=2, max_size=10
            )

    async def close(self):
        """Close all connection pools."""
        for pool in self.pools.values():
            await pool.close()

    async def reconstruct_horizontal(
        self,
        query_predicates: dict,
        attributes: List[str]
    ) -> AsyncIterator[dict]:
        """
        Reconstruct from horizontal fragments with parallel access.
        Yields results as each fragment completes (pipelined union).
        """
        async def query_fragment(frag: FragmentConfig) -> List:
            # Check if fragment can be eliminated
            if not self._predicates_compatible(query_predicates, frag.predicate):
                return []  # Fragment eliminated

            pool = self.pools[frag.name]
            async with pool.acquire() as conn:
                sql = self._build_query(frag.name, attributes, query_predicates)
                # asyncpg cursors must run inside a transaction
                async with conn.transaction():
                    return [r async for r in
                            conn.cursor(sql, *query_predicates.values())]

        # Execute all fragment queries in parallel;
        # union results in completion order
        tasks = [asyncio.create_task(query_fragment(f)) for f in self.fragments]
        for task in asyncio.as_completed(tasks):
            for record in await task:
                yield record

    async def reconstruct_vertical(
        self,
        tid_filter: str,
        fragment_attributes: dict  # {fragment_name: [attributes]}
    ) -> AsyncIterator[dict]:
        """Reconstruct from vertical fragments using parallel access and merge."""
        async def query_vertical_fragment(frag_name: str, attrs: List[str]):
            pool = self.pools[frag_name]
            async with pool.acquire() as conn:
                sql = f"""
                    SELECT tid, {', '.join(attrs)}
                    FROM {frag_name}
                    WHERE {tid_filter}
                """
                return await conn.fetch(sql)

        # Query all vertical fragments in parallel
        tasks = {
            frag: asyncio.create_task(query_vertical_fragment(frag, attrs))
            for frag, attrs in fragment_attributes.items()
        }

        # Wait for all results
        results = {}
        for frag, task in tasks.items():
            results[frag] = {r['tid']: dict(r) for r in await task}

        # Merge on TID (natural join)
        primary_frag = list(fragment_attributes.keys())[0]
        for tid, primary_record in results[primary_frag].items():
            merged = primary_record.copy()
            for frag in list(fragment_attributes.keys())[1:]:
                if tid in results[frag]:
                    merged.update(results[frag][tid])
            yield merged

    def _predicates_compatible(self, query_pred: dict, frag_pred: str) -> bool:
        """Check if query predicates are compatible with fragment definition."""
        # Simplified - real implementation needs SQL predicate analysis
        return True  # Placeholder

    def _build_query(self, table: str, attrs: List[str], preds: dict) -> str:
        """Build a parameterized SQL query for one fragment."""
        where_clause = ' AND '.join(
            f"{k} = ${i + 1}" for i, k in enumerate(preds)
        )
        return f"SELECT {', '.join(attrs)} FROM {table} WHERE {where_clause}"


# Usage example
async def main():
    fragments = [
        FragmentConfig("orders_na", "postgresql://db1/orders",
                       "region IN ('US', 'CA')"),
        FragmentConfig("orders_eu", "postgresql://db2/orders",
                       "region IN ('UK', 'DE')"),
        FragmentConfig("orders_apac", "postgresql://db3/orders",
                       "region IN ('JP', 'AU')"),
    ]

    reconstructor = DistributedReconstructor(fragments)
    await reconstructor.initialize()
    try:
        # Parallel reconstruction with predicates
        async for order in reconstructor.reconstruct_horizontal(
            {"region": "US", "status": "pending"},
            ["order_id", "customer_id", "amount"]
        ):
            print(f"Order: {order}")
    finally:
        await reconstructor.close()

# asyncio.run(main())
```

Reconstruction completes the fragmentation story—ensuring that distributed data can be seamlessly reunited when needed.
Module Complete:
You've now completed the Fragmentation module. You understand horizontal, vertical, and hybrid fragmentation strategies; formal correctness properties; design methodologies; allocation optimization; and reconstruction techniques. This comprehensive knowledge enables you to design, implement, and operate fragmented distributed database systems that scale while maintaining correctness and performance.
Congratulations! You've mastered fragmentation—the foundational technique for distributed database design. From formal properties through practical implementation, you can now partition data strategically across distributed nodes while ensuring complete, consistent reconstruction. Next, explore Replication to understand how to maintain multiple copies of fragments for availability and performance.