Understanding fragmentation types—horizontal, vertical, hybrid—is necessary but not sufficient. The real challenge is design: How do you determine the optimal fragmentation strategy for a specific system with specific requirements?
Fragmentation design sits at the intersection of database theory, systems engineering, and operational pragmatics. A theoretically optimal fragmentation that's impossible to maintain operationally isn't optimal at all. Conversely, an easy-to-maintain fragmentation that causes critical queries to time out isn't viable either.
This page presents rigorous methodologies for fragmentation design—from requirements gathering through validation, allocation, and evolution. You'll learn to make principled design decisions, validate correctness properties systematically, and build fragmentation schemas that serve both current needs and future growth.
By the end of this page, you will understand: (1) The fragmentation design lifecycle from requirements to production, (2) Cost models for evaluating fragmentation alternatives, (3) Systematic validation of completeness, disjointness, and reconstruction properties, (4) Allocation optimization techniques, (5) Schema evolution strategies for growing systems, and (6) Tools and automation for fragmentation management.
Fragmentation design begins long before choosing horizontal vs. vertical. It starts with understanding the complete operational context.
Workload Characterization:
Document the query workload comprehensively:
Query Inventory:
Data Profile:
Access Patterns:
Constraints Identification:
| Constraint Category | Examples | Impact on Design |
|---|---|---|
| Performance | P99 < 100ms for order lookup | Must support fragment elimination |
| Regulatory | GDPR data residency | Geographic horizontal fragmentation |
| Security | PII isolation | Attribute-level vertical fragmentation |
| Operational | Max 50 partitions/table | Limits fragmentation granularity |
| Infrastructure | 3 datacenters, 20TB each | Bounds allocation options |
```yaml
# Fragmentation Requirements Document Template
system:
  name: "Order Management System"
  owner: "Commerce Platform Team"
  version: "2.1"

tables:
  - name: orders
    current_rows: 500_000_000
    current_size_gb: 850
    growth_rate_monthly_percent: 5
    attributes:
      - name: order_id
        type: uuid
        primary_key: true
        size_bytes: 16
      - name: customer_id
        type: bigint
        foreign_key: customers.customer_id
        predicate_frequency: 0.4  # Used in 40% of predicates
      - name: region
        type: varchar(20)
        predicate_frequency: 0.7
        cardinality: 50  # 50 distinct values
      - name: order_date
        type: timestamp
        predicate_frequency: 0.8
      - name: status
        type: enum
        predicate_frequency: 0.3
        cardinality: 6
      - name: total_amount
        type: decimal
        access_frequency: 0.6
      - name: shipping_address
        type: jsonb
        access_frequency: 0.2
        size_bytes_avg: 500
      - name: internal_notes
        type: text
        access_frequency: 0.02
        size_bytes_avg: 2000

queries:
  - name: "customer_order_history"
    pattern: |
      SELECT order_id, order_date, status, total_amount
      FROM orders
      WHERE customer_id = ? AND order_date >= ?
    frequency_per_hour: 50000
    latency_p99_ms: 100
    source: "customer-webapp"
  - name: "region_daily_summary"
    pattern: |
      SELECT region, COUNT(*), SUM(total_amount)
      FROM orders
      WHERE order_date >= ? AND order_date < ?
      GROUP BY region
    frequency_per_hour: 24
    latency_p99_ms: 5000
    source: "analytics-service"
  - name: "order_detail_lookup"
    pattern: |
      SELECT * FROM orders WHERE order_id = ?
    frequency_per_hour: 100000
    latency_p99_ms: 50
    source: "order-service"

constraints:
  regulatory:
    - type: data_residency
      description: "EU customer orders must be stored in EU datacenter"
      applies_to: region IN ('DE', 'FR', 'IT', 'ES', 'NL')
  performance:
    - metric: "order_detail_lookup P99"
      threshold_ms: 50
      priority: critical
  operational:
    - type: max_partitions
      value: 100
      reason: "PostgreSQL partition planning overhead"
    - type: max_fragment_size_gb
      value: 100
      reason: "Backup/restore time constraints"

infrastructure:
  datacenters:
    - name: us-east-1
      capacity_tb: 50
      regions_served: ['US', 'CA', 'MX']
    - name: eu-central-1
      capacity_tb: 30
      regions_served: ['DE', 'FR', 'IT', 'ES', 'NL', 'UK']
    - name: ap-southeast-1
      capacity_tb: 20
      regions_served: ['JP', 'CN', 'AU', 'IN', 'SG']
```

Requirements gathering must involve application developers, DBAs, security/compliance teams, and capacity planners. A fragmentation decision made without operations input often fails in production. A decision made without security review may violate compliance requirements. Build a cross-functional design document before any implementation.
Comparing fragmentation alternatives requires a cost model that quantifies the impact of each design choice on query performance, storage, and operations.
Query Cost Model:
For each query Q and fragmentation scheme F:
Cost(Q, F) = Σᵢ [access_cost(Fᵢ) + transfer_cost(Fᵢ)] × frequency(Q)
Where Fᵢ ranges over the fragments that Q must access, access_cost(Fᵢ) is the I/O cost of reading fragment Fᵢ, transfer_cost(Fᵢ) is the network cost of shipping its results to the query's source site, and frequency(Q) weights the query by how often it runs.
Fragment Access Determination:
For each query, determine which fragments are accessed:
Fragment Elimination Efficiency:
E(Q, F) = 1 - (fragments_accessed / total_fragments)
Higher elimination efficiency means better fragmentation for that query.
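To make the metric concrete, here is a minimal standalone sketch (separate from the fuller cost-model code later on this page) that computes E(Q, F):

```python
def elimination_efficiency(fragments_accessed: int, total_fragments: int) -> float:
    """E(Q, F) = 1 - (fragments_accessed / total_fragments)."""
    if total_fragments == 0:
        return 0.0
    return 1 - fragments_accessed / total_fragments

# A query whose date range touches 2 of 12 monthly partitions:
print(round(elimination_efficiency(2, 12), 3))  # 0.833 - most fragments pruned
# A query with no usable predicate scans every fragment:
print(elimination_efficiency(12, 12))           # 0.0 - no elimination at all
```

A scheme that scores near zero for a frequent query is a strong signal to fragment on a different attribute.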
Total Cost:
Total_Cost(F) = Σ_Q [Cost(Q, F)] + Storage_Cost(F) + Maintenance_Cost(F)
Storage_Cost includes: fragment overhead, TID replication (vertical), index duplication.
Maintenance_Cost includes: partition creation, rebalancing, schema changes.
```python
from dataclasses import dataclass
from typing import List, Dict, Set

@dataclass
class Fragment:
    """Represents a single fragment in a fragmentation scheme."""
    id: str
    predicate: Dict[str, str]   # {attribute: condition}
    attributes: Set[str]        # For vertical fragmentation
    estimated_rows: int
    estimated_size_mb: float
    location: str               # datacenter

@dataclass
class Query:
    """Represents a query workload component."""
    id: str
    predicates: Dict[str, str]
    attributes_accessed: Set[str]
    frequency_per_hour: int
    source_location: str
    max_latency_ms: int

class FragmentationCostModel:
    """Evaluate fragmentation schemes against workload."""

    def __init__(self,
                 io_cost_per_mb: float = 0.1,       # ms per MB read
                 network_latency_ms: float = 20,    # inter-DC latency
                 network_cost_per_mb: float = 1.0,  # ms per MB transferred
                 tid_overhead_bytes: int = 8):      # TID size for vertical
        self.io_cost_per_mb = io_cost_per_mb
        self.network_latency_ms = network_latency_ms
        self.network_cost_per_mb = network_cost_per_mb
        self.tid_overhead = tid_overhead_bytes

    def predicate_matches(self, query_pred: Dict, frag_pred: Dict) -> str:
        """
        Determine relationship between query and fragment predicates.

        Returns: 'match', 'contradict', or 'overlap'
        """
        for attr, q_cond in query_pred.items():
            if attr in frag_pred:
                f_cond = frag_pred[attr]
                # Simplified logic - in reality, need full predicate analysis
                if q_cond == f_cond:
                    return 'match'
                elif self._conditions_exclusive(q_cond, f_cond):
                    return 'contradict'
        return 'overlap'

    def _conditions_exclusive(self, cond1: str, cond2: str) -> bool:
        """Check if two conditions are mutually exclusive."""
        # Simplified - real implementation needs SQL predicate analysis
        # e.g., "region = 'US'" and "region = 'EU'" are exclusive
        if '=' in cond1 and '=' in cond2:
            val1 = cond1.split('=')[1].strip()
            val2 = cond2.split('=')[1].strip()
            return val1 != val2
        return False

    def fragments_accessed(self, query: Query,
                           fragments: List[Fragment]) -> List[Fragment]:
        """Determine which fragments a query must access."""
        accessed = []
        for frag in fragments:
            # Check predicate elimination
            relationship = self.predicate_matches(query.predicates, frag.predicate)
            if relationship == 'contradict':
                continue  # Fragment eliminated
            # Check attribute coverage (for vertical fragmentation)
            if query.attributes_accessed and frag.attributes:
                if not query.attributes_accessed.intersection(frag.attributes):
                    continue  # Fragment doesn't have needed attributes
            accessed.append(frag)
        return accessed

    def query_cost(self, query: Query, fragments: List[Fragment]) -> Dict:
        """Calculate cost for a single query against fragmentation scheme."""
        accessed = self.fragments_accessed(query, fragments)

        total_io_cost = 0
        total_network_cost = 0

        for frag in accessed:
            # I/O cost - assume we read 10% of fragment (selectivity estimate)
            selectivity = 0.1
            read_mb = frag.estimated_size_mb * selectivity
            total_io_cost += read_mb * self.io_cost_per_mb

            # Network cost if fragment is remote
            if frag.location != query.source_location:
                result_mb = read_mb * 0.1  # Assume 10% of read becomes result
                total_network_cost += (self.network_latency_ms +
                                       result_mb * self.network_cost_per_mb)

        elimination_efficiency = ((1 - len(accessed) / len(fragments))
                                  if fragments else 0)

        return {
            'query_id': query.id,
            'fragments_accessed': len(accessed),
            'total_fragments': len(fragments),
            'elimination_efficiency': elimination_efficiency,
            'io_cost_ms': total_io_cost,
            'network_cost_ms': total_network_cost,
            'estimated_latency_ms': total_io_cost + total_network_cost,
            'meets_sla': (total_io_cost + total_network_cost) <= query.max_latency_ms,
            'hourly_cost': (total_io_cost + total_network_cost) * query.frequency_per_hour
        }

    def evaluate_scheme(self, queries: List[Query],
                        fragments: List[Fragment]) -> Dict:
        """Evaluate a complete fragmentation scheme."""
        results = []
        total_hourly_cost = 0
        sla_violations = 0

        for query in queries:
            result = self.query_cost(query, fragments)
            results.append(result)
            total_hourly_cost += result['hourly_cost']
            if not result['meets_sla']:
                sla_violations += 1

        # Storage cost
        total_storage_mb = sum(f.estimated_size_mb for f in fragments)

        return {
            'query_results': results,
            'total_hourly_cost_ms': total_hourly_cost,
            'sla_violations': sla_violations,
            'total_fragments': len(fragments),
            'total_storage_mb': total_storage_mb,
            'avg_elimination_efficiency': (
                sum(r['elimination_efficiency'] for r in results) / len(results)
            )
        }

# Example: Compare two fragmentation schemes
model = FragmentationCostModel()

# Scheme 1: Fragment by region only
scheme1 = [
    Fragment("orders_na", {"region": "= 'NA'"}, set(),
             200_000_000, 400_000, "us-east"),
    Fragment("orders_eu", {"region": "= 'EU'"}, set(),
             150_000_000, 300_000, "eu-central"),
    Fragment("orders_apac", {"region": "= 'APAC'"}, set(),
             100_000_000, 200_000, "ap-southeast"),
]

# Scheme 2: Fragment by region AND month
scheme2 = [
    Fragment("orders_na_2024_01", {"region": "= 'NA'", "month": "= 2024-01"},
             set(), 15_000_000, 30_000, "us-east"),
    # ... (many more fragments)
]

queries = [
    Query("q1", {"region": "= 'NA'", "month": "= 2024-01"}, set(),
          1000, "us-east", 100),
    Query("q2", {"region": "= 'EU'"}, set(), 500, "eu-central", 200),
]

result1 = model.evaluate_scheme(queries, scheme1)
print(f"Scheme 1 - Hourly cost: {result1['total_hourly_cost_ms']:.0f}ms, "
      f"SLA violations: {result1['sla_violations']}")
```

Cost model parameters (I/O cost, network latency, selectivity estimates) must be calibrated against actual system measurements. Run representative queries, measure actual costs, and adjust model parameters. An uncalibrated model can lead to wrong design decisions. Validate model predictions against production benchmarks.
A fragmentation scheme must satisfy formal correctness properties. Validation is not optional—incorrect fragmentation causes data loss, corruption, or inconsistency.
Property 1: Completeness Validation
Every tuple from the original relation must appear in at least one fragment.
Horizontal Completeness: Predicates must be collectively exhaustive: p₁ ∨ p₂ ∨ ... ∨ pₙ ≡ TRUE
Validation Approach:
Vertical Completeness: Every non-key attribute must appear in at least one fragment.
Validation Approach:
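Both completeness properties can also be checked at design time against fragmentation metadata, before any data moves. A minimal sketch (illustrative function names; assumes list-based horizontal predicates represented as value sets):

```python
def check_horizontal_completeness(domain: set, fragment_values: list) -> set:
    """Return domain values not covered by any fragment's value list.
    An empty result means the predicates are collectively exhaustive."""
    covered = set().union(*fragment_values) if fragment_values else set()
    return domain - covered

def check_vertical_completeness(all_attrs: set, key_attrs: set,
                                fragment_attrs: list) -> set:
    """Return non-key attributes missing from every vertical fragment."""
    covered = set().union(*fragment_attrs) if fragment_attrs else set()
    return (all_attrs - key_attrs) - covered

# Horizontal: 'JP' is in the domain but matched by no fragment predicate
regions = {'US', 'CA', 'MX', 'DE', 'FR', 'JP'}
frags = [{'US', 'CA', 'MX'}, {'DE', 'FR'}]
print(check_horizontal_completeness(regions, frags))  # {'JP'} -> not exhaustive
```

Running such checks in CI whenever fragment definitions change catches completeness violations before they reach production.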
Property 2: Disjointness Validation
For horizontal fragmentation, no tuple should appear in multiple fragments.
Validation Approach:
```sql
-- Fragmentation Correctness Validation Queries

-- ================================================================
-- HORIZONTAL FRAGMENTATION VALIDATION
-- ================================================================

-- Given: orders table fragmented by region into 4 partitions

-- 1. COMPLETENESS: All rows accounted for
WITH fragment_counts AS (
    SELECT 'orders_na' AS fragment, COUNT(*) AS cnt FROM orders_na
    UNION ALL SELECT 'orders_eu', COUNT(*) FROM orders_eu
    UNION ALL SELECT 'orders_apac', COUNT(*) FROM orders_apac
    UNION ALL SELECT 'orders_other', COUNT(*) FROM orders_other
),
original_count AS (
    SELECT COUNT(*) AS original_cnt FROM orders_original_backup
)
SELECT
    (SELECT SUM(cnt) FROM fragment_counts) AS fragment_total,
    (SELECT original_cnt FROM original_count) AS original_total,
    CASE WHEN (SELECT SUM(cnt) FROM fragment_counts) =
              (SELECT original_cnt FROM original_count)
         THEN 'PASS: Completeness validated'
         ELSE 'FAIL: Row count mismatch - possible data loss!'
    END AS completeness_check;

-- 2. DISJOINTNESS: No overlapping predicates
-- Find rows that match zero or multiple fragment predicates
-- (wrapped in a subquery: column aliases can't be filtered directly)
SELECT order_id
FROM (
    SELECT
        order_id,
        CASE WHEN region IN ('US', 'CA', 'MX') THEN 1 ELSE 0 END AS matches_na,
        CASE WHEN region IN ('UK', 'DE', 'FR', 'IT', 'ES') THEN 1 ELSE 0 END AS matches_eu,
        CASE WHEN region IN ('JP', 'CN', 'AU', 'IN', 'SG') THEN 1 ELSE 0 END AS matches_apac,
        CASE WHEN region NOT IN ('US', 'CA', 'MX', 'UK', 'DE', 'FR', 'IT', 'ES',
                                 'JP', 'CN', 'AU', 'IN', 'SG')
             THEN 1 ELSE 0 END AS matches_other
    FROM orders_original_backup
) matches
WHERE (matches_na + matches_eu + matches_apac + matches_other) != 1;
-- Should return empty if predicates are properly disjoint

-- 3. RECONSTRUCTION: Union reproduces original
CREATE TEMP TABLE orders_reconstructed AS
SELECT * FROM orders_na
UNION ALL SELECT * FROM orders_eu
UNION ALL SELECT * FROM orders_apac
UNION ALL SELECT * FROM orders_other;

-- Verify schema match
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'orders_reconstructed'
EXCEPT
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'orders_original_backup';
-- Should return empty if schemas match

-- Verify data match (sample-based for large tables)
SELECT 'Mismatch' AS status
FROM orders_original_backup o
LEFT JOIN orders_reconstructed r USING (order_id)
WHERE r.order_id IS NULL
   OR o.customer_id != r.customer_id
   OR o.total_amount != r.total_amount
LIMIT 1;
-- Should return empty if data matches

-- ================================================================
-- VERTICAL FRAGMENTATION VALIDATION
-- ================================================================

-- Given: employees table split into 3 vertical fragments

-- 1. ATTRIBUTE COVERAGE: All attributes present
WITH original_attrs AS (
    SELECT column_name FROM information_schema.columns
    WHERE table_name = 'employees_original_backup'
),
fragment_attrs AS (
    SELECT column_name FROM information_schema.columns
    WHERE table_name = 'employees_personal'
    UNION
    SELECT column_name FROM information_schema.columns
    WHERE table_name = 'employees_employment'
    UNION
    SELECT column_name FROM information_schema.columns
    WHERE table_name = 'employees_performance'
)
SELECT o.column_name AS missing_attribute
FROM original_attrs o
LEFT JOIN fragment_attrs f ON o.column_name = f.column_name
WHERE f.column_name IS NULL;
-- Should return empty if all attributes covered

-- 2. TID CONSISTENCY: All fragments have same TIDs
-- (each branch parenthesized so LIMIT applies per branch)
(SELECT 'TID mismatch' AS status, 'personal vs employment' AS fragments
 FROM employees_personal p
 FULL OUTER JOIN employees_employment e ON p.tid = e.tid
 WHERE p.tid IS NULL OR e.tid IS NULL
 LIMIT 1)
UNION ALL
(SELECT 'TID mismatch', 'employment vs performance'
 FROM employees_employment e
 FULL OUTER JOIN employees_performance pf ON e.tid = pf.tid
 WHERE e.tid IS NULL OR pf.tid IS NULL
 LIMIT 1);
-- Should return empty if TIDs consistent

-- 3. LOSSLESS JOIN: Reconstruction produces correct tuples
CREATE TEMP TABLE employees_reconstructed AS
SELECT p.tid, p.emp_id, p.name, p.address, p.phone,
       e.department, e.salary, e.hire_date,
       pf.review_score, pf.last_review
FROM employees_personal p
JOIN employees_employment e ON p.tid = e.tid
JOIN employees_performance pf ON p.tid = pf.tid;

SELECT
    COUNT(*) AS original,
    (SELECT COUNT(*) FROM employees_reconstructed) AS reconstructed,
    CASE WHEN COUNT(*) = (SELECT COUNT(*) FROM employees_reconstructed)
         THEN 'PASS' ELSE 'FAIL'
    END AS lossless_check
FROM employees_original_backup;
```

After defining fragments, allocation assigns fragments to physical database nodes. Optimal allocation minimizes query cost while respecting capacity and regulatory constraints.
The Allocation Problem:
Inputs:
Output:
Objective: Minimize Σ_Q frequency(Q) × [access_cost(Q, A) + transfer_cost(Q, A)]
Subject to:
Allocation Strategies:
| Strategy | Description | Best When |
|---|---|---|
| Locality-based | Fragments near query sources | Clear geographic access patterns |
| Load-balanced | Distribute evenly by size | Uniform access, no locality benefit |
| Affinity-based | Co-locate frequently joined fragments | Join-heavy workloads |
| Regulatory | Fragments at mandated locations | Compliance requirements |
| Hybrid | Combine above strategies | Complex real-world scenarios |
```python
from dataclasses import dataclass
from typing import List, Dict, Set
from ortools.linear_solver import pywraplp

@dataclass
class Fragment:
    id: str
    size_gb: float
    required_sites: Set[str] = None  # Regulatory constraints

@dataclass
class Site:
    id: str
    capacity_gb: float
    cost_per_gb: float

@dataclass
class QueryAccess:
    fragment_id: str
    source_site: str
    frequency_per_hour: int

class AllocationOptimizer:
    """Optimize fragment allocation using Mixed Integer Programming."""

    def __init__(self,
                 fragments: List[Fragment],
                 sites: List[Site],
                 queries: List[QueryAccess],
                 replication_factor: int = 1,
                 remote_access_cost: float = 10.0):  # Cost multiplier for remote access
        self.fragments = {f.id: f for f in fragments}
        self.sites = {s.id: s for s in sites}
        self.queries = queries
        self.replication_factor = replication_factor
        self.remote_cost = remote_access_cost

    def optimize(self) -> Dict[str, List[str]]:
        """
        Solve the allocation problem.

        Returns: {fragment_id: [list of allocated sites]}
        """
        solver = pywraplp.Solver.CreateSolver('SCIP')
        if not solver:
            raise Exception("Solver not available")

        # Decision variables: x[f][s] = 1 if fragment f is allocated to site s
        x = {}
        for f_id in self.fragments:
            x[f_id] = {}
            for s_id in self.sites:
                x[f_id][s_id] = solver.IntVar(0, 1, f'x_{f_id}_{s_id}')

        # Constraint: Each fragment allocated to exactly r sites
        for f_id in self.fragments:
            solver.Add(
                sum(x[f_id][s_id] for s_id in self.sites) == self.replication_factor
            )

        # Constraint: Site capacity
        for s_id, site in self.sites.items():
            solver.Add(
                sum(x[f_id][s_id] * self.fragments[f_id].size_gb
                    for f_id in self.fragments) <= site.capacity_gb
            )

        # Constraint: Regulatory requirements
        for f_id, fragment in self.fragments.items():
            if fragment.required_sites:
                for s_id in fragment.required_sites:
                    solver.Add(x[f_id][s_id] == 1)

        # Objective: Minimize weighted access cost
        # Cost = sum of (frequency * access_cost) for all queries
        # access_cost = 1 if local, remote_cost if remote
        objective_terms = []
        for query in self.queries:
            f_id = query.fragment_id
            source = query.source_site
            freq = query.frequency_per_hour
            # If fragment is at source site, cost is 1, else remote_cost
            # local_cost * x[f][source] + remote_cost * (1 - x[f][source])
            # = remote_cost - (remote_cost - 1) * x[f][source]
            if source in self.sites:
                local_benefit = (self.remote_cost - 1) * x[f_id][source]
                objective_terms.append(freq * self.remote_cost)
                objective_terms.append(-freq * local_benefit)

        solver.Minimize(sum(objective_terms))

        # Solve
        status = solver.Solve()
        if status != pywraplp.Solver.OPTIMAL:
            raise Exception(f"No optimal solution found. Status: {status}")

        # Extract solution
        allocation = {}
        for f_id in self.fragments:
            allocation[f_id] = [
                s_id for s_id in self.sites
                if x[f_id][s_id].solution_value() > 0.5
            ]
        return allocation

# Example usage
fragments = [
    Fragment("orders_na", 400, required_sites={"us-east"}),     # Must be in US
    Fragment("orders_eu", 300, required_sites={"eu-central"}),  # Must be in EU
    Fragment("orders_apac", 200),
]

sites = [
    Site("us-east", 600, 0.10),
    Site("eu-central", 400, 0.12),
    Site("ap-southeast", 300, 0.08),
]

queries = [
    QueryAccess("orders_na", "us-east", 10000),
    QueryAccess("orders_eu", "eu-central", 8000),
    QueryAccess("orders_apac", "ap-southeast", 5000),
    QueryAccess("orders_na", "eu-central", 500),  # Cross-region access
]

optimizer = AllocationOptimizer(fragments, sites, queries, replication_factor=1)
allocation = optimizer.optimize()

print("Optimal Allocation:")
for frag, allocated_sites in allocation.items():
    print(f"  {frag} -> {allocated_sites}")
```

For simple scenarios, exact optimization may be overkill. Common heuristics: (1) Allocate fragments to sites serving their data's region, (2) Replicate hot fragments to all sites with significant access, (3) Load-balance cold fragments across sites with spare capacity, (4) Co-locate fragments with foreign key relationships. Use optimization for complex multi-objective scenarios.
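Those heuristics can be sketched as a greedy single-copy allocator — an illustrative baseline (hypothetical function and data shapes), not a replacement for the MIP formulation when replication and regulatory constraints interact:

```python
def greedy_allocate(fragments, sites, accesses):
    """Greedy locality-based allocation: place each fragment at the site
    that accesses it most, falling back to the next site with capacity.

    fragments: {frag_id: size_gb}; sites: {site_id: capacity_gb};
    accesses: list of (frag_id, site_id, frequency_per_hour) tuples.
    """
    remaining = dict(sites)
    allocation = {}
    # Place the largest fragments first so they get first pick of capacity.
    for frag_id in sorted(fragments, key=fragments.get, reverse=True):
        # Rank candidate sites by how often they access this fragment.
        freq = {}
        for f, s, n in accesses:
            if f == frag_id:
                freq[s] = freq.get(s, 0) + n
        candidates = sorted(remaining, key=lambda s: freq.get(s, 0), reverse=True)
        for site in candidates:
            if remaining[site] >= fragments[frag_id]:
                allocation[frag_id] = site
                remaining[site] -= fragments[frag_id]
                break
        else:
            raise RuntimeError(f"no site has capacity for {frag_id}")
    return allocation

frags = {'orders_na': 400, 'orders_eu': 300, 'orders_apac': 200}
sites = {'us-east': 600, 'eu-central': 400, 'ap-southeast': 300}
acc = [('orders_na', 'us-east', 10000), ('orders_eu', 'eu-central', 8000),
       ('orders_apac', 'ap-southeast', 5000), ('orders_na', 'eu-central', 500)]
print(greedy_allocate(frags, sites, acc))
# {'orders_na': 'us-east', 'orders_eu': 'eu-central', 'orders_apac': 'ap-southeast'}
```

The greedy pass gives a quick feasible baseline to compare against the optimizer's output; a large cost gap between the two is a sign the workload has cross-site structure worth modeling exactly.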
Fragmentation schemas must evolve as data grows, workloads change, and requirements shift. Planning for evolution from day one prevents painful migrations later.
Common Evolution Scenarios:
Fragment Splitting:
Fragment Merging:
Predicate Change:
Vertical Restructuring:
Replication Changes:
Evolution Best Practices:
```sql
-- Fragmentation Evolution Examples

-- ================================================================
-- SCENARIO 1: Splitting a Large Partition
-- ================================================================

-- Current: Single partition for 2024 is getting too large
-- orders_2024: 500GB, 1 billion rows

-- Step 1: Detach the old yearly partition first
-- (its range would overlap the new quarterly partitions)
ALTER TABLE orders DETACH PARTITION orders_2024;

-- Step 2: Create new quarterly partitions
CREATE TABLE orders_2024_q3 PARTITION OF orders
    FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');
CREATE TABLE orders_2024_q4 PARTITION OF orders
    FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');

-- Step 3: Migrate data to new partitions (can be done in batches)
INSERT INTO orders
SELECT * FROM orders_2024
WHERE order_date >= '2024-07-01' AND order_date < '2024-10-01';

INSERT INTO orders
SELECT * FROM orders_2024
WHERE order_date >= '2024-10-01' AND order_date < '2025-01-01';

-- Step 4: Recreate original partition with smaller range
CREATE TABLE orders_2024_q1_q2 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-07-01');

INSERT INTO orders
SELECT * FROM orders_2024
WHERE order_date >= '2024-01-01' AND order_date < '2024-07-01';

-- Step 5: Drop old partition after validation
-- VALIDATE FIRST: Count rows, verify completeness
DROP TABLE orders_2024;

-- ================================================================
-- SCENARIO 2: Adding New Region
-- ================================================================

-- Current: NA, EU, APAC regions
-- New: LATAM region being launched

-- Step 1: Create new partition
CREATE TABLE orders_latam PARTITION OF orders_by_region
    FOR VALUES IN ('BR', 'MX', 'AR', 'CL', 'CO');
-- But wait! MX was in the NA partition - PostgreSQL rejects the
-- overlapping value list, so we need to migrate MX first

-- Step 2: Handle overlap - move MX data
BEGIN;

-- Lock to prevent new inserts during migration
LOCK TABLE orders_by_region IN EXCLUSIVE MODE;

-- Create temp table with MX data
CREATE TEMP TABLE orders_mx AS
SELECT * FROM orders_na WHERE region = 'MX';

-- Delete from old partition
DELETE FROM orders_na WHERE region = 'MX';

-- Modify NA partition to exclude MX (requires detach/recreate)
ALTER TABLE orders_by_region DETACH PARTITION orders_na;
DROP TABLE orders_na;

CREATE TABLE orders_na PARTITION OF orders_by_region
    FOR VALUES IN ('US', 'CA');  -- MX removed

-- Re-insert non-MX data
-- (In practice, use pg_dump/restore or partition swap)

-- Insert MX data into LATAM
INSERT INTO orders_latam SELECT * FROM orders_mx;

COMMIT;

-- ================================================================
-- SCENARIO 3: Automated Partition Maintenance
-- ================================================================

-- Create function to auto-create future time partitions
CREATE OR REPLACE FUNCTION create_time_partition(
    parent_table TEXT,
    partition_prefix TEXT,
    start_date DATE,
    end_date DATE
) RETURNS VOID AS $$
DECLARE
    partition_name TEXT;
BEGIN
    partition_name := partition_prefix || to_char(start_date, 'YYYY_MM');
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I
         FOR VALUES FROM (%L) TO (%L)',
        partition_name, parent_table, start_date, end_date
    );
    RAISE NOTICE 'Created partition: %', partition_name;
END;
$$ LANGUAGE plpgsql;

-- Scheduled job to create partitions 3 months ahead
DO $$
DECLARE
    future_month DATE;
BEGIN
    FOR i IN 1..3 LOOP
        future_month := date_trunc('month',
                                   CURRENT_DATE + (i || ' month')::interval);
        PERFORM create_time_partition(
            'orders',
            'orders_',
            future_month,
            (future_month + interval '1 month')::date
        );
    END LOOP;
END $$;
```

Fragmentation evolution in production carries risks: data loss during migration, performance degradation during transition, and constraint violations if not handled carefully. Always: (1) Test in staging with production-scale data, (2) Have a rollback plan, (3) Schedule during low-traffic windows, (4) Monitor closely during and after migration, (5) Validate correctness properties post-migration.
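The same "create partitions ahead" policy can also be driven from application code. A minimal sketch (illustrative helper name, assumed naming convention `prefix_YYYY_MM`) that generates the equivalent monthly-partition DDL strings:

```python
from datetime import date

def month_partition_ddl(parent: str, prefix: str, start: date, months: int) -> list:
    """Generate CREATE TABLE ... PARTITION OF statements for the next
    `months` monthly range partitions, starting at `start` (a month start)."""
    ddl = []
    y, m = start.year, start.month
    for _ in range(months):
        lo = date(y, m, 1)
        # Roll over to January of the next year after December.
        y2, m2 = (y + 1, 1) if m == 12 else (y, m + 1)
        hi = date(y2, m2, 1)
        ddl.append(
            f"CREATE TABLE IF NOT EXISTS {prefix}{lo:%Y_%m} "
            f"PARTITION OF {parent} FOR VALUES FROM ('{lo}') TO ('{hi}')"
        )
        y, m = y2, m2
    return ddl

for stmt in month_partition_ddl('orders', 'orders_', date(2024, 11, 1), 3):
    print(stmt)
```

Emitting the DDL from code makes the partition-ahead policy unit-testable, whereas the plpgsql version has to be exercised against a live database.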
Modern database systems provide native partitioning features and management tools. Understanding the landscape helps you choose appropriate tooling.
Database Native Partitioning:
| Database | Partitioning Type | Key Features |
|---|---|---|
| PostgreSQL | Declarative (10+) | Range, list, hash; partition pruning; default partitions |
| MySQL | Native | Range, list, hash, key; subpartitioning; partition-level operations |
| Oracle | Advanced | Range, list, hash, composite; interval partitioning; online operations |
| SQL Server | Table partitioning | Range; partition switching; sliding window patterns |
| Cloud (Aurora, Spanner, etc.) | Managed | Auto-sharding, invisible partitioning |
Distributed Database Systems:
NewSQL and distributed databases handle fragmentation automatically:
These systems abstract fragmentation from application developers but allow hints/configuration for optimization.
Monitoring and Management Tools:
```sql
-- Fragmentation Monitoring Queries for PostgreSQL

-- ================================================================
-- Partition Size and Row Distribution
-- ================================================================
SELECT
    parent.relname AS parent_table,
    child.relname AS partition_name,
    pg_size_pretty(pg_relation_size(child.oid)) AS partition_size,
    pg_stat_user_tables.n_live_tup AS row_count,
    pg_stat_user_tables.seq_scan AS sequential_scans,
    pg_stat_user_tables.idx_scan AS index_scans,
    pg_stat_user_tables.n_tup_ins AS inserts,
    pg_stat_user_tables.n_tup_upd AS updates,
    pg_stat_user_tables.n_tup_del AS deletes,
    pg_stat_user_tables.last_vacuum,
    pg_stat_user_tables.last_analyze
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
JOIN pg_stat_user_tables ON child.relname = pg_stat_user_tables.relname
WHERE parent.relname = 'orders'
ORDER BY pg_relation_size(child.oid) DESC;

-- ================================================================
-- Partition Pruning Analysis
-- ================================================================

-- Check if partition pruning is happening for a query
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';

-- Look for "Subplans Removed" in output - indicates pruning
-- If all partitions appear in the plan, pruning is NOT working

-- ================================================================
-- Identify Queries Missing Partition Pruning
-- ================================================================

-- From pg_stat_statements, find queries on partitioned tables
-- with high execution counts that might benefit from pruning
SELECT
    query,
    calls,
    mean_exec_time,
    rows,
    shared_blks_read + shared_blks_hit AS total_blocks
FROM pg_stat_statements
WHERE query ILIKE '%orders%'
  AND mean_exec_time > 100  -- ms
ORDER BY calls * mean_exec_time DESC
LIMIT 20;

-- ================================================================
-- Partition Skew Detection
-- ================================================================
WITH partition_sizes AS (
    SELECT
        child.relname AS partition_name,
        pg_relation_size(child.oid) AS size_bytes
    FROM pg_inherits
    JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
    JOIN pg_class child ON pg_inherits.inhrelid = child.oid
    WHERE parent.relname = 'orders'
),
stats AS (
    SELECT AVG(size_bytes) AS avg_size,
           STDDEV(size_bytes) AS stddev_size
    FROM partition_sizes
)
SELECT
    partition_name,
    pg_size_pretty(size_bytes) AS size,
    ROUND((size_bytes - avg_size) / NULLIF(stddev_size, 0), 2) AS z_score,
    CASE
        WHEN (size_bytes - avg_size) / NULLIF(stddev_size, 0) > 2
            THEN '⚠️ LARGE OUTLIER'
        WHEN (size_bytes - avg_size) / NULLIF(stddev_size, 0) < -2
            THEN '⚠️ SMALL OUTLIER'
        ELSE '✓ Normal'
    END AS status
FROM partition_sizes, stats
ORDER BY size_bytes DESC;

-- ================================================================
-- Automated Partition Creation Check
-- ================================================================

-- Find partitions that might need to be created soon
WITH latest_partition AS (
    SELECT MAX(child.relname) AS latest
    FROM pg_inherits
    JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
    JOIN pg_class child ON pg_inherits.inhrelid = child.oid
    WHERE parent.relname = 'orders'
      AND child.relname ~ 'orders_\d{4}_\d{2}'  -- Pattern: orders_YYYY_MM
)
SELECT
    latest,
    CASE WHEN latest < 'orders_' ||
              to_char(CURRENT_DATE + interval '2 months', 'YYYY_MM')
         THEN '⚠️ Need to create future partitions!'
         ELSE '✓ Future partitions exist'
    END AS status
FROM latest_partition;
```

Fragmentation design transforms theoretical knowledge into production-ready systems. Let's consolidate the key concepts:
What's Next:
With fragments distributed across sites, queries must access remote data. The final page in this module explores Reconstruction—the algorithms and optimization techniques for reassembling data from distributed fragments while minimizing network overhead and maximizing query performance.
You now understand fragmentation design at a practical, production-ready level—from requirements analysis through validation, allocation, and evolution. This knowledge enables you to design, implement, and operate fragmented database systems that meet performance, compliance, and operational requirements.