Understanding fragmentation types—horizontal, vertical, hybrid—is necessary but not sufficient. The real challenge is design: How do you determine the optimal fragmentation strategy for a specific system with specific requirements?
Fragmentation design sits at the intersection of database theory, systems engineering, and operational pragmatics. A theoretically optimal fragmentation that's impossible to maintain operationally isn't optimal at all. Conversely, an easy-to-maintain fragmentation that causes critical queries to time out isn't viable either.
This page presents rigorous methodologies for fragmentation design—from requirements gathering through validation, allocation, and evolution. You'll learn to make principled design decisions, validate correctness properties systematically, and build fragmentation schemas that serve both current needs and future growth.
By the end of this page, you will understand: (1) The fragmentation design lifecycle from requirements to production, (2) Cost models for evaluating fragmentation alternatives, (3) Systematic validation of completeness, disjointness, and reconstruction properties, (4) Allocation optimization techniques, (5) Schema evolution strategies for growing systems, and (6) Tools and automation for fragmentation management.
Fragmentation design begins long before choosing horizontal vs. vertical. It starts with understanding the complete operational context.
Workload Characterization:
Document the query workload comprehensively:
Query Inventory:
Data Profile:
Access Patterns:
Constraints Identification:
| Constraint Category | Examples | Impact on Design |
|---|---|---|
| Performance | P99 < 100ms for order lookup | Must support fragment elimination |
| Regulatory | GDPR data residency | Geographic horizontal fragmentation |
| Security | PII isolation | Attribute-level vertical fragmentation |
| Operational | Max 50 partitions/table | Limits fragmentation granularity |
| Infrastructure | 3 datacenters, 20TB each | Bounds allocation options |
```yaml
# Fragmentation Requirements Document Template
system:
  name: "Order Management System"
  owner: "Commerce Platform Team"
  version: "2.1"

tables:
  - name: orders
    current_rows: 500_000_000
    current_size_gb: 850
    growth_rate_monthly_percent: 5
    attributes:
      - name: order_id
        type: uuid
        primary_key: true
        size_bytes: 16
      - name: customer_id
        type: bigint
        foreign_key: customers.customer_id
        predicate_frequency: 0.4  # Used in 40% of predicates
      - name: region
        type: varchar(20)
        predicate_frequency: 0.7
        cardinality: 50  # 50 distinct values
      - name: order_date
        type: timestamp
        predicate_frequency: 0.8
      - name: status
        type: enum
        predicate_frequency: 0.3
        cardinality: 6
      - name: total_amount
        type: decimal
        access_frequency: 0.6
      - name: shipping_address
        type: jsonb
        access_frequency: 0.2
        size_bytes_avg: 500
      - name: internal_notes
        type: text
        access_frequency: 0.02
        size_bytes_avg: 2000

queries:
  - name: "customer_order_history"
    pattern: |
      SELECT order_id, order_date, status, total_amount
      FROM orders
      WHERE customer_id = ? AND order_date >= ?
    frequency_per_hour: 50000
    latency_p99_ms: 100
    source: "customer-webapp"
  - name: "region_daily_summary"
    pattern: |
      SELECT region, COUNT(*), SUM(total_amount)
      FROM orders
      WHERE order_date >= ? AND order_date < ?
      GROUP BY region
    frequency_per_hour: 24
    latency_p99_ms: 5000
    source: "analytics-service"
  - name: "order_detail_lookup"
    pattern: |
      SELECT * FROM orders WHERE order_id = ?
    frequency_per_hour: 100000
    latency_p99_ms: 50
    source: "order-service"

constraints:
  regulatory:
    - type: data_residency
      description: "EU customer orders must be stored in EU datacenter"
      applies_to: region IN ('DE', 'FR', 'IT', 'ES', 'NL')
  performance:
    - metric: "order_detail_lookup P99"
      threshold_ms: 50
      priority: critical
  operational:
    - type: max_partitions
      value: 100
      reason: "PostgreSQL partition planning overhead"
    - type: max_fragment_size_gb
      value: 100
      reason: "Backup/restore time constraints"

infrastructure:
  datacenters:
    - name: us-east-1
      capacity_tb: 50
      regions_served: ['US', 'CA', 'MX']
    - name: eu-central-1
      capacity_tb: 30
      regions_served: ['DE', 'FR', 'IT', 'ES', 'NL', 'UK']
    - name: ap-southeast-1
      capacity_tb: 20
      regions_served: ['JP', 'CN', 'AU', 'IN', 'SG']
```

Requirements gathering must involve application developers, DBAs, security/compliance teams, and capacity planners. A fragmentation decision made without operations input often fails in production. A decision made without security review may violate compliance requirements. Build a cross-functional design document before any implementation.
Comparing fragmentation alternatives requires a cost model that quantifies the impact of each design choice on query performance, storage, and operations.
Query Cost Model:
For each query Q and fragmentation scheme F:
Cost(Q, F) = Σᵢ [access_cost(Fᵢ) + transfer_cost(Fᵢ)] × frequency(Q)
Where Fᵢ ranges over the fragments that Q must access, access_cost(Fᵢ) is the I/O cost of reading fragment Fᵢ, transfer_cost(Fᵢ) is the network cost of shipping its results to the query's source site, and frequency(Q) weights the query by how often it runs.
Fragment Access Determination:
For each query, determine which fragments are accessed:
Fragment Elimination Efficiency:
E(Q, F) = 1 - (fragments_accessed / total_fragments)
Higher elimination efficiency means better fragmentation for that query.
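To make the metric concrete, here is a minimal standalone sketch (separate from the fuller cost-model code later on this page) that computes E(Q, F):

```python
def elimination_efficiency(fragments_accessed: int, total_fragments: int) -> float:
    """E(Q, F) = 1 - (fragments_accessed / total_fragments)."""
    if total_fragments == 0:
        return 0.0
    return 1 - fragments_accessed / total_fragments

# A query whose date range touches 2 of 12 monthly partitions:
print(round(elimination_efficiency(2, 12), 3))  # 0.833 - most fragments pruned
# A query with no usable predicate scans every fragment:
print(elimination_efficiency(12, 12))           # 0.0 - no elimination at all
```

A scheme that scores near zero for a frequent query is a strong signal to fragment on a different attribute.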
Total Cost:
Total_Cost(F) = Σ_Q [Cost(Q, F)] + Storage_Cost(F) + Maintenance_Cost(F)
Storage_Cost includes: fragment overhead, TID replication (vertical), index duplication.
Maintenance_Cost includes: partition creation, rebalancing, schema changes.
```python
from dataclasses import dataclass
from typing import List, Dict, Set

@dataclass
class Fragment:
    """Represents a single fragment in a fragmentation scheme."""
    id: str
    predicate: Dict[str, str]   # {attribute: condition}
    attributes: Set[str]        # For vertical fragmentation
    estimated_rows: int
    estimated_size_mb: float
    location: str               # datacenter

@dataclass
class Query:
    """Represents a query workload component."""
    id: str
    predicates: Dict[str, str]
    attributes_accessed: Set[str]
    frequency_per_hour: int
    source_location: str
    max_latency_ms: int

class FragmentationCostModel:
    """Evaluate fragmentation schemes against workload."""

    def __init__(self,
                 io_cost_per_mb: float = 0.1,       # ms per MB read
                 network_latency_ms: float = 20,    # inter-DC latency
                 network_cost_per_mb: float = 1.0,  # ms per MB transferred
                 tid_overhead_bytes: int = 8):      # TID size for vertical
        self.io_cost_per_mb = io_cost_per_mb
        self.network_latency_ms = network_latency_ms
        self.network_cost_per_mb = network_cost_per_mb
        self.tid_overhead = tid_overhead_bytes

    def predicate_matches(self, query_pred: Dict, frag_pred: Dict) -> str:
        """
        Determine relationship between query and fragment predicates.

        Returns: 'match', 'contradict', or 'overlap'
        """
        for attr, q_cond in query_pred.items():
            if attr in frag_pred:
                f_cond = frag_pred[attr]
                # Simplified logic - in reality, need full predicate analysis
                if q_cond == f_cond:
                    return 'match'
                elif self._conditions_exclusive(q_cond, f_cond):
                    return 'contradict'
        return 'overlap'

    def _conditions_exclusive(self, cond1: str, cond2: str) -> bool:
        """Check if two conditions are mutually exclusive."""
        # Simplified - real implementation needs SQL predicate analysis
        # e.g., "region = 'US'" and "region = 'EU'" are exclusive
        if '=' in cond1 and '=' in cond2:
            val1 = cond1.split('=')[1].strip()
            val2 = cond2.split('=')[1].strip()
            return val1 != val2
        return False

    def fragments_accessed(self, query: Query,
                           fragments: List[Fragment]) -> List[Fragment]:
        """Determine which fragments a query must access."""
        accessed = []
        for frag in fragments:
            # Check predicate elimination
            relationship = self.predicate_matches(query.predicates, frag.predicate)
            if relationship == 'contradict':
                continue  # Fragment eliminated
            # Check attribute coverage (for vertical fragmentation)
            if query.attributes_accessed and frag.attributes:
                if not query.attributes_accessed.intersection(frag.attributes):
                    continue  # Fragment doesn't have needed attributes
            accessed.append(frag)
        return accessed

    def query_cost(self, query: Query, fragments: List[Fragment]) -> Dict:
        """Calculate cost for a single query against fragmentation scheme."""
        accessed = self.fragments_accessed(query, fragments)

        total_io_cost = 0
        total_network_cost = 0

        for frag in accessed:
            # I/O cost - assume we read 10% of fragment (selectivity estimate)
            selectivity = 0.1
            read_mb = frag.estimated_size_mb * selectivity
            total_io_cost += read_mb * self.io_cost_per_mb

            # Network cost if fragment is remote
            if frag.location != query.source_location:
                result_mb = read_mb * 0.1  # Assume 10% of read becomes result
                total_network_cost += (self.network_latency_ms +
                                       result_mb * self.network_cost_per_mb)

        elimination_efficiency = ((1 - len(accessed) / len(fragments))
                                  if fragments else 0)

        return {
            'query_id': query.id,
            'fragments_accessed': len(accessed),
            'total_fragments': len(fragments),
            'elimination_efficiency': elimination_efficiency,
            'io_cost_ms': total_io_cost,
            'network_cost_ms': total_network_cost,
            'estimated_latency_ms': total_io_cost + total_network_cost,
            'meets_sla': (total_io_cost + total_network_cost) <= query.max_latency_ms,
            'hourly_cost': (total_io_cost + total_network_cost) * query.frequency_per_hour
        }

    def evaluate_scheme(self, queries: List[Query],
                        fragments: List[Fragment]) -> Dict:
        """Evaluate a complete fragmentation scheme."""
        results = []
        total_hourly_cost = 0
        sla_violations = 0

        for query in queries:
            result = self.query_cost(query, fragments)
            results.append(result)
            total_hourly_cost += result['hourly_cost']
            if not result['meets_sla']:
                sla_violations += 1

        # Storage cost
        total_storage_mb = sum(f.estimated_size_mb for f in fragments)

        return {
            'query_results': results,
            'total_hourly_cost_ms': total_hourly_cost,
            'sla_violations': sla_violations,
            'total_fragments': len(fragments),
            'total_storage_mb': total_storage_mb,
            'avg_elimination_efficiency': (
                sum(r['elimination_efficiency'] for r in results) / len(results)
            )
        }

# Example: Compare two fragmentation schemes
model = FragmentationCostModel()

# Scheme 1: Fragment by region only
scheme1 = [
    Fragment("orders_na", {"region": "= 'NA'"}, set(),
             200_000_000, 400_000, "us-east"),
    Fragment("orders_eu", {"region": "= 'EU'"}, set(),
             150_000_000, 300_000, "eu-central"),
    Fragment("orders_apac", {"region": "= 'APAC'"}, set(),
             100_000_000, 200_000, "ap-southeast"),
]

# Scheme 2: Fragment by region AND month
scheme2 = [
    Fragment("orders_na_2024_01", {"region": "= 'NA'", "month": "= 2024-01"},
             set(), 15_000_000, 30_000, "us-east"),
    # ... (many more fragments)
]

queries = [
    Query("q1", {"region": "= 'NA'", "month": "= 2024-01"}, set(),
          1000, "us-east", 100),
    Query("q2", {"region": "= 'EU'"}, set(), 500, "eu-central", 200),
]

result1 = model.evaluate_scheme(queries, scheme1)
print(f"Scheme 1 - Hourly cost: {result1['total_hourly_cost_ms']:.0f}ms, "
      f"SLA violations: {result1['sla_violations']}")
```

Cost model parameters (I/O cost, network latency, selectivity estimates) must be calibrated against actual system measurements. Run representative queries, measure actual costs, and adjust model parameters. An uncalibrated model can lead to wrong design decisions. Validate model predictions against production benchmarks.
A fragmentation scheme must satisfy formal correctness properties. Validation is not optional—incorrect fragmentation causes data loss, corruption, or inconsistency.
Property 1: Completeness Validation
Every tuple from the original relation must appear in at least one fragment.
Horizontal Completeness: Predicates must be collectively exhaustive: p₁ ∨ p₂ ∨ ... ∨ pₙ ≡ TRUE
Validation Approach:
Vertical Completeness: Every non-key attribute must appear in at least one fragment.
Validation Approach:
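Both completeness properties can also be checked at design time against fragmentation metadata, before any data moves. A minimal sketch (illustrative function names; assumes list-based horizontal predicates represented as value sets):

```python
def check_horizontal_completeness(domain: set, fragment_values: list) -> set:
    """Return domain values not covered by any fragment's value list.
    An empty result means the predicates are collectively exhaustive."""
    covered = set().union(*fragment_values) if fragment_values else set()
    return domain - covered

def check_vertical_completeness(all_attrs: set, key_attrs: set,
                                fragment_attrs: list) -> set:
    """Return non-key attributes missing from every vertical fragment."""
    covered = set().union(*fragment_attrs) if fragment_attrs else set()
    return (all_attrs - key_attrs) - covered

# Horizontal: 'JP' is in the domain but matched by no fragment predicate
regions = {'US', 'CA', 'MX', 'DE', 'FR', 'JP'}
frags = [{'US', 'CA', 'MX'}, {'DE', 'FR'}]
print(check_horizontal_completeness(regions, frags))  # {'JP'} -> not exhaustive
```

Running such checks in CI whenever fragment definitions change catches completeness violations before they reach production.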
Property 2: Disjointness Validation
For horizontal fragmentation, no tuple should appear in multiple fragments.
Validation Approach:
```sql
-- Fragmentation Correctness Validation Queries

-- ================================================================
-- HORIZONTAL FRAGMENTATION VALIDATION
-- ================================================================

-- Given: orders table fragmented by region into 4 partitions

-- 1. COMPLETENESS: All rows accounted for
WITH fragment_counts AS (
    SELECT 'orders_na' AS fragment, COUNT(*) AS cnt FROM orders_na
    UNION ALL SELECT 'orders_eu', COUNT(*) FROM orders_eu
    UNION ALL SELECT 'orders_apac', COUNT(*) FROM orders_apac
    UNION ALL SELECT 'orders_other', COUNT(*) FROM orders_other
),
original_count AS (
    SELECT COUNT(*) AS original_cnt FROM orders_original_backup
)
SELECT
    (SELECT SUM(cnt) FROM fragment_counts) AS fragment_total,
    (SELECT original_cnt FROM original_count) AS original_total,
    CASE WHEN (SELECT SUM(cnt) FROM fragment_counts) =
              (SELECT original_cnt FROM original_count)
         THEN 'PASS: Completeness validated'
         ELSE 'FAIL: Row count mismatch - possible data loss!'
    END AS completeness_check;

-- 2. DISJOINTNESS: No overlapping predicates
-- Find rows that match zero or multiple fragment predicates
-- (wrapped in a subquery: column aliases can't be filtered directly)
SELECT order_id
FROM (
    SELECT
        order_id,
        CASE WHEN region IN ('US', 'CA', 'MX') THEN 1 ELSE 0 END AS matches_na,
        CASE WHEN region IN ('UK', 'DE', 'FR', 'IT', 'ES') THEN 1 ELSE 0 END AS matches_eu,
        CASE WHEN region IN ('JP', 'CN', 'AU', 'IN', 'SG') THEN 1 ELSE 0 END AS matches_apac,
        CASE WHEN region NOT IN ('US', 'CA', 'MX', 'UK', 'DE', 'FR', 'IT', 'ES',
                                 'JP', 'CN', 'AU', 'IN', 'SG')
             THEN 1 ELSE 0 END AS matches_other
    FROM orders_original_backup
) matches
WHERE (matches_na + matches_eu + matches_apac + matches_other) != 1;
-- Should return empty if predicates are properly disjoint

-- 3. RECONSTRUCTION: Union reproduces original
CREATE TEMP TABLE orders_reconstructed AS
SELECT * FROM orders_na
UNION ALL SELECT * FROM orders_eu
UNION ALL SELECT * FROM orders_apac
UNION ALL SELECT * FROM orders_other;

-- Verify schema match
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'orders_reconstructed'
EXCEPT
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'orders_original_backup';
-- Should return empty if schemas match

-- Verify data match (sample-based for large tables)
SELECT 'Mismatch' AS status
FROM orders_original_backup o
LEFT JOIN orders_reconstructed r USING (order_id)
WHERE r.order_id IS NULL
   OR o.customer_id != r.customer_id
   OR o.total_amount != r.total_amount
LIMIT 1;
-- Should return empty if data matches

-- ================================================================
-- VERTICAL FRAGMENTATION VALIDATION
-- ================================================================

-- Given: employees table split into 3 vertical fragments

-- 1. ATTRIBUTE COVERAGE: All attributes present
WITH original_attrs AS (
    SELECT column_name FROM information_schema.columns
    WHERE table_name = 'employees_original_backup'
),
fragment_attrs AS (
    SELECT column_name FROM information_schema.columns
    WHERE table_name = 'employees_personal'
    UNION
    SELECT column_name FROM information_schema.columns
    WHERE table_name = 'employees_employment'
    UNION
    SELECT column_name FROM information_schema.columns
    WHERE table_name = 'employees_performance'
)
SELECT o.column_name AS missing_attribute
FROM original_attrs o
LEFT JOIN fragment_attrs f ON o.column_name = f.column_name
WHERE f.column_name IS NULL;
-- Should return empty if all attributes covered

-- 2. TID CONSISTENCY: All fragments have same TIDs
-- (each branch parenthesized so LIMIT applies per branch)
(SELECT 'TID mismatch' AS status, 'personal vs employment' AS fragments
 FROM employees_personal p
 FULL OUTER JOIN employees_employment e ON p.tid = e.tid
 WHERE p.tid IS NULL OR e.tid IS NULL
 LIMIT 1)
UNION ALL
(SELECT 'TID mismatch', 'employment vs performance'
 FROM employees_employment e
 FULL OUTER JOIN employees_performance pf ON e.tid = pf.tid
 WHERE e.tid IS NULL OR pf.tid IS NULL
 LIMIT 1);
-- Should return empty if TIDs consistent

-- 3. LOSSLESS JOIN: Reconstruction produces correct tuples
CREATE TEMP TABLE employees_reconstructed AS
SELECT p.tid, p.emp_id, p.name, p.address, p.phone,
       e.department, e.salary, e.hire_date,
       pf.review_score, pf.last_review
FROM employees_personal p
JOIN employees_employment e ON p.tid = e.tid
JOIN employees_performance pf ON p.tid = pf.tid;

SELECT
    COUNT(*) AS original,
    (SELECT COUNT(*) FROM employees_reconstructed) AS reconstructed,
    CASE WHEN COUNT(*) = (SELECT COUNT(*) FROM employees_reconstructed)
         THEN 'PASS' ELSE 'FAIL'
    END AS lossless_check
FROM employees_original_backup;
```

After defining fragments, allocation assigns fragments to physical database nodes. Optimal allocation minimizes query cost while respecting capacity and regulatory constraints.
The Allocation Problem:
Inputs:
Output:
Objective: Minimize Σ_Q frequency(Q) × [access_cost(Q, A) + transfer_cost(Q, A)]
Subject to:
Allocation Strategies:
| Strategy | Description | Best When |
|---|---|---|
| Locality-based | Fragments near query sources | Clear geographic access patterns |
| Load-balanced | Distribute evenly by size | Uniform access, no locality benefit |
| Affinity-based | Co-locate frequently joined fragments | Join-heavy workloads |
| Regulatory | Fragments at mandated locations | Compliance requirements |
| Hybrid | Combine above strategies | Complex real-world scenarios |
```python
from dataclasses import dataclass
from typing import List, Dict, Set
from ortools.linear_solver import pywraplp

@dataclass
class Fragment:
    id: str
    size_gb: float
    required_sites: Set[str] = None  # Regulatory constraints

@dataclass
class Site:
    id: str
    capacity_gb: float
    cost_per_gb: float

@dataclass
class QueryAccess:
    fragment_id: str
    source_site: str
    frequency_per_hour: int

class AllocationOptimizer:
    """Optimize fragment allocation using Mixed Integer Programming."""

    def __init__(self,
                 fragments: List[Fragment],
                 sites: List[Site],
                 queries: List[QueryAccess],
                 replication_factor: int = 1,
                 remote_access_cost: float = 10.0):  # Cost multiplier for remote access
        self.fragments = {f.id: f for f in fragments}
        self.sites = {s.id: s for s in sites}
        self.queries = queries
        self.replication_factor = replication_factor
        self.remote_cost = remote_access_cost

    def optimize(self) -> Dict[str, List[str]]:
        """
        Solve the allocation problem.

        Returns: {fragment_id: [list of allocated sites]}
        """
        solver = pywraplp.Solver.CreateSolver('SCIP')
        if not solver:
            raise Exception("Solver not available")

        # Decision variables: x[f][s] = 1 if fragment f is allocated to site s
        x = {}
        for f_id in self.fragments:
            x[f_id] = {}
            for s_id in self.sites:
                x[f_id][s_id] = solver.IntVar(0, 1, f'x_{f_id}_{s_id}')

        # Constraint: Each fragment allocated to exactly r sites
        for f_id in self.fragments:
            solver.Add(
                sum(x[f_id][s_id] for s_id in self.sites) == self.replication_factor
            )

        # Constraint: Site capacity
        for s_id, site in self.sites.items():
            solver.Add(
                sum(x[f_id][s_id] * self.fragments[f_id].size_gb
                    for f_id in self.fragments) <= site.capacity_gb
            )

        # Constraint: Regulatory requirements
        for f_id, fragment in self.fragments.items():
            if fragment.required_sites:
                for s_id in fragment.required_sites:
                    solver.Add(x[f_id][s_id] == 1)

        # Objective: Minimize weighted access cost
        # Cost = sum of (frequency * access_cost) for all queries
        # access_cost = 1 if local, remote_cost if remote
        objective_terms = []
        for query in self.queries:
            f_id = query.fragment_id
            source = query.source_site
            freq = query.frequency_per_hour
            # If fragment is at source site, cost is 1, else remote_cost
            # local_cost * x[f][source] + remote_cost * (1 - x[f][source])
            # = remote_cost - (remote_cost - 1) * x[f][source]
            if source in self.sites:
                local_benefit = (self.remote_cost - 1) * x[f_id][source]
                objective_terms.append(freq * self.remote_cost)
                objective_terms.append(-freq * local_benefit)

        solver.Minimize(sum(objective_terms))

        # Solve
        status = solver.Solve()
        if status != pywraplp.Solver.OPTIMAL:
            raise Exception(f"No optimal solution found. Status: {status}")

        # Extract solution
        allocation = {}
        for f_id in self.fragments:
            allocation[f_id] = [
                s_id for s_id in self.sites
                if x[f_id][s_id].solution_value() > 0.5
            ]
        return allocation

# Example usage
fragments = [
    Fragment("orders_na", 400, required_sites={"us-east"}),     # Must be in US
    Fragment("orders_eu", 300, required_sites={"eu-central"}),  # Must be in EU
    Fragment("orders_apac", 200),
]

sites = [
    Site("us-east", 600, 0.10),
    Site("eu-central", 400, 0.12),
    Site("ap-southeast", 300, 0.08),
]

queries = [
    QueryAccess("orders_na", "us-east", 10000),
    QueryAccess("orders_eu", "eu-central", 8000),
    QueryAccess("orders_apac", "ap-southeast", 5000),
    QueryAccess("orders_na", "eu-central", 500),  # Cross-region access
]

optimizer = AllocationOptimizer(fragments, sites, queries, replication_factor=1)
allocation = optimizer.optimize()

print("Optimal Allocation:")
for frag, allocated_sites in allocation.items():
    print(f"  {frag} -> {allocated_sites}")
```

For simple scenarios, exact optimization may be overkill. Common heuristics: (1) Allocate fragments to sites serving their data's region, (2) Replicate hot fragments to all sites with significant access, (3) Load-balance cold fragments across sites with spare capacity, (4) Co-locate fragments with foreign key relationships. Use optimization for complex multi-objective scenarios.
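Those heuristics can be sketched as a greedy single-copy allocator — an illustrative baseline (hypothetical function and data shapes), not a replacement for the MIP formulation when replication and regulatory constraints interact:

```python
def greedy_allocate(fragments, sites, accesses):
    """Greedy locality-based allocation: place each fragment at the site
    that accesses it most, falling back to the next site with capacity.

    fragments: {frag_id: size_gb}; sites: {site_id: capacity_gb};
    accesses: list of (frag_id, site_id, frequency_per_hour) tuples.
    """
    remaining = dict(sites)
    allocation = {}
    # Place the largest fragments first so they get first pick of capacity.
    for frag_id in sorted(fragments, key=fragments.get, reverse=True):
        # Rank candidate sites by how often they access this fragment.
        freq = {}
        for f, s, n in accesses:
            if f == frag_id:
                freq[s] = freq.get(s, 0) + n
        candidates = sorted(remaining, key=lambda s: freq.get(s, 0), reverse=True)
        for site in candidates:
            if remaining[site] >= fragments[frag_id]:
                allocation[frag_id] = site
                remaining[site] -= fragments[frag_id]
                break
        else:
            raise RuntimeError(f"no site has capacity for {frag_id}")
    return allocation

frags = {'orders_na': 400, 'orders_eu': 300, 'orders_apac': 200}
sites = {'us-east': 600, 'eu-central': 400, 'ap-southeast': 300}
acc = [('orders_na', 'us-east', 10000), ('orders_eu', 'eu-central', 8000),
       ('orders_apac', 'ap-southeast', 5000), ('orders_na', 'eu-central', 500)]
print(greedy_allocate(frags, sites, acc))
# {'orders_na': 'us-east', 'orders_eu': 'eu-central', 'orders_apac': 'ap-southeast'}
```

The greedy pass gives a quick feasible baseline to compare against the optimizer's output; a large cost gap between the two is a sign the workload has cross-site structure worth modeling exactly.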
Fragmentation schemas must evolve as data grows, workloads change, and requirements shift. Planning for evolution from day one prevents painful migrations later.
Common Evolution Scenarios:
Fragment Splitting:
Fragment Merging:
Predicate Change:
Vertical Restructuring:
Replication Changes:
Evolution Best Practices:
```sql
-- Fragmentation Evolution Examples

-- ================================================================
-- SCENARIO 1: Splitting a Large Partition
-- ================================================================

-- Current: Single partition for 2024 is getting too large
-- orders_2024: 500GB, 1 billion rows

-- Step 1: Detach the old yearly partition first
-- (its range would overlap the new quarterly partitions)
ALTER TABLE orders DETACH PARTITION orders_2024;

-- Step 2: Create new quarterly partitions
CREATE TABLE orders_2024_q3 PARTITION OF orders
    FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');
CREATE TABLE orders_2024_q4 PARTITION OF orders
    FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');

-- Step 3: Migrate data to new partitions (can be done in batches)
INSERT INTO orders
SELECT * FROM orders_2024
WHERE order_date >= '2024-07-01' AND order_date < '2024-10-01';

INSERT INTO orders
SELECT * FROM orders_2024
WHERE order_date >= '2024-10-01' AND order_date < '2025-01-01';

-- Step 4: Recreate original partition with smaller range
CREATE TABLE orders_2024_q1_q2 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-07-01');

INSERT INTO orders
SELECT * FROM orders_2024
WHERE order_date >= '2024-01-01' AND order_date < '2024-07-01';

-- Step 5: Drop old partition after validation
-- VALIDATE FIRST: Count rows, verify completeness
DROP TABLE orders_2024;

-- ================================================================
-- SCENARIO 2: Adding New Region
-- ================================================================

-- Current: NA, EU, APAC regions
-- New: LATAM region being launched

-- Step 1: Create new partition
CREATE TABLE orders_latam PARTITION OF orders_by_region
    FOR VALUES IN ('BR', 'MX', 'AR', 'CL', 'CO');
-- But wait! MX was in the NA partition - PostgreSQL rejects the
-- overlapping value list, so we need to migrate MX first

-- Step 2: Handle overlap - move MX data
BEGIN;

-- Lock to prevent new inserts during migration
LOCK TABLE orders_by_region IN EXCLUSIVE MODE;

-- Create temp table with MX data
CREATE TEMP TABLE orders_mx AS
SELECT * FROM orders_na WHERE region = 'MX';

-- Delete from old partition
DELETE FROM orders_na WHERE region = 'MX';

-- Modify NA partition to exclude MX (requires detach/recreate)
ALTER TABLE orders_by_region DETACH PARTITION orders_na;
DROP TABLE orders_na;

CREATE TABLE orders_na PARTITION OF orders_by_region
    FOR VALUES IN ('US', 'CA');  -- MX removed

-- Re-insert non-MX data
-- (In practice, use pg_dump/restore or partition swap)

-- Insert MX data into LATAM
INSERT INTO orders_latam SELECT * FROM orders_mx;

COMMIT;

-- ================================================================
-- SCENARIO 3: Automated Partition Maintenance
-- ================================================================

-- Create function to auto-create future time partitions
CREATE OR REPLACE FUNCTION create_time_partition(
    parent_table TEXT,
    partition_prefix TEXT,
    start_date DATE,
    end_date DATE
) RETURNS VOID AS $$
DECLARE
    partition_name TEXT;
BEGIN
    partition_name := partition_prefix || to_char(start_date, 'YYYY_MM');
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I
         FOR VALUES FROM (%L) TO (%L)',
        partition_name, parent_table, start_date, end_date
    );
    RAISE NOTICE 'Created partition: %', partition_name;
END;
$$ LANGUAGE plpgsql;

-- Scheduled job to create partitions 3 months ahead
DO $$
DECLARE
    future_month DATE;
BEGIN
    FOR i IN 1..3 LOOP
        future_month := date_trunc('month',
                                   CURRENT_DATE + (i || ' month')::interval);
        PERFORM create_time_partition(
            'orders',
            'orders_',
            future_month,
            (future_month + interval '1 month')::date
        );
    END LOOP;
END $$;
```

Fragmentation evolution in production carries risks: data loss during migration, performance degradation during transition, and constraint violations if not handled carefully. Always: (1) Test in staging with production-scale data, (2) Have a rollback plan, (3) Schedule during low-traffic windows, (4) Monitor closely during and after migration, (5) Validate correctness properties post-migration.
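The same "create partitions ahead" policy can also be driven from application code. A minimal sketch (illustrative helper name, assumed naming convention `prefix_YYYY_MM`) that generates the equivalent monthly-partition DDL strings:

```python
from datetime import date

def month_partition_ddl(parent: str, prefix: str, start: date, months: int) -> list:
    """Generate CREATE TABLE ... PARTITION OF statements for the next
    `months` monthly range partitions, starting at `start` (a month start)."""
    ddl = []
    y, m = start.year, start.month
    for _ in range(months):
        lo = date(y, m, 1)
        # Roll over to January of the next year after December.
        y2, m2 = (y + 1, 1) if m == 12 else (y, m + 1)
        hi = date(y2, m2, 1)
        ddl.append(
            f"CREATE TABLE IF NOT EXISTS {prefix}{lo:%Y_%m} "
            f"PARTITION OF {parent} FOR VALUES FROM ('{lo}') TO ('{hi}')"
        )
        y, m = y2, m2
    return ddl

for stmt in month_partition_ddl('orders', 'orders_', date(2024, 11, 1), 3):
    print(stmt)
```

Emitting the DDL from code makes the partition-ahead policy unit-testable, whereas the plpgsql version has to be exercised against a live database.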
Modern database systems provide native partitioning features and management tools. Understanding the landscape helps you choose appropriate tooling.
Database Native Partitioning:
| Database | Partitioning Type | Key Features |
|---|---|---|
| PostgreSQL | Declarative (10+) | Range, list, hash; partition pruning; default partitions |
| MySQL | Native | Range, list, hash, key; subpartitioning; partition-level operations |
| Oracle | Advanced | Range, list, hash, composite; interval partitioning; online operations |
| SQL Server | Table partitioning | Range; partition switching; sliding window patterns |
| Cloud (Aurora, Spanner, etc.) | Managed | Auto-sharding, invisible partitioning |
Distributed Database Systems:
NewSQL and distributed databases handle fragmentation automatically:
These systems abstract fragmentation from application developers but allow hints/configuration for optimization.
Monitoring and Management Tools:
```sql
-- Fragmentation Monitoring Queries for PostgreSQL

-- ================================================================
-- Partition Size and Row Distribution
-- ================================================================
SELECT
    parent.relname AS parent_table,
    child.relname AS partition_name,
    pg_size_pretty(pg_relation_size(child.oid)) AS partition_size,
    pg_stat_user_tables.n_live_tup AS row_count,
    pg_stat_user_tables.seq_scan AS sequential_scans,
    pg_stat_user_tables.idx_scan AS index_scans,
    pg_stat_user_tables.n_tup_ins AS inserts,
    pg_stat_user_tables.n_tup_upd AS updates,
    pg_stat_user_tables.n_tup_del AS deletes,
    pg_stat_user_tables.last_vacuum,
    pg_stat_user_tables.last_analyze
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
JOIN pg_stat_user_tables ON child.relname = pg_stat_user_tables.relname
WHERE parent.relname = 'orders'
ORDER BY pg_relation_size(child.oid) DESC;

-- ================================================================
-- Partition Pruning Analysis
-- ================================================================

-- Check if partition pruning is happening for a query
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01';

-- Look for "Subplans Removed" in output - indicates pruning
-- If all partitions appear in the plan, pruning is NOT working

-- ================================================================
-- Identify Queries Missing Partition Pruning
-- ================================================================

-- From pg_stat_statements, find queries on partitioned tables
-- with high execution counts that might benefit from pruning
SELECT
    query,
    calls,
    mean_exec_time,
    rows,
    shared_blks_read + shared_blks_hit AS total_blocks
FROM pg_stat_statements
WHERE query ILIKE '%orders%'
  AND mean_exec_time > 100  -- ms
ORDER BY calls * mean_exec_time DESC
LIMIT 20;

-- ================================================================
-- Partition Skew Detection
-- ================================================================
WITH partition_sizes AS (
    SELECT
        child.relname AS partition_name,
        pg_relation_size(child.oid) AS size_bytes
    FROM pg_inherits
    JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
    JOIN pg_class child ON pg_inherits.inhrelid = child.oid
    WHERE parent.relname = 'orders'
),
stats AS (
    SELECT AVG(size_bytes) AS avg_size,
           STDDEV(size_bytes) AS stddev_size
    FROM partition_sizes
)
SELECT
    partition_name,
    pg_size_pretty(size_bytes) AS size,
    ROUND((size_bytes - avg_size) / NULLIF(stddev_size, 0), 2) AS z_score,
    CASE
        WHEN (size_bytes - avg_size) / NULLIF(stddev_size, 0) > 2
            THEN '⚠️ LARGE OUTLIER'
        WHEN (size_bytes - avg_size) / NULLIF(stddev_size, 0) < -2
            THEN '⚠️ SMALL OUTLIER'
        ELSE '✓ Normal'
    END AS status
FROM partition_sizes, stats
ORDER BY size_bytes DESC;

-- ================================================================
-- Automated Partition Creation Check
-- ================================================================

-- Find partitions that might need to be created soon
WITH latest_partition AS (
    SELECT MAX(child.relname) AS latest
    FROM pg_inherits
    JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
    JOIN pg_class child ON pg_inherits.inhrelid = child.oid
    WHERE parent.relname = 'orders'
      AND child.relname ~ 'orders_\d{4}_\d{2}'  -- Pattern: orders_YYYY_MM
)
SELECT
    latest,
    CASE WHEN latest < 'orders_' ||
              to_char(CURRENT_DATE + interval '2 months', 'YYYY_MM')
         THEN '⚠️ Need to create future partitions!'
         ELSE '✓ Future partitions exist'
    END AS status
FROM latest_partition;
```

Fragmentation design transforms theoretical knowledge into production-ready systems. Let's consolidate the key concepts:
What's Next:
With fragments distributed across sites, queries must access remote data. The final page in this module explores Reconstruction—the algorithms and optimization techniques for reassembling data from distributed fragments while minimizing network overhead and maximizing query performance.
You now understand fragmentation design at a practical, production-ready level—from requirements analysis through validation, allocation, and evolution. This knowledge enables you to design, implement, and operate fragmented database systems that meet performance, compliance, and operational requirements.