A data model that works beautifully at 1,000 records can collapse spectacularly at 1,000,000,000. Modeling for scale is the practice of designing data structures that maintain performance, manageability, and clarity as systems grow by orders of magnitude.
This isn't premature optimization—it's architectural foresight. The choices made in your data model's first week determine whether scaling requires minor adjustments or complete rewrites years later.
The patterns in this page represent hard-won lessons from systems serving billions of users. They're not theoretical abstractions but practical techniques deployed at companies like Google, Amazon, Netflix, and Uber.
By the end of this page, you will understand scaling dimensions (read, write, storage), sharding strategies, time-series and hot spot handling, partition key design, growth anticipation patterns, and the trade-offs inherent in scaling data models.
Scale is not a single dimension. A system can be read-heavy, write-heavy, storage-heavy, or some combination. Understanding which dimensions matter for your system determines which scaling strategies apply.
The Four Scaling Dimensions:
| Dimension | Symptoms at Limit | Primary Strategies |
|---|---|---|
| Read Scale | High CPU from query processing, slow response times, connection pool exhaustion | Read replicas, caching, CDN, query optimization |
| Write Scale | Write queue buildup, replica lag, transaction timeouts, lock contention | Sharding, write-ahead patterns, batching, async processing |
| Storage Scale | Disk space exhaustion, slow backups, long recovery times | Archiving, partitioning, compression, tiered storage |
| Connection Scale | Max connections reached, connection wait timeouts | Connection pooling, serverless databases, sharding |
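For the connection dimension, a rough capacity check is Little's Law: concurrent connections ≈ request rate × average connection hold time. A minimal sketch (the helper name and numbers are illustrative, not from the original):

```typescript
// Little's Law: L = λ × W
// Estimate connections held concurrently from request rate and duration.
function estimateConcurrentConnections(
  requestsPerSecond: number,
  avgConnectionDurationMs: number,
): number {
  return Math.ceil(requestsPerSecond * (avgConnectionDurationMs / 1000));
}

// 50,000 reads/sec, each holding a connection for 50 ms,
// means roughly 2,500 connections in flight at any instant.
// Compare against your database's max_connections to see whether
// pooling alone suffices or sharding is required.
const needed = estimateConcurrentConnections(50_000, 50);
console.log(needed); // 2500
```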
Identifying Your Dominant Dimension:
Most systems have one or two dominant scaling challenges. Your data model must be optimized for your specific scaling dimensions, not for generic 'scale.'
```typescript
// Quantifying scale dimensions for capacity planning
interface ScaleProfile {
  reads: {
    averagePerSecond: number;
    peakPerSecond: number;
    peakToAverageRatio: number; // Higher = spikier
    latencyP99Target: number;   // milliseconds
  };
  writes: {
    averagePerSecond: number;
    peakPerSecond: number;
    averagePayloadBytes: number;
    durabilityRequirement: 'eventual' | 'immediate' | 'synchronous-replicated';
  };
  storage: {
    currentSizeGB: number;
    monthlyGrowthGB: number;
    retentionPolicy: {
      hotDataDays: number;
      warmDataDays: number;
      coldDataDays: number;
    };
  };
  connections: {
    concurrentConnections: number;
    connectionDurationMs: number; // Average
    peakMultiplier: number;
  };
}

// Example: E-commerce platform profile
const ecommerceProfile: ScaleProfile = {
  reads: {
    averagePerSecond: 50000,
    peakPerSecond: 500000, // 10x during sales
    peakToAverageRatio: 10,
    latencyP99Target: 100,
  },
  writes: {
    averagePerSecond: 1000,
    peakPerSecond: 10000,
    averagePayloadBytes: 2048,
    durabilityRequirement: 'immediate',
  },
  storage: {
    currentSizeGB: 500,
    monthlyGrowthGB: 20,
    retentionPolicy: {
      hotDataDays: 30,
      warmDataDays: 365,
      coldDataDays: 2555, // 7 years for compliance
    },
  },
  connections: {
    concurrentConnections: 10000,
    connectionDurationMs: 50,
    peakMultiplier: 5,
  },
};

// Strategy: Read replicas + CDN for reads, modest sharding for writes
```

Before making scale-driven decisions, instrument your system to learn actual read/write ratios, query patterns, and growth rates. Assumptions about scale are often wrong; data-driven decisions prevent over-engineering for theoretical problems.
When a single database can't handle your workload, you shard—splitting data across multiple database instances. Sharding is the most powerful but also most complex scaling technique.
Horizontal vs. Vertical Sharding:
Horizontal sharding addresses per-table scale limits; vertical sharding separates workloads with different characteristics.
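In practice, vertical sharding is often just routing: tables with different workload profiles live on different database instances. A minimal sketch under that assumption (the domain names and connection strings are illustrative, not from the original):

```typescript
// Vertical sharding: each domain gets its own database instance,
// so OLTP and OLAP workloads stop competing for the same resources.
type Domain = 'users' | 'orders' | 'analytics';

const domainToDatabase: Record<Domain, string> = {
  users: 'postgres://users-db:5432/app',         // OLTP, latency-sensitive
  orders: 'postgres://orders-db:5432/app',       // OLTP, write-heavy
  analytics: 'postgres://analytics-db:5432/app', // OLAP, scan-heavy
};

function getDatabaseUrl(domain: Domain): string {
  return domainToDatabase[domain];
}
```

The router can later grow into per-domain connection pools without changing call sites.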
```typescript
// RANGE-BASED SHARDING
// Partition by value ranges (e.g., user ID, date)

function getShardByRange(userId: number): string {
  if (userId < 1_000_000) return 'shard-1';
  if (userId < 2_000_000) return 'shard-2';
  if (userId < 3_000_000) return 'shard-3';
  return 'shard-4';
}

// Pros: Sequential data locality, easy range queries
// Cons: Uneven distribution (new users cluster on latest shard)

// ============================================

// HASH-BASED SHARDING
// Partition by hash of key for even distribution

function getShardByHash(userId: number, numShards: number): string {
  const hash = murmurhash(userId.toString());
  const shardIndex = hash % numShards;
  return `shard-${shardIndex + 1}`;
}

// Pros: Even distribution, no hot spots from sequential IDs
// Cons: Adding shards requires rehashing (data movement)

// ============================================

// CONSISTENT HASHING
// Minimize reshuffling when adding/removing shards

class ConsistentHashRing {
  private ring: Map<number, string> = new Map();
  private sortedHashes: number[] = [];
  private virtualNodes: number = 150;

  addShard(shardId: string) {
    for (let i = 0; i < this.virtualNodes; i++) {
      const hash = murmurhash(`${shardId}-vnode-${i}`);
      this.ring.set(hash, shardId);
      this.sortedHashes.push(hash);
    }
    this.sortedHashes.sort((a, b) => a - b);
  }

  getShard(key: string): string {
    const hash = murmurhash(key);

    // Find first hash >= key's hash (clockwise on ring)
    for (const nodeHash of this.sortedHashes) {
      if (nodeHash >= hash) {
        return this.ring.get(nodeHash)!;
      }
    }

    // Wrap around to first node
    return this.ring.get(this.sortedHashes[0])!;
  }
}

// Pros: Adding a shard moves only 1/N of data (not all)
// Cons: More complex implementation, potential temporary imbalance

// ============================================

// DIRECTORY-BASED SHARDING
// Lookup table maps keys to shards

async function getShardByDirectory(userId: number): Promise<string> {
  const mapping = await redis.get(`user:${userId}:shard`);
  if (mapping) return mapping;

  // New user: assign to least-loaded shard
  const shardId = await getLeastLoadedShard();
  await redis.set(`user:${userId}:shard`, shardId);
  return shardId;
}

// Pros: Complete control, easy rebalancing
// Cons: Directory becomes single point of failure/bottleneck
```

| Strategy | Distribution | Resharding Difficulty | Best For |
|---|---|---|---|
| Range-based | Uneven (time/sequence bias) | Easy (update boundaries) | Time-series, logs, append-only data |
| Hash-based | Even | Hard (full reshuffle) | User data, general purpose |
| Consistent hashing | Even | Moderate (1/N moves) | Large scale, dynamic shard counts |
| Directory-based | Controlled | Easy (update directory) | Multi-tenant, precise control needed |
Once you shard, cross-shard queries become expensive or impossible. JOINs across shards require application-level logic. Transactions across shards require distributed transaction protocols. Before sharding, exhaust vertical scaling, read replicas, caching, and query optimization.
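An application-level cross-shard query typically follows a scatter-gather pattern: fan the query out to every shard in parallel, then merge, re-sort, and re-limit in the application. A minimal sketch, assuming each shard exposes an async `query` function (the `Order` shape and merge logic are illustrative):

```typescript
// Scatter-gather: run the same query on every shard, merge in the app.
interface Order { orderId: string; total: number; createdAt: string }
type ShardQuery = (sql: string) => Promise<Order[]>;

async function queryAllShards(
  shards: ShardQuery[],
  sql: string,
  limit: number,
): Promise<Order[]> {
  // Fan out in parallel; each shard applies the limit locally
  const perShard = await Promise.all(shards.map((q) => q(sql)));

  // Global ordering and limit must be re-applied in the application,
  // since no single shard sees the full result set
  return perShard
    .flat()
    .sort((a, b) => b.createdAt.localeCompare(a.createdAt))
    .slice(0, limit);
}
```

Note the cost: every shard does the work, and the application holds up to `shards × limit` rows in memory before trimming.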
The partition key (or shard key) is the most critical decision in a sharded or distributed database. It determines data locality, query efficiency, and scalability headroom.
Key Selection Criteria:
Cardinality: The key should have many distinct values. A boolean (true/false) is useless as a partition key—everything goes to two shards.
Distribution: Values should be evenly distributed. Using 'country' as a partition key means the USA shard is 100x larger than the Luxembourg shard.
Query Alignment: The key should match your most common query patterns. If you always query by user_id, partition by user_id. Queries that don't include the partition key hit all shards.
Immutability: Keys should rarely change. Changing a partition key requires moving data between shards.
```typescript
// DynamoDB: Partition key design examples

// GOOD: High cardinality, even distribution
const userItemKey = {
  PK: `USER#${userId}`, // Partition key: many users, even access
  SK: `PROFILE`,        // Sort key: type of data within user
};

// GOOD: Compound key for multi-tenant + time-series
const analyticsEventKey = {
  PK: `TENANT#${tenantId}#DATE#${dateString}`,
  SK: `EVENT#${timestamp}#${eventId}`,
};
// Each tenant-day is a partition; prevents a single tenant from overwhelming it

// BAD: Low cardinality partition key
const orderByStatusKey = {
  PK: `STATUS#${status}`, // Only 5 possible values!
  SK: `ORDER#${orderId}`,
};
// "pending" partition becomes massive, others stay empty

// BAD: Time-only partition key
const eventByTimeKey = {
  PK: `HOUR#${hourTimestamp}`, // Current hour is always hot
  SK: `EVENT#${eventId}`,
};
// A single partition handles all current writes (hot partition)

// SOLUTION: Add randomness or tenant to time-based keys
const balancedEventKey = {
  PK: `HOUR#${hourTimestamp}#SHARD#${murmurhash(eventId) % 10}`, // hash as elsewhere on this page
  SK: `EVENT#${eventId}`,
};
// Spreads current-hour writes across 10 partitions
```

Before choosing a partition key, analyze your existing data: calculate the distribution of candidate keys and simulate query patterns against the proposed sharding. DynamoDB's NoSQL Workbench and similar tools help visualize partition heat maps.
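That analysis step can be as simple as counting rows per candidate key value and checking skew. A rough sketch (the skew metric and the sample data are illustrative assumptions, not from the original):

```typescript
// Measure how evenly a candidate partition key distributes rows.
// skew = largest partition size / ideal (perfectly even) partition size.
function partitionSkew(keys: string[]): number {
  const counts = new Map<string, number>();
  for (const key of keys) {
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const largest = Math.max(...counts.values());
  const ideal = keys.length / counts.size;
  return largest / ideal;
}

// status as a key: heavily skewed toward 'pending'
const byStatus = ['pending', 'pending', 'pending', 'shipped', 'delivered', 'pending'];
// userId as a key: roughly even
const byUser = ['u1', 'u2', 'u3', 'u4', 'u5', 'u6'];

console.log(partitionSkew(byStatus)); // 2 (4 of 6 rows in one partition; ideal is 2)
console.log(partitionSkew(byUser));   // 1 (perfectly even)
```

A skew near 1 means even distribution; anything much higher predicts a hot partition before you ship.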
Hot spots occur when data access is concentrated on a small subset of your data—and therefore a small subset of your shards or partitions. Even with good partition key choices, hot spots can emerge from skewed access patterns: viral content, celebrity accounts, flash sales, or a single disproportionately active tenant.
Hot Spot Mitigation Strategies:
```typescript
// WRITE SHARDING: Distribute a hot key across multiple partitions

const NUM_WRITE_SHARDS = 10;

// Write: Randomly distribute among shards
async function incrementViewCount(postId: string): Promise<void> {
  const shardIndex = Math.floor(Math.random() * NUM_WRITE_SHARDS);
  const shardKey = `post:${postId}:view_count:shard:${shardIndex}`;
  await redis.incr(shardKey);
}

// Read: Gather from all shards and sum
async function getViewCount(postId: string): Promise<number> {
  const shardKeys = Array.from({ length: NUM_WRITE_SHARDS }, (_, i) =>
    `post:${postId}:view_count:shard:${i}`
  );
  const counts = await redis.mget(shardKeys);
  return counts.reduce((sum, count) => sum + (parseInt(count ?? '0', 10) || 0), 0);
}

// ============================================

// BATCHING: Aggregate writes in memory, flush periodically

class CounterBatcher {
  private pending: Map<string, number> = new Map();
  private flushInterval: NodeJS.Timeout;

  constructor(flushIntervalMs: number = 1000) {
    this.flushInterval = setInterval(() => this.flush(), flushIntervalMs);
  }

  increment(key: string, amount: number = 1): void {
    const current = this.pending.get(key) || 0;
    this.pending.set(key, current + amount);
  }

  private async flush(): Promise<void> {
    if (this.pending.size === 0) return;

    const batch = this.pending;
    this.pending = new Map();

    // Single database round trip for all accumulated increments
    const pipeline = redis.pipeline();
    for (const [key, increment] of batch) {
      pipeline.incrby(key, increment);
    }
    await pipeline.exec();
  }
}

// Usage: 1000 views/sec becomes 1 write/sec with a single increment of 1000
const viewCounter = new CounterBatcher(1000);
viewCounter.increment('post:viral-post-id:views'); // Batched, not immediate

// ============================================

// CACHING: Never hit the database for hot read paths

async function getPostWithViews(postId: string): Promise<Post> {
  const cacheKey = `post:${postId}`;

  // Check cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Cache miss: fetch from DB
  const post = await db.query('SELECT * FROM posts WHERE id = $1', [postId]);

  // Cache with short TTL for hot content
  const ttl = isHotContent(postId) ? 60 : 3600; // 1 min for hot, 1 hour for normal
  await redis.setex(cacheKey, ttl, JSON.stringify(post));

  return post;
}
```

In any system with organic user behavior, some keys will be hotter than others. The goal isn't to eliminate hot spots but to detect them quickly and have mitigation playbooks ready. Monitor per-partition metrics and alert when any partition exceeds thresholds.
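Detection can be sketched as a per-partition request counter compared against the fleet average. This monitor is illustrative, not from the original; the 3x threshold is an assumption to tune against your own traffic:

```typescript
// Flag partitions whose request rate exceeds a multiple of the fleet average.
function findHotPartitions(
  requestsPerPartition: Map<string, number>,
  thresholdMultiplier: number = 3,
): string[] {
  const rates = [...requestsPerPartition.values()];
  const average = rates.reduce((a, b) => a + b, 0) / rates.length;

  return [...requestsPerPartition.entries()]
    .filter(([, rate]) => rate > average * thresholdMultiplier)
    .map(([partition]) => partition);
}

const metrics = new Map([
  ['shard-1', 100],
  ['shard-2', 120],
  ['shard-3', 2000], // viral content landed here
  ['shard-4', 90],
]);

console.log(findHotPartitions(metrics)); // ['shard-3']
```

Wire the output to an alert, and to the write-sharding or caching playbooks above.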
Time-series data—logs, metrics, events, IoT readings—has unique characteristics that require specialized modeling: it is append-heavy, rarely updated once written, queried almost exclusively by time range, and most valuable when recent.
Partitioning by Time:
```sql
-- PostgreSQL: Time-based partitioning for events

CREATE TABLE events (
    event_id UUID NOT NULL,
    event_time TIMESTAMPTZ NOT NULL,
    event_type VARCHAR(100) NOT NULL,
    user_id UUID,
    payload JSONB,
    PRIMARY KEY (event_id, event_time)
) PARTITION BY RANGE (event_time);

-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Auto-create future partitions with pg_partman or cron

-- Benefits:
-- 1. Queries on a date range only scan relevant partitions
-- 2. Dropping old data = DROP TABLE events_2023_01 (instant)
-- 3. Indexes are per-partition (smaller, faster)
-- 4. Maintenance (VACUUM, ANALYZE) can run per-partition

-- Query with partition pruning
EXPLAIN ANALYZE
SELECT * FROM events
WHERE event_time >= '2024-02-01' AND event_time < '2024-02-15';
-- Only scans the events_2024_02 partition

-- Retention: Drop partitions older than 90 days
DROP TABLE events_2023_10;
DROP TABLE events_2023_11;
-- Instant deletion, no row-by-row DELETE
```

Compound Partition Keys for Time-Series:
Pure time-based partitioning creates hot partitions (all writes to 'now'). Add a distribution dimension to spread writes.
```typescript
// DynamoDB: Compound partition key for time-series IoT data

// BAD: Time-only partition key
// All current readings go to the same partition → hot partition
const badKey = {
  PK: `DATE#2024-02-15`,
  SK: `TIME#14:30:00#DEVICE#sensor-123`,
};

// GOOD: Device + time compound key
// Each device's data is separate; no hot partition
const goodKey = {
  PK: `DEVICE#sensor-123#DATE#2024-02-15`,
  SK: `TIME#14:30:00.000`,
};

// Query: Get all readings for a device in a date range
// KeyCondition: PK = 'DEVICE#sensor-123#DATE#2024-02-15'
// Returns all readings for that device-day, sorted by time

// ============================================

// For analytics: Aggregate time-series per time bucket
interface TimeSeriesAggregation {
  PK: string; // METRIC#cpu_usage
  SK: string; // BUCKET#2024-02-15T14:00#HOST#server-1
  min: number;
  max: number;
  sum: number;
  count: number;
  average: number; // Computed: sum/count
}

// Store hourly aggregations instead of individual samples
// 1000 samples/second → 1 aggregate/hour = a 3,600,000x reduction

async function recordMetric(host: string, metric: string, value: number) {
  const bucket = getCurrentHourBucket(); // e.g., "2024-02-15T14:00"

  await dynamodb.update({
    TableName: 'Metrics',
    Key: {
      PK: `METRIC#${metric}`,
      SK: `BUCKET#${bucket}#HOST#${host}`,
    },
    // Note: UpdateExpression has no conditional min/max logic; maintaining
    // true min/max requires separate conditional updates (e.g., with
    // ConditionExpression '#min > :val'). Here we initialize min/max on
    // first write and atomically accumulate sum and count.
    UpdateExpression: `
      SET #min = if_not_exists(#min, :val),
          #max = if_not_exists(#max, :val)
      ADD #sum :val, #count :one
    `,
    ExpressionAttributeNames: { '#min': 'min', '#max': 'max', '#sum': 'sum', '#count': 'count' },
    ExpressionAttributeValues: { ':val': value, ':one': 1 },
  });
}
```

For true time-series workloads, specialized databases like InfluxDB, TimescaleDB, or ClickHouse offer better compression, query languages for time-based analytics, and automatic rollup/retention. General-purpose databases can handle time-series data, but with more manual optimization.
SaaS systems serve multiple tenants (customers) from shared infrastructure. The data model must provide isolation, scalability, and fairness across tenants with vastly different sizes.
Multi-Tenancy Strategies:
| Strategy | Isolation | Scalability | Cost Efficiency |
|---|---|---|---|
| Database per tenant | Highest (complete isolation) | Limited (connection overhead) | Low (resource overhead per tenant) |
| Schema per tenant | High (schema isolation) | Moderate | Moderate |
| Shared tables (tenant_id column) | Low (row-level only) | Highest | Highest |
| Hybrid (small: shared, large: dedicated) | Variable | High | High |
```sql
-- PATTERN 1: Shared tables with tenant_id
-- Most common for SaaS with many small-to-medium tenants

CREATE TABLE tenants (
    tenant_id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    tier VARCHAR(50) NOT NULL, -- 'free', 'pro', 'enterprise'
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE projects (
    project_id UUID PRIMARY KEY,
    tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
    name VARCHAR(255) NOT NULL
    -- ... other columns
);

-- CRITICAL: Index on tenant_id for every tenant-scoped table
CREATE INDEX idx_projects_tenant ON projects(tenant_id);

-- CRITICAL: Always filter by tenant_id in the application layer
-- Use Row-Level Security (RLS) to enforce at the database level

ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON projects
    FOR ALL
    USING (tenant_id = current_setting('app.current_tenant')::UUID);

-- Application sets tenant context on each request
SET app.current_tenant = 'tenant-uuid-here';
SELECT * FROM projects; -- Only sees current tenant's projects

-- ============================================

-- PATTERN 2: Schema per tenant
-- Better isolation, manageable for hundreds of tenants

CREATE SCHEMA tenant_acme;
CREATE SCHEMA tenant_globex;

CREATE TABLE tenant_acme.projects (
    project_id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL
    -- No tenant_id needed; the schema provides isolation
);

CREATE TABLE tenant_globex.projects (
    project_id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

-- Application switches schema per request
SET search_path TO tenant_acme, public;
SELECT * FROM projects; -- Queries tenant_acme.projects
```

Noisy Neighbor Prevention:
In shared-table multi-tenancy, one large tenant can impact others. Mitigation strategies include per-tenant rate limits and resource quotas, statement timeouts, workload prioritization by tier, and migrating the largest tenants to dedicated infrastructure (the hybrid model above).
The most dangerous bug in multi-tenant systems: queries that forget the tenant_id filter. This exposes one tenant's data to another. Use Row-Level Security, middleware that injects tenant filters, and integration tests that verify isolation. This is a security-critical concern.
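A lightweight complement to RLS is an application-layer guard that rejects tenant-scoped queries missing a tenant_id filter. A coarse sketch (the table list and string check are illustrative; this catches mistakes early, it does not replace RLS):

```typescript
// Guard: refuse to run a query against a tenant-scoped table unless
// the SQL visibly filters by tenant_id. A safety net for development
// and middleware, layered under Row-Level Security.
const TENANT_SCOPED_TABLES = ['projects', 'orders', 'documents'];

function assertTenantScoped(sql: string): void {
  const lowered = sql.toLowerCase();
  const touchesTenantTable = TENANT_SCOPED_TABLES.some((t) => lowered.includes(t));
  if (touchesTenantTable && !lowered.includes('tenant_id')) {
    throw new Error(`Query touches tenant-scoped table without tenant_id filter: ${sql}`);
  }
}

assertTenantScoped('SELECT * FROM projects WHERE tenant_id = $1'); // passes
// assertTenantScoped('SELECT * FROM projects');                   // throws
```

Pair it with integration tests that run every endpoint as two different tenants and assert neither sees the other's rows.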
At scale, the traditional normalization-denormalization trade-off shifts heavily toward denormalization. JOINs across shards are often impossible; cross-table queries become dramatically more expensive.
Scale-Driven Denormalization Patterns:
```typescript
// PATTERN: Materialized aggregates for a social platform

interface UserProfile {
  userId: string;
  name: string;
  bio: string;

  // Materialized aggregates (updated on every follow/unfollow)
  followerCount: number;
  followingCount: number;
  postCount: number;

  // Denormalized for display
  recentPosts: Array<{
    postId: string;
    preview: string;
    postedAt: Date;
  }>;
}

// On new follow
async function handleFollow(followerId: string, followeeId: string) {
  await db.transaction(async (tx) => {
    // Insert follow relationship
    await tx.insert('follows', { followerId, followeeId, createdAt: new Date() });

    // Update materialized counts
    await tx.increment('users', followerId, 'followingCount', 1);
    await tx.increment('users', followeeId, 'followerCount', 1);
  });
}

// ============================================

// PATTERN: Dual write for a bidirectional relationship

interface FollowsData {
  // Forward index: Who does the user follow?
  following: Map<string, string[]>; // userId → [followeeIds]

  // Reverse index: Who follows the user?
  followers: Map<string, string[]>; // userId → [followerIds]
}

async function addFollow(followerId: string, followeeId: string) {
  // Write to both indexes (in a transaction if possible)
  await Promise.all([
    redis.sadd(`user:${followerId}:following`, followeeId),
    redis.sadd(`user:${followeeId}:followers`, followerId),
  ]);
}

// Query: "Who do I follow?"  → user:me:following (O(1) lookup, O(n) scan)
// Query: "Who follows me?"   → user:me:followers (O(1) lookup, O(n) scan)

// ============================================

// PATTERN: Event-driven denormalization with CDC

interface OrderEvent {
  type: 'ORDER_CREATED' | 'ORDER_UPDATED';
  order: Order;
  timestamp: Date;
}

// Listen to order events, update denormalized views
async function handleOrderEvent(event: OrderEvent) {
  switch (event.type) {
    case 'ORDER_CREATED':
      // Update customer's order summary
      await updateCustomerOrderSummary(event.order.customerId, event.order);

      // Update product sales counts
      for (const item of event.order.items) {
        await incrementProductSalesCount(item.productId, item.quantity);
      }

      // Update daily revenue aggregate
      await updateDailyRevenue(event.order.createdAt, event.order.total);
      break;
  }
}
```

At scale, synchronous denormalization updates become bottlenecks. Event-driven approaches (CDC, message queues) allow asynchronous updates with eventual consistency. This trades immediate consistency for throughput and resilience—an acceptable trade-off for most read-heavy workloads.
Data that grows forever eventually becomes unmanageable. Data lifecycle management is the practice of handling data differently based on age and access patterns.
The Hot/Warm/Cold Model:
| Tier | Access Pattern | Storage | Query Speed | Cost |
|---|---|---|---|---|
| Hot | Constant (100s/sec) | SSD, in-memory | < 50ms | $$$ |
| Warm | Regular (10s/min) | SSD, standard disk | < 500ms | $$ |
| Cold | Rare (per day/week) | S3/GCS, compressed | < 30s | $ |
| Frozen | Almost never (compliance) | Glacier, Archive | Hours | ¢ |
```sql
-- PATTERN: Time-based partitioning with automated archival

-- Main table for hot data (current month)
CREATE TABLE orders (
    order_id UUID NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    -- ... other columns
    PRIMARY KEY (order_id, created_at)
) PARTITION BY RANGE (created_at);

-- Current month partition (hot)
CREATE TABLE orders_2024_02 PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Archive table for cold data (different storage or remote)
CREATE TABLE orders_archive (
    order_id UUID NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    archived_at TIMESTAMPTZ DEFAULT NOW(),
    -- ... same columns, optionally compressed
    PRIMARY KEY (order_id)
) TABLESPACE archive_storage; -- Slower, cheaper storage

-- Archival job: Move old partitions to archive
-- Run monthly via pg_cron or an external scheduler
DO $$
DECLARE
    partition_name TEXT := 'orders_2023_11'; -- 3 months ago
    moved_rows BIGINT;
BEGIN
    -- Move data to archive
    EXECUTE format(
        'INSERT INTO orders_archive SELECT *, NOW() AS archived_at FROM %I',
        partition_name
    );
    GET DIAGNOSTICS moved_rows = ROW_COUNT;

    -- Drop the partition
    EXECUTE format('DROP TABLE %I', partition_name);

    -- Log the archival
    INSERT INTO archive_log (table_name, archived_at, row_count)
    VALUES (partition_name, NOW(), moved_rows);
END $$;

-- Federated query: Search across hot and cold with UNION
CREATE VIEW orders_all AS
    SELECT order_id, created_at, /* columns */ 'hot' AS tier FROM orders
    UNION ALL
    SELECT order_id, created_at, /* columns */ 'archive' AS tier FROM orders_archive;
```

Automated Lifecycle Policies:
Cloud databases and object stores often support declarative lifecycle policies: S3 lifecycle rules transition objects between storage classes and expire them, DynamoDB TTL deletes expired items automatically, and BigQuery can expire old table partitions.
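As one concrete shape, an S3 lifecycle rule mirroring the hot/warm/cold/frozen tiers above might look like this. The prefix and day counts are illustrative; the rule structure follows the input to the AWS SDK v3 `PutBucketLifecycleConfigurationCommand`:

```typescript
// S3 lifecycle rules implementing hot → warm → cold → frozen tiering.
const lifecycleRules = [
  {
    ID: 'orders-archive-tiering',
    Status: 'Enabled',
    Filter: { Prefix: 'orders/' },
    Transitions: [
      { Days: 30, StorageClass: 'STANDARD_IA' },    // warm
      { Days: 365, StorageClass: 'GLACIER' },       // cold
      { Days: 1095, StorageClass: 'DEEP_ARCHIVE' }, // frozen
    ],
    Expiration: { Days: 2555 }, // 7-year retention, then delete
  },
];

// Applied via (sketch, requires @aws-sdk/client-s3 and credentials):
// await s3.send(new PutBucketLifecycleConfigurationCommand({
//   Bucket: 'my-archive-bucket',
//   LifecycleConfiguration: { Rules: lifecycleRules },
// }));
```

The policy runs entirely inside S3: no archival jobs to schedule, monitor, or retry.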
Data retention policies are often driven by legal requirements (GDPR: right to deletion, SOX: 7-year financial records, HIPAA: 6-year medical records). Work with legal/compliance teams to define lifecycle rules that satisfy both operational and regulatory needs.
Modeling for scale is the culmination of all data modeling principles, applied with awareness of growth trajectories and system limits.
Module Complete:
You've now mastered the fundamentals of data modeling for system design. From entity-relationship basics through normalization trade-offs, access pattern optimization, safe schema evolution, and scaling techniques—you have the toolkit to design data structures for systems serving millions of users.
The next chapter continues the database journey with Database Replication & Partitioning, where we'll explore how to actually implement these scaling strategies at the infrastructure level.
Congratulations! You've completed the Data Modeling Fundamentals module. You now understand how to design data models that are not just logically correct, but operationally excellent—ready for the real-world demands of production systems at scale.