Document databases are often described as "schemaless"—but this is a dangerous oversimplification. MongoDB doesn't require you to declare a schema upfront, but your data still has a schema. It's just implicit in your application code rather than explicit in the database.
This flexibility is simultaneously the greatest strength of document databases and the most common source of pain in their deployments.
The difference between teams that thrive with document databases and those that struggle isn't the technology—it's discipline. This page teaches you to harness schema flexibility as a tool rather than suffer it as chaos.
By the end of this page, you will understand the schema-on-read paradigm and its implications, master schema evolution patterns that avoid data corruption, implement validation strategies that balance flexibility with integrity, and design polymorphic data models that handle real-world complexity.
Traditional relational databases enforce schema-on-write: you declare the structure upfront, and the database rejects any data that doesn't conform. Document databases traditionally use schema-on-read: data is written as-is, and structure is interpreted when reading.
Understanding these paradigms is crucial for making informed database choices and avoiding common pitfalls.
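To make the contrast concrete, here is a minimal sketch using the Node.js driver (the connection string, database, and field names are illustrative): two inconsistently shaped writes that a relational table would reject at insert time are both accepted, and the mismatch only surfaces when the data is read.

```javascript
import { MongoClient } from 'mongodb';

// A relational column declared as (age INT) would reject the second insert
// at write time. A schemaless collection accepts both shapes silently.
async function demo() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const users = client.db('demo').collection('users');

  await users.insertOne({ email: 'alice@example.com', age: 34 });
  await users.insertOne({ email: 'bob@example.com', age: 'thirty-four' }); // no error

  // The mismatch surfaces only at read time: MongoDB comparisons are
  // type-bracketed, so a numeric range query skips the string value
  const adults = await users.find({ age: { $gte: 18 } }).toArray();
  console.log(adults.length); // 1 -- Bob's document is invisible to this query

  await client.close();
}
```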
The Hidden Cost of Schema-on-Read
Schema-on-read shifts complexity from the database to the application. Every query, every data access must handle potential schema variations:
```javascript
// Legacy documents might have different structures
// The application must handle ALL variations

async function getUserDisplayName(userId) {
  const user = await users.findOne({ _id: userId });

  // Version 4: displayName was added as an explicit override,
  // so it must be checked before the older name formats
  if (user.displayName) {
    return user.displayName;
  }

  // Version 1: Had a single "name" field
  if (typeof user.name === 'string') {
    return user.name;
  }

  // Version 2: Split into firstName + lastName
  if (user.firstName && user.lastName) {
    return `${user.firstName} ${user.lastName}`;
  }

  // Version 3: Nested name object
  if (user.name && typeof user.name === 'object') {
    return `${user.name.first} ${user.name.last}`;
  }

  // Fallback for corrupted/unknown data
  return user.email || 'Unknown User';
}

// This function handles 5+ schema variations
// Every new version adds complexity
// Bugs hide in edge cases between versions
// Testing is exponentially harder
```

Without discipline, document collections accumulate "schema entropy": documents created at different times, by different code versions, with varying structures. Over time, querying becomes a minefield. The solution isn't to abandon flexibility, but to manage it deliberately.
Modern Approach: Schema Flexibility with Validation
The best practice isn't pure schema-on-write or pure schema-on-read—it's flexible schema with validation. MongoDB supports JSON Schema validation that can be as strict or permissive as your domain requires:
```javascript
// Create collection with JSON Schema validation
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "createdAt", "status"],
      properties: {
        email: {
          bsonType: "string",
          pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
          description: "Valid email required"
        },
        name: {
          bsonType: "object",
          required: ["first", "last"],
          properties: {
            first: { bsonType: "string", minLength: 1 },
            last: { bsonType: "string", minLength: 1 },
            middle: { bsonType: "string" } // Optional
          }
        },
        status: {
          enum: ["active", "inactive", "suspended", "deleted"],
          description: "Must be a valid status"
        },
        createdAt: { bsonType: "date" },
        metadata: {
          bsonType: "object",
          // Additional properties allowed - flexible section
          additionalProperties: true
        }
      },
      // Allow fields not specified above (partial flexibility)
      additionalProperties: true
    }
  },
  validationLevel: "moderate", // Don't block updates to pre-existing invalid documents
  validationAction: "error"    // Reject invalid documents
});

// Validation levels:
// "strict"   - Validate all inserts and updates
// "moderate" - Validate inserts, but only validate updates to documents
//              that already satisfy the schema

// Validation actions:
// "error" - Reject documents that fail validation
// "warn"  - Allow but log a warning
```

Unlike relational databases, where schema changes require DDL migrations and often downtime, document databases allow schema evolution without stopping the system. However, this freedom requires careful patterns to avoid data inconsistency.
Pattern 1: Additive Changes (Safe by Default)
Adding new fields is the safest schema change. Existing documents don't have the field; new documents do:
```javascript
// Before: Simple user document
const oldUser = {
  _id: ObjectId("..."),
  email: "alice@example.com",
  name: "Alice"
};

// After: Add new optional field (phoneNumber)
const newUser = {
  _id: ObjectId("..."),
  email: "bob@example.com",
  name: "Bob",
  phoneNumber: "+1-555-0100" // New field
};

// Application code handles the missing field gracefully
function getUserPhone(user) {
  return user.phoneNumber || null; // Simple null check
}

// Optionally backfill existing documents
await users.updateMany(
  { phoneNumber: { $exists: false } },
  { $set: { phoneNumber: null } } // Explicit null for clarity
);
```

Pattern 2: Field Restructuring with Version Markers
When changing field structure (not just adding), use a schema version field to track which format each document uses:
```javascript
// Version 1: Original structure
const v1User = {
  _id: ObjectId("..."),
  schemaVersion: 1,
  name: "Alice Johnson", // Single string
  email: "alice@example.com"
};

// Version 2: Split name into object
const v2User = {
  _id: ObjectId("..."),
  schemaVersion: 2,
  name: {
    first: "Bob",
    last: "Smith",
    middle: "William" // New capability
  },
  email: "bob@example.com"
};

// Universal accessor function
function getFullName(user) {
  switch (user.schemaVersion) {
    case 1:
      return user.name;
    case 2: {
      const n = user.name;
      return [n.first, n.middle, n.last].filter(Boolean).join(' ');
    }
    default:
      // Handle unknown/missing version
      throw new Error(`Unknown schema version: ${user.schemaVersion}`);
  }
}

// Migration strategy: Lazy upgrade on read
async function getUserWithMigration(userId) {
  const user = await users.findOne({ _id: userId });

  if (user.schemaVersion === 1) {
    // Migrate on read
    const migrated = migrateV1ToV2(user);
    await users.replaceOne({ _id: userId }, migrated);
    return migrated;
  }

  return user;
}

function migrateV1ToV2(v1User) {
  const nameParts = v1User.name.split(' ');
  const middle = nameParts.length > 2 ? nameParts.slice(1, -1).join(' ') : null;
  return {
    ...v1User,
    schemaVersion: 2,
    name: {
      first: nameParts[0] || '',
      // A single-word name has no separate last name
      last: nameParts.length > 1 ? nameParts[nameParts.length - 1] : '',
      // Omit middle entirely rather than storing a null/undefined field
      ...(middle && { middle })
    }
  };
}
```

Pattern 3: Expansion/Contraction (Safe Renames)
Renaming fields without a flag day requires a multi-phase approach:
```javascript
// Goal: Rename "userName" to "username" (case change)

// Phase 1: EXPAND - Write to both fields
// Application writes to both old and new name
async function updateUser(userId, newUsername) {
  await users.updateOne(
    { _id: userId },
    {
      $set: {
        userName: newUsername, // Old field (for old code)
        username: newUsername  // New field (for new code)
      }
    }
  );
}

// Phase 2: MIGRATE - Backfill old documents
await users.updateMany(
  { username: { $exists: false } },      // Docs without new field
  [{ $set: { username: "$userName" } }]  // Copy from old field (pipeline update)
);

// Phase 3: VERIFY - Check all documents have the new field
const missing = await users.countDocuments({ username: { $exists: false } });
console.assert(missing === 0, "Migration incomplete!");

// Phase 4: CONTRACT - Stop writing to old field
async function updateUserV2(userId, newUsername) {
  await users.updateOne(
    { _id: userId },
    { $set: { username: newUsername } } // Only new field
  );
}

// Phase 5: CLEANUP - Remove old field (optional, can defer)
await users.updateMany(
  { userName: { $exists: true } },
  { $unset: { userName: "" } }
);
```

The expansion/contraction pattern ensures no code path breaks during migration. Old code reads the old field; new code reads the new field. Only after ALL code is updated and ALL documents are migrated should you remove the old field. This typically requires waiting for a full release cycle.
Modern MongoDB deployments should use layered validation: database-level validation for critical constraints, and application-level validation for complex business rules.
Database-Level Validation (JSON Schema)
Use MongoDB's JSON Schema validator for structural integrity:
```javascript
// Comprehensive JSON Schema validator
db.runCommand({
  collMod: "orders",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["customerId", "items", "status", "createdAt"],
      properties: {
        customerId: {
          bsonType: "objectId",
          description: "Customer reference is required"
        },
        items: {
          bsonType: "array",
          minItems: 1,
          description: "At least one item required",
          items: {
            bsonType: "object",
            required: ["productId", "quantity", "priceAtOrder"],
            properties: {
              productId: { bsonType: "objectId" },
              quantity: { bsonType: "int", minimum: 1, maximum: 999 },
              priceAtOrder: {
                bsonType: "decimal",
                description: "Use Decimal128 for currency"
              },
              discount: {
                bsonType: "object",
                properties: {
                  type: { enum: ["percentage", "fixed"] },
                  value: { bsonType: "decimal" }
                }
              }
            }
          }
        },
        status: {
          enum: ["pending", "confirmed", "shipped", "delivered", "cancelled"],
          description: "Valid order status"
        },
        shipping: {
          bsonType: "object",
          properties: {
            method: { enum: ["standard", "express", "overnight"] },
            trackingNumber: { bsonType: "string" },
            address: {
              bsonType: "object",
              required: ["street", "city", "country"],
              properties: {
                street: { bsonType: "string" },
                city: { bsonType: "string" },
                state: { bsonType: "string" },
                postalCode: { bsonType: "string" },
                country: { bsonType: "string" }
              }
            }
          }
        },
        totals: {
          bsonType: "object",
          required: ["subtotal", "tax", "total"],
          properties: {
            subtotal: { bsonType: "decimal" },
            tax: { bsonType: "decimal" },
            shipping: { bsonType: "decimal" },
            discount: { bsonType: "decimal" },
            total: { bsonType: "decimal" }
          }
        },
        createdAt: { bsonType: "date" },
        updatedAt: { bsonType: "date" }
      }
    }
  },
  validationLevel: "strict",
  validationAction: "error"
});
```

Application-Level Validation
Complex business rules that can't be expressed in JSON Schema should be validated in the application:
```typescript
import { ObjectId } from 'mongodb';
import { z } from 'zod';

// Helper assumed by the schemas below
const isValidObjectId = (id: string) => ObjectId.isValid(id);

// Zod schema for application-level validation
const OrderItemSchema = z.object({
  productId: z.string().refine(isValidObjectId),
  quantity: z.number().int().min(1).max(999),
  priceAtOrder: z.number().positive(),
  discount: z.object({
    type: z.enum(['percentage', 'fixed']),
    value: z.number().nonnegative()
  }).optional()
});

const OrderSchema = z.object({
  customerId: z.string().refine(isValidObjectId),
  items: z.array(OrderItemSchema).min(1),
  status: z.enum(['pending', 'confirmed', 'shipped', 'delivered', 'cancelled']),
  shipping: z.object({
    method: z.enum(['standard', 'express', 'overnight']),
    trackingNumber: z.string().optional(),
    address: z.object({
      street: z.string().min(1),
      city: z.string().min(1),
      state: z.string().optional(),
      postalCode: z.string(),
      country: z.string().length(2) // ISO country code
    })
  }),
  createdAt: z.date(),
  updatedAt: z.date().optional()
}).refine(
  // Business rule: overnight shipping only for domestic orders
  (order) => {
    if (order.shipping.method === 'overnight') {
      return order.shipping.address.country === 'US';
    }
    return true;
  },
  { message: "Overnight shipping only available for US addresses" }
).refine(
  // Business rule: total discount can't exceed 50%
  (order) => {
    const totalDiscount = order.items.reduce((sum, item) => {
      if (item.discount?.type === 'percentage') {
        return sum + (item.priceAtOrder * item.quantity * item.discount.value / 100);
      }
      return sum + (item.discount?.value || 0);
    }, 0);
    const subtotal = order.items.reduce(
      (sum, item) => sum + item.priceAtOrder * item.quantity,
      0
    );
    return totalDiscount <= subtotal * 0.5;
  },
  { message: "Total discount cannot exceed 50%" }
);

// Usage in service layer
async function createOrder(orderData: unknown) {
  // Application validation (business rules)
  const validatedOrder = OrderSchema.parse(orderData);

  // Insert (database validation is the backup)
  const result = await orders.insertOne({
    ...validatedOrder,
    createdAt: new Date()
  });

  return result;
}
```

Use both layers: application validation catches issues early with better error messages, while database validation is the final safety net that prevents corruption if application validation is bypassed (direct DB access, bugs, or race conditions).
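As a minimal sketch of how the two layers interact at runtime (the wrapper shape and return values are illustrative, reusing `createOrder` from above): Zod failures are the expected path and can become helpful client errors, while a database validation failure means data slipped past the application layer and deserves an alert.

```typescript
import { MongoServerError } from 'mongodb';
import { ZodError } from 'zod';

// Illustrative wrapper: each validation layer produces a distinct failure mode
async function createOrderSafely(orderData: unknown) {
  try {
    return { ok: true, result: await createOrder(orderData) };
  } catch (err) {
    if (err instanceof ZodError) {
      // Application layer caught it: expected path, return a descriptive 400
      return { ok: false, status: 400, issues: err.issues };
    }
    if (err instanceof MongoServerError && err.code === 121) {
      // 121 = DocumentValidationFailure: the database safety net fired,
      // meaning data bypassed application validation. Treat as a bug.
      console.error('Schema validator rejected a write', err.errInfo);
      return { ok: false, status: 500 };
    }
    throw err; // Network errors etc. propagate unchanged
  }
}
```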
One of document databases' genuine strengths is modeling polymorphic data—entities of the same type that have different structures. This is natural in document databases but awkward in relational schemas.
Example: Product Catalog with Varying Attributes
An e-commerce platform sells electronics, clothing, and food—each with completely different attributes:
```javascript
// All products in the same collection with different structures

// Electronics product
const laptop = {
  _id: ObjectId("..."),
  type: "electronics",
  category: "computers",
  name: "Pro Laptop 15",
  brand: "TechCorp",
  price: 1299.00,

  // Electronics-specific attributes
  specifications: {
    processor: "Intel Core i7-12700H",
    ram: "16GB DDR5",
    storage: "512GB NVMe SSD",
    display: {
      size: "15.6 inches",
      resolution: "2560x1440",
      refreshRate: 165
    },
    battery: "72Wh",
    weight: "1.8kg"
  },
  warranty: {
    duration: 24,
    type: "manufacturer"
  }
};

// Clothing product
const shirt = {
  _id: ObjectId("..."),
  type: "clothing",
  category: "tops",
  name: "Classic Oxford Shirt",
  brand: "StyleCo",
  price: 59.99,

  // Clothing-specific attributes
  material: "100% Cotton",
  care: ["Machine wash cold", "Tumble dry low"],

  // Size/color variants
  variants: [
    { size: "S", color: "White", sku: "OXF-WH-S", stock: 23 },
    { size: "M", color: "White", sku: "OXF-WH-M", stock: 45 },
    { size: "L", color: "White", sku: "OXF-WH-L", stock: 12 },
    { size: "S", color: "Blue", sku: "OXF-BL-S", stock: 18 }
    // ... more variants
  ],
  fit: "Regular",
  measurements: {
    chest: { S: "36-38", M: "39-41", L: "42-44" }
  }
};

// Food product
const cereal = {
  _id: ObjectId("..."),
  type: "grocery",
  category: "breakfast",
  name: "Organic Oat Clusters",
  brand: "NatureFoods",
  price: 6.99,

  // Food-specific attributes
  nutrition: {
    servingSize: "55g",
    servingsPerContainer: 10,
    calories: 210,
    fatGrams: 4,
    carbsGrams: 42,
    proteinGrams: 6,
    fiberGrams: 5,
    sugarGrams: 9
  },
  ingredients: [
    "Whole Grain Oats",
    "Cane Sugar",
    "Sunflower Oil",
    "Honey"
  ],
  allergens: ["Contains: Wheat"],
  certifications: ["USDA Organic", "Non-GMO"],
  expiration: {
    shelfLife: 365,
    requiresRefrigeration: false
  }
};
```

Querying Polymorphic Collections
Queries can target common fields or type-specific fields:
```javascript
// Query common fields (works on all document shapes)
const expensiveProducts = await products.find({
  price: { $gte: 100 }
}).toArray();

// Type-specific queries
const highResLaptops = await products.find({
  type: "electronics",
  category: "computers",
  "specifications.display.resolution": "2560x1440"
}).toArray();

const mediumBlueShirts = await products.find({
  type: "clothing",
  variants: {
    $elemMatch: { size: "M", color: "Blue", stock: { $gt: 0 } }
  }
}).toArray();

const organicCereals = await products.find({
  type: "grocery",
  certifications: "USDA Organic"
}).toArray();

// Aggregation across types
const brandSales = await products.aggregate([
  { $group: { _id: "$brand", totalProducts: { $sum: 1 } } },
  { $sort: { totalProducts: -1 } }
]).toArray();
```

Create partial indexes for type-specific queries: `createIndex({ 'specifications.processor': 1 }, { partialFilterExpression: { type: 'electronics' } })`. This keeps indexes small and focused. Always include the type field in frequently used compound indexes.
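Expanding that tip into a sketch (in the mongosh shell; the fields indexed here follow the catalog example, and your own choices should follow your query patterns):

```javascript
// Partial index: only electronics documents get index entries, so clothing
// and grocery documents add no index overhead
db.products.createIndex(
  { "specifications.processor": 1 },
  { partialFilterExpression: { type: "electronics" } }
);

// Compound index with the type discriminator first, supporting the common
// pattern of filtering by type plus other fields
db.products.createIndex({ type: 1, category: 1, price: -1 });

// A query must include the partial filter's condition to be eligible
// to use the partial index
db.products.find({
  type: "electronics",
  "specifications.processor": /i7/
});
```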
Pattern: Single Collection vs Multiple Collections
The polymorphic approach (a single collection) tends to work well when queries frequently span types and the type-specific portions of each document are modest. Separate collections per type tend to win when each type is mostly queried in isolation and validation rules diverge sharply. The trade-offs:
| Approach | Advantages | Disadvantages |
|---|---|---|
| Single Polymorphic Collection | Unified search, simpler aggregations, fewer joins | Complex validation, larger indexes, harder type-safety |
| Multiple Collections per Type | Clean separation, simple validation, focused indexes | Cross-type queries require $unionWith, more collections to manage |
| Hybrid: Base + Type Collections | Common data centralized, type-specific data separate | Requires joins ($lookup), complexity in keeping in sync |
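If you do split types into separate collections, cross-type queries remain possible with `$unionWith` (MongoDB 4.4+). A sketch, assuming hypothetical `electronics` and `clothing` collections that share `name`, `brand`, and `price` fields:

```javascript
// Search across separate per-type collections with $unionWith
const results = await db.collection('electronics').aggregate([
  { $match: { price: { $gte: 100 } } },
  { $project: { name: 1, brand: 1, price: 1, type: { $literal: 'electronics' } } },
  {
    $unionWith: {
      coll: 'clothing',
      pipeline: [
        { $match: { price: { $gte: 100 } } },
        { $project: { name: 1, brand: 1, price: 1, type: { $literal: 'clothing' } } }
      ]
    }
  },
  { $sort: { price: -1 } }
]).toArray();
```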
Production schema changes in document databases require as much care as relational migrations—just with different techniques. Here's a systematic approach:
Step 1: Impact Analysis
Before any schema change, understand the blast radius:
```javascript
// Analyze current field usage before changes
async function analyzeFieldUsage(collectionName, fieldPath) {
  const collection = db.collection(collectionName);
  const totalDocs = await collection.countDocuments();

  // Documents with field present
  const hasField = await collection.countDocuments({
    [fieldPath]: { $exists: true }
  });

  // Documents with field null/undefined
  const hasNull = await collection.countDocuments({
    [fieldPath]: null
  });

  // Sample distinct values
  const distinctValues = await collection.distinct(fieldPath);

  // Type distribution
  const typeDistribution = await collection.aggregate([
    { $group: {
      _id: { $type: `$${fieldPath}` },
      count: { $sum: 1 }
    }}
  ]).toArray();

  return {
    totalDocuments: totalDocs,
    documentsWithField: hasField,
    documentsWithNull: hasNull,
    fieldCoverage: (hasField / totalDocs * 100).toFixed(2) + '%',
    distinctValueCount: distinctValues.length,
    sampleValues: distinctValues.slice(0, 10),
    typeDistribution
  };
}

// Example usage
const analysis = await analyzeFieldUsage('users', 'preferences.theme');
// {
//   totalDocuments: 1000000,
//   documentsWithField: 450000,
//   documentsWithNull: 25000,
//   fieldCoverage: "45.00%",
//   distinctValueCount: 3,
//   sampleValues: ["dark", "light", "system"],
//   typeDistribution: [{ _id: "string", count: 450000 }]
// }
```

Step 2: Rolling Migration Strategy
For large collections, batch migrations prevent performance impact:
```javascript
async function rollingMigration(options) {
  const {
    collection,
    filter,
    update,
    batchSize = 1000,
    delayMs = 100,
    dryRun = true
  } = options;

  let totalProcessed = 0;
  let lastId = null;

  while (true) {
    // Build query for next batch
    const query = lastId
      ? { ...filter, _id: { $gt: lastId } }
      : filter;

    // Fetch batch of documents
    const batch = await collection
      .find(query)
      .sort({ _id: 1 })
      .limit(batchSize)
      .toArray();

    if (batch.length === 0) break;

    // Process batch
    if (dryRun) {
      console.log(`[DRY RUN] Would update ${batch.length} documents`);
    } else {
      const ids = batch.map(doc => doc._id);
      await collection.updateMany(
        { _id: { $in: ids } },
        update
      );
    }

    totalProcessed += batch.length;
    lastId = batch[batch.length - 1]._id;

    // Progress logging
    if (totalProcessed % 10000 === 0) {
      console.log(`Processed ${totalProcessed} documents...`);
    }

    // Throttle to reduce load
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }

  return { totalProcessed };
}

// Example: Add schemaVersion to all documents
await rollingMigration({
  collection: db.users,
  filter: { schemaVersion: { $exists: false } },
  update: { $set: { schemaVersion: 2 } },
  batchSize: 500,
  delayMs: 50,
  dryRun: false
});
```

Step 3: Validation Rule Updates
Validation rule updates must happen in a specific order:
If you add a required field to validation before migrating existing documents, all updates to those documents will fail validation. This can break your application in production. Always migrate first, then tighten validation.
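A sketch of the safe ordering, reusing the `collMod` command and `rollingMigration` helper from earlier (the `status` field and backfill value are illustrative):

```javascript
// 1. Migrate data FIRST, so every document satisfies the upcoming rule
await rollingMigration({
  collection: db.users,
  filter: { status: { $exists: false } },
  update: { $set: { status: "active" } },
  dryRun: false
});

// 2. Optionally deploy the tightened rule as a warning to observe impact
await db.command({
  collMod: "users",
  validator: {
    $jsonSchema: { bsonType: "object", required: ["email", "createdAt", "status"] }
  },
  validationLevel: "strict",
  validationAction: "warn" // log violations without rejecting writes
});

// 3. Once the logs are clean, enforce the rule
await db.command({
  collMod: "users",
  validator: {
    $jsonSchema: { bsonType: "object", required: ["email", "createdAt", "status"] }
  },
  validationLevel: "strict",
  validationAction: "error"
});
```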
Schema flexibility doesn't mean abandoning type safety. TypeScript and similar tools make your schema explicit in code, catching errors at compile time:
```typescript
import { ObjectId, Collection, Db } from 'mongodb';

// Define document types
interface BaseDocument {
  _id: ObjectId;
  createdAt: Date;
  updatedAt?: Date;
}

interface UserDocument extends BaseDocument {
  email: string;
  name: {
    first: string;
    last: string;
    middle?: string;
  };
  status: 'active' | 'inactive' | 'suspended';
  roles: string[];
  preferences?: {
    theme: 'light' | 'dark' | 'system';
    notifications: boolean;
  };
}

// Discriminated union for polymorphic products
interface BaseProduct extends BaseDocument {
  name: string;
  price: number;
  brand: string;
}

interface ElectronicsProduct extends BaseProduct {
  type: 'electronics';
  specifications: {
    processor?: string;
    ram?: string;
    storage?: string;
  };
  warranty: {
    duration: number;
    type: string;
  };
}

interface ClothingProduct extends BaseProduct {
  type: 'clothing';
  material: string;
  sizes: ('XS' | 'S' | 'M' | 'L' | 'XL')[];
  variants: Array<{
    size: string;
    color: string;
    sku: string;
    stock: number;
  }>;
}

type Product = ElectronicsProduct | ClothingProduct;

// Typed collection access
class DatabaseService {
  constructor(private db: Db) {}

  get users(): Collection<UserDocument> {
    return this.db.collection<UserDocument>('users');
  }

  get products(): Collection<Product> {
    return this.db.collection<Product>('products');
  }

  // Type-safe query methods
  async findUserByEmail(email: string): Promise<UserDocument | null> {
    return this.users.findOne({ email });
  }

  async findElectronics(): Promise<ElectronicsProduct[]> {
    // The type discriminator narrows the union; the cast tells TypeScript so
    return this.products.find({ type: 'electronics' }).toArray() as Promise<ElectronicsProduct[]>;
  }

  async updateUserPreferences(
    userId: ObjectId,
    preferences: UserDocument['preferences']
  ): Promise<void> {
    await this.users.updateOne(
      { _id: userId },
      { $set: { preferences, updatedAt: new Date() } }
    );
  }
}
```

TypeScript interfaces serve as living documentation of your schema. When the schema evolves, update the types first; TypeScript will then show you every code location that needs updating. This dramatically reduces bugs from schema changes.
Schema flexibility is a powerful tool when used with discipline. The key is treating it as intentional design rather than letting chaos accumulate.
What's Next:
With schema design patterns understood, we'll explore querying documents—MongoDB's powerful query language, aggregation pipelines, and how to design indexes that make queries fast at scale.
You now understand how to harness schema flexibility without succumbing to schema chaos. You can evolve schemas safely, implement layered validation, model polymorphic data, and maintain type safety in your application code. Next, we'll dive into MongoDB's query capabilities.