Dbms Applications - Learning Module


E-commerce Systems: Scaling to Billions of Transactions

The Scale of Modern E-commerce

On Singles' Day (November 11) 2020, Alibaba reported a peak of 583,000 orders per second. Amazon has been reported to ship more than 1.6 million packages per day in regular operation, with far higher volumes during peak events such as Prime Day. These numbers represent more than logistics achievements; they are feats of database engineering.

E-commerce presents a unique set of challenges for Database Management Systems. Unlike banking, where consistency is absolute, e-commerce must balance consistency with availability across globally distributed infrastructure. The customer in Tokyo browsing products at 2 AM must have the same seamless experience as the customer in New York at noon—despite these requests hitting entirely different data centers.

Learning Objectives

By the end of this page, you will understand: (1) How e-commerce databases differ from traditional transactional systems, (2) The architecture of product catalogs serving millions of SKUs, (3) Inventory management challenges and solutions, (4) How personalization engines leverage database technology, and (5) The CAP theorem tradeoffs e-commerce platforms make.

The E-commerce Database Landscape

E-commerce applications are inherently polyglot—they use multiple database types, each optimized for specific workload characteristics. A single product page view might query five or more different database systems:

Product information from a document database, pricing from a relational database, reviews from a separate review service, recommendations from a machine learning feature store, and inventory availability from a near-real-time cache.

This complexity isn't over-engineering; it's a necessary response to the diverse data access patterns e-commerce demands.
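To make the fan-out concrete, here is a minimal Python asyncio sketch of one product page request. The five fetch_* coroutines are hypothetical stand-ins that only sleep; in a real system each would call a different store. The design shown is an assumption, not a prescribed architecture. The calls run concurrently, and optional components (reviews, recommendations) get a timeout and a fallback, so one slow system cannot take down the whole page.

```python
import asyncio


# Hypothetical stand-ins: each sleep simulates a network call to a different store.
async def fetch_product(pid):          # document DB
    await asyncio.sleep(0.01)
    return {"id": pid, "name": "ThinkPad X1 Carbon"}


async def fetch_price(pid):            # relational DB
    await asyncio.sleep(0.01)
    return 1299.00


async def fetch_reviews(pid):          # review service
    await asyncio.sleep(0.01)
    return {"average": 4.5, "count": 120}


async def fetch_recommendations(pid):  # ML feature store, deliberately slow here
    await asyncio.sleep(1.0)
    return ["dock-002", "mouse-003"]


async def fetch_inventory(pid):        # near-real-time cache
    await asyncio.sleep(0.01)
    return {"in_stock": True}


async def optional(coro, timeout, fallback):
    """Return fallback instead of failing the page if an optional call is slow."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        return fallback


async def render_product_page(pid):
    # All five lookups run concurrently, so page latency is roughly the slowest
    # required call (or the timeout), not the sum of all of them.
    product, price, reviews, recs, stock = await asyncio.gather(
        fetch_product(pid),
        fetch_price(pid),
        optional(fetch_reviews(pid), timeout=0.2, fallback=None),
        optional(fetch_recommendations(pid), timeout=0.2, fallback=[]),
        fetch_inventory(pid),
    )
    return {**product, "price": price, "reviews": reviews,
            "recommendations": recs, **stock}


page = asyncio.run(render_product_page("laptop-001"))
print(page["recommendations"])  # [] because the slow recommendation call timed out
```

Deciding which calls may degrade and which must fail the request is the same availability-versus-consistency judgment covered in the CAP section of this page.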

Database Types in E-commerce Architecture
Data Type	Database Choice	Why This Choice	Example Use
Product Catalog	Document DB (MongoDB, DynamoDB)	Schema flexibility for varied product attributes	Electronics have different attributes than clothing
Orders & Transactions	Relational (PostgreSQL, MySQL)	ACID transactions, complex joins	Order-items-payments-shipping relationships
User Sessions	Key-Value (Redis, Memcached)	Sub-millisecond access, automatic expiry	Shopping cart, authentication tokens
Search Index	Search Engine (Elasticsearch, Solr)	Full-text search, faceted navigation	"Blue running shoes size 10" search
Recommendations	Graph/ML Store (Neo4j, Redis ML)	Relationship traversal, vector similarity	"Customers also bought" suggestions
Analytics	Columnar (Redshift, BigQuery)	Aggregations over billions of events	Conversion funnels, revenue reports

The Right Tool for Each Job

The art of e-commerce database architecture lies not in choosing one database but in orchestrating many. Each database type excels at specific access patterns. Forcing a relational database to handle session storage or a document database to handle complex transactions leads to poor performance and engineering pain.

Product Catalog: The Heart of E-commerce

The product catalog is the most viewed and most complex data structure in e-commerce. By some third-party estimates, Amazon lists over 350 million products, each with its own attributes, pricing rules, availability status, images, reviews, and relationships to other products.

The Challenge of Product Data:

Products have vastly different attribute schemas. A laptop has processor speed, RAM, and screen size. A dress has fabric, color, and size. A book has author, ISBN, and page count. A traditional relational schema would require one of:

One giant table with hundreds of nullable columns (wasteful, confusing)
Entity-Attribute-Value (EAV) pattern (flexible but query nightmares)
Separate tables per category (maintenance nightmare at scale)

Relational Approach (EAV Pattern)

product_eav.sql
-- Traditional EAV approach
-- Requires complex joins for simple queries
 
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    category_id INT,
    base_price DECIMAL(10,2)
);
 
CREATE TABLE product_attributes (
    product_id INT REFERENCES products,
    attribute_name VARCHAR(100),
    attribute_value TEXT,
    PRIMARY KEY (product_id, attribute_name)
);
 
-- Query for laptops with 16GB RAM:
SELECT p.* 
FROM products p
JOIN product_attributes a1 ON p.product_id = a1.product_id
  AND a1.attribute_name = 'ram_gb'
  AND a1.attribute_value = '16'   -- every value is TEXT: no numeric typing or ranges
WHERE p.category_id = 3;          -- hypothetical id of the 'laptops' category
 
-- Adding more filters = more joins
-- Extremely slow at scale
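The cost of EAV grows with every filter. This sketch uses Python's built-in sqlite3 module as a stand-in for the relational database (SQLite has no SERIAL type, so INTEGER PRIMARY KEY plays that role). The sample rows, the category id 3, and the find_products helper are hypothetical. The helper builds the EAV query dynamically and shows that each attribute filter needs its own self-join on product_attributes.

```python
import sqlite3

# SQLite stand-in for the schema above (INTEGER PRIMARY KEY instead of SERIAL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT,
                       category_id INT, base_price REAL);
CREATE TABLE product_attributes (
    product_id INT REFERENCES products, attribute_name TEXT,
    attribute_value TEXT, PRIMARY KEY (product_id, attribute_name));
""")
LAPTOPS = 3  # hypothetical category id
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
    (1, "ThinkPad X1 Carbon", LAPTOPS, 1299.00),
    (2, "Budget Laptop", LAPTOPS, 499.00),
])
conn.executemany("INSERT INTO product_attributes VALUES (?, ?, ?)", [
    (1, "ram_gb", "16"), (1, "screen_inches", "14"),
    (2, "ram_gb", "8"), (2, "screen_inches", "14"),
])


def find_products(conn, category_id, filters):
    """Return matching rows plus the number of self-joins the query needed."""
    sql, params = "SELECT p.product_id, p.name FROM products p", []
    for i, (attr, value) in enumerate(filters.items()):
        a = f"a{i}"  # one alias, i.e. one more join, per filter
        sql += (f" JOIN product_attributes {a} ON {a}.product_id = p.product_id"
                f" AND {a}.attribute_name = ? AND {a}.attribute_value = ?")
        params += [attr, str(value)]  # EAV stores every value as TEXT
    sql += " WHERE p.category_id = ? ORDER BY p.product_id"
    params.append(category_id)
    return conn.execute(sql, params).fetchall(), sql.count(" JOIN ")


rows, joins = find_products(conn, LAPTOPS, {"ram_gb": 16, "screen_inches": 14})
print(rows, joins)  # [(1, 'ThinkPad X1 Carbon')] 2
```

The attribute values are generated as bound parameters, and only the aliases are interpolated into the SQL text. Even so, the query planner faces one more join for every filter a shopper ticks.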

Document Database Approach

product_document.json
// Document approach: Natural fit for products
// Each product has its own schema
 
// Laptop document
{
  "_id": "laptop-001",
  "name": "ThinkPad X1 Carbon",
  "category": "laptops",
  "basePrice": 1299.00,
  "attributes": {
    "processor": "Intel i7-1365U",
    "ramGb": 16,
    "storageTb": 1,
    "screenInches": 14,
    "resolution": "2880x1800"
  },
  "images": ["front.jpg", "side.jpg"],
  "variants": [
    {"ramGb": 8, "priceAdjust": -200},
    {"ramGb": 32, "priceAdjust": 300}
  ]
}
 
// Single query with natural filter:
db.products.find({
  "category": "laptops",
  "attributes.ramGb": 16
})

Why Document Databases Suit the Product Catalog:

Schema flexibility: Each category can have different attributes without schema migrations
Denormalization by design: Product data is self-contained, minimizing joins
Hierarchical data: Variants, bundles, and nested attributes are natural
Read optimization: A single document fetch retrieves everything needed for display
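Dotted paths such as "attributes.ramGb" make nested documents easy to filter. To see why, here is a tiny in-memory imitation of an equality-only find() in Python. It is an illustration, not MongoDB's matching engine, which also handles arrays, operators such as $gt, and indexes. The catalog entries are hypothetical. Note that the laptop and dress documents carry entirely different attribute keys, and storing both requires no schema change.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as 'attributes.ramGb' into nested dicts."""
    value = doc
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None                  # missing field
        value = value[key]
    return value


def find(collection, query):
    """Equality-only imitation of find(): every query field must match."""
    return [doc for doc in collection
            if all(get_path(doc, path) == want for path, want in query.items())]


# Hypothetical catalog: each category carries its own attribute keys.
catalog = [
    {"_id": "laptop-001", "category": "laptops",
     "attributes": {"ramGb": 16, "screenInches": 14}},
    {"_id": "laptop-002", "category": "laptops",
     "attributes": {"ramGb": 8, "screenInches": 13}},
    {"_id": "dress-001", "category": "dresses",
     "attributes": {"fabric": "linen", "size": "M"}},
]

matches = find(catalog, {"category": "laptops", "attributes.ramGb": 16})
print([doc["_id"] for doc in matches])  # ['laptop-001']
```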

The Tradeoff:

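Document databases give up some query flexibility and, depending on the product and its configuration, some consistency guarantees. Cross-collection joins are limited and comparatively expensive (MongoDB, for example, offers them through its $lookup stage). Updates that span multiple documents need multi-document transactions, where supported, or careful application logic. For read-heavy product catalog workloads, these tradeoffs are usually acceptable.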

Inventory Management: The Hardest Problem

Inventory management is arguably the most challenging database problem in e-commerce. Unlike product catalogs (mostly reads) or user profiles (low contention), inventory is a hot spot—thousands of requests per second competing to modify the same records.

The Inventory Challenge Illustrated:

Imagine a flash sale: 10,000 customers try to buy the same limited-edition sneaker at exactly 9:00 AM. Only 500 pairs exist. The database must:

Prevent overselling (more than 500 orders)
Process requests fairly (first-come, first-served)
Respond within milliseconds (customers won't wait)
Handle request failures gracefully (network issues, payment failures)

The Overselling Disaster

Overselling damages brands severely. When customers receive emails saying 'Sorry, we sold you something we didn't have,' trust evaporates. Airlines, retailers, and ticketing companies have all faced lawsuits and PR disasters from overselling. Correct inventory management isn't just a technical problem—it's a business survival issue.

The Naive (Broken) Approach:

Many developers implement inventory like this:

broken_inventory.py
# BROKEN: Race condition leads to overselling
 
def purchase_item(product_id, quantity):
    # Step 1: Check availability
    product = db.query(
        "SELECT inventory FROM products WHERE id = ?", 
        product_id
    )
    
    # Step 2: Verify sufficient inventory
    if product.inventory >= quantity:
        # Step 3: Decrement inventory
        db.execute(
            "UPDATE products SET inventory = inventory - ? WHERE id = ?",
            quantity, product_id
        )
        # Step 4: Create order
        create_order(product_id, quantity)
        return "Success"
    else:
        return "Out of stock"
 
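# THE PROBLEM: a check-then-act race
# Another request can run between the SELECT (steps 1-2) and the UPDATE (step 3).
# Suppose inventory = 1 and two requests arrive together:
#   both read inventory = 1
#   both verify 1 >= 1 is True
#   both decrement: 1 -> 0 -> -1
# We sold 2 items but only had 1.
#
# At scale: 1000 concurrent requests could ALL read 500 and ALL pass the check.
# Inventory ends at -500: 500 units sold that we didn't have.
# (If the code wrote an absolute value instead, e.g. SET inventory = 499,
# concurrent writers would also overwrite each other: a "lost update".)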
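One common fix is to fold the check into the write, so the database re-checks stock at the moment it decrements: UPDATE ... WHERE id = ? AND inventory >= ?, then inspect how many rows changed. The sketch below assumes Python's built-in sqlite3 module as a stand-in for the production database. BEGIN IMMEDIATE is SQLite-specific, and setup_db and run_flash_sale are hypothetical helpers that simulate a small flash sale with concurrent buyers. In PostgreSQL or MySQL the same conditional UPDATE applies. When the decision needs more logic than a WHERE clause can express, SELECT ... FOR UPDATE (pessimistic row locking) is the usual alternative.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def setup_db(db_path, product_id, stock):
    conn = sqlite3.connect(db_path)
    try:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS products ("
                " id INTEGER PRIMARY KEY,"
                " inventory INTEGER NOT NULL CHECK (inventory >= 0))"
            )
            conn.execute("INSERT OR REPLACE INTO products (id, inventory) VALUES (?, ?)",
                         (product_id, stock))
    finally:
        conn.close()


def purchase_item(db_path, product_id, quantity):
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    # timeout=30: wait for another writer's lock instead of failing at once.
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # SQLite: take the write lock up front
        # Check and decrement in ONE statement: the WHERE clause re-checks
        # stock at the moment of the write, so nothing can slip in between.
        cur = conn.execute(
            "UPDATE products SET inventory = inventory - ? "
            "WHERE id = ? AND inventory >= ?",
            (quantity, product_id, quantity),
        )
        if cur.rowcount != 1:          # no row matched: not enough stock
            conn.execute("ROLLBACK")
            return "Out of stock"
        # create_order(...) would go here, inside the same transaction (hypothetical).
        conn.execute("COMMIT")
        return "Success"
    finally:
        conn.close()                   # closing discards anything uncommitted


def run_flash_sale(stock, buyers, workers=8):
    """Hypothetical simulation: `buyers` concurrent requests for 1 unit each."""
    db_path = os.path.join(tempfile.mkdtemp(), "shop.db")
    setup_db(db_path, product_id=1, stock=stock)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: purchase_item(db_path, 1, 1), range(buyers)))
    conn = sqlite3.connect(db_path)
    try:
        (remaining,) = conn.execute("SELECT inventory FROM products WHERE id = 1").fetchone()
    finally:
        conn.close()
    return results.count("Success"), remaining


print(run_flash_sale(stock=50, buyers=200))  # (50, 0)
```

The stock check and the decrement are a single statement, so no interleaving of requests can push inventory below zero. The CHECK (inventory >= 0) constraint acts as a last line of defence. A single hot row still serializes every buyer, which is why very large flash sales often split stock across several rows or allocate it in advance; this sketch does not attempt that.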

Order Processing: Where ACID Still Rules

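While product catalogs often live in document databases and inventory needs its own concurrency controls, order processing typically remains relational. Orders involve complex relationships (customer → order → items → payments → shipments) and need multi-row transactional guarantees that relational databases provide natively. Some document databases now offer multi-document transactions, but relational systems remain the conventional choice for this workload.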

The Order Data Model:

An order isn't just "customer X bought product Y." It's a complex entity with:

Multiple line items with individual pricing, discounts, and tax calculations
Payment split across multiple methods (gift card + credit card)
Shipping to potentially different addresses
Promotions and coupon applications
Tax calculations based on jurisdiction
Refund and return history
Fulfillment state machine (pending → processing → shipped → delivered)

order_schema.sql
-- E-commerce order schema (simplified; PostgreSQL 13+ for built-in gen_random_uuid();
-- the customers and addresses tables are omitted)
 
CREATE TABLE orders (
    order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID NOT NULL REFERENCES customers(customer_id),
    status VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    subtotal DECIMAL(12, 2) NOT NULL,
    tax_amount DECIMAL(12, 2) NOT NULL,
    shipping_amount DECIMAL(12, 2) NOT NULL,
    total_amount DECIMAL(12, 2) NOT NULL,
    currency VARCHAR(3) NOT NULL DEFAULT 'USD',
    shipping_address_id UUID REFERENCES addresses(address_id),
    billing_address_id UUID REFERENCES addresses(address_id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    
    CONSTRAINT valid_status CHECK (status IN (
        'PENDING', 'CONFIRMED', 'PROCESSING', 
        'SHIPPED', 'DELIVERED', 'CANCELLED', 'REFUNDED'
    )),
    CONSTRAINT non_negative_amounts CHECK (
        subtotal >= 0 AND tax_amount >= 0 AND 
        shipping_amount >= 0 AND total_amount >= 0
    )
);
 
CREATE TABLE order_items (
    item_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL REFERENCES orders(order_id),
    product_id VARCHAR(50) NOT NULL,
    product_snapshot JSONB NOT NULL,  -- Denormalized product data at purchase time
    quantity INT NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(12, 2) NOT NULL,
    discount_amount DECIMAL(12, 2) NOT NULL DEFAULT 0,
    tax_amount DECIMAL(12, 2) NOT NULL DEFAULT 0,
    line_total DECIMAL(12, 2) NOT NULL,
    fulfillment_status VARCHAR(20) NOT NULL DEFAULT 'PENDING'
);
 
CREATE TABLE order_payments (
    payment_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL REFERENCES orders(order_id),
    payment_method VARCHAR(50) NOT NULL,  -- 'credit_card', 'gift_card', 'paypal'
    payment_reference VARCHAR(255),  -- External payment provider reference
    amount DECIMAL(12, 2) NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    processed_at TIMESTAMPTZ,
    
    CONSTRAINT valid_payment_status CHECK (status IN (
        'PENDING', 'AUTHORIZED', 'CAPTURED', 'FAILED', 'REFUNDED'
    ))
);
 
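-- Indexes for common query patterns.
-- PostgreSQL does not index foreign-key columns automatically, so fetching
-- an order's items or payments needs its own index.
CREATE INDEX idx_order_items_order ON order_items(order_id);
CREATE INDEX idx_order_payments_order ON order_payments(order_id);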
CREATE INDEX idx_orders_customer ON orders(customer_id, created_at DESC);
CREATE INDEX idx_orders_status ON orders(status, created_at DESC);
CREATE INDEX idx_order_items_product ON order_items(product_id);
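The point of keeping orders relational is that an order, its line items, and its payments are written together or not at all. The following sketch assumes Python's sqlite3 module and a trimmed, hypothetical version of the schema above, with TEXT ids instead of UUIDs and integer cents instead of DECIMAL to keep the arithmetic exact. place_order is an illustrative helper, not a standard API. If any insert violates a constraint, or the payments do not add up to the total, the whole transaction rolls back.

```python
import sqlite3

# Hypothetical SQLite stand-in for the PostgreSQL schema above: TEXT ids instead
# of UUIDs, integer cents instead of DECIMAL, and only the columns used here.
SCHEMA = """
CREATE TABLE orders (
    order_id    TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL,
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
);
CREATE TABLE order_items (
    order_id         TEXT NOT NULL REFERENCES orders(order_id),
    product_id       TEXT NOT NULL,
    quantity         INTEGER NOT NULL CHECK (quantity > 0),
    line_total_cents INTEGER NOT NULL
);
CREATE TABLE order_payments (
    order_id       TEXT NOT NULL REFERENCES orders(order_id),
    payment_method TEXT NOT NULL,
    amount_cents   INTEGER NOT NULL
);
"""


def place_order(conn, order_id, customer_id, items, payments):
    """Insert order, items and payments atomically; return False if anything fails.

    items:    [(product_id, quantity, line_total_cents), ...]
    payments: [(payment_method, amount_cents), ...]
    """
    total = sum(line_total for _, _, line_total in items)
    try:
        with conn:  # one transaction: COMMIT on success, ROLLBACK on any exception
            conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                         (order_id, customer_id, total))
            conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)",
                             [(order_id, *item) for item in items])
            conn.executemany("INSERT INTO order_payments VALUES (?, ?, ?)",
                             [(order_id, *payment) for payment in payments])
            if sum(amount for _, amount in payments) != total:
                raise ValueError("payments do not add up to the order total")
    except (sqlite3.IntegrityError, ValueError):
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Split payment (gift card + credit card) for one laptop, amounts in cents.
print(place_order(conn, "o1", "c1", [("laptop-001", 1, 129900)],
                  [("gift_card", 5000), ("credit_card", 124900)]))  # True
```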

Product Snapshot Pattern

Notice the product_snapshot JSONB column. When an order is placed, we store a complete copy of product details at that moment. If the product's price, description, or images change later, historical orders still show what the customer actually purchased. This denormalization is intentional—order history must be immutable.
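Here is a minimal sketch of the snapshot pattern, again using Python's sqlite3 module as a stand-in (JSON is stored as TEXT here, where PostgreSQL would use JSONB). The buy and order_history helpers and the sample product are hypothetical. The order line copies the product's name and price at purchase time, so a later catalog change does not rewrite history.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id TEXT PRIMARY KEY, name TEXT, price_cents INTEGER);
CREATE TABLE order_items (
    order_id         TEXT NOT NULL,
    product_id       TEXT NOT NULL,
    product_snapshot TEXT NOT NULL,    -- JSON text here; JSONB in PostgreSQL
    unit_price_cents INTEGER NOT NULL
);
""")


def buy(conn, order_id, product_id):
    """Copy the product's current details into the order line at purchase time."""
    with conn:
        name, price = conn.execute(
            "SELECT name, price_cents FROM products WHERE product_id = ?",
            (product_id,),
        ).fetchone()
        snapshot = json.dumps({"name": name, "price_cents": price})
        conn.execute("INSERT INTO order_items VALUES (?, ?, ?, ?)",
                     (order_id, product_id, snapshot, price))


def order_history(conn, order_id):
    """Read what the customer actually bought, never the live catalog row."""
    rows = conn.execute(
        "SELECT product_snapshot FROM order_items WHERE order_id = ?", (order_id,)
    )
    return [json.loads(snapshot) for (snapshot,) in rows]


with conn:
    conn.execute("INSERT INTO products VALUES ('laptop-001', 'ThinkPad X1 Carbon', 129900)")
buy(conn, "o1", "laptop-001")
with conn:  # later, the catalog entry is renamed and repriced
    conn.execute("UPDATE products SET name = 'ThinkPad X1 Carbon (refresh)', "
                 "price_cents = 119900 WHERE product_id = 'laptop-001'")
print(order_history(conn, "o1"))
# [{'name': 'ThinkPad X1 Carbon', 'price_cents': 129900}]
```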

Search and Discovery: Beyond Traditional Databases

Product search is where traditional databases truly fall short. A customer searching for "comfortable blue running shoes for marathon" expects:

Fuzzy matching ("runing" → "running")
Synonym handling ("sneakers" = "running shoes")
Attribute filtering (color: blue, size: 10)
Relevance ranking (popular items first)
Personalization (show preferred brands)
Faceted navigation (filter by brand, price range, rating)
Sub-100ms response times across millions of products

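General-purpose relational and document databases offer some full-text search (PostgreSQL's tsvector type, for example), but rarely all of this at once at catalog scale. Most large e-commerce platforms therefore use specialized search engines.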


The Search Technology Stack:

1. Elasticsearch/OpenSearch: a widely used choice for e-commerce search. Built on Apache Lucene, providing:

Inverted index for full-text search
Near real-time indexing (products appear in search seconds after creation)
Distributed architecture for horizontal scaling
Rich query DSL for complex search requirements

2. Algolia: a SaaS search solution popular with mid-size retailers:

Instant results (as-you-type)
Automatic typo tolerance
Built-in personalization
Simpler integration, higher per-query cost

3. Vector search (emerging): using embeddings for semantic search:

"Comfortable shoes for standing all day" matches products without exact keyword overlap
Product images searchable by visual similarity
Powers "more like this" recommendations
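The core data structure behind these engines is an inverted index: a map from each term to the set of documents that contain it. The Python sketch below is a toy version that shows the idea, assuming a hypothetical three-product catalog. difflib.get_close_matches stands in for real fuzzy matching (production engines use edit-distance-based fuzzy queries), and the brand counts play the role of a facet aggregation.

```python
from collections import Counter, defaultdict
from difflib import get_close_matches

# Hypothetical mini catalog; a real engine indexes many fields with analyzers.
PRODUCTS = {
    1: {"name": "Blue running shoes", "brand": "Acme"},
    2: {"name": "Red running shoes", "brand": "Zoom"},
    3: {"name": "Blue rain jacket", "brand": "Acme"},
}


def tokenize(text):
    return text.lower().split()


def build_index(products):
    """Inverted index: term -> set of product ids whose name contains it."""
    index = defaultdict(set)
    for pid, doc in products.items():
        for term in tokenize(doc["name"]):
            index[term].add(pid)
    return index


def search(index, products, query):
    """AND-match every query term (with crude typo tolerance), then count brand facets."""
    hits = None
    for term in tokenize(query):
        # Typo tolerance: map the term to the closest indexed term, if one is close enough.
        close = get_close_matches(term, list(index), n=1, cutoff=0.8)
        postings = index[close[0]] if close else set()
        hits = set(postings) if hits is None else hits & postings
    hits = hits or set()
    facets = Counter(products[pid]["brand"] for pid in hits)
    return sorted(hits), facets


index = build_index(PRODUCTS)
ids, facets = search(index, PRODUCTS, "runing shoes")
print(ids, dict(facets))  # [1, 2] {'Acme': 1, 'Zoom': 1}
```

Looking up a term costs one dictionary access no matter how many products exist. That property, combined with sharding the index across machines, is what lets search engines answer in milliseconds.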

elasticsearch_query.json
// Elasticsearch query for e-commerce product search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "blue running shoes",
            "fields": ["name^3", "description", "brand^2", "category"],
            "type": "best_fields",
            "fuzziness": "AUTO"
          }
        }
      ],
      "filter": [
        { "term": { "in_stock": true } },
        { "range": { "price": { "gte": 50, "lte": 200 } } },
        { "term": { "size": "10" } }
      ],
      "should": [
        { "term": { "featured": { "value": true, "boost": 2 } } },
        { "range": { "average_rating": { "gte": 4, "boost": 1.5 } } }
      ]
    }
  },
  "aggs": {
    "brands": { "terms": { "field": "brand.keyword", "size": 20 } },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 50 },
          { "from": 50, "to": 100 },
          { "from": 100, "to": 150 },
          { "from": 150 }
        ]
      }
    },
    "colors": { "terms": { "field": "color.keyword" } }
  },
  "sort": [
    { "_score": "desc" },
    { "sales_rank": "asc" }
  ]
}

Personalization: The Database-Driven Competitive Edge

A widely cited McKinsey estimate attributes about 35% of what consumers purchase on Amazon to its recommendations, and Netflix has estimated that its recommendation system saves it more than $1 billion a year by reducing churn. Personalization is a core business driver, not a nice-to-have, and it depends heavily on sophisticated database systems.

Database Requirements for Personalization

•Behavioral Event Streaming — Capture every click, view, search, add-to-cart, and purchase in real time. At large retailers this typically means an event-streaming platform such as Kafka, often paired with a stream processor like ksqlDB, handling millions of events per second.
•User Profile Store — Aggregate behaviors into user profiles: preferences, segments, predicted interests. Often stored in key-value stores (Redis, DynamoDB) for sub-millisecond access.
•Product Relationships — Store computed similarities: 'users who viewed X also viewed Y', 'frequently bought together', 'similar products'. Graph databases or pre-computed similarity matrices.
•Feature Stores — ML models need features at prediction time. Feature stores (Feast, Tecton) provide consistent feature serving at low latency.
•A/B Testing Infrastructure — Track which recommendation algorithm each user sees and measure outcomes. Requires careful data design to enable causal inference.
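As one example of a pre-computed relationship, "frequently bought together" can be derived from co-occurrence counts over past orders. This Python sketch assumes a hypothetical list of baskets. In production the counting would typically run as a batch or streaming job, with the resulting per-product lists written to a low-latency store keyed by product id.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequently_bought_together(orders):
    """For every product, count how often each other product shared its basket."""
    co_counts = defaultdict(Counter)
    for basket in orders:
        # set(): a product bought twice in one order still counts once;
        # sorted(): deterministic pair order.
        for a, b in combinations(sorted(set(basket)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


# Hypothetical order history.
orders = [
    ["tent", "sleeping_bag", "lantern"],
    ["tent", "sleeping_bag"],
    ["tent", "sleeping_bag", "stove"],
    ["sleeping_bag", "pillow"],
]
table = frequently_bought_together(orders)
print(table["tent"].most_common(1))  # [('sleeping_bag', 3)]
```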


The Cold Start Problem

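New users have no behavioral history, so what do you recommend? Common answers are popularity-based fallbacks (bestsellers or trending items, perhaps per region) and content-based recommendations built from whatever little is known (the product currently being viewed, the search that brought the user in). Collaborative filtering (recommend what similar users liked) only becomes useful once the user has some history. Each approach requires different database queries and data structures.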
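Below is a minimal sketch of a popularity fallback, assuming a hypothetical co-occurrence table (per-product "frequently bought together" counts) and a Counter of recent sales. When the user has history, the function scores items related to what they interacted with. Otherwise, or when there are too few candidates, it pads the list with bestsellers the user has not already seen.

```python
from collections import Counter


def recommend(user_history, co_counts, sales, k=3):
    """Related items for users with history; bestsellers fill any gap (cold start)."""
    ranked = []
    if user_history:
        scores = Counter()
        for item in user_history:
            scores.update(co_counts.get(item, {}))   # add co-occurrence counts
        ranked = [item for item, _ in scores.most_common()
                  if item not in user_history]       # skip what they already have
    # Popularity fallback: pad with top sellers the user has not seen.
    for item, _ in sales.most_common():
        if len(ranked) >= k:
            break
        if item not in ranked and item not in user_history:
            ranked.append(item)
    return ranked[:k]


# Hypothetical data.
sales = Counter({"tent": 50, "lantern": 30, "stove": 20, "pillow": 5})
co_counts = {"tent": {"sleeping_bag": 3, "stove": 1}}

print(recommend([], co_counts, sales))        # ['tent', 'lantern', 'stove']
print(recommend(["tent"], co_counts, sales))  # ['sleeping_bag', 'stove', 'lantern']
```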

CAP Theorem Tradeoffs in E-commerce

The CAP theorem states that distributed systems can only guarantee two of three properties: Consistency, Availability, and Partition tolerance. Since network partitions are inevitable in distributed systems, the real choice is between consistency and availability during failures.

E-commerce makes different CAP choices for different data:

CAP Tradeoffs by E-commerce Domain
Domain	CAP Choice	Rationale	Implementation
Inventory	CP (Consistency)	Overselling is unacceptable; better to show 'unavailable' than sell what we don't have	Synchronous replication, pessimistic locking
Product Catalog	AP (Availability)	Showing slightly stale price is better than showing nothing; users can refresh	Eventual consistency, CDN caching
Shopping Cart	AP (Availability)	Users frustrated by cart errors; slight inconsistency rarely noticed	Session storage with async sync
Order Confirmation	CP (Consistency)	Order must be recorded correctly; user can wait for confirmation	Synchronous commit, distributed transaction
Reviews/Ratings	AP (Availability)	Eventual consistency fine; reviews don't need real-time consistency	Async indexing, eventual merge

Practical Example: Amazon's Approach

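Amazon is the best-known example of running different consistency models side by side. Its shopping cart design is documented in the 2007 Dynamo paper; the other items below illustrate the general pattern rather than published Amazon internals:

Shopping cart: Highly available, eventually consistent. If two browser tabs add items, both eventually appear (no addition is lost). Conflicts are resolved by "always accept writes, merge on read"; the known cost is that a deleted item can occasionally reappear.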
Inventory during checkout: Strongly consistent. At the moment of purchase, inventory check is synchronous. If unavailable, order fails immediately.
Product pages: Cached aggressively (eventually consistent). A price change takes minutes to propagate to all edge caches. Acceptable for browsing; corrected at checkout.
Order history: Strongly consistent for writes (order placed = order recorded), eventually consistent for reads (new order might take seconds to appear in history).
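The "always accept writes, merge on read" idea for carts can be sketched as a merge of divergent versions. This simplified Python version is an illustration built on assumptions, not Amazon's implementation. Dynamo tracked versions with vector clocks to detect which copies had diverged; this sketch simply merges whatever versions it is given, keeping the union of items and the larger quantity.

```python
def merge_carts(*versions):
    """Merge divergent cart versions: union of items, larger quantity wins."""
    merged = {}
    for cart in versions:
        for item, qty in cart.items():
            merged[item] = max(merged.get(item, 0), qty)
    return merged


base = {"book": 1}
tab_a = {**base, "lamp": 1}   # tab A (one replica) adds a lamp
tab_b = {**base, "mug": 2}    # tab B (another replica) adds two mugs
print(merge_carts(tab_a, tab_b))    # {'book': 1, 'lamp': 1, 'mug': 2}

tab_c = {}                    # tab C started from base and removed the book
print(merge_carts(tab_a, tab_c))    # {'book': 1, 'lamp': 1}: the book reappears
```

The second call shows the tradeoff of add-wins merging: a stale copy from another replica can undo a removal made on one replica.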

This nuanced approach—different consistency guarantees for different data—is the hallmark of sophisticated e-commerce architecture.

The Price of Wrong Choices

Choosing CP for product catalog means customers see error pages during outages (lost sales). Choosing AP for payments means potential double-charges or lost orders (customer trust destruction). Understanding which consistency model applies to which data is a critical architectural skill.

Summary: DBMS in E-commerce

E-commerce presents a different DBMS challenge than banking. Rather than enforcing strict consistency everywhere, e-commerce optimizes for global availability, low latency, and flexible data models, and it reserves strong consistency for the data that truly needs it. Let's consolidate the key insights:

Key Takeaways

•Polyglot persistence is essential — Different data types demand different database technologies: documents for products, relational for orders, search engines for discovery, caches for sessions.
•Product catalogs favor document databases — Schema flexibility and read optimization outweigh the loss of relational joins.
•Inventory is the hardest problem — Preventing overselling under heavy contention requires atomic conditional updates, locking strategies, or distributed allocation.
•Order processing remains relational — ACID transactions are non-negotiable when money changes hands.
•Search requires specialized engines — Full-text search, faceted navigation, and relevance ranking at catalog scale are beyond what general-purpose databases do well.
•Personalization drives revenue — Recommendation systems require real-time feature stores, behavioral streaming, and ML model serving infrastructure.
•CAP tradeoffs are contextual — Different data domains accept different consistency guarantees based on business impact.

Looking Ahead:

The next page explores DBMS applications in healthcare—a domain with unique requirements around data privacy, interoperability, and regulatory compliance. Unlike e-commerce, where the primary concern is performance at scale, healthcare databases must navigate complex privacy laws (HIPAA, GDPR) while enabling life-critical clinical workflows.

Page Complete

You now understand how e-commerce platforms leverage multiple database technologies to achieve global scale, sub-second response times, and personalized experiences. The key insight is that no single database fits all e-commerce needs—architectural success comes from thoughtfully combining specialized systems. Next, we'll explore the equally complex but very different world of healthcare databases.
