Having understood what lost updates are, seen them manifest in real scenarios, and examined the depth of corruption they cause, we arrive at a natural question: how do we prevent them?
Prevention of lost updates is not optional for any system that modifies shared data concurrently. The silent, cumulative, and trust-eroding nature of the problem means that "waiting until it becomes an issue" is not a viable strategy. By the time lost updates manifest visibly, significant damage has already occurred.
This page explores why prevention is essential, the layers at which prevention can occur, and the primary mechanisms used in production systems. We'll develop a framework for selecting appropriate prevention strategies based on specific application requirements.
Prevention and detection are complementary but not equivalent. Detection identifies corruption after it occurs—useful for forensics but the damage is done. Prevention stops corruption before it happens. For data integrity, prevention is always the primary strategy; detection is a safety net.
Some engineering problems allow "monitor and react" approaches—you observe the system, and when issues arise, you address them. Lost updates do not permit this approach for several fundamental reasons:
1. Corruption is Silent
Lost updates produce no errors, warnings, or alerts. You cannot "react" to something you cannot detect. By the time you realize data is wrong (through audits, customer complaints, or invariant violations), the corruption has already propagated.
2. Damage is Cumulative
Each lost update adds to the total corruption. A system running for months without prevention accumulates drift that becomes increasingly difficult to correct. There is no steady state where "some" lost updates are acceptable.
3. Correction is Often Impossible
Once a counter shows 9,800 instead of 10,000, how do you determine which 200 increments were lost? Which specific customers' transactions were affected? Often, the information needed to correct corruption was never recorded because the system "didn't know" updates were being lost.
4. Trust Damage is Irreversible
When users discover that a system produces incorrect data, their trust is damaged. Even after fixing the root cause, the memory of "that time the numbers were wrong" persists. Trust is easy to lose and hard to rebuild.
In reliability engineering, the axiom is: 'If it can fail silently, it will fail silently at the worst possible time.' Lost updates are the textbook example. Prevention is not gold-plating or over-engineering—it is the baseline for professional database application development.
Lost update prevention can be implemented at multiple layers of the system stack. Each layer offers different tradeoffs between control, performance, and complexity.
| Layer | Mechanism | Advantages | Disadvantages |
|---|---|---|---|
| Database Engine | Isolation levels, internal locking | Transparent to application; well-tested | May not prevent all cases; can reduce concurrency |
| SQL Statement | Atomic operations, locking hints | Precise control; clear semantics | Requires developer discipline; database-specific |
| Transaction | Explicit locking, serializable isolation | Guarantees correctness; composable | Can cause deadlocks; reduces throughput |
| Application | Optimistic locking, version checks | Full control; works across databases | Complex to implement correctly; easy to bypass |
| Architecture | Serializing queues, event sourcing | Eliminates concurrency at source | Major architectural change; may not suit all workloads |
Layer Selection Principles:
Prefer Lower Layers: When possible, rely on database-level prevention. The database engine is tested, optimized, and handles edge cases you might not anticipate.
Move Up When Necessary: Application-level prevention becomes necessary when the read-modify-write cycle spans multiple requests or user interactions (so no single database transaction can cover it), when the data lives in multiple databases or services, or when the checks you need cannot be expressed inside the database.
Combine Layers for Defense in Depth: Production systems often use multiple prevention layers—database serializable isolation with application-level version checking provides redundant protection.
Pessimistic locking assumes that conflicts are likely and prevents them by acquiring locks before accessing data. The philosophy is: "Assume the worst—lock early and hold until done."
How It Prevents Lost Updates:
By acquiring an exclusive lock on a row before reading it, a transaction ensures that no other transaction can read or modify that row until the lock is released. This serializes access and eliminates the read-modify-write race condition.
```sql
-- Pessimistic locking example: Bank withdrawal
-- Using SELECT ... FOR UPDATE to acquire an exclusive lock

BEGIN TRANSACTION;

-- Acquire exclusive lock on the row while reading
-- Other transactions attempting to SELECT FOR UPDATE, UPDATE, or DELETE
-- will block until this transaction commits or rolls back
SELECT balance FROM accounts WHERE account_id = 'ACC-7821' FOR UPDATE;

-- Now we have exclusive access - no other transaction can interfere
-- Read balance: $500

-- Check sufficient funds
-- (Application logic - if balance < withdrawal_amount, abort)

-- Perform the update safely
UPDATE accounts SET balance = balance - 100 WHERE account_id = 'ACC-7821';

COMMIT;
-- Lock is released on commit
-- Now other transactions can proceed
```

Timeline with Pessimistic Locking:
| Time | T₁ (Withdrawal $100) | T₂ (Withdrawal $400) | Locks Held | Balance |
|---|---|---|---|---|
| t₀ | BEGIN | BEGIN | None | $500 |
| t₁ | SELECT FOR UPDATE → $500 | — | T₁ holds X-lock on row | $500 |
| t₂ | — | SELECT FOR UPDATE → BLOCKED | T₁ holds X-lock | $500 |
| t₃ | UPDATE balance = 400 | WAITING... | T₁ holds X-lock | $400 |
| t₄ | COMMIT (releases lock) | WAITING... | None | $400 |
| t₅ | — | UNBLOCKED: SELECT → $400 | T₂ holds X-lock | $400 |
| t₆ | — | UPDATE balance = 0 | T₂ holds X-lock | $0 |
| t₇ | — | COMMIT | None | $0 ✓ |
T₂ was blocked until T₁ completed. When T₂ finally read the balance, it saw the correct post-T₁ value ($400) and computed the correct final value ($0). No update was lost.
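The same pattern is typically driven from application code, where the business check sits between the locking read and the write. The sketch below is a minimal illustration, not a specific driver's API: it assumes the same abstract `db` transaction helper and `Result` type used in the optimistic-locking example later on this page.

```python
from decimal import Decimal

def withdraw_with_pessimistic_locking(account_id: str, amount: Decimal) -> Result:
    """
    Withdrawal using SELECT ... FOR UPDATE.
    Sketch only: `db` and `Result` are assumed application helpers.
    """
    db.begin()
    try:
        # Acquire the exclusive row lock while reading.
        # Concurrent FOR UPDATE / UPDATE / DELETE on this row block here.
        account = db.query("""
            SELECT balance FROM accounts
            WHERE account_id = %s
            FOR UPDATE
        """, (account_id,))

        if account['balance'] < amount:
            db.rollback()
            return Result.failure("Insufficient funds")

        # Safe: no other transaction can touch the row until we commit.
        db.execute("""
            UPDATE accounts SET balance = balance - %s
            WHERE account_id = %s
        """, (amount, account_id))

        db.commit()  # Lock released here
        return Result.success(f"Withdrawn {amount}")
    except Exception:
        db.rollback()
        raise
```

Contending transactions simply queue on the row lock, so no retry logic is needed; the cost is reduced concurrency while the lock is held.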
Optimistic locking (also called optimistic concurrency control) assumes that conflicts are rare and checks for conflicts only at write time. The philosophy is: "Assume the best—check before committing."
How It Works: each row carries a version column (or timestamp). A transaction reads the value together with its version, computes the new value without holding any lock, then writes with a WHERE clause that requires the version to be unchanged and increments it on success. If another transaction slipped in between, the conditional UPDATE matches zero rows and the conflict is detected.
```sql
-- Optimistic locking using version number
-- Schema includes a version column
CREATE TABLE accounts (
    account_id VARCHAR(50) PRIMARY KEY,
    balance DECIMAL(15, 2) NOT NULL,
    version INT NOT NULL DEFAULT 1,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Transaction T1: Withdraw $100
BEGIN TRANSACTION;

-- Read balance AND version
SELECT balance, version FROM accounts WHERE account_id = 'ACC-7821';
-- Returns: balance = $500, version = 1

-- Application computes new balance (no lock held)
-- new_balance = 500 - 100 = 400

-- Attempt update with version check
UPDATE accounts
SET balance = 400,
    version = version + 1,
    updated_at = NOW()
WHERE account_id = 'ACC-7821'
  AND version = 1;  -- Only succeed if version unchanged

-- Check rows affected
-- If 1 row affected: Success! Version was still 1.
-- If 0 rows affected: CONFLICT! Someone else modified the row.

COMMIT;
```

Handling Optimistic Lock Failures:
When the UPDATE affects 0 rows, the application must handle the conflict:
```python
import time
from decimal import Decimal

# Assumes a module-level `db` transaction helper and a `Result` type
# defined elsewhere in the application.

def withdraw_with_optimistic_locking(account_id: str, amount: Decimal) -> Result:
    """
    Implements withdrawal with optimistic locking and retry logic
    """
    MAX_RETRIES = 3

    for attempt in range(MAX_RETRIES):
        try:
            # Start transaction
            db.begin()

            # Read current state including version
            account = db.query("""
                SELECT balance, version
                FROM accounts
                WHERE account_id = %s
            """, (account_id,))

            # Check business rule
            if account['balance'] < amount:
                db.rollback()
                return Result.failure("Insufficient funds")

            new_balance = account['balance'] - amount
            old_version = account['version']

            # Attempt conditional update
            rows_updated = db.execute("""
                UPDATE accounts
                SET balance = %s, version = version + 1
                WHERE account_id = %s AND version = %s
            """, (new_balance, account_id, old_version))

            if rows_updated == 1:
                # Success - no conflict
                db.commit()
                return Result.success(f"Withdrawn {amount}, new balance: {new_balance}")
            else:
                # Conflict detected - version changed
                db.rollback()
                # On conflict, retry with updated data
                # Exponential backoff to reduce collision probability
                time.sleep(0.01 * (2 ** attempt))
                continue

        except Exception:
            db.rollback()
            raise

    # Max retries exceeded
    return Result.failure("Transaction failed after maximum retries - high contention")
```

Use optimistic locking when: workloads are read-heavy, conflicting writes are improbable, the application is stateless, or the data spans distributed/cross-database scenarios. Use pessimistic locking when: write contention on the same records is high, transactions involve complex multi-statement logic, or conflicts are common enough that retry storms would be expensive.
The simplest and most efficient prevention mechanism is eliminating the read-modify-write pattern entirely by using atomic operations. Instead of reading a value, computing in the application, and writing back, perform the entire operation in a single database statement.
The Key Insight:
The lost update vulnerability exists because there's a gap between reading and writing. Atomic operations close this gap by combining read, modify, and write into a single indivisible unit.
```sql
-- VULNERABLE: Read-Modify-Write pattern
-- Step 1: Read
SELECT balance FROM accounts WHERE account_id = 'ACC-7821';
-- Gap: another transaction can modify balance here!
-- Step 2: Write computed value
UPDATE accounts SET balance = 400 WHERE account_id = 'ACC-7821';

-- SAFE: Atomic operation
-- Single statement combines read + modify + write
UPDATE accounts SET balance = balance - 100 WHERE account_id = 'ACC-7821';

-- The database engine ensures atomicity:
-- 1. Acquires lock on row
-- 2. Reads current balance
-- 3. Computes balance - 100
-- 4. Writes new value
-- 5. Releases lock
-- All in one atomic operation - no gap for interference
```

Common Atomic Operation Patterns:
```sql
-- Counter increment
-- Instead of: SELECT count; UPDATE count = count + 1
UPDATE page_views SET count = count + 1 WHERE page_id = 'P-123';

-- Balance modification
-- Instead of: SELECT balance; check; UPDATE balance = computed
UPDATE accounts SET balance = balance - 100
WHERE account_id = 'ACC-7821' AND balance >= 100;
-- Returns 1 row affected if successful, 0 if insufficient funds

-- Inventory decrement with constraint
UPDATE inventory SET quantity = quantity - 1
WHERE sku = 'WIDGET-X' AND quantity >= 1
RETURNING quantity;  -- PostgreSQL: return new value

-- Conditional state change (booking)
UPDATE rooms SET is_available = FALSE, booked_by = 'user-123'
WHERE room_id = 'CONF-A'
  AND time_slot = '2024-01-15 14:00'
  AND is_available = TRUE;
-- Returns 1 if booking succeeded, 0 if room was already booked

-- Append to JSON array (PostgreSQL)
UPDATE user_logs
SET events = events || '{"action": "login", "time": "..."}'::jsonb
WHERE user_id = 'U-456';

-- Increment with atomic fetch (PostgreSQL sequence-like)
UPDATE sequences SET next_val = next_val + 1
WHERE seq_name = 'order_number'
RETURNING next_val - 1 AS current_val;
```

Atomic operations are the gold standard for preventing lost updates. They're simpler, faster (no extra round-trips), and eliminate the problem at its source. Whenever operation logic can be expressed in a single SQL statement, prefer atomic operations over read-modify-write patterns.
Limitations of Atomic Operations:
Not all logic can be expressed atomically: operations that must call an external service between the read and the write, updates spanning multiple rows or tables whose new values depend on one another, and business rules that require application-side computation or user interaction all fall outside a single SQL statement.
In these cases, pessimistic or optimistic locking becomes necessary.
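As a concrete illustration, consider a withdrawal that must pass an external fraud check between reading the balance and writing it back. This is a sketch only: `fraud_service` is a hypothetical client, and `db` and `Result` are the same assumed helpers as in the other Python examples on this page.

```python
from decimal import Decimal

def withdraw_with_fraud_check(account_id: str, amount: Decimal) -> Result:
    """
    The fraud check is an external call, so the whole operation cannot be
    folded into one atomic UPDATE. A pessimistic lock keeps the row stable
    across the gap. `db`, `Result`, and `fraud_service` are assumed helpers.
    """
    db.begin()
    try:
        account = db.query(
            "SELECT balance FROM accounts WHERE account_id = %s FOR UPDATE",
            (account_id,))

        if account['balance'] < amount:
            db.rollback()
            return Result.failure("Insufficient funds")

        # External call between read and write: impossible in a single statement.
        # Note: holding a row lock across a network call hurts throughput;
        # optimistic locking may be preferable if this check is slow.
        if not fraud_service.approve(account_id, amount):
            db.rollback()
            return Result.failure("Blocked by fraud check")

        db.execute(
            "UPDATE accounts SET balance = balance - %s WHERE account_id = %s",
            (amount, account_id))
        db.commit()
        return Result.success(f"Withdrawn {amount}")
    except Exception:
        db.rollback()
        raise
```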
SQL defines four standard isolation levels, each providing different guarantees. However, only SERIALIZABLE fully prevents lost updates through the database engine alone. Understanding why lower isolation levels don't suffice is critical for system design.
| Isolation Level | Prevents Lost Updates? | How/Why |
|---|---|---|
| READ UNCOMMITTED | No | Allows dirty reads. No protection against R→R→W→W pattern. |
| READ COMMITTED | No | Prevents dirty reads but allows concurrent writes on same row. The R→R→W→W pattern is fully possible. |
| REPEATABLE READ | Depends on implementation | SQL standard: No. PostgreSQL: Yes (via snapshot isolation). MySQL InnoDB: No. |
| SERIALIZABLE | Yes | Guarantees equivalent to serial execution. Any interleaving that would cause lost updates is detected and one transaction aborted. |
REPEATABLE READ in the SQL standard does NOT prevent lost updates. It only guarantees that if you read a row twice in the same transaction, you'll see the same value. Two transactions can still both read the same row and write conflicting updates. PostgreSQL extends REPEATABLE READ to use snapshot isolation, which does detect conflicts—but this is implementation-specific, not standard-mandated.
Why Developers Get Confused:
The isolation level documentation often says REPEATABLE READ prevents "non-repeatable reads," and developers interpret this as preventing all concurrent modification issues. The confusion arises because:
Non-repeatable read (anomaly name): T₁ reads X, T₂ modifies X, T₁ reads X again and sees a different value.
Lost update (different anomaly): T₁ reads X, T₂ reads X, both modify and write back, one write is lost.
These are different problems! REPEATABLE READ addresses #1 by locking read rows against modification (or using snapshots). It does not address #2 because both transactions reading concurrently is allowed—the problem is on the write side, not the read side.
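To see the difference concretely, the following sketch opens two connections and replays the read-read-write-write interleaving. It assumes a MySQL/InnoDB test database reachable with the PyMySQL driver and an `accounts` row starting at $500; the connection parameters are placeholders. Under InnoDB's REPEATABLE READ the script runs to completion and the final balance is $100 instead of $0, so T₁'s withdrawal is silently lost; against PostgreSQL's REPEATABLE READ the second UPDATE would instead fail with a serialization error.

```python
# Sketch: reproduce a lost update under REPEATABLE READ on MySQL/InnoDB.
# Assumes PyMySQL and a local test database with accounts.ACC-7821 at $500;
# host/user/password/database values are placeholders.
import pymysql

def connect():
    return pymysql.connect(host="localhost", user="test", password="test",
                           database="bank", autocommit=False)

t1, t2 = connect(), connect()

for conn in (t1, t2):
    with conn.cursor() as cur:
        # REPEATABLE READ is InnoDB's default; set it explicitly for clarity.
        cur.execute("SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ")

read_sql = "SELECT balance FROM accounts WHERE account_id = 'ACC-7821'"
write_sql = "UPDATE accounts SET balance = %s WHERE account_id = 'ACC-7821'"

with t1.cursor() as c1, t2.cursor() as c2:
    c1.execute(read_sql); (b1,) = c1.fetchone()   # T1 reads $500
    c2.execute(read_sql); (b2,) = c2.fetchone()   # T2 also reads $500

    c1.execute(write_sql, (b1 - 100,))            # T1 writes $400
    t1.commit()

    c2.execute(write_sql, (b2 - 400,))            # T2 writes $100 from its stale read,
    t2.commit()                                   # silently overwriting T1's update

with t1.cursor() as cur:
    cur.execute(read_sql)
    print("final balance:", cur.fetchone()[0])    # $100, not $0: T1's update is lost
t1.commit()
```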
The SERIALIZABLE Solution:
SERIALIZABLE isolation truly prevents lost updates by ensuring that the final result is equivalent to some serial execution order. If two transactions would produce a lost update, the database detects the conflict (via predicate locking, snapshot isolation with conflict detection, or other mechanisms) and aborts one transaction.
```sql
-- Using SERIALIZABLE to prevent lost updates

-- Transaction T1
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT balance FROM accounts WHERE account_id = 'ACC-7821';
-- Returns $500

-- Meanwhile, T2 also reads $500 and attempts to update

UPDATE accounts SET balance = 400 WHERE account_id = 'ACC-7821';
COMMIT;
-- T1 commits successfully

-- Transaction T2 (concurrent)
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT balance FROM accounts WHERE account_id = 'ACC-7821';
-- Returns $500 (same snapshot or value)

UPDATE accounts SET balance = 100 WHERE account_id = 'ACC-7821';
-- ERROR: could not serialize access due to concurrent update
-- T1 committed a conflicting change, so T2's update is rejected

ROLLBACK;
-- T2 must retry

-- After retry, T2 reads $400 (T1's committed value)
-- Correctly computes $0
```

Given multiple prevention mechanisms, how do you choose the right one for your application? The decision depends on several factors: contention patterns, latency requirements, transaction complexity, and architectural constraints.
| Scenario | Recommended Strategy | Rationale |
|---|---|---|
| Simple counter increment | Atomic UPDATE | Single statement, no gap possible |
| Inventory decrement with check | Atomic UPDATE with WHERE condition | Combine check and decrement atomically |
| Multi-field update, low contention | Optimistic locking (version column) | No blocking, retries are rare |
| Bank transfer (debit + credit) | Pessimistic locking (SELECT FOR UPDATE) | Complex transaction, must not fail halfway |
| Booking system with high demand | Pessimistic locking or SERIALIZABLE | High contention on popular slots |
| Distributed microservices | Application-level optimistic locking | No single database to control; use version tokens |
| Legacy system, can't modify schema | SERIALIZABLE isolation level | Prevention without schema changes |
In critical systems, combine strategies: use atomic operations where possible, add version columns for application-level verification, and set database isolation to SERIALIZABLE for the most sensitive transactions. Defense in depth ensures that if one layer fails to prevent an issue, another catches it.
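A practical building block for the SERIALIZABLE-based strategies above is a retry wrapper: serialization failures are expected, transient outcomes, so the application should rerun the whole transaction rather than surface the error. The sketch below assumes the same abstract `db` helper as the earlier Python examples, with a hypothetical `begin(isolation_level=...)` option and a hypothetical `is_serialization_failure` predicate (in PostgreSQL this corresponds to SQLSTATE 40001).

```python
import time

def run_serializable(transaction_fn, max_retries: int = 5):
    """
    Run `transaction_fn` under SERIALIZABLE isolation, retrying on
    serialization failures. `db` and `is_serialization_failure` are
    assumed application helpers; SQLSTATE 40001 is the standard
    serialization_failure code.
    """
    for attempt in range(max_retries):
        db.begin(isolation_level="SERIALIZABLE")
        try:
            result = transaction_fn()
            db.commit()
            return result
        except Exception as exc:
            db.rollback()
            if not is_serialization_failure(exc) or attempt == max_retries - 1:
                raise
            # Back off briefly so the competing transaction can finish.
            time.sleep(0.01 * (2 ** attempt))

# Usage sketch: the withdrawal either completes or is retried as a whole.
# result = run_serializable(lambda: withdraw(account_id="ACC-7821", amount=100))
```

Because the whole transaction function is re-executed on each retry, it must be side-effect free outside the database, the same constraint that applies to optimistic-locking retries.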
We've established that lost update prevention is not optional—it's a fundamental requirement for any system with concurrent data modification. We've explored the primary prevention mechanisms and how to select among them.
What's Next:
With prevention strategies understood, the final piece is detection—how to identify when lost updates have occurred despite prevention efforts, for forensics and system validation. The next page explores detection mechanisms.
You now understand why lost update prevention is essential and have a framework for selecting appropriate prevention strategies. This knowledge enables you to design concurrent systems that maintain data integrity from the start.