In the world of distributed systems, availability isn't a nice-to-have—it's an existential requirement. When Amazon's website goes down for even a minute, the company loses an estimated $220,000 in sales. When banking systems become unavailable, financial transactions halt, businesses suffer, and regulators take notice. When social media platforms become unreachable, users flee to competitors.
Yet achieving availability in distributed systems presents a fundamental challenge: the more machines you add to handle load and provide redundancy, the more likely it becomes that something will fail at any given moment. A system with 100 servers, each with 99.9% uptime, will experience failures multiple times per day.
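The arithmetic behind that claim is easy to check. The sketch below uses the numbers from the paragraph above (100 servers, 99.9% uptime each); it is a back-of-the-envelope illustration, not a measurement:

```typescript
// Probability that at least one of N independent servers is down at any moment.
const perServerUptime = 0.999; // 99.9%
const servers = 100;

const allUp = Math.pow(perServerUptime, servers); // ≈ 0.905
const atLeastOneDown = 1 - allUp;                 // ≈ 0.095

console.log(atLeastOneDown); // ~9.5% of the time, something in the fleet is failing
```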
This is where the BASE consistency model enters the picture—and its first principle, Basically Available, represents a radical rethinking of how distributed systems should behave when things go wrong.
By the end of this page, you will understand what 'Basically Available' means in the context of distributed databases, how it differs from traditional availability concepts, the theoretical foundations from the CAP theorem, and the architectural patterns that enable systems to remain available even during partial failures. You'll also learn how to design systems that gracefully degrade rather than catastrophically fail.
The term "Basically Available" might seem like a vague or weak guarantee, but it represents a carefully considered engineering philosophy. To understand it, we must first distinguish it from absolute availability and strong consistency availability.
Absolute Availability promises that every request will receive a response—guaranteed. This is theoretically impossible in a distributed system subject to network partitions and node failures.
Strong Consistency Availability promises that every request will receive the correct response—the most up-to-date data. This requires coordination between nodes, which becomes impossible when nodes can't communicate.
Basic Availability takes a different approach: the system will always attempt to provide a response, even if that response might be slightly stale or incomplete. The system remains functional even when it can't be perfect.
The word 'Basically' in 'Basically Available' is intentional. It acknowledges that perfect availability is impossible in distributed systems. Instead, the system guarantees to remain functional 'to the greatest extent possible'—always responding to requests, always serving data, even if that data isn't perfectly consistent across all nodes. This pragmatic approach prioritizes user experience over theoretical purity.
The Fundamental Insight:
Basic availability is built on a crucial observation: for most applications, an available but slightly stale response is far more valuable than no response at all. A product page that shows a price from a few seconds ago, a social feed that is missing the very latest post, or an inventory count that lags a replica by a moment are all perfectly usable.
The alternative—refusing to serve requests until perfect consistency is achieved—often means error pages, timed-out transactions, and users abandoning the service for a competitor.
Basic availability chooses functionality over perfection.
Basic availability's theoretical underpinning comes from the CAP theorem, one of the most important results in distributed systems theory. Proposed by Eric Brewer in 2000 and formally proven by Seth Gilbert and Nancy Lynch in 2002, the CAP theorem states:
A distributed data store cannot simultaneously provide more than two of the following three guarantees: Consistency (every read sees the most recent write or returns an error), Availability (every request receives a non-error response, though not necessarily the latest data), and Partition tolerance (the system keeps operating even when messages between nodes are lost or delayed).
In a distributed system, network partitions are not a question of 'if' but 'when.' Networks fail. Cables get cut. Routers malfunction. Data centers lose connectivity. This means partition tolerance (P) is not optional—it's a requirement. The real choice in modern distributed systems is between Consistency (CP) and Availability (AP) during network partitions.
Why Partitions Are Inevitable:
Consider what happens in a real distributed system spanning multiple data centers when one of these failures severs the link between regions.
When a partition occurs, nodes on either side of the partition can't communicate. At this moment, the system must make a choice:
Option 1 (Choose Consistency): Nodes refuse to serve requests until the partition heals and they can verify data consistency. Users see errors or timeouts.
Option 2 (Choose Availability): Nodes continue serving requests using their local data, accepting that different nodes might have temporarily different views of the data.
BASE systems choose Option 2. They prioritize availability over strong consistency during partitions.
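A minimal sketch of the two options, assuming a hypothetical per-node `localStore` and a count of currently reachable replicas (both invented for illustration): in CP mode a read fails when a quorum can't be contacted, while in AP mode the node answers from its local, possibly stale, copy.

```typescript
// Illustrative only: how a single node might answer a read during a partition.
type Mode = 'CP' | 'AP';

interface Versioned<T> {
  value: T;
  version: number; // last-writer timestamp or vector-clock stand-in
}

function readDuringPartition<T>(
  key: string,
  mode: Mode,
  localStore: Map<string, Versioned<T>>,
  reachableReplicas: number,
  readQuorum: number
): Versioned<T> {
  if (mode === 'CP' && reachableReplicas < readQuorum) {
    // Option 1: refuse to answer rather than risk returning stale data
    throw new Error('Unavailable: cannot reach a read quorum during partition');
  }

  const local = localStore.get(key);
  if (local === undefined) {
    throw new Error(`Key not found locally: ${key}`);
  }

  // Option 2 (AP / BASE): answer from local state, possibly stale,
  // and let anti-entropy reconcile replicas after the partition heals.
  return local;
}
```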
| System Type | During Normal Operation | During Network Partition | Examples |
|---|---|---|---|
| CP (Consistent + Partition-Tolerant) | Strong consistency, full availability | Consistency maintained, availability sacrificed | Google Spanner, etcd, ZooKeeper |
| AP (Available + Partition-Tolerant) | Eventual consistency, full availability | Availability maintained, consistency sacrificed | Cassandra, DynamoDB, CouchDB |
| CA (Consistent + Available) | Theoretically impossible in distributed systems | Cannot exist—partitions will occur | Single-node databases only |
Understanding basic availability requires understanding how availability is measured and what different availability targets mean in practice. The industry standard for measuring availability is the number of nines—expressed as a percentage of uptime over a given period.
| Availability % | Nines | Downtime per Year | Downtime per Month | Downtime per Day |
|---|---|---|---|---|
| 99% | Two nines | 3.65 days | 7.31 hours | 14.40 minutes |
| 99.9% | Three nines | 8.77 hours | 43.83 minutes | 1.44 minutes |
| 99.99% | Four nines | 52.60 minutes | 4.38 minutes | 8.64 seconds |
| 99.999% | Five nines | 5.26 minutes | 26.30 seconds | 864 milliseconds |
| 99.9999% | Six nines | 31.56 seconds | 2.63 seconds | 86.4 milliseconds |
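The downtime figures above follow directly from the fraction of time the system is allowed to be down: downtime = (1 − availability) × period. A quick sketch of the conversion (the helper name is ours, not an industry tool; the table uses 365.25-day years and 30.44-day months):

```typescript
// Convert an availability target into an allowed-downtime budget.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function downtimeBudgetMs(availabilityPercent: number, periodDays: number): number {
  const unavailableFraction = 1 - availabilityPercent / 100;
  return unavailableFraction * periodDays * MS_PER_DAY;
}

// Examples (match the table above):
console.log(downtimeBudgetMs(99.9, 365.25) / (60 * 60 * 1000)); // ≈ 8.77 hours per year
console.log(downtimeBudgetMs(99.99, 30.44) / (60 * 1000));      // ≈ 4.38 minutes per month
```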
Each additional nine of availability typically requires an order of magnitude more engineering effort and infrastructure cost. Going from 99% to 99.9% might require redundancy. Going from 99.99% to 99.999% might require multi-region deployments, sophisticated failover mechanisms, and extensive monitoring. The decision of how many nines to target should be driven by business requirements, not engineering ambition.
How Basic Availability Affects SLAs:
When we say a system is 'Basically Available,' we're making a specific claim: the system will respond to requests, but the response might not reflect the absolute latest state. This distinction affects how we measure availability:
Traditional Availability Measurement: a request counts as successful only if the system returns a correct, fully up-to-date response within its latency target; anything else counts as downtime.
Basic Availability Measurement: a request counts as successful if the system returns any usable response, even one served from a stale replica or a degraded code path.
This relaxed definition allows basically available systems to achieve higher availability numbers by counting 'stale but served' as successful responses.
Practical Example:
Consider a global e-commerce system with data centers in the US, Europe, and Asia: a customer in Europe updates their shipping address, and moments later a request routed to the Asia data center, which has not yet received the update, returns the old address.
Under traditional availability, this stale read might be counted as a failure. Under basic availability, it's counted as a success—the user got a response, and the data will eventually be consistent.
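To make the measurement difference concrete, here is a toy calculation with made-up request counts that scores the same traffic under both definitions:

```typescript
// Toy example: the same traffic scored under two definitions of "successful".
interface RequestTally {
  total: number;          // all requests received
  errors: number;         // requests that returned an error or timed out
  staleButServed: number; // requests answered with stale (not yet replicated) data
}

// Traditional: stale responses count against availability.
function traditionalAvailability(t: RequestTally): number {
  return (t.total - t.errors - t.staleButServed) / t.total;
}

// Basic availability: any served response counts, stale or not.
function basicAvailability(t: RequestTally): number {
  return (t.total - t.errors) / t.total;
}

const tally: RequestTally = { total: 1_000_000, errors: 500, staleButServed: 4_500 };
console.log(traditionalAvailability(tally)); // 0.995  (99.5%)
console.log(basicAvailability(tally));       // 0.9995 (99.95%)
```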
Achieving basic availability in distributed systems requires a combination of architectural patterns that work together to ensure the system can respond to requests even when components fail. These patterns form the foundation of the world's most reliable systems.
Deep Dive: Replication for Availability
Replication is the cornerstone of availability. By maintaining multiple copies of data, we ensure that no single failure can make data inaccessible. However, replication introduces a fundamental challenge: keeping replicas synchronized.
There are three primary replication strategies:
Synchronous Replication: every write is confirmed by all replicas before the client receives an acknowledgment. No single failure loses data, but one slow or unreachable replica stalls every write, so availability suffers.
Asynchronous Replication: the primary acknowledges the write immediately and propagates it to replicas in the background. Writes stay fast and available even when replicas are down, at the cost of a window in which replicas are stale and a failed primary can lose recent writes.
Quorum-Based Replication: a write is acknowledged once a configurable subset of replicas (the write quorum, W, out of N) confirms it, and reads consult a read quorum, R. This lets operators tune the balance between consistency and availability per workload.
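The usual rule of thumb is that the read and write quorums overlap whenever R + W > N, so every read intersects at least one replica that saw the latest write. A tiny sketch of that check (the function is illustrative, not taken from any particular database):

```typescript
// R + W > N guarantees that read and write quorums overlap in at least one replica.
function quorumsOverlap(n: number, writeQuorum: number, readQuorum: number): boolean {
  return readQuorum + writeQuorum > n;
}

console.log(quorumsOverlap(3, 2, 2)); // true  - classic N=3, W=2, R=2
console.log(quorumsOverlap(3, 1, 1)); // false - fast, but reads may miss the latest write
```

The quorum-write sketch below applies the same idea: the write is acknowledged as soon as W replicas confirm it, and the remaining replicas catch up asynchronously.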
```typescript
// Example: Quorum-based write in a distributed system
// Assuming N = 3 replicas, W = 2 (write quorum)
// `Replica` and `withTimeout` are minimal helpers defined here for the example.

interface Replica {
  write(key: string, value: any, timestamp: number): Promise<void>;
}

interface WriteResult {
  success: boolean;
  confirmedReplicas: number;
  errors: Error[];
}

// Reject a promise that doesn't settle within `ms` milliseconds
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms)
    ),
  ]);
}

async function quorumWrite(
  key: string,
  value: any,
  replicas: Replica[],
  writeQuorum: number
): Promise<WriteResult> {
  const W = writeQuorum; // need W of replicas.length confirmations

  // Send write to all replicas in parallel
  const writePromises = replicas.map(async (replica) => {
    try {
      await replica.write(key, value, Date.now());
      return { success: true, replica };
    } catch (error) {
      return { success: false, replica, error };
    }
  });

  // Wait for responses with timeout
  const results = await Promise.allSettled(
    writePromises.map(p => withTimeout(p, 5000))
  );

  const successes = results.filter(
    r => r.status === 'fulfilled' && r.value.success
  ).length;

  // Quorum achieved?
  if (successes >= W) {
    // Write is considered successful - available!
    // Remaining replicas will eventually receive the write
    return { success: true, confirmedReplicas: successes, errors: [] };
  } else {
    // Quorum not achieved - write failed
    // In AP systems, might still succeed with warning
    return {
      success: false,
      confirmedReplicas: successes,
      errors: results
        .filter((r): r is PromiseRejectedResult => r.status === 'rejected')
        .map(r => r.reason)
    };
  }
}
```

Graceful degradation is a key strategy for maintaining basic availability. Rather than failing completely when resources are constrained or components fail, the system reduces functionality incrementally, prioritizing the most critical features.
Implementing Graceful Degradation:
Effective graceful degradation requires planning. You must identify which features are essential to the core user journey, which can fall back to cached or default data, and which can be switched off entirely when the system is under stress.
Feature Priority Tiers: rank features by business criticality so the system knows what to sacrifice first. Core flows such as checkout or login are never shed, while lower-tier features like recommendations or review widgets can be disabled when capacity runs short; a configuration sketch follows below.
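A simple way to encode such tiers is a static configuration that maps features to the conditions under which they may be shed. The feature names and thresholds here are invented for illustration:

```typescript
// Hypothetical feature-tier configuration for load shedding.
type Tier = 'critical' | 'important' | 'optional';

interface FeatureConfig {
  tier: Tier;
  // Shed the feature when system load (0..1) rises above this threshold.
  shedAboveLoad: number;
}

const featureTiers: Record<string, FeatureConfig> = {
  checkout:            { tier: 'critical',  shedAboveLoad: 1.0 }, // never shed
  productSearch:       { tier: 'critical',  shedAboveLoad: 1.0 },
  personalizedRecs:    { tier: 'important', shedAboveLoad: 0.8 },
  reviewsAndRatings:   { tier: 'optional',  shedAboveLoad: 0.6 },
  recentlyViewedItems: { tier: 'optional',  shedAboveLoad: 0.6 },
};

function isFeatureEnabled(feature: string, currentLoad: number): boolean {
  const config = featureTiers[feature];
  if (!config) return true; // unknown features default to enabled
  return currentLoad <= config.shedAboveLoad;
}
```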
Fallback Strategies: define in advance what each feature does when its primary dependency is unavailable, such as serving cached results, returning sensible defaults, or queueing work for later processing instead of rejecting the request.
Circuit Breakers: stop calling a dependency that is repeatedly failing, route requests straight to a fallback, and periodically probe the dependency to detect recovery. This prevents a struggling component from dragging down every caller that depends on it.
```typescript
enum CircuitState {
  CLOSED,    // Normal operation - requests flow through
  OPEN,      // Failure detected - requests blocked, fallback used
  HALF_OPEN  // Testing recovery - limited requests allowed
}

class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failureCount: number = 0;
  private lastFailureTime: number = 0;

  constructor(
    private failureThreshold: number = 5,
    private resetTimeout: number = 30000 // 30 seconds
  ) {}

  async execute<T>(
    primaryFn: () => Promise<T>,
    fallbackFn: () => Promise<T>
  ): Promise<T> {
    // Check if circuit should transition from OPEN to HALF_OPEN
    if (this.state === CircuitState.OPEN) {
      if (Date.now() - this.lastFailureTime > this.resetTimeout) {
        this.state = CircuitState.HALF_OPEN;
      } else {
        // Circuit is open - use fallback for availability
        return fallbackFn();
      }
    }

    try {
      const result = await primaryFn();
      // Success - reset circuit
      if (this.state === CircuitState.HALF_OPEN) {
        this.state = CircuitState.CLOSED;
      }
      this.failureCount = 0;
      return result;
    } catch (error) {
      this.failureCount++;
      this.lastFailureTime = Date.now();

      if (this.failureCount >= this.failureThreshold) {
        this.state = CircuitState.OPEN;
        console.log('Circuit opened - switching to fallback');
      }

      // Provide graceful degradation via fallback
      return fallbackFn();
    }
  }
}

// Usage example - `recommendationService` and `cache` stand in for external clients
declare const recommendationService: { getPersonalized(userId: string): Promise<unknown> };
declare const cache: { get(key: string): Promise<unknown> };

const recommendationBreaker = new CircuitBreaker(5, 30000);

async function getProductRecommendations(userId: string) {
  return recommendationBreaker.execute(
    // Primary: real-time personalized recommendations
    async () => {
      return await recommendationService.getPersonalized(userId);
    },
    // Fallback: cached popular items (always available)
    async () => {
      return await cache.get('popular-products');
    }
  );
}
```

Let's examine how major companies implement basic availability in their systems. These patterns have been battle-tested at scales of millions of requests per second.
Amazon calculated that every 100ms of latency costs them 1% in sales. But complete unavailability costs 100% of sales. This makes the basic availability trade-off clear: slightly stale data that's always available is vastly more valuable than perfect data that's sometimes unavailable. The business always chooses availability.
We've explored the first and arguably most important pillar of the BASE consistency model: Basically Available. Let's consolidate the key takeaways:

- 'Basically Available' means the system always attempts to respond, even if the response is stale, partial, or degraded.
- The CAP theorem makes partition tolerance mandatory, so the real choice during partitions is between consistency (CP) and availability (AP); BASE systems choose availability.
- Availability is measured in nines, and each additional nine costs roughly an order of magnitude more engineering effort; the target should come from business requirements.
- Replication, quorum writes, graceful degradation, and circuit breakers are the core patterns that keep a system responding through partial failures.
What's Next:
Now that we understand basic availability, we'll explore the second pillar of BASE: Soft State. Soft state describes how data in a basically available system isn't permanent—it can change over time even without explicit user input, as the system works to reconcile differences between replicas. This concept fundamentally changes how we think about data management in distributed systems.
You now understand what 'Basically Available' means in the context of distributed databases. This pillar of BASE represents a deliberate trade-off: by relaxing consistency guarantees, distributed systems can remain available even during partial failures. Next, we'll explore how 'Soft State' enables this availability through flexible data management.