If buckets are the containers of a hash index, then the hash function is the routing mechanism that directs each key to its designated bucket. The quality of this routing determines everything: lookup performance, space utilization, and the frequency of collisions that degrade O(1) operations to linear searches.
A hash function for database indexing must satisfy demanding, often conflicting requirements. It must be fast enough that it doesn't become the bottleneck in high-throughput systems. It must distribute keys uniformly to prevent bucket hotspots. It must be deterministic so the same key always reaches the same bucket. And it must handle arbitrary key types—integers, strings, composite keys, binary blobs—with equal effectiveness.
Designing or selecting a hash function is not a trivial decision. The wrong choice can turn an O(1) data structure into an O(n) liability, silently degrading system performance until it becomes a crisis.
By the end of this page, you will understand: (1) The mathematical properties that define an effective hash function, (2) Common hash algorithms used in database systems, (3) How to evaluate hash function quality, (4) Techniques for hashing different key types, and (5) Production-grade implementation considerations.
A function h: K → {0, 1, ..., N-1} that maps keys from domain K to bucket indices must exhibit several properties to be useful for database indexing. Understanding these properties helps in both selecting existing hash functions and recognizing when a chosen function is underperforming.
The most fundamental requirement: given the same key and the same hash function configuration, the output must always be identical.
h(k) = h(k), always
This seems obvious, but violations occur more often than expected. Hashing uninitialized memory (such as struct padding), floating-point keys with inconsistent NaN or signed-zero handling, or functions that sample random values internally all break determinism. Without it, a stored record becomes unfindable.
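A common way this bug appears in practice is hashing a struct's raw bytes, which includes compiler-inserted padding. The sketch below is illustrative; `murmurhash3_32` stands in for whatever byte-oriented hash the index uses (its full implementation appears later on this page).

```c
#include <stdint.h>
#include <string.h>

uint32_t murmurhash3_32(const void *key, size_t len, uint32_t seed); /* defined later */

typedef struct {
    uint8_t  category;   /* 1 byte */
    /* compiler inserts 7 padding bytes here */
    uint64_t product_id; /* 8 bytes */
} CompositeKey;

/* BROKEN: padding bytes are indeterminate, so two "equal" keys
 * can produce different hashes */
uint32_t hash_key_broken(const CompositeKey *k) {
    return murmurhash3_32(k, sizeof(*k), 0);
}

/* CORRECT: hash only the fields (or memset the struct to zero
 * before populating it) */
uint32_t hash_key_fixed(const CompositeKey *k) {
    uint8_t buf[1 + 8];
    buf[0] = k->category;
    memcpy(buf + 1, &k->product_id, sizeof(k->product_id));
    return murmurhash3_32(buf, sizeof(buf), 0);
}
```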
The hash function should distribute keys as evenly as possible across all bucket indices. Formally, for a uniform random key k drawn from K:
P(h(k) = i) = 1/N for all buckets i ∈ {0, 1, ..., N-1}
In practice, real keys are not uniformly random. They cluster (sequential IDs), have patterns (timestamps), or share prefixes (URLs). A good hash function must break these patterns and produce outputs that appear random even when inputs are structured.
| Property | Definition | Consequence of Violation |
|---|---|---|
| Determinism | Same input always yields same output | Records become unfindable after storage |
| Uniformity | All outputs equally probable | Bucket hotspots cause overflow chains |
| Efficiency | Computation completes in O(key_length) | Hash computation becomes bottleneck |
| Sensitivity | Small input changes → large output changes | Similar keys cluster in same buckets |
| Completeness | Uses all input bits in computation | Unused bits create unnecessary collisions |
A desirable property for high-quality hash functions: changing a single bit in the input should change approximately half of the output bits, so that similar keys land in unrelated buckets.
The avalanche effect quantifies how well the hash function "mixes" input bits. Poor mixing leads to clustering; good mixing leads to pseudo-random distribution.
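A simple way to check this empirically is to flip each input bit and count how many output bits change. A minimal sketch, assuming the `murmurhash3_32` function shown later on this page and the GCC/Clang `__builtin_popcount` intrinsic:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

uint32_t murmurhash3_32(const void *key, size_t len, uint32_t seed); /* defined below */

/* Count differing output bits when each input bit of a 64-bit key
 * is flipped. Ideal avalanche: average near 16 of 32 output bits. */
void avalanche_test(uint64_t key) {
    uint32_t base = murmurhash3_32(&key, sizeof(key), 0);
    int total = 0;
    for (int bit = 0; bit < 64; bit++) {
        uint64_t flipped = key ^ (1ULL << bit);
        uint32_t h = murmurhash3_32(&flipped, sizeof(flipped), 0);
        total += __builtin_popcount(base ^ h); /* differing output bits */
    }
    printf("avg output bits changed: %.2f (ideal: 16)\n", total / 64.0);
}
```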
Unlike cryptographic hashing where security justifies computational expense, database hash functions must be fast. Hash computation occurs on every index operation: each insert, lookup, update, and delete recomputes the hash of the search key.
For a table processing 100,000 queries per second, even a 1μs hash computation adds 100ms of aggregate CPU time per second—a significant overhead.
Do NOT use cryptographic hash functions (SHA-256, MD5) for database indexing. They're designed for security, not speed. SHA-256 takes ~30-100 CPU cycles per byte; database hash functions aim for <1 cycle per byte. The collision resistance of crypto hashes is irrelevant for indexing—we handle collisions explicitly.
Several hash function families have proven effective for database indexing. Understanding their characteristics helps in selecting the right function for specific workloads.
The simplest approach: divide the key by the number of buckets and use the remainder.
h(k) = k mod N
For integer keys, this is extremely fast (a single CPU instruction) but has limitations: the distribution is entirely at the mercy of N. If N is a power of 2, the result uses only the key's low-order bits, and keys that share a common factor with N map to only a fraction of the buckets.
Mitigation: Choose N as a prime number not close to a power of 2. For example, if you need ~1000 buckets, use 1009 (prime) rather than 1024 (power of 2).
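To see the failure mode concretely, consider keys that arrive in strides of 4, a common pattern with aligned or batched IDs. A small demonstration, assuming a power-of-2 bucket count:

```c
#include <stdio.h>

/* With N = 1024 and keys that are multiples of 4, only every 4th
 * bucket is ever used: 75% of the table sits empty. */
int main(void) {
    for (int i = 0; i < 8; i++) {
        int key = 4 * i;
        printf("key=%d -> bucket=%d\n", key, key % 1024);
    }
    /* Output: buckets 0, 4, 8, 12, ... and never 1, 2, 3, 5, ... */
    return 0;
}
```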
```c
/* Division method hash - simplest form */
#include <stdint.h>
#include <stdio.h>

#define BUCKET_COUNT 1009 /* Prime near 1000 */

uint32_t hash_division(uint64_t key, uint32_t bucket_count) {
    return (uint32_t)(key % bucket_count);
}
/* Example usage: uint32_t bucket = hash_division(user_id, BUCKET_COUNT); */

/* Problem demonstration: sequential keys */
int main(void) {
    for (int i = 0; i < 10; i++) {
        printf("key=%d -> bucket=%u\n", i, hash_division(i, BUCKET_COUNT));
    }
    /* Output: 0,1,2,3,4,5,6,7,8,9 - no mixing! */
    return 0;
}
```

The multiplication method addresses the weaknesses of division by introducing a mixing multiply:
h(k) = ⌊N × (k × A mod 1)⌋
Where A is a carefully chosen constant. Knuth recommends A = (√5 - 1) / 2 ≈ 0.618033988... (the reciprocal of the golden ratio).
The multiplication method breaks up sequential key patterns, and when N is a power of 2 the bucket index is computed with just a multiply and a shift, with no division instruction required.
```c
/* Knuth's multiplication method */
#include <stdint.h>
#include <stdio.h>

/* Golden ratio constant in fixed-point representation */
/* A = (sqrt(5) - 1) / 2 ≈ 0.618033988... */
/* As 32-bit fixed point: A * 2^32 = 2654435769 */
#define KNUTH_MULTIPLIER 2654435769UL

/* Example: 1024 buckets (2^10) */
#define LOG2_BUCKETS 10

uint32_t hash_multiplication(uint64_t key, uint32_t log2_buckets) {
    /* Multiply key by golden ratio constant */
    uint32_t mixed = (uint32_t)(key * KNUTH_MULTIPLIER);

    /* Extract high-order bits for bucket index */
    /* log2_buckets = log2(N), assumes N is power of 2 */
    return mixed >> (32 - log2_buckets);
}
/* Example usage: uint32_t bucket = hash_multiplication(user_id, LOG2_BUCKETS); */

/* Sequential keys now distribute better */
int main(void) {
    for (int i = 0; i < 10; i++) {
        printf("key=%d -> bucket=%u\n", i,
               hash_multiplication(i, LOG2_BUCKETS));
    }
    /* Output: varied distribution, not sequential */
    return 0;
}
```

For variable-length keys like strings, a common approach treats the string as a polynomial:
h(s) = (s[0] × p^(n-1) + s[1] × p^(n-2) + ... + s[n-1]) mod N
Where p is a prime base (commonly 31, 37, or 53 for alphanumeric strings) and s[i] are character codes.
Horner's method evaluates this efficiently in O(n) time:
```
h = 0
for each character c in string:
    h = (h × p + c) mod N
```
```c
/* Polynomial rolling hash for strings */
#include <stdint.h>
#include <stddef.h>

uint32_t hash_polynomial(const char *str, size_t len, uint32_t bucket_count) {
    const uint64_t BASE = 31; /* Prime base for mixing */
    uint64_t hash = 0;

    for (size_t i = 0; i < len; i++) {
        /* Intermediate results wrap mod 2^64, implicit for uint64_t */
        hash = hash * BASE + (unsigned char)str[i];
    }

    return (uint32_t)(hash % bucket_count);
}

/* Improved version: reduce modulo a large prime, mix at end */
uint32_t hash_polynomial_improved(const char *str, size_t len,
                                  uint32_t bucket_count) {
    const uint64_t BASE = 31;
    const uint64_t MOD = 0x1FFFFFFFFFFFFFFFULL; /* Mersenne prime 2^61 - 1 */
    uint64_t hash = 0;

    for (size_t i = 0; i < len; i++) {
        /* 128-bit intermediate (GCC/Clang extension) avoids overflow
         * before the reduction */
        hash = (uint64_t)(((__uint128_t)hash * BASE
                           + (unsigned char)str[i]) % MOD);
    }

    /* Apply secondary mixing before modulo by bucket count */
    hash ^= (hash >> 33);
    hash *= 0xff51afd7ed558ccdULL;
    hash ^= (hash >> 33);

    return (uint32_t)(hash % bucket_count);
}
```

String hash functions must handle: (1) Case sensitivity (should 'ABC' = 'abc'?), (2) Character encoding (UTF-8 vs. UTF-16), (3) Locale-specific comparisons, (4) Trailing spaces. These choices must match the equality semantics of the index: a hash that ignores case must correspond to case-insensitive lookups, as the sketch below illustrates.
Modern database systems employ sophisticated hash functions designed for both distribution quality and computational efficiency. These functions leverage CPU features like pipelining, SIMD instructions, and hardware multipliers.
Developed by Austin Appleby, MurmurHash3 is widely used in databases (including Apache Cassandra, Redis, and many others). Its key characteristics: strong avalanche behavior, throughput of several GB/s on modern hardware, and both 32-bit and 128-bit output variants.
```c
/* MurmurHash3 (32-bit variant) */
#include <stdint.h>
#include <stddef.h>

/* MurmurHash3 finalization mix (32-bit) */
static inline uint32_t fmix32(uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

/* MurmurHash3 for 32-bit output */
uint32_t murmurhash3_32(const void *key, size_t len, uint32_t seed) {
    const uint8_t *data = (const uint8_t*)key;
    const int nblocks = len / 4;

    uint32_t h1 = seed;
    const uint32_t c1 = 0xcc9e2d51;
    const uint32_t c2 = 0x1b873593;

    /* Body: process 4-byte blocks */
    const uint32_t *blocks = (const uint32_t*)(data);
    for (int i = 0; i < nblocks; i++) {
        uint32_t k1 = blocks[i];
        k1 *= c1;
        k1 = (k1 << 15) | (k1 >> 17); /* ROTL32 */
        k1 *= c2;

        h1 ^= k1;
        h1 = (h1 << 13) | (h1 >> 19);
        h1 = h1 * 5 + 0xe6546b64;
    }

    /* Tail: process remaining bytes */
    const uint8_t *tail = data + nblocks * 4;
    uint32_t k1 = 0;
    switch (len & 3) {
        case 3: k1 ^= tail[2] << 16; /* fall through */
        case 2: k1 ^= tail[1] << 8;  /* fall through */
        case 1: k1 ^= tail[0];
                k1 *= c1;
                k1 = (k1 << 15) | (k1 >> 17);
                k1 *= c2;
                h1 ^= k1;
    }

    /* Finalization */
    h1 ^= len;
    h1 = fmix32(h1);
    return h1;
}
```

Developed by Yann Collet (creator of LZ4 compression), xxHash is one of the fastest non-cryptographic hash functions available.
xxHash achieves its speed by processing the input in several independent accumulator lanes, which keeps the CPU's instruction pipelines full, combined with wide 64-bit multiplies and cheap rotate-based mixing; the simplified sketch below illustrates the lane idea.
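A simplified illustration of the multi-lane technique (not the real xxHash algorithm: the actual seeding, lane merging, and tail handling differ, though the per-lane round and the prime constants follow xxHash64's published scheme):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative only: four independent lanes let the CPU execute
 * four multiply/rotate chains in parallel instead of one serial
 * dependency chain. */
uint64_t multilane_sketch(const uint64_t *words, size_t nwords) {
    const uint64_t P1 = 0x9E3779B185EBCA87ULL; /* xxHash64 prime 1 */
    const uint64_t P2 = 0xC2B2AE3D27D4EB4FULL; /* xxHash64 prime 2 */
    uint64_t lane[4] = { P1 + P2, P2, 0, (uint64_t)0 - P1 }; /* seed = 0 */

    for (size_t i = 0; i + 4 <= nwords; i += 4) {
        for (int j = 0; j < 4; j++) {
            lane[j] += words[i + j] * P2;
            lane[j] = (lane[j] << 31) | (lane[j] >> 33); /* ROTL64 */
            lane[j] *= P1;
        }
    }
    /* Real xxHash merges lanes carefully and processes the tail;
     * omitted here for brevity */
    return lane[0] ^ lane[1] ^ lane[2] ^ lane[3];
}
```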
| Hash Function | Throughput (GB/s) | Quality Score | Use Case |
|---|---|---|---|
| Division (mod) | 50+ | Poor | Integer keys only, prime N required |
| Polynomial | 0.5-2 | Medium | Strings, compatible with range queries |
| FNV-1a | 2-3 | Good | Simple, good for short keys |
| MurmurHash3 | 3-4 | Excellent | General purpose, widely adopted |
| xxHash64 | 10-15 | Excellent | High-throughput systems |
| CityHash | 5-10 | Excellent | Google's hash, good for strings |
| SHA-256 | 0.3-0.5 | Cryptographic | NEVER use for indexing |
For most database indexing scenarios, MurmurHash3 or xxHash64 are strong default choices: both offer excellent distribution, high speed, and years of production testing. Use xxHash for maximum throughput; use MurmurHash3 for broader ecosystem compatibility.
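In practice, wiring one of these functions into a hash index is a one-liner. A sketch, assuming the `murmurhash3_32` implementation above:

```c
#include <stdint.h>
#include <stddef.h>

uint32_t murmurhash3_32(const void *key, size_t len, uint32_t seed); /* from above */

/* Map a key to a bucket: hash first, then reduce to the bucket range.
 * Because the hash output is already well-mixed, reducing by any N is
 * safe here, even a power of 2 (unlike the raw division method). */
uint32_t key_to_bucket(const void *key, size_t len,
                       uint32_t seed, uint32_t num_buckets) {
    return murmurhash3_32(key, len, seed) % num_buckets;
}
```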
Real-world database indexes often use composite keys—keys composed of multiple columns (e.g., (user_id, timestamp) or (category, product_id)). Hashing composite keys requires combining individual column hashes into a single bucket identifier.
The simplest method concatenates column values and hashes the result:
composite_key = column1 || column2 || ... || columnN
bucket = hash(composite_key) mod N
This works but has drawbacks: every column must be copied into a contiguous buffer, and variable-length columns make the encoding ambiguous unless lengths or separators are included: ('ab', 'c') and ('a', 'bc') concatenate to the same bytes, as the sketch below shows.
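A minimal sketch of disambiguating concatenation with length prefixes (the two-column signature and the final hash call are illustrative):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Length-prefix each column so ('ab','c') and ('a','bc') serialize
 * differently. Returns the number of bytes written into buf, which
 * the caller then hashes, e.g. murmurhash3_32(buf, n, seed). */
size_t serialize_columns(uint8_t *buf, const char *c1, const char *c2) {
    size_t n1 = strlen(c1), n2 = strlen(c2), off = 0;
    uint32_t len;

    len = (uint32_t)n1;
    memcpy(buf + off, &len, 4); off += 4; /* 4-byte length prefix */
    memcpy(buf + off, c1, n1);  off += n1;

    len = (uint32_t)n2;
    memcpy(buf + off, &len, 4); off += 4;
    memcpy(buf + off, c2, n2);  off += n2;
    return off;
}
```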
Production systems typically hash each column independently and combine the results:
```c
#include <stdint.h>
#include <stddef.h>

/* Assumed 64-bit MurmurHash3 variant */
uint64_t murmurhash3_64(const void *key, size_t len, uint64_t seed);

/* Method 1: XOR combination (simple but weak) */
uint64_t hash_composite_xor(uint64_t h1, uint64_t h2) {
    return h1 ^ h2;
}
/* Problem: XOR is commutative, so (a,b) = (b,a) */
/* Problem: XOR of same values is 0 */

/* Method 2: Polynomial combination (better) */
uint64_t hash_composite_poly(uint64_t h1, uint64_t h2) {
    const uint64_t PRIME = 31;
    return h1 * PRIME + h2;
}
/* Preserves order: (a,b) ≠ (b,a) */
/* Still has collision patterns */

/* Method 3: Boost-style hash combine (widely used) */
uint64_t hash_combine(uint64_t seed, uint64_t value) {
    /* Magic constant derived from golden ratio */
    return seed ^ (value + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2));
}

/* Usage for composite key (user_id, timestamp) */
uint64_t hash_user_timestamp(uint64_t user_id, uint64_t timestamp,
                             uint32_t bucket_count) {
    uint64_t h = 0;
    h = hash_combine(h, murmurhash3_64(&user_id, sizeof(user_id), 0));
    h = hash_combine(h, murmurhash3_64(&timestamp, sizeof(timestamp), 0));
    return h % bucket_count;
}

/* Method 4: Murmur-style mixing (excellent distribution) */
uint64_t hash_composite_murmur(const void **columns, const size_t *lengths,
                               int num_columns, uint32_t seed) {
    uint64_t h = seed;
    for (int i = 0; i < num_columns; i++) {
        /* Seed each column hash with the running hash */
        uint64_t col_hash = murmurhash3_64(columns[i], lengths[i], h);
        /* Mix with rotation and multiplication */
        h = ((h << 5) | (h >> 59)) ^ col_hash;
        h *= 0xc4ceb9fe1a85ec53ULL;
    }
    /* Final mixing */
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    return h;
}
```

When columns can be NULL, hash functions must handle this consistently. Common options: (1) map NULL to a fixed sentinel hash value, (2) skip NULL columns entirely when combining, or (3) hash an explicit per-column null-indicator byte before the column value.
Most SQL databases use option 1 or 3. The choice affects query planning—NULL = NULL evaluates to UNKNOWN in SQL, which has implications for hash join probes.
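A sketch of option 3, hashing a null-indicator byte ahead of each column (`hash_combine` is the helper defined above; `murmurhash3_64` is the assumed 64-bit variant):

```c
#include <stdint.h>
#include <stddef.h>

uint64_t hash_combine(uint64_t seed, uint64_t value);                /* from above */
uint64_t murmurhash3_64(const void *key, size_t len, uint64_t seed); /* assumed */

/* Hash one possibly-NULL column into a running combined hash.
 * The indicator byte ensures (NULL, x) and (x, NULL) differ, and
 * that a NULL never collides with a value whose hash happens to
 * equal a sentinel. */
uint64_t hash_nullable_column(uint64_t h, const void *value, size_t len) {
    uint8_t is_null = (value == NULL);
    h = hash_combine(h, is_null);
    if (!is_null) {
        h = hash_combine(h, murmurhash3_64(value, len, 0));
    }
    return h;
}
```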
When combining column hashes, the combination function should NOT be commutative. The keys (user_id=1, product_id=2) and (user_id=2, product_id=1) must hash to different buckets. XOR-only combination violates this. Always use order-preserving combinations like polynomial or Boost-style hash_combine.
How do you know if your hash function is performing well? Production database systems should monitor hash index quality and alert when distribution becomes pathological.
The chi-square statistic measures deviation from expected uniform distribution:
χ² = Σ (observed_i - expected)² / expected
For a hash index with N buckets and n records, expected records per bucket = n/N.
Interpretation: a χ² close to the degrees of freedom (N - 1) indicates a healthy, uniform distribution; a value far above it signals clustering; a value suspiciously below it usually means the test data is artificial or the counting code is buggy.
```c
/* Chi-square test for hash distribution quality */
#include <stdint.h>
#include <stdio.h>

double compute_chi_square(const uint32_t *bucket_counts,
                          uint32_t num_buckets,
                          uint64_t total_records) {
    double expected = (double)total_records / num_buckets;
    double chi_square = 0.0;

    for (uint32_t i = 0; i < num_buckets; i++) {
        double diff = bucket_counts[i] - expected;
        chi_square += (diff * diff) / expected;
    }
    return chi_square;
}

/* Evaluate hash function quality */
void evaluate_hash_quality(const uint32_t *bucket_counts,
                           uint32_t num_buckets,
                           uint64_t total_records) {
    double chi_sq = compute_chi_square(bucket_counts, num_buckets,
                                       total_records);
    double expected_chi_sq = num_buckets - 1; /* degrees of freedom */
    double ratio = chi_sq / expected_chi_sq;

    printf("Chi-square statistic: %.2f\n", chi_sq);
    printf("Expected (uniform):   %.2f\n", expected_chi_sq);
    printf("Ratio:                %.2f\n", ratio);

    /* Check the most severe condition first */
    if (ratio > 2.0) {
        printf("CRITICAL: Severe clustering, consider different hash function\n");
    } else if (ratio > 1.5) {
        printf("WARNING: Significant non-uniformity detected\n");
    } else if (ratio < 0.9) {
        printf("WARNING: Suspiciously uniform (check for bugs)\n");
    } else {
        printf("OK: Distribution within acceptable range\n");
    }
}
```

Beyond chi-square, database administrators should monitor:
| Metric | Healthy Value | Warning Threshold | Action if Exceeded |
|---|---|---|---|
| Max bucket load / Average load | < 3 | 5 | Check for key patterns; consider rehash |
| Empty bucket % | 20-40% (at 70% fill) | 60% | Index oversized; consider rebuild |
| Overflow chain % | < 5% | 15% | Index undersized; consider rebuild |
| Avg overflow chain length | < 2 | 4 | Severe clustering; change hash function |
| Longest overflow chain | < 10 | 20 | Investigate specific key patterns |
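A sketch of computing the first two metrics from a bucket-count array (the warning thresholds mirror the table above):

```c
#include <stdint.h>
#include <stdio.h>

/* Report max/average load ratio and empty-bucket percentage */
void report_load_metrics(const uint32_t *bucket_counts,
                         uint32_t num_buckets, uint64_t total_records) {
    uint32_t max_load = 0, empty = 0;
    for (uint32_t i = 0; i < num_buckets; i++) {
        if (bucket_counts[i] > max_load) max_load = bucket_counts[i];
        if (bucket_counts[i] == 0) empty++;
    }
    double avg_load = (double)total_records / num_buckets;
    printf("max/avg load ratio: %.2f (warn above 5)\n", max_load / avg_load);
    printf("empty buckets:      %.1f%% (warn above 60%%)\n",
           100.0 * empty / num_buckets);
}
```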
Always test hash functions with REALISTIC data, not random inputs. Hash functions that perform well on random data may fail spectacularly on production patterns like sequential IDs, timestamps with second granularity, or URLs with common prefixes. Extract a sample of production keys and measure actual distribution.
Many hash functions accept a seed parameter that modifies their output. Understanding how to use seeds effectively can improve distribution and provide defense against certain attack vectors.
Breaking systematic patterns: If a hash function clusters on certain key patterns, a different seed may break that clustering.
Universal hashing: Randomly selecting a hash function from a family provides probabilistic guarantees about worst-case performance.
Hash DoS mitigation: Adversaries who know your hash function can craft keys that all collide. Random seeds prevent this (more relevant for hash tables than database indexes, but still applicable).
A hash function family H is universal if for any two distinct keys k₁ and k₂:
P[h(k₁) = h(k₂)] ≤ 1/N when h is chosen uniformly at random from H.
With universal hashing, the expected number of collisions for any key is bounded, regardless of the key distribution.
```c
/* Simple universal hashing family for integers */
/* h(k) = ((a * k + b) mod p) mod N */
/* where p is prime > key range, a ∈ [1, p-1], b ∈ [0, p-1] */
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t a;           /* Multiplier, must be non-zero */
    uint64_t b;           /* Offset */
    uint64_t prime;       /* Large prime, > max key value */
    uint32_t num_buckets;
} UniversalHashParams;

/* Initialize with random parameters */
UniversalHashParams universal_hash_init(uint32_t num_buckets, uint64_t seed) {
    /* Use seed to generate random a and b */
    /* Note: lrand48() yields 31-bit values, so a and b cover only
       part of [0, p-1]; a full-width PRNG would be stronger */
    srand48(seed);
    UniversalHashParams params;
    params.prime = 2305843009213693951ULL; /* Mersenne prime 2^61 - 1 */
    params.num_buckets = num_buckets;
    params.a = (lrand48() % (params.prime - 1)) + 1; /* Non-zero */
    params.b = lrand48() % params.prime;
    return params;
}

/* Universal hash function */
uint32_t universal_hash(const UniversalHashParams *p, uint64_t key) {
    /* Note: keys must be below p->prime for the universality guarantee */
    __uint128_t temp = (__uint128_t)p->a * key + p->b;
    uint64_t h = temp % p->prime;
    return (uint32_t)(h % p->num_buckets);
}

/* Usage: different seeds create different hash functions:
 *   UniversalHashParams h1 = universal_hash_init(1009, 12345);
 *   UniversalHashParams h2 = universal_hash_init(1009, 67890);
 * h1 and h2 will distribute the same keys differently. */
```

Once a seed is chosen for a hash index, it MUST be stored persistently in the index metadata. If the seed changes (even due to a software upgrade), all previously stored records become unfindable since their bucket locations would change.
For database hash indexes:
Generate seed at index creation: Use a cryptographically secure random number generator (e.g., /dev/urandom, CryptGenRandom).
Store seed in index metadata: Include it in the index header page, replicated in the system catalog.
Use seed consistently: All hash computations for that index must use the same seed.
Consider reproducibility: For testing and debugging, allow deterministic seed specification.
```sql
-- Hypothetical syntax for seed specification
CREATE HASH INDEX idx_users_email ON users(email)
WITH (hash_seed = 0xDEADBEEF, buckets = 10007);
```
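A sketch of what persisting the seed might look like in an index header page (the field layout is hypothetical, not any particular database's on-disk format):

```c
#include <stdint.h>

/* Hypothetical index header page layout: the seed lives alongside
 * the other parameters every hash computation depends on. */
typedef struct {
    uint32_t magic;        /* identifies the page as an index header */
    uint32_t version;      /* on-disk format version */
    uint64_t hash_seed;    /* fixed at CREATE INDEX, never changes */
    uint32_t num_buckets;  /* bucket count at creation */
    uint32_t hash_func_id; /* which hash algorithm this index uses */
} HashIndexHeader;
```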
Implementing hash functions for production database systems requires attention to correctness, performance, and platform compatibility.
```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <math.h>

uint64_t murmurhash3_64(const void *key, size_t len, uint64_t seed); /* assumed */

/* Safe 32-bit integer read, handles unaligned access and endianness */
static inline uint32_t read_u32(const uint8_t *p) {
#if defined(__GNUC__) && defined(__x86_64__)
    /* x86 allows unaligned access; use memcpy for strict aliasing */
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v; /* x86 is little-endian */
#else
    /* Portable: byte-at-a-time, little-endian */
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
#endif
}

/* Safe double hashing - normalize representation */
uint64_t hash_double(double value) {
    /* Handle special cases */
    if (isnan(value)) {
        return 0x7FF8000000000000ULL; /* Canonical NaN */
    }
    if (value == 0.0) {
        return 0; /* Both +0.0 and -0.0 hash the same */
    }

    /* Use bit representation for normal values */
    uint64_t bits;
    memcpy(&bits, &value, sizeof(bits));
    return murmurhash3_64(&bits, sizeof(bits), 0);
}
```

Hash function performance varies significantly across CPU architectures. A function that's fast on Intel may be slower on ARM. Always benchmark on your production hardware. Cloud deployments should test on the specific instance types used.
We have explored the critical role of hash functions in database indexing—the algorithms that transform keys into bucket locations and determine the effectiveness of the entire hash index structure.
What's Next:
With bucket architecture and hash functions established, we now face an inevitable reality: collisions. The next page examines overflow handling—what happens when multiple keys hash to the same bucket and the bucket fills beyond capacity.
You now understand the principles and practice of hash function design for database indexing. This knowledge enables you to select appropriate hash functions, evaluate their quality in production, and troubleshoot distribution problems. Next, we'll examine how to handle the collisions that even the best hash functions cannot prevent.