Before a digital signature can be created, a remarkable transformation must occur. The message to be signed—whether a 100-character text or a 100-gigabyte file—must be reduced to a fixed-size 'fingerprint' that uniquely represents that message. This fingerprint must be irreversible (you cannot reconstruct the original from it), deterministic (the same input always produces the same output), and collision-resistant (it's practically impossible to find two different inputs that produce the same fingerprint).
This seemingly impossible task is performed by cryptographic hash functions—mathematical engines that are deceptively simple in concept yet extraordinarily powerful in their security properties. They form the invisible foundation upon which the entire edifice of digital signatures rests.
Understanding hash functions isn't merely academic. Weaknesses in hash algorithms have broken real-world security systems. The transition from MD5 to SHA-1 to SHA-256 represents an ongoing evolution driven by attacks and mathematical advances. As a security practitioner, understanding hash functions means understanding the bedrock upon which countless protocols depend.
By the end of this page, you will deeply understand what cryptographic hash functions are, their essential properties (preimage resistance, second preimage resistance, collision resistance), how they transform arbitrary data into fixed-size digests, the major hash algorithms (MD5, SHA family, SHA-3), and why hash function security is critical for digital signature integrity. You'll gain the knowledge to evaluate hash algorithm choices in security protocols.
At its most fundamental, a hash function is a mathematical function that takes an input of arbitrary length and produces an output of fixed length. Think of it as a compression algorithm—but one designed for security rather than reversibility.
Formal Definition: A hash function H maps input strings of any length to output strings of fixed length n:
H: {0,1}* → {0,1}ⁿ
Where {0,1}* represents all possible bit strings of any length (including empty), and {0,1}ⁿ represents all bit strings of exactly n bits.
The Essence of Hashing: When you hash data, you're creating a digest—a compact, fixed-size representation of potentially massive input data. A SHA-256 hash of a novel, for instance, will be exactly 256 bits (32 bytes) regardless of whether the novel is 100 pages or 1,000 pages. That same 256-bit output format applies equally to a single character, an operating system image, or the entire contents of a database.
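This fixed-size property is easy to observe directly. The sketch below, using Python's standard `hashlib` module, hashes inputs ranging from one byte to ten megabytes and shows that SHA-256 always emits exactly 32 bytes:

```python
import hashlib

# Inputs of wildly different sizes...
inputs = [
    b"A",               # 1 byte
    b"Hello, World!",   # 13 bytes
    b"x" * 10_000_000,  # 10 MB of data
]

for data in inputs:
    digest = hashlib.sha256(data).digest()
    # ...all produce exactly 32 bytes (256 bits) of output
    print(f"{len(data):>10} bytes in -> {len(digest)} bytes out: {digest.hex()[:16]}...")
```

The same pattern holds for every algorithm discussed on this page; only the fixed output length differs.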
The Pigeonhole Principle: Here's an immediate mathematical reality: since hash outputs are fixed size, there are a finite number of possible outputs (2ⁿ for an n-bit hash). But the set of possible inputs is infinite. By the pigeonhole principle, collisions must exist—different inputs that produce identical outputs.
This isn't a flaw; it's mathematically inevitable. The security property isn't that collisions don't exist, but that finding them is computationally infeasible.
Unlike encryption, hashing is intentionally irreversible. Encryption transforms data so it can later be recovered with a key. Hashing destroys information deliberately—you cannot 'unhash' a hash. This one-way property is essential for security applications like password storage, data integrity verification, and digital signatures.
Cryptographic hash functions must satisfy three rigorous security properties. Understanding these properties—and the distinctions between them—is essential for evaluating hash function security and its implications for digital signatures.
Property 1: Preimage Resistance (One-Wayness)
Given a hash output h, it should be computationally infeasible to find any input m such that H(m) = h.
This is the fundamental one-way property. If you're given a hash value, you cannot reverse-engineer what was hashed. For an n-bit hash, the expected effort to find a preimage is 2ⁿ hash computations (brute force).
Why it matters for signatures: If an attacker could find preimages, they could create messages that produce any desired hash, enabling signature forgery.
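To make the 2ⁿ cost concrete, here is a toy brute-force preimage search against a deliberately weakened 16-bit "hash" (just the first two bytes of SHA-256; the truncation is an assumption for demonstration only). Note that the search succeeds once it finds *any* input mapping to the target, not necessarily the original message:

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """A deliberately weak 16-bit 'hash': the first two bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

target = tiny_hash(b"secret message")

# Brute-force preimage search: try candidate inputs until one hashes to the
# target. For a 16-bit output this takes ~2**16 tries on average; for
# SHA-256's full 256 bits the same attack needs ~2**256 tries -- infeasible.
attempts = 0
candidate = None
for i in range(2**20):
    attempts += 1
    candidate = str(i).encode()
    if tiny_hash(candidate) == target:
        break

print(f"Found a preimage after {attempts} attempts: {candidate!r}")
```

Doubling the output length squares the work factor, which is why full-length outputs put this attack permanently out of reach.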
Property 2: Second Preimage Resistance (Weak Collision Resistance)
Given an input m₁, it should be computationally infeasible to find a different input m₂ such that H(m₁) = H(m₂).
This is stronger than preimage resistance—you're given a specific message and must find a collision with it. For an n-bit hash, the expected effort is still 2ⁿ hash computations.
Why it matters for signatures: If an attacker obtains a legitimately signed document, they shouldn't be able to create a different document with the same hash (and thus the same valid signature).
Property 3: Collision Resistance (Strong Collision Resistance)
It should be computationally infeasible to find any two different inputs m₁ and m₂ such that H(m₁) = H(m₂).
Note the key difference from second preimage resistance: the attacker has complete freedom to choose both messages. This is significantly easier to attack due to the birthday paradox—the expected effort is only 2^(n/2) hash computations.
| Property | Given | Find | Difficulty (n-bit hash) | Implication if Broken |
|---|---|---|---|---|
| Preimage Resistance | Hash h | Any message m where H(m) = h | 2ⁿ operations | Can forge any signature |
| Second Preimage Resistance | Message m₁ | Different m₂ where H(m₁) = H(m₂) | 2ⁿ operations | Can forge existing signatures |
| Collision Resistance | Nothing (free choice) | Any m₁, m₂ where H(m₁) = H(m₂) | 2^(n/2) operations | Can create fraudulent signed pairs |
Collision resistance is fundamentally weaker than preimage resistance due to the birthday paradox. In a room with 23 people, there's a >50% chance two share a birthday—far fewer than the 183 you'd expect. Similarly, after about 2^(n/2) hash computations, there's a good chance of finding a collision. This is why 256-bit hashes (128-bit collision resistance) replaced 160-bit hashes (80-bit collision resistance) as attacks improved.
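The square-root speedup can be demonstrated directly with a truncated hash. The sketch below (the 32-bit truncation is an assumption for demonstration) finds a collision in roughly 2¹⁶ samples rather than the 2³² a preimage search would need:

```python
import hashlib

def hash32(data: bytes) -> bytes:
    """First 4 bytes (32 bits) of SHA-256 -- weak on purpose."""
    return hashlib.sha256(data).digest()[:4]

# Birthday attack: remember every digest seen; any repeat is a collision.
# For a 32-bit hash we expect success after roughly 2**16 ~ 65,536 samples,
# not 2**32 -- the square-root speedup of the birthday paradox.
seen = {}
collision = None
for i in range(2**20):
    msg = f"message-{i}".encode()
    d = hash32(msg)
    if d in seen:
        collision = (seen[d], msg)
        break
    seen[d] = msg

m1, m2 = collision
print(f"Collision after {i + 1} samples:")
print(f"  {m1!r} and {m2!r} both hash to {hash32(m1).hex()}")
```

Against full SHA-256 the same strategy would need about 2¹²⁸ samples, which is why the birthday bound, not the preimage bound, dictates minimum output sizes.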
The birthday paradox is one of the most counterintuitive results in probability theory, yet it has profound implications for cryptographic hash function security. Understanding it deeply is essential for grasping why hash outputs must be sufficiently long.
The Classic Birthday Problem: How many people must be in a room before there's a 50% probability that at least two share a birthday? Intuitively, you might guess 183 (half of 365). The actual answer is just 23 people—remarkably fewer.
Why It's Not Intuitive: We're not asking whether someone shares your birthday (which would indeed require ~183 people for 50% probability). We're asking whether any pair shares a birthday. With 23 people, there are C(23,2) = 253 potential pairs, each with probability 1/365 of matching. This large number of pairs dramatically accelerates collision probability.
The Mathematical Foundation: For n possible values (like 365 days or 2²⁵⁶ hash outputs), the number of samples needed for a 50% collision probability is approximately:
k ≈ 1.1774 × √n (where 1.1774 = √(2 ln 2); for rough estimates, k ≈ √n)
For hash functions, this means a collision is expected after roughly √(2ⁿ) = 2^(n/2) computations: about 2⁶⁴ for a 128-bit hash, 2⁸⁰ for a 160-bit hash, and 2¹²⁸ for a 256-bit hash.
Implications for Hash Function Design: This square-root relationship means collision resistance is fundamentally weaker than preimage resistance. If you need 128 bits of collision resistance (currently considered secure), you need a 256-bit hash output. A 128-bit hash provides only 64 bits of collision resistance—an attack within reach of well-resourced adversaries today.
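The √n rule is easy to check numerically, both for the classic birthday problem and for hash output sizes:

```python
import math

def samples_for_half_collision(n: float) -> float:
    """k ~ 1.1774 * sqrt(n): samples needed for ~50% collision probability
    among n equally likely values (1.1774 = sqrt(2 * ln 2))."""
    return math.sqrt(2 * math.log(2)) * math.sqrt(n)

# Classic birthday problem: 365 days -> about 23 people.
print(f"365 days     -> {samples_for_half_collision(365):.1f} samples")  # ~22.5

# Hash functions: collision effort is the square root of the output space.
for bits in (128, 160, 256):
    k = samples_for_half_collision(2.0 ** bits)
    print(f"{bits}-bit hash -> ~2^{math.log2(k):.1f} hash computations")
```

The computed exponents (roughly 64, 80, and 128) are exactly the collision-resistance levels quoted throughout this page.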
Cryptographers apply a security margin—choosing hash sizes larger than the absolute minimum. While 2¹²⁸ operations seems unfeasible today, increases in computing power, algorithmic improvements, and potential quantum attacks motivate using 256+ bit hashes. SHA-256 providing 128-bit collision resistance includes a generous margin over currently feasible attacks.
One of the most fascinating properties of cryptographic hash functions is the avalanche effect—the phenomenon where tiny changes in input create massive, unpredictable changes in output. This property is essential for security and is deliberately engineered into hash function designs.
The Strict Avalanche Criterion: Formally, a hash function satisfies the strict avalanche criterion if, when any single input bit is flipped, each output bit changes with approximately 50% probability. In practice, changing one bit of input should change roughly half of the output bits, and the pattern of changed bits should appear random.
Why Avalanche Matters:
Unpredictability: If small input changes caused small output changes, attackers could iteratively modify inputs to find collisions or preimages. The avalanche effect makes hash outputs appear random, thwarting such attacks.
No Structural Leakage: The output should reveal nothing about the input's structure. Two messages that are 99.99% identical should have completely independent-looking hash outputs.
Uniform Distribution: Outputs should be uniformly distributed across the hash space, regardless of input patterns. All hash values should be equally likely.
Demonstrating Avalanche: Consider hashing the strings 'Hello' and 'hello' with SHA-256. Despite differing by only one bit (the case of 'H'), the outputs are completely different:
SHA256('Hello') = 185f8db32271...
SHA256('hello') = 2cf24dba5fb...
Approximately 128 of the 256 output bits differ, matching what we'd expect from independent random outputs.
```python
import hashlib

def count_differing_bits(hash1: bytes, hash2: bytes) -> int:
    """Count the number of bits that differ between two hashes."""
    diff = 0
    for b1, b2 in zip(hash1, hash2):
        xor = b1 ^ b2
        diff += bin(xor).count('1')
    return diff

# Original message and slightly modified version
msg1 = b"Hello, World!"
msg2 = b"Hello, World?"  # Changed '!' to '?'

hash1 = hashlib.sha256(msg1).digest()
hash2 = hashlib.sha256(msg2).digest()

differing_bits = count_differing_bits(hash1, hash2)
total_bits = len(hash1) * 8

print(f"Original: {hash1.hex()}")
print(f"Modified: {hash2.hex()}")
print(f"Bits differing: {differing_bits} out of {total_bits}")
print(f"Percentage changed: {100 * differing_bits / total_bits:.1f}%")

# Output typically shows ~50% of bits differ (around 128 bits for SHA-256)
```

The avalanche effect doesn't happen by accident—it's deliberately engineered. Hash functions use operations like modular addition, bitwise XOR, bitwise rotation, and non-linear S-boxes that propagate differences aggressively. After several rounds of these operations, input differences have 'avalanched' throughout the entire state, producing the characteristic 50% bit-flip rate.
The evolution of hash algorithms reflects the ongoing battle between cryptographers and attackers. Each generation of algorithms has addressed weaknesses discovered in predecessors while preparing for anticipated future threats.
MD5 (Message Digest 5) — 1991: Designed by Ron Rivest, MD5 produces a 128-bit hash. Once ubiquitous, it's now considered cryptographically broken: practical collision attacks have existed since 2004, and colliding inputs can be generated in seconds on commodity hardware.
SHA-1 (Secure Hash Algorithm 1) — 1995: Designed by the NSA and published by NIST, SHA-1 produces a 160-bit hash. Now considered deprecated: the 2017 SHAttered attack demonstrated a practical collision, and chosen-prefix collisions followed in 2020.
SHA-2 Family — 2001: Also NSA-designed, this family includes SHA-224, SHA-256, SHA-384, and SHA-512. Currently considered secure: no practical attacks against the full algorithms are known, and SHA-256 is the default choice for most signature applications.
SHA-3 (Keccak) — 2015: Winner of NIST's hash function competition, SHA-3 uses a completely different internal design (sponge construction) from SHA-2, providing a structural hedge should weaknesses ever be found in the Merkle-Damgård family.
| Algorithm | Output Size | Security Status | Recommended Use |
|---|---|---|---|
| MD5 | 128 bits | ❌ Broken (collisions found) | Never use for security; legacy checksums only |
| SHA-1 | 160 bits | ⚠️ Deprecated (practical collisions) | Avoid; migrate existing systems |
| SHA-224 | 224 bits | ✅ Secure (truncated SHA-256) | Uncommon; use SHA-256 instead |
| SHA-256 | 256 bits | ✅ Secure (standard choice) | General-purpose security, signatures |
| SHA-384 | 384 bits | ✅ Secure (truncated SHA-512) | Higher security requirements |
| SHA-512 | 512 bits | ✅ Secure (maximum strength) | Highest security; quantum resistance margin |
| SHA3-256 | 256 bits | ✅ Secure (alternative design) | Fallback if SHA-2 compromised |
| BLAKE2/BLAKE3 | Variable | ✅ Secure (modern, fast) | High-performance applications |
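The output sizes in the table can be verified directly with Python's `hashlib`, which exposes all of these algorithms (BLAKE2's variable output is selected via its `digest_size` parameter):

```python
import hashlib

# Output sizes from the table, checked against Python's hashlib.
for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256"):
    h = hashlib.new(name, b"test")
    print(f"{name:>8}: {h.digest_size * 8:>3} bits")

# BLAKE2 supports a caller-chosen output size (1..64 bytes for blake2b).
h = hashlib.blake2b(b"test", digest_size=32)
print(f" blake2b: {h.digest_size * 8} bits (configurable)")
```

Note that availability in a library says nothing about suitability: `hashlib` ships MD5 and SHA-1 for legacy interoperability, not as an endorsement.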
Many legacy systems still use MD5 or SHA-1 for signatures. The SHAttered attack demonstrated creating two different PDF documents with the same SHA-1 hash, enabling signature transfer attacks. If you encounter systems using deprecated hashes, prioritize migration—cryptographic attacks only get better over time, never worse.
Understanding how hash functions operate internally illuminates their security properties and helps explain why certain attacks succeed or fail. Most hash functions use one of two general construction paradigms.
Merkle-Damgård Construction (MD5, SHA-1, SHA-2):
The classic approach, used by the MD and SHA families:
Message Padding: The input is padded to a multiple of the block size (typically 512 or 1024 bits), including the original message length
Initialization: An internal state (called the chaining value) is set to a fixed initialization vector (IV)
Compression Iteration: For each message block, a compression function combines the block with the current chaining value to produce the next chaining value
Finalization: The final state is the hash output
The compression function is the heart of security—it must mix inputs thoroughly to achieve avalanche and resist reversibility.
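The four steps above can be sketched as a toy Merkle-Damgård hash. Everything here is illustrative: the compression function is a made-up (and completely insecure) multiply/XOR mixer, not a real design, but the padding, chaining, and finalization mirror the genuine construction:

```python
BLOCK_SIZE = 8  # bytes; real designs use 64 or 128

def toy_compress(state: int, block: bytes) -> int:
    """A stand-in compression function (NOT secure): mixes a 64-bit state
    with one message block using multiplication and shift-XORs."""
    x = state ^ int.from_bytes(block, "big")
    for _ in range(4):
        x = (x * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF  # multiply: diffusion
        x ^= x >> 29                                        # shift-XOR: mixing
    return x

def toy_md_hash(message: bytes) -> int:
    # 1. Padding: append 0x80, zero-fill, then the original length
    #    (the length field is the "Merkle-Damgard strengthening").
    padded = message + b"\x80"
    while (len(padded) + 8) % BLOCK_SIZE != 0:
        padded += b"\x00"
    padded += len(message).to_bytes(8, "big")
    # 2. Initialization: a fixed initialization vector (IV).
    state = 0x6A09E667F3BCC908
    # 3. Compression iteration: chain the state through every block.
    for i in range(0, len(padded), BLOCK_SIZE):
        state = toy_compress(state, padded[i:i + BLOCK_SIZE])
    # 4. Finalization: the last chaining value is the digest.
    return state

print(f"{toy_md_hash(b'Hello, World!'):016x}")
print(f"{toy_md_hash(b'Hello, World?'):016x}")  # one byte changed, very different digest
```

Real compression functions (like SHA-256's 64-round function) differ enormously in strength, but the control flow is the same chain of state updates shown here.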
Sponge Construction (SHA-3/Keccak):
A newer paradigm with different security properties:
Absorption Phase: Message blocks are XORed into a portion of the internal state, then the entire state is permuted
Squeezing Phase: Output blocks are extracted from the state, with permutations between each extraction
The sponge construction naturally supports variable-length output and provides a clean theoretical security model.
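The variable-length output is directly usable from Python via the SHAKE extendable-output functions (XOFs), which are standardized alongside SHA-3; the caller simply squeezes as many bytes as needed:

```python
import hashlib

# SHAKE128/SHAKE256 are SHA-3's extendable-output functions (XOFs):
# the sponge is "squeezed" for as many bytes as the caller requests.
xof = hashlib.shake_256(b"Hello, World!")

for n in (16, 32, 64):
    print(f"{n:>2}-byte output: {xof.digest(n).hex()}")

# Shorter outputs are prefixes of longer ones for the same input --
# a direct consequence of squeezing the same sponge state.
d32 = hashlib.shake_256(b"Hello, World!").digest(32)
d64 = hashlib.shake_256(b"Hello, World!").digest(64)
print(d64[:32] == d32)
```

Fixed-output SHA3-256 behaves like a sponge squeezed exactly once for 32 bytes, with domain-separation padding distinguishing it from SHAKE.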
A subtle weakness in Merkle-Damgård construction: knowing H(m) and the length of m allows computing H(m || padding || m') without knowing m. This 'length extension attack' affects naive authentication schemes. SHA-3's sponge construction and HMAC (Hash-based MAC) avoid this issue. Modern signature schemes are not vulnerable because they don't rely on hash-only authentication.
Now we connect hash functions to their primary role in this module: enabling efficient and secure digital signatures. The partnership between hashing and signing is symbiotic—each solves problems the other cannot.
Why Sign Hashes, Not Messages?
Performance: Asymmetric operations are slow. RSA signature generation involves modular exponentiation with 2048+ bit numbers. Signing a hash (256 bits) rather than a document (potentially gigabytes) provides a massive speedup: the expensive asymmetric operation runs once on a fixed-size digest instead of scaling with document size.
Fixed Input Size: Signature algorithms expect fixed-size inputs. RSA operates on integers modulo n; ECDSA on scalars in a finite field. Hashing provides the required fixed-size input regardless of document size.
Security Uniformity: By hashing first, the signing algorithm always receives uniformly distributed input. This prevents attacks that might exploit structure in the original message.
The Signature Process With Hashing:
Signature = Sign(PrivateKey, Hash(Document))
What this achieves: integrity (changing even one bit of the document changes its hash, invalidating the signature), authenticity (only the private key holder could have produced the signature over that hash), and non-repudiation (the signer cannot later plausibly deny having signed).
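A minimal sketch of hash-then-sign, using textbook RSA with deliberately tiny primes and a truncated digest (all insecure assumptions made purely for illustration; real systems use 2048+ bit keys and padding schemes such as PSS):

```python
import hashlib

# Toy textbook-RSA parameters (tiny primes -- insecure, illustration only).
p, q = 1000003, 1000033
n = p * q
phi = (p - 1) * (q - 1)
e = 65537
d = pow(e, -1, phi)  # private exponent (modular inverse; Python 3.8+)

def toy_sign(document: bytes) -> int:
    """Hash-then-sign: the signature covers the digest, not the raw document."""
    h = int.from_bytes(hashlib.sha256(document).digest()[:4], "big")  # truncated so h < n
    return pow(h, d, n)  # apply the private key to the digest

def toy_verify(document: bytes, signature: int) -> bool:
    h = int.from_bytes(hashlib.sha256(document).digest()[:4], "big")
    return pow(signature, e, n) == h  # apply the public key, compare digests

doc = b"Pay Alice 100 dollars."
sig = toy_sign(doc)
print(toy_verify(doc, sig))                        # untampered: verifies
print(toy_verify(b"Pay Alice 900 dollars.", sig))  # tampered: hash changed, fails
```

The verifier never needs the original signing input beyond the document itself: it recomputes the hash independently, which is exactly why any modification to the document invalidates the signature.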
In 2008, researchers demonstrated issuing fraudulent SSL certificates by exploiting MD5 collisions. They created a legitimate certificate request and a rogue CA certificate with the same MD5 hash. When the CA signed the legitimate request, the signature was also valid for the rogue certificate—enabling massive MITM attacks. This real-world attack prompted the industry to deprecate MD5 for signatures.
Selecting the appropriate hash algorithm for digital signatures requires balancing security strength, performance requirements, and compatibility constraints. Here's a decision framework:
Security Requirements:
Collision Resistance Level: For 128-bit security (currently standard), use 256-bit hashes (SHA-256, SHA3-256). For 192-bit or 256-bit security levels, use SHA-384 or SHA-512.
Algorithm Family Diversity: If depending on a single algorithm family is risky (e.g., all signatures use SHA-2), consider SHA-3 for some applications as a hedge against future SHA-2 vulnerabilities.
Post-Quantum Considerations: Quantum computers don't directly break hash functions, but Grover's algorithm halves effective preimage security (SHA-256: 256 bits down to 128), and quantum collision-search algorithms can in principle reduce collision resistance to roughly 2^(n/3), or ~85 bits for SHA-256. This is still considered adequate, but SHA-384 or SHA3-384 provide margin.
Performance Considerations: SHA-256 benefits from hardware acceleration on modern CPUs (dedicated SHA instructions on recent x86 and ARM chips), while BLAKE2 and BLAKE3 are typically faster in pure software when raw throughput matters.
Compatibility: SHA-256 enjoys near-universal support across protocols, libraries, and hardware tokens; SHA-3 and BLAKE support remains uneven in older systems, so confirm that every party in a signature workflow can process the algorithm you choose.
| Use Case | Recommended Algorithm | Rationale |
|---|---|---|
| General digital signatures | SHA-256 | Standard choice; excellent security/performance balance |
| High-security documents | SHA-384 or SHA-512 | Additional security margin; future-proofing |
| FIPS compliance required | SHA-256/384/512 or SHA3 | FIPS 180-4 and 202 approved |
| Algorithm diversity needed | SHA3-256 | Different design family hedges against SHA-2 attacks |
| Performance-critical signing | BLAKE2b-256 | Faster than SHA-2; established security |
| Blockchain/cryptocurrency | SHA-256, Keccak-256 | Industry standards; proven in adversarial environments |
| Code signing | SHA-256 minimum | Long-term validity requires strong algorithms |
When in doubt, use SHA-256. It's universally supported, well-analyzed, fast on modern hardware, and provides a comfortable security margin. Only deviate for specific requirements: performance (BLAKE3), higher security (SHA-384/512), or algorithm diversity (SHA-3).
Hash functions are the unsung heroes of digital security—invisible yet indispensable. Our deep exploration has revealed their fundamental role in making digital signatures practical and secure. Let's consolidate the key insights: hash functions compress arbitrary input into fixed-size digests; their security rests on preimage, second preimage, and collision resistance; the birthday paradox halves effective collision security, dictating minimum output sizes; the avalanche effect makes outputs unpredictable; and SHA-256 remains the standard choice for signatures.
What's Next:
With hash functions fully understood, we're ready to examine the signing process itself. The next page explores how private keys and hash digests combine through mathematical operations to produce unforgeable signatures—covering RSA signatures, DSA, ECDSA, and EdDSA in detail.
You now possess deep understanding of cryptographic hash functions—their properties, algorithms, attacks, and critical role in digital signatures. This foundation prepares you to fully appreciate the signature generation process covered next.