07 — Hash Then Sign
The Principle
ML-DSA never signs a message directly. It always follows:
Message (any size)
|
v
Hash Function
|
v
Fixed-size digest (64 bytes)
|
v
Sign with private key
|
v
Signature
Why?
- Messages can be any size (megabytes, gigabytes)
- Signing algorithms work on fixed-size inputs
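- Binding the signer's key hash and an optional context string into the hash input provides domain separation (the same message under a different key or context yields a different digest)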
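The compression step in the flow above can be sketched with Python's standard `hashlib`. This is only an illustration of the "any size in, 64 bytes out" property; the helper name `digest64` is made up for this example and is not part of any ML-DSA API.

```python
import hashlib

def digest64(message: bytes) -> bytes:
    """Compress a message of any length to 64 bytes with SHAKE-256."""
    return hashlib.shake_256(message).digest(64)

# A one-byte message and a one-megabyte message both map to 64 bytes.
small = digest64(b"x")
large = digest64(b"x" * 1_000_000)
print(len(small), len(large))  # 64 64
```

Whatever the size of the input, the signing step that follows only ever sees a 64-byte value.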
The ML-DSA Hash Function: SHAKE-256
ML-DSA uses SHAKE-256 (a SHA-3 extendable-output function):
| Property | Value | Why |
|---|---|---|
| Algorithm | SHAKE-256 | Standardised in FIPS 202; no known quantum attack better than generic search |
| Output size | 64 bytes (512 bits) | Twice the security level (birthday bound protection) |
| Domain separation | tr (public key hash) + message | The same message under different keys produces different message representatives |
The Message Representation
μ = H(tr || M)
Where:
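- tr = H(pk): a 64-byte hash of the encoded public key pk, which packs ρ and t1
- M: the message to sign (FIPS 204 additionally frames it with a domain-separator byte and the context string, see Context String below)
- H: SHAKE-256 with 512-bit (64-byte) output
- ||: concatenation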
Why include tr? Without it, μ would depend only on the message, so the same message would yield the same representative under every key, and an attacker might reuse one hash computation against many signers. Including tr ties each μ to one public key, so each signer's signatures live in a separate "domain."
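A minimal sketch of this key binding, using the simplified formula μ = H(tr || M) as written above. The two "public keys" are placeholder byte strings invented for the example; real encoded ML-DSA public keys are much longer and come out of key generation.

```python
import hashlib

def H(data: bytes, out_len: int = 64) -> bytes:
    """SHAKE-256 with a 64-byte output by default."""
    return hashlib.shake_256(data).digest(out_len)

# Placeholder byte strings standing in for two encoded public keys (assumption).
pk_alice = b"alice-public-key-placeholder"
pk_bob = b"bob-public-key-placeholder"

tr_alice = H(pk_alice)  # 64-byte hash of Alice's public key
tr_bob = H(pk_bob)      # 64-byte hash of Bob's public key

message = b"pay 10 units to Carol"
mu_alice = H(tr_alice + message)  # mu = H(tr || M)
mu_bob = H(tr_bob + message)

# Same message, different keys: the representatives differ.
print(mu_alice != mu_bob)  # True
```

Because the two values of tr differ, the two signers never sign the same 64-byte value, even for identical messages.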
Message Encoding for Different Applications
Pure ML-DSA
Sign the raw message bytes:
signature = ML-DSA.Sign(sk, message_bytes)
Pre-hashed ML-DSA (ExternalMu-ML-DSA)
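If the message representative μ is computed outside the signing module (typically by the application in front of a hardware security module):
# Application computes (64-byte output):
mu = SHAKE-256(tr || message)
# The HSM signs the externally computed μ:
signature = ExternalMu-ML-DSA.Sign(sk, mu)
This allows the hashing to be offloaded: the application hashes the large message and sends only the 64-byte μ to the HSM that holds the private key. The resulting signature verifies as an ordinary ML-DSA signature. This is distinct from HashML-DSA, the pre-hash variant in FIPS 204, which signs a digest of the message under a different domain-separator byte and is not interchangeable with pure ML-DSA.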
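Whichever component computes μ, it does not need the whole message in memory, because SHAKE-256 can absorb its input incrementally. The sketch below follows the simplified formula mu = SHAKE-256(tr || message) shown above; any framing bytes the standard prescribes would be fed in between tr and the message. The function name `external_mu` and the sample inputs are hypothetical.

```python
import hashlib
from typing import Iterable

def external_mu(tr: bytes, chunks: Iterable[bytes]) -> bytes:
    """Stream tr, then the message chunks, into SHAKE-256 and return 64 bytes."""
    if len(tr) != 64:
        raise ValueError("tr must be 64 bytes")
    xof = hashlib.shake_256()
    xof.update(tr)
    for chunk in chunks:
        xof.update(chunk)  # absorb the message piece by piece
    return xof.digest(64)

# Hypothetical inputs: an all-zero tr and a message delivered in three pieces.
tr = bytes(64)
pieces = [b"first part, ", b"second part, ", b"third part"]
mu = external_mu(tr, pieces)
print(len(mu))  # 64
```

Streaming gives the same result as hashing the concatenated input in one call, so only the 64-byte value has to cross the boundary to the signer.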
Context String (ctx)
ML-DSA supports an optional context string for application-specific domain separation:
signature = ML-DSA.Sign(sk, message, ctx="TLS-1.3-certificate")
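The context is prepended to the message before hashing, framed by a domain-separator byte and a one-byte length (so ctx can be at most 255 bytes):
μ = H(tr || 0x00 || len(ctx) || ctx || message)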
Use cases:
- Separate signatures for different protocol versions
- Distinguish testnet vs. mainnet in blockchain
- Separate code signing from document signing with the same key
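A sketch of how a context string can be bound into μ, assuming the framing FIPS 204 uses for pure ML-DSA: a zero domain-separator byte, a one-byte context length, the context, then the message. The helper names `frame_message` and `message_representative` are invented for this example, and tr is an all-zero placeholder.

```python
import hashlib

def frame_message(message: bytes, ctx: bytes = b"") -> bytes:
    """Build 0x00 || len(ctx) || ctx || message (assumed pure ML-DSA framing)."""
    if len(ctx) > 255:
        raise ValueError("context string must be at most 255 bytes")
    return bytes([0x00, len(ctx)]) + ctx + message

def message_representative(tr: bytes, message: bytes, ctx: bytes = b"") -> bytes:
    """Hash tr and the framed message to a 64-byte value with SHAKE-256."""
    return hashlib.shake_256(tr + frame_message(message, ctx)).digest(64)

tr = bytes(64)  # placeholder for a real public key hash
mu_test = message_representative(tr, b"transfer 5", ctx=b"testnet")
mu_main = message_representative(tr, b"transfer 5", ctx=b"mainnet")
print(mu_test != mu_main)  # True
```

The length byte matters: without it, ctx = "ab" with message "c" and ctx = "a" with message "bc" would hash the same bytes. With it, the two framings differ.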
Why SHAKE-256 (Not SHA-256)?
| Feature | SHA-256 | SHAKE-256 |
|---|---|---|
| Output size | Fixed (256 bits) | Variable (any length) |
| Collision resistance (classical) | 128-bit | 256-bit (with 64-byte output) |
| Preimage resistance under Grover search | about 128-bit | at least 128-bit |
| Standard | SHA-2 (2001) | SHA-3 (2015) |
| Use inside ML-DSA | Not used | Specified by FIPS 204 |
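NIST does not require every PQC algorithm to use SHA-3 (SLH-DSA, for example, also defines SHA-2 parameter sets), but ML-DSA is specified on SHAKE. Commonly cited reasons:
- ML-DSA needs variable-length hash output (for example to expand the public matrix and to sample secrets), which an extendable-output function provides natively
- SHA-2 shares the Merkle-Damgård construction with MD5 and SHA-1, so a structural breakthrough against one might affect the others (though no such attack on SHA-2 is known)
- SHA-3 (Keccak) uses a completely different design (sponge construction), which is also not subject to length-extension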
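The fixed versus variable output difference is easy to see with `hashlib`. The snippet is only a demonstration of the two interfaces and makes no claim about how any ML-DSA library calls them.

```python
import hashlib

data = b"hash then sign"

# SHA-256 always returns 32 bytes; the caller cannot ask for more.
sha = hashlib.sha256(data).digest()

# SHAKE-256 returns as many bytes as requested.
short = hashlib.shake_256(data).digest(32)
long = hashlib.shake_256(data).digest(64)

print(len(sha), len(short), len(long))  # 32 32 64

# The XOF output is a stream: a shorter output is a prefix of a longer one.
print(long[:32] == short)  # True
```

The prefix property is why output length has to be fixed by the protocol (64 bytes for μ), not chosen freely by each caller.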
Collision Resistance Requirements
For a signature scheme to be secure, the hash function must provide:
| Property | Required strength | What it prevents |
|---|---|---|
| Preimage resistance | 2^256 | Attacker creates a message with a specific hash |
| Second-preimage resistance | 2^256 | Attacker swaps a signed message for another with the same hash, so the existing signature stays valid |
| Collision resistance | 2^256 | Attacker finds two messages with the same hash (forges signatures) |
SHAKE-256 with 512-bit output provides all three at the 256-bit level against classical attacks.
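The 256-bit figures follow from the generic strength bounds that FIPS 202 gives for SHAKE-256 with an output of d bits. As a worked check for d = 512:

```latex
\begin{align*}
\text{collision}       &= \min(d/2,\ 256) = \min(256,\ 256) = 256 \text{ bits} \\
\text{preimage}        &= \min(d,\ 256)   = \min(512,\ 256) = 256 \text{ bits} \\
\text{second preimage} &= \min(d,\ 256)   = \min(512,\ 256) = 256 \text{ bits}
\end{align*}
```

With only a 32-byte output (d = 256), the collision bound would drop to min(128, 256) = 128 bits, which is why the 64-byte output is used.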
Resources
- NIST FIPS 202: SHA-3 Standard (Permutation-Based Hash and Extendable-Output Functions)
- NIST FIPS 204, Section 3: Hash Functions
- RFC 9180: Hybrid Public Key Encryption (HPKE) — context string usage