07 — Hash Then Sign

The Principle

ML-DSA never signs a message directly. It always follows:

Message (any size)
      |
      v
   Hash Function
      |
      v
Fixed-size digest (64 bytes)
      |
      v
   Sign with private key
      |
      v
   Signature

Why?

Messages can be any size (megabytes, gigabytes)
Signing algorithms work on fixed-size inputs
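Hashing the message together with a key hash and an optional context string provides domain separation (digests computed for different keys or contexts cannot be made to collide in practice)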
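The fixed-size property can be checked with Python's standard `hashlib` module. This is a minimal sketch of the hashing step only, not of ML-DSA signing; the helper name `digest64` is made up for illustration.

```python
import hashlib

def digest64(message: bytes) -> bytes:
    """Hash a message of any length down to a 64-byte SHAKE-256 digest."""
    return hashlib.shake_256(message).digest(64)

short = digest64(b"hi")                 # 2-byte input
large = digest64(b"\x00" * 1_000_000)   # 1 MB input

# Both digests are 64 bytes, whatever the input size.
print(len(short), len(large))
```

Whatever the input length, the value handed to the signing step always has the same size.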

The ML-DSA Hash Function: SHAKE-256

ML-DSA uses SHAKE-256 (a SHA-3 extendable-output function):

Property	Value	Why
Algorithm	SHAKE-256	Standardised in NIST FIPS 202; the best known quantum attack on it is generic Grover search
Output size	64 bytes (512 bits)	Twice the 256-bit security target, so a birthday-bound collision search still costs about 2^256
Domain separation	tr (public key hash) + message	Binds each digest to one public key, so a digest computed for one signer cannot be reused for another
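"Extendable-output" means the caller chooses the digest length. The sketch below uses `hashlib` to show this; the input bytes are arbitrary.

```python
import hashlib

data = b"example input"

out32 = hashlib.shake_256(data).digest(32)  # request 32 bytes
out64 = hashlib.shake_256(data).digest(64)  # request 64 bytes (the length ML-DSA uses for its digests)

# An XOF produces a single output stream, so the shorter output
# is a prefix of the longer one.
is_prefix = out64[:32] == out32

# SHA-256, by contrast, always returns exactly 32 bytes.
sha_len = len(hashlib.sha256(data).digest())

print(is_prefix, sha_len)
```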

The Message Representation

μ = H(tr || M)

Where:

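tr = H(pk): a 64-byte hash of the encoded public key pk, which packs the seed ρ and the vector t1 (the high-order bits of t)
M: the message to sign (in FIPS 204 this is the formatted message M′, which also carries the context string described below)
H: SHAKE-256 with 512-bit output
||: concatenation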

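Why include tr? It binds μ, and therefore the signature, to one specific public key. Without it, μ would depend on the message alone, so a single hash collision or precomputed table could be used against every signer at once. With tr included, an attacker has to target each key separately, and it becomes harder to pass off a signature as belonging to a different public key. Each signer's signatures live in a separate "domain."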
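The key-binding effect can be sketched with `hashlib`. It assumes tr is the 64-byte SHAKE-256 hash of the encoded public key, as in FIPS 204, and uses the simplified formula μ = H(tr || M) from this section (no context prefix). The two "public keys" are placeholder byte strings, not real ML-DSA keys, and the helper names are made up.

```python
import hashlib

def H(data: bytes, out_len: int = 64) -> bytes:
    """SHAKE-256 with a 64-byte (512-bit) output by default."""
    return hashlib.shake_256(data).digest(out_len)

def message_representative(pk: bytes, message: bytes) -> bytes:
    """Simplified mu = H(tr || M), with tr = H(pk)."""
    tr = H(pk)
    return H(tr + message)

# Placeholder bytes standing in for two encoded public keys
# (1312 bytes is the ML-DSA-44 public key size; the contents are fake).
pk_alice = b"\x01" * 1312
pk_bob = b"\x02" * 1312

msg = b"pay 10 coins"
mu_alice = message_representative(pk_alice, msg)
mu_bob = message_representative(pk_bob, msg)

# Same message, different keys: the values that get signed differ.
print(mu_alice != mu_bob)
```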

Message Encoding for Different Applications

Pure ML-DSA

Sign the raw message bytes:

signature = ML-DSA.Sign(sk, message_bytes)

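External-μ ML-DSA (ExternalMu-ML-DSA)

If the message representative μ is computed outside the signing module (e.g., by the application in front of a hardware security module):

# Application computes μ (needs only the public key):
mu = SHAKE-256(tr || 0x00 || len(ctx) || ctx || message)

# Signing module signs the precomputed μ:
signature = ExternalMu-ML-DSA.Sign(sk, mu)

This allows the expensive hashing to happen outside the HSM: the host hashes large messages, then sends only the 64-byte μ to the module that holds the private key. The result is an ordinary ML-DSA signature that verifies with the standard verifier. This is not the same as HashML-DSA, the pre-hash variant in FIPS 204, which signs a digest of the message together with the hash function's identifier and produces signatures that are not interchangeable with pure ML-DSA.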
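The host-side step can be sketched as a streaming hash, so a large message never has to sit in memory at once. The function name `external_mu` and its parameters are hypothetical. The framing (a zero byte, then the one-byte context length, then the context, then the message) follows FIPS 204's pure ML-DSA message format. The private-key operation itself is not shown.

```python
import hashlib
import io
from typing import BinaryIO

def external_mu(tr: bytes, ctx: bytes, stream: BinaryIO,
                chunk_size: int = 65536) -> bytes:
    """Compute mu = SHAKE-256(tr || 0x00 || len(ctx) || ctx || M) by streaming M."""
    if len(tr) != 64:
        raise ValueError("tr must be 64 bytes")
    if len(ctx) > 255:
        raise ValueError("context string must be at most 255 bytes")
    xof = hashlib.shake_256()
    xof.update(tr)
    xof.update(bytes([0, len(ctx)]) + ctx)
    # Feed the message in chunks; the sponge absorbs incrementally.
    while chunk := stream.read(chunk_size):
        xof.update(chunk)
    return xof.digest(64)

# Placeholder tr derived from fake key bytes (not a real public key).
tr = hashlib.shake_256(b"placeholder public key").digest(64)
payload = b"A" * 200_000

mu_streamed = external_mu(tr, b"", io.BytesIO(payload), chunk_size=4096)
mu_one_shot = hashlib.shake_256(tr + b"\x00\x00" + payload).digest(64)
print(mu_streamed == mu_one_shot)
```

Only the 64-byte `mu_streamed` value would then cross the boundary to the signing module.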

Context String (ctx)

ML-DSA supports an optional context string for application-specific domain separation:

signature = ML-DSA.Sign(sk, message, ctx="TLS-1.3-certificate")

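The context (at most 255 bytes, empty by default) is prepended to the message before hashing, preceded by a zero byte and a one-byte length:

μ = H(tr || 0x00 || len(ctx) || ctx || message)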

Use cases:

Separate signatures for different protocol versions
Distinguish testnet vs. mainnet in blockchain
Separate code signing from document signing with the same key
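A short sketch of how a context string changes the value that gets signed. It assumes FIPS 204's framing for pure ML-DSA, where the message is prefixed with a zero byte, the one-byte context length, and the context itself. The helper names and the placeholder tr are made up for illustration.

```python
import hashlib

def format_message(message: bytes, ctx: bytes = b"") -> bytes:
    """Build M' = 0x00 || len(ctx) || ctx || message."""
    if len(ctx) > 255:
        raise ValueError("context string must be at most 255 bytes")
    return bytes([0, len(ctx)]) + ctx + message

def mu_with_context(tr: bytes, message: bytes, ctx: bytes = b"") -> bytes:
    """mu = SHAKE-256(tr || M') with 64 bytes of output."""
    return hashlib.shake_256(tr + format_message(message, ctx)).digest(64)

# Placeholder tr derived from fake key bytes (not a real public key).
tr = hashlib.shake_256(b"placeholder public key").digest(64)

mu_testnet = mu_with_context(tr, b"transfer", b"testnet")
mu_mainnet = mu_with_context(tr, b"transfer", b"mainnet")
print(mu_testnet != mu_mainnet)

# The length byte keeps the ctx/message boundary unambiguous:
# ("ab", "c") and ("a", "bc") encode differently.
print(format_message(b"c", b"ab") != format_message(b"bc", b"a"))
```

Because the two μ values differ, a signature made under one context does not verify under the other, even with the same key and message.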

Why SHAKE-256 (Not SHA-256)?

Feature	SHA-256	SHAKE-256
Output size	Fixed (256 bits)	Variable (any length)
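Collision resistance (classical)	128-bit	256-bit (with 512-bit output)
Preimage resistance (classical)	256-bit	256-bit (with 512-bit output)
Preimage resistance under Grover	About 128-bit	About 128-bit
Standard	SHA-2 (2001)	SHA-3 (2015)
Role in ML-DSA	Not used internally	Computes tr, μ, and the challenge hash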

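NIST does not require SHA-3 in every post-quantum standard (SLH-DSA, for example, defines both SHA-2 and SHAKE parameter sets), but ML-DSA is specified entirely on top of SHAKE. Reasons commonly given for that choice:

ML-DSA needs outputs of several lengths (64-byte digests plus long pseudorandom streams for sampling); an extendable-output function provides these directly, whereas SHA-2 would need an extra construction on top
SHA-2 uses the Merkle-Damgård construction, which allows length-extension attacks on a plain H(secret || message); SHA-3 (Keccak) uses the sponge construction, which does not
SHA-3's design is unrelated to SHA-2's, so a breakthrough against one family is unlikely to carry over to the other (no practical attack on either is known)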

Collision Resistance Requirements

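For a signature scheme to be secure, the hash function must provide:

Property	Required strength	What it prevents
Preimage resistance	2^256	Attacker creates a message with a specific hash
Second preimage resistance	2^256	Attacker swaps a signed message for a different one that keeps the same signature valid
Collision resistance	2^256	Attacker finds two messages with the same hash, so a signature on one is also valid for the other

The 2^256 figures are classical work factors and correspond to the strongest parameter set; the smaller parameter sets target lower levels. SHAKE-256 with 512-bit output provides all three at the 256-bit level: FIPS 202 rates it at min(d/2, 256) bits of collision resistance and min(d, 256) bits of preimage and second-preimage resistance for a d-bit output.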
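The dependence on output length can be made concrete with the classical security strengths that FIPS 202 gives for SHAKE-256 with a d-bit output. The function name is made up, and the numbers are the standard's generic bounds, not measurements.

```python
def shake256_strength(output_bits: int) -> dict:
    """Classical security strengths (in bits) of SHAKE-256 for a given output length."""
    return {
        "collision": min(output_bits // 2, 256),
        "preimage": min(output_bits, 256),
        "second_preimage": min(output_bits, 256),
    }

s512 = shake256_strength(512)  # the 64-byte output used for tr and mu
s256 = shake256_strength(256)  # a shorter output, for comparison

print(s512)
print(s256)
```

With a 256-bit output the collision strength drops to 128 bits, which is why the digest is 512 bits long.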

Resources

NIST FIPS 202: SHA-3 Standard (Permutation-Based Hash and Extendable-Output Functions)
NIST FIPS 204, Section 3: Hash Functions
RFC 9180: Hybrid Public Key Encryption (HPKE) — context string usage