07 — Performance

Speed Benchmarks

All times were measured on a single thread of an Intel Core i7-1165G7 at 2.8 GHz.

Operation	ML-KEM-512	ML-KEM-768	ML-KEM-1024	RSA-2048	ECDH P-256
KeyGen	45 µs	78 µs	120 µs	5,200 µs	52 µs
Encapsulate	65 µs	95 µs	145 µs	150 µs*	98 µs
Decapsulate	72 µs	108 µs	168 µs	4,800 µs	52 µs
Total handshake	182 µs	281 µs	433 µs	10,150 µs	202 µs

*For RSA, "Encapsulate" means encryption with the public key.
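Per-operation timings of this kind can be collected with a small harness. The following is a minimal sketch using only the Python standard library. The `median_time_us` helper and `stand_in_workload` are hypothetical names, and a SHAKE-128 call stands in for the operation being timed, since the standard library ships no ML-KEM implementation. The median is reported because it is less sensitive to scheduler noise than the mean.

```python
import hashlib
import statistics
import time


def median_time_us(operation, repeats=200):
    """Return the median wall-clock time of `operation` in microseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        operation()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    return statistics.median(samples)


def stand_in_workload():
    # Placeholder for KeyGen/Encapsulate/Decapsulate: expand a seed with SHAKE-128.
    hashlib.shake_128(b"seed").digest(1_184)


if __name__ == "__main__":
    print(f"median: {median_time_us(stand_in_workload):.1f} µs")
```

To time a real implementation, pass its KeyGen, Encapsulate, or Decapsulate call as `operation` and pin the process to one core.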

Size	ML-KEM-512	ML-KEM-768	ML-KEM-1024	RSA-2048	ECDH P-256
Public key	800 B	1,184 B	1,568 B	256 B	32 B
Ciphertext/Key share	768 B	1,088 B	1,568 B	256 B	32 B
Total handshake data	1,568 B	2,272 B	3,136 B	512 B	64 B
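The ML-KEM sizes in this table follow directly from the parameter sets. As a short sketch, assuming the FIPS 203 parameters (k, d_u, d_v) for each level, the public key is k polynomials of 256 coefficients at 12 bits each plus a 32-byte seed, and the ciphertext is k polynomials compressed to d_u bits per coefficient plus one compressed to d_v bits. The function names are illustrative only.

```python
# FIPS 203 parameter sets: (k, d_u, d_v)
PARAMS = {
    "ML-KEM-512": (2, 10, 4),
    "ML-KEM-768": (3, 10, 4),
    "ML-KEM-1024": (4, 11, 5),
}


def public_key_bytes(k):
    # k * 256 coefficients * 12 bits / 8 = 384k bytes, plus a 32-byte seed.
    return 384 * k + 32


def ciphertext_bytes(k, d_u, d_v):
    # 256 coefficients * (d_u * k + d_v) bits / 8.
    return 32 * (d_u * k + d_v)


def handshake_bytes(name):
    # One public key sent one way, one ciphertext sent back.
    k, d_u, d_v = PARAMS[name]
    return public_key_bytes(k) + ciphertext_bytes(k, d_u, d_v)


if __name__ == "__main__":
    for name, (k, d_u, d_v) in PARAMS.items():
        print(name, public_key_bytes(k), ciphertext_bytes(k, d_u, d_v),
              handshake_bytes(name))
```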

Protocol	Typical payload	Handshake overhead	ML-KEM-768 impact
TLS 1.3	~2–4 KB	+2.3 KB	Acceptable
HTTP/2	~500 B–20 KB	+2.3 KB	Negligible for most requests
DNS over TLS	~300 B	+2.3 KB	Significant; consider ML-KEM-512
IoT (MQTT)	~50 B	+2.3 KB	Large; consider ML-KEM-512 or FN-DSA for auth
VPN (WireGuard)	Variable	+2.3 KB per handshake	Acceptable; handshake is rare
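The impact ratings above come from comparing the extra handshake bytes with the typical payload. A minimal sketch of that arithmetic, taking the payload figures from the table and the handshake sizes as public key plus ciphertext (the helper name `overhead_ratio` is an assumption for illustration):

```python
ML_KEM_768_HANDSHAKE_BYTES = 1_184 + 1_088  # public key + ciphertext = 2,272 B
ML_KEM_512_HANDSHAKE_BYTES = 800 + 768      # public key + ciphertext = 1,568 B

# Typical payloads from the table, in bytes.
TYPICAL_PAYLOAD_BYTES = {"DNS over TLS": 300, "IoT (MQTT)": 50}


def overhead_ratio(payload_bytes, extra_bytes=ML_KEM_768_HANDSHAKE_BYTES):
    """Extra handshake bytes expressed as a multiple of the payload size."""
    return extra_bytes / payload_bytes


if __name__ == "__main__":
    for protocol, payload in TYPICAL_PAYLOAD_BYTES.items():
        r768 = overhead_ratio(payload)
        r512 = overhead_ratio(payload, ML_KEM_512_HANDSHAKE_BYTES)
        print(f"{protocol}: ML-KEM-768 x{r768:.1f}, ML-KEM-512 x{r512:.1f}")
```

For a 300 B DNS query the ML-KEM-768 handshake is several times the payload, which is why the smaller parameter set is suggested there.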

Scenario	ECDH P-256	ML-KEM-768	ML-KEM-1024
Handshakes/second (single core)	~5,000	~3,500	~2,300
Handshakes/second (8 cores)	~40,000	~28,000	~18,000
Added latency at 99th percentile	+0.2 ms	+0.3 ms	+0.5 ms
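The handshake rates are consistent with the total handshake times from the speed benchmarks. A rough sketch of the conversion, assuming the CPU does nothing but key exchange and that throughput scales linearly with core count (both are simplifications):

```python
# Total handshake time per algorithm, in microseconds (from the speed benchmarks).
HANDSHAKE_US = {"ECDH P-256": 202, "ML-KEM-768": 281, "ML-KEM-1024": 433}


def handshakes_per_second(total_us, cores=1):
    # Assumes perfect linear scaling and no other per-connection work.
    return cores * 1_000_000 / total_us


if __name__ == "__main__":
    for name, total_us in HANDSHAKE_US.items():
        one = handshakes_per_second(total_us)
        eight = handshakes_per_second(total_us, cores=8)
        print(f"{name}: {one:,.0f}/s on 1 core, {eight:,.0f}/s on 8 cores")
```

The table rounds these values, so small differences from the computed figures are expected.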

ML-KEM's polynomial operations (NTT, base multiplication, sampling) vectorise well, because each step applies the same arithmetic to many independent coefficients.
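To illustrate why the NTT suits SIMD, here is a plain-Python sketch of the forward and inverse transforms following the loop structure of FIPS 203 (modulus 3329, root of unity 17). Within each block, one twiddle factor is applied to `length` independent coefficient pairs, and that inner step is written as whole-slice operations to mirror what a vector unit does with one instruction. This is a reference-style illustration, not an optimised or constant-time implementation.

```python
Q = 3329   # ML-KEM modulus
ZETA = 17  # primitive 256th root of unity modulo Q
N = 256    # coefficients per polynomial


def bitrev7(i):
    """Reverse the 7-bit binary representation of i."""
    return int(format(i, "07b")[::-1], 2)


ZETAS = [pow(ZETA, bitrev7(i), Q) for i in range(128)]


def ntt(f):
    f = list(f)
    i, length = 1, 128
    while length >= 2:
        for start in range(0, N, 2 * length):
            zeta = ZETAS[i]
            i += 1
            lo = f[start:start + length]
            hi = f[start + length:start + 2 * length]
            # One twiddle factor, `length` independent butterflies: SIMD-friendly.
            t = [zeta * h % Q for h in hi]
            f[start:start + length] = [(a + b) % Q for a, b in zip(lo, t)]
            f[start + length:start + 2 * length] = [(a - b) % Q for a, b in zip(lo, t)]
        length //= 2
    return f


def inverse_ntt(f_hat):
    f = list(f_hat)
    i, length = 127, 2
    while length <= 128:
        for start in range(0, N, 2 * length):
            zeta = ZETAS[i]
            i -= 1
            lo = f[start:start + length]
            hi = f[start + length:start + 2 * length]
            f[start:start + length] = [(a + b) % Q for a, b in zip(lo, hi)]
            f[start + length:start + 2 * length] = [zeta * (b - a) % Q for a, b in zip(lo, hi)]
        length *= 2
    # 3303 is the inverse of 128 modulo Q; it undoes the doubling in each of the 7 layers.
    return [x * 3303 % Q for x in f]
```

Production code replaces the list comprehensions with AVX2 or NEON intrinsics and uses Montgomery or Barrett reduction instead of `%`.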

Mobile and embedded processors (ARM Cortex-A53 through Apple M-series):

Security-critical: ML-KEM implementations must run in constant time with respect to secret data to prevent timing attacks.
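The place where this matters most is decapsulation, which re-encrypts the decrypted message, compares the result with the received ciphertext, and returns either the real shared secret or a pseudorandom rejection key. The sketch below shows the branch-free pattern for that final step. The function names are hypothetical, and Python itself gives no constant-time guarantee, so treat this as an illustration of the technique rather than hardened code.

```python
import hmac


def ct_select(if_false, if_true, condition):
    """Return if_true when condition holds, else if_false, with no data-dependent branch.

    Both inputs are assumed to have the same length.
    """
    mask = -int(condition) & 0xFF  # 0xFF when condition is true, 0x00 otherwise
    return bytes(a ^ (mask & (a ^ b)) for a, b in zip(if_false, if_true))


def choose_shared_secret(received_ct, reencrypted_ct, real_key, rejection_key):
    # compare_digest is designed to avoid leaking the position of the first mismatch.
    matches = hmac.compare_digest(received_ct, reencrypted_ct)
    # Both keys are always read; only the mask decides which one is returned.
    return ct_select(rejection_key, real_key, matches)
```

In C or Rust implementations the same idea appears as a conditional-move helper, combined with avoiding secret-dependent table lookups and division by the modulus.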

Operation	ML-KEM-512	ML-KEM-768	ML-KEM-1024
Stack (KeyGen)	~6 KB	~8 KB	~10 KB
Stack (Encaps)	~5 KB	~7 KB	~9 KB
Stack (Decaps)	~6 KB	~9 KB	~12 KB
Heap	0 B	0 B	0 B

ML-KEM needs only stack memory; no dynamic allocation is required. This makes it well suited to embedded systems and kernel crypto.