Underneath every content provenance system is the same primitive: a digital signature. Sign a piece of content with a private key, and anyone with the matching public key can confirm it has not changed and came from that signer. This guide covers how content signing works in practice, why Ed25519 is a strong default, and the two mistakes that quietly break signatures over structured records.
Why sign content at all
A signature gives you two guarantees at once: integrity (the content has not been altered since signing) and authenticity (it was signed by the holder of a specific key). That is exactly the foundation provenance needs. Note what it does not give you: a signature says nothing about whether the content is true or good — only that this signer vouched for these exact bytes.
Content hashing vs the signature
Two operations get conflated here. The first is a content-binding decision your application makes: rather than signing a large media blob, you compute a cryptographic hash of the content and place that hash inside the small, structured record you actually sign. The second is the signature algorithm itself. Pure Ed25519, per RFC 8032, takes the message as its input and hashes it internally with SHA-512 — you do not pre-hash for it; a distinct prehash variant, Ed25519ph, exists for the cases where you deliberately sign a digest. So “hash, then sign” is an application-protocol pattern for binding content into a compact signed record, not a property of Ed25519. Either way, change one byte of the signed bytes and verification fails — that is the tamper-evidence you are after.
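As a minimal sketch of the content-binding half of that pattern, the snippet below hashes a blob and places the digest in a small record. The choice of SHA-256 and the field names (`content_sha256`, `content_length`, `media_type`) are assumptions for illustration, not part of any standard named above.

```python
import hashlib
import json


def bind_content(content: bytes, media_type: str) -> dict:
    """Build a small record that commits to `content` by its SHA-256 digest."""
    return {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "content_length": len(content),
        "media_type": media_type,
    }


def content_matches(record: dict, content: bytes) -> bool:
    """Check that `content` is the exact byte string the record commits to."""
    return hashlib.sha256(content).hexdigest() == record["content_sha256"]


original = b"example media bytes"
record = bind_content(original, "application/octet-stream")

print(json.dumps(record, indent=2))
print(content_matches(record, original))         # True
print(content_matches(record, original + b"!"))  # False
```

The record is what gets signed. The blob itself can travel separately, and any change to it shows up as a digest mismatch.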
Why Ed25519
Ed25519 is an EdDSA signature scheme over the twisted Edwards curve edwards25519 (birationally equivalent to Curve25519), and it has become a sensible default for new systems:
- Small and fast. 32-byte public keys and 64-byte signatures, with quick signing and verification.
- Deterministic. Signing does not depend on a fresh random nonce per message, which removes a whole class of catastrophic failures that have broken other schemes when the randomness was weak.
- Widely available. First-class support in libsodium, WebCrypto, Node’s crypto module, and most modern languages, so cross-platform verification is realistic.
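The properties above can be checked directly. This is a small sketch that assumes the third-party Python `cryptography` package is installed; the message and the `is_valid` helper are illustrative only.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

message = b"to-be-signed bytes"
signature = private_key.sign(message)  # pure Ed25519: no pre-hashing by the caller

raw_public = public_key.public_bytes(
    encoding=serialization.Encoding.Raw,
    format=serialization.PublicFormat.Raw,
)
print(len(raw_public), len(signature))  # 32 64

# Deterministic: signing the same message with the same key repeats the signature.
same_again = private_key.sign(message) == signature
print(same_again)


def is_valid(sig: bytes, msg: bytes) -> bool:
    """Return True if `sig` verifies over `msg` under `public_key`."""
    try:
        public_key.verify(sig, msg)
        return True
    except InvalidSignature:
        return False


print(is_valid(signature, message))         # True
print(is_valid(signature, message + b"x"))  # False: one extra byte breaks it
```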
The canonicalization trap
For media files you sign the bytes as-is. For structured records — JSON provenance objects, content envelopes — there is a trap: the same logical record can serialize to different byte strings (key order, whitespace, number formatting, Unicode forms). If the signer and the verifier serialize differently, verification fails on identical data. The fix is canonicalization: agree on one deterministic byte representation, and both sides reconstruct it the same way before hashing.
This is where two specific mistakes bite teams storing records in a database:
- Do not hash the database’s text rendering. Hashing something like a Postgres jsonb::text output couples your signature to that engine’s serialization quirks. Load the record as a parsed object and reconstruct the canonical signing bytes yourself.
- Do not normalize Unicode on the hash path. Applying NFC/NFD-style normalization changes bytes and breaks cross-implementation verification. Define the canonical form once and keep every code path on it.
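The Unicode point is easy to reproduce with the standard library. In this sketch the signer is assumed to have signed a string in decomposed form, and a hypothetical verifier "helpfully" normalizes to NFC before hashing.

```python
import hashlib
import unicodedata


def digest(data: bytes) -> str:
    """SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# The same visible text, "café", with the accent as a separate combining mark.
decomposed = "cafe\u0301"

signed_bytes = decomposed.encode("utf-8")
normalized_bytes = unicodedata.normalize("NFC", decomposed).encode("utf-8")

print(len(signed_bytes), len(normalized_bytes))             # 6 5
print(digest(signed_bytes) == digest(normalized_bytes))     # False
```

Both strings render identically, yet the byte strings differ, so a signature made over one does not verify over the other.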
A good rule: there should be exactly one function that builds the to-be-signed bytes, called by both signing and verification, in every language SDK. Reconstructing that logic inline in two places is how the two copies drift apart.
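A sketch of what that single function might look like in Python follows. The canonical form here (sorted keys, no insignificant whitespace, non-ASCII emitted as-is, UTF-8) is an assumption chosen for illustration. It is not a conformant implementation of a published scheme such as RFC 8785, and number formatting in particular is simply whatever Python's `json` module produces.

```python
import json


def to_be_signed_bytes(record: dict) -> bytes:
    """The one place that turns a parsed record into canonical signing bytes.

    Assumed canonical form for this sketch: keys sorted, compact separators,
    non-ASCII characters left unescaped and unnormalized, UTF-8 encoded.
    """
    return json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,  # NaN/Infinity are not valid JSON; refuse them
    ).encode("utf-8")


# The same logical record, built in code and parsed from differently
# ordered, differently spaced JSON text.
a = {"title": "Caf\u00e9", "version": 2, "tags": ["x", "y"]}
b = json.loads('{ "version": 2,  "tags": ["x","y"], "title": "Caf\u00e9" }')

print(to_be_signed_bytes(a) == to_be_signed_bytes(b))  # True
print(to_be_signed_bytes(a).decode("utf-8"))
```

Both the signing path and the verification path would call `to_be_signed_bytes` and nothing else. Other language SDKs would need a function that produces byte-identical output for the same record.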
Verification: resolve the key, never trust the record
A signature is only as trustworthy as the public key you check it against. The cardinal rule: never read the verifying key from the record’s own self-claimed metadata. An attacker can put any public key in a field they control. Resolve the signer’s key from an independent source: a key directory keyed by a stable key identifier, or a key you pinned out of band. This is the same lesson that makes C2PA rely on trust lists rather than the manifest’s self-description.
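In code, the rule comes down to which lookup the verifier performs. The sketch below uses a hypothetical in-memory directory and placeholder key bytes. The names `TRUSTED_KEYS`, `key_id`, and `public_key` are assumptions for illustration.

```python
from typing import Dict

# Hypothetical trusted directory: key identifier -> 32-byte public key.
# The bytes are placeholders, not a real key.
TRUSTED_KEYS: Dict[str, bytes] = {
    "key-2024-01": bytes.fromhex("11" * 32),
}


class UnknownKeyError(Exception):
    """Raised when a record names a key identifier the directory does not list."""


def resolve_public_key(record: dict) -> bytes:
    """Resolve the verifying key from the directory, never from the record."""
    key_id = record.get("key_id")
    if not isinstance(key_id, str) or key_id not in TRUSTED_KEYS:
        raise UnknownKeyError(f"unknown key id: {key_id!r}")
    # Any "public_key" field inside the record is deliberately ignored.
    return TRUSTED_KEYS[key_id]


# A record carrying an attacker-chosen key in its own metadata.
tampered = {"key_id": "key-2024-01", "public_key": "ff" * 32}
resolved = resolve_public_key(tampered)
print(resolved.hex() == "11" * 32)  # True: the embedded key was never consulted

# A record naming a key the directory has never heard of is rejected outright.
try:
    resolve_public_key({"key_id": "attacker-key", "public_key": "ff" * 32})
    unknown_rejected = False
except UnknownKeyError:
    unknown_rejected = True
print(unknown_rejected)  # True
```

Signature verification would then run against `resolved`, so a forged record fails even if it is internally consistent.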
Key management and rotation
Real systems rotate keys. Reference each signature by a key identifier so that, when a key is retired, historical records still verify against the public key that was valid when they were signed. Keep private keys out of application logs and error messages entirely — provenance errors should carry opaque rule and key identifiers, never key material, seeds, or plaintext.
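One way to sketch both ideas, retired keys that still verify and errors that carry only opaque identifiers, is shown below. The directory layout, the rule identifiers, and the placeholder key bytes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class KeyEntry:
    public_key: bytes  # public material only; private keys never live here
    active: bool       # False once retired; entry is kept for old records


class ProvenanceError(Exception):
    """Carries opaque identifiers only, never key material or plaintext."""

    def __init__(self, rule_id: str, key_id: str) -> None:
        super().__init__(f"rule={rule_id} key_id={key_id}")
        self.rule_id = rule_id
        self.key_id = key_id


DIRECTORY: Dict[str, KeyEntry] = {
    "k1": KeyEntry(public_key=bytes.fromhex("aa" * 32), active=False),  # retired
    "k2": KeyEntry(public_key=bytes.fromhex("bb" * 32), active=True),
}


def key_for_verification(key_id: str) -> bytes:
    """Retired keys still resolve, so historical records keep verifying."""
    entry = DIRECTORY.get(key_id)
    if entry is None:
        raise ProvenanceError("KEY_UNKNOWN", key_id)
    return entry.public_key


def key_id_for_signing() -> str:
    """Only an active key may be used for new signatures."""
    for key_id, entry in DIRECTORY.items():
        if entry.active:
            return key_id
    raise ProvenanceError("NO_ACTIVE_KEY", "-")


print(key_id_for_signing())                   # k2
print(key_for_verification("k1").hex()[:4])   # aaaa

try:
    key_for_verification("k9")
    error_text = ""
except ProvenanceError as err:
    error_text = str(err)
print(error_text)  # rule=KEY_UNKNOWN key_id=k9
```

A fuller design would also record validity windows and decide how to treat signatures that claim a date after a key's retirement. That policy is outside this sketch.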
From a signature to a provenance record
A bare signature becomes useful provenance when it is wrapped in a structured record: what was signed, by which key, when, with what production context, and a link to the prior record for lineage. Deliver that record alongside the content — as data, not just embedded metadata that gets stripped in transit — and authenticity travels with the record through any channel you control. That is the model behind a headless CMS with a built-in provenance layer. See how Hessian implements it on the product overview.
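To make the shape of such a record concrete, here is a minimal sketch with hypothetical field names. It covers only the record body and the lineage link. The signature itself would be computed over the canonical bytes and delivered beside the record. Nothing here describes any particular product's format.

```python
import hashlib
import json
from typing import Optional


def canonical_bytes(record: dict) -> bytes:
    """Assumed canonical form: sorted keys, compact separators, UTF-8."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def record_digest(record: dict) -> str:
    """SHA-256 over the record's canonical bytes."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def make_record(
    content: bytes,
    key_id: str,
    signed_at: str,
    context: dict,
    prev: Optional[dict] = None,
) -> dict:
    """Build a record body: what was signed, by which key, when, and lineage."""
    return {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "key_id": key_id,
        "signed_at": signed_at,
        "context": context,
        "prev_record_sha256": record_digest(prev) if prev is not None else None,
    }


first = make_record(b"first draft", "k2", "2024-01-01T00:00:00Z", {"tool": "editor"})
revision = make_record(
    b"second draft", "k2", "2024-01-02T00:00:00Z", {"tool": "editor"}, prev=first
)

print(first["prev_record_sha256"])                               # None
print(revision["prev_record_sha256"] == record_digest(first))    # True
```

Because each record commits to the digest of its predecessor, altering an earlier record breaks the link held by every later one.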