checkrd

Identity & signing

How agents prove who they are, how telemetry is signed end-to-end, and how policy bundles are verified inside the WASM core.

Every Checkrd agent has a cryptographic identity. That identity signs every telemetry batch, authenticates the agent to the control plane, and gates which policy bundles the agent will accept. This page explains the primitives, the trust boundaries, and what you need to manage operationally.

The design uses only IETF-standard primitives — Ed25519 (RFC 8032), HTTP Message Signatures (RFC 9421), Content-Digest (RFC 9530), and DSSE for policy bundles — so any conformant implementation can verify or interop with the SDK.

Agent identities

When the SDK initializes, it loads an Ed25519 keypair. The private key signs outbound telemetry; the public key is registered with the control plane on first use and looked up server-side to verify subsequent batches.

There are three ways to give an agent its identity:

| Source | When to use | Storage |
|---|---|---|
| Local file (default) | Single-process agents, dev | ~/.checkrd/identity.key, mode 0600 |
| from_bytes() | Containers, deterministic identity | Mounted secret, env var (base64) |
| External signer (KMS) | Production fleets, key rotation policies | KMS (key never leaves the HSM) |

The local-file path is the path of least resistance. For production fleets, prefer an external signer: the SDK calls into your KMS or HSM for each signature and the private key never appears in process memory.

python
from pathlib import Path

from checkrd import Checkrd, LocalIdentity

# Default: load from ~/.checkrd/identity.key, generate if absent
client = Checkrd(api_key="ck_live_...")

# Deterministic: identity bytes from a mounted secret (read as bytes, not text)
identity = LocalIdentity.from_bytes(Path("/var/run/secrets/identity").read_bytes())
client = Checkrd(api_key="ck_live_...", identity=identity)

Once the engine is bound, the private key is zeroized from the host process's memory and only the WASM core holds a copy. That bounds the blast radius if the host process is later compromised.
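
For the "env var (base64)" storage option, the decoding step might look like the sketch below. The variable name CHECKRD_IDENTITY_B64 and the helper are hypothetical, and the 32-byte check assumes the identity is stored as a raw Ed25519 seed (the private-key size defined by RFC 8032); the SDK's actual format may differ.

```python
import base64
import os

ED25519_SEED_LEN = 32  # RFC 8032: an Ed25519 private key is a 32-byte seed


def identity_bytes_from_env(var: str = "CHECKRD_IDENTITY_B64") -> bytes:
    """Decode a base64-encoded identity from the environment.

    The variable name is an assumption made for this example.
    """
    encoded = os.environ.get(var)
    if not encoded:
        raise RuntimeError(f"{var} is not set")
    # validate=True rejects characters outside the base64 alphabet
    raw = base64.b64decode(encoded, validate=True)
    if len(raw) != ED25519_SEED_LEN:
        raise ValueError(f"expected {ED25519_SEED_LEN} bytes, got {len(raw)}")
    return raw
```

If from_bytes() accepts a raw seed, the result could be passed straight through as LocalIdentity.from_bytes(identity_bytes_from_env()).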

How telemetry is signed

Every batch the SDK sends to the control plane carries three headers:

| Header | RFC | Purpose |
|---|---|---|
| Signature-Input | 9421 | Which fields are signed and the signing parameters |
| Signature | 9421 | The Ed25519 signature itself |
| Content-Digest | 9530 | SHA-256 of the body, bound into the signature |

The signature covers the HTTP method, the target URI, the Content-Digest, the signing agent's ID, the algorithm (ed25519), and the created / expires parameters. The expires value is set 5 minutes after created; the control plane rejects any batch whose expires timestamp is earlier than its own wall clock, which bounds the replay window.
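
A server-side freshness check on the created / expires pair could be sketched as follows. The function name is hypothetical. Only the expiry comparison is stated by this page; rejecting inverted or over-long windows is an extra assumption added for illustration.

```python
MAX_WINDOW_SECONDS = 5 * 60  # the 5-minute window described above


def signature_is_fresh(created: int, expires: int, now: int) -> bool:
    """Return True if the created/expires pair is acceptable at `now` (Unix seconds)."""
    if expires <= created:
        return False  # assumption: an empty or inverted window is malformed
    if expires - created > MAX_WINDOW_SECONDS:
        return False  # assumption: senders may not request a longer window
    if now > expires:
        return False  # expired relative to the verifier's wall clock
    return True
```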

The signing happens inside the WASM core via the sign_telemetry_batch FFI export. Wrappers do the I/O; the bytes that get signed are exactly the bytes that go on the wire.
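
To make "the bytes that get signed" concrete, here is a rough sketch of how an RFC 9421 signature base and an RFC 9530 Content-Digest value are assembled. The set and order of covered components, and the mapping of the agent ID to the keyid parameter, are assumptions; the real layout is whatever sign_telemetry_batch produces. The Ed25519 signing step itself is left out.

```python
import base64
import hashlib


def content_digest(body: bytes) -> str:
    """RFC 9530 Content-Digest value for SHA-256: sha-256=:<base64>:"""
    digest = hashlib.sha256(body).digest()
    return f"sha-256=:{base64.b64encode(digest).decode('ascii')}:"


def signature_params(key_id: str, created: int, expires: int) -> str:
    """RFC 9421 signature parameters: covered components plus metadata."""
    covered = '("@method" "@target-uri" "content-digest")'
    return (
        f'{covered};created={created};expires={expires}'
        f';keyid="{key_id}";alg="ed25519"'
    )


def signature_base(method: str, target_uri: str, body: bytes,
                   key_id: str, created: int, expires: int) -> str:
    """Build the string that is signed: one line per covered component,
    then the @signature-params line, joined by "\\n" with no trailing newline."""
    lines = [
        f'"@method": {method.upper()}',
        f'"@target-uri": {target_uri}',
        f'"content-digest": {content_digest(body)}',
        f'"@signature-params": {signature_params(key_id, created, expires)}',
    ]
    return "\n".join(lines)
```

The Signature-Input header would then carry a label followed by the same parameter string (for example sig1=(...);created=...), and the Signature header the base64 Ed25519 signature over the encoded base.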

TELEMETRY_SIGNATURE_MODE controls server enforcement:

  • off — server accepts unsigned batches
  • warn (default) — server accepts but logs unsigned batches
  • required — server rejects unsigned batches with HTTP 401
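
The three modes reduce to a small decision table for unsigned batches. The sketch below is illustrative only: the Decision type and function name are hypothetical, and it does not cover how the server treats batches whose signature is present but invalid, which this page does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    accept: bool
    log_warning: bool
    reject_status: Optional[int] = None  # HTTP status when rejected


def decide_unsigned(mode: str = "warn") -> Decision:
    """How a server might treat an unsigned batch under TELEMETRY_SIGNATURE_MODE."""
    if mode == "off":
        return Decision(accept=True, log_warning=False)
    if mode == "warn":
        return Decision(accept=True, log_warning=True)
    if mode == "required":
        return Decision(accept=False, log_warning=False, reject_status=401)
    raise ValueError(f"unknown TELEMETRY_SIGNATURE_MODE: {mode!r}")
```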

Anonymous identities (a KMS-backed signer without a registered public key) fall back to sending unsigned batches and emit a one-time warning.

How policy bundles are verified

When the dashboard publishes a new policy, the control plane signs the bundle with the policy-signing key (held in AWS Secrets Manager as checkrd/prod/policy-signing-key) and pushes it to every connected agent over a long-lived SSE stream.

The agent doesn't trust the wire delivery. Every bundle goes through the WASM core's reload_policy_signed entry point, which verifies in-WASM against a trust list pinned in the SDK at compile time:

  • wrappers/python/src/checkrd/_trust.py
  • wrappers/javascript/src/_trust.ts

The trust list contains the public keys the SDK considers authoritative for policy bundles. A bundle signed by any other key is rejected and the previous policy stays in place. Pre-publish CI guards (policy trust-status for Python, verify-trust-roots.mjs for JavaScript) block tag-driven publishes if the embedded trust list is empty against api.checkrd.io.

The verification path:

  1. DSSE envelope parse: the bundle arrives as a DSSE envelope, which wraps the policy YAML in a typed, payload-bound structure per the DSSE spec.
  2. Signature verification: Ed25519 against the trust list, in constant time.
  3. Monotonic version check: the bundle's version must be strictly greater than last_policy_version. Rollback to an older signed bundle is rejected even if the signature is valid.
  4. Freshness check: the bundle's created timestamp must be within the last 24 hours by default.
  5. Cross-type binding: the DSSE payloadType field is bound into the signature, so a telemetry batch signature can't be replayed as a policy.
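
The five steps can be sketched in plain Python. This is not the WASM core's implementation: the payload type string, the reason codes, the BundleRejected class, and the version / created field names are all hypothetical, the payload is JSON rather than YAML to keep the example stdlib-only, and the Ed25519 check is injected as a callable. Only the PAE function follows a published format (the DSSE pre-authentication encoding).

```python
import base64
import json
import time
from typing import Callable, Dict, Optional

POLICY_PAYLOAD_TYPE = "application/vnd.example.policy+json"  # hypothetical
MAX_AGE_SECONDS = 24 * 60 * 60

Verify = Callable[[bytes, bytes, bytes], bool]  # (public_key, signature, message)


class BundleRejected(Exception):
    """Stand-in for the SDK's error type; `code` is a stable reason label."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the bytes the signature covers."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(t), t, len(payload), payload)


def verify_bundle(envelope_json: str, trust_list: Dict[str, bytes], verify: Verify,
                  last_version: int, now: Optional[float] = None) -> dict:
    now = time.time() if now is None else now
    try:  # step 1: parse the envelope
        env = json.loads(envelope_json)
        payload = base64.b64decode(env["payload"])
        payload_type = env["payloadType"]
        signatures = env["signatures"]
    except (ValueError, KeyError, TypeError):
        raise BundleRejected("malformed_envelope")
    # step 5: PAE binds payloadType into the signed bytes; the verifier also
    # insists on the type it expects, so other signed objects are refused
    if payload_type != POLICY_PAYLOAD_TYPE:
        raise BundleRejected("wrong_payload_type")
    message = pae(payload_type, payload)
    # step 2: at least one signature must verify under a pinned key
    trusted = any(
        sig.get("keyid") in trust_list
        and verify(trust_list[sig["keyid"]], base64.b64decode(sig["sig"]), message)
        for sig in signatures
    )
    if not trusted:
        raise BundleRejected("untrusted_signature")
    try:
        policy = json.loads(payload)
        version, created = int(policy["version"]), float(policy["created"])
    except (ValueError, KeyError, TypeError):
        raise BundleRejected("malformed_payload")
    if version <= last_version:  # step 3: strictly increasing
        raise BundleRejected("version_rollback")
    if now - created > MAX_AGE_SECONDS:  # step 4: freshness
        raise BundleRejected("stale_bundle")
    return policy
```

With the cryptography package, the verify callable could wrap Ed25519PublicKey.from_public_bytes(pk).verify(sig, msg) and return False on InvalidSignature.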

On any failure the previous policy stays installed and a structured warning fires (PolicySignatureError.code carries the stable reason label for dashboard grouping).
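
The "previous policy stays installed" behaviour amounts to swapping the active policy only after verification succeeds. A minimal sketch, in which PolicyHolder, PolicyRejected, and the logger name are hypothetical stand-ins rather than SDK types:

```python
import logging
from typing import Any, Callable

log = logging.getLogger("policy")


class PolicyRejected(Exception):
    """Stand-in for PolicySignatureError; `code` is the stable reason label."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


class PolicyHolder:
    def __init__(self, initial: Any):
        self.current = initial

    def reload(self, bundle: bytes, verify: Callable[[bytes], Any]) -> bool:
        """Install the verified policy, or keep the current one on failure."""
        try:
            new_policy = verify(bundle)
        except PolicyRejected as err:
            # Structured warning: the reason code travels as a separate field
            log.warning("policy bundle rejected", extra={"reason": err.code})
            return False
        self.current = new_policy
        return True
```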

Key rotation

The policy-signing key rotates on an overlap-window pattern:

  1. Generate the new key.
  2. Add the new public key to the SDK trust lists.
  3. Cut a new SDK release that ships both old and new pubkeys.
  4. Wait for the rollout window (typically 30 days).
  5. Switch the control plane to sign with the new private key.
  6. After all agents have updated, drop the old pubkey in a follow-up release.

Until step 6 ships, both old and new keys are accepted, so a botched rotation never strands customers. The full operator runbook lives in KEY-CUSTODY.md (internal).
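
A toy model shows why the ordering matters. The release numbers and key IDs below are hypothetical, and trust is reduced to set membership instead of real signature checks.

```python
from typing import Dict, FrozenSet, List

OLD_KEY = "policy-key-old"  # hypothetical key IDs
NEW_KEY = "policy-key-new"

# Trust list pinned in each (hypothetical) SDK release.
TRUST_BY_RELEASE: Dict[str, FrozenSet[str]] = {
    "1.0": frozenset({OLD_KEY}),           # before rotation
    "1.1": frozenset({OLD_KEY, NEW_KEY}),  # step 3: overlap release
    "1.2": frozenset({NEW_KEY}),           # step 6: old key dropped
}


def rejecting_agents(fleet: Dict[str, str], signing_key: str) -> List[str]:
    """Agents whose pinned trust list would reject bundles signed by `signing_key`."""
    return sorted(
        agent
        for agent, release in fleet.items()
        if signing_key not in TRUST_BY_RELEASE[release]
    )
```

For a fleet such as {"a": "1.0", "b": "1.1"}, signing with the old key is accepted everywhere, while signing with the new key would be rejected by agent a until it upgrades. In this model that agent keeps its previous policy in the meantime, and the wait in step 4 exists to let such agents catch up.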

Agent identities (the per-agent keypairs) are independent and rotate on their own schedule — a new keypair is generated whenever the SDK runs without an existing key file, and registered with the control plane on first telemetry post.
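
The load-or-generate behaviour for the local key file might look roughly like this. It assumes the file holds a raw 32-byte Ed25519 seed, which is a guess at the on-disk format; the function name is hypothetical.

```python
import os
from pathlib import Path

SEED_LEN = 32  # RFC 8032 Ed25519 private key (seed) length


def load_or_create_seed(path: Path) -> bytes:
    """Return the seed stored at `path`, creating it with mode 0600 if absent."""
    try:
        seed = path.read_bytes()
    except FileNotFoundError:
        path.parent.mkdir(parents=True, exist_ok=True, mode=0o700)
        seed = os.urandom(SEED_LEN)
        # O_EXCL: fail instead of overwriting if another process won the race
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "wb") as f:
            f.write(seed)
        return seed
    if len(seed) != SEED_LEN:
        raise ValueError(f"{path} does not contain a {SEED_LEN}-byte seed")
    return seed
```

Deleting the file and restarting would therefore yield a fresh identity, which matches the rotation behaviour described above.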

What's verifiable independently

Both SDKs ship with reproducible test corpora so you can audit the crypto surface without trusting Checkrd's CI:

  • RFC 8032 §7.1 Ed25519 reference vectors, plus the full Project Wycheproof v1 Ed25519 set (150 vectors, 0 failures).
  • RFC 9421 §B.2.6 worked example, byte-for-byte.
  • DSSE spec compliance against the secure-systems-lab/dsse fixture set.
  • Mutation tests (cargo-mutants) on the verification primitives achieve 100% kill rate. Any silent weakening of a signature check fails the test suite.

The WASM core itself ships with an integrity SHA-256 checked at import time, and PEP 740 attestations / npm provenance bind the binary to the GitHub workflow that built it. See WASM-CORE.md in the SDK repo for the verification recipe.
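
An import-time integrity check of this kind can be approximated in a few lines. This is a generic sketch, not the recipe from WASM-CORE.md; the function name and the way the expected hash is supplied are assumptions.

```python
import hashlib
import hmac


def verify_wasm_integrity(wasm_bytes: bytes, expected_sha256_hex: str) -> None:
    """Raise ImportError unless the binary matches the pinned SHA-256."""
    actual = hashlib.sha256(wasm_bytes).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    if not hmac.compare_digest(actual, expected_sha256_hex.lower()):
        raise ImportError("WASM core failed integrity check")
```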
