Building The Longhand Archive

How a solo archival system gets built in public


Superseded by ADR 0007 (Mutable projections and immutable evidence). Retained as a historical record of the detailed normalization policy and the metadata categories considered.


CSJ historical snapshot normalization policy

Last reviewed: 2026-04-20

This note defines a conservative policy for whether historical snapshots should ever be rewritten purely to add newer metadata fields.

Short answer:

Why this matters

The collector now has newer metadata concepts such as:

Older historical snapshots were created before some of these concepts existed. That naturally raises the question:

For a normal application database, the answer is often yes. For an archive, the answer should be much more conservative.

Core policy

Current job records and current asset manifests

Safe to rewrite for metadata normalization.

Reason:

Historical job snapshots (csj_history/)

Do not rewrite in place for metadata-only normalization by default.

Reason:

Historical asset snapshots (csj_asset_history/)

Same rule: do not rewrite in place by default.

Event logs (*.jsonl)

Never rewrite in place except for severe repair/recovery cases with explicit archival justification.
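
The layer rules above can be expressed as a small guard that a maintenance tool calls before touching a file. This is a minimal sketch, not the collector's actual code: the `Layer` enum, the function names, the purpose strings, and the assumption that any path outside `csj_history/` and `csj_asset_history/` belongs to the current layer are all hypothetical.

```python
from enum import Enum
from pathlib import PurePosixPath


class Layer(Enum):
    CURRENT = "current"              # current job records and asset manifests
    JOB_HISTORY = "job_history"      # csj_history/
    ASSET_HISTORY = "asset_history"  # csj_asset_history/
    EVENT_LOG = "event_log"          # *.jsonl


def classify(path: str) -> Layer:
    """Map a relative archive path to its layer (hypothetical layout)."""
    if path.endswith(".jsonl"):
        return Layer.EVENT_LOG
    parts = PurePosixPath(path).parts
    if "csj_history" in parts:
        return Layer.JOB_HISTORY
    if "csj_asset_history" in parts:
        return Layer.ASSET_HISTORY
    # Assumption: everything else is part of the mutable current layer.
    return Layer.CURRENT


def may_rewrite_in_place(path: str, purpose: str,
                         justified_repair: bool = False) -> bool:
    """Return True only when the policy allows an in-place rewrite."""
    if classify(path) is Layer.CURRENT:
        return True  # the current layer is safe to normalize
    if purpose == "metadata-normalization":
        return False  # never the default for history or event logs
    # Repair/recovery is the only exception, and it must be justified.
    return purpose == "repair" and justified_repair
```

A caller would check `may_rewrite_in_place("csj_history/job/snap.json", "metadata-normalization")` and skip the file when it returns `False`.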

Default stance

Use this rule:

That gives you:

What to do instead of rewriting old snapshots

Prefer one of these approaches.

Option A — leave older snapshots as historically authentic

This is the default.

Interpretation rule:

This is often the cleanest archival answer.

Option B — add companion derived indexes/manifests

If you need uniform querying, create separate derived artifacts such as:

These should be clearly marked as:

This is usually the best option when you want both:

Option C — versioned migration outputs

If you ever truly need normalized historical snapshots, do not silently rewrite originals. Instead:

For example:

That preserves provenance.
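
One way to sketch Option C in code: read the original, normalize a copy, and write that copy into a separate, migration-specific directory together with a hash of the source bytes. The function name, the output layout, and the envelope keys (`derived_from`, `source_sha256`, `migration_id`, `snapshot`) are illustrative assumptions, not part of the existing collector.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def write_migration_output(original: Path, out_root: Path, migration_id: str,
                           transform: Callable[[dict], dict]) -> Path:
    """Write a normalized copy of `original` under out_root/migration_id/.

    The original file is only read, never written.
    """
    raw = original.read_bytes()
    normalized = transform(json.loads(raw))

    target_dir = out_root / migration_id
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / original.name

    envelope = {
        "derived_from": str(original),                     # pointer to the evidence
        "source_sha256": hashlib.sha256(raw).hexdigest(),  # ties copy to exact bytes
        "migration_id": migration_id,
        "snapshot": normalized,
    }
    # Mode "x" fails if the target already exists, so neither the original
    # nor an earlier migration output can be overwritten silently.
    with target.open("x", encoding="utf-8") as fh:
        json.dump(envelope, fh, indent=2, sort_keys=True)
    return target
```

Because the output records the source hash, a later audit can confirm that the original snapshot still matches the bytes the migration was derived from.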

When rewriting an old historical snapshot may be acceptable

Consider an in-place rewrite only if all of the following are true:

  1. the change corrects a demonstrable storage corruption or broken serialization problem
  2. the original content cannot be meaningfully interpreted without repair
  3. the repair is explicitly documented
  4. the repair event itself is recorded as archive maintenance provenance
  5. the rewrite does not silently change the substantive historical meaning

Even then, prefer writing a repair note or companion record over mutating the original if possible.
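
The five conditions above form an all-or-nothing checklist, which can be sketched as a small record that a repair tool fills in before it is allowed to proceed. The class and field names are hypothetical; they simply mirror the list in order.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RepairCase:
    corrects_corruption: bool             # 1. corruption or broken serialization
    uninterpretable_without_repair: bool  # 2. original unreadable as-is
    repair_documented: bool               # 3. repair explicitly documented
    maintenance_event_recorded: bool      # 4. repair logged as maintenance provenance
    meaning_preserved: bool               # 5. no silent change of historical meaning


def unmet_conditions(case: RepairCase) -> list[str]:
    """Names of the conditions that are not satisfied."""
    return [f.name for f in fields(case) if not getattr(case, f.name)]


def in_place_rewrite_allowed(case: RepairCase) -> bool:
    """True only when every condition holds."""
    return not unmet_conditions(case)
```

Returning the list of unmet conditions, rather than a bare boolean, gives the maintainer something concrete to record in the repair note when a rewrite is refused.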

Metadata categories and rewrite policy

Safe for current-layer backfill only

These belong to archive-management semantics, not historical source truth. They are fine to add to current records/manifests. They should not be silently injected into older immutable evidence by default.

Provenance fields requiring extra caution

Add these to older historical snapshots only if you actually know the values, not because they would be useful now. If a value is unknown, leave it absent or null in current derived overlays; do not fabricate it in immutable originals.

Never fabricate

Do not invent later values for older snapshots such as:

Policy 1

Keep the new --backfill-archive-metadata mode limited to:

Policy 2

Do not extend that mode to:

Policy 3

If analysts need uniform historical metadata, add a separate derived reporting/index layer rather than mutating immutable evidence.
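
A derived index of this kind can be sketched as a read-only pass over the history directory that emits one uniform row per snapshot. Fields a snapshot never had come back as `None` instead of being invented. The function name, the row keys, and the example field name `retrieved_at` are assumptions for illustration, and the sketch assumes each snapshot is a JSON object.

```python
import json
from pathlib import Path


def build_derived_index(history_dir: Path,
                        wanted_fields: tuple[str, ...]) -> list[dict]:
    """Build uniform rows from snapshots without modifying any of them."""
    rows = []
    for snap in sorted(history_dir.rglob("*.json")):
        data = json.loads(snap.read_text(encoding="utf-8"))
        row = {
            "snapshot_path": snap.relative_to(history_dir).as_posix(),
            "derived": True,  # marks the row as a projection, not evidence
        }
        for name in wanted_fields:
            # None means "not recorded in this snapshot"; nothing is fabricated.
            row[name] = data.get(name)
        rows.append(row)
    return rows
```

Calling `build_derived_index(Path("csj_history"), ("retrieved_at",))` yields rows that can be written to a separate report or index file, which can be rebuilt or discarded at any time without touching the snapshots.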

Policy 4

If a future one-time migration of history is ever considered, require:

Maintainer rule of thumb

Ask this before rewriting an old historical snapshot:

“Am I repairing evidence, or am I making old evidence look more like current expectations?”

If the answer is the second one, do not rewrite it in place.

Current recommendation

Given the current CSJ state, the best policy is:

That preserves archive integrity while still letting the active system evolve.