How a solo archival system gets built in public
Superseded by ADR 0004 — Two-layer lifecycle model and ADR 0005 — Evidence-based lifecycle transitions. Retained as historical record of the live pressure-testing session and the empirical findings that led to those decisions.
Last reviewed: 2026-04-20
This note captures the follow-up from live pressure-testing of full-run behavior and current archive semantics. It is a maintainer note, not a historical handoff.
Three semantics issues showed up when comparing qualifying full-run manifests, current csj_jobs/ projections, and live CSJ behavior:
- withdrawn_confirmed records kept broad status="active", so status-only downstream logic treated them as live.
- expired_sid_redirect was being treated as strong enough to promote a role to withdrawn_confirmed, even though live pressure-testing showed it was only medium-confidence evidence.
Observed before the fix:
- refresh re-queued withdrawn_confirmed records
- tiering kept withdrawn_confirmed records hot / mutable
The important architectural conclusion was:
- status is not specific enough to distinguish true live roles from missing_unconfirmed / withdrawn_confirmed
- lifecycle_status is the field to consult for specific lifecycle meaning
Observed before the fix:
- the csj_jobs/{reference}.json projection could still remain closed if a previously closed role reappeared in search and was skipped/deduped rather than refetched
That means:
expired_sid_redirect
Live pressure-testing showed:
- is_probable_csj_homepage() is broad enough to match live detail pages too; the reference-text check is what prevents many false positives
Conclusion:
- expired_sid_redirect is useful evidence that the stored URL is no longer resolving to a vacancy page
- it is weaker evidence than http_404 or explicit “no longer available” copy
Changed in csj/run.py:
- collect_refresh_refs() now resolves lifecycle state and only refreshes records whose effective lifecycle state is active
- withdrawn_confirmed and missing_unconfirmed records are no longer re-queued just because broad status still says active
Changed in csj/run.py:
- refresh_existing_listing_state() now reactivates any non-active current projection when the reference is seen in search again, including previously closed records
- it also reports status as a changed field when broad status changed too
This keeps the mutable current projection aligned with current search presence more reliably.
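The reactivation rule can be sketched roughly as follows. This is a minimal illustration, not the code in csj/run.py: the helper name reactivate_if_seen, the dict-shaped projection, and the fallback from lifecycle_status to status are all assumptions made for the example.

```python
from typing import Any


def reactivate_if_seen(projection: dict[str, Any], seen_in_search: bool) -> list[str]:
    """Reactivate a non-active projection and return the fields that changed.

    Hypothetical helper: the real refresh_existing_listing_state() may differ.
    """
    changed: list[str] = []
    # Assumption: prefer the specific field, fall back to the broad one.
    effective = projection.get("lifecycle_status") or projection.get("status")
    if not seen_in_search or effective == "active":
        return changed
    # Update both layers, reporting status only when it actually changed.
    for field in ("lifecycle_status", "status"):
        if projection.get(field) != "active":
            projection[field] = "active"
            changed.append(field)
    return changed


closed = {"status": "closed", "lifecycle_status": "closed"}
changed_fields = reactivate_if_seen(closed, seen_in_search=True)
```

With a previously closed record, both fields change, so status shows up in the returned list; with a withdrawn_confirmed record whose broad status was already active, only lifecycle_status does.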
Changed in csj/tiering.py:
- withdrawn_confirmed is now classified using the closed/withdrawn branch before the broad status="active" branch can short-circuit it
This prevents withdrawn records from remaining indefinitely hot / mutable purely because of the broad status field.
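The branch ordering matters more than the branches themselves. A minimal sketch, assuming dict-shaped records and hypothetical tier names "hot" and "cold" (the actual csj/tiering.py function, its tier labels, and its handling of other states are not shown in this note):

```python
from typing import Any

# Assumption: these lifecycle states route to the closed/withdrawn branch.
CLOSED_LIFECYCLE_STATES = {"closed", "withdrawn_confirmed"}


def classify_tier(record: dict[str, Any]) -> str:
    """Classify a record as 'hot' (mutable) or 'cold' (hypothetical tier names)."""
    # Specific lifecycle state is checked first, so a withdrawn record
    # cannot be caught by the broad status check below.
    if record.get("lifecycle_status") in CLOSED_LIFECYCLE_STATES:
        return "cold"
    if record.get("status") == "active":
        return "hot"
    return "cold"


tier = classify_tier({"status": "active", "lifecycle_status": "withdrawn_confirmed"})
```

If the two checks were swapped, the record above would be classified as hot, which is the pre-fix behavior described here.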
expired_sid_redirect no longer promotes to withdrawn_confirmed
Changed in csj/lifecycle.py:
- verify_missing_job_url() now keeps expired_sid_redirect as missing_unconfirmed (with low/medium confidence depending on repetition), not withdrawn_confirmed
- evaluate_repair_action() now only accepts strong confirmation (http_404, withdrawn_text, or future high-confidence evidence) for promotion to withdrawn_confirmed
The following regression expectations were locked in:
- withdrawn_confirmed records
- expired_sid_redirect remains missing_unconfirmed even after repeated missing runs
- no promotion of expired_sid_redirect evidence to withdrawn_confirmed
These are still true after the fix:
- status = broad state
- lifecycle_status = specific state
That can work, but maintainers must remember that status="active" does not necessarily mean “currently live and healthy” unless lifecycle_status is also active.
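One way to keep that caveat from being forgotten is to route every liveness check through a single predicate. The sketch below is illustrative only: is_live and refs_to_refresh are hypothetical names, and records are assumed to be plain dicts keyed by reference.

```python
from typing import Any


def is_live(record: dict[str, Any]) -> bool:
    """True only when both the broad and the specific layer say 'active'."""
    return (
        record.get("status") == "active"
        and record.get("lifecycle_status") == "active"
    )


def refs_to_refresh(records: dict[str, dict[str, Any]]) -> list[str]:
    """Select references whose records are live under the two-layer check."""
    return sorted(ref for ref, record in records.items() if is_live(record))


# Hypothetical references, for illustration only.
records = {
    "A1": {"status": "active", "lifecycle_status": "active"},
    "B2": {"status": "active", "lifecycle_status": "withdrawn_confirmed"},
    "C3": {"status": "active", "lifecycle_status": "missing_unconfirmed"},
}
live_refs = refs_to_refresh(records)
```

All three records have status="active", but only A1 passes the check; a status-only filter would have selected all of them.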
Run manifests remain stronger evidence than the mutable current projection for point-in-time questions.
expired_sid_redirect is still operationally useful and may justify wording like “likely disappeared” in monitoring/reporting, but it should not be treated as archival confirmation.
Use this interpretation order when reasoning about a role:
- lifecycle_status for specific lifecycle meaning
- status only as a broad bucket, not a source of fine-grained truth
If those layers disagree, trust the more specific / more evidential one, not the broad convenience field.
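For reporting code, the same ordering might look like the sketch below. The function name describe_role and the wording table are assumptions for illustration; only “likely disappeared” is taken from this note, and only as wording that missing evidence may justify.

```python
from typing import Any

# Hypothetical report wording per specific lifecycle state.
REPORT_WORDING = {
    "active": "live",
    "missing_unconfirmed": "likely disappeared",
    "withdrawn_confirmed": "withdrawn",
}


def describe_role(record: dict[str, Any]) -> str:
    """Describe a role using the most specific layer available."""
    lifecycle = record.get("lifecycle_status")
    if lifecycle:
        # The specific layer wins, even if broad status disagrees.
        return REPORT_WORDING.get(lifecycle, lifecycle)
    # Only the broad bucket is known, so say so explicitly.
    return f"{record.get('status', 'unknown')} (broad status only)"


label = describe_role({"status": "active", "lifecycle_status": "missing_unconfirmed"})
```

The fallback deliberately marks its output as coming from the broad field, so a reader of the report can tell a coarse answer from a specific one.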