Building The Longhand Archive

How a solo archival system gets built in public


CSJ Collector Status Handoff

Last updated: 2026-04-19

Executive summary

The CSJ Collector is now in a good modular state and remains fully functional.

It has been reframed as a collector in architecture and documentation terms, while compatibility-sensitive names and paths have been preserved where needed.

The current working approach is:


What has been accomplished

1. Browser path removed

Browser-use/browser fallback support was removed intentionally. The collector is now native-only.

Result:

2. Core modular refactor completed

The old monolithic implementation has been split into focused modules.

Current code layout:

3. Cleanup pass completed

4. Terminology pass started and applied safely

Safe wording has been moved toward Collector in:

Compatibility-sensitive names were intentionally left unchanged, including:

5. Documentation/architecture notes added inside the skill

The following references now exist and should be maintained as we go:

6. Workspace plan/status doc updated

/root/.hermes/workspace/csj/CSJ-MODULAR-REFACTOR-PLAN.md has been updated so it reflects the current post-refactor state rather than the older future-state plan.


Current operational status

Tests

Most recent verified state:

CLI smoke

Verified:

Help output identifies the tool as:

Lifecycle repair / reactivation bug note

A real lifecycle bug was found and fixed after hardening work:

Historical duplicate repair-closure entries from earlier runs still exist in the archive, but the current-state reactivation risk in that path appears resolved.
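To gauge how many historical duplicates remain, a read-only report along these lines may help. This is a minimal sketch, not the collector's own code: the record shape and the field names `job_id` and `event_type` are assumptions made for illustration, and the real archive format may differ.

```python
from collections import Counter
from typing import Iterable, Mapping


def duplicate_closures(
    events: Iterable[Mapping[str, str]],
) -> dict[tuple[str, str], int]:
    """Count (job_id, event_type) pairs that occur more than once.

    `job_id` and `event_type` are hypothetical field names; adapt them
    to whatever the archive actually stores.
    """
    counts = Counter((e["job_id"], e["event_type"]) for e in events)
    # Keep only the pairs that were recorded more than once.
    return {pair: n for pair, n in counts.items() if n > 1}
```

The function only reports; it never modifies or deletes archive entries, so it is safe to run against historical data.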


Important compatibility decisions

These were made deliberately and should not be changed casually:

Keep stable for now

Safe to continue changing now

Defer until there is an explicit migration plan


These are the best next actions inside this skill.

Option A — keep documenting while continuing collector work

When changing the CSJ Collector, continue updating:

This is the default low-risk path.

Option B — add non-breaking metadata hints in summaries/reports

Potentially start emitting or documenting conceptual metadata such as:

Important: do this first in docs/summaries if needed, not as a breaking schema rewrite.
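One way to keep such hints non-breaking is to attach them under a single new optional key and leave every existing field untouched. The sketch below assumes summaries are plain dictionaries; the function name `with_metadata_hints` and the key name `conceptual_metadata` are hypothetical and not part of the current collector.

```python
from copy import deepcopy
from typing import Any


def with_metadata_hints(
    summary: dict[str, Any],
    hints: dict[str, Any],
    key: str = "conceptual_metadata",  # hypothetical key name
) -> dict[str, Any]:
    """Return a copy of `summary` with `hints` attached under one new key.

    Existing keys are never renamed, removed, or overwritten, so readers
    that ignore unknown keys keep working unchanged.
    """
    if key in summary:
        raise ValueError(f"summary already has a {key!r} key; refusing to overwrite")
    enriched = deepcopy(summary)
    enriched[key] = dict(hints)
    return enriched
```

Because the input is copied rather than mutated, the original summary stays exactly as the existing code produced it.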

Option C — identify likely future shared archive-core concepts

A useful next design task would be to explicitly mark which current modules/functions are most likely to become shared later across The Longhand Archive.

Strong candidates include:

But this should stay conceptual until a second collector exists.

Option D — continue feature work on CSJ itself

If there is a collector improvement/bugfix/new archival behavior to implement, the architecture is now clean enough to do that work with less risk.


Working doctrine going forward

Use this principle for future work:

Build The Longhand Archive iteratively by evolving the CSJ Collector first. Generalize only from working, tested collector behavior. Keep planning and architecture inside this skill until a second collector justifies extraction.


If resuming later

Start by reading:

  1. this handoff note
  2. references/csj-collector-architecture.md
  3. references/csj-archive-envelope.md
  4. references/csj-field-mapping.md
  5. references/refresh-lifecycle-edge-cases.md

Then verify current operational health with:

pytest -q ~/.hermes/skills/research/civil-service-jobs-collector/tests
python3 ~/.hermes/skills/research/civil-service-jobs-collector/scripts/collector.py --help
python3 ~/.hermes/skills/research/civil-service-jobs-collector/scripts/collector.py --repair-lifecycle --dry-run
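If these checks are run often, they could be wrapped in a small script that stops at the first failure. This is a sketch under stated assumptions: the helper names are hypothetical, and it invokes pytest as `python -m pytest` with the current interpreter rather than the bare `pytest` command shown above, which assumes pytest is installed for that interpreter.

```python
import subprocess
import sys
from pathlib import Path

# Skill location taken from the commands above.
SKILL_DIR = Path.home() / ".hermes/skills/research/civil-service-jobs-collector"


def health_check_commands(skill_dir: Path = SKILL_DIR) -> list[list[str]]:
    """Return the three verification commands as argument lists."""
    collector = str(skill_dir / "scripts" / "collector.py")
    return [
        [sys.executable, "-m", "pytest", "-q", str(skill_dir / "tests")],
        [sys.executable, collector, "--help"],
        [sys.executable, collector, "--repair-lifecycle", "--dry-run"],
    ]


def run_health_checks(skill_dir: Path = SKILL_DIR) -> bool:
    """Run each command in order; return False at the first non-zero exit."""
    for cmd in health_check_commands(skill_dir):
        print("$", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            return False
    return True
```

Calling `run_health_checks()` returns `True` only when all three commands exit with status 0. The lifecycle repair is still passed `--dry-run`, as in the original command.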

If working on naming/terminology again, remember: