Building The Longhand Archive

How a solo archival system gets built in public

CSJ Collector hardening — implementation plan

Status

This plan is now a reference/backlog document, not the primary active execution guide. The trimmed plan became the real execution guide, and its core hardening tranche has been completed.

Completed from this plan in substance

The following items were completed during the trimmed-plan execution pass:

Still open / deferred

These items remain optional backlog work rather than immediate requirements:

Environment corrections to this plan

For Hermes: Use subagent-driven-development skill to implement this plan task-by-task.

Goal: Add operational maturity to the CSJ Collector through observability, explicit run semantics, parser regression protection, typed runtime boundaries, and safer persistence.

Architecture: Keep the existing modular structure. Do not do another broad refactor. Add a thin observability/runtime layer around the current collector flow, then harden parser tests and persistence seams incrementally.

Tech Stack: Python, pytest, dataclasses, JSON/JSONL files, existing CSJ package modules.


Delivery strategy


Phase A — run identity and run summaries

Task 1: Add run artifact paths to config

Objective: Introduce stable filesystem locations for per-run artifacts.

Files:

Step 1: Write failing test

Add assertions that config/state bootstrap exposes and creates:

Example test shape:

from csj import config

def test_config_exposes_run_artifact_paths():
    assert config.RUNS_DIR.name == "csj_runs"
    assert config.RUN_EVENTS_FILE.name == "csj_run_events.jsonl"

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_state_and_paths.py -k run_artifact

Expected: FAIL — missing config constants.

Step 3: Implement minimal code

In csj/config.py, add:
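As a minimal sketch consistent with the test in Step 1, the two constants could be defined as below. `BASE_DIR` and the `CSJ_BASE_DIR` environment variable are assumptions made for illustration; the collector presumably already has its own data root that these paths should hang off.

```python
import os
from pathlib import Path

# Hypothetical data root; replace with whatever root csj/config.py already defines.
BASE_DIR = Path(os.environ.get("CSJ_BASE_DIR", Path.home() / "csj_data"))

# Directory holding one summary artifact per collector run.
RUNS_DIR = BASE_DIR / "csj_runs"

# Append-only JSONL file of structured run events.
RUN_EVENTS_FILE = BASE_DIR / "csj_run_events.jsonl"
```

Only the final path components are pinned by the test, so the root can change without breaking it.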

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_state_and_paths.py -k run_artifact

Expected: PASS

Step 5: Commit

git add csj/config.py tests/test_state_and_paths.py
git commit -m "feat: add config paths for run artifacts"

Task 2: Ensure run directories are created during bootstrap

Objective: Make run artifact storage part of normal collector bootstrap.

Files:

Step 1: Write failing test

Add a test that patches paths to a temp dir and verifies ensure_dirs() creates:

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_state_and_paths.py -k ensure_dirs

Expected: FAIL — run dir not created.

Step 3: Implement minimal code

Update ensure_dirs() in csj/state.py to create RUNS_DIR.
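A sketch of the idempotent directory creation this step relies on. The real `ensure_dirs()` most likely reads its paths from `csj.config`; here the paths are passed in explicitly (an assumption) so the example stays self-contained.

```python
import tempfile
from pathlib import Path


def ensure_dirs(*dirs: Path) -> None:
    """Create each directory and its parents if missing; safe to call repeatedly."""
    for directory in dirs:
        directory.mkdir(parents=True, exist_ok=True)


# A temporary root stands in for the real data directory.
with tempfile.TemporaryDirectory() as tmp:
    runs_dir = Path(tmp) / "csj_runs"
    ensure_dirs(runs_dir)
    ensure_dirs(runs_dir)  # second call does nothing and does not raise
    created = runs_dir.is_dir()
```

Using `exist_ok=True` keeps bootstrap safe to run at the start of every collector invocation.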

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_state_and_paths.py -k ensure_dirs

Expected: PASS

Step 5: Commit

git add csj/state.py tests/test_state_and_paths.py
git commit -m "feat: create run artifact directories during bootstrap"

Task 3: Add a first-class run model

Objective: Create explicit data structures for run state and summaries.

Files:

Step 1: Write failing test

Add tests for:

Example:

from csj.run_model import RunSummary

def test_run_summary_to_dict_contains_required_fields():
    summary = RunSummary(
        run_id="2026-04-20T12-00-00Z_test",
        mode="native",
        started_at="2026-04-20T12:00:00",
        completed_at=None,
        status="success",
        counts={},
        timings={},
        warnings=[],
        errors=[],
        anomaly_level="none",
    )
    data = summary.to_dict()
    assert data["run_id"] == "2026-04-20T12-00-00Z_test"
    assert data["status"] == "success"

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_model.py

Expected: FAIL — module missing.

Step 3: Implement minimal code

Create csj/run_model.py with dataclasses:

Keep it minimal and serialization-friendly.
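One possible shape for the model, using the field names from the example test in Step 1. The element types of `counts`, `timings`, `warnings`, and `errors` are assumptions, and `asdict` is one convenient way to get a JSON-friendly dict rather than a requirement of the plan.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class RunSummary:
    run_id: str
    mode: str
    started_at: str
    completed_at: Optional[str]
    status: str
    counts: Dict[str, int] = field(default_factory=dict)
    timings: Dict[str, float] = field(default_factory=dict)
    warnings: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    anomaly_level: str = "none"

    def to_dict(self) -> Dict[str, Any]:
        """Return a plain dict that json.dumps can serialize directly."""
        return asdict(self)
```

Keeping timestamps as strings avoids a custom JSON encoder at the cost of weaker typing.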

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_model.py

Expected: PASS

Step 5: Commit

git add csj/run_model.py tests/test_run_model.py
git commit -m "feat: add run summary model"

Task 4: Generate a run_id at collector startup

Objective: Make every run uniquely identifiable.

Files:

Step 1: Write failing test

Add a test that exercises scrape() with monkeypatched subfunctions and asserts a generated run summary context includes run_id.

If direct testing is awkward, test a helper like:

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k run_id

Expected: FAIL

Step 3: Implement minimal code

In csj/run.py:

Suggested format:

2026-04-20T09-15-32Z_a1f93b
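A small helper that produces IDs in the suggested format could look like the following. The name `generate_run_id` and the optional `now` parameter (useful for deterministic tests) are illustrative assumptions.

```python
import re
import secrets
from datetime import datetime, timezone
from typing import Optional


def generate_run_id(now: Optional[datetime] = None) -> str:
    """Return a filesystem-safe UTC timestamp plus a 6-character random hex suffix."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return f"{stamp}_{secrets.token_hex(3)}"


# Pattern a test can use to check the shape without pinning the random suffix.
RUN_ID_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}Z_[0-9a-f]{6}")
```

Hyphens replace colons in the time portion so the ID can be used directly as a filename on any platform.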

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k run_id

Expected: PASS

Step 5: Commit

git add csj/run.py tests/test_run_module.py
git commit -m "feat: generate run id for collector runs"

Task 5: Write per-run summary files

Objective: Persist a durable summary per run while preserving csj_latest.json.

Files:

Step 1: Write failing test

Add test for write_run_summary() asserting:

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k write_run_summary

Expected: FAIL — no run-specific file.

Step 3: Implement minimal code

Update write_run_summary() to:

Keep current summary schema intact; add new fields rather than replacing old ones.
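To illustrate the dual write, here is a sketch that stores a run-specific file and also refreshes `csj_latest.json`. The function signature and the `<run_id>.json` naming are assumptions; the existing `write_run_summary()` may take different arguments.

```python
import json
import tempfile
from pathlib import Path


def write_run_summary(summary: dict, runs_dir: Path, latest_file: Path) -> Path:
    """Write the summary to a per-run file and refresh the latest-summary file."""
    runs_dir.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(summary, indent=2, sort_keys=True)
    run_file = runs_dir / f"{summary['run_id']}.json"
    run_file.write_text(payload, encoding="utf-8")
    latest_file.write_text(payload, encoding="utf-8")
    return run_file


# A temporary root stands in for the real data directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    summary = {"run_id": "2026-04-20T09-15-32Z_a1f93b", "status": "success"}
    run_file = write_run_summary(summary, root / "csj_runs", root / "csj_latest.json")
    per_run = json.loads(run_file.read_text(encoding="utf-8"))
    latest = json.loads((root / "csj_latest.json").read_text(encoding="utf-8"))
```

Both files carry the same payload, so existing consumers of `csj_latest.json` keep working while history accumulates per run.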

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k write_run_summary

Expected: PASS

Step 5: Commit

git add csj/run.py tests/test_run_module.py
git commit -m "feat: persist per-run summary artifacts"

Phase B — telemetry and explicit run status

Task 6: Add telemetry module for structured event emission

Objective: Introduce machine-readable operational events without removing human console output.

Files:

Step 1: Write failing test

Test that emit_event() writes one JSON line with fields:

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_telemetry.py

Expected: FAIL — module missing.

Step 3: Implement minimal code

Create helper(s):

Keep it tiny.
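A tiny JSONL writer along these lines would satisfy the one-line-per-event requirement. The record keys `ts`, `event`, and `run_id` are assumptions chosen for illustration, as is passing the events file explicitly.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def emit_event(events_file: Path, event: str, run_id: str, **fields: Any) -> dict:
    """Append one JSON object per line; return the record that was written."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "run_id": run_id,
        **fields,
    }
    events_file.parent.mkdir(parents=True, exist_ok=True)
    with events_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


with tempfile.TemporaryDirectory() as tmp:
    events_file = Path(tmp) / "csj_run_events.jsonl"
    emit_event(events_file, "run_started", "2026-04-20T09-15-32Z_a1f93b", mode="native")
    emit_event(events_file, "run_completed", "2026-04-20T09-15-32Z_a1f93b", status="success")
    lines = [json.loads(line) for line in events_file.read_text(encoding="utf-8").splitlines()]
```

Append mode keeps earlier events intact, and each line parses independently, which makes the file easy to tail or filter.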

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_telemetry.py

Expected: PASS

Step 5: Commit

git add csj/telemetry.py tests/test_telemetry.py
git commit -m "feat: add structured telemetry event writer"

Task 7: Emit run_started and run_completed events

Objective: Make run lifecycle visible to machines and operators.

Files:

Step 1: Write failing test

Add a monkeypatched test that checks:

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k telemetry

Expected: FAIL

Step 3: Implement minimal code

In scrape():

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k telemetry

Expected: PASS

Step 5: Commit

git add csj/run.py tests/test_run_module.py
git commit -m "feat: emit lifecycle telemetry for collector runs"

Task 8: Add phase timing collection

Objective: Track where run time goes.

Files:

Step 1: Write failing test

Add assertions that run summaries include timing keys, for example:

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k timings

Expected: FAIL

Step 3: Implement minimal code

Use time.perf_counter() around:

Keep native internals optional for now; do top-level phase timing first.
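Top-level phase timing can be kept unobtrusive with a context manager around each phase. The helper name `timed_phase` and the phase name `listing_fetch` are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed_phase(timings: Dict[str, float], name: str) -> Iterator[None]:
    """Record the elapsed wall time of the wrapped block, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = round(time.perf_counter() - start, 4)


timings: Dict[str, float] = {}
with timed_phase(timings, "listing_fetch"):
    time.sleep(0.01)  # stand-in for real work
```

Because the timing is stored in a `finally` block, a failed phase still shows up in the run summary.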

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k timings

Expected: PASS

Step 5: Commit

git add csj/run.py csj/native.py tests/test_run_module.py
git commit -m "feat: add phase timing to run summaries"

Task 9: Make run status explicit

Objective: Classify runs as success, degraded, or failed.

Files:

Step 1: Write failing test

Create tests for:

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_status.py

Expected: FAIL

Step 3: Implement minimal code

Use current failure/anomaly data from summarize_fetch_results():
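The classification rule itself can be a small pure function. The sketch below assumes the fetch summary yields a total count, a failed count, and an anomaly level; those inputs and the exact thresholds are illustrative, not taken from `summarize_fetch_results()`.

```python
def classify_run_status(total: int, failed: int, anomaly_level: str = "none") -> str:
    """Map fetch outcomes to one of: success, degraded, failed."""
    if total > 0 and failed >= total:
        return "failed"  # nothing was fetched successfully
    if failed > 0 or anomaly_level != "none":
        return "degraded"  # partial failures or a flagged anomaly
    return "success"
```

Keeping this function free of I/O makes the three outcomes easy to cover with table-style tests.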

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_status.py

Expected: PASS

Step 5: Commit

git add csj/run_model.py csj/run.py tests/test_run_status.py
git commit -m "feat: classify collector runs by explicit status"

Phase C — parser drift protection

Task 10: Add fixture directory and first listing fixture

Objective: Establish fixture-backed parser testing.

Files:

Step 1: Write failing test

Test _extract_listings() against fixture and assert at least:

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_native_parser_fixtures.py -k listings

Expected: FAIL — missing fixture/test/module path.

Step 3: Implement minimal code

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_native_parser_fixtures.py -k listings

Expected: PASS

Step 5: Commit

git add tests/fixtures/listings tests/test_native_parser_fixtures.py
git commit -m "test: add fixture-backed listing parser regression test"

Task 11: Add detail-page fixture tests

Objective: Protect detail parsing against HTML drift.

Files:

Step 1: Write failing test

Add fixture tests for:

If parsing logic is buried in fetch_detail(), test the current method with a stubbed session response.

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_native_parser_fixtures.py -k detail

Expected: FAIL

Step 3: Implement minimal code

Add fixtures and assertions for:

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_native_parser_fixtures.py -k detail

Expected: PASS

Step 5: Commit

git add tests/fixtures/details tests/test_native_parser_fixtures.py
git commit -m "test: add detail page parser fixture coverage"

Task 12: Add golden normalized-record tests

Objective: Catch silent output drift after parsing/normalization changes.

Files:

Step 1: Write failing test

Take representative parsed raw input and assert stable normalized output from:

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_record_golden_outputs.py

Expected: FAIL

Step 3: Implement minimal code

Add one or two canonical expected JSON outputs covering:
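The golden-output pattern is sketched below with a stand-in normalizer. `normalize_record` and the sample fields are hypothetical; in the real test the expected JSON would be loaded from `tests/fixtures/golden` and compared against the collector's own normalization functions.

```python
import json


def normalize_record(raw: dict) -> dict:
    """Stand-in normalizer: trims and lowercases keys, trims string values."""
    return {
        key.strip().lower(): (value.strip() if isinstance(value, str) else value)
        for key, value in raw.items()
    }


# In practice this would be read from a file in the golden fixture directory.
GOLDEN = json.loads('{"title": "Policy Adviser", "grade": "HEO"}')


def test_normalized_record_matches_golden():
    raw = {" Title": " Policy Adviser ", "Grade ": "HEO"}
    assert normalize_record(raw) == GOLDEN
```

Comparing whole dicts rather than individual fields is what catches silent drift, since any added, removed, or altered key fails the test.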

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_record_golden_outputs.py

Expected: PASS

Step 5: Commit

git add tests/test_record_golden_outputs.py tests/fixtures/golden
git commit -m "test: add golden normalized record regression coverage"

Phase D — typed runtime boundaries

Task 13: Introduce a typed runtime container

Objective: Replace the large dict returned by build_run_context() with a typed object.

Files:

Step 1: Write failing test

Change/add test expectations so build_run_context() returns an object with attributes, e.g.:

Step 2: Run test to verify failure

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_cli_module.py /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py

Expected: FAIL

Step 3: Implement minimal code

Create dataclass CollectorRuntime with the current fields from build_run_context().

Update build_run_context() to return CollectorRuntime.

Update csj/run.py to use attributes instead of dict indexing in small, mechanical edits.
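As a rough sketch of the migration, a frozen dataclass can replace the dict while call sites switch from indexing to attributes. The three fields shown are placeholders; the real `CollectorRuntime` should carry whatever `build_run_context()` returns today, and its signature here is assumed.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CollectorRuntime:
    # Placeholder fields for illustration only.
    mode: str
    output_dir: Path
    dry_run: bool = False


def build_run_context(mode: str, output_dir: str, dry_run: bool = False) -> CollectorRuntime:
    """Return a typed runtime object instead of a plain dict."""
    return CollectorRuntime(mode=mode, output_dir=Path(output_dir), dry_run=dry_run)


runtime = build_run_context("native", "/tmp/csj")
# Before: runtime["mode"]. After: attribute access that a type checker can verify.
current_mode = runtime.mode
```

Freezing the dataclass also prevents orchestration code from mutating the runtime mid-run.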

Step 4: Run test to verify pass

Run:

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_cli_module.py /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py

Expected: PASS

Step 5: Commit

git add csj/runtime.py csj/cli.py csj/run.py tests/test_cli_module.py tests/test_run_module.py
git commit -m "refactor: replace dict runtime context with typed runtime object"

Task 14: Add typed fetch result model

Objective: Reduce ad hoc dict returns in orchestration.

Files:

Step 1: Write failing test

Add tests expecting execute_fetches() to return an object with:

Step 2: Run test to verify failure

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k execute_fetches

Expected: FAIL

Step 3: Implement minimal code

Create FetchResult dataclass and update only the narrow return path from execute_fetches().

Step 4: Run test to verify pass

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k execute_fetches

Expected: PASS

Step 5: Commit

git add csj/run_model.py csj/run.py tests/test_run_module.py
git commit -m "refactor: type fetch results in run orchestration"

Phase E — failure taxonomy and degraded-path visibility

Task 15: Add failure taxonomy module

Objective: Replace vague failure strings with structured categories.

Files:

Step 1: Write failing test

Test a helper that builds structured failure records like:

Step 2: Run test to verify failure

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_failures.py

Expected: FAIL

Step 3: Implement minimal code

Create a tiny failure model and helper constructors/classifiers.
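One way to keep the taxonomy tiny is an enum of categories plus a small record type. The category names, the `stage` and `job_id` fields, and the exception mapping below are all assumptions for illustration.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class FailureCategory(str, Enum):
    NETWORK = "network"
    PARSE = "parse"
    STORAGE = "storage"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class FailureRecord:
    category: FailureCategory
    stage: str
    message: str
    job_id: Optional[str] = None

    def to_dict(self) -> dict:
        data = asdict(self)
        data["category"] = self.category.value  # plain string for JSON output
        return data


def classify_exception(exc: BaseException) -> FailureCategory:
    """Map an exception to a coarse category; network checks come before OSError."""
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return FailureCategory.NETWORK
    if isinstance(exc, (ValueError, KeyError)):  # includes json.JSONDecodeError
        return FailureCategory.PARSE
    if isinstance(exc, OSError):
        return FailureCategory.STORAGE
    return FailureCategory.UNKNOWN
```

The order of checks matters because `ConnectionError` and `TimeoutError` are themselves subclasses of `OSError`.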

Step 4: Run test to verify pass

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_failures.py

Expected: PASS

Step 5: Commit

git add csj/failures.py tests/test_failures.py
git commit -m "feat: add structured failure taxonomy"

Task 16: Make attachment/transcript helper failures observable

Objective: Stop silently swallowing enrichment failures in fetch_one_job().

Files:

Step 1: Write failing test

Add tests where:

Assert:

Step 2: Run test to verify failure

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k asset_failure

Expected: FAIL

Step 3: Implement minimal code

Replace:

except Exception:
    pass

with:
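One possible replacement, shown as a sketch: catch the exception, record what failed, and carry on. The wrapper name `run_enrichment` and the shape of the warning dict are assumptions, not the collector's actual API.

```python
from typing import Callable, List, Optional


def run_enrichment(name: str, helper: Callable[[], object], warnings: List[dict]) -> Optional[object]:
    """Run an optional enrichment step; record the failure instead of hiding it."""
    try:
        return helper()
    except Exception as exc:  # still broad, but no longer silent
        warnings.append({"stage": name, "error": type(exc).__name__, "message": str(exc)})
        return None


def broken_transcript_helper() -> object:
    raise RuntimeError("transcript unavailable")


warnings: List[dict] = []
result = run_enrichment("transcript", broken_transcript_helper, warnings)
```

The job still completes when enrichment fails, but the warning can now flow into the run summary and telemetry.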

Step 4: Run test to verify pass

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py -k asset_failure

Expected: PASS

Step 5: Commit

git add csj/run.py tests/test_run_module.py
git commit -m "feat: surface asset enrichment failures in run telemetry"

Task 17: Surface malformed JSON/state read problems

Objective: Make corrupted local data visible instead of silently ignored.

Files:

Step 1: Write failing test

Add tests for:

Assert:

Step 2: Run test to verify failure

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_recovery_paths.py

Expected: FAIL

Step 3: Implement minimal code

At each current swallow point:
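A reader helper in this spirit might distinguish a missing file (normal on first run) from a corrupted one (worth reporting). The helper name and the structure of the problem entries are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, List


def read_json_or_default(path: Path, default: Any, problems: List[dict]) -> Any:
    """Return parsed JSON, or the default while recording why parsing failed."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return default  # a missing file is expected before the first run
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        problems.append({"path": str(path), "error": type(exc).__name__, "message": str(exc)})
        return default


with tempfile.TemporaryDirectory() as tmp:
    bad_file = Path(tmp) / "state.json"
    bad_file.write_text("{not valid json", encoding="utf-8")
    problems: List[dict] = []
    state = read_json_or_default(bad_file, {}, problems)
    missing = read_json_or_default(Path(tmp) / "absent.json", {}, problems)
```

The caller keeps running on the default value, while the recorded problem can be surfaced as a warning or telemetry event.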

Step 4: Run test to verify pass

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_recovery_paths.py

Expected: PASS

Step 5: Commit

git add csj/run.py csj/records.py csj/state.py tests/test_recovery_paths.py
git commit -m "feat: surface malformed local state and record failures"

Phase F — atomic writes and persistence hardening

Task 18: Add atomic JSON write helper

Objective: Protect critical files from partial writes.

Files:

Step 1: Write failing test

Test that helper writes JSON to a temp file then renames into place.

Step 2: Run test to verify failure

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_atomic_io.py

Expected: FAIL

Step 3: Implement minimal code

Create:
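The write-then-rename technique described in Step 1 can be sketched as follows. The name `atomic_write_json` comes from later tasks in this plan; the signature and the fsync call are assumptions about how the helper might be written.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, data: Any) -> None:
    """Write JSON to a temp file in the target directory, then rename into place."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push contents to disk before the rename
        os.replace(tmp_name, path)  # replaces any existing file in one step
    except BaseException:
        try:
            os.unlink(tmp_name)  # clean up the partial temp file
        except FileNotFoundError:
            pass
        raise
```

Readers see either the old file or the complete new one, never a truncated write, because `os.replace` swaps the name only after the data is fully written.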

Step 4: Run test to verify pass

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_atomic_io.py

Expected: PASS

Step 5: Commit

git add csj/io_utils.py tests/test_atomic_io.py
git commit -m "feat: add atomic json write helper"

Task 19: Use atomic writes for state and run summaries

Objective: Harden the most important top-level artifacts first.

Files:

Step 1: Write failing test

Add tests asserting the helper is used indirectly by:

Step 2: Run test to verify failure

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_state_and_paths.py /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py

Expected: FAIL

Step 3: Implement minimal code

Replace direct write_text() with atomic_write_json() for:

Step 4: Run test to verify pass

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_state_and_paths.py /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_module.py

Expected: PASS

Step 5: Commit

git add csj/state.py csj/run.py tests/test_state_and_paths.py tests/test_run_module.py
git commit -m "feat: use atomic writes for state and run summaries"

Task 20: Use atomic writes for per-job records

Objective: Protect core archival records from truncation/corruption.

Files:

Step 1: Write failing test

Add test that save_job_record() writes valid JSON through the atomic helper path.

Step 2: Run test to verify failure

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_records.py

Expected: FAIL

Step 3: Implement minimal code

Replace fpath.write_text(...) with atomic_write_json(...).

Step 4: Run test to verify pass

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_records.py

Expected: PASS

Step 5: Commit

git add csj/records.py tests/test_records.py
git commit -m "feat: use atomic writes for job records"

Phase G — docs and guardrails

Task 21: Document stable vs internal interfaces

Objective: Clarify what future changes can safely break.

Files:

Step 1: Draft section

Add headings:

Step 2: Add concrete statements

Stable:

Internal:

Step 3: Review

No test needed; review for clarity.

Step 4: Commit

git add references/csj-collector-architecture.md
git commit -m "docs: define stable and internal collector interfaces"

Task 22: Document module invariants and run model

Objective: Prevent future boundary drift.

Files:

Step 1: Add module ownership table

For:

Step 2: Add run architecture section

Document:

Step 3: Commit

git add references/csj-collector-architecture.md
git commit -m "docs: add module invariants and run model architecture"

Final verification stage

Task 23: Run targeted suite for new work

Objective: Verify the new hardening work before full-suite run.

Files: none

Step 1: Run targeted tests

pytest -q \
  /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_model.py \
  /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_telemetry.py \
  /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_run_status.py \
  /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_native_parser_fixtures.py \
  /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_record_golden_outputs.py \
  /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_recovery_paths.py \
  /root/.hermes/skills/research/civil-service-jobs-collector/tests/test_atomic_io.py

Expected: PASS

Step 2: Commit if needed

git add -A
git commit -m "test: verify hardening layers with targeted suite"

Task 24: Run full suite and CLI smoke checks

Objective: Confirm collector health after hardening.

Files: none

Step 1: Run full test suite

pytest -q /root/.hermes/skills/research/civil-service-jobs-collector/tests

Expected: PASS

Step 2: Run CLI smoke checks

python3 /root/.hermes/skills/research/civil-service-jobs-collector/scripts/collector.py --help
python3 /root/.hermes/skills/research/civil-service-jobs-collector/scripts/collector.py --repair-lifecycle --dry-run

Expected: both commands exit without error.

Step 3: Commit

git add -A
git commit -m "chore: verify collector after observability and hardening work"

Recommended implementation order

For the most effective execution order, work through the tasks in this sequence (Tasks 14 and 15 are not included):

  1. Task 1
  2. Task 2
  3. Task 3
  4. Task 4
  5. Task 5
  6. Task 6
  7. Task 7
  8. Task 8
  9. Task 9
  10. Task 10
  11. Task 11
  12. Task 12
  13. Task 13
  14. Task 16
  15. Task 17
  16. Task 18
  17. Task 19
  18. Task 20
  19. Task 21
  20. Task 22
  21. Task 23
  22. Task 24

That sequence gives you:


Minimum cut if you want to stop early

If you only do the highest-value subset, stop after:

That would still materially improve the collector.