Building The Longhand Archive

How a solo archival system gets built in public


CSJ v2.3 Phase 4 Checkpoint

Updated: 2026-04-13

Completed in this phase

Noise reduction refinement made during validation

Validation completed

  1. Synthetic validation:
    • confirmed extractor correctly identifies:
      • YouTube links
      • PDF candidate pack links
      • Vimeo iframe embeds
  2. Live SCS validation:
    • fetched first five SCS role detail pages directly via NativeCollector
    • observed collateral extraction on live senior roles
    • one sampled role (452781) included a pdf_candidate_pack attachment
    • no YouTube/video embeds were observed in the first ten SCS roles checked live
  3. Live write-path validation:
    • ran: python3 ~/.hermes/skills/research/civil-service-jobs-collector/scripts/collector.py -g SCS --details -w 1 -n 1 --full --force
    • verified saved job file includes populated asset fields
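
The synthetic validation step can be illustrated with a minimal sketch. This is not the collector's actual extractor: the class name `CollateralExtractor`, the helper `extract_assets`, the sample HTML, and the rule that any linked `.pdf` counts as a candidate pack are all assumptions made for illustration. Only the asset kinds (YouTube links, PDF candidate pack links, Vimeo iframe embeds) come from the checkpoint itself.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hostnames treated as YouTube (assumed list, not taken from the collector).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


class CollateralExtractor(HTMLParser):
    """Hypothetical extractor: collects (kind, url) pairs from role HTML."""

    def __init__(self):
        super().__init__()
        self.assets = []  # (kind, url) tuples in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._classify(attrs["href"], embedded=False)
        elif tag == "iframe" and attrs.get("src"):
            self._classify(attrs["src"], embedded=True)

    def _classify(self, url, embedded):
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        path = parsed.path.lower()
        if host in YOUTUBE_HOSTS:
            self.assets.append(("youtube", url))
        elif embedded and (host == "vimeo.com" or host.endswith(".vimeo.com")):
            self.assets.append(("vimeo_embed", url))
        elif path.endswith(".pdf"):
            # Assumption: any linked PDF is treated as a candidate pack.
            self.assets.append(("pdf_candidate_pack", url))


def extract_assets(html):
    """Parse an HTML fragment and return the collateral assets found."""
    parser = CollateralExtractor()
    parser.feed(html)
    parser.close()
    return parser.assets


# Synthetic fixture covering the three asset kinds plus one plain link.
SAMPLE_HTML = """
<div class="role">
  <a href="https://www.youtube.com/watch?v=abc123">Meet the team</a>
  <a href="/attachments/candidate-pack.pdf">Candidate pack</a>
  <iframe src="https://player.vimeo.com/video/123456"></iframe>
  <a href="/apply">Apply now</a>
</div>
"""

if __name__ == "__main__":
    for kind, url in extract_assets(SAMPLE_HTML):
        print(kind, url)
```

A synthetic fixture like `SAMPLE_HTML` keeps the check independent of what happens to be live, which matters here because no video embeds were observed in the live SCS roles sampled.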

Observed live behavior / caveat

Current status after Phase 4

Done:

Still to do