CSJ v2.3 Phase 2 Checkpoint
Updated: 2026-04-13
Completed in this phase
- Added append-only event helper in scripts/collector.py:
- Added history snapshot helper:
- Added save orchestration helper:
- Wired the job-save flow to use save_job_record() instead of a direct write_text() call
- Current behavior now supports:
  - first_seen history/event on the first save of a reference
  - field_changed history/event on a meaningful change
  - refreshed event on an unchanged refresh
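The three save outcomes above can be sketched as follows. This is a minimal illustration, not the code in scripts/collector.py: `classify_save`, `append_event`, `write_snapshot` and `save_record_sketch` are hypothetical names, and the on-disk layout (an `events.jsonl` log, numbered snapshots under `history/<reference>/`, the latest record under `jobs/`) is an assumption. The caller is assumed to pass in the previous comparable record, or None if the reference has never been saved.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def classify_save(previous: Optional[dict], current: dict) -> str:
    """Choose the event type for one save of a reference."""
    if previous is None:
        return "first_seen"      # nothing stored yet for this reference
    if previous != current:
        return "field_changed"   # comparable content differs
    return "refreshed"           # same content, just re-fetched


def append_event(log_path: Path, reference: str, event_type: str) -> None:
    """Append one JSON line; earlier lines are never rewritten."""
    event = {
        "reference": reference,
        "event": event_type,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:  # "a" = append-only
        fh.write(json.dumps(event) + "\n")


def write_snapshot(history_dir: Path, reference: str, record: dict) -> Path:
    """Store a numbered copy of the record (0000.json, 0001.json, ...)."""
    ref_dir = history_dir / reference
    ref_dir.mkdir(parents=True, exist_ok=True)
    path = ref_dir / f"{len(list(ref_dir.glob('*.json'))):04d}.json"
    path.write_text(json.dumps(record, sort_keys=True), encoding="utf-8")
    return path


def save_record_sketch(root: Path, reference: str,
                       previous: Optional[dict], current: dict) -> str:
    """Hypothetical orchestration: write latest record, snapshot on change, log event."""
    event_type = classify_save(previous, current)
    jobs_dir = root / "jobs"
    jobs_dir.mkdir(parents=True, exist_ok=True)
    (jobs_dir / f"{reference}.json").write_text(
        json.dumps(current, sort_keys=True), encoding="utf-8")
    if event_type != "refreshed":
        # Only first_seen and field_changed add a history version.
        write_snapshot(root / "history", reference, current)
    append_event(root / "events.jsonl", reference, event_type)
    return event_type
```

Opening the log in append mode means existing events are never rewritten, and skipping the snapshot on a refresh is what keeps unchanged refreshes from adding history versions. Saving `{"salary": "30000"}` three times with the middle save unchanged and the last one altered would log first_seen, refreshed, field_changed and leave two snapshots.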
Important refinement made during validation
- Found a false-positive change source: CSJ detail URLs contain transient query/session state and were generating bogus history entries.
- Added normalize_url_for_diff() and updated build_comparable_record() to compare CSJ URLs by their stable path instead of the volatile query string.
- Result: unchanged refreshes no longer create bogus historical versions just because the site changed the SID query blob.
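The normalization idea can be illustrated with a minimal sketch, assuming records are flat dicts that keep the detail URL under a hypothetical `url` key. It reuses the two function names from this note for readability, but the bodies are illustrative and not the code in collector.py; only `urllib.parse.urlsplit`/`urlunsplit` are real library calls, and the example.org URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url_for_diff(url: str) -> str:
    """Keep scheme, host and path; drop the volatile query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def build_comparable_record(record: dict) -> dict:
    """Copy the record, replacing the URL with its stable form before diffing."""
    comparable = dict(record)
    url = comparable.get("url")  # the "url" key is an assumption
    if isinstance(url, str):
        comparable["url"] = normalize_url_for_diff(url)
    return comparable


if __name__ == "__main__":
    first = {"url": "https://example.org/jobs/detail?SID=abc123", "salary": "30000"}
    again = {"url": "https://example.org/jobs/detail?SID=zz9", "salary": "30000"}
    print(build_comparable_record(first) == build_comparable_record(again))  # → True
```

Dropping the whole query is only safe if nothing identifying lives there; if a stable parameter did, a variant that removes just the session key would be needed instead.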
Validation completed
- Live limited force run succeeded:
python3 ~/.hermes/skills/research/civil-service-jobs-collector/scripts/collector.py --details -w 1 -n 1 --full --force
- Verified that the unchanged-refresh path now emits a refreshed event and does not create an extra history version.
- Verified the meaningful field-change path by mutating a local record and re-running. The run:
  - created a new history snapshot
  - emitted a field_changed event
  - correctly identified the changed fields: salary, salary_min, salary_max
- Cleaned up synthetic validation artifacts afterward so the real archive was not polluted.
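The field-level check in the mutation test amounts to a dictionary diff. A minimal sketch, assuming comparable records are flat dicts; `changed_fields` and the sample values are hypothetical, not taken from collector.py.

```python
from typing import Any

_MISSING = object()  # sentinel so "key absent" differs from "value is None"


def changed_fields(previous: dict[str, Any], current: dict[str, Any]) -> list[str]:
    """Return the sorted names of fields that were added, removed or changed."""
    keys = previous.keys() | current.keys()
    return sorted(
        key for key in keys
        if previous.get(key, _MISSING) != current.get(key, _MISSING)
    )


if __name__ == "__main__":
    # Hypothetical local mutation mirroring the validation run.
    before = {"title": "Policy Adviser", "salary": "30000-35000",
              "salary_min": 30000, "salary_max": 35000}
    after = dict(before, salary="32000-37000", salary_min=32000, salary_max=37000)
    print(changed_fields(before, after))  # → ['salary', 'salary_max', 'salary_min']
```

The result is sorted alphabetically, so salary_max comes before salary_min; the untouched title field is correctly left out.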
Current status
- Phase 1 foundations: done
- Phase 2 history/event wiring: done at a basic level
- Not yet done:
  - lifecycle-specific history/events for the missing-before-expiry, withdrawn, reopened, and closed states
  - supporting asset extraction/history
  - fixture-based pytest coverage
Recommended next step
- Implement lifecycle classification fields/events for missing-before-expiry roles and direct URL verification for suspected withdrawals.
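The recommended next step could take roughly this shape. Everything here is a hypothetical sketch: the `Lifecycle` states, the `classify_lifecycle` signature, and the rule that only a failed direct-URL check turns a missing-before-expiry role into a withdrawn one are assumptions, not decided design.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Lifecycle(str, Enum):
    LIVE = "live"
    CLOSED = "closed"
    MISSING_BEFORE_EXPIRY = "missing_before_expiry"
    WITHDRAWN = "withdrawn"
    REOPENED = "reopened"


# States in which a role had dropped out of the listing.
_GONE = {Lifecycle.CLOSED, Lifecycle.MISSING_BEFORE_EXPIRY, Lifecycle.WITHDRAWN}


def classify_lifecycle(
    previous: Optional[Lifecycle],
    in_listing: bool,
    closing_date: date,
    today: date,
    detail_url_ok: Optional[bool] = None,  # None = URL not checked this run
) -> Lifecycle:
    """Classify one role for one collection run (hypothetical rules)."""
    if in_listing:
        # Back in the listing after having dropped out: record a reopening.
        return Lifecycle.REOPENED if previous in _GONE else Lifecycle.LIVE
    if today > closing_date:
        return Lifecycle.CLOSED  # dropped out after its closing date
    if detail_url_ok is False:
        return Lifecycle.WITHDRAWN  # direct URL check confirmed it is gone
    # Missing early but not (yet) confirmed withdrawn.
    return Lifecycle.MISSING_BEFORE_EXPIRY
```

Keeping `detail_url_ok` optional lets a run classify roles without network checks; only roles that come back as MISSING_BEFORE_EXPIRY would then need the direct URL request.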