Fixity / integrity report

How Heratio verifies that stored files have not changed since ingest, and where to read the results.

What fixity is

Fixity is the assurance that a digital file is bit-for-bit identical to the file that was ingested. Heratio records a checksum (a short fingerprint) and the algorithm used (typically SHA-256) for each digital object when it is loaded. A fixity check re-computes the file's checksum on disk and compares it to that stored baseline. A match means the file is intact; a mismatch means the bytes have changed and the file needs attention. This is the actionable "Integrity" functional area of the NDSA Levels of Digital Preservation.

The report: /admin/fixity

The admin report at Admin → Fixity / integrity (/admin/fixity) is read-only. It shows:

  • Digital objects in scope (local files only; remote references are excluded).
  • With checksum baseline — how many objects carry a stored checksum and algorithm, so they can be verified, shown as a coverage bar.
  • Without baseline — objects that cannot be verified until a checksum is recorded.
  • Never verified — objects with a baseline that have not yet been checked.
  • Last sweep — when the most recent verification ran and what it found.
  • Latest result per object — a roll-up of match / mismatch / missing file / skipped (oversize) / error, as labelled CSS bars.
  • Recent checks — the most recent individual results.

The page never runs a verification itself, so it always loads quickly.

Running a sweep

Verification is done out-of-band by a bounded command:

php artisan ahg:fixity-sweep            # verify up to 100 objects
php artisan ahg:fixity-sweep --limit=500
php artisan ahg:fixity-sweep --json     # machine-readable summary

The sweep is deliberately bounded (default 100, hard-capped at 1000 per run) so it can never launch an unbounded hash of the whole collection, and resilient: a missing or unreadable file is recorded as missing_file / error rather than stopping the run, and files above a size cap are recorded as skipped_oversize so a single very large master cannot make a sweep run unreasonably long. It prefers objects that have never been verified, so repeated runs make steady forward progress across the collection.

A daily scheduled sweep runs automatically (off-peak, in the background), so the report stays current and integrity drift is caught early without manual action. Each result is written to an append-only log; nothing in the catalogue is ever changed.

Reading the results

Result Meaning
Match The file is intact — its checksum matches the baseline.
Mismatch The file has changed since ingest. Investigate (restore from a known-good copy).
Missing file No file was found on disk for the object's stored path.
No baseline The object has no stored checksum to verify against yet.
Skipped (oversize) The file is above the sweep size cap; verify it out-of-band.
Error The file was present but could not be read or hashed.

A non-zero count of Mismatch is the signal that warrants action.