Heratio Help Center article. Category: Federation.

Whole-collection CIDOC-CRM graph dump

Version: 1.0 Date: 2026-06-12 Author: The Archive and Heritage Group (Pty) Ltd

What it does

This is the dataset-level companion to the per-record CIDOC-CRM export. Where the per-record export emits one archival record, the graph dump streams the WHOLE published catalogue into ONE combined CIDOC-CRM (ISO 21127) Turtle document - a single connected graph in which every record, and optionally every producing actor and cited subject/place term, share one @prefix block and join through their #crm-object / actor / term fragment IRIs.

It reuses the same serializers as the per-record / actor / term downloads (CidocCrmSerializer, CidocCrmActorSerializer, CidocCrmTermSerializer), so the combined graph is byte-for-byte consistent with the single-entity documents.

Published records only: the same gate the rest of the platform uses (publication status published; the synthetic root record excluded). Nothing draft or private ever appears in the dump.

Generating the dump (operator / scheduled)

php artisan ahg:export-cidoc-graph
php artisan ahg:export-cidoc-graph --actors --terms
php artisan ahg:export-cidoc-graph --culture=af --batch=1000
php artisan ahg:export-cidoc-graph --limit=50        # smoke run
php artisan ahg:export-cidoc-graph --out=/path/to/file.ttl

Option	Default	Notes
`--out`	`{storage_path}/cidoc-graph/cidoc-crm.ttl`	Output path. Default lands under the configured Heratio storage path, never a hardcoded directory.
`--culture`	`en`	i18n culture for labels.
`--batch`	`500`	Id page size for the streaming keyset cursor.
`--limit`	`0` (no cap)	Cap the record count for smoke runs.
`--actors`	off	Also append every actor that produced a published record.
`--terms`	off	Also append every subject / place term cited by a published record.

The command streams: it walks published record ids in ascending id batches and renders one entity at a time straight to the file, so the whole catalogue is never held in memory. It is idempotent - each run overwrites the previous dump atomically (temp file + rename), and prints an accounted summary (records exported, records skipped, actor/term nodes appended, file size).

Run it on a schedule (for example nightly) so the public download always serves a current graph.

Public bulk download

GET /data/cidoc-crm.ttl

Unauthenticated open data, published records only, CORS-open (Access-Control-Allow-Origin: *), Content-Type: text/turtle.
If a scheduled dump exists, it is streamed straight off disk (no per-request database work, so a large catalogue costs nothing at request time). The response carries X-Open-Data-Source: prebuilt-dump.
If no dump is staged, a BOUNDED graph is generated on the fly, hard-capped at 2000 records, and streamed as it is produced. The response carries X-Open-Data-Source: on-the-fly and X-Open-Data-Cap; a Turtle comment tells the client to fetch the scheduled dump for the complete graph.
Optional ?culture= selects the label culture for the on-the-fly path.

The dump is also advertised as a dataset in the platform's capabilities document (/open-data/protocol) and the DCAT data catalogue (/data/catalog), so a generic data-portal harvester discovers it automatically.

Loading the graph

The output is valid Turtle. Load it into any CIDOC-CRM-aware store or tool - Apache Jena, ResearchSpace, an Erlangen-CRM importer, or a generic SPARQL endpoint:

riot --validate cidoc-crm.ttl

Notes

Read-only: the command and the endpoint only ever SELECT. The single write is the dump file under the configured storage path.
International by design: every URI is built from the configured base URL; no tenant- or jurisdiction-specific constant is baked in.