Heratio Help Center article. Category: Technical / Integration.
Bulk Open Data User Guide
Overview
Bulk Open Data lets anyone download the platform's published records as complete dataset dumps, without scraping pages one at a time. Two endpoints serve the data: /api/v1/dataset.csv returns the published records as CSV, and /api/v1/dataset.jsonld returns them as JSON-LD. The JSON-LD feed is paginated with an ?after cursor, so a client can walk through the whole dataset page by page until there is nothing left to fetch. Only published records are included.
What it does
This feature publishes the open portion of the catalogue as reusable bulk data:
- It exposes /api/v1/dataset.csv, a CSV dump of the published records suitable for spreadsheets, data tools, and quick analysis.
- It exposes /api/v1/dataset.jsonld, the same published records as JSON-LD linked data for semantic and integration use.
- It paginates the JSON-LD feed using an ?after cursor, so large datasets can be retrieved in successive pages rather than in one oversized response.
- It includes only published records, so the dumps reflect what is already open to the public.
- It gives integrators, researchers, and aggregators a stable, machine-readable way to harvest the open dataset in full.
The aim is open, repeatable access to the published collection as data.
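As an illustration of working with the CSV dump as data, here is a minimal Python sketch that downloads /api/v1/dataset.csv and parses it into rows. It assumes the dump is UTF-8 encoded and starts with a header row, neither of which this article confirms. The function names and the column names in the usage comment are hypothetical, and https://your-site.example is a placeholder host.

```python
import csv
import io
import urllib.request


def download_csv_dump(base_url):
    """Fetch the CSV dump from the site at base_url and return it as text.

    Assumption: the dump is UTF-8 ("utf-8-sig" also strips a leading BOM
    if one is present).
    """
    url = base_url.rstrip("/") + "/api/v1/dataset.csv"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8-sig")


def parse_csv_dump(text):
    """Parse CSV text into a list of dicts keyed by the header row.

    Assumption: the first line of the dump is a header row.
    """
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Usage (hypothetical host and column names):
#   rows = parse_csv_dump(download_csv_dump("https://your-site.example"))
#   print(len(rows), "published records")
#   print(rows[0].get("title"))
```

Using csv.DictReader rather than splitting on commas keeps quoted fields that contain commas or line breaks intact.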
How to use it
- Download the CSV: fetch /api/v1/dataset.csv (for example https://your-site.example/api/v1/dataset.csv) and open it in a spreadsheet or load it into a data tool.
- Fetch the JSON-LD: request /api/v1/dataset.jsonld to receive the first page of published records as linked data.
- Page through with the cursor: take the ?after value supplied with a JSON-LD page and pass it on the next request (for example https://your-site.example/api/v1/dataset.jsonld?after=<cursor>) to retrieve the following page.
- Repeat the cursor step until a page returns no further records, at which point you have harvested the whole dataset.
- Re-run the harvest later to pick up newly published records.
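The cursor steps above can be sketched as a small Python loop. This article does not say where a JSON-LD page carries its records or its next ?after value, so the sketch takes both as caller-supplied functions. The "@graph" and "next_after" keys in the usage comment are assumptions for illustration only; check a real response before relying on them. The function names are hypothetical as well.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, cursor=None):
    """Fetch one JSON-LD page; pass the cursor as ?after when one is given."""
    url = base_url.rstrip("/") + "/api/v1/dataset.jsonld"
    if cursor is not None:
        url += "?" + urllib.parse.urlencode({"after": cursor})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def harvest(fetch, get_records, get_cursor):
    """Yield every record by following the cursor from page to page.

    fetch(cursor)     -> one parsed page (cursor is None for the first page)
    get_records(page) -> list of records on that page
    get_cursor(page)  -> the next ?after value, or None if the page has none
    """
    cursor = None
    seen_cursors = set()
    while True:
        page = fetch(cursor)
        records = get_records(page)
        if not records:
            return  # a page with no records marks the end of the dataset
        yield from records
        next_cursor = get_cursor(page)
        # Safety guard (an addition of this sketch): stop if no cursor is
        # supplied or a cursor repeats, so the loop cannot run forever.
        if next_cursor is None or next_cursor in seen_cursors:
            return
        seen_cursors.add(next_cursor)
        cursor = next_cursor


# Usage (hypothetical host; "@graph" and "next_after" are assumed keys):
#   records = list(harvest(
#       lambda c: fetch_page("https://your-site.example", c),
#       lambda page: page.get("@graph", []),
#       lambda page: page.get("next_after"),
#   ))
```

Passing the extraction functions in keeps the loop independent of the exact response layout, so only the two small lambdas need to change once the real field names are known.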
Good to know
- The dumps contain published records only. Anything embargoed, restricted, or in draft is excluded by design, so the dumps never expose more than the public site already does.
- Use the CSV for human-friendly, tabular work and the JSON-LD for linked-data, integration, and interoperability work; they cover the same published records in different shapes.
- Always follow the ?after cursor on the JSON-LD feed rather than guessing page numbers - the cursor is the reliable way to traverse the full set without gaps or duplicates.
- Because the feeds reflect what is currently published, repeating a harvest is the correct way to stay in sync as the collection grows.
- Treat these endpoints as a bulk-harvest surface; for targeted lookups of individual records, use the platform's record-level API instead.
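To make the point about repeating a harvest concrete, the following Python sketch compares two complete harvests and reports which records appeared or disappeared between them. It assumes each harvested record is a dict carrying a unique identifier under the JSON-LD "@id" key, which this article does not confirm. The function name diff_harvests and the sample identifiers are hypothetical.

```python
def diff_harvests(previous, current, key="@id"):
    """Compare two harvests (lists of record dicts) by identifier.

    Assumption: every record has a unique value under `key`.
    Returns the identifiers that are new in `current` and those that were
    in `previous` but are absent from `current`.
    """
    prev_ids = {record[key] for record in previous}
    curr_ids = {record[key] for record in current}
    return {
        "added": sorted(curr_ids - prev_ids),
        "removed": sorted(prev_ids - curr_ids),
    }


# Usage with made-up records:
last_week = [{"@id": "rec/1"}, {"@id": "rec/2"}]
today = [{"@id": "rec/2"}, {"@id": "rec/3"}]
changes = diff_harvests(last_week, today)
print(changes)  # {'added': ['rec/3'], 'removed': ['rec/1']}
```

Because the feeds reflect what is currently published, an identifier in "removed" most likely means the record is no longer published, though that reading is an inference rather than documented behaviour.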