Heratio Help Center article. Category: Technical / Integration.

Bulk Open Data User Guide

Overview

Bulk Open Data lets anyone download the platform's published records as complete dataset dumps instead of scraping pages one at a time. Two endpoints serve the data: /api/v1/dataset.csv returns the published records as CSV, and /api/v1/dataset.jsonld returns the same records as JSON-LD. The JSON-LD feed is paginated with an ?after cursor, so a client can walk the whole dataset page by page until nothing is left to fetch. Only published records are included.

What it does

This feature publishes the open portion of the catalogue as reusable bulk data:

It exposes /api/v1/dataset.csv, a CSV dump of the published records suitable for spreadsheets, data tools, and quick analysis.
It exposes /api/v1/dataset.jsonld, the same published records as JSON-LD linked data for semantic and integration use.
It paginates the JSON-LD feed using an ?after cursor, so large datasets can be retrieved in successive pages rather than in one oversized response.
It includes only published records, so the dumps reflect what is already open to the public.
It gives integrators, researchers, and aggregators a stable, machine-readable way to harvest the open dataset in full.

The aim is open, repeatable access to the published collection as data.
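
As a rough illustration of working with the CSV dump, the sketch below downloads /api/v1/dataset.csv and parses it with Python's standard library. The host is the placeholder used in this guide, UTF-8 encoding is an assumption, and the helper names parse_csv_dump and download_csv_dump are hypothetical. No column names are assumed: each row is keyed by whatever header row the dump actually contains.

```python
import csv
import io
import urllib.request

BASE_URL = "https://your-site.example"  # placeholder host; replace with your site


def parse_csv_dump(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    return list(csv.DictReader(io.StringIO(text, newline="")))


def download_csv_dump(base_url=BASE_URL):
    """Fetch /api/v1/dataset.csv and return its rows as dicts."""
    with urllib.request.urlopen(base_url + "/api/v1/dataset.csv") as response:
        # ASSUMPTION: the dump is UTF-8 encoded.
        return parse_csv_dump(response.read().decode("utf-8"))
```

Calling download_csv_dump() returns one dict per published record, ready to filter or load into a data tool.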

How to use it

Download the CSV: fetch /api/v1/dataset.csv (for example https://your-site.example/api/v1/dataset.csv) and open it in a spreadsheet or load it into a data tool.
Fetch the JSON-LD: request /api/v1/dataset.jsonld to receive the first page of published records as linked data.
Page through with the cursor: take the cursor value supplied with a JSON-LD page and pass it as the ?after parameter on the next request (for example https://your-site.example/api/v1/dataset.jsonld?after=<cursor>) to retrieve the following page.
Repeat the cursor step until a page returns no further records, at which point you have harvested the whole dataset.
Re-run the harvest later to pick up newly published records.
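
The cursor steps above can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not a definitive client: this guide does not specify the shape of a JSON-LD page, so the sketch assumes the records sit under a "@graph" key and the next cursor under an "after" key. Check a real response and adjust both names before relying on it. The helper names fetch_page and harvest are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://your-site.example"  # placeholder host; replace with your site


def fetch_page(after=None, base_url=BASE_URL):
    """Fetch one JSON-LD page, passing the cursor as ?after when given."""
    url = base_url + "/api/v1/dataset.jsonld"
    if after is not None:
        url += "?" + urllib.parse.urlencode({"after": after})
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def harvest(fetch=fetch_page):
    """Walk the feed page by page until a page returns no records."""
    records = []
    after = None
    while True:
        page = fetch(after)
        items = page.get("@graph", [])  # ASSUMPTION: records live under "@graph"
        if not items:
            break  # an empty page means the whole dataset has been read
        records.extend(items)
        next_after = page.get("after")  # ASSUMPTION: next cursor lives under "after"
        if not next_after or next_after == after:
            break  # no usable cursor; stop rather than loop forever
        after = next_after
    return records
```

Passing the fetch function as a parameter keeps the paging logic separate from the network call, so the loop can be exercised against canned pages before it is pointed at a live site.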

Good to know

The dumps contain published records only: anything embargoed, restricted, or in draft is excluded by design.
Use the CSV for human-friendly, tabular work and the JSON-LD for linked-data, integration, and interoperability work; they cover the same published records in different shapes.
Always follow the ?after cursor on the JSON-LD feed rather than guessing page numbers - the cursor is the reliable way to traverse the full set without gaps or duplicates.
Because the feeds reflect what is currently published, repeating a harvest is the correct way to stay in sync as the collection grows.
Treat these endpoints as a bulk-harvest surface; for targeted lookups of individual records, use the platform's record-level API instead.
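
To see what a repeated harvest has picked up, one option is to compare two lists of records from successive runs by identifier. The sketch below assumes each JSON-LD record carries an "@id" value (the standard JSON-LD identifier keyword); whether this feed includes it is an assumption, and the function name new_records is hypothetical.

```python
def new_records(previous, current, key="@id"):
    """Return records in `current` whose identifier was not in `previous`."""
    # ASSUMPTION: each record is a dict with a stable identifier under `key`.
    seen = {record[key] for record in previous if key in record}
    # Records that lack the key are treated as new, since they cannot be matched.
    return [record for record in current if record.get(key) not in seen]
```

Given last week's records and today's, new_records(last_week, today) returns only the records published in between.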
