title: REST API - Embedded EXIF / IPTC / XMP slug: api-embedded-metadata category: Integration
REST API - Embedded EXIF / IPTC / XMP
Heratio extracts the EXIF, IPTC, and XMP metadata that photographers and scanners embed in every image / PDF / video file. This guide explains how to read that data over the REST API.
The short version
Add ?include=embedded_metadata to either:
GET /api/v1/digital-object/{id}(anonymous, rate-limited)GET /api/v2/descriptions/{slug}(bearer token required)
or call the standalone endpoint:
GET /api/v2/digital-object/{id}/embedded-metadata
The response gains an embedded_metadata block with three sub-keys: exif, iptc, and xmp.
What you get back
{
"embedded_metadata": {
"exif": {
"Make": "Canon",
"Model": "EOS R5",
"DateTimeOriginal": "2024-01-15 14:32:18",
"GPSLatitude": "-25.7479",
"GPSLongitude": "28.2293"
},
"iptc": {
"By-line": "Aslam Levy",
"Headline": "Voortrekker Monument east elevation",
"CopyrightNotice": "(c) 2024 Plain Sailing Information Systems",
"Keywords": "monument, heritage, granite"
},
"xmp": {
"dc:creator": "Aslam Levy",
"dc:rights": "All rights reserved"
}
}
}
Only fields that actually have a value appear - empty fields are dropped, not nulled. Tag names follow the ExifTool / Adobe XMP camel-case naming so you can map them straight into a third-party EXIF / IPTC library.
Default-off opt-in
The block is not part of the default response shape. Existing clients that hit /api/v1/digital-object/{id} or /api/v2/descriptions/{slug} see the same JSON they always have. You only get embedded_metadata when you explicitly opt in with ?include=embedded_metadata (or the hyphenated alias ?include=embedded-metadata).
This keeps catalogue browse traffic lean - the embedded block is multi-kilobyte for image-heavy descriptions.
The standalone endpoint at /api/v2/digital-object/{id}/embedded-metadata is always-on - the block IS the response there.
Worked examples
Inline include over v2 (recommended for description detail views):
curl -H "Authorization: Bearer ahg_live_XXX" \
"https://heratio.example.org/api/v2/descriptions/voortrekker-monument?include=embedded_metadata"
Standalone endpoint (lazy-load from a description page that already loaded the rest):
curl -H "Authorization: Bearer ahg_live_XXX" \
"https://heratio.example.org/api/v2/digital-object/12345/embedded-metadata"
v1 anonymous (the hyphenated alias works on every path):
curl "https://heratio.example.org/api/v1/digital-object/12345?include=embedded_metadata"
Rights protection (ODRL)
When the parent archival description has an active ODRL odrl:use prohibition for the calling user / API key, the block is suppressed:
- Inline include (
/api/v1/digital-object,/api/v2/descriptions): theembedded_metadatakey is silently dropped from the response. You still get the base description because that level of access is separately governed. - Standalone endpoint: returns
403 Forbiddenwith an explicit error message - because the block IS the resource, silent suppression would be misleading.
If your client receives the description but no embedded_metadata key after you set the include flag, this is the most likely cause. Check the description's policies in Admin > Research > Rights Policies.
Privacy protection (PII)
Embedded EXIF GPS coordinates are personal information under GDPR, POPIA, CCPA, and CDPA - the photograph's capture location often identifies a sitter, archaeological site, or witness location. Heratio's privacy scanner (Embedded PII findings) flags every digital object whose sidecar tables hold GPS coordinates.
When a digital object has a pending or escalated GPS finding, every GPS field in the API response is replaced with null and a sibling _pii_redacted: true flag is added:
{
"embedded_metadata": {
"exif": {
"Make": "Apple",
"GPSLatitude": null,
"GPSLongitude": null,
"_pii_redacted": true
}
}
}
Non-GPS fields (creator, headline, camera model) pass through untouched. Once a privacy officer marks the finding as cleared (not PII in this context) or redacted (purged from the source), the gate clears and GPS values are returned again. Re-scanning the file via the Embedded PII Backfill job creates a fresh finding if new GPS coordinates appear.
Auth and rate limiting
| Endpoint | Auth | Rate limit |
|---|---|---|
/api/v1/digital-object/{id} |
None (anonymous) | 60 req / min / IP |
/api/v2/descriptions/{slug} |
Bearer token OR session | Per-key from API key profile |
/api/v2/digital-object/{id}/embedded-metadata |
Bearer token OR session | Per-key from API key profile |
The bearer token must have the read scope. Browse-quality API keys created in Admin > API > API Keys ship with read by default.
Common questions
Q. Why is embedded_metadata returned as {} even though my image has EXIF?
The sidecar tables (digital_object_metadata, dam_iptc_metadata, media_metadata) are populated by the metadata-extraction pipeline. Newly uploaded files take a few minutes for the queue worker to extract. If embedded_metadata stays {} after that, check Admin > Settings > Extraction Pipeline for the job log.
Q. Why are GPS fields null even though the image has them?
Either the PII gate is active (look for _pii_redacted: true in the sub-block) or the extraction didn't capture them. The PII gate clears as soon as the finding moves out of pending / escalated state.
Q. Can I edit metadata via this API?
Not yet - this endpoint is read-only. Write support is tracked under a separate issue.
Q. Where do I see the underlying SQL columns each tag maps to?
See the internal reference at docs/reference/api-embedded-metadata.md.