Heratio Help Center article. Category: Import/Export.
Heratio AHG Framework - Data Migration Tool
User Guide
Plugin Version: 1.4.0
Last Updated: 2026-02-03
Plugin: ahgDataMigrationPlugin
Table of Contents
- Overview
- Accessing the Tool
- Supported Source Systems
- Web Interface Workflow
- Field Mapping
- Data Validation
- Batch Export
- Sector-Specific Import
- Sample CSV Files
- Preservica Import/Export
- Background Jobs
- CLI Commands
- Gearman Setup
- Troubleshooting
1. Overview
The Data Migration Tool enables importing records from various archival and collection management systems into Heratio. It supports:
- CSV and Excel files from multiple source systems
- XML formats (Preservica OPEX, EAD)
- Preservica packages (PAX/XIP with digital objects)
- Sector-specific mappings (Archives, Museum, Library, Gallery, DAM)
- Background processing for large datasets via Gearman
- Field transformation and validation
- Rights import (PREMIS, SecurityDescriptor, dc:rights, MODS, EAD)
- Provenance/history import from OPEX
2. Accessing the Tool
Web Interface
Navigate to: https://[your-domain]/dataMigration
Or: Admin → Import/Export → Data Migration
Required Permissions
- Administrator access
- Or import permission in the user group
3. Supported Source Systems
Collection Management Systems
| System | Formats | Target Sector |
|---|---|---|
| ArchivesSpace | CSV/JSON | Archives, Accessions, Agents, Repositories |
| Vernon CMS | CSV/Excel | Museum |
| PastPerfect | CSV | Museum |
| CollectiveAccess | CSV | Multi-sector |
| FileMaker Pro | CSV | Any |
| WDB | CSV | Archives |
| PSIS | Excel (83 fields) | Library |
Preservation Systems
| System | Formats | Features |
|---|---|---|
| Preservica | OPEX XML | Metadata, rights, provenance import/export |
| Preservica | PAX/XIP (ZIP) | Metadata + digital objects |
Standard Formats
| Format | Use Case |
|---|---|
| CSV | Universal import |
| Excel (.xlsx, .xls) | Spreadsheet data |
| XML | EAD, Dublin Core |
4. Web Interface Workflow
Step 1: Upload File
- Go to /dataMigration
- Click "Choose File" or drag-and-drop
- Supported: .csv, .xlsx, .xls, .xml, .opex, .pax, .zip
- Click "Upload"
The system auto-detects:
- File format (CSV, Excel, XML)
- Source system (based on headers/structure)
- Sector type (Archives, Museum, Library, etc.)
Step 2: Select or Create Mapping
Use Existing Mapping:
- Select from dropdown (e.g., "Vernon CMS (Museum)")
- Click "Load Mapping"
Create New Mapping:
- Click "New Mapping"
- Enter name (e.g., "My Museum Import")
- Select target sector
- Click "Create"
Step 3: Map Fields
The mapping interface shows:
- Left column: Your source fields (from uploaded file)
- Right column: Heratio target fields
For each source field:
- Click the dropdown
- Select matching Heratio field
- Optionally set transformation rules
Field Transformations:
- trim - Remove whitespace
- uppercase / lowercase - Case conversion
- date:Y-m-d - Date formatting
- prepend:/uploads/ - Add prefix to paths
- split:| - Split multi-value fields
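To make the effect of each transformation rule concrete, here is a minimal Python sketch of how such rules could behave. It is an illustration only, not the plugin's implementation: the function names apply_rule and apply_rules are hypothetical, the source date layout (day/month/year) is an assumption, and only the PHP date letters Y, m and d are translated.

```python
from datetime import datetime

# Assumption: only the PHP date letters Y, m and d need translating.
PHP_TO_STRFTIME = {"Y": "%Y", "m": "%m", "d": "%d"}


def apply_rule(value, rule, source_date_format="%d/%m/%Y"):
    """Apply one rule such as 'trim' or 'prepend:/uploads/' to a string."""
    name, _, argument = rule.partition(":")
    if name == "trim":
        return value.strip()
    if name == "uppercase":
        return value.upper()
    if name == "lowercase":
        return value.lower()
    if name == "prepend":
        return argument + value
    if name == "split":
        # Multi-value fields become a list of trimmed values.
        return [part.strip() for part in value.split(argument)]
    if name == "date":
        target = "".join(PHP_TO_STRFTIME.get(ch, ch) for ch in argument)
        parsed = datetime.strptime(value, source_date_format)
        return parsed.strftime(target)
    raise ValueError(f"Unknown rule: {rule}")


def apply_rules(value, rules):
    """Apply rules in order; after a split, later rules run on each item."""
    for rule in rules:
        if isinstance(value, list):
            value = [apply_rule(item, rule) for item in value]
        else:
            value = apply_rule(value, rule)
    return value
```

For example, apply_rules("scan01.tif ", ["trim", "prepend:/uploads/"]) returns "/uploads/scan01.tif". Chaining rules in a fixed order keeps the result predictable when a field needs more than one transformation.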
Step 4: Preview
- Click "Preview"
- Review first 10-20 records
- Check field mappings are correct
- Verify hierarchy (parent-child relationships)
Step 5: Import
Option A: Export to Heratio CSV
- Click "Export Heratio CSV"
- Download the transformed CSV
- Use Heratio's built-in CSV Import (Admin → Import → CSV)
Option B: Direct Import (Large Files)
- Click "Background Job"
- Job queued to Gearman workers
- Monitor progress at /dataMigration/jobs
5. Field Mapping
Core Heratio Fields
| Heratio Field | Description | Required |
|---|---|---|
| legacyId | Unique ID from source system | Yes |
| parentId | Parent's legacyId for hierarchy | No |
| title | Record title | Yes |
| identifier | Reference code | No |
| scopeAndContent | Description/scope | No |
| levelOfDescription | Fonds/Series/File/Item | Yes |
| repository | Repository name or ID | No |
| culture | Language code (en, af, etc.) | No |
Digital Object Fields
| Field | Description |
|---|---|
| digitalObjectPath | Path to file (relative or absolute) |
| digitalObjectURI | External URL |
| digitalObjectChecksum | MD5/SHA256 for verification |
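If your source system does not already supply checksums, you can compute them before building the CSV. The sketch below uses Python's standard hashlib module; the function name file_checksum is hypothetical, and the assumption that digitalObjectChecksum expects a lowercase hexadecimal digest should be confirmed against your installation.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks.

    algorithm can be "md5" or "sha256" (any name hashlib.new accepts).
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for large digital objects.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_checksum("/path/to/scan01.tif", "md5") returns the value you would place in the digitalObjectChecksum column for that file.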
Multi-Value Fields
Use pipe | separator for multiple values:
subjectAccessPoints: History|World War II|Military
placeAccessPoints: South Africa|Johannesburg
nameAccessPoints: Jan Smuts|Louis Botha
Hierarchy Example
legacyId,parentId,title,levelOfDescription
F001,,Municipal Archives,Fonds
S001,F001,Council Minutes,Series
F001-001,S001,Minutes 1950-1960,File
F001-001-001,F001-001,Meeting 1950-01-15,Item
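Before uploading, you can check a file's hierarchy yourself. The following Python sketch flags duplicate legacyId values and parentId values that have not appeared earlier in the file. It is a simplified illustration (the function name check_hierarchy is hypothetical) and only looks inside the file, so it will also flag parents that already exist in the Heratio database.

```python
import csv
import io


def check_hierarchy(csv_text):
    """Return a list of hierarchy problems found in CSV text."""
    problems = []
    seen = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        legacy_id = row["legacyId"]
        parent_id = row["parentId"]
        if legacy_id in seen:
            problems.append(
                f"Row {row_number}: duplicate legacyId '{legacy_id}'"
            )
        if parent_id and parent_id not in seen:
            problems.append(
                f"Row {row_number}: parent '{parent_id}' not defined earlier"
            )
        seen.add(legacy_id)
    return problems
```

Run against the four-row example above, the function returns an empty list. If the Series row were moved below the File row, the File row would be reported because its parent S001 had not yet been defined.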
6. Data Validation
The validation framework helps you identify and fix data quality issues before importing.
Validation Types
| Validation | Description |
|---|---|
| Schema | Required fields, data types, patterns, max lengths |
| Referential | Parent-child relationships, circular reference detection |
| Duplicates | Duplicate detection in file and against existing database records |
| Sector-Specific | Standards-based rules for each GLAM sector |
Validation-Only Mode
Test your data without importing:
Web Interface:
- Upload your file
- Map fields as usual
- Click "Validate Only" instead of Import
- Review errors/warnings by row and column
CLI:
php symfony sector:archives-csv-import /path/to/file.csv --validate-only
php symfony sector:museum-csv-import /path/to/file.csv --validate-only --mapping=10
Understanding Validation Results
Results show errors by row and column:
| Severity | Icon | Action Required |
|---|---|---|
| Error | Red | Must fix before import |
| Warning | Yellow | Review recommended |
| Info | Blue | Informational only |
Example output:
Row 3, Column 'identifier': Required field is empty
Row 5, Column 'levelOfDescription': Invalid value 'folder' - must be one of: fonds, series, file, item
Row 8, Column 'parentId': Parent record '999' not found in file or database
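The messages above follow a simple row-and-column pattern. As a rough illustration of how schema checks of this kind work, the Python sketch below produces messages in the same shape. The function name validate_row, the case-insensitive comparison, and the short list of levels are assumptions for the example; the plugin's own rules are more extensive.

```python
# Assumption: a shortened list of levels, compared case-insensitively.
REQUIRED_FIELDS = ("identifier", "title", "levelOfDescription")
ALLOWED_LEVELS = ("fonds", "series", "file", "item")


def validate_row(row_number, row):
    """Return validation messages for one record (a dict of column values)."""
    messages = []
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            messages.append(
                f"Row {row_number}, Column '{field}': Required field is empty"
            )
    level = (row.get("levelOfDescription") or "").strip()
    if level and level.lower() not in ALLOWED_LEVELS:
        allowed = ", ".join(ALLOWED_LEVELS)
        messages.append(
            f"Row {row_number}, Column 'levelOfDescription': "
            f"Invalid value '{level}' - must be one of: {allowed}"
        )
    return messages
```

A row with an empty identifier yields one "Required field is empty" message, and a row with the level "folder" yields one "Invalid value" message, matching the first two lines of the example output.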
Sector-Specific Validation Rules
Each sector has specialized validation:
Archives (ISAD-G):
- Level of description must be valid (fonds, series, file, item, etc.)
- Parent hierarchy must follow ISAD-G rules
- Required: identifier, title, levelOfDescription
Museum (Spectrum):
- Object number format validation
- Acquisition date must be valid date
- Required: objectNumber, objectName
Library (MARC/RDA):
- ISBN-10 and ISBN-13 checksum validation
- ISSN validation
- Required: identifier, title
Gallery (CCO):
- Work type must be from controlled vocabulary
- Creator format validation (Name; Role)
- Required: objectNumber, title
DAM (Dublin Core/IPTC):
- DC type must be valid (Image, Audio, Video, Document, etc.)
- MIME type format validation
- GPS coordinate range validation
- Required: identifier, title
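Two of the rules above are standard calculations that you can reproduce when cleaning data ahead of an import. The Python sketch below shows the usual ISBN-10 and ISBN-13 check-digit tests and a GPS range test. The function names are hypothetical, and the plugin's validators may apply additional rules.

```python
DIGITS = "0123456789"


def _compact(isbn):
    """Remove hyphens and spaces and normalise a trailing x to X."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn):
    chars = _compact(isbn)
    if len(chars) != 10:
        return False
    if not all(c in DIGITS for c in chars[:9]) or chars[9] not in DIGITS + "X":
        return False
    values = [int(c) for c in chars[:9]]
    values.append(10 if chars[9] == "X" else int(chars[9]))
    # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0


def is_valid_isbn13(isbn):
    chars = _compact(isbn)
    if len(chars) != 13 or not all(c in DIGITS for c in chars):
        return False
    # Weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(chars))
    return total % 10 == 0


def is_valid_gps(latitude, longitude):
    """Latitude must lie in [-90, 90] and longitude in [-180, 180]."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0
```

For instance, is_valid_isbn13("978-0-306-40615-7") is true, while changing the final digit to 8 makes it false.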
Live Validation Preview
While mapping fields, click "Preview Validation" to see validation results for the first 20 rows without running a full import.
Duplicate Detection Strategies
Configure how duplicates are detected:
| Strategy | Matches On |
|---|---|
| Identifier | identifier field |
| Legacy ID | legacyId field |
| Title + Date | Combination of title and date |
| Composite | Multiple configurable fields |
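Each strategy amounts to choosing which fields form the comparison key. The Python sketch below illustrates the idea for duplicates within a single file. The strategy keys, the date column name, and the trimming and lowercasing of values are assumptions made for the example, not the plugin's documented behaviour; a composite strategy is simply any other tuple of field names.

```python
# Hypothetical strategy names mapped to the fields that form the match key.
STRATEGIES = {
    "identifier": ("identifier",),
    "legacy_id": ("legacyId",),
    "title_date": ("title", "date"),
}


def find_duplicates(rows, fields):
    """Return (first_row, duplicate_row) pairs for rows sharing a key.

    rows is a list of dicts; row numbers start at 2 because row 1 is
    the CSV header.
    """
    first_seen = {}
    duplicates = []
    for row_number, row in enumerate(rows, start=2):
        # Assumption: keys are compared after trimming and lowercasing.
        key = tuple((row.get(f) or "").strip().lower() for f in fields)
        if key in first_seen:
            duplicates.append((first_seen[key], row_number))
        else:
            first_seen[key] = row_number
    return duplicates
```

The same file can produce different results under different strategies, which is why it is worth choosing the strategy that matches how your source system identifies records.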
7. Batch Export
Export existing Heratio records to sector-specific CSV formats for backup, reporting, or migration to other systems.
Accessing Batch Export
Navigate to: https://[your-domain]/dataMigration/batchExport
Or from the Data Migration page, click the "Batch Export" button in the header.
Export Formats
| Format | Standard | Best For |
|---|---|---|
| Archives | ISAD(G) | Archival fonds, series, files, items |
| Museum | Spectrum 5.1 | Museum objects with acquisition, location data |
| Library | MARC/RDA | Bibliographic records with ISBN, call numbers |
| Gallery | CCO/VRA | Artworks and visual resources |
| Digital Assets | Dublin Core/IPTC | Digital files with technical metadata |
Filter Options
You can narrow down which records to export:
| Filter | Description |
|---|---|
| Repository | Export only records from a specific repository |
| Level of Description | Filter by fonds, series, file, item, etc. (multi-select) |
| Parent Slug | Export children of a specific record |
| Include Descendants | Include all levels below the parent (not just direct children) |
Export Workflow
- Select the Sector Format for your CSV columns
- Optionally set filters to narrow the export
- Click "Export CSV"
For small exports (<500 records):
- CSV downloads immediately
For large exports (>500 records):
- Export is queued as a background job
- Check progress at /dataMigration/jobs
- Download file when complete
Example Use Cases
Backup a collection:
- Select "Archives (ISAD-G)" format
- Enter the fonds slug in "Parent Record Slug"
- Check "Include all descendants"
- Export
Export museum objects for reporting:
- Select "Museum (Spectrum 5.1)" format
- Select your repository
- Select "Item" level of description
- Export
Migrate records to another system:
- Choose the format closest to your target system
- Export all records or filter by repository
- Use the CSV in your target system's import tool
8. Sector-Specific Import
Import directly using sector-specific CLI commands with validation built-in.
Archives Import (ISAD-G)
php symfony sector:archives-csv-import /path/to/archives.csv \
--repository=my-archive \
--update=identifier \
--validate-only # Remove to perform actual import
Museum Import (Spectrum)
php symfony sector:museum-csv-import /path/to/museum.csv \
--repository=my-museum \
--update=objectNumber
Library Import (MARC/RDA)
php symfony sector:library-csv-import /path/to/library.csv \
--repository=my-library \
--update=identifier
Gallery Import (CCO)
php symfony sector:gallery-csv-import /path/to/gallery.csv \
--repository=my-gallery \
--update=objectNumber
DAM Import (Dublin Core/IPTC)
php symfony sector:dam-csv-import /path/to/dam.csv \
--repository=my-repository \
--update=identifier
Common Options
| Option | Description |
|---|---|
| --validate-only | Validate without importing |
| --mapping=ID | Use specific mapping profile |
| --repository=SLUG | Target repository slug |
| --update=FIELD | Match field for updates (skip if exists) |
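If you script imports, it can help to assemble the command from the options above rather than concatenating strings by hand. The Python sketch below builds an argument list using only the task names and options shown in this guide; the helper name build_import_command is hypothetical.

```python
SECTORS = ("archives", "museum", "library", "gallery", "dam")


def build_import_command(sector, csv_path, repository=None, mapping=None,
                         update=None, validate_only=False):
    """Return the argument list for a sector CSV import task."""
    if sector not in SECTORS:
        raise ValueError(f"Unknown sector: {sector}")
    argv = ["php", "symfony", f"sector:{sector}-csv-import", csv_path]
    if repository:
        argv.append(f"--repository={repository}")
    if mapping is not None:
        argv.append(f"--mapping={mapping}")
    if update:
        argv.append(f"--update={update}")
    if validate_only:
        argv.append("--validate-only")
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True) from the Heratio root directory. Passing a list avoids shell quoting problems with paths that contain spaces.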
9. Sample CSV Files
The plugin includes sample CSV files demonstrating correct format for each sector.
Available Samples
Located in: atom-ahg-plugins/ahgDataMigrationPlugin/data/samples/
| File | Sector | Records | Description |
|---|---|---|---|
| archives_sample.csv | Archives | 5 | Hierarchical ISAD-G records with parent-child relationships |
| museum_sample.csv | Museum | 5 | Spectrum objects with materials, techniques, locations |
| library_sample.csv | Library | 5 | MARC/RDA records with ISBN, call numbers, subjects |
| gallery_sample.csv | Gallery | 5 | CCO artworks with provenance, credit lines |
| dam_sample.csv | DAM | 5 | Dublin Core assets with technical metadata |
Archives Sample Structure
legacyId,parentId,identifier,title,levelOfDescription,repository,...
1,,F001,Smith Family Papers,Fonds,Main Archive,...
2,1,F001/S1,Correspondence,Series,Main Archive,...
3,2,F001/S1/F1,Personal Letters,File,Main Archive,...
Key features:
- legacyId and parentId establish hierarchy
- Parent records must appear before children
- Level of description follows ISAD-G
Museum Sample Structure
objectNumber,objectName,title,materials,techniques,dimensions,productionDate,...
OBJ-001,Painting,Landscape with River,Canvas|Oil paint,Oil painting,60 x 80 cm,1935,...
Key features:
- Multi-value fields use pipe | separator
- Spectrum-compliant field names
- Acquisition and location tracking
Using Samples for Testing
- Copy sample to test location:
cp data/samples/museum_sample.csv /tmp/test-import.csv
- Test validation:
php symfony sector:museum-csv-import /tmp/test-import.csv --validate-only
- Import if validation passes:
php symfony sector:museum-csv-import /tmp/test-import.csv --repository=test
10. Preservica Import/Export
OPEX Import
OPEX (Open Preservation Exchange) is Preservica's XML metadata format.
Web Interface:
- Upload .opex or .xml file
- Select "Preservica OPEX" mapping
- Map fields or use defaults
- Preview and import
CLI:
php symfony preservica:import /path/to/file.opex
php symfony preservica:import /path/to/file.opex --repository=5
php symfony preservica:import /path/to/file.opex --dry-run
OPEX Rights Extraction: The importer automatically extracts rights from:
- SecurityDescriptor elements
- dc:rights (Dublin Core)
- dcterms:license
- MODS <accessCondition>
- EAD <userestrict> and <accessrestrict>
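To check in advance whether a file contains any of these rights elements, you can scan it with a short script. The Python sketch below matches elements by local name and ignores namespaces, which is a simplification; the function names and the hand-written sample document are illustrative assumptions and do not represent the importer's actual logic or a complete OPEX file.

```python
import xml.etree.ElementTree as ET

# Local element names taken from the list above.
RIGHTS_TAGS = {
    "SecurityDescriptor", "rights", "license",
    "accessCondition", "userestrict", "accessrestrict",
}

# Simplified, hand-written sample for illustration only.
SAMPLE_OPEX = """
<opex:OPEXMetadata
    xmlns:opex="http://www.openpreservationexchange.org/opex/v1.2"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <opex:Properties>
    <opex:Title>Council Minutes</opex:Title>
    <opex:SecurityDescriptor>open</opex:SecurityDescriptor>
  </opex:Properties>
  <opex:DescriptiveMetadata>
    <dc:rights>Public domain</dc:rights>
  </opex:DescriptiveMetadata>
</opex:OPEXMetadata>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_rights(xml_text):
    """Return (element name, text) pairs for rights-bearing elements."""
    found = []
    for element in ET.fromstring(xml_text).iter():
        name = local_name(element.tag)
        text = "".join(element.itertext()).strip()
        if name in RIGHTS_TAGS and text:
            found.append((name, text))
    return found
```

If extract_rights returns an empty list for your file, the importer is unlikely to find rights in it either, which is a useful first check before troubleshooting further.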
Provenance Import:
OPEX <opex:History> elements are imported into the provenance_event table.
PAX/XIP Import
PAX packages contain metadata (XIP XML) plus content files.
Web Interface:
- Upload .pax or .zip file
- Select "Preservica PAX/XIP" mapping
- Digital objects extracted automatically
- Preview and import
CLI:
php symfony preservica:import /path/to/package.pax --format=xip
php symfony preservica:import /path/to/directory --batch
Preservica Export
Export Heratio records to Preservica format:
CLI:
# Export single record
php symfony preservica:export 123
# Export with hierarchy
php symfony preservica:export 123 --hierarchy
# Export to XIP/PAX format
php symfony preservica:export 123 --format=xip
# Export entire repository
php symfony preservica:export --repository=5
Output Location: /uploads/exports/preservica/
11. Background Jobs
For large imports (1000+ records), use background processing:
Starting a Background Job
- Complete field mapping
- Click "Background Job" instead of direct import
- Job queued to Gearman workers
Monitoring Jobs
Navigate to: /dataMigration/jobs
| Status | Description |
|---|---|
| queued | Waiting for worker |
| running | Currently processing |
| completed | Finished successfully |
| failed | Error occurred |
Job Details
Click any job to see:
- Records processed / total
- Errors encountered
- Processing time
- Download results
12. CLI Commands
List Available Mappings
php symfony migration:import --list-mappings
Output:
ARCHIVES:
[2] ArchivesSpace Resources
[11] Preservica OPEX
[12] Preservica PAX/XIP
MUSEUM:
[10] Vernon CMS (Museum)
LIBRARY:
[8] PSIS Full Import (83 fields)
Import with Mapping
# By mapping ID
php symfony migration:import /path/to/file.csv --mapping=10
# By mapping name
php symfony migration:import /path/to/file.csv --mapping="Vernon CMS"
# With options
php symfony migration:import /path/to/file.csv --mapping=10 \
--repository=5 \
--culture=en \
--update
Dry Run (Preview Only)
php symfony migration:import /path/to/file.csv --mapping=10 --dry-run
Sector Import Commands
# Archives (ISAD-G)
php symfony sector:archives-csv-import /path/to/file.csv \
--repository=SLUG --validate-only --mapping=ID --update=FIELD
# Museum (Spectrum)
php symfony sector:museum-csv-import /path/to/file.csv \
--repository=SLUG --validate-only --mapping=ID --update=FIELD
# Library (MARC/RDA)
php symfony sector:library-csv-import /path/to/file.csv \
--repository=SLUG --validate-only --mapping=ID --update=FIELD
# Gallery (CCO)
php symfony sector:gallery-csv-import /path/to/file.csv \
--repository=SLUG --validate-only --mapping=ID --update=FIELD
# DAM (Dublin Core)
php symfony sector:dam-csv-import /path/to/file.csv \
--repository=SLUG --validate-only --mapping=ID --update=FIELD
Preservica Commands
# Show Preservica info
php symfony preservica:info
# Import OPEX
php symfony preservica:import /path/to/file.opex
# Import PAX/XIP
php symfony preservica:import /path/to/package.pax --format=xip
# Export to OPEX
php symfony preservica:export 123 --format=opex
# Export to PAX
php symfony preservica:export 123 --format=xip --hierarchy
13. Gearman Setup
Gearman is required for background job processing (large imports/exports).
Quick Install (Ubuntu)
# Run the automated setup script
cd /usr/share/nginx/archive/atom-ahg-plugins/ahgDataMigrationPlugin
sudo ./bin/setup-gearman.sh
Manual Install
# Install packages
sudo apt-get install -y gearman-job-server php8.3-gearman
# Enable and start
sudo systemctl enable gearman-job-server
sudo systemctl start gearman-job-server
# Restart PHP-FPM
sudo systemctl restart php8.3-fpm
Running the Worker
Development (manual):
cd /usr/share/nginx/archive
php symfony jobs:worker
Production (systemd service):
sudo systemctl enable atom-worker
sudo systemctl start atom-worker
Verify Installation
# Check Gearman status
gearadmin --status
# Check worker service
sudo systemctl status atom-worker
# View worker logs
sudo journalctl -u atom-worker -f
For detailed Gearman configuration and troubleshooting, see:
atom-ahg-plugins/ahgDataMigrationPlugin/docs/GEARMAN.md
14. Troubleshooting
File Upload Fails
Problem: File too large
Solution: Increase PHP limits in /etc/php/8.3/fpm/php.ini:
upload_max_filesize = 100M
post_max_size = 100M
max_execution_time = 300
Mapping Not Found
Problem: Source columns not detected
Solution: Ensure CSV has headers in first row, UTF-8 encoding
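A quick pre-flight check can confirm both conditions before you upload. The Python sketch below works on the raw bytes of a file and reports encoding or header problems; the function name check_csv_bytes and the message wording are hypothetical.

```python
import csv
import io


def check_csv_bytes(data):
    """Return a list of problems found in raw CSV bytes."""
    try:
        # utf-8-sig also accepts files that start with a byte order mark.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError as error:
        return [f"Not valid UTF-8 (byte offset {error.start})"]
    rows = csv.reader(io.StringIO(text, newline=""))
    header = next(rows, None)
    if not header or not any(cell.strip() for cell in header):
        return ["First row is empty; expected column headers"]
    problems = []
    blanks = [i for i, cell in enumerate(header, start=1) if not cell.strip()]
    if blanks:
        problems.append(f"Blank header in column(s): {blanks}")
    repeated = sorted({cell for cell in header if header.count(cell) > 1})
    if repeated:
        problems.append(f"Repeated header name(s): {repeated}")
    return problems
```

To use it, read the file in binary mode, for example check_csv_bytes(open("/path/to/file.csv", "rb").read()), and fix any reported problem before uploading.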
Hierarchy Not Working
Problem: Parent-child relationships broken
Solution:
- Ensure legacyId is unique
- Ensure parentId matches a valid legacyId
- Parents must appear before children in file
Digital Objects Not Importing
Problem: Files not attaching to records
Solution:
- Check digitalObjectPath is correct
- Verify files exist at specified path
- Use absolute paths or paths relative to Heratio root
Background Job Stuck
Problem: Job shows "running" but no progress
Solution:
# Check Gearman workers
ps aux | grep jobs:worker
# Restart workers
sudo systemctl restart atom-worker
OPEX Rights Not Importing
Problem: Rights not appearing on records
Solution:
- Verify OPEX contains <SecurityDescriptor> or <dc:rights>
- Check ahg_rights_statement table for imported rights
- Ensure ahgRightsPlugin is enabled
Quick Reference
| Task | Web UI | CLI |
|---|---|---|
| Import CSV | /dataMigration → Upload | php symfony migration:import file.csv --mapping=X |
| Validate Only | /dataMigration/validate | php symfony sector:*-csv-import file.csv --validate-only |
| Import Archives | /dataMigration → Upload | php symfony sector:archives-csv-import file.csv |
| Import Museum | /dataMigration → Upload | php symfony sector:museum-csv-import file.csv |
| Import Library | /dataMigration → Upload | php symfony sector:library-csv-import file.csv |
| Import Gallery | /dataMigration → Upload | php symfony sector:gallery-csv-import file.csv |
| Import DAM | /dataMigration → Upload | php symfony sector:dam-csv-import file.csv |
| Import OPEX | /dataMigration → Upload | php symfony preservica:import file.opex |
| Import PAX | /dataMigration → Upload | php symfony preservica:import file.pax --format=xip |
| Batch Export | /dataMigration/batchExport | php symfony sector:export --sector=X |
| Export Mapping | Map page → Export | N/A |
| Import Mapping | Map page → Import | N/A |
| Export OPEX | N/A | php symfony preservica:export 123 |
| Export PAX | N/A | php symfony preservica:export 123 --format=xip |
| View Jobs | /dataMigration/jobs | N/A |
| List Mappings | Dropdown | php symfony migration:import --list-mappings |
Version History
| Version | Changes |
|---|---|
| 1.4.0 | Universal validation framework, sector-specific validators (ISAD-G, Spectrum, MARC/RDA, CCO, Dublin Core), sector CLI import tasks, validation-only mode, sample CSV files, mapping profile export/import |
| 1.3.0 | Added Batch Export UI, Library/Gallery/DAM default mappings, Gearman setup script |
| 1.2.0 | Added Preservica OPEX/PAX support, rights import, provenance import, Gearman jobs |
| 1.1.0 | Added sector-specific CSV exporters |
| 1.0.0 | Initial release with field mapping UI |
Need Help?
- Check /dataMigration/jobs for import status
- Review error logs: /log/qubit.log
- Contact: support@theahg.co.za