Heratio Help Center article. Category: Technical.
Heratio AHG Data Migration Tool
Technical Documentation
Version: 1.0.0
Last Updated: January 2026
Author: The Archive and Heritage Group
Table of Contents
- Architecture Overview
- Database Schema
- Component Details
- API Reference
- File Parsers
- Gearman Job System
- Security Considerations
- Extension Points
- Deployment
- Testing
1. Architecture Overview
1.1 System Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Browser │ │ CLI │ │ API │ │ Cron │ │
│ │ (Web UI) │ │ (Symfony) │ │ (REST) │ │ (Jobs) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
└─────────┼────────────────┼────────────────┼────────────────┼────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ APPLICATION LAYER │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ ahgDataMigrationPlugin │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Actions │ │ Parsers │ │ Services │ │ Tasks │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ - upload │ │ - CSV │ │ - Mapping │ │ - import │ │ │
│ │ │ - map │ │ - Excel │ │ - Transform│ │ - export │ │ │
│ │ │ - preview │ │ - XML │ │ - Import │ │ - info │ │ │
│ │ │ - import │ │ - JSON │ │ - Export │ │ │ │ │
│ │ │ - queueJob │ │ - OPEX │ │ │ │ │ │ │
│ │ │ - jobStatus│ │ - PAX │ │ │ │ │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ atom-framework (Laravel) │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Query │ │ Repositories│ │ Services │ │ │
│ │ │ Builder │ │ │ │ │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Heratio Core (Symfony 1.x) │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Propel ORM │ │ Gearman │ │ Elasticsearch│ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ MySQL │ │ Elasticsearch│ │ File System │ │ Gearman │ │
│ │ │ │ │ │ (uploads) │ │ Queue │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
1.2 Directory Structure
/usr/share/nginx/archive/
├── atom-ahg-plugins/
│ └── ahgDataMigrationPlugin/
│ ├── config/
│ │ ├── ahgDataMigrationPluginConfiguration.class.php
│ │ └── routing.yml
│ ├── lib/
│ │ ├── Parsers/
│ │ │ ├── OpexParser.php
│ │ │ └── PaxParser.php
│ │ └── task/
│ │ ├── migrationImportTask.class.php
│ │ ├── preservicaImportTask.class.php
│ │ ├── preservicaExportTask.class.php
│ │ └── preservicaInfoTask.class.php
│ ├── modules/
│ │ └── dataMigration/
│ │ ├── actions/
│ │ │ ├── indexAction.class.php
│ │ │ ├── uploadAction.class.php
│ │ │ ├── mapAction.class.php
│ │ │ ├── previewAction.class.php
│ │ │ ├── importAction.class.php
│ │ │ ├── queueJobAction.class.php
│ │ │ ├── jobStatusAction.class.php
│ │ │ ├── jobProgressAction.class.php
│ │ │ ├── jobsAction.class.php
│ │ │ └── cancelJobAction.class.php
│ │ └── templates/
│ │ ├── indexSuccess.php
│ │ ├── mapSuccess.php
│ │ ├── previewSuccess.php
│ │ ├── jobStatusSuccess.php
│ │ └── jobsSuccess.php
│ └── data/
│ └── fixtures/
├── lib/
│ └── job/
│ └── arMigrationImportJob.class.php
└── uploads/
└── migration/
└── test_data/
1.3 Request Flow
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Upload │────▶│ Parse │────▶│ Map │────▶│ Transform│
│ Action │ │ File │ │ Fields │ │ Data │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
│
┌───────────────────────────────────────────────────┘
│
▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Preview │────▶│ Import │────▶│ Index │
│ Data │ │ Records │ │ Search │
└──────────┘ └──────────┘ └──────────┘
2. Database Schema
2.1 Entity Relationship Diagram
┌─────────────────────────────────────────────────────────────────────────────┐
│ DATA MIGRATION SCHEMA │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────┐ ┌─────────────────────────┐
│ atom_data_mapping │ │ atom_migration_job │
├─────────────────────────┤ ├─────────────────────────┤
│ PK id │ │ PK id │
│ name │◄──────│ FK mapping_id │
│ description │ │ FK job_id ──────────────┼──┐
│ target_type │ │ name │ │
│ field_mappings (JSON)│ │ target_type │ │
│ created_at │ │ source_file │ │
│ updated_at │ │ source_format │ │
└─────────────────────────┘ │ mapping_snapshot │ │
│ import_options (JSON)│ │
│ status │ │
┌─────────────────────────┐ │ total_records │ │
│ job │ │ processed_records │ │
├─────────────────────────┤ │ imported_records │ │
│ PK id │◄──────│ updated_records │ │
│ name │ │ skipped_records │ │
│ download_path │ │ error_count │ │
│ completed_at │ │ error_log (JSON) │ │
│ FK user_id │ │ progress_message │ │
│ FK object_id │ │ started_at │ │
│ FK status_id │ │ completed_at │ │
│ output │ │ FK created_by │ │
└─────────────────────────┘ │ created_at │ │
▲ └─────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────┘
┌─────────────────────────┐ ┌─────────────────────────┐
│ information_object │ │ keymap │
├─────────────────────────┤ ├─────────────────────────┤
│ PK id │◄──────│ target_id │
│ identifier │ │ target_name │
│ level_of_desc_id │ │ source_id │
│ FK parent_id │ │ source_name │
│ FK repository_id │ └─────────────────────────┘
│ source_culture │
│ lft, rgt │ Used for tracking imported
└─────────────────────────┘ records and updates
│
▼
┌─────────────────────────┐
│ information_object_i18n │
├─────────────────────────┤
│ PK id │
│ PK culture │
│ title │
│ scope_and_content │
│ extent_and_medium │
│ ... (other i18n) │
└─────────────────────────┘
2.2 Table Definitions
atom_data_mapping
Stores saved field mapping configurations.
CREATE TABLE atom_data_mapping (
id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description TEXT,
target_type VARCHAR(100) NOT NULL,
field_mappings JSON NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY idx_name (name),
INDEX idx_target_type (target_type)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
field_mappings JSON Structure
{
"name": "Vernon CMS (Museum)",
"target_type": "museum",
"fields": [
{
"source_field": "sys_id",
"atom_field": "legacyId",
"constant_value": "",
"concatenate": false,
"include": true,
"concat_constant": false,
"concat_symbol": "|",
"transform": null
},
{
"source_field": "object_name",
"atom_field": "title",
"constant_value": "",
"concatenate": false,
"include": true,
"concat_constant": false,
"concat_symbol": "|",
"transform": null
}
]
}
atom_migration_job
Tracks background import jobs.
CREATE TABLE atom_migration_job (
id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
job_id INT NULL COMMENT 'Links to Heratio job table',
name VARCHAR(255) NULL,
target_type VARCHAR(100) NOT NULL,
source_file VARCHAR(500) NULL,
source_format VARCHAR(50) NULL,
mapping_id BIGINT UNSIGNED NULL,
mapping_snapshot JSON NULL,
import_options JSON NULL,
status ENUM('pending','running','completed','failed','cancelled') DEFAULT 'pending',
total_records INT DEFAULT 0,
processed_records INT DEFAULT 0,
imported_records INT DEFAULT 0,
updated_records INT DEFAULT 0,
skipped_records INT DEFAULT 0,
error_count INT DEFAULT 0,
error_log JSON NULL,
progress_message VARCHAR(255) NULL,
output_file VARCHAR(500) NULL,
started_at TIMESTAMP NULL,
completed_at TIMESTAMP NULL,
created_by INT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_status (status),
INDEX idx_job_id (job_id),
INDEX idx_created_by (created_by),
INDEX idx_created_at (created_at),
FOREIGN KEY (mapping_id) REFERENCES atom_data_mapping(id) ON DELETE SET NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
import_options JSON Structure
{
"repository_id": 5,
"parent_id": 12345,
"culture": "en",
"update_existing": true,
"match_field": "legacyId",
"image_path": "/path/to/images/",
"first_row_header": 1,
"sheet_index": 0,
"delimiter": "auto"
}
3. Component Details
3.1 Actions
uploadAction.class.php
Handles file upload and initial parsing.
class dataMigrationUploadAction extends sfAction
{
public function execute($request)
{
// 1. Validate administrator access
// 2. Handle file upload
// 3. Detect file format
// 4. Parse headers and sample rows
// 5. Store in session
// 6. Redirect to mapping
}
protected function parseFile($filepath, $ext, $options)
{
// Dispatch to appropriate parser based on extension
// Returns: ['headers' => [], 'rows' => [], 'row_count' => int]
}
}
mapAction.class.php
Displays field mapping interface.
class dataMigrationMapAction extends sfAction
{
public function execute($request)
{
// 1. Load session data
// 2. Get Heratio field definitions for target type
// 3. Load saved mapping if specified
// 4. Auto-match fields
// 5. Pass to template
}
protected function getAtomFields($targetType)
{
// Returns array of available Heratio fields for target type
}
protected function autoMatchFields($sourceFields, $atomFields)
{
// Attempts to match source fields to Heratio fields
}
}
queueJobAction.class.php
Creates background job and queues to Gearman.
class dataMigrationQueueJobAction extends sfAction
{
public function execute($request)
{
// 1. Collect mapping from form
// 2. Create atom_migration_job record
// 3. Create Heratio job record
// 4. Queue Gearman job
// 5. Redirect to status page
}
}
3.2 Gearman Job
arMigrationImportJob.class.php
Background worker for processing imports.
class arMigrationImportJob extends arBaseJob
{
protected $extraRequiredParameters = ['migrationJobId'];
public function runJob($parameters)
{
// 1. Load migration job record
// 2. Parse source file
// 3. Load field mappings
// 4. Process each row:
// a. Transform data
// b. Check for existing record
// c. Create or update
// d. Import digital object if specified
// e. Update progress
// 5. Mark job complete
}
protected function updateMigrationJob($DB, $jobId, $data)
{
// Update progress in database
}
protected function transformRow($row, $headers, $fields)
{
// Apply field mappings to transform row data
}
protected function createRecord($DB, $record, $culture, $repositoryId, $parentId)
{
// Create new information_object and related records
}
}
3.3 File Parsers
OpexParser.php
namespace ahgDataMigrationPlugin\Parsers;
class OpexParser
{
public function parse($filepath)
{
// 1. Load XML
// 2. Register namespaces
// 3. Extract Transfer info
// 4. Extract Properties
// 5. Extract Dublin Core elements
// 6. Return array of records
}
protected function extractDublinCore($xml)
{
// Extract dc:* and dcterms:* elements
}
}
PaxParser.php
namespace ahgDataMigrationPlugin\Parsers;
class PaxParser
{
public function parse($filepath)
{
// 1. Open ZIP archive
// 2. Find metadata.xml or *.xip
// 3. Parse XIP content
// 4. Extract StructuralObjects
// 5. Return array of records
}
protected function parseXipContent($content)
{
// Parse Preservica XIP format
}
}
4. API Reference
4.1 Web Routes
| Route | Method | Action | Description |
|---|---|---|---|
/dataMigration |
GET | index | Upload page |
/dataMigration/upload |
POST | upload | Handle file upload |
/dataMigration/map |
GET/POST | map | Field mapping page |
/dataMigration/preview |
POST | preview | Preview/import data |
/dataMigration/queue |
POST | queueJob | Queue background job |
/dataMigration/jobs |
GET | jobs | List all jobs |
/dataMigration/job/:id |
GET | jobStatus | Job status page |
/dataMigration/job/progress |
GET | jobProgress | AJAX progress endpoint |
/dataMigration/job/cancel |
POST | cancelJob | Cancel running job |
/dataMigration/mapping/save |
POST | saveMapping | Save field mapping |
/dataMigration/mapping/load/:id |
GET | loadMapping | Load saved mapping |
/dataMigration/mapping/delete/:id |
POST | deleteMapping | Delete mapping |
4.2 AJAX Endpoints
GET /dataMigration/job/progress?id={jobId}
Returns JSON with job progress:
{
"id": 42,
"status": "running",
"total_records": 150,
"processed_records": 75,
"imported_records": 72,
"updated_records": 3,
"skipped_records": 0,
"error_count": 0,
"progress_message": "Processing: 75/150 (50%)",
"percent": 50,
"started_at": "2026-01-16 10:30:05",
"completed_at": null
}
POST /dataMigration/job/cancel?id={jobId}
Returns JSON:
{
"success": true
}
Or on error:
{
"success": false,
"error": "Job already completed"
}
4.3 CLI Commands
migration:import
php symfony migration:import [source] [options]
Arguments:
source Path to import file
Options:
--mapping=ID|NAME Mapping ID or name (required)
--list-mappings List available mappings
--repository=ID Repository ID
--parent=ID Parent information object ID
--culture=CODE Culture code (default: en)
--update Update existing records
--match-field=FIELD Match field (legacyId|identifier)
--output=MODE Output mode (import|csv|preview)
--output-file=PATH Output file for CSV export
--dry-run Simulate without changes
--limit=N Limit rows to import
--sheet=N Excel sheet index (0-based)
--skip-header First row is not header
--delimiter=CHAR CSV delimiter (auto|,|;|\t||)
5. File Parsers
5.1 Parser Interface
All parsers implement a common interface:
interface ParserInterface
{
/**
* Parse file and return array of records.
*
* @param string $filepath Path to file
* @return array ['headers' => [], 'rows' => [], 'row_count' => int]
*/
public function parse($filepath);
}
5.2 Supported Formats
| Format | Extension | Parser | Library |
|---|---|---|---|
| CSV | .csv, .txt | Built-in | PHP fgetcsv |
| Excel | .xls, .xlsx | PhpSpreadsheet | PhpOffice |
| XML | .xml | Built-in | SimpleXML |
| JSON | .json | Built-in | json_decode |
| OPEX | .opex, .xml | OpexParser | SimpleXML |
| PAX | .pax, .zip | PaxParser | ZipArchive |
5.3 Adding New Parsers
To add support for a new format:
- Create parser class in
lib/Parsers/:
namespace ahgDataMigrationPlugin\Parsers;
class MyFormatParser
{
public function parse($filepath)
{
$headers = [];
$rows = [];
// Parse file...
return [
'headers' => $headers,
'rows' => $rows,
'row_count' => count($rows),
'format' => 'myformat'
];
}
}
- Register in
uploadAction.class.php:
if ($ext === 'myformat') {
return $this->parseMyFormat($filepath);
}
- Add file extension to upload form accept list.
6. Gearman Job System
6.1 Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Web Request │────▶│ Gearman Server │────▶│ Job Worker │
│ (queueJob) │ │ (gearmand) │ │ (jobs:worker) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ atom_migration │ │ Job Queue │ │ Process Import │
│ _job record │ │ (in memory) │ │ Update Progress │
└─────────────────┘ └─────────────────┘ └─────────────────┘
6.2 Job Lifecycle
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ PENDING │────▶│ QUEUED │────▶│ RUNNING │────▶│COMPLETED │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
│ │ │
│ ▼ │
│ ┌──────────┐ │
└──────────────────────────▶│ FAILED │◀──────────┘
└──────────┘
│
▼
┌──────────┐
│CANCELLED │
└──────────┘
6.3 Progress Updates
The worker updates progress every 50 records:
if ($processed % 50 === 0 || $processed === $totalRows) {
$percent = round(($processed / $totalRows) * 100);
$this->updateMigrationJob($DB, $jobId, [
'processed_records' => $processed,
'imported_records' => $stats['imported'],
'progress_message' => "Processing: {$processed}/{$totalRows} ({$percent}%)"
]);
}
6.4 Cancellation
Jobs check for cancellation every 100 records:
if ($i % 100 === 0) {
$jobStatus = $DB::table('atom_migration_job')
->where('id', $jobId)
->value('status');
if ($jobStatus === 'cancelled') {
break;
}
}
7. Security Considerations
7.1 Authentication
All actions require administrator privileges:
if (!$this->context->user->isAdministrator()) {
$this->forward('admin', 'secure');
}
7.2 File Upload Security
- Files are saved with random names to prevent path traversal
- File extensions are validated
- Upload directory is outside web root
- Maximum file size enforced by PHP configuration
7.3 SQL Injection Prevention
All database queries use Laravel Query Builder with parameterized queries:
$DB::table('atom_migration_job')
->where('id', $jobId) // Parameterized
->update($data);
7.4 XSS Prevention
All output is escaped in templates:
<?php echo htmlspecialchars($job->name) ?>
8. Extension Points
8.1 Adding New Target Types
- Add to target type list in
indexSuccess.php - Add field definitions in
mapAction.class.php - Add record creation logic in job worker
8.2 Adding New Field Transformations
- Add transform option in
mapSuccess.php - Implement transform logic in
transformRow():
protected function applyTransform($value, $transform, $options)
{
switch ($transform) {
case 'my_transform':
return myTransformFunction($value, $options);
// ...
}
}
8.3 Custom Import Logic
Override the job worker for custom import behavior:
class MyCustomImportJob extends arMigrationImportJob
{
protected function createRecord($DB, $record, $culture, $repositoryId, $parentId)
{
// Custom record creation logic
}
}
9. Deployment
9.1 Requirements
- PHP 8.1+
- MySQL 8.0+
- Gearman Job Server
- PhpSpreadsheet (for Excel support)
- Heratio 2.8+
- atom-framework (Laravel Query Builder)
9.2 Installation
# 1. Plugin is part of atom-ahg-plugins
cd /usr/share/nginx/archive
git clone https://github.com/ArchiveHeritageGroup/atom-ahg-plugins.git
# 2. Create symlink
ln -sf /usr/share/nginx/archive/atom-ahg-plugins/ahgDataMigrationPlugin \
/usr/share/nginx/archive/plugins/ahgDataMigrationPlugin
# 3. Enable plugin
php symfony extension:enable ahgDataMigrationPlugin
# 4. Run database migrations (if any)
mysql -u root archive < atom-ahg-plugins/ahgDataMigrationPlugin/data/install.sql
# 5. Clear cache
php symfony cc
# 6. Ensure Gearman workers are running
systemctl status gearman-job-server
ps aux | grep jobs:worker
9.3 Configuration
No additional configuration required. Plugin uses Heratio's existing:
- Database connection
- File upload settings
- Gearman configuration
9.4 Upgrading
cd /usr/share/nginx/archive/atom-ahg-plugins
git pull origin main
php symfony cc
10. Testing
10.1 Test Data
Test data files are provided in:
/usr/share/nginx/archive/uploads/migration/test_data/
| File | Records | Format | Description |
|---|---|---|---|
| archivesspace_resources.csv | 20 | CSV | Hierarchical archives |
| archivesspace_accessions.csv | 20 | CSV | Accession records |
| archivesspace_agents.csv | 20 | CSV | Authority records |
| vernon_museum.csv | 20 | CSV | Museum objects |
| psis_library.csv | 20 | CSV | Library items |
| preservica_opex.opex | 20 | OPEX | Government gazettes |
| wdb_archives.csv | 20 | CSV | Archives with hierarchy |
10.2 Manual Testing
# Test CLI import
php symfony migration:import --list-mappings
# Dry run
php symfony migration:import /path/test.csv --mapping=10 --dry-run
# Full import
php symfony migration:import /path/test.csv --mapping=10 --limit=5
10.3 Automated Testing
# Run plugin tests (if implemented)
php symfony test:unit ahgDataMigrationPlugin
Appendix A: Field Mapping Reference
A.1 Heratio Information Object Fields
| Field Name | Database Column | Type | Required |
|---|---|---|---|
| identifier | identifier | string | No |
| title | title (i18n) | string | Yes |
| levelOfDescription | level_of_description_id | term | No |
| extentAndMedium | extent_and_medium (i18n) | text | No |
| repository | repository_id | FK | No |
| archivalHistory | archival_history (i18n) | text | No |
| scopeAndContent | scope_and_content (i18n) | text | No |
| arrangement | arrangement (i18n) | text | No |
| accessConditions | access_conditions (i18n) | text | No |
| reproductionConditions | reproduction_conditions (i18n) | text | No |
| language | language (i18n) | string | No |
| findingAids | finding_aids (i18n) | text | No |
| locationOfOriginals | location_of_originals (i18n) | text | No |
| locationOfCopies | location_of_copies (i18n) | text | No |
| generalNote | general_note (i18n) | text | No |
| legacyId | keymap.source_id | string | No |
| parentId | parent_id | FK | No |
| creators | Relation | Actor | No |
| eventDates | Event | Date | No |
| subjectAccessPoints | Relation | Term | No |
| placeAccessPoints | Relation | Term | No |
| nameAccessPoints | Relation | Term | No |
| digitalObjectPath | Digital Object | File | No |
© 2026 The Archive and Heritage Group. All rights reserved.