Combining large TIFF sets into a PDF/A (Heratio)
Why
Combining many scanned TIFF pages into one PDF used to load every page into a
single ImageMagick process, which ran out of memory on large documents (e.g.
hundreds of 40 MB+ scans). The merge is now memory-safe. Each page is
converted to its own single-page PDF (with capped ImageMagick resource limits
and JPEG compression), the page PDFs are concatenated with qpdf in batches, and
a single Ghostscript pass then converts the result to PDF/A, the archival
target. PDF/A conversion is always applied when Ghostscript is available. Peak
memory stays at roughly one page's worth, regardless of page count.
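The three-stage pipeline can be sketched in plain bash. This is a minimal illustration, not the code behind ahg:pdf-combine: the function names (`page_to_pdf`, `concat_in_batches`, `to_pdfa`, `merge_tiffs_to_pdfa`), the 256 MiB/512 MiB limits, the batch size of 50, the choice of PDF/A-2 and the `sort -V` page ordering are all assumptions, and the `--dpi` handling is not reproduced. It assumes ImageMagick 6's `convert` (ImageMagick 7 uses `magick`), qpdf and Ghostscript on the PATH. Some distributions ship an ImageMagick policy that blocks writing PDF; if `convert` reports a security policy error, that policy is the cause.

```shell
#!/usr/bin/env bash
# Sketch of a memory-bounded TIFF -> PDF/A merge. Function names, resource
# limits and batch size are illustrative, not Heratio's actual values.

# Stage 1: one TIFF -> one single-page PDF. ImageMagick's memory and
# memory-map limits are capped; past them its pixel cache falls back to disk.
page_to_pdf() {  # usage: page_to_pdf in.tif out.pdf [jpeg-quality]
  convert -limit memory 256MiB -limit map 512MiB "$1" \
    -compress JPEG -quality "${3:-85}" "$2"
}

# Stage 2: concatenate page PDFs with qpdf in batches of 50, then join the
# batch files, so no single qpdf call receives thousands of inputs.
concat_in_batches() {  # usage: concat_in_batches out.pdf page1.pdf page2.pdf ...
  local out=$1 size=50 i=0 rc=0 tmp
  shift
  (( $# > 0 )) || return 1
  tmp=$(mktemp -d)
  local -a chunk batches=()
  while (( $# > 0 )); do
    chunk=( "${@:1:size}" )                  # next (up to) 50 page files
    shift $(( $# < size ? $# : size ))
    i=$(( i + 1 ))
    qpdf --empty --pages "${chunk[@]}" -- "$tmp/batch_$i.pdf" || { rc=1; break; }
    batches+=( "$tmp/batch_$i.pdf" )
  done
  (( rc == 0 )) && qpdf --empty --pages "${batches[@]}" -- "$out" || rc=1
  rm -rf "$tmp"
  return "$rc"
}

# Stage 3: a single Ghostscript pass rewrites the merged file as PDF/A-2.
# A strictly conforming PDF/A also needs an ICC OutputIntent (PDFA_def.ps),
# omitted here; policy 1 drops non-conforming features to keep PDF/A output.
to_pdfa() {  # usage: to_pdfa merged.pdf out-pdfa.pdf
  gs -dPDFA=2 -dBATCH -dNOPAUSE -dQUIET \
     -sColorConversionStrategy=RGB -dPDFACompatibilityPolicy=1 \
     -sDEVICE=pdfwrite -sOutputFile="$2" "$1"
}

# Driver: pages are converted one at a time, so peak memory tracks the
# largest single page rather than the number of pages.
merge_tiffs_to_pdfa() {  # usage: merge_tiffs_to_pdfa /path/to/folder out.pdf
  local dir=$1 out=$2 work f page n=0 rc=0
  local -a pages=()
  work=$(mktemp -d)
  while IFS= read -r f; do
    n=$(( n + 1 ))
    page=$(printf '%s/%06d.pdf' "$work" "$n")
    page_to_pdf "$f" "$page" || { rc=1; break; }
    pages+=( "$page" )
  done < <(find "$dir" -maxdepth 1 -type f \( -iname '*.tif' -o -iname '*.tiff' \) | sort -V)
  if (( rc == 0 )); then
    concat_in_batches "$work/merged.pdf" "${pages[@]}" &&
      to_pdfa "$work/merged.pdf" "$out" || rc=1
  fi
  rm -rf "$work"
  return "$rc"
}
```

After sourcing the file, `merge_tiffs_to_pdfa /path/to/folder /path/to/output.pdf` runs all three stages. Batching keeps each qpdf invocation small; the final join only sees one file per batch.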
The web tool
PDF Tools → Merge (/pdf-tools/merge) merges uploaded files and streams back a
PDF/A. Browser uploads are capped at about 100 MB per file, so for very large
documents, such as those delivered by FTP, use the command line below instead.
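To decide between the web tool and the command line, it can help to check whether any file in a delivery exceeds the upload cap. A minimal sketch using GNU `find`; `oversized_files` is a hypothetical helper, and the default 100M threshold (MiB, as GNU `find` counts it) only approximates the cap mentioned above, whose exact value depends on the server's configuration.

```shell
#!/usr/bin/env bash
# List files in a delivery folder larger than the (approximate) upload cap.
# Any output means the web tool is likely to reject at least one file.
oversized_files() {  # usage: oversized_files /path/to/folder [limit, default 100M]
  find "$1" -maxdepth 1 -type f -size +"${2:-100M}" -print | sort -V
}
```

If `oversized_files /path/to/folder` prints anything, use the command-line route.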
Large volumes from the command line
When a folder of page TIFFs is already on the server (e.g. delivered by FTP), combine it without any upload limit:
cd /usr/share/nginx/heratio
# produce a standalone PDF/A
sudo -u www-data php artisan ahg:pdf-combine /path/to/folder --out=/path/to/output.pdf --dpi=200
# or combine AND attach to a record as a master digital object
sudo -u www-data php artisan ahg:pdf-combine /path/to/folder --id=12345 --dpi=200
Pages are ordered by filename using natural sort, so page2.tif comes before
page10.tif. Options: --out (output path for a standalone PDF/A), --dpi
(default 200), --quality (default 85), --id (attach to an information object)
and --no-web (skip the web derivative). Run the command as www-data so that any
attached file ends up with the correct ownership.
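Because page order comes from the filenames, it is worth previewing the order before a long run. The sketch below assumes the pages are .tif or .tiff files and uses GNU `sort -V` (version sort), which orders embedded numbers numerically. It approximates natural sort but is not guaranteed to match the command's own ordering in every edge case, so treat it as a sanity check. `preview_page_order` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print a folder's TIFFs in numeric-aware order, numbered, so missing or
# misordered pages are easy to spot before combining. Requires GNU find.
preview_page_order() {  # usage: preview_page_order /path/to/folder
  find "$1" -maxdepth 1 -type f \( -iname '*.tif' -o -iname '*.tiff' \) \
    -printf '%f\n' | sort -V | nl -w4 -s'  '
}
```

A folder holding page1.tif, page2.tif and page10.tif lists them in that order, where a plain `sort` would put page10.tif second.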
When attached with --id, the combined PDF/A becomes a master digital object,
and the command immediately creates its fast-loading web derivative (via
ahg:optimize-pdfs). The first page of a large document therefore displays
quickly in the viewer, without waiting for the daily optimisation pass. Pass
--no-web to skip this step.
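When several documents arrive at once, one folder per document, the same command can be run in a loop. A sketch, assuming a hypothetical layout in which each subfolder of an incoming directory holds one document's pages and each output is named after its folder; it passes only the documented --out and --dpi options. `combine_all` and the `DRY_RUN` switch (which prints the commands instead of running them) are illustrative, not part of Heratio.

```shell
#!/usr/bin/env bash
# Combine each subfolder of an incoming directory into its own PDF/A.
# Hypothetical layout: <incoming>/<name>/*.tif  ->  <outgoing>/<name>.pdf
# Run from the Heratio root so that `php artisan` resolves.
combine_all() {  # usage: combine_all <incoming> <outgoing> [dpi]
  local in_root=$1 out_root=$2 dpi=${3:-200} dir name
  local -a cmd
  for dir in "$in_root"/*/; do
    [ -d "$dir" ] || continue              # no subfolders: glob stayed literal
    name=$(basename "$dir")
    cmd=(sudo -u www-data php artisan ahg:pdf-combine
         "${dir%/}" --out="$out_root/$name.pdf" --dpi="$dpi")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"            # show what would run
    else
      "${cmd[@]}" || echo "failed: ${dir%/}" >&2   # report and keep going
    fi
  done
}
```

For example, `cd /usr/share/nginx/heratio`, then `DRY_RUN=1 combine_all /path/to/incoming /path/to/outgoing` to review the commands, and the same call without `DRY_RUN=1` to run them. The output directory must be writable by www-data.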
Requirements
Ghostscript, qpdf and ImageMagick must be installed on the host. On Debian or Ubuntu:
sudo apt-get install -y ghostscript qpdf imagemagick
If Ghostscript is missing, the merge still produces an ordinary merged PDF (without PDF/A conversion) and logs a warning.
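A quick preflight check can confirm the tools are present before a long run. The sketch below looks for `convert` (ImageMagick 6) or `magick` (ImageMagick 7), `qpdf` and `gs` on the PATH. Its exit-status convention is an illustrative assumption that mirrors the fallback described above: it fails if ImageMagick or qpdf is missing, but only warns about Ghostscript. `check_pdf_tools` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Preflight: report which required tools are on PATH.
# Returns 1 if ImageMagick or qpdf is missing; a missing Ghostscript only
# produces a warning, since the merge still runs without PDF/A conversion.
check_pdf_tools() {
  local ok=0
  if command -v convert >/dev/null 2>&1 || command -v magick >/dev/null 2>&1; then
    echo "imagemagick: ok"
  else
    echo "imagemagick: MISSING"; ok=1
  fi
  if command -v qpdf >/dev/null 2>&1; then
    echo "qpdf: ok"
  else
    echo "qpdf: MISSING"; ok=1
  fi
  if command -v gs >/dev/null 2>&1; then
    echo "ghostscript: ok"
  else
    echo "ghostscript: MISSING (output will not be PDF/A)"
  fi
  return "$ok"
}
```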