Identification of PDF preservation risks with VeraPDF and JHOVE
The PDF format has a number of features that don’t sit well with the aims of long-term preservation and accessibility. This includes encryption and password protection, external dependencies (e.g. fonts that are not embedded in a document), and reliance on external software. In this post I’ll review to what extent such features can be detected using VeraPDF and JHOVE. It further builds on earlier work I did on this subject between 2012 and 2017.
-
JHOVE
- Escape from the phantom of the PDF
- Multi-image TIFFs, subfiles and image file directories
- VeraPDF parse status as a proxy for PDF rendering: experiments with the Synthetic PDF Testset
- Identification of PDF preservation risks with VeraPDF and JHOVE
- PDF processing and analysis with open-source tools
- Breaking WAVEs (and some FLACs too)
- Why can't we have digital preservation tools that just work?
- A simple JP2 file structure checker
-
PDF
- Escape from the phantom of the PDF
- VeraPDF parse status as a proxy for PDF rendering: experiments with the Synthetic PDF Testset
- Identification of PDF preservation risks with VeraPDF and JHOVE
- On The Significant Properties of Spreadsheets
- PDF processing and analysis with open-source tools
- Policy-based assessment with VeraPDF - a first impression
- PDF/A as a preferred, sustainable format for spreadsheets?
- Why PDF/A validation matters, even if you don't have PDF/A - Part 2
- Why PDF/A validation matters, even if you don't have PDF/A
- When (not) to migrate a PDF to PDF/A
- Identification of PDF preservation risks: analysis of Govdocs selected corpus
- Identification of PDF preservation risks with Apache Preflight: the sequel
- What do we mean by "embedded" files in PDF?
- Identification of PDF preservation risks with Apache Preflight: a first impression
- PDF – Inventory of long-term preservation risks
-
preservation-risks
- Escape from the phantom of the PDF
- Multi-image TIFFs, subfiles and image file directories
- Identification of PDF preservation risks with VeraPDF and JHOVE
- On The Significant Properties of Spreadsheets
- PDF processing and analysis with open-source tools
- ISO/IEC TS 22424 standard on EPUB3 preservation
- Does Microsoft OneDrive export large ZIP files that are corrupt?
- Why PDF/A validation matters, even if you don't have PDF/A - Part 2
- Why PDF/A validation matters, even if you don't have PDF/A
- Measuring Bigfoot
- Assessing file format risks: searching for Bigfoot?
- PDF – Inventory of long-term preservation risks
- EPUB for archival preservation
-
VeraPDF
- Escape from the phantom of the PDF
- VeraPDF parse status as a proxy for PDF rendering: experiments with the Synthetic PDF Testset
- Identification of PDF preservation risks with VeraPDF and JHOVE
- PDF processing and analysis with open-source tools
- Policy-based assessment with VeraPDF - a first impression
- Why PDF/A validation matters, even if you don't have PDF/A - Part 2
- Why PDF/A validation matters, even if you don't have PDF/A