VeraPDF parse status as a proxy for PDF rendering: experiments with the Synthetic PDF Testset
Last month I wrote this post, which addresses the use of JHOVE and VeraPDF for identifying preservation risks in PDF files. In the concluding section I suggested that VeraPDF’s parse status might be used as a rough “validity proxy” to identify malformed PDFs. But does VeraPDF’s parse status actually have any predictive value for rendering? And how does this compare to what JHOVE tells us? This post is a first attempt at answering these questions, using data from the Synthetic PDF Testset for File Format Validation by Lindlar, Tunnat and Wilson.
-
JHOVE
- Escape from the phantom of the PDF
- Multi-image TIFFs, subfiles and image file directories
- VeraPDF parse status as a proxy for PDF rendering: experiments with the Synthetic PDF Testset
- Identification of PDF preservation risks with VeraPDF and JHOVE
- PDF processing and analysis with open-source tools
- Breaking WAVEs (and some FLACs too)
- Why can't we have digital preservation tools that just work?
- A simple JP2 file structure checker
-
PDF
- PDF Quality assessment for digitisation batches with Python, PyMuPDF and Pillow
- Escape from the phantom of the PDF
- VeraPDF parse status as a proxy for PDF rendering: experiments with the Synthetic PDF Testset
- Identification of PDF preservation risks with VeraPDF and JHOVE
- On The Significant Properties of Spreadsheets
- PDF processing and analysis with open-source tools
- Policy-based assessment with VeraPDF - a first impression
- PDF/A as a preferred, sustainable format for spreadsheets?
- Why PDF/A validation matters, even if you don't have PDF/A - Part 2
- Why PDF/A validation matters, even if you don't have PDF/A
- When (not) to migrate a PDF to PDF/A
- Identification of PDF preservation risks: analysis of Govdocs selected corpus
- Identification of PDF preservation risks with Apache Preflight: the sequel
- What do we mean by "embedded" files in PDF?
- Identification of PDF preservation risks with Apache Preflight: a first impression
- PDF – Inventory of long-term preservation risks
-
VeraPDF
- Escape from the phantom of the PDF
- VeraPDF parse status as a proxy for PDF rendering: experiments with the Synthetic PDF Testset
- Identification of PDF preservation risks with VeraPDF and JHOVE
- PDF processing and analysis with open-source tools
- Policy-based assessment with VeraPDF - a first impression
- Why PDF/A validation matters, even if you don't have PDF/A - Part 2
- Why PDF/A validation matters, even if you don't have PDF/A