EPUB for archival preservation
Over the last few years, the EPUB format has gained widespread popularity in the consumer market. The KB has been approached by a number of publishers that wish to use EPUB for delivering some of their electronic publications. Surprisingly little information is available on the format’s suitability for archival preservation, apart from the Library of Congress’ Sustainability of Digital Formats web pages, which contain entries on EPUB 2 and EPUB 3.
For this reason, the KB’s Departments of Collection and Collection Care requested a more detailed investigation of EPUB’s preservation credentials. More specifically, answers were needed to the following questions:
- What are the main characteristics of EPUB?
- What functionality does EPUB provide, and is this sufficient for representing e.g. content with sophisticated layout and typography requirements?
- How well is EPUB supported by software tools that are used in (pre-)ingest workflows?
- How suitable is EPUB for archival preservation? What are the main risks?