Identification of PDF preservation risks with Apache Preflight: a first impression

19 December 2012

The PDF format contains various features that may make it difficult to access content that is stored in this format in the long term. Examples include (but are not limited to):

  • Encryption features, which may either restrict some functionality (copying, printing) or make files inaccessible altogether.
  • Multimedia features: embedded multimedia objects may be subject to format obsolescence.
  • Reliance on external features (e.g. non-embedded fonts, or references to external documents).

Automated assessment of JP2 against a technical profile

04 September 2012

I’ve already written a number of blog posts on format validation of JP2 files. Format validation is only one aspect of a quality assessment workflow. Digitisation guidelines typically impose various constraints on the technical characteristics of preservation and access images. For example, they may state that a preservation master must be losslessly compressed, and that its progression order must be RPCL. A format profile is a set of such technical constraints. The process that compares the technical characteristics of a file against a format profile is sometimes called Policy Driven Validation. This corresponds to what JHOVE2 refers to as Assessment (which I think is a better description).

This blog post describes a simple method for doing a rule-based assessment of JP2 images. It uses Schematron, which is a rule-based validation language, to ‘validate’ the output of jpylyzer against a profile. Before getting into any technical details, let’s first have a look at an example of a format profile.
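
To make the idea of a format profile a bit more concrete, here is a minimal Python sketch of a rule-based assessment. It is not the Schematron approach described in the post, just an illustration of the same principle: a profile is a set of expected property values, and assessment means comparing a file's reported properties against them. The property names (`transformation`, `order`), the value `5-3 reversible` and the `assess` function are assumptions made for this example; the property dictionary stands in for values that would be extracted from jpylyzer's output.

```python
# Hypothetical profile for a preservation master: property name -> required value.
# The names and values are illustrative assumptions, not an official profile.
PRESERVATION_MASTER_PROFILE = {
    "transformation": "5-3 reversible",  # assumed marker for lossless compression
    "order": "RPCL",                     # required progression order
}


def assess(properties, profile):
    """Compare extracted properties against a profile.

    Returns a list of human-readable failure messages.
    An empty list means the file meets every rule in the profile.
    """
    failures = []
    for name, expected in profile.items():
        actual = properties.get(name)  # None if the property is missing
        if actual != expected:
            failures.append(f"{name}: expected {expected!r}, found {actual!r}")
    return failures


if __name__ == "__main__":
    # Stand-in for properties extracted from a characterisation tool's output.
    lossy_image = {"transformation": "9-7 irreversible", "order": "RPCL"}
    for message in assess(lossy_image, PRESERVATION_MASTER_PROFILE):
        print("FAIL:", message)
```

Schematron expresses the same kind of rule declaratively against XML, which avoids hard-coding the profile in a script.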


Magic editing and creation: a primer

09 August 2012

The purpose of this post is to give a brief introduction to creating, editing and submitting format signatures (or ‘magic’ entries) for the well-known File tool. The occasion for this was some work I did last week on improving File’s identification of the JPEG 2000 formats. I had some difficulty finding any easy-to-follow documentation that describes how to do this. The information is all out there, but it’s pretty fragmented. So, I wrote this brief tutorial, which is intended as an accessible introduction to magic editing. It only covers the very basics, but hopefully this is enough to overcome some initial stumbling blocks.
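
For readers who have never looked at a format signature, the underlying idea is simply a byte pattern expected at a given offset. The Python sketch below illustrates that idea for JP2, whose files start with a fixed 12-byte signature box. It is not a magic entry and does not use File's syntax; the function name `has_jp2_signature` is made up for this example.

```python
# The JP2 signature box: 4-byte box length (12), box type 'jP  ',
# followed by the fixed contents 0x0D 0x0A 0x87 0x0A.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")


def has_jp2_signature(data: bytes) -> bool:
    """Return True if the data starts with the JP2 signature box.

    This only checks the leading 12 bytes at offset 0; it says nothing
    about whether the rest of the file is valid.
    """
    return data[:len(JP2_SIGNATURE)] == JP2_SIGNATURE


if __name__ == "__main__":
    import sys

    # Usage: python check_jp2.py somefile.jp2
    with open(sys.argv[1], "rb") as f:
        header = f.read(len(JP2_SIGNATURE))
    print("JP2 signature found" if has_jp2_signature(header) else "no JP2 signature")
```

A real magic entry encodes the same offset and byte pattern, and can add further tests to tell related formats apart.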


PDF – Inventory of long-term preservation risks

26 July 2012

In this blog post I’ll be dusting off some old stuff for a change. The occasion for this is the following question, posted by Paul Wheatley on the Libraries and Information Science Stack Exchange website a few days ago:

What preservation risks are associated with the PDF file format?


EPUB for archival preservation

18 June 2012

Over the last few years, the EPUB format has gained widespread popularity in the consumer market. The KB has been approached by a number of publishers that wish to use EPUB for delivering some of their electronic publications. Surprisingly little information is available on the format’s suitability for archival preservation, apart from the Library of Congress’ Sustainability of Digital Formats web pages, which contain entries on EPUB 2 and EPUB 3.

So, the KB’s Departments of Collection and Collection Care requested a more detailed investigation of EPUB’s preservation credentials. More specifically, answers were needed to the following questions:

  • What are the main characteristics of EPUB?

  • What functionality does EPUB provide, and is this sufficient for representing e.g. content with sophisticated layout and typography requirements?

  • How well is EPUB supported by software tools that are used in (pre-)ingest workflows?

  • How suitable is EPUB for archival preservation? What are the main risks?


