21 September 2011
As I briefly mentioned in a previous blog post, one of the objectives
of the SCAPE project is to develop an architecture that will enable
large-scale characterisation of digital file objects. As a first step,
we are evaluating existing characterisation tools. The aim of this work
is twofold. First, we want to establish which tools are suitable
candidates for inclusion in the SCAPE architecture. Second, since the
enhancement of existing tools is another goal of SCAPE, the evaluation
should give us a better idea of the specific strengths and weaknesses of
each individual tool, which will help in deciding what modifications and
improvements are needed. In addition, many of these tools are widely
used outside the SCAPE project, so the results will most likely be
relevant to a wider audience (including the original tool developers).
- DROID
- Fido
- FITS
- format-identification
- JHOVE2
- unix-file
01 September 2011
Over the last few weeks I’ve been working on the design of a workflow
that the KB is planning to use for the migration of a collection of
(mostly old) TIFF images to JP2. One major risk of such a migration is
that hardware failures during the migration process may result in
corrupted images. For instance, one could imagine a brief network or
power interruption that occurs while an image is being written to disk.
In that case data may be missing from the written file. Ideally we would
be able to detect such errors using format validation tools such as
JHOVE. Some time ago Paul Wheatley
reported that the BL at some point were dealing with corrupted,
incomplete JP2 files that were nevertheless deemed “well-formed and
valid” by JHOVE. So I started doing some experiments in which I
deliberately butchered up some images, and subsequently checked to what
extent existing tools would detect this.
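To give an idea of what such an experiment can look like, here is a minimal Python sketch. It is not the actual procedure from the post: the function names are made up for illustration, and the check is only a rough heuristic. It assumes that the JPEG 2000 codestream is the last thing in the file, in which case an intact file ends with the end-of-codestream marker (0xFFD9). A JP2 file may legitimately contain further boxes after the codestream, so a missing marker at the very end is a hint rather than proof of truncation.

```python
import os
import shutil

# JPEG 2000 end-of-codestream (EOC) marker
EOC_MARKER = b"\xff\xd9"


def truncate_copy(src_path, dst_path, bytes_to_drop):
    """Copy src_path to dst_path and chop bytes_to_drop bytes off the end.

    Simulates a write that was interrupted before the file was complete.
    Returns the size of the truncated copy in bytes.
    """
    shutil.copyfile(src_path, dst_path)
    new_size = max(os.path.getsize(dst_path) - bytes_to_drop, 0)
    with open(dst_path, "r+b") as f:
        f.truncate(new_size)
    return new_size


def ends_with_eoc(path):
    """Heuristic: True if the last two bytes of the file are the EOC marker.

    Assumption: the codestream is the final element of the file.
    """
    if os.path.getsize(path) < len(EOC_MARKER):
        return False
    with open(path, "rb") as f:
        f.seek(-len(EOC_MARKER), os.SEEK_END)
        return f.read(len(EOC_MARKER)) == EOC_MARKER
```

One would run `truncate_copy` on a known-good image, then feed the damaged copy to both a format validator and `ends_with_eoc` to compare what each of them reports.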
- JHOVE
- JP2
- jpeg-2000
- jpylyzer
11 July 2011
As a part of the SCAPE project, I’m
currently heavily involved in the evaluation of various file format
identification tools. The overall aim of this work is to determine which
tools are suitable candidates for inclusion in the SCAPE architecture.
In addition, we’re also trying to get a better idea of each tool’s
specific strengths and weaknesses, which will hopefully serve as useful
input to the developer community. We’re actually planning to publish
the first results of this work on the OPF blog some time soon, so you
may want to keep your eyes peeled for that.
- DROID
- Fido
- format-identification
- unix-file
06 June 2011
The JPEG 2000 compression standard is steadily becoming more and more
popular in the archival community. Several large (national) libraries
are now using the JP2 format (which corresponds to Part 1 of the
standard) as the master format in mass digitisation projects. However,
some aspects of the JP2 file format are defined in ways that are open to
multiple interpretations. This applies to the embedding of ICC profiles
(which are used to define colour space information), and the definition
of grid resolution. This situation has led to a number of
interoperability issues that are potential risks for long-term
preservation.
02 December 2010
In my presentation during the Wellcome Trust’s JPEG 2000 seminar I
discussed the suitability of JPEG 2000 (and more specifically its JP2
format) for long-term preservation. I
highlighted the erroneous restriction in the JP2 (and JPX) format
specification that only allows ICC profiles of the ‘input’ class to be
used. This effectively prohibits the use of all working colour spaces
such as Adobe RGB, which are defined using ‘display device’ profiles. I
also showed how different software vendors interpret the format
specification in subtly different ways, and how such issues can create
problems in the long term, such as the loss of colour space and
resolution information after some future migration.
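As an illustration of the profile class issue, the following Python sketch reads the class from the header of a raw ICC profile. It relies on the ICC header layout: a 128-byte header with the profile/device class signature at bytes 12 to 15 and the 'acsp' file signature at bytes 36 to 39. The function names are hypothetical, and `allowed_by_input_only_rule` simply encodes the 'input class only' restriction as described above; it is not a full check against the JP2 specification.

```python
# ICC profile/device class signatures (header bytes 12-15)
PROFILE_CLASSES = {
    b"scnr": "input",
    b"mntr": "display",
    b"prtr": "output",
    b"link": "device link",
    b"spac": "colour space conversion",
    b"abst": "abstract",
    b"nmcl": "named colour",
}


def icc_profile_class(profile_bytes):
    """Return the class of an ICC profile given its raw bytes."""
    if len(profile_bytes) < 128:
        raise ValueError("ICC header is 128 bytes; data is too short")
    if profile_bytes[36:40] != b"acsp":
        raise ValueError("missing 'acsp' signature; not an ICC profile")
    return PROFILE_CLASSES.get(bytes(profile_bytes[12:16]), "unknown")


def allowed_by_input_only_rule(profile_bytes):
    """True only for 'input' class profiles (the restriction discussed above)."""
    return icc_profile_class(profile_bytes) == "input"
```

Under this rule a working-space profile such as Adobe RGB, which is a display class ('mntr') profile, would be rejected.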