Optimising archival JP2s for the derivation of access copies

19 August 2013

Like many other organisations that are using JPEG 2000, the KB produces two representations of most of its digitised content (newspapers, books, periodicals):

  • a high-quality, losslessly compressed JP2 that is the archival master;
  • a lesser-quality, lossily compressed JP2 that is used as an access image (this is used for e.g. our newspapers website).

The majority of our digitisation work is contracted out to external suppliers, and both master and access images are typically derived from from a parent (TIFF) image, which is converted to JP2 using the settings for master and access images, respectively. This means that we’re not currently using the archival masters for producing derived images. However, there may be a need for this at some point in the future. For instance, we may need higher quality access images, or access images that give better performance in our access environment. Because of this, I was asked to take a further look into ways to derive access JP2s directly from our archival masters.

In this blog post I’ll be sharing some preliminary findings of this work, which may be of interest to other JPEG 2000 practitioners as well. All images and test results that I’ll be showing along the way are available from this Github repository, so you can have a go at these data yourself, if you’re so inclined.


Identification of PDF preservation risks with Apache Preflight: the sequel

25 July 2013

Last winter I started a first attempt at identifying preservation risks in PDF files using the Apache Preflight PDF/A validator. This work was later followed up by others in two SPRUCE hackathons in Leeds (see this blog post by Peter Cliff) and London (described here). Much of this later work tacitly assumes that Apache Preflight is able to successfully identify features in PDF that are a potential risk for long-term access. This Wiki page on uses and abuses of Preflight (created as part of the final SPRUCE hackathon) even goes as far as stating that “Preflight is thorough and unforgiving (as it should be)”. But what evidence do we have to support such claims? The only evidence that I’m aware of, are the results obtained from a small test corpus of custom-created PDFs. Each PDF in this corpus was created in such a way that it includes only one specific feature that is a potential preservation risk (e.g. encryption, non-embedded fonts, and so on). However, PDFs that exist ‘in the wild’ are usually more complex. Also, the PDF specification often allows you to implement similar features in subtly different ways. For these reasons, it is essential to obtain additional evidence of Preflight’s ability to detect ‘risky’ features before relying on this tool in any operational setting.


ICC profiles and resolution in JP2: update on 2011 D-Lib paper

01 July 2013

It’s been more than two years now since I wrote my D-Lib paper JPEG 2000 for Long-term Preservation: JP2 as a Preservation Format. From time to time people ask me about the status of the issues that are mentioned in that paper, so here’s a long overdue update.


EPUB for archival preservation: an update

23 May 2013

Introduction

Last year (2012) the KB released a report on the suitability of the EPUB format for archival preservation. A substantial number of EPUB-related developments have happened since then, and as a result some of the report’s findings and conclusions have become outdated. This applies in particular to the observations on EPUB 3, and the support of EPUB by characterisation tools. This blog post provides an update to those findings. It addresses the following topics in particular:

  • Use of EPUB in scholarly publishing
  • Adoption and use of EPUB 3
  • EPUB 3 reader support
  • Support of EPUB by characterisation tools

In the following sections I will briefly summarise the main developments in each of these areas, after which I will wrap up things in a concluding section.


Adventures in Debian packaging

23 April 2013

About a year ago, work started on packaging SCAPE tools. Jpylyzer was the first SCAPE tool that was turned into a Debian package. Some time later, the OPF set up a couple of machine images at Amazon Web Services, which can be used to create packages repeatedly using a virtual machine. Even though I’ve used the Amazon service a couple of times myself, I really know next to nothing about Debian packages, and it’s safe to say that the underlying build process has been more or less a complete mystery to me.

To get a better understanding of the process for building Debian packages, I had a try at packaging jpylyzer on my local machine (which runs on Linux Mint 14). Some time ago Dave Tarrant and Rui Castro wrote a nice step-by-step guide on building Debian packages on the OPF Wiki, so I tried to follow the instructions there. While working on this, I made some notes, mainly to remind myself of what I was doing. Then I realised that some of this might be useful to others as well, so I decided to turn it into a blog post.



Search

Tags

Archive

2024

December

November

October

March

2023

June

May

March

February

January

2022

November

June

April

March

2021

September

February

2020

September

June

April

March

February

2019

September

April

March

January

2018

July

April

2017

July

June

April

January

2016

December

April

March

2015

December

November

October

July

April

March

January

2014

December

November

October

September

August

January

2013

October

September

August

July

May

April

January

2012

December

September

August

July

June

April

January

2011

December

September

July

June

2010

December

Feeds

RSS

ATOM