Generating lossy access JP2s from lossless preservation masters

30 March 2022

Plumbers Tool Box — Intensive Breeding by Jean Marc Cote, Public domain, via Wikimedia Commons.

At the KB we’ve been using JP2 (JPEG 2000 Part 1) as our primary image format for digitised newspapers, books and periodicals since 2007. The digitisation work is contracted out to external vendors, who supply the digitised pages as losslessly compressed preservation masters, as well as lossily compressed access images that are used within the Delpher platform.

Right now the KB is in the process of migrating its digital collections to a new preservation system. This prompted the question whether it would be feasible to generate access JP2s from the preservation masters in-house at some point in the future, using software that runs inside the preservation system¹. As a first step towards answering that question, I created some simple proof of concept workflows, using three different JPEG 2000 codecs. I then tested these workflows with preservation master images from our collection. The main objective of this work was to find a workflow that both meets our current digitisation requirements, and is also sufficiently performant.

On The Significant Properties of Spreadsheets

24 September 2021

Clippy saying It looks like you're migrating a spreadsheet to ... TIFF?!

Earlier this month saw the publication of The Significant Properties of Spreadsheets. This is the final report of a six-year research effort by the Open Preservation Foundation’s Archives Interest Group (AIG), which is composed of participants from the National Archives of the Netherlands (NANETH), the National Archives of Estonia (NAE), the Danish National Archives (DNA), and Preservica. The report caught my attention for two reasons. First, there’s the subject matter of spreadsheets, on which I’ve written a few posts in the past¹. Second, it marks a surprising (at least to me!) return of “significant properties”, a concept that was omnipresent in the digital preservation world between, roughly, 2005 and 2010, but which has largely fallen into disuse since then. In this post I’m sharing some of my thoughts on the report.

PDF processing and analysis with open-source tools

06 September 2021

Photo of assortment of old plumbing tools. — Plumbers Tool Box by pszz on Flickr. Used under CC BY-NC-SA 2.0.

Over the years, I’ve been using a variety of open-source software tools for solving all sorts of issues with PDF documents. This post is an attempt to (finally) bring together my go-to PDF analysis and processing tools and commands for a variety of common tasks in one single place. It is largely based on a multitude of scattered lists, cheat-sheets and working notes that I made earlier. Starting with a brief overview of some general-purpose PDF toolkits, I then move on to a discussion of the following specific tasks:

Validation and integrity testing
PDF/A and PDF/UA compliance testing
Document information and metadata extraction
Policy/profile compliance testing
Text extraction
Link extraction
Image extraction
Conversion to other (graphics) formats
Inspection of embedded image information
Conversion of multiple images to PDF
Cross-comparison of two PDFs
Corrupted PDF repair
File size reduction of PDF with hi-res graphics
Inspection of low-level PDF structure
View, search and extract low-level PDF objects
Incremental updates and document versions: get information about the number of incremental updates, and restore previous versions

Towards a preservation workflow for mobile apps

24 February 2021

Satellite image of Wadden Sea — Production photo from "2001: A Space Odyssey". ©Stanley Kubrick Archives/TASCHEN.

My previous post addressed the emulation of mobile Android apps. In this follow-up, I’ll explore some other aspects of mobile app preservation, with a focus on acquisition and ingest processes. The 2019 iPres paper on the Acquisition and Preservation of Mobile eBook Apps by Maureen Pennock, Peter May and Michael Day again was the departure point. In its concluding section, they recommend:

In terms of target formats for acquisition, we reach the undeniable conclusion that acquisition of the app in its packaged form (either an IPA file or an APK file) is optimal for ensuring organisations at least acquire a complete published object for preservation.

And:

[T]his form should at least also include sufficient metadata about inherent technical dependencies to understand what is needed to meet them.

In practical terms, this means that the workflows that are used for acquisition and (pre-)ingest must include components that are able to deal with the following aspects:

Acquisition of the app packages (either by direct deposit from the publisher, or using the app store).
Identification of the package format (APK for Android, IPA for iOS).
Identification of metadata about the app’s technical dependencies.

The main objective of this post is to get an idea of what would be needed to implement these components. Is it possible to do all of this with existing tools? If not so, what are the gaps? The underlying assumption here is an emulation-based preservation strategy¹.

Four Android emulators, two apps

09 February 2021

Header image — "Android Robot" by Google Inc., used under CC BY 3.0, via Wikimedia Commons.

So far the KB hasn’t actively pursued the preservation of mobile apps. However, born-digital publications in app-only form have become increasingly common, as well as “hybrid” publications, with apps that are supplemental to traditional (paper) books. At the request of our Digital Preservation department, I’ve started some exploratory investigations into how to preserve mobile apps in the near future. The 2019 iPres paper on the Acquisition and Preservation of Mobile eBook Apps by the British Library’s Maureen Pennock, Peter May and Michael Day provides an excellent starting point on the subject, and it highlights many of the challenges involved.

Before we can start archiving mobile apps ourselves, some additional aspects need to be addressed in more detail. One of these is the question of how to ensure long-term access. Emulation is the obvious strategy here, but I couldn’t find much information on the emulation of mobile platforms within a digital preservation context. In this blog post I present the results of some simple experiments, where I tried to emulate two selected apps. The main objective here was to explore the current state of emulation of mobile devices, and to get an initial impression of the suitability of some existing emulation solutions for long-term access.

For practical reasons I’ve limited myself to the Android platform¹. Attentive readers may recall I briefly touched on this subject back in 2014. As much of the information in that blog post has now become outdated, this new post presents a more up-to date investigation. I should probably mention here that I don’t own or use any Android device, or any other kind of smartphone or tablet for that matter². This probably makes me the worst possible person to evaluate Android emulation, but who’s going to stop me trying anyway? No one, that’s who!

« Newer 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 Older »

Search

Tags

Archive

February

June

May

April

December

November

October

March

June

May

March

February

January

November

June

April

March

September

February

September

June

April

March

February

September

April

March

January

July

April

July

June

April

January

December

April

March

December

November

October

July

April

March

January

December

November

October

September

August

January

October

September

August

July

May

April

January

December

September

August

July

June

April

January

December

September

July

June

December

Issues

Social

Feeds