PDF – Inventory of long-term preservation risks

26 July 2012

In this blog post I’ll be dusting off some old stuff for a change. The occasion for this is the following question, posted by Paul Wheatley on the Libraries and Information Science Stack Exchange website a few days ago:

What preservation risks are associated with the PDF file format?

Report

This reminded me of a report I wrote on this very subject back in 2009. (Incidentally, this was my very first foray into the wacky world of digital preservation, but that’s another story.) Originally this document was intended for internal use at the KB, but looking at it again, I think it may be of interest to a wider audience. It also aligns quite nicely with the upcoming work on a knowledge base of file-format-related risks that will be done as part of the SCAPE project. The main idea here is to take a file format, identify its main (preservation-related) risks, and describe how “risky” features can be detected by existing (characterisation) tools. In fact I was envisaging something along these lines when I wrote the PDF report in 2009, but other things got in the way, and I never got round to the final step. The SCAPE work should finally make this happen.

Although the work on the knowledge base is still in its early stages, some very first results can be found here. The initial focus will be on JPEG 2000 (JP2/JPX) and PDF.

As for the report, I should add that some of it is a little rough around the edges, and you may note some gaps and not-quite-finished bits. This is also why we never released it the first time around. Also, one aspect that is not well covered is PDF’s potential for transmitting viruses and other malware. Nevertheless, as a general introduction to the format and an overview of its main risks I think it’s not too shabby, but I’ll let you be the judge of that! As always, feel free to use the comment fields for your feedback and suggestions.

Link to report

Adobe Portable Document Format - Inventory of long-term preservation risks, KB/ National Library of the Netherlands

Originally published at the Open Preservation Foundation blog
