Evaluation of identification tools: first results from SCAPE

21 September 2011

As I briefly mentioned in a previous blog post, one of the objectives of the SCAPE project is to develop an architecture that will enable large-scale characterisation of digital file objects. As a first step, we are evaluating existing characterisation tools. The aim of this work is twofold. First, we want to establish which tools are suitable candidates for inclusion in the SCAPE architecture. Second, since the enhancement of existing tools is another goal of SCAPE, the evaluation should give us a better idea of the specific strengths and weaknesses of each individual tool, which will help us decide what modifications and improvements are needed. Moreover, many of these tools are widely used outside of the SCAPE project, so the results will most likely be relevant to a wider audience (including the original tool developers).

Evaluation of identification tools

Over the last months, work on this has focused on format identification tools. This has resulted in a report, which is attached to this blog post. We have evaluated several tools, including DROID, FIDO, FITS and JHOVE2.

All tools were evaluated against a set of 22 criteria, and extensive testing using real data has been a key part of the work. One area that, I think, we have not yet been able to tackle sufficiently is the accuracy of the tools. This is problematic, since measuring accuracy requires a test corpus in which the format of each file object is known a priori. In most large data sets this information will have been derived from the very same tools that we are trying to test, so we will need to see whether we can say anything meaningful about this in a follow-up.
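To make the ground-truth problem concrete, here is a minimal sketch (in Python) of how the accuracy of an identification tool could be measured against an independently verified corpus. The manifest name ground_truth.csv is an assumption for illustration, and the Unix file utility merely stands in for whichever tool is under test; any of the evaluated tools could take its place with appropriate output parsing.

    import csv
    import subprocess

    def identify(path):
        # 'file -b --mime-type' prints only the MIME type for a single file.
        out = subprocess.check_output(["file", "-b", "--mime-type", path])
        return out.decode().strip()

    def score(manifest_path):
        # The manifest is assumed to hold two columns: file path and expected
        # MIME type, established independently of the tool under test.
        hits = total = 0
        with open(manifest_path) as f:
            for path, expected in csv.reader(f):
                total += 1
                if identify(path) == expected:
                    hits += 1
        return hits, total

    hits, total = score("ground_truth.csv")
    print("Correct identifications: %d out of %d" % (hits, total))

The crucial point is that the expected formats in the manifest must not themselves come from an identification tool, which is exactly what makes assembling such a corpus for large data sets so difficult.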

Involvement of tool developers

Over the previous months we have been sending out earlier drafts of this document to the developers of DROID, FIDO, FITS and JHOVE2, and we have received a lot of feedback in response. In the case of FIDO, a new version is underway, which should correct most (if not all) of the problems that are mentioned in the report. For the other tools we have also received confirmation that some of the issues we found will be fixed in upcoming releases.

Status of the report and future work

The attached report should be seen as a living document: there will probably be one or more updates at a later stage, and we may decide to include further tests using additional data. Meanwhile, as always, we appreciate your feedback!

Evaluation of characterisation tools – Part 1: Identification


Originally published at the Open Preservation Foundation blog



