Evaluation of identification tools: first results from SCAPE

21 September 2011

As I already briefly mentioned in a previous blog post, one of the objectives of the SCAPE project is to develop an architecture that will enable large-scale characterisation of digital file objects. As a first step, we are evaluating existing characterisation tools. The overall aim of this work is twofold. First, we want to establish which tools are suitable candidates for inclusion in the SCAPE architecture. Second, since the enhancement of existing tools is another goal of SCAPE, the evaluation should also give us a better idea of the specific strengths and weaknesses of each individual tool, which will be helpful for deciding what modifications and improvements are needed. Also, many of these tools are widely used outside of the SCAPE project, which means that the results will most likely be relevant to a wider audience (including the original tool developers).

Evaluation of identification tools

Over the last few months, work on this has focused on format identification tools. This has resulted in a report, which is attached to this blog post. We have evaluated the following tools:

- DROID
- FIDO
- FITS
- JHOVE2

All tools were evaluated against a set of 22 criteria, and extensive testing using real data has been a key part of the work. One area that, I think, we have not been able to tackle sufficiently so far is the accuracy of the tools. Assessing this is difficult, since it would require a test corpus in which the format of each file object is known a priori. In most large data sets this information will have been derived from the very same tools that we are trying to test, so we will have to see whether we can say anything meaningful about this in a follow-up.
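
To illustrate the kind of accuracy test this would involve, here is a minimal sketch (not part of the report) that compares one tool's reported format identifiers against a manually verified ground-truth manifest. The file names (ground_truth.csv, tool_output.csv) and column names are assumptions for illustration only; in practice each tool's output would first need to be normalised to a common identifier scheme, such as PRONOM unique identifiers (PUIDs).

```python
import csv
from collections import Counter

def load_map(path, key_col, value_col):
    """Read a CSV of (file path, format identifier) pairs into a dict."""
    with open(path, newline="") as f:
        return {row[key_col]: row[value_col] for row in csv.DictReader(f)}

# Hypothetical inputs: a manually verified manifest and one tool's
# (normalised) output, both mapping file paths to PUIDs.
truth = load_map("ground_truth.csv", "file", "puid")
reported = load_map("tool_output.csv", "file", "puid")

# Tally each file as correctly identified, misidentified, or unidentified.
outcomes = Counter()
for path, expected in truth.items():
    actual = reported.get(path)
    if actual is None:
        outcomes["unidentified"] += 1
    elif actual == expected:
        outcomes["correct"] += 1
    else:
        outcomes["incorrect"] += 1

total = sum(outcomes.values())
for outcome, n in outcomes.items():
    print(f"{outcome}: {n} ({n / total:.1%})")
```

The hard part, of course, is not this bookkeeping but obtaining a manifest whose format assignments do not themselves depend on the tools under test.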

Involvement of tool developers

Over the past months we have been sending earlier drafts of this document to the developers of DROID, FIDO, FITS and JHOVE2, and we have received a lot of feedback in return. In the case of FIDO, a new version is underway that should correct most (if not all) of the problems mentioned in the report. For the other tools we have also received confirmation that some of the reported issues will be fixed in upcoming releases.

Status of the report and future work

The attached report should be seen as a living document. There will probably be one or more updates at a later point, and we may decide to include further tests using additional data. Meanwhile, as always, we appreciate your feedback!

Evaluation of characterisation tools – Part 1: Identification


Originally published at the Open Preservation Foundation blog



