Jpylyzer 2015 round-up

08 December 2015

Yesterday (7 December) we released version 1.16.0 of the jpylyzer tool, this year’s third release of the software (excluding bugfix releases). This blog post gives a brief overview of the main jpylyzer improvements implemented this year.

Changes in XML output

The 1.14 release introduced two output improvements. Most importantly, an XML Schema Definition (XSD) was created. The schema formally defines the output format, and it also makes it possible to validate output files. In addition, a namespace declaration was added. These changes make the post-processing of jpylyzer’s output more straightforward.
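As a rough illustration of what the namespace declaration means for post-processing, the sketch below parses a small namespaced document with Python's standard `xml.etree.ElementTree`. The namespace URI, the sample document and the `find_text` helper are placeholders made up for this example; the real namespace should be taken from actual jpylyzer output or from the XSD.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, for illustration only; take the real one
# from the xmlns attribute of actual jpylyzer output.
EXAMPLE_NS = "http://example.org/ns/jpylyzer/"

SAMPLE = (
    '<jpylyzer xmlns="' + EXAMPLE_NS + '">'
    "<isValidJP2>True</isValidJP2>"
    "</jpylyzer>"
)


def find_text(root, tag, namespace):
    """Return the text of the first child element `tag` in `namespace`, or None."""
    element = root.find("{%s}%s" % (namespace, tag))
    return None if element is None else element.text


root = ET.fromstring(SAMPLE)
print(find_text(root, "isValidJP2", EXAMPLE_NS))  # namespace-qualified lookup
print(root.find("isValidJP2"))                    # unqualified lookup finds nothing
```

Scripts written against pre-1.14 output that look up elements by their bare names will therefore probably need a small adjustment along these lines.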

The 1.16 release added the statusInfo element, which tells you whether the validation completed without any internal errors. It contains the following sub-elements:

  • success: a Boolean flag that indicates whether the validation attempt completed normally (“True”) or not (“False”). A value of “False” indicates an internal error that prevented jpylyzer from validating the file.

  • failureMessage: if the validation attempt failed (value of success equals “False”), this field gives further details about the reason for the failure.

This means that the general structure of the output now looks like this:

[Figure: jpylyzer output structure]

Recursive traversal of directory trees

Another feature that was introduced with the 1.14 release is the --recurse option. This allows one to recursively traverse a directory tree. The code for this feature was created by Adam Retter, Jaishree Davey and Laura Damian of The National Archives (UK).
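The general idea of recursive traversal can be sketched with `os.walk` from Python's standard library. This is only an illustration of the concept: it is not the code contributed by The National Archives, and the function name `find_images` and the extension filter are assumptions made for this example.

```python
import os


def find_images(root_dir, extensions=(".jp2",)):
    """Yield paths of files below root_dir whose name ends with one of
    the given extensions (compared case-insensitively)."""
    for dir_path, _dir_names, file_names in os.walk(root_dir):
        for name in sorted(file_names):
            # str.endswith accepts a tuple of candidate suffixes
            if name.lower().endswith(extensions):
                yield os.path.join(dir_path, name)
```

Calling `list(find_images("/some/directory"))` would then return every matching file in that directory and all of its subdirectories.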

Memory mapping

The 1.15 release introduced the use of memory mapping for reading input images. This results in better performance when processing (very) large files. Images that would cause a memory error in previous versions are now handled without any problem. Also, the processing of very large files can be significantly faster than in earlier releases, and is less prone to freezing other processes that are simultaneously running on the machine. This improvement was suggested by Stefan Weil of Mannheim University Library, and the changes are based on a patch he submitted.

Two examples illustrate the benefits of this change:

  • This 2 GB image resulted in a memory error with jpylyzer 1.14.2 on a Windows machine with 4 GB RAM. The latest versions process the file without problems.

  • On a Linux Mint machine with 8 GB RAM, this 6.7 GB image also resulted in a memory error. Again, the current version handles the file without any problem.

This doesn’t mean that memory errors are now a thing of the past entirely; they may still occur under some circumstances. For instance, a test with the 6.7 GB image failed on a Linux Mint machine with 4 GB RAM. So it seems prudent to make sure that the amount of available RAM always exceeds the maximum image size by a fairly wide safety margin. Also, chip architecture and operating system may put further constraints on the amount of memory that can be mapped at a time.
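For readers unfamiliar with the technique, here is a minimal sketch of memory-mapped reading using Python's standard `mmap` module. It shows the general mechanism only and is not jpylyzer's actual code; the function name `read_header` and the default length are assumptions made for this example.

```python
import mmap


def read_header(path, length=12):
    """Return the first `length` bytes of a file via a read-only memory map."""
    with open(path, "rb") as f:
        # A length of 0 maps the whole file. The operating system loads
        # pages on demand instead of copying the entire file into memory.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[:length]
```

Slicing the mapped object returns ordinary `bytes`, so code that previously worked on the result of `f.read()` can often be adapted with few changes. Note that mapping an empty file raises a `ValueError`, so a real tool would need to handle that case separately.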

Improved exception handling

Prior to release 1.16.0, an exception during the processing of an image could cause jpylyzer to crash. For example, an extremely large image can result in an internal memory error, and this would bring jpylyzer to a halt. This is particularly problematic when using the new --recurse option: in this case a single jpylyzer invocation may involve the processing of thousands of images at a time. A single (e.g. extremely large) image could then result in unusable output; moreover, it would be difficult to identify which image caused the crash in the first place! Release 1.16.0 introduces improved exception handling that allows jpylyzer to handle such situations more gracefully.
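The pattern can be sketched roughly as follows. This is a simplified illustration, not jpylyzer's actual implementation: `process_all` and the `validate` callable are hypothetical names, and the dictionary keys merely mirror the success and failureMessage fields described earlier.

```python
def process_all(paths, validate):
    """Run `validate` on each path; record failures instead of aborting the batch."""
    results = []
    for path in paths:
        status = {"path": path, "success": True, "failureMessage": None}
        try:
            status["result"] = validate(path)
        except MemoryError:
            status.update(success=False,
                          failureMessage="memory error (file size too large)")
        except Exception as error:  # any other internal error
            status.update(success=False, failureMessage=str(error))
        results.append(status)
    return results
```

Because each file is handled inside its own `try` block, one problematic image only affects its own entry in the results, and that entry also identifies which file caused the problem.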

Robustness

The combined effect of the exception handling, memory mapping and status output should make jpylyzer releases from 1.16.0 onwards significantly more robust than previous versions. As an example, here’s some (simplified) output for a 6.7 GB JP2 that caused a memory error:

<?xml version='1.0' encoding='UTF-8'?>
<jpylyzer>
    <toolInfo>
        <toolName>jpylyzer.py</toolName>
        <toolVersion>1.16.0</toolVersion>
    </toolInfo>
    <fileInfo>
        <fileName>AS16-P-4102.jp2</fileName>
        <filePath>/home/johan/testJpylyzer/AS16-P-4102.jp2</filePath>
        <fileSizeInBytes>6745365021</fileSizeInBytes>
        <fileLastModified>Wed Dec  2 20:05:29 2015</fileLastModified>
    </fileInfo>
    <statusInfo>
        <success>False</success>
        <failureMessage>memory error (file size too large)</failureMessage>
    </statusInfo>
    <isValidJP2>False</isValidJP2>
    <tests/>
    <properties/>
</jpylyzer>

Previous versions would simply crash in this situation. Now, automated workflows can check the value of the success field to verify the status of the validation. More importantly, if the jpylyzer invocation involved multiple input files (e.g. through the --recurse option), errors like these will not stop the processing of the remaining files.
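A workflow check of this kind might look like the sketch below, which uses only Python's standard library. The helper names `local_name` and `validation_status` are made up for this example, and the sample is a cut-down version of the output shown above; the code assumes the result for a single file and ignores any namespace prefix, so that it works with or without a namespace declaration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<jpylyzer>
    <statusInfo>
        <success>False</success>
        <failureMessage>memory error (file size too large)</failureMessage>
    </statusInfo>
    <isValidJP2>False</isValidJP2>
</jpylyzer>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def validation_status(xml_text):
    """Return (success, failure_message) taken from the statusInfo element."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if local_name(element.tag) == "statusInfo":
            fields = {local_name(child.tag): (child.text or "").strip()
                      for child in element}
            return fields.get("success") == "True", fields.get("failureMessage")
    # Output from releases before 1.16 has no statusInfo element
    return False, "no statusInfo element found"


print(validation_status(SAMPLE))
```

A workflow would typically inspect isValidJP2 only when the first value of the returned pair is true, since a failed validation attempt says nothing about the validity of the file itself.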

64-bit Windows binaries

Finally, from version 1.15.1 onwards we provide 64-bit Windows binaries of jpylyzer (previously only 32-bit binaries were available).

Jpylyzer website


Originally published at the KB Research blog


