Ever since its start in late 2013, this blog has been hosted on GitHub Pages, using the Jekyll static site generator. On a technical level this has always worked flawlessly, but in the current geopolitical climate I no longer want my site to be hosted by a US-based tech giant. After reviewing some options, I decided to migrate the site to Codeberg Pages, which is operated by a non-profit organisation based in Germany. I also implemented a new comments system based on ActivityPub, which allows readers to post comments with a Fediverse (e.g. Mastodon) account.
One of the most elusive items in the Digital Dark Age Crew back catalogue is “Y2K”, which deals with the Year 2000 problem. Originally planned as a December 1999 release, the track was never finished due to a succession of technical problems. Some early demos of “Y2K” have surfaced as bootlegs, and many fans of the group rate these amongst the most sought-after Digital Dark Age Crew tracks.
This post introduces Pdfquad, a software tool for automated quality assessment of large digitisation batches. The software was developed specifically for the Digital Library for Dutch Literature (DBNL), but it may be useful to other users and organisations as well.
In a recent blog post, colleagues at the National Digital Preservation Services in Finland addressed an issue with PDF files that contain strings with octal escape sequences. These are not parsed correctly by JHOVE, and the resulting parse errors ultimately lead to (seemingly unrelated) validation errors. The authors argue that octal escape sequences present a preservation risk, as they may confuse other software besides JHOVE. Since this claim is not backed up by any evidence, I put it to the test here using eight different PDF processing tools and libraries.
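To give an idea of what these escape sequences look like (a toy illustration, not taken from the original post): in a PDF literal string, a backslash followed by one to three octal digits encodes a single byte, so `(PDF\053Quality)` is just another way of writing `(PDF+Quality)`. The Python sketch below decodes that notation with a deliberately simplified regular expression; a real PDF parser must also handle escapes like `\n`, `\(`, `\\` and line continuations.

```python
import re


def decode_octal_escapes(raw: bytes) -> bytes:
    """Resolve \\ddd octal escape sequences in a PDF literal string.

    Simplified sketch: only the octal escapes are handled, and the
    high-order overflow rule is approximated by masking to one byte.
    """
    def repl(match):
        return bytes([int(match.group(1), 8) & 0xFF])

    return re.sub(rb'\\([0-7]{1,3})', repl, raw)


# The octal escape \053 encodes the '+' character
print(decode_octal_escapes(rb'PDF\053Quality'))  # b'PDF+Quality'
```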
My previous post addressed several problems I ran into when trying to estimate the “last saved” quality level of JPEG images. It described some experiments based on ImageMagick’s quality heuristic, which resulted in a Python implementation of a modified version of the heuristic that improves its behaviour for images with a quality of 50% or less.
I still wasn’t entirely happy with this solution. This was partially because ImageMagick’s heuristic uses aggregated coefficients of the image’s quantization tables, which makes it potentially vulnerable to collisions. Another concern was that the reasoning behind certain details of ImageMagick’s heuristic seems rather opaque (at least to me!).
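As a toy illustration of the collision issue (this is not ImageMagick’s actual heuristic, just a demonstration of why aggregation loses information): two quantization tables that differ coefficient by coefficient can still have identical sums, and any statistic computed only from such aggregates cannot tell them apart.

```python
# Two different 8x8 quantization tables (flattened to 64 values) with
# the same sum of coefficients. A heuristic that only compares
# aggregated (summed) coefficients would treat them as identical.
table_a = [16] * 64
table_b = [11, 21] + [16] * 62          # differs in two coefficients

print(table_a == table_b)               # False: the tables differ
print(sum(table_a) == sum(table_b))     # True: same aggregate value
```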
In this post I explore a different approach to JPEG quality estimation, based on a straightforward comparison with “standard” JPEG quantization tables using least squares matching. I also propose a measure that characterizes how similar an image’s quantization tables are to the closest “standard” tables, which could be useful as a measure of confidence in the quality estimate. I present some tests in which I compare the results of the least squares matching method with those of the ImageMagick heuristics, and I also discuss the results of a simple sensitivity analysis.
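To make the idea more concrete, here is a minimal Python sketch of such a least squares match (an illustration only, not the implementation discussed in this post; it assumes the standard luminance quantization table from Annex K of the JPEG standard and the table-scaling formula used by the Independent JPEG Group’s libjpeg, and it ignores the chrominance table, which would be handled in the same way):

```python
import numpy as np

# Standard luminance quantization table (Annex K, ITU-T T.81)
STD_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])


def scaled_table(quality: int) -> np.ndarray:
    """Scale the standard table for a given quality (1-100), using the
    scaling formula from libjpeg."""
    scale = 5000 / quality if quality < 50 else 200 - 2 * quality
    table = np.floor((STD_LUMA * scale + 50) / 100)
    return np.clip(table, 1, 255)


def estimate_quality(image_table: np.ndarray) -> tuple[int, float]:
    """Return the quality whose scaled standard table is closest to
    image_table in the least squares sense, plus the root mean squared
    error of that match (0 = exact match with a standard table)."""
    errors = [np.sum((scaled_table(q) - image_table) ** 2)
              for q in range(1, 101)]
    best = int(np.argmin(errors)) + 1
    rmse = float(np.sqrt(errors[best - 1] / image_table.size))
    return best, rmse


# A table that was generated at quality 75 should match exactly
print(estimate_quality(scaled_table(75)))   # (75, 0.0)
```

In this sketch the root mean squared error doubles as a crude similarity measure: a value of zero means the image’s table is identical to a scaled standard table, while larger values indicate a non-standard table and hence a less trustworthy quality estimate.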