Dutch newspaper wipes out articles citing fabricated sources - Internet Archive to the rescue!
Shortly before Christmas, Dutch daily newspaper Trouw removed 126 articles from its website. These articles were all authored by Perdiep Ramesar, a former journalist of the newspaper. Ramesar had been fired by Trouw in November, after it turned out that many of the sources that are cited in his articles were fabricated. The most notorious example was a series of pieces about the so-called “Sharia Triangle”, a neighbourhood in the city of The Hague, which Ramesar claimed was being ruled by Sharia law. As it turned out, this story was largely based on fabricated sources. Nevertheless, it was taken at face value by most major Dutch news outlets at the time, and even prompted a parliamentary debate.
Trouw’s decision to remove the 126 articles overnight was met with considerable criticism. For example, historian Jan Dirk Snel noted that the removal of these articles makes it impossible to check what was wrong with them in the first place. Various other critics accused Trouw of trying to rewrite history.
Internet Archive to the rescue?
A quick check on a handful of Ramesar’s articles revealed that quite a few were still accessible from the Internet Archive’s Wayback Machine. This got me curious how many out of the 126 deleted articles would still be available there. Answering this question isn’t completely straightforward, because the Wayback Machine isn’t easily searchable. In order to locate any of the deleted articles, one first needs to know its original URL (i.e. the one at Trouw’s website). A list of all deleted articles does exist, but this only provides each article’s title, without listing the full URL.
However, by entering each title into a search engine (I used a combination of Google and DuckDuckGo [1]), I was able to recover the original URL of every article in the list. In many cases the URLs were still present in the cache of the search engine. In other cases URLs could be recovered from linking pages on the Trouw website. I then wrote a simple script to check the availability of each URL in Internet Archive’s Wayback Machine. The script is just a wrapper around Wayback’s Availability JSON API, which is very handy and really easy to use. This yielded a list that gives, for each article, its status in Wayback (i.e. whether it has been archived) and, if so, the URL of the most recent capture.
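To give an idea of what such a wrapper can look like, here is a minimal Python sketch. It is not the actual script from the repo linked below. The function names (`build_query`, `parse_response`, `check_availability`) are made up for this illustration, and the sketch assumes the API's documented behaviour: a request to `https://archive.org/wayback/available?url=...` without a timestamp returns the most recent capture under `archived_snapshots.closest`, and `archived_snapshots` is empty if nothing was archived.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the Wayback Availability JSON API
API_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url):
    """Return the API request URL for one original article URL."""
    return API_ENDPOINT + "?" + urllib.parse.urlencode({"url": url})


def parse_response(data):
    """Return (is_archived, capture_url) from a decoded API response.

    Assumes an empty "archived_snapshots" object means no capture exists.
    """
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return True, closest.get("url")
    return False, None


def check_availability(url, timeout=30):
    """Query the API (needs network access) and report the capture status."""
    with urllib.request.urlopen(build_query(url), timeout=timeout) as response:
        data = json.load(response)
    return parse_response(data)
```

Calling `check_availability` for every recovered URL and writing the returned pairs to a CSV file would give the kind of list described above. Keeping the parsing step in its own function makes it possible to test it without hitting the API.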
Result
The results of the above exercise are summarised in this table. As it turned out, 53 out of the 126 deleted articles are still accessible from the Internet Archive. These are mostly pieces that were written from 2010 onward, and include the notorious “Sharia Triangle” ones. From the time period 2007-2009 very few articles could be found.
Possibly more?
It is possible that more of the removed articles are hidden in the Internet Archive. This is because of the way the Trouw website handles news items. If I understand things correctly, an article in Trouw is often first published under a news URL; subsequently it is moved to the archive section of the website, where it is published under a different URL. By way of illustration, a DuckDuckGo search for the article Ik kan mezelf niet veranderen in een witte man yielded the following URL:
This archive link (recognisable from the word archief in the URL) cannot be found anywhere in Internet Archive. By chance I encountered a different link to the same article on the website of historian Jan Dirk Snel:
This is the news link under which the article was first published, and a snapshot of it exists in the Internet Archive:
Likewise, I expect that some articles may have slipped through the net in a similar way. Nevertheless, I think the above results are pretty good as they are!
Links
- Data as comma-separated text file (UTF-8)
- Github repo with scripts and raw data files
- Github repo (as single ZIP file)
[1] In order to get unscrambled links from Google, I used the following Firefox add-on: https://palant.de/2011/11/28/google-yandex-search-link-fix