bitsgalore.org

compare

compare -quiet -metric AE source.tif fromjp2.tif null:

exiftool -X source.tif > source.xml

 <IFD1:SubfileType>Reduced-resolution image</IFD1:SubfileType>
 <IFD1:ImageWidth>160</IFD1:ImageWidth>
 <IFD1:ImageHeight>126</IFD1:ImageHeight>
 <IFD1:BitsPerSample>8 8 8</IFD1:BitsPerSample>
 <IFD1:Compression>Uncompressed</IFD1:Compression>
 <IFD1:PhotometricInterpretation>RGB</IFD1:PhotometricInterpretation>
 <IFD1:StripOffsets>6510</IFD1:StripOffsets>
 <IFD1:SamplesPerPixel>3</IFD1:SamplesPerPixel>
 <IFD1:RowsPerStrip>126</IFD1:RowsPerStrip>
 <IFD1:StripByteCounts>60480</IFD1:StripByteCounts>
 <IFD1:PlanarConfiguration>Chunky</IFD1:PlanarConfiguration>
 <IFD1:ThumbnailTIFF>(Binary data 60696 bytes, use -b option to extract)</IFD1:ThumbnailTIFF>

 <IFD0:ImageWidth>9458</IFD0:ImageWidth>
 <IFD0:ImageHeight>7429</IFD0:ImageHeight>
 <IFD0:BitsPerSample>8 8 8</IFD0:BitsPerSample>
 <IFD0:Compression>Uncompressed</IFD0:Compression>
 <IFD0:PhotometricInterpretation>RGB</IFD0:PhotometricInterpretation>
 <IFD0:Make>Leaf</IFD0:Make>
 <IFD0:Model>Leaf Aptus-II 12R(LI201033   )/Other</IFD0:Model>
 <IFD0:StripOffsets>(Binary data 72 bytes, use -b option to extract)</IFD0:StripOffsets>
 <IFD0:Orientation>Horizontal (normal)</IFD0:Orientation>
 <IFD0:SamplesPerPixel>3</IFD0:SamplesPerPixel>
 <IFD0:RowsPerStrip>1024</IFD0:RowsPerStrip>
 <IFD0:StripByteCounts>(Binary data 70 bytes, use -b option to extract)</IFD0:StripByteCounts>
 <IFD0:XResolution>300</IFD0:XResolution>
 <IFD0:YResolution>300</IFD0:YResolution>
 <IFD0:PlanarConfiguration>Chunky</IFD0:PlanarConfiguration>
 <IFD0:ResolutionUnit>inches</IFD0:ResolutionUnit>
 <IFD0:Software>Capture One 8 Macintosh</IFD0:Software>
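The per-IFD tags in the ExifTool XML above can be grouped programmatically, which makes it easy to spot an unexpected thumbnail IFD before comparing pixels. The following is a minimal sketch that assumes the flattened, one-tag-per-line form shown above; `tags_by_ifd` and the `sample` string are illustrative names, not part of ExifTool.

```python
import re
from collections import defaultdict

# Matches one tag line such as <IFD1:ImageWidth>160</IFD1:ImageWidth>
TAG_LINE = re.compile(r"<(IFD\d+):(\w+)>(.*?)</\1:\2>")


def tags_by_ifd(xml_text):
    """Group tag values by IFD name, e.g. {'IFD0': {'ImageWidth': '9458'}}."""
    ifds = defaultdict(dict)
    for ifd, tag, value in TAG_LINE.findall(xml_text):
        ifds[ifd][tag] = value
    return dict(ifds)


# Abridged copy of the ExifTool output shown above
sample = """
 <IFD1:SubfileType>Reduced-resolution image</IFD1:SubfileType>
 <IFD1:ImageWidth>160</IFD1:ImageWidth>
 <IFD1:ImageHeight>126</IFD1:ImageHeight>
 <IFD0:ImageWidth>9458</IFD0:ImageWidth>
 <IFD0:ImageHeight>7429</IFD0:ImageHeight>
"""

for name, tags in sorted(tags_by_ifd(sample).items()):
    subfile = tags.get("SubfileType", "(no SubfileType tag)")
    print(name, tags["ImageWidth"], "x", tags["ImageHeight"], subfile)
```

A regular expression is used here only because the excerpt is a fragment; for a complete ExifTool XML file a proper XML parser would be the safer choice.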

exiftool -ifd1:all= -m source.tif

compare -quiet -metric AE source.tif[0] fromjp2.tif[0] null:

compare -quiet -metric AE source.tif[1] fromjp2.tif[0] null:

exiftool -X source.tif > source.xml

 <IFD0:SubfileType>Full-resolution image</IFD0:SubfileType>
 <IFD0:ImageWidth>363</IFD0:ImageWidth>
 <IFD0:ImageHeight>7429</IFD0:ImageHeight>
  ::
 <IFD1:SubfileType>Full-resolution image</IFD1:SubfileType>
 <IFD1:ImageWidth>363</IFD1:ImageWidth>
 <IFD1:ImageHeight>382</IFD1:ImageHeight>
  ::
  ::
 <IFD5:SubfileType>Full-resolution image</IFD5:SubfileType>
 <IFD5:ImageWidth>363</IFD5:ImageWidth>
 <IFD5:ImageHeight>382</IFD5:ImageHeight>
  ::

 <IFD0:SubfileType>Single page of multi-page image</IFD0:SubfileType>
    ::
 <IFD0:Compression>Uncompressed</IFD0:Compression>
    ::
 <IFD1:SubfileType>Single page of multi-page image</IFD1:SubfileType>
    ::
 <IFD2:SubfileType>Single page of multi-page image</IFD2:SubfileType>
    ::
 <IFD2:Compression>JPEG 2000</IFD2:Compression>
    ::
 <IFD3:SubfileType>Single page of multi-page image</IFD3:SubfileType>
    ::
 <IFD3:Compression>JPEG</IFD3:Compression>
    ::
    ::

identify -quiet source.tif

source.tif[0] TIFF 9458x7429 9458x7429+0+0 8-bit sRGB 201.089MiB 0.000u 0:00.010
source.tif[1] TIFF 160x126 160x126+0+0 8-bit sRGB 0.000u 0:00.010

identify-im6.q16: compression not supported `MultipleFormats.tif' @ error/tiff.c/ReadTIFFImage/1433.

jhove -m TIFF-hul -i source.tif -h xml -o source-jhove.xml

<property>
<name>NewSubfileType</name>
<values arity="Scalar" type="Long">
  <value>0</value>
</values>
</property>

<property>
<name>NewSubfileType</name>
<values arity="List" type="String">
  <value>reduced-resolution image of another image in this file</value>
</values>
</property>

<property>
<name>NewSubfileType</name>
<values arity="Scalar" type="Long">
  <value>0</value>
</values>
</property>

<property>
<name>NewSubfileType</name>
<values arity="List" type="String">
  <value>single page of multi-page image</value>
</values>
</property>
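JHOVE reports NewSubfileType either as the raw number or as decoded labels. In the TIFF 6.0 specification the field is a set of bit flags, so the two forms can be related with a small decoder. This is a sketch: the first two labels are copied from the JHOVE output above, the wording of the third is an assumption, and `decode_new_subfile_type` is a hypothetical helper.

```python
# Bit flags of the TIFF NewSubfileType field (TIFF 6.0)
FLAGS = {
    0x1: "reduced-resolution image of another image in this file",
    0x2: "single page of multi-page image",
    0x4: "transparency mask for another image in this file",  # label wording assumed
}


def decode_new_subfile_type(value):
    """Return the labels of all flags that are set in value."""
    return [label for bit, label in FLAGS.items() if value & bit]


for value in (0, 1, 2):
    print(value, decode_new_subfile_type(value))
```

A value of 0 decodes to an empty list, which corresponds to the scalar `0` that JHOVE prints for the full-resolution image.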

<mix:Compression>
 <mix:compressionScheme>Unknown</mix:compressionScheme>
</mix:Compression>

jhove -m PDF-hul -h XML -i whatever.pdf -o whatever-jhove.xml

verapdf --off --addlogs --extract whatever.pdf > whatever-vera.xml

<taskResult type="PARSE" isExecuted="true" isSuccess="false">
    <duration start="1683818755295" finish="1683818755343">00:00:00.048</duration>
    <exceptionMessage>Exception: The PDF stream appears to be encrypted. caused by exception: Reader::init(...)encrypted pdf is not supported</exceptionMessage>
</taskResult>

<property>
 <name>Encryption</name>
 <values arity="List" type="Property">
 <property>
  <name>SecurityHandler</name>
  <values arity="Scalar" type="String">
   <value>Standard</value>
  </values>
 </property>
 <property>
  <name>EFF</name>
  <values arity="Scalar" type="String">
   <value>Standard</value>
  </values>
 </property>
 <property>
  <name>Algorithm</name>
  <values arity="Scalar" type="String">
   <value>Document-defined</value>
  </values>
 </property>
 <property>
  <name>KeyLength</name>
  <values arity="Scalar" type="Integer">
   <value>128</value>
  </values>
 </property>
 <property>
  <name>StandardSecurityHandler</name>
  <values arity="List" type="Property">
  <property>
   <name>UserAccess</name>
   <values arity="List" type="String">
    <value>Print</value>
    <value>Modify</value>
    <value>Extract</value>
    <value>Add/modify annotations/forms</value>
    <value>Fill interactive form fields</value>
    <value>Extract for accessibility</value>
    <value>Print high quality</value>
   </values>
  </property>
  <property>
   <name>Revision</name>
   <values arity="Scalar" type="Integer">
    <value>4</value>
   </values>
  </property>
  <property>
   <name>OwnerString</name>
   <values arity="Scalar" type="String">
    <value>0x03a59d10aae3b50f1a30c34fbb1be09ababfc1fb19f0d3491ee84c1752671918</value>
   </values>
  </property>
  <property>
   <name>UserString</name>
   <values arity="Scalar" type="String">
    <value>0xf240da93aa83bb9f336cf249812c4db700000000000000000000000000000000</value>
   </values>
  </property>
  </values>
 </property>
 </values>
</property>

<documentSecurity>
    <filter>Standard</filter>
    <version>4</version>
    <length>128</length>
    <ownerKey>5F7FA03AD5A40C66...</ownerKey>
    <userKey>A572ACDFF8B3DC74...</userKey>
    <encryptMetadata>true</encryptMetadata>
    <printAllowed>true</printAllowed>
    <printDegradedAllowed>true</printDegradedAllowed>
    <changesAllowed>true</changesAllowed>
    <modifyAnnotationsAllowed>true</modifyAnnotationsAllowed>
    <fillingSigningAllowed>true</fillingSigningAllowed>
    <documentAssemblyAllowed>false</documentAssemblyAllowed>
    <extractContentAllowed>false</extractContentAllowed>
    <extractAccessibilityAllowed>true</extractAccessibilityAllowed>
</documentSecurity>
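The `documentSecurity` element above can be reduced to a simple list of denied permissions. This is a minimal sketch using only the standard library; `permissions` is an illustrative helper, and the sample is an abridged copy of the veraPDF output shown above.

```python
import xml.etree.ElementTree as ET


def permissions(doc_security_xml):
    """Map each *Allowed child element to a boolean."""
    root = ET.fromstring(doc_security_xml)
    return {el.tag: el.text == "true"
            for el in root if el.tag.endswith("Allowed")}


sample = """
<documentSecurity>
    <filter>Standard</filter>
    <encryptMetadata>true</encryptMetadata>
    <printAllowed>true</printAllowed>
    <changesAllowed>true</changesAllowed>
    <documentAssemblyAllowed>false</documentAssemblyAllowed>
    <extractContentAllowed>false</extractContentAllowed>
    <extractAccessibilityAllowed>true</extractAccessibilityAllowed>
</documentSecurity>
"""

perms = permissions(sample)
denied = [name for name, allowed in perms.items() if not allowed]
print("Denied:", denied)
```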

<property>
    <name>UserAccess</name>
    <values arity="List" type="String">
        <value>Print</value>
        <value>Modify</value>
        <value>Add/modify annotations/forms</value>
        <value>Fill interactive form fields</value>
        <value>Extract for accessibility</value>
        <value>Print high quality</value>
    </values>
</property>

<property>
    <name>UserAccess</name>
    <values arity="List" type="String">
        <value>Modify</value>
        <value>Extract</value>
        <value>Add/modify annotations/forms</value>
        <value>Fill interactive form fields</value>
        <value>Extract for accessibility</value>
    </values>
</property>

<action type="Rendition">
    <location>Annotation</location>
</action>

<annotation id="annotIndir35">
    <subType>Screen</subType>
    <rectangle lly="360.647" llx="73.180" urx="393.180" ury="600.647"></rectangle>
    <width>320.000</width>
    <height>240.000</height>
    <resources>
        <xobject id="xobjIndir39"></xobject>
    </resources>
    <invisible>false</invisible>
    <hidden>false</hidden>
    <print>true</print>
    <noZoom>false</noZoom>
    <noRotate>false</noRotate>
    <noView>false</noView>
    <readOnly>false</readOnly>
    <locked>false</locked>
    <toggleNoView>false</toggleNoView>
    <lockedContents>false</lockedContents>
</annotation>

<property>
    <name>Annotation</name>
    <values arity="List" type="Property">
    <property>
        <name>Subtype</name>
        <values arity="Scalar" type="String">
        <value>Screen</value>
        </values>
    </property>
        ::
</property>

<action type="JavaScript">
<location>Document</location>
</action>

<font id="fntIndir30">
<type>TrueType</type>
<baseFont>TimesNewRomanPSMT</baseFont>
<firstChar>32</firstChar>
<lastChar>255</lastChar>
<encoding>WinAnsiEncoding</encoding>
<fontDescriptor>
     <subset>false</subset>
     <fontName>TimesNewRomanPSMT</fontName>
     <fontFamily>Times New Roman</fontFamily>
     <fontStretch>Normal</fontStretch>
     <fontWeight>400.000</fontWeight>
     <fixedPitch>false</fixedPitch>
     <serif>true</serif>
     <symbolic>false</symbolic>
     <script>false</script>
     <nonsymbolic>true</nonsymbolic>
     <italic>false</italic>
     <allCap>false</allCap>
     <smallCap>false</smallCap>
     <forceBold>false</forceBold>
     <fontBBox lly="-307.000" llx="-568.000" urx="2000.000" ury="1007.000"></fontBBox>
     <italicAngle>0.000</italicAngle>
     <ascent>891.000</ascent>
     <descent>-216.000</descent>
     <leading>0.000</leading>
     <capHeight>1000.000</capHeight>
     <xHeight>1000.000</xHeight>
     <stemV>82.000</stemV>
     <stemH>0.000</stemH>
     <averageWidth>0.000</averageWidth>
     <maxWidth>0.000</maxWidth>
     <missingWidth>0.000</missingWidth>
     <embedded>false</embedded>
</fontDescriptor>
</font>

<property>
    <name>FontFile2</name>
    <values arity="Scalar" type="Boolean">
    <value>true</value>
    </values>
</property>

<embeddedFiles>
    <embeddedFile id="file1">
    <fileName>KSBASE.WQ2</fileName>
    <description></description>
    <filter>FlateDecode</filter>
    <creationDate>2012-11-23T15:40:38.000+01:00</creationDate>
    <modDate>2012-11-19T12:35:10.000Z</modDate>
    <checkSum>)#x000002Aﬁ˛½�ﬂô#x000004‘SèŠ-¡</checkSum>
    <size>20668</size>
    </embeddedFile>
</embeddedFiles>

<annotation id="annotIndir26">
    <subType>FileAttachment</subType>
    <rectangle lly="562.840" llx="185.281" urx="199.281" ury="582.840"></rectangle>
    <width>14.000</width>
    <height>20.000</height>
    <contents>PF.WK1</contents>
    <annotationName>9a716f25-da75-4309-b918-483cfa9f6473</annotationName>
    <modifiedDate>D:20131024143706+02'00'</modifiedDate>
    <resources>
        <xobject id="xobjIndir29"></xobject>
    </resources>
    <color red="0.250000" green="0.333328" blue="1.000000"></color>
    <invisible>false</invisible>
    <hidden>false</hidden>
    <print>true</print>
    <noZoom>true</noZoom>
    <noRotate>true</noRotate>
    <noView>false</noView>
    <readOnly>false</readOnly>
    <locked>false</locked>
    <toggleNoView>false</toggleNoView>
    <lockedContents>false</lockedContents>
</annotation>

<property>
    <name>PageMode</name>
    <values arity="Scalar" type="String">
        <value>UseAttachments</value>
    </values>
</property>

<property>
<name>Annotation</name>
<values arity="List" type="Property">
<property>
     <name>Subtype</name>
     <values arity="Scalar" type="String">
     <value>FileAttachment</value>
     </values>
</property>
<property>

<action type="Launch">
     <location>Annotation</location>
</action>

<annotation id="annotIndir35">
     <subType>Link</subType>
     <rectangle lly="678.816" llx="70.860" urx="200.820" ury="694.584"></rectangle>
     <width>129.960</width>
     <height>15.768</height>
     <invisible>false</invisible>
     <hidden>false</hidden>
     <print>true</print>
     <noZoom>false</noZoom>
     <noRotate>false</noRotate>
     <noView>false</noView>
     <readOnly>false</readOnly>
     <locked>false</locked>
     <toggleNoView>false</toggleNoView>
     <lockedContents>false</lockedContents>
</annotation>

<property>
    <name>Annotation</name>
    <values arity="List" type="Property">
    <property>
        <name>Subtype</name>
        <values arity="Scalar" type="String">
            <value>Link</value>
        </values>
    </property>
        ::
</property>

 <action type="GoTo">
<location>Annotation</location>
</action>
<action type="URI">
<location>Annotation</location>
</action>
<action type="SubmitForm">
<location>Annotation</location>
</action>

<annotation id="annotIndir163">
<subType>Link</subType>
<rectangle lly="720.000" llx="11.000" urx="88.000" ury="760.000"></rectangle>
<width>77.000</width>
<height>40.000</height>
<invisible>false</invisible>
<hidden>false</hidden>
<print>false</print>
<noZoom>false</noZoom>
<noRotate>false</noRotate>
<noView>false</noView>
<readOnly>false</readOnly>
<locked>false</locked>
<toggleNoView>false</toggleNoView>
<lockedContents>false</lockedContents>
</annotation>

<taskResult type="PARSE" isExecuted="true" isSuccess="false">
    <duration start="1683818745078" finish="1683818745121">00:00:00.043</duration>
    <exceptionMessage>Exception: Couldn't parse stream caused by exception: Pages not found</exceptionMessage>
</taskResult>

<message offset="0" severity="error" id="PDF-HUL-140">Document catalog dictionary object number and trailer root ref number are inconsistent.</message>

<message offset="17342" severity="error" id="PDF-HUL-120">Annotation dictionary missing required type (S) entry</message>

<message offset="0" severity="error" id="PDF-HUL-144">Pages dictionary has no Type key or it has a null value.</message>

<taskResult type="PARSE" isExecuted="true" isSuccess="false">

<message offset="0" severity="error" id="PDF-HUL-137">No PDF header</message>

<message offset="0" severity="error" id="PDF-HUL-137">No PDF header</message>

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<featuresConfig>
    <enabledFeatures>
        <feature>ACTION</feature>
        <feature>ANNOTATION</feature>
        <feature>DOCUMENT_SECURITY</feature>
        <feature>EMBEDDED_FILE</feature>
        <feature>FONT</feature>
    </enabledFeatures>
</featuresConfig>

#! /usr/bin/env python3

import tika
from tika import parser

fileIn = "berk011veel01_01.epub"
fileOut = "berk011veel01_01.txt"

parsed = parser.from_file(fileIn)
content = parsed["content"]

with open(fileOut, 'w', encoding='utf-8') as fout:
    fout.write(content)

Charis SIL Bold Italic

::
::

Charis SIL Small Caps

#! /usr/bin/env python3

import tika
from tika import parser

fileIn = "berk011veel01_01.epub"
fileOut = "berk011veel01_01.txt"

parsed = parser.from_file(fileIn, service='text')
content = parsed["content"]

with open(fileOut, 'w', encoding='utf-8') as fout:
    fout.write(content)

[image: cover]

Aster Berkhof

Veel geluk, professor!

[image: DBNL]

java -jar ~/tika/tika-app-2.6.0.jar -t berk011veel01_01.epub  > berk011veel01_01-app.txt

curl -T berk011veel01_01.epub  http://localhost:9998/tika  --header "Accept: text/plain" > berk011veel01_01-server.txt

#! /usr/bin/env python3

import textract

fileIn = "berk011veel01_01.epub"
fileOut = "berk011veel01_01.txt"

content = textract.process(fileIn, encoding='utf-8').decode()

with open(fileOut, 'w', encoding='utf-8') as fout:
    fout.write(content)

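#! /usr/bin/env python3

from html.parser import HTMLParser
import ebooklib
from ebooklib import epub


class HTMLFilter(HTMLParser):
    """Collect only the text nodes of an HTML fragment (tags are dropped)."""

    def __init__(self):
        super().__init__()
        self.text = ""

    def handle_data(self, data):
        self.text += data


fileIn = "berk011veel01_01.epub"
fileOut = "berk011veel01_01.txt"

book = epub.read_epub(fileIn)
content = ""

# Walk all XHTML documents in the EPUB and keep their text content
for item in book.get_items():
    if item.get_type() == ebooklib.ITEM_DOCUMENT:
        bodyContent = item.get_body_content().decode()
        f = HTMLFilter()
        f.feed(bodyContent)
        content += f.text

with open(fileOut, 'w', encoding='utf-8') as fout:
    fout.write(content)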

Press Ctrl-C to interrupt
     ipos:   720896 B
     opos:   720896 B
non-tried:        0 B
  rescued:   737280 B
pct rescued:  100.00%, read errors:        0

pct rescued:   48.88%, read errors:     7372
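When many carriers are imaged in a batch, the ddrescue status lines above can be checked automatically. The sketch below assumes the status line has exactly the `pct rescued: ..., read errors: ...` form shown above (the layout differs between ddrescue versions); `parse_status` is a hypothetical helper.

```python
import re

STATUS = re.compile(r"pct rescued:\s*([\d.]+)%,\s*read errors:\s*(\d+)")


def parse_status(line):
    """Return (percentage rescued, number of read errors), or None."""
    match = STATUS.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


for line in ("pct rescued:  100.00%, read errors:        0",
             "pct rescued:   48.88%, read errors:     7372"):
    pct, errors = parse_status(line)
    verdict = "complete" if pct == 100.0 and errors == 0 else "needs attention"
    print(pct, errors, verdict)
```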

 "urls" : [
          {
            "url" : "https://t.co/y2gpEVvjAd",
            "expanded_url" : "https://youtu.be/C47ZCosJPAw",
            "display_url" : "youtu.be/C47ZCosJPAw",
            "indices" : [
              "246",
              "269"
            ]
          }
        ]
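The `urls` entities above pair each shortened `t.co` link with its original target, so the shortened links in the tweet text can be replaced. This is a minimal sketch; the tweet text is a made-up example and `expand_urls` is a hypothetical helper, while the entity values are copied from the excerpt above.

```python
import json


def expand_urls(text, url_entities):
    """Replace every shortened URL in text by its expanded form."""
    for entity in url_entities:
        text = text.replace(entity["url"], entity["expanded_url"])
    return text


entities = json.loads("""
[
  {
    "url": "https://t.co/y2gpEVvjAd",
    "expanded_url": "https://youtu.be/C47ZCosJPAw",
    "display_url": "youtu.be/C47ZCosJPAw",
    "indices": ["246", "269"]
  }
]
""")

# Hypothetical tweet text, for illustration only
tweet = "New video: https://t.co/y2gpEVvjAd"
print(expand_urls(tweet, entities))
```

Plain string replacement is used instead of the `indices` values, which are stored as strings in the archive and would have to be converted first.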

  {
    "following" : {
      "accountId" : "216697909",
      "userLink" : "https://twitter.com/intent/user?user_id=216697909"
    }
  }

 python parser.py

 python3 parser.py

pip install pywin32

import sys
import struct
import argparse
import win32api
import win32file
import winioctlcon

drive="A"
# Low-level device name of device assigned to logical drive
driveDevice =  "\\\\.\\" + drive + ":"

handle = win32file.CreateFile(driveDevice,
                              0,
                              win32file.FILE_SHARE_READ,
                              None,
                              win32file.OPEN_EXISTING,
                              0,
                              None)

diskGeometry = win32file.DeviceIoControl(handle,
                            winioctlcon.IOCTL_DISK_GET_DRIVE_GEOMETRY,
                            None,
                            24)

offset = 8
mediaTypeCode = struct.unpack("<I", diskGeometry[offset:offset + 4])[0]

getMediaTypes = win32file.DeviceIoControl(handle,
                            winioctlcon.IOCTL_STORAGE_GET_MEDIA_TYPES_EX,
                            None,
                            2048)

deviceCode = struct.unpack("<I", getMediaTypes[0:4])[0]
mediaInfoCount = struct.unpack("<I", getMediaTypes[4:8])[0]

# Start position in GET_MEDIA_TYPES structure
# (remember we already read two 4-byte integers from it, 
# hence the 8 byte start offset!)
offset = 8

# Loop over DEVICE_MEDIA_INFO structures
for _ in range(mediaInfoCount):
    if deviceCode in [31, 32]:
        # Tape device, mediaTypeCode is first item
        mediaTypeCode = struct.unpack("<I",
                          getMediaTypes[offset:offset + 4])[0]
        offset += 8
    else:
        # Not a tape device, so skip 8 byte cylinders value
        offset += 8
        mediaTypeCode = struct.unpack("<I",
                          getMediaTypes[offset:offset + 4])[0]

    # Skip to position of next DEVICE_MEDIA_INFO structure
    offset += 24
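The 24-byte buffer returned for `IOCTL_DISK_GET_DRIVE_GEOMETRY` can also be unpacked in one go, which yields the remaining geometry fields in addition to the media type code at offset 8. The sketch below runs on any platform because it works on a synthetic buffer rather than a real device. The field layout follows the Windows `DISK_GEOMETRY` structure; the `MEDIA_TYPES` table is only a small subset of the `MEDIA_TYPE` enumeration, and `parse_disk_geometry` is a hypothetical helper.

```python
import struct

# Small subset of the Windows MEDIA_TYPE enumeration
MEDIA_TYPES = {0: "Unknown", 2: "F3_1Pt44_512",
               11: "RemovableMedia", 12: "FixedMedia"}


def parse_disk_geometry(buf):
    """Unpack a DISK_GEOMETRY buffer: one 8-byte and four 4-byte integers."""
    cylinders, media, tracks, sectors, sector_size = struct.unpack(
        "<qIIII", buf[:24])
    return {
        "media_type": MEDIA_TYPES.get(media, "code %d" % media),
        "cylinders": cylinders,
        "tracks_per_cylinder": tracks,
        "sectors_per_track": sectors,
        "bytes_per_sector": sector_size,
        "capacity": cylinders * tracks * sectors * sector_size,
    }


# Synthetic buffer describing a 1.44 MB floppy (80 x 2 x 18 x 512 bytes)
floppy = struct.pack("<qIIII", 80, 2, 2, 18, 512)
print(parse_disk_geometry(floppy))
```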

python detectStorageMediaType.py A D E

Drive:                   A
Media type:              F3_1Pt44_512

Drive:                   D
Media type:              RemovableMedia
Device type:             FILE_DEVICE_CD_ROM
Supported media types:
                         CD_ROM
                         RemovableMedia

Drive:                   E
Media type:              RemovableMedia
Device type:             FILE_DEVICE_DISK
Supported media types:
                         RemovableMedia

pip install isolyzer --user

pip install isolyzer --upgrade --user

sudo sysctl vm.drop_caches=3

(time ~/kb/jp2totiff/mastertoaccess-grok.sh ./master-1 ./access-1-grok) 2> time-grok.txt

pdfcpu validate whatever.pdf

pdfcpu validate -m strict whatever.pdf

jhove -m PDF-hul -i whatever.pdf

qpdf --check --verbose whatever.pdf

gs -dNOPAUSE -dBATCH -sDEVICE=nullpage whatever.pdf

   **** Error:  An error occurred while reading an XREF table.
   **** The file has been damaged.  This may have been caused
   **** by a problem while converting or transfering the file.
   **** Ghostscript will attempt to recover the data.
   **** However, the output may be incorrect.
   **** Warning:  There are objects with matching object and generation
   **** numbers.  The output may be incorrect.
   **** Error:  Trailer dictionary not found.
                Output may be incorrect.
   No pages will be processed (FirstPage > LastPage).

   **** This file had errors that were repaired or ignored.
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

mutool info whatever.pdf 

exiftool corrupted.pdf

ExifTool Version Number         : 11.88
File Name                       : corrupted.pdf
Directory                       : .
File Size                       : 87 kB
File Modification Date/Time     : 2022:02:07 14:36:47+01:00
File Access Date/Time           : 2022:02:07 14:37:11+01:00
File Inode Change Date/Time     : 2022:02:07 14:36:59+01:00
File Permissions                : rw-rw-r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.3
Linearized                      : No
Warning                         : Invalid xref table

verapdf -l

  1a - PDF/A-1A validation profile
  1b - PDF/A-1B validation profile
  2a - PDF/A-2A validation profile
  2b - PDF/A-2B validation profile
  2u - PDF/A-2U validation profile
  3a - PDF/A-3A validation profile
  3b - PDF/A-3B validation profile
  3u - PDF/A-3U validation profile
  ua1 - PDF/UA-1 validation profile

verapdf -f 1a whatever.pdf > whatever-1a.xml

verapdf -f ua1 whatever.pdf > whatever-ua.xml

pdfinfo whatever.pdf

Creator:        PdfCompressor 3.1.32
Producer:       CVISION Technologies
CreationDate:   Thu Sep  2 07:52:56 2021 CEST
ModDate:        Thu Sep  2 07:53:20 2021 CEST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          1
Encrypted:      no
Page size:      439.2 x 637.92 pts
Page rot:       0
File size:      24728 bytes
Optimized:      yes
PDF version:    1.6
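The `pdfinfo` output above is a plain `Key: value` listing, so it is easy to turn into a dictionary for further checks. This is a sketch that assumes every line follows that form; `parse_pdfinfo` is a hypothetical helper and the sample is an abridged copy of the output above.

```python
def parse_pdfinfo(output):
    """Split each line at the first colon into a key and a value."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


sample = """\
Creator:        PdfCompressor 3.1.32
Producer:       CVISION Technologies
CreationDate:   Thu Sep  2 07:52:56 2021 CEST
Pages:          1
Encrypted:      no
PDF version:    1.6
"""

info = parse_pdfinfo(sample)
print(info["Pages"], info["Encrypted"], info["PDF version"])
```

Splitting at the first colon only keeps values that contain colons themselves, such as the time in `CreationDate`, intact.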

java -jar ~/tika/tika-app-2.1.0.jar -m whatever.pdf > whatever.txt

Content-Length: 24728
Content-Type: application/pdf
X-TIKA:Parsed-By: org.apache.tika.parser.DefaultParser
X-TIKA:Parsed-By: org.apache.tika.parser.pdf.PDFParser
access_permission:assemble_document: true
access_permission:can_modify: true
access_permission:can_print: true
access_permission:can_print_degraded: true
access_permission:extract_content: true
access_permission:extract_for_accessibility: true
access_permission:fill_in_form: true
access_permission:modify_annotations: true
dc:format: application/pdf; version=1.6
dcterms:created: 2021-09-02T05:52:56Z
dcterms:modified: 2021-09-02T05:53:20Z
pdf:PDFVersion: 1.6
pdf:charsPerPage: 0
pdf:docinfo:created: 2021-09-02T05:52:56Z
pdf:docinfo:creator_tool: PdfCompressor 3.1.32
pdf:docinfo:modified: 2021-09-02T05:53:20Z
pdf:docinfo:producer: CVISION Technologies
pdf:encrypted: false
pdf:hasMarkedContent: false
pdf:hasXFA: false
pdf:hasXMP: true
pdf:producer: CVISION Technologies
pdf:unmappedUnicodeCharsPerPage: 0
resourceName: whatever.pdf
xmp:CreateDate: 2021-09-02T07:52:56Z
xmp:CreatorTool: PdfCompressor 3.1.32
xmp:MetadataDate: 2021-09-02T07:53:20Z
xmp:ModifyDate: 2021-09-02T07:53:20Z
xmpMM:DocumentID: uuid:2ec84d65-f99d-49fe-9aac-bd0c1fff5e66
xmpTPg:NPages: 1

exiftool whatever.pdf

ExifTool Version Number         : 11.88
File Name                       : whatever.pdf
Directory                       : .
File Size                       : 24 kB
File Modification Date/Time     : 2021:09:02 12:23:32+02:00
File Access Date/Time           : 2022:02:07 15:04:11+01:00
File Inode Change Date/Time     : 2021:09:02 15:27:38+02:00
File Permissions                : rw-rw-r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.6
Linearized                      : Yes
Create Date                     : 2021:09:02 07:52:56+02:00
Creator                         : PdfCompressor 3.1.32
Modify Date                     : 2021:09:02 07:53:20+02:00
XMP Toolkit                     : Adobe XMP Core 5.6-c017 91.164464, 2020/06/15-10:20:05
Metadata Date                   : 2021:09:02 07:53:20+02:00
Creator Tool                    : PdfCompressor 3.1.32
Format                          : application/pdf
Document ID                     : uuid:2ec84d65-f99d-49fe-9aac-bd0c1fff5e66
Instance ID                     : uuid:28d0af59-9373-4358-88f2-c8c4db3915ed
Producer                        : CVISION Technologies
Page Count                      : 1

java -jar ~/tika/tika-app-2.1.0.jar -J digitally_signed_3D_Portfolio.pdf > whatever.json

verapdf --off --extract whatever.pdf > whatever.xml

verapdf --recurse --off --extract myDir > whatever.xml

<?xml version="1.0"?>

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
    <sch:pattern name="Disallow encrypt in trailer dictionary">
        <sch:rule context="/report/jobs/job/featuresReport/documentSecurity">
            <sch:assert test="not(encryptMetadata = 'true')">Encrypt in trailer dictionary</sch:assert>
        </sch:rule>
    </sch:pattern>    

    <sch:pattern name="Disallow other forms of encryption (e.g. open password)">
        <sch:rule context="/report/jobs/job/taskResult/exceptionMessage">
            <sch:assert test="not(contains(.,'encrypted'))">Encrypted document</sch:assert>
        </sch:rule>
    </sch:pattern>

</sch:schema>

verapdf --extract --policyfile policy.sch whatever.pdf > whatever.xml

<policyReport passedChecks="0" failedChecks="1" xmlns:vera="http://www.verapdf.org/MachineReadableReport">
    <passedChecks/>
    <failedChecks>
        <check status="failed" test="not(encryptMetadata = 'true')" location="/report/jobs/job/featuresReport/documentSecurity">
            <message>Encrypt in trailer dictionary</message>
        </check>
    </failedChecks>
</policyReport>
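In a batch workflow the policy report above can be checked without reading the XML by hand. The following sketch extracts the failed checks with the standard library; `failed_checks` is a hypothetical helper and the sample is the report fragment shown above.

```python
import xml.etree.ElementTree as ET


def failed_checks(report_xml):
    """Return (test expression, message) for every failed check."""
    root = ET.fromstring(report_xml)
    return [(check.get("test"), check.findtext("message"))
            for check in root.iter("check")
            if check.get("status") == "failed"]


sample = """
<policyReport passedChecks="0" failedChecks="1" xmlns:vera="http://www.verapdf.org/MachineReadableReport">
    <passedChecks/>
    <failedChecks>
        <check status="failed" test="not(encryptMetadata = 'true')" location="/report/jobs/job/featuresReport/documentSecurity">
            <message>Encrypt in trailer dictionary</message>
        </check>
    </failedChecks>
</policyReport>
"""

for test, message in failed_checks(sample):
    print(message, "<-", test)
```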

pdftotext whatever.pdf whatever.txt

java -jar ~/pdfbox/pdfbox-app-2.0.24.jar ExtractText whatever.pdf whatever.txt

java -jar ~/tika/tika-app-2.1.0.jar --text whatever.pdf > whatever.txt

java -jar ~/tika/tika-app-2.1.0.jar --text -i ./myPDFs/ -o ./tika-out/

java -jar ~/tika/tika-server-standard-2.1.0.jar

curl -T whatever.pdf http://localhost:9998/tika --header "Accept: text/plain" > whatever.txt

pdfx -v whatever.pdf > whatever.txt

pdfimages -png whatever.pdf whatever

pdftocairo -png whatever.pdf

pdfimages -list whatever.pdf

page   num  type   width height color comp bpc  enc interp
-----------------------------------------------------------
   1     0 image    1830  2658  gray    1   1  jbig2  no
   1     1 image     600   773  gray    1   8  jpx    no

page   object ID x-ppi y-ppi size ratio
----------------------------------------
   1       16  0   301   301   99B 0.0%
   1       17  0   300   300 17.9K 4.0%

img2pdf image1.jp2 image2.jp2 image3.jp2 -o whatever.pdf

comparepdf -ct -v=2 whatever.pdf wherever.pdf

comparepdf -ca -v=2 whatever.pdf wherever.pdf

gs -o whatever_repaired.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress whatever_corrupted.pdf

pdftocairo -pdf whatever_corrupted.pdf whatever_repaired.pdf

pdftk whatever_corrupted.pdf output whatever_repaired.pdf

gs -sDEVICE=pdfwrite \
   -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dQUIET -dBATCH \
   -sOutputFile=whatever_small.pdf whatever_large.pdf

convert -density 300 \
        -compress jpeg \
        -quality 70 \
         whatever_large.pdf whatever_small.pdf

java -jar ~/pdfbox/pdfbox-app-2.0.24.jar PDFDebugger whatever.pdf

java -jar ~/itext-rups/itext-rups-7.1.16.jar

mutool show whatever.pdf trailer

trailer
<<
/DecodeParms <<
  /Columns 3
  /Predictor 12
>>
/Filter /FlateDecode
/ID [ <500AB94E8F45C149808B2EEE98528B78> <431017E495216040A953126BB73D0CD4> ]
/Index [ 11 10 ]
/Info 10 0 R
/Length 47
/Prev 24426
/Root 12 0 R
/Size 21
/Type /XRef
/W [ 1 2 0 ]
>>

mutool show whatever.pdf xref

xref
0 21
00000: 0000000000 00000 f 
00001: 0000019994 00000 n 
00002: 0000020399 00000 n 
00003: 0000020534 00000 n 
::
etc
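The cross-reference listing above can be parsed to find the byte offset of each object in the file. This sketch assumes the `number: offset generation flag` layout shown above; `parse_xref` is a hypothetical helper.

```python
import re

ENTRY = re.compile(r"^(\d+):\s+(\d+)\s+(\d+)\s+([fn])", re.MULTILINE)


def parse_xref(listing):
    """Return one dict per cross-reference entry."""
    return [{"object": int(num), "offset": int(offset),
             "generation": int(gen), "in_use": flag == "n"}
            for num, offset, gen, flag in ENTRY.findall(listing)]


sample = """\
xref
0 21
00000: 0000000000 00000 f
00001: 0000019994 00000 n
00002: 0000020399 00000 n
00003: 0000020534 00000 n
"""

for entry in parse_xref(sample):
    if entry["in_use"]:
        print("object", entry["object"], "starts at byte", entry["offset"])
```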

mutool show whatever.pdf 12

12 0 obj
<<
/Metadata 4 0 R
/Pages 9 0 R
/Type /Catalog
>>
endobj

mutool show -b whatever.pdf 151 > whatever.dat

mutool show whatever.pdf grep | grep '/Filter/DCTDecode'

1 0 obj <</BitsPerComponent 8/ColorSpace/DeviceRGB/Filter/DCTDecode/Height 516/Length 76403/Subtype/Image/Type/XObject/Width 1226>> stream
18 0 obj <</BitsPerComponent 8/ColorSpace/DeviceRGB/Filter/DCTDecode/Height 676/Length 149186/Subtype/Image/Type/XObject/Width 1014>> stream
19 0 obj <</BitsPerComponent 8/ColorSpace/DeviceRGB/Filter/DCTDecode/Height 676/Length 142232/Subtype/Image/Type/XObject/Width 1014>> stream
24 0 obj <</BitsPerComponent 8/ColorSpace/DeviceRGB/Filter/DCTDecode/Height 676/Length 192073/Subtype/Image/Type/XObject/Width 1014>> stream
25 0 obj <</BitsPerComponent 8/ColorSpace/DeviceRGB/Filter/DCTDecode/Height 676/Length 141081/Subtype/Image/Type/XObject/Width 1014>> stream
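The lines matched by the `grep` above contain the dimensions and compressed size of each JPEG image stream. A small sketch to pull those numbers out, assuming the one-line dictionary layout shown above; `image_summary` is a hypothetical helper.

```python
import re

OBJ_HEADER = re.compile(r"^(\d+) (\d+) obj\b")


def image_summary(line):
    """Extract object number, width, height and stream length from one line."""
    header = OBJ_HEADER.match(line)
    if header is None:
        return None

    def number(key):
        match = re.search(r"/%s (\d+)" % key, line)
        return int(match.group(1)) if match else None

    return {"object": int(header.group(1)), "width": number("Width"),
            "height": number("Height"), "length": number("Length")}


line = ("1 0 obj <</BitsPerComponent 8/ColorSpace/DeviceRGB/Filter/DCTDecode"
        "/Height 516/Length 76403/Subtype/Image/Type/XObject/Width 1226>> stream")
print(image_summary(line))
```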

adb shell pm list packages > packages.txt

adb shell pm path com.Triplee.TripleeSocial

package:/data/app/com.Triplee.TripleeSocial-r8iVFUp1MOSAc6LmHA1MDQ==/base.apk

adb pull /data/app/com.Triplee.TripleeSocial-r8iVFUp1MOSAc6LmHA1MDQ==/base.apk

gplaycli -d com.Triplee.TripleeSocial

<mime-type type="application/vnd.android.package-archive">
  <sub-class-of type="application/java-archive"/>
  <glob pattern="*.apk"/>
</mime-type>

androguard axml com.Triplee.TripleeSocial.apk -o arize-android.xml

<uses-sdk android:minSdkVersion="24" android:targetSdkVersion="29"/>

<uses-feature android:name="android.hardware.camera" android:required="true"/>
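The SDK levels and required hardware features can be read from the decoded manifest with the standard library. This is a sketch: the surrounding `<manifest>` element in the sample is a reconstructed wrapper, not taken from the actual file, and `sdk_and_features` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Attributes in the manifest live in the Android XML namespace
ANDROID = "{http://schemas.android.com/apk/res/android}"


def sdk_and_features(manifest_xml):
    """Return minimum/target SDK levels and the required features."""
    root = ET.fromstring(manifest_xml)
    sdk = root.find("uses-sdk")
    features = [f.get(ANDROID + "name") for f in root.iter("uses-feature")
                if f.get(ANDROID + "required", "true") == "true"]
    return {"min_sdk": int(sdk.get(ANDROID + "minSdkVersion")),
            "target_sdk": int(sdk.get(ANDROID + "targetSdkVersion")),
            "required_features": features}


# The <manifest> wrapper is an assumption made for this example
sample = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-sdk android:minSdkVersion="24" android:targetSdkVersion="29"/>
  <uses-feature android:name="android.hardware.camera" android:required="true"/>
</manifest>
"""

print(sdk_and_features(sample))
```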

<mime-type type="application/x-itunes-ipa">
  <sub-class-of type="application/zip"/>
  <_comment>Apple iOS IPA AppStore file</_comment>
  <glob pattern="*.ipa"/>
</mime-type>

<key>MinimumOSVersion</key>
<string>7.0</string>
<key>UIDeviceFamily</key>
<array>
  <integer>1</integer>
  <integer>2</integer>
</array>
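The same properties can be read from an `Info.plist` with Python's `plistlib`, which handles both the XML and the binary property list format. In the sketch below the `<plist>` and `<dict>` wrapper is added to make the fragment above a complete document; `DEVICE_FAMILIES` and `describe_requirements` are illustrative names.

```python
import plistlib

# Meaning of the UIDeviceFamily values used above
DEVICE_FAMILIES = {1: "iPhone/iPod touch", 2: "iPad"}


def describe_requirements(plist_bytes):
    """Return the minimum iOS version and the supported device families."""
    info = plistlib.loads(plist_bytes)
    families = [DEVICE_FAMILIES.get(code, "unknown (%d)" % code)
                for code in info.get("UIDeviceFamily", [])]
    return info.get("MinimumOSVersion"), families


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
<key>MinimumOSVersion</key>
<string>7.0</string>
<key>UIDeviceFamily</key>
<array>
  <integer>1</integer>
  <integer>2</integer>
</array>
</dict>
</plist>
"""

print(describe_requirements(sample))
```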

adb connect 127.0.0.1

connected to 127.0.0.1:5555

adb install com.Triplee.TripleeSocial.apk

qemu-img create -f qcow2 androidx86_9_hda.img 8G

qemu-system-x86_64 \
-enable-kvm \
-m 2048 \
-smp 2 \
-cpu host \
-device ES1370 -device virtio-mouse-pci -device virtio-keyboard-pci \
-serial mon:stdio \
-boot menu=on \
-net nic \
-net user,hostfwd=tcp::4444-:5555 \
-hda androidx86_9_hda.img \
-cdrom android-x86_64-9.0-r2.iso

qemu-system-x86_64 \
-enable-kvm \
-m 2048 \
-smp 2 \
-cpu host \
-device ES1370 -device virtio-mouse-pci -device virtio-keyboard-pci \
-serial mon:stdio \
-boot menu=on \
-net nic \
-net user,hostfwd=tcp::4444-:5555 \
-hda androidx86_9_hda.img

adb connect 127.0.0.1:4444

nmcli con add type bridge ifname anbox0 -- connection.id \
anbox-net ipv4.method shared ipv4.addresses 192.168.250.1/24

adb: failed to install com.Triplee.TripleeSocial.apk:
Failure [INSTALL_FAILED_NO_MATCHING_ABIS: Failed to extract native libraries, res=-113]

<A HREF="javascript:openit('tvplus.mov')">

https://ziklies.home.xs4all.nl/
https://ziklies.home.xs4all.nl/toilet.html
https://ziklies.home.xs4all.nl/e-toilet.html
https://ziklies.home.xs4all.nl/atelier/
https://ziklies.home.xs4all.nl/bad/
https://ziklies.home.xs4all.nl/cas/
https://ziklies.home.xs4all.nl/gambia/
https://ziklies.home.xs4all.nl/keuken/
https://ziklies.home.xs4all.nl/slaapk/
https://ziklies.home.xs4all.nl/slaapk/gspot/
https://ziklies.home.xs4all.nl/toilet/
https://ziklies.home.xs4all.nl/woonk/
https://ziklies.home.xs4all.nl/woonk/agenda/
https://ziklies.home.xs4all.nl/zolder/

scrape-seeds.sh seed-urls.txt

diff -r ./wget-original/ziklies.home.xs4all.nl/ ./wget-improved/ziklies.home.xs4all.nl/ > diff-site-toilet.txt

<B><A HREF="http://imagine.xs4all.nl/ziklies/"> Go to the toilet</a></B>

<B><A HREF="e-toilet.html"> Go to the toilet</a></B>

<A HREF="/cgi-bin/imagemap/~ziklies/deurtje1.map"><img src="deurtje1.gif" Border=0 ISMAP></A>

default http://www.xs4all.nl/~ziklies/start.html
poly http://www.xs4all.nl/~ziklies/start.html 0,56 76,40 91,43 89,67 76,62 6,77 0,60 1,55
poly http://www.xs4all.nl/~ziklies/e-start.html 53,72 80,76 81,81 91,85 89,107 76,106 69,137 42,130 51,77

<img src="deurtje1.gif" usemap="#deurtje1Map" alt="deurtje 1" border="0">
<map name="deurtje1Map">
    <area shape="poly" coords="0,56 76,40 91,43 89,67 76,62 6,77 0,60 1,55" href="http://www.xs4all.nl/~ziklies/start.html">
    <area shape="poly" coords="53,72 80,76 81,81 91,85 89,107 76,106 69,137 42,130 51,77" href="http://www.xs4all.nl/~ziklies/e-start.html">
    <area shape="default" href="http://www.xs4all.nl/~ziklies/start.html">
</map>
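Converting a server-side map file into client-side `<area>` elements, as done by hand above, can be scripted when a site contains many image maps. This is a minimal sketch that assumes every line has the form `shape url coordinates...`; `map_to_areas` is a hypothetical helper, and the coordinates are passed through unchanged, as in the example above.

```python
from html import escape


def map_to_areas(map_text):
    """Turn server-side image map lines into <area> elements."""
    areas = []
    default = None
    for line in map_text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        shape, href, coords = parts[0], escape(parts[1]), " ".join(parts[2:])
        if shape == "default":
            default = '<area shape="default" href="%s">' % href
        else:
            areas.append('<area shape="%s" coords="%s" href="%s">'
                         % (shape, coords, href))
    if default is not None:
        areas.append(default)  # fallback goes last, so specific shapes win
    return areas


sample = """\
default http://www.xs4all.nl/~ziklies/start.html
poly http://www.xs4all.nl/~ziklies/start.html 0,56 76,40 91,43 89,67 76,62 6,77 0,60 1,55
poly http://www.xs4all.nl/~ziklies/e-start.html 53,72 80,76 81,81 91,85 89,107 76,106 69,137 42,130 51,77
"""

print("\n".join(map_to_areas(sample)))
```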

<FORM METHOD="POST"
ACTION="/cgi-bin/barbie1.cgi">

<DL>
<DD><b> First: wich pants or skirt? </b><br>
<INPUT TYPE="radio" NAME="onder" VALUE="1a" > A1
<INPUT TYPE="radio" NAME="onder" VALUE="2a" > A2
 ::
 ::
<INPUT TYPE="radio" NAME="onder" VALUE="7a" > A7<br>
</DL>
<br>
<DL>
<DD><b>Second: wich top, sweater or blouse fits? </b><br>
<INPUT TYPE="radio" NAME="midden" VALUE="1b" > B1
<INPUT TYPE="radio" NAME="midden" VALUE="2b" > B2
 ::
 ::
<INPUT TYPE="radio" NAME="midden" VALUE="7b" > B7<br>
</DL>
<br>
<DL>
<DD><b>Last: wich earrings and what to do with her hair? </b><br>
<INPUT TYPE="radio" NAME="top" VALUE="1c" > C1
<INPUT TYPE="radio" NAME="top" VALUE="2c" > C2
 ::
 ::
<INPUT TYPE="radio" NAME="top" VALUE="7c" > C7<br>
</DL><br>
<center>
<INPUT TYPE="submit" VALUE="have a look into the mirror">
<br></center>
</FORM>

from warcio.capture_http import capture_http
import requests

url = 'http://ziklies.home.xs4all.nl'
warcFile = 'ziklies.home.xs4all.nl.gz'

with capture_http(warcFile):
    requests.post(url, data={'onder': '1a', 'midden': '1b', 'top': '1c'})

unzip kb-4d8a2f9a-5e0b-11ea-9376-40b0341fbf5f.zip

Archive:  kb-4d8a2f9a-5e0b-11ea-9376-40b0341fbf5f.zip
warning [kb-4d8a2f9a-5e0b-11ea-9376-40b0341fbf5f.zip]:  1859568605 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [kb-4d8a2f9a-5e0b-11ea-9376-40b0341fbf5f.zip]:  start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)

7z x kb-4d8a2f9a-5e0b-11ea-9376-40b0341fbf5f.zip

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (506E3),ASM,AES-NI)

Scanning the drive for archives:
1 file, 6154566547 bytes (5870 MiB)

Extracting archive: kb-4d8a2f9a-5e0b-11ea-9376-40b0341fbf5f.zip

ERRORS:
Headers Error
Unconfirmed start of archive

WARNINGS:
There are data after the end of archive

--
Path = kb-4d8a2f9a-5e0b-11ea-9376-40b0341fbf5f.zip
Type = zip
ERRORS:
Headers Error
Unconfirmed start of archive
WARNINGS:
There are data after the end of archive
Physical Size = 4330182775
Tail Size = 1824383772

ERROR: CRC Failed : kb-4d8a2f9a-5e0b-11ea-9376-40b0341fbf5f/afd3f61a-5e0e-11ea-ab97-40b0341fbf5f/08.wav

Sub items Errors: 1

Archives with Errors: 1

Warnings: 1

Open Errors: 1

Sub items Errors: 1

zip -T kb-4d8a2f9a-5e0b-11ea-9376-40b0341fbf5f.zip

Could not find:
  kb-4d8a2f9a-5e0b-11ea-9376-40b0341fbf5f.z01

Hit c      (change path to where this split file is)
    q      (abort archive - quit)
 or ENTER  (try reading this split again): 

import zipfile

# Open the ZIP file; the context manager closes it again on exit
with zipfile.ZipFile("kb-4d8a2f9a-5e0b-11ea-9376-40b0341fbf5f.zip",
                     mode='r') as myZip:
    # Read all files in the archive and check their CRCs and file headers.
    # testzip() returns the name of the first bad file, or None.
    print(myZip.testzip())

zipfile.BadZipFile: zipfiles that span multiple disks are not supported

if diskno != 0 or disks > 1:
  raise BadZipFile("zipfiles that span multiple disks are not supported")

zip64 end of central dir locator 
      signature                       4 bytes  (0x07064b50)
      number of the disk with the
      start of the zip64 end of 
      central directory               4 bytes
      relative offset of the zip64
      end of central directory record 8 bytes
      total number of disks           4 bytes

if diskno != 0 or disks != 1:
  raise BadZipFile("zipfiles that span multiple disks are not supported")
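To see how the two conditions above differ, the 20-byte locator can be unpacked with the struct module. This is a sketch only: the helper names are hypothetical, and the example record is synthetic (its offset value is made up), not taken from a real file.

```python
import struct

# Layout of the zip64 end of central dir locator, little-endian:
# signature (4 bytes), disk number (4), relative offset (8), total disks (4)
LOCATOR_FORMAT = "<4sLQL"
LOCATOR_SIZE = struct.calcsize(LOCATOR_FORMAT)
LOCATOR_SIGNATURE = b"PK\x06\x07"  # 0x07064b50 stored little-endian


def parse_zip64_locator(data):
    """Unpack a locator record into a dictionary (hypothetical helper)."""
    sig, diskno, offset, disks = struct.unpack(LOCATOR_FORMAT, data[:LOCATOR_SIZE])
    if sig != LOCATOR_SIGNATURE:
        raise ValueError("not a zip64 end of central dir locator")
    return {"diskno": diskno, "offset": offset, "disks": disks}


def rejected_by_greater_than(diskno, disks):
    # Condition written as: diskno != 0 or disks > 1
    return diskno != 0 or disks > 1


def rejected_by_not_equal(diskno, disks):
    # Condition written as: diskno != 0 or disks != 1
    return diskno != 0 or disks != 1


# Synthetic record: disk number 0, made-up offset 1024, total number of disks 0
example = struct.pack(LOCATOR_FORMAT, LOCATOR_SIGNATURE, 0, 1024, 0)
fields = parse_zip64_locator(example)
print(fields)
print(rejected_by_greater_than(fields["diskno"], fields["disks"]))
print(rejected_by_not_equal(fields["diskno"], fields["disks"]))
```

A record whose "total number of disks" field is 0 passes the `disks > 1` test but is rejected by the `disks != 1` test.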

fix-onedrive-zip onedrive-zip-test-zeros.zip

kruisbandkliniek.nl
anima-communicatie.nl
nieuwebrabander.nl
isisschuurman.nl
wamail.nl
kopenzonderklussen.nl
muziekles-laren.nl

Errors in file /home/johan/test-geolocatDomains/out_NL_geo_Johan.csv
594 records discarded due to missing geometry definitions

dd if=/dev/nst0 of=file0001.dd bs=16384

dd if=/dev/nst0 of=/dev/null bs=512 count=1

mt -f /dev/nst0 bsr 1

mt -f /dev/nst0 fsr 1

tar -xvf /path/to/file0001.dd > /dev/null

{
    "acquisitionEnd": "2019-04-09T13:10:40.503984+02:00",
    "acquisitionStart": "2019-04-09T13:09:59.835833+02:00",
    "autoRetry": false,
    "blockDevice": "/dev/sdc",
    "checksumType": "SHA-512",
    "checksums": {
        "ks.img": "79a17d3fa536b8fa750257b01d05124dadb888f1171e9ca5cc3398a2c16de81b1687b52c70135b966409a723ef5f3960536a6e994847c5ebe7d5eaffefa62dc7"
    },
    "description": "KS metingen origineel",
    "diskimgrVersion": "0.1.0b3",
    "extension": "img",
    "identifier": "cc630cda-5ab7-11e9-bc82-dc4a3e5f53bf",
    "interruptedFlag": false,
    "maxRetries": "4",
    "notes": "",
    "prefix": "ks",
    "readCommandLine": "dd if=/dev/sdc of=/home/johan/test/6/ks.img bs=512 conv=notrunc",
    "readMethod": "dd",
    "readMethodVersion": "dd (coreutils) 8.25",
    "rescueDirectDiscMode": false,
    "successFlag": true
}
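The "checksums" entry in this metadata can be used to re-verify an image later on. A minimal Python sketch, assuming the metadata is saved as a JSON file in the same directory as the image; the function name `verify_checksums` and that file layout are assumptions, not part of diskimgr.

```python
import hashlib
import json
import os


def verify_checksums(metadata_file):
    """Recompute the checksum of each listed file and compare it with the recorded value."""
    with open(metadata_file, encoding="utf-8") as f:
        metadata = json.load(f)
    # Map a name such as "SHA-512" to the hashlib name "sha512"
    algorithm = metadata["checksumType"].replace("-", "").lower()
    # Assumption: image files sit next to the metadata file
    base_dir = os.path.dirname(os.path.abspath(metadata_file))
    results = {}
    for name, expected in metadata["checksums"].items():
        digest = hashlib.new(algorithm)
        with open(os.path.join(base_dir, name), "rb") as image:
            # Read in 1 MiB chunks so large images do not have to fit in memory
            for chunk in iter(lambda: image.read(1024 * 1024), b""):
                digest.update(chunk)
        results[name] = digest.hexdigest() == expected
    return results
```

The function returns a dictionary that maps each file name to True (checksum matches) or False.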

umount /dev/sr0

readom retries=4 dev=/dev/sr0 f=disc.iso

ddrescue -b 2048 -r4 -v /dev/sr0 disc.iso disc.map

ddrescue -d -b 2048 -r4 -v /dev/sr0 disc.iso disc.map

ddrescue -b 2048 -r4 -v /dev/sr1 disc.iso disc.map

isolyzer disc.iso

{
    "acquisitionEnd": "2019-03-22T13:38:51.969934+01:00",
    "acquisitionStart": "2019-03-22T13:37:43.060185+01:00",
    "autoRetry": false,
    "checksumType": "SHA-512",
    "checksums": {
        "backupjrc.iso": "714426e7f965e4f6b33571ae4d60d945928dbee8c06f74225a138eaaa4ea4b2b7442620227e94920a0bc7ac17a6c7096fb310746cfff2c04b5c3e778ae8998ce"
    },
    "description": "Backup JRC 31-03-2000",
    "extension": "iso",
    "identifier": "e23f9158-4c9e-11e9-bbfc-dc4a3e413173",
    "imageTruncated": false,
    "interruptedFlag": false,
    "isolyzerSuccess": true,
    "maxRetries": "4",
    "notes": "Outer edge of CD shows signs of corrosion",
    "omDevice": "/dev/sr1",
    "omimgrVersion": "0.1.0",
    "prefix": "backupjrc",
    "readCommandLine": "readom retries=4 dev=/dev/sr1 f=/home/johan/test/backupjrc.iso",
    "readMethod": "readom",
    "readMethodVersion": "readom 1.1.11 (Linux)",
    "rescueDirectDiscMode": false,
    "successFlag": true
}

sudo date --set="2004-01-23 21:03:09.000"

wget --mirror \
    --page-requisites \
    --adjust-extension \
    --warc-file="nl-menu" \
    --warc-cdx \
    --output-file="nl-menu.log" \
    http://www.nl-menu.nl/

wb-manager init my-web-archive

wb-manager add my-web-archive ~/NL-menu/warc-wget/nl-menu.warc.gz

wayback

find /var/www/www.nl-menu.nl -type f \
    | sed -e 's/\/var\/www\//http:\/\//g' > urls.txt
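The path-to-URL mapping that the sed expression performs can also be sketched in Python. The function name `path_to_url` is hypothetical. Unlike the sed command with its g flag, this sketch only strips the leading web root and leaves any later occurrence of the string alone.

```python
def path_to_url(path, web_root="/var/www/"):
    """Turn a file path below the web root into the corresponding http URL."""
    if not path.startswith(web_root):
        raise ValueError("path is not below the web root: " + path)
    return "http://" + path[len(web_root):]


print(path_to_url("/var/www/www.nl-menu.nl/nlmenu.nl/index.html"))
```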

wget --page-requisites \
    --warc-file="nl-menu" \
    --warc-cdx \
    --output-file="nl-menu.log" \
    --input-file=urls.txt

/var/www/www.nl-menu.nl/
├── nlmenu.en
│   ├── index.html
│   ├── ...
│   ├── ...
│   └── ...
└── nlmenu.nl
    ├── index.html
    ├── ...
    ├── ...
    └── ...

echo "http://www.nl-menu.nl/" > urls.txt

echo "http://www.nl-menu.nl/nlmenu.nl/" >> urls.txt
echo "http://www.nl-menu.nl/nlmenu.en/" >> urls.txt

find /var/www/www.nl-menu.nl -type f \
    | sed -e 's/\/var\/www\//http:\/\//g' >> urls.txt

wget --page-requisites \
    --warc-file="nl-menu" \
    --warc-cdx \
    --output-file="nl-menu.log" \
    --input-file=urls.txt

warcdump nl-menu.warc.gz > nl-menu-dump.txt

archive record at nl-menu.warc.gz:75708014
Headers:
    WARC-Type:request
    WARC-Target-URI:<http://www.nl-menu.nl/nlmenu.en/index.html>
    Content-Type:application/http;msgtype=request
    WARC-Date:2004-01-23T20:04:54Z
    WARC-Record-ID:<urn:uuid:4b05df6b-d408-4f2f-8efb-bbd38098cbdb>
    WARC-IP-Address:127.0.0.1
    WARC-Warcinfo-ID:<urn:uuid:7dacf508-ab81-4244-ae9c-ce04a1e18123>
    WARC-Block-Digest:sha1:MJEIKQUIEOMARJPUEYXZFT4TTTUKX2IE
    Content-Length:159
Content Headers:
    Content-Type : application/http;msgtype=request
    Content-Length : 159
Content:
    GET /nlmenu\x2Een/index\x2Ehtml HTTP/1\x2E1\xD\xAUser\x2DAgent\x3A Wget/1\x2E19 \x28linux\x2Dgnu\x29\xD\xAAccept\x3A \x2A/\x2A\xD\xAAccept\x2DEncoding\x3A identity\xD\xAHost\x3A www\x2Enl\x2Dmenu\x2Enl\xD\xAConnection\x3A Keep\x2DAlive\xD\xA\xD\xA
    ...

WARC-IP-Address:127.0.0.1

ddrescue -d -b 2048 -r4 -v /dev/sr0 NL-menu-ddrescue.iso NL-menu-ddrescue.log

pip install isolyzer --user

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<featuresConfig>
    <enabledFeatures>
        <feature>ANNOTATION</feature>
        <feature>DOCUMENT_SECURITY</feature>
        <feature>EMBEDDED_FILE</feature>
        <feature>FONT</feature>
        <feature>INFORMATION_DICTIONARY</feature>
    </enabledFeatures>
</featuresConfig>

verapdf -x --policyfile demo-policy.sch ~/myPdfs/* > myPdfsOut.xml

~/pdfPolicyVeraPDF/policyValidate.sh /home/johan/pdfAcrobatEngineering/fonts /home/johan/pdfPolicyVeraPDF/schemas/demo-policy.sch fonts

<tests>
    <containsISO9660Signature>True</containsISO9660Signature>
    <containsApplePartitionMap>True</containsApplePartitionMap>
    <containsAppleHFSHeader>False</containsAppleHFSHeader>
    <containsAppleMasterDirectoryBlock>False</containsAppleMasterDirectoryBlock>
    <parsedAppleZeroBlock>True</parsedAppleZeroBlock>
    <parsedPrimaryVolumeDescriptor>True</parsedPrimaryVolumeDescriptor>
    <sizeExpected>609912832</sizeExpected>
    <sizeActual>554373120</sizeActual>
    <sizeDifference>-55539712</sizeDifference>
    <sizeAsExpected>False</sizeAsExpected>
    <smallerThanExpected>True</smallerThanExpected>
</tests>

cd-info /dev/sr0 > leesleeuw_cdinfo.txt

CD-ROM Track List (1 - 3)
    #: MSF       LSN    Type   Green? Copy? Channels Premphasis?
    1: 00:02:00  000000 audio  false  no    2        no
    2: 01:25:24  006249 audio  false  no    2        no
    3: 06:05:46  027271 data   false  no   
170: 66:14:61  297961 leadout (668 MB raw, 668 MB formatted)
Media Catalog Number (MCN): 0000000000000
TRACK  1 ISRC: 000000000000
TRACK  2 ISRC: 000000000000
TRACK  3 ISRC: 000000000000
Last CD Session LSN: 27271

session #2 starts at track  3, LSN: 27271, ISO 9660 blocks: 297809
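The MSF and LSN columns in the track list are related by simple arithmetic: a CD has 75 frames (sectors) per second, and LSN 0 sits at MSF 00:02:00, an offset of 150 frames. A small Python sketch (the function name is hypothetical) that reproduces the LSN values listed above:

```python
FRAMES_PER_SECOND = 75
PREGAP_FRAMES = 150  # LSN 0 corresponds to MSF 00:02:00


def msf_to_lsn(msf):
    """Convert a 'minutes:seconds:frames' string to a logical sector number."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames - PREGAP_FRAMES


for msf in ("00:02:00", "01:25:24", "06:05:46", "66:14:61"):
    print(msf, msf_to_lsn(msf))
```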

isolyzer leesleeuw.iso --offset 27271

<tests>
    <containsISO9660Signature>True</containsISO9660Signature>
    <containsApplePartitionMap>True</containsApplePartitionMap>
    <containsAppleHFSHeader>False</containsAppleHFSHeader>
    <containsAppleMasterDirectoryBlock>False</containsAppleMasterDirectoryBlock>
    <parsedAppleZeroBlock>True</parsedAppleZeroBlock>
    <parsedPrimaryVolumeDescriptor>True</parsedPrimaryVolumeDescriptor>
    <sizeExpected>554061824</sizeExpected>
    <sizeActual>554373120</sizeActual>
    <sizeDifference>311296</sizeDifference>
    <sizeAsExpected>False</sizeAsExpected>
    <smallerThanExpected>False</smallerThanExpected>
</tests>
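The two isolyzer results for this image are arithmetically consistent with each other. The sketch below reproduces the reported numbers, assuming a sector size of 2048 bytes and assuming the offset is simply subtracted from the expected size; whether isolyzer computes it exactly this way internally is an assumption.

```python
SECTOR_SIZE = 2048  # bytes per sector, assumed


def size_check(size_expected, size_actual, offset_sectors=0):
    """Return (adjusted expected size, actual minus expected) in bytes."""
    adjusted = size_expected - offset_sectors * SECTOR_SIZE
    return adjusted, size_actual - adjusted


# Without an offset: expected 609912832 bytes, actual 554373120 bytes
print(size_check(609912832, 554373120))
# With the session offset of 27271 sectors taken into account
print(size_check(609912832, 554373120, offset_sectors=27271))
```

With the offset applied, the remaining difference of 311296 bytes equals 152 sectors of 2048 bytes.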

dd if=/dev/zero bs=2K count=1 seek=27270 of=leesleeuw_tmp.iso

cat leesleeuw.iso >>leesleeuw_tmp.iso

sudo mount -t iso9660 -o loop,sbsector=27270 leesleeuw_tmp.iso /media/johan

xorrisofs -r -J -o leesleeuw_new.iso /media/johan

<tests>
    <containsISO9660Signature>True</containsISO9660Signature>
    <containsApplePartitionMap>False</containsApplePartitionMap>
    <containsAppleHFSHeader>False</containsAppleHFSHeader>
    <containsAppleMasterDirectoryBlock>False</containsAppleMasterDirectoryBlock>
    <parsedPrimaryVolumeDescriptor>True</parsedPrimaryVolumeDescriptor>
    <sizeExpected>508913664</sizeExpected>
    <sizeActual>508913664</sizeActual>
    <sizeDifference>0</sizeDifference>
    <sizeAsExpected>True</sizeAsExpected>
    <smallerThanExpected>False</smallerThanExpected>
</tests>

md5sum myimage.iso
md5sum /dev/sr0

Root at extent 17, 2048 bytes
[0,0]
No errors found

<tests>
    <containsISO9660Signature>True</containsISO9660Signature>
    <containsApplePartitionMap>False</containsApplePartitionMap>
    <containsAppleHFSHeader>False</containsAppleHFSHeader>
    <containsAppleMasterDirectoryBlock>False</containsAppleMasterDirectoryBlock>
    <parsedPrimaryVolumeDescriptor>True</parsedPrimaryVolumeDescriptor>
    <sizeExpected>358400</sizeExpected>
    <sizeActual>358400</sizeActual>
    <sizeDifference>0</sizeDifference>
    <sizeAsExpected>True</sizeAsExpected>
    <smallerThanExpected>False</smallerThanExpected>
</tests>

<tests>
    <containsISO9660Signature>True</containsISO9660Signature>
    <containsApplePartitionMap>False</containsApplePartitionMap>
    <containsAppleHFSHeader>False</containsAppleHFSHeader>
    <containsAppleMasterDirectoryBlock>False</containsAppleMasterDirectoryBlock>
    <parsedPrimaryVolumeDescriptor>True</parsedPrimaryVolumeDescriptor>
    <sizeExpected>358400</sizeExpected>
    <sizeActual>49157</sizeActual>
    <sizeDifference>-309243</sizeDifference>
    <sizeAsExpected>False</sizeAsExpected>
    <smallerThanExpected>True</smallerThanExpected>
</tests>

jhove -m WAVE-hul foo.wav

shntool info foo.wav

ffmpeg -v error -i foo.wav -f null -

mediainfo foo.wav

Possible problems:
    File contains ID3v2 tag:    no
    Data chunk block-aligned:   yes
    Inconsistent header:        no
    File probably truncated:    no
    Junk appended to file:      no
    Odd data size has pad byte: n/a

flac -t foo.flac

=A2 + SQRT(A2)

52.06077146

52.0607714623856

52.06077146238560

52.0607714623856

CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (6-2)
CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (24-1)
CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (43-1)
::

<p id="p01" class="para">This is an <em>EPUB</em> 2 file that uses a fixed layout.</p>
<p id="p02" class="para">This is achieved by placing each line inside a</p>
<p id="p03" class="para"><em>paragraph</em> element. Each <em>paragraph</em> element</p>
<p id="p04" class="para">is placed at a fixed position on the page. Even</p>
<p id="p05" class="para">though this file is valid <em>EPUB</em>, this is a pretty</p>
<p id="p06" class="para"> terrible idea, because in most readers the text</p>
<p id="p07" class="para">will not reflow after resizing the viewer window.</p>

#p01
{
position:absolute;
left:40px;
top:80px;
letter-spacing:0.42px;
word-spacing:0.1em;
}
#p02
{
position:absolute;
left:40px;
top:120px;
letter-spacing:0.42px;
word-spacing:0.1em;
}

CSS-017, WARN, [CSS selector specifies absolute position.], OEBPS/Styles/styles.css (13-1)

<div class="pos" style="left: 40px; top: 100px;">This is an <em>EPUB</em> file</div>
<div class="pos" style="left: 260px; top: 100px;">page. Even though this</div>
<div class="pos" style="left: 40px; top: 140px;">that uses a two-column</div>
<div class="pos" style="left: 260px; top: 140px;">file is valid <em>EPUB</em>, there's</div>

.pos {position:absolute;
}

<link rel="record"
href="meta/9780000000001.xml" 
media-type="application/marc"/>

<epub:switch id="cmlSwitch">

   <epub:case required-namespace="http://www.xml-cml.org/schema">
      <cml xmlns="http://www.xml-cml.org/schema">
         <molecule id="sulfuric-acid">
            <formula id="f1" concise="H 2 S 1 O 4"/>
         </molecule>
      </cml>
   </epub:case>

   <epub:default>
      <p>H<sub>2</sub>SO<sub>4</sub></p>
   </epub:default>

</epub:switch>

<?xml version='1.0' encoding='UTF-8'?>
<jpylyzer>
    <toolInfo>
        <toolName>jpylyzer.py</toolName>
        <toolVersion>1.16.0</toolVersion>
    </toolInfo>
    <fileInfo>
        <fileName>AS16-P-4102.jp2</fileName>
        <filePath>/home/johan/testJpylyzer/AS16-P-4102.jp2</filePath>
        <fileSizeInBytes>6745365021</fileSizeInBytes>
        <fileLastModified>Wed Dec  2 20:05:29 2015</fileLastModified>
    </fileInfo>
    <statusInfo>
        <success>False</success>
        <failureMessage>memory error (file size too large)</failureMessage>
    </statusInfo>
    <isValidJP2>False</isValidJP2>
    <tests/>
    <properties/>
</jpylyzer>

mount|grep ^'/dev'

/dev/sda1 on / type ext4 (rw,errors=remount-ro)
/dev/sr0 on /media/johan/REBELS_0 type iso9660 (ro,nosuid,nodev,uid=1000,gid=1000,iocharset=utf8,mode=0400,dmode=0500,uhelper=udisks2)

ls /dev/

clipboard  dsp   mqueue  random  sda2  sdc1    stdin   ttyS2
conin      fd    null    scd0    sdb   shm     stdout  urandom
conout     full  ptmx    sda     sdb1  sr0     tty     windows
console    kmsg  pty0    sda1    sdc   stderr  ttyS0   zero

umount /dev/sr0

sudo readom retries=4 dev=/dev/sr0 f=mydisk.iso

Read  speed:  4234 kB/s (CD  24x, DVD  3x).
Write speed:     0 kB/s (CD   0x, DVD  0x).
Capacity: 309104 Blocks = 618208 kBytes = 603 MBytes = 633 prMB
Sectorsize: 2048 Bytes
Copy from SCSI (10,0,0) disk to file 'mydisk.iso'
end:    309104
addr:   309104 cnt: 44
Time total: 259.287sec
Read 618208.00 kB at 2384.3 kB/sec.

ddrescue -b 2048 -r4 -v /dev/sr0 mydisk.iso mydisk.log

GNU ddrescue 1.17
About to copy 624918 kBytes from /dev/sr0 to mydisk.iso
    Starting positions: infile = 0 B,  outfile = 0 B
    Copy block size:  32 sectors       Initial skip size: 32 sectors
Sector size: 2048 Bytes

Press Ctrl-C to interrupt
rescued:   624871 kB,  errsize:   47104 B,  current rate:        0 B/s
    ipos:   508162 kB,   errors:       3,    average rate:     592 kB/s
    opos:   508162 kB,    time since last successful read:    12.3 m
Finished

ddrescue -d -b 2048 -r1 -v /dev/sr0 mydisk.iso mydisk.log

GNU ddrescue 1.17
About to copy 624918 kBytes from /dev/sr0 to mydisk.iso
    Starting positions: infile = 0 B,  outfile = 0 B
    Copy block size:  32 sectors       Initial skip size: 32 sectors
Sector size: 2048 Bytes

Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:   624871 kB,  errsize:   47104 B,  errors:       3
Current status
rescued:   624912 kB,  errsize:    6144 B,  current rate:        0 B/s
    ipos:   508162 kB,   errors:       3,    average rate:     1706 B/s
    opos:   508162 kB,    time since last successful read:       7 s
Finished

ddrescue -d -b 2048 -r4 -v /dev/sr2 mydisk.iso mydisk.log

GNU ddrescue 1.17
About to copy 624918 kBytes from /dev/sr2 to mydisk.iso
    Starting positions: infile = 0 B,  outfile = 0 B
    Copy block size:  32 sectors       Initial skip size: 32 sectors
Sector size: 2048 Bytes

Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:   624916 kB,  errsize:    2048 B,  errors:       1
Current status
rescued:   624918 kB,  errsize:       0 B,  current rate:      682 B/s
    ipos:   106450 kB,   errors:       0,    average rate:      682 B/s
    opos:   106450 kB,    time since last successful read:       0 s
Finished

md5sum mydisk.iso
md5sum /dev/sr0

isovfy mydisk.iso

Root at extent 13, 2048 bytes
[0 0]
No errors found

isoinfo -d -i mydisk.iso

CD-ROM is in ISO 9660 format
System id: 
Volume id: REBELS_0
Volume set id: 
Publisher id: 
Data preparer id: 
Application id: NERO - BURNING ROM
Copyright File id: 
Abstract File id: 
Bibliographic File id: 
Volume set size is: 1
Volume set sequence number is: 1
Logical block size is: 2048
Volume size is: 333151
Joliet with UCS level 3 found
NO Rock Ridge present

isoinfo -d -i /dev/sr0

isoinfo -f -i mydisk.iso

/AUTORUN.EXE;1
/AUTORUN.INF;1
/DISK0
/LICENSE2.TXT;1
/LICENSEF.TXT;1
/LICENSEU.TXT;1
/SETUP.EXE;1
/DISK0/CONTROLS.CFG;1
/DISK0/DISK0;1
::
::
etc

disktype bewaarmachine.iso

--- bewaarmachine.iso
Regular file, size 342.1 MiB (358727680 bytes)
Apple partition map, 2 entries
Partition 1: 1 KiB (1024 bytes, 2 sectors from 1)
    Type "Apple_partition_map"
Partition 2: 172.4 MiB (180773376 bytes, 353073 sectors from 346957)
    Type "Apple_HFS"
    HFS file system
    Volume name "de bewaarmachine"
    Volume size 172.4 MiB (180764672 bytes, 44132 blocks of 4 KiB)
ISO9660 file system
    Volume name "BEWAARMACHINE_PC"
    Application "TOAST ISO 9660 BUILDER COPYRIGHT (C) 1993-1996 MILES SOFTWARE GMBH - HAVE A NICE DAY"
    Data size 169.4 MiB (177641472 bytes, 86739 blocks of 2 KiB)

disktype /dev/sr0

isolyzer bewaarmachine.iso > bewaarmachine.xml

cdparanoia -B -L

cdparanoia -B -l

track01.cdda.wav
track02.cdda.wav
track03.cdda.wav

cd-info /dev/sr0

CD Analysis Report

CD-TEXT for Disc:
CD-TEXT for Track  1:
CD-TEXT for Track  2:
mixed mode CD   
CD-ROM with ISO 9660 filesystem
ISO 9660: 235494 blocks, label `TNT_ROM                         '
Application: TOAST ISO 9660 BUILDER COPYRIGHT (C) 1993 MILES SOFTWARE ENGINEERING - HAVE A NICE DAY
Preparer   : 
Publisher  : 
System     : APPLE COMPUTER, INC., TYPE: 0002
Volume     : TNT_ROM
Volume Set : 
mixed mode CD   XA sectors   
session #2 starts at track  2, LSN: 235719, ISO 9660 blocks: 235494
ISO 9660: 235494 blocks, label `TNT_ROM    

umount /dev/sr0

cdrdao read-cd --read-raw --datafile toolstales.bin --device /dev/sr0 --driver generic-mmc-raw toolstales.toc

CD_ROM_XA

CATALOG "0000000000000"

// Track 1
TRACK MODE2_RAW
NO COPY
DATAFILE "toolstales.bin" 52:20:69 // length in bytes: 554058288

// Track 2
TRACK AUDIO
NO COPY
NO PRE_EMPHASIS
TWO_CHANNEL_AUDIO
SILENCE 00:02:00
FILE "toolstales.bin" #554058288 0 14:58:22
START 00:02:00
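The byte length in the comment on the DATAFILE line follows from the MSF length and the raw sector size of 2352 bytes. A short Python sketch (the function name is hypothetical) that reproduces the value for track 1:

```python
FRAMES_PER_SECOND = 75
RAW_SECTOR_SIZE = 2352  # bytes per raw CD sector


def msf_length_to_bytes(msf, sector_size=RAW_SECTOR_SIZE):
    """Convert a 'minutes:seconds:frames' track length to a length in bytes."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    sectors = (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames
    return sectors * sector_size


print(msf_length_to_bytes("52:20:69"))
```

The result is also the byte offset at which the audio track starts inside toolstales.bin (the `#554058288` in the FILE line).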

toc2cue toolstales.toc toolstales.cue

bchunk -s -w toolstales.bin toolstales.cue toolstales

cd-info /dev/sr0

CD Analysis Report

CD-TEXT for Disc:
CD-TEXT for Track  1:
::  ::
CD-TEXT for Track 18:
CD-Plus/Extra   
session #2 starts at track 18, LSN: 163570, ISO 9660 blocks: 170006
ISO 9660: 170006 blocks, label `NO   

umount /dev/sr0

cdrdao read-cd --read-raw --session 1 --datafile no1.bin --device /dev/sr0 --driver generic-mmc-raw no1.toc

cdrdao read-cd --read-raw --session 2 --datafile no2.bin --device /dev/sr0 --driver generic-mmc-raw no2.toc

toc2cue no1.toc no1.cue
toc2cue no2.toc no2.cue

bchunk -s -w no1.bin no1.cue no1
bchunk -s -w no2.bin no2.cue no2

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<preflight name="Jpeg_linked.pdf">
  <executionTimeMS>9792</executionTimeMS>
  <isValid type="PDF/A1-b">false</isValid>
  <errors count="96">
    <error count="1">
      <code>3.1.3</code>
      <details>Invalid Font definition, CourierNewPSMT: FontFile entry is missing from FontDescriptor</details>
    </error>
    <error count="1">
      <code>7.11</code>
      <details>Error on MetaData, PDF/A identification schema http://www.aiim.org/pdfa/ns/id/ is missing</details>
    </error>
    <error count="3">
      <code>6.2.1</code>
      <details>Action is forbidden, GoToPage isn't authorized as named action</details>
      <page>0</page>
    </error>

    ::
    ::

    <error count="1">
      <code>1.4.2</code>
      <details>Trailer Syntax error, The trailer dictionary contains Encrypt</details>
    </error>
  </errors>
</preflight>

<?xml version="1.0"?>
<!--
Schematron rules for policy-based  validation of PDF, based on output of Apache Preflight.
-->
<s:schema xmlns:s="http://purl.oclc.org/dsdl/schematron">

  <s:pattern name="Checks for encryption">        
    <s:rule context="/preflight/errors/error">
      <s:assert test="not(code = '1.0' and contains(details,'password'))">Open password</s:assert>
      <s:assert test="not(code = '1.4.2')">Encryption</s:assert>
    </s:rule>
  </s:pattern>

</s:schema>

<?xml version="1.0" encoding="UTF-8"?>
<profile xmlns="http://www.verapdf.org/ValidationProfile" model="org.verapdf.model.PDFA1a">
    <name>ISO 19005-1:2005 - 6.2.4 Images - Alternates</name>
    <description></description>
    <creator>veraPDF Consortium</creator>
    <created>2015-06-16T22:22:45Z</created>
    <hash>sha-1 hash code</hash>
    <rules>
        <rule id="6-2-4-t01" object="PDXImage">
            <description>An Image dictionary shall not contain the Alternates key</description>
            <test>Alternates_size == 0</test>
            <error>
                <message>Alternates key is present in the Image dictionary</message>
            </error>
            <reference>
                <specification>ISO19005-1</specification>
                <clause>6.2.4</clause>
            </reference>
        </rule>
    </rules>
</profile>

ERROR: : OPS/XHTML file OEBPS/Text/pdfMigration.html cannot be decrypted

ERROR: : $fileType file $fileName cannot be decrypted

RSC-004, ERROR, [File 'OEBPS/Text/pdfMigration.html' could not be decrypted.],epub20_minimal_encryption.epub

ERROR: : I/O error reading OEBPS/hauy-2005-1.xml: Stream closed

ERROR: /OEBPS/Text/pdfMigration.html(20): non-standard image resource 'OEBPS/Images/pdfVenn.jp2' of type 'image/jp2'

MED-003, ERROR, [Non-standard image resource of type image/jp2 found.], OEBPS/Text/pdfMigration.html (20-63) 

<s:pattern name="wellFormed">
    <s:rule context="/jh:jhove/jh:repInfo">
    <s:assert test="(jh:status = 'Well-formed')">Not well-formed epub</s:assert>
    </s:rule>
</s:pattern>

<!-- This rule rules out encrypted content, but permits font obfuscation-->  
<s:pattern name="encryptedResources">
    <s:rule context="/jh:jhove/jh:repInfo/jh:messages">
    <s:assert test="count(jh:message[contains(.,'cannot be decrypted')]) = 0">Contains encrypted resources</s:assert>
    </s:rule>
</s:pattern>

<manifest>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="Images/pdfVenn.jp2" id="pdfVennJP2" media-type="image/jp2" fallback="pdfVennPNG" />
    <item href="Images/pdfVenn.png" id="pdfVennPNG" media-type="image/png" />
    <item href="hauy-2005-1.xml" id="opf3" media-type="application/x-dtbook+xml" />
</manifest>

opj_compress -i krant.tif -o krant_oj_4.jp2 -r 4 -I -p RPCL -n 7 -c [256,256],[256,256],[256,256],[256,256],[256,256],[256,256],[256,256] -b 64,64

f:\myData>c:\fits\fits

The system cannot find the path specified.
Error: Could not find or load main class edu.harvard.hul.ois.fits.Fits

f:\myData>c:\droid\droid

java -jar c:\droid\droid-command-line-6.1.3.jar

No command line options specified

f:\myData>c:\jhove2\jhove2

16:51:02,702 [main] WARN  TypeConverterDelegate : PropertyEditor [com.sun.beans.editors.EnumEditor]
found through deprecated global PropertyEditorManager fallback - consider using a more isolated
form of registration, e.g. on the BeanWrapper/BeanFactory!

f:\myData>c:\fido\fido.py

f:\myData>c:\jhove\jhove 

Exception in thread "main" java.lang.NoClassDefFoundError: edu/harvard/hul/ois/jhove/viewer/ConfigWindow
        at edu.harvard.hul.ois.jhove.DefaultConfigurationBuilder.writeDefaultConfigFile(Unknown Source)
        at edu.harvard.hul.ois.jhove.JhoveBase.init(Unknown Source)
        at Jhove.main(Unknown Source)
Caused by: java.lang.ClassNotFoundException: edu.harvard.hul.ois.jhove.viewer.ConfigWindow
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 3 more

kdu_transcode -i balloon_master.jp2 \
    -o balloon_access_kdu.jpf \
    -jpx_layers sRGB,0,1,2 \
    Sprofile=PROFILE2 -rate 1.2

j2kdriver -i balloon_master.jp2 \
    -R 20 \
    -w I97 \
    -t JP2 \
    -o balloon_access_aw.jp2

kdu_transcode -i balloon_master_layers_precincts.jp2 \
    -o balloon_access_precincts_kdu.jpf \
    -jpx_layers sRGB,0,1,2 \
    Sprofile=PROFILE2 \
    -rate 1.2

j2kdriver -i balloon_master_layers_precincts.jp2 \
    -ql 4 \
    -t JP2 \
    -o balloon_access_layers_precincts_aw.jp2

 Syntax error, Expected pattern 'obj but missed at character 'o'

32 0 obj<</Kids[33 0 R]>>
endobj

32 0 obj
<</Kids[33 0 R]>>
endobj

sudo apt-get install build-essential dh-make devscripts debhelper lintian

mkdir debtest_1.0.0

cd debtest_1.0.0

dh_make

mv debtest_1.0.0 debtest-1.0.0

dh_make --createorig

git clone https://github.com/openplanets/jpylyzer.git

cd jpylyzer

dpkg-buildpackage -tc

pymakespec --onefile jpylyzer.py
make[1]: pymakespec: Command not found
make[1]: *** [build] Error 127
make[1]: Leaving directory `/home/johan/debtest/jpylyzer'
make: *** [build] Error 2
dpkg-buildpackage: error: debian/rules build gave error exit status

build:
    pymakespec --onefile jpylyzer.py
    pyinstaller jpylyzer.spec
    @echo "Built in dist/jpylyzer"

python /home/johan/pyinstall1.5/Makespec.py --onefile jpylyzer.py
python /home/johan/pyinstall1.5/pyinstaller.py jpylyzer.spec

python /home/johan/pyinstall/pyinstaller.py --onefile jpylyzer.py

dpkg-buildpackage -tc

lintian jpylyzer_1.9.0_amd64.deb

E: jpylyzer: unstripped-binary-or-object usr/bin/jpylyzer
W: jpylyzer: hardening-no-fortify-functions usr/bin/jpylyzer
W: jpylyzer: wrong-bug-number-in-closes l3:#nnnn
E: jpylyzer: debian-changelog-file-contains-invalid-email-address johan@unknown
E: jpylyzer: helper-templates-in-copyright

W: jpylyzer: hardening-no-relro usr/bin/jpylyzer

31 0 obj   
<</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >>    
endobj

32 0 obj   
<</Type /EmbeddedFile /Subtype /image#2Fsvg+xml /Length 72>>   
stream  
…SVG Data…  
endstream  
endobj

10 0 obj            % Image XObject
<< 
    /Type /XObject
    /Subtype /Image
    /Width 100
    /Height 200
    /ColorSpace /DeviceGray
    /BitsPerComponent 8
    /Length 2167
    /Filter /DCTDecode
>>
stream
…Image data…
endstream  
endobj

37 0 obj    
<<
    /Desc()
    /EF<</F 38 0 R>>
    /F(KSBASE.WQ2)
    /Type/Filespec/UF(KSBASE.WQ2)>>
endobj

33 0 obj   
<<
    /EmbeddedFiles 34 0 R
    /JavaScript 35 0 R
>>   
endobj

41 0 obj
<<
    /CT(video/quicktime)
    /D 42 0 R
    /N(Media clip from animation.mov)
    /P<</TF(TEMPACCESS)>>
    /S/MCD
>>
endobj

42 0 obj
<<
    /EF<</F 43 0 R>>
    /F(<embedded file>)
    /Type/Filespec
    /UF(<embedded file>)
>>
endobj

/jpylyzer/properties/contiguousCodestreamBox/cod/transformation

/jpylyzer/properties/contiguousCodestreamBox/cod/order

/jpylyzer/properties/jp2HeaderBox/colourSpecificationBox/meth

/jpylyzer/properties/jp2HeaderBox/resolutionBox

<s:rule context="/jpylyzer/properties/contiguousCodestreamBox/cod">
<s:assert test="transformation = '9-7 irreversible'">wrong transformation</s:assert>
</s:rule>

<s:rule context="/jpylyzer/properties">
<s:assert test="compressionRatio &lt; 35">Too much compression</s:assert>
</s:rule>

<s:rule context="/jpylyzer/properties/jp2HeaderBox/resolutionBox">
<s:assert test="captureResolutionBox">no capture resolution box</s:assert>
</s:rule>

<s:rule context="/jpylyzer/properties/contiguousCodestreamBox/cod">
<s:assert test=
    "(multipleComponentTransformation = 'yes') and 
        (../../jp2HeaderBox/imageHeaderBox/nC = '3') 
    or (multipleComponentTransformation = 'no') and 
        (../../jp2HeaderBox/imageHeaderBox/nC = '1')">
    no colour transformation</s:assert>
</s:rule>

<s:rule context="/jpylyzer/properties/contiguousCodestreamBox/cod">
<s:assert test="precinctSizeY[1] = '128'">precinctSizeY doesn't match profile</s:assert>
<s:assert test="precinctSizeY[2] = '128'">precinctSizeY doesn't match profile</s:assert>
<s:assert test="precinctSizeY[3] = '128'">precinctSizeY doesn't match profile</s:assert>
<s:assert test="precinctSizeY[4] = '128'">precinctSizeY doesn't match profile</s:assert>
<s:assert test="precinctSizeY[5] = '256'">precinctSizeY doesn't match profile</s:assert>
<s:assert test="precinctSizeY[6] = '256'">precinctSizeY doesn't match profile</s:assert>
</s:rule>
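For readers without a Schematron processor at hand, the logic of the simpler rules above can be approximated in plain Python. This is only a sketch: it covers the transformation and compressionRatio rules, the function name is hypothetical, and the sample XML is an invented fragment that follows the element paths used in the rules, not real jpylyzer output.

```python
import xml.etree.ElementTree as ET

# Invented sample that follows the element paths used in the rules above
SAMPLE = """<jpylyzer>
  <properties>
    <compressionRatio>20.5</compressionRatio>
    <contiguousCodestreamBox>
      <cod>
        <transformation>5-3 reversible</transformation>
      </cod>
    </contiguousCodestreamBox>
  </properties>
</jpylyzer>"""


def check_profile(xml_text):
    """Return the messages of the assertions that fail for this document."""
    root = ET.fromstring(xml_text)
    failures = []
    transformation = root.findtext(
        "properties/contiguousCodestreamBox/cod/transformation")
    if transformation != "9-7 irreversible":
        failures.append("wrong transformation")
    ratio = root.findtext("properties/compressionRatio")
    # A missing element fails the assertion, as it would in Schematron
    if ratio is None or not float(ratio) < 35:
        failures.append("Too much compression")
    return failures


print(check_profile(SAMPLE))
```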

jpylyzer balloon.jp2 > balloon_jp2.xml

java -jar probatron.jar balloon_jp2.xml profile.sch > balloon_jp2_assessment.xml

<svrl:failed-assert test="layers = '8'" location="/jpylyzer[1]/properties[1]/contiguousCodestreamBox[1]/cod[1]" line="45" col="550">
<svrl:text>wrong number of layers</svrl:text>  
</svrl:failed-assert>

/usr/share/misc

/usr/share/file

/cygwin/usr/share/misc/

file -C -m myMagic

file -i -m myMagic *

file -i -m myMagic.mgc *

0	string	\x00\x00\x00\x0C\x6A\x50\x20\x20\x0D\x0A\x87\x0A	JPEG 2000

>20	string	\x6a\x70\x32\x20	Part 1 (JP2)

!:mime	image/jp2

0	string	\x00\x00\x00\x0C\x6A\x50\x20\x20\x0D\x0A\x87\x0A	JPEG 2000
>20	string	\x6a\x70\x32\x20	Part 1 (JP2)
!:mime	image/jp2
>20	string	\x6a\x70\x78\x20	Part 2 (JPX)
!:mime	image/jpx
>20	string	\x6a\x70\x6d\x20	Part 6 (JPM)
!:mime	image/jpm
>20	string	\x6d\x6a\x70\x32	Part 3 (MJ2)
!:mime	video/mj2

dos2unix jpeg2000Magic

file -C -m jpeg2000Magic

file -i -m jpeg2000Magic *

balloon.jp2:  image/jp2; charset=binary
balloon.jpf:  image/jpx; charset=binary
balloon.jpm:  image/jpm; charset=binary
Speedway.mj2: video/mj2; charset=binary

<?xml version="1.0"?>

Metrics	Cramér’s V	p	df	N
JHOVE status vs VeraPDF parse errors	0.56	<.001	2	88
JHOVE status vs VeraPDF warnings	0.51	<.001	2	88

Tool metric	Cramér’s V	p	df	N
JHOVE status	0.23	.011	4	88
VeraPDF parse errors	0.46	<.001	2	88
VeraPDF warnings	0.41	<.001	2	88

File	Description
encryption_nocopy.pdf	Copying document contents requires password.
encryption_noprinting.pdf	Printing requires password.
encryption_notextaccess.pdf	Text access (e.g. by a screen reader) requires password.

File	Description
embedded_video_quicktime.pdf	Contains embedded Quicktime movie.
embedded_video_avi.pdf	Contains embedded AVI movie.

File	Description
text_only_fontsNotEmbedded.pdf	Only uses fonts that are not embedded.
text_only_fontsEmbeddedAll.pdf	Only uses fonts that are embedded.

JHOVE status	VeraPDF parse errors	No VeraPDF parse errors	All
Not well-formed	62	7	69
Well-Formed, but not valid	2	4	6
Well-Formed and valid	4	9	13
All	68	20	88
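
For readers who want to check the association measure against this contingency table, here is a minimal sketch in plain Python that computes Cramér's V from the counts above, both uncorrected and with the commonly used bias correction. Which variant was used for the reported values is not stated here, so the `bias_corrected` switch is an assumption made for illustration.

```python
import math


def chi_square(table):
    """Return Pearson's chi-square statistic and the total count N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table, bias_corrected=False):
    """Cramér's V for an r x k table, optionally with bias correction."""
    chi2, n = chi_square(table)
    r, k = len(table), len(table[0])
    phi2 = chi2 / n
    if bias_corrected:
        # Shrink phi-squared and the table dimensions to reduce small-sample bias
        phi2 = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
        r = r - (r - 1) ** 2 / (n - 1)
        k = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2 / min(r - 1, k - 1))


# Rows: JHOVE status; columns: VeraPDF parse errors / no parse errors
TABLE = [[62, 7], [2, 4], [4, 9]]

print(round(cramers_v(TABLE), 2))
print(round(cramers_v(TABLE, bias_corrected=True), 2))
```

With these counts the uncorrected value comes out at roughly 0.57 and the bias-corrected value at roughly 0.56. The table has (3 - 1) x (2 - 1) = 2 degrees of freedom.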

Rendering \ JHOVE status	Not well-formed	Well-Formed, but not valid	Well-Formed and valid	All
Does not render	31	1	0	32
Renders with issues	27	2	9	38
Renders normally	11	3	4	18
All	69	6	13	88

File	Description
fileAttachment.pdf	Contains file attachment that uses EmbeddedFiles entry.
fileAttachment_fileAttachmentAnnotation.pdf	Contains file attachment that uses File Attachment Annotation.

File	Actions (VeraPDF)	Annotations (VeraPDF)	Annotations (JHOVE)
AdobeChassisDemo-commented_Review.pdf	JavaScript	3D Ink Text Popup Polygon	Popup Ink Text 3D Polygon
movie.pdf		Movie
MusicalScore.pdf	GoTo JavaScript URI Rendition	Screen Widget Link	Link Widget Screen
LabelExample.pdf	GoTo3DView JavaScript	3D Widget	3D Widget
MultiMedia_Acro6.pdf	Rendition	Screen	Screen
AVI+Transitions Demo.pdf
20020402_CALOS.pdf	Movie Hide Named GoTo	Link Movie Widget
Disney-Flash.pdf	URI Rendition SubmitForm	Widget Link Screen	Widget Link Screen
gXsummer2004-stream.pdf
SVG-AnnotAnim.pdf		SVG
Service Form_media.pdf	SubmitForm Named ResetForm JavaScript Rendition	Widget Link Screen	Screen Widget Link
Binder_6-3DPages.pdf	GoTo	3D	3D
us_population.pdf	GoTo	SVG	SVG
SVG.pdf	JavaScript URI	Widget	Widget
phlmapbeta7.pdf	Rendition	Screen	Screen
ScriptEvents.pdf	JavaScript GoTo Rendition	Screen Widget	Widget Screen
Trophy.pdf	JavaScript URI GoTo Rendition	Screen Widget Link	Widget Link Screen
VolvoS40V50-Full.pdf	Named JavaScript GoTo URI Rendition	Widget Link Screen	Screen Widget Link
Jpeg_linked.pdf	Named GoTo Rendition	Link Screen	Link Screen
AdobeChassisDemo-commented.pdf		3D Popup Polygon Text	Popup Polygon 3D Text
drape_raster_contour_sample.pdf	URI	Link 3D	3D Link
remotemovieurl.pdf		Movie FreeText
3-D_PDF.pdf		3D	3D
movie_down1.pdf		Movie

Element (relative to job element)	Significance
taskResult	Parse status; a rough validity proxy that may be used to identify malformed files and files that require an open password
featuresReport/documentSecurity	Child elements indicate security-related restrictions (e.g. copy, printing and text access passwords)
featuresReport/actions/action*	Some actions imply a preservation risk
featuresReport/annotations/annotation*	Some annotations imply a preservation risk
featuresReport/documentResources/fonts/font/fontDescriptor/embedded*	Value indicates whether font is embedded or not
featuresReport/embeddedFiles/embeddedFile*	Indicates presence of file attachment

File name	Words (Tika)	Words (Textract)	Words (EbookLib)
eern001lief01_01.epub	25450	1	25446
spro002mure01_01.epub	50553	0	50549
berk011veel01_01.epub	67978	0	67974
sche034drie01_01.epub	203853	3	203352
jous010supe01_01.epub	202495	0	202491
dele035wegv01_01.epub	76536	0	76530
verv017eerl01_01.epub	33844	0	33840
dhae007euro01_01.epub	394455	2	394400
gomm002uurw01_01.epub	43754	0	43731
gang009lalb01_01.epub	28453	4	28381
geel005bloe01_01.epub	76316	0	76312
hart008droo02_01.epub	77283	0	77279
eede003vand04_01.epub	120481	6	120310
meij031tuss02_01.epub	145678	4	145665
maas013blau01_01.epub	55099	0	55093

File name	Words (Tika)	Words (Textract)	Words (EbookLib)
william-shakespeare_king-lear.epub	28442	18621	28430
david-garnett_lady-into-fox.epub	25240	25223	25228
joseph-conrad_heart-of-darkness.epub	38717	38698	38705
anthony-trollope_the-dukes-children.epub	223014	222995	223002
agatha-christie_the-mysterious-affair-at-styles.epub	57401	57229	57271
edgar-allan-poe_the-narrative-of-arthur-gordon-pym-of-nantucket.epub	71931	71837	71863
p-g-wodehouse_short-fiction.epub	212224	212182	212212
robert-louis-stevenson_the-strange-case-of-dr-jekyll-and-mr-hyde.epub	26370	26345	26358
h-g-wells_the-time-machine.epub	33044	33024	33032
thorstein-veblen_the-theory-of-the-leisure-class.epub	106537	106515	106525

Name	Type	TTL	Value
`www`	`A`	`86400`	`185.199.111.153`
`www`	`A`	`86400`	`185.199.110.153`
`www`	`A`	`86400`	`185.199.108.153`
`www`	`A`	`86400`	`185.199.109.153`
`www`	`AAAA`	`86400`	`2606:50c0:8001::153`
`www`	`AAAA`	`86400`	`2606:50c0:8000::153`
`www`	`AAAA`	`86400`	`2606:50c0:8003::153`
`www`	`AAAA`	`86400`	`2606:50c0:8002::153`

Parameter	Value (master)	Value (access)
File format	JP2 (JPEG 2000 Part 1)	JP2 (JPEG 2000 Part 1)
Compression type	Reversible 5-3 wavelet filter	Irreversible 7-9 wavelet filter
Colour transform	Yes (only for colour images)	Yes (only for colour images)
Number of decomposition levels	5	5
Progression order	RPCL	RPCL
Tile size	1024 x 1024	1024 x 1024
Code block size	64 x 64 (2⁶ x 2⁶)	64 x 64 (2⁶ x 2⁶)
Precinct size	256 x 256 (2⁸) for 2 highest resolution levels; 128 x 128 (2⁷) for remaining resolution levels	256 x 256 (2⁸) for 2 highest resolution levels; 128 x 128 (2⁷) for remaining resolution levels
Number of quality layers	11	8
Target compression ratio layers	2560:1 [1] ; 1280:1 [2] ; 640:1 [3] ; 320:1 [4] ; 160:1 [5] ; 80:1 [6] ; 40:1 [7] ; 20:1 [8] ; 10:1 [9] ; 5:1 [10] ; - [11]	2560:1 [1] ; 1280:1 [2] ; 640:1 [3] ; 320:1 [4] ; 160:1 [5] ; 80:1 [6] ; 40:1 [7] ; 20:1 [8]
Error resilience	Start-of-packet headers; end-of-packet headers; segmentation symbols	Start-of-packet headers; end-of-packet headers; segmentation symbols
Sampling rate	Stored in “Capture Resolution” fields	Stored in “Capture Resolution” fields
Capture metadata	Embedded as XMP metadata in XML box	Embedded as XMP metadata in XML box
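
The target compression ratios in the table follow a simple halving pattern, starting at 2560:1 for the first layer. As a small sketch (the function name `layer_ratios` is made up for this example), the series for both profiles can be generated like this:

```python
def layer_ratios(n_lossy, first_ratio=2560):
    """Return n_lossy target ratios (x means x:1), halving at each layer."""
    return [first_ratio // 2 ** i for i in range(n_lossy)]


# Access profile: 8 layers, all with a target ratio
ACCESS_RATIOS = layer_ratios(8)

# Master profile: 10 layers with a target ratio; the 11th layer has no
# target ratio ("-" in the table), so it is not part of this list
MASTER_RATIOS = layer_ratios(10)

print(" ; ".join("%d:1 [%d]" % (r, i + 1) for i, r in enumerate(ACCESS_RATIOS)))
print(" ; ".join("%d:1 [%d]" % (r, i + 1) for i, r in enumerate(MASTER_RATIOS)))
```

How these ratios are passed to an encoder differs per codec, so the conversion scripts should be consulted for the exact command-line syntax.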

Codec	Link to script
OpenJPEG	https://github.com/KBNLresearch/jp2totiff/blob/master/mastertoaccess-opj.sh
Grok	https://github.com/KBNLresearch/jp2totiff/blob/master/mastertoaccess-grok.sh
Kakadu	https://github.com/KBNLresearch/jp2totiff/blob/master/mastertoaccess-kdu.sh

Software	Version
VeraPDF	1.22.3
JHOVE	1.28.0

Software	Version
Tika-python	2.6.0
Textract	1.6.5
EbookLib	0.18

Codec	time (real)	time (user)
OpenJPEG	0m50.715s	1m20.497s
Grok	0m22.143s	1m1.308s
Kakadu	0m25.507s	0m48.990s

Tool	Version	ID
Siegfried	1.9.1; DROID Signature File V97⁵	x-fmt/412 (Java Archive Format) x-fmt/263 (ZIP Format)
Unix File	5.32	application/zip
Apache Tika	1.23	application/vnd.android.package-archive

Codec	Deviations from KB access requirements
OpenJPEG	XML box missing, resolution box missing, ICC profile missing
Grok	XML box missing
Kakadu	-

Tool	Version	ID
Siegfried	1.9.1; DROID Signature File V97⁵	x-fmt/263 (ZIP Format)
Unix File	5.32	application/zip
Apache Tika	1.23	application/x-itunes-ipa

	Android-x86, VirtualBox	Android-x86, QEMU	Anbox	Android Studio
Emulator version	6.0.24, r139119	5.2	4-56c25f1	30.3.5.0
Android version	Android-x86 9.0-R2 (64-bit)	Android-x86 9.0-R2 (64-bit)	Customized system image based on v. 7.1.1 of Android Open Source Project	Android 9.0 x86 system image (Google), API level 28
Emulation approach	Virtualization	Virtualization	Compatibility layer	Virtualization (full ARM emulation optional, depending on system image)
ARize app installs	Yes	Yes	No	Yes
ARize app works	No	No	-	Partially (camera device not recognised; emulator unresponsive after using camera)
Immer app installs	Yes	Yes	Yes	Yes
Immer app works	Yes	Yes	Partially (rendering and navigation issues)	Yes

Country	Count	% of all active domains
NL	3.45617e+06	75.03
DE	419623	9.11
US	235962	5.12
IE	180379	3.92
FR	78023	1.69
DK	68584	1.49
BE	42333	0.92
GB	26748	0.58
LU	15569	0.34
Other	56489	1.17

Province	Count	% of all NL-hosted domains
Noord-Holland	2.60368e+06	75.34
Zuid-Holland	270357	7.82
Gelderland	221120	6.4
Noord-Brabant	133056	3.85
Utrecht	62215	1.8
Flevoland	53595	1.55
Overijssel	33781	0.98
Groningen	28392	0.82
Fryslân	20184	0.58
Drenthe	14971	0.43
Limburg	11154	0.32
Zeeland	3325	0.1

Carrier type	Category	Query	Matching records
CD-ROM	Optical	extent any “cdrom* cd-rom*”	8109
DVD	Optical	extent any “dvd*”	711
Blu-Ray	Optical	extent any “bluray* blu-ray*”	4
Audio CD	Optical	type any “geluidsdrager” and extent any “cd* compact”	4605
CD-i	Optical	extent any “cdi* cd-i*”	44
Optical carrier, unspecified	Optical	extent any “optisch* schijf”	23
Tape, unspecified	Magnetic	extent any “tape*”	2
Compact cassette, data (datassette)	Magnetic	extent any “cassetteband*”	6
Floppy disk, 8”	Magnetic	-	0
Floppy disk, 5.25”	Magnetic	extent any “flop* diskette” and extent any “5.25 5,25*”	184
Floppy disk, 3.5”	Magnetic	extent any “flop* diskette” and extent any “3.5 3,5*”	1194
Floppy disk, unspecified	Magnetic	extent any “flop* diskette” not extent any “5.25 3.5*”	822
Zip Disk	Magnetic	extent any “zipdis* zip-dis*”	0
Hard disk	Magnetic	extent any “hard” and extent any “schijf schijv disk*”	2
Compact Flash card	Electronic	-	0
Sony Memory stick	Electronic	-	0
SD card	Electronic	-	0
USB thumb drive	Electronic	extent any “usb*”	44
Solid-State drive	Electronic	-	0

Format (file(1))	Number of files
new-fs dump file (big endian)	28
new-fs dump file (little endian)	8
tar archive	4
POSIX tar archive	2
POSIX tar archive (GNU)	5

Test file	Failed assert(s)
EmbeddedCmap.pdf	Font is not embedded
embedded_fonts.pdf	Font is not embedded
embedded_pm65.pdf
notembedded_pm65.pdf	Font is not embedded
printtestfont_nonopt.pdf
printtestfont_opt.pdf
substitution_fonts.pdf	Font is not embedded
text_images_pdf1.2.pdf	Font is not embedded
TEXT.pdf	Font is not embedded
Type3_WWW-HTML.PDF	Font is not embedded

Test file	Failed assert(s)
20020402_CALOS.pdf	Font is not embedded;Movie annotation
3-D_PDF.pdf	3D annotation
AdobeChassisDemo-commented.pdf	3D annotation
AdobeChassisDemo-commented_Review.pdf	3D annotation
AVI+Transitions Demo.pdf	Document not parsable
Binder_6-3DPages.pdf	3D annotation
Disney-Flash.pdf	Font is not embedded;Screen annotation
drape_raster_contour_sample.pdf	Font is not embedded;3D annotation
gXsummer2004-stream.pdf	Document not parsable
Jpeg_linked.pdf	Encrypted document;Document not parsable
LabelExample.pdf	Encrypted document;Document not parsable
movie_down1.pdf	Movie annotation
movie.pdf	Movie annotation
MultiMedia_Acro6.pdf	Encrypted document;Document not parsable
MusicalScore.pdf	Font is not embedded;Screen annotation
phlmapbeta7.pdf	Font is not embedded;Screen annotation
remotemovieurl.pdf	Font is not embedded;Movie annotation
ScriptEvents.pdf	Font is not embedded;Screen annotation
Service Form_media.pdf	Font is not embedded;Screen annotation
SVG-AnnotAnim.pdf	Font is not embedded
SVG.pdf	Font is not embedded
Trophy.pdf	Font is not embedded;Screen annotation
us_population.pdf

File	Result
frogs-01.wav	Status: Well-Formed and valid
frogs-01-last-byte-missing.wav	Status: Well-Formed and valid
frogs-01-last-2032-bytes-missing.wav	Status: Well-Formed and valid
frogs-01-byte-missing-at-offset-811537.wav	Status: Well-Formed and valid


Multi-image TIFFs, subfiles and image file directories

TIFF to JP2 workflow

ImageMagick reports changed pixel values

Subfiles and image file directories

SubIFDs

Remove thumbnail with ExifTool

Pixel compare with unmodified source TIFFs

ImageMagick reads last subfile by default

Are multi-image TIFFs a preservation risk?

Detection of multi-image TIFFs

ExifTool

ImageMagick identify

JHOVE

Conclusion

Further resources

VeraPDF parse status as a proxy for PDF rendering: experiments with the Synthetic PDF Testset

Goal and objectives of this post

Synthetic PDF Testset for File Format Validation

Running JHOVE and VeraPDF

JHOVE validation status vs VeraPDF parse errors and warnings

Simplifying the ground truth

JHOVE and VeraPDF results vs rendering behaviour

Interpretation and conclusions

VeraPDF vs JHOVE

Implications for digital preservation workflows

Limitations

JHOVE, VeraPDF and the future of PDF validation

Feedback welcome

Further resources

Identification of PDF preservation risks with VeraPDF and JHOVE

Context of this work

PDF preservation risks

PDF features and risks

VeraPDF vs JHOVE

Methods and test data

Advance warning

Analysis of Horror Corpus

Open password

Test files

VeraPDF

JHOVE

Copy, printing and text access passwords

Test files

VeraPDF

JHOVE

Multimedia

Test files

VeraPDF

JHOVE

JavaScript

Test files

VeraPDF

JHOVE

Font embedding

Test files

VeraPDF

JHOVE

File attachments

Test files

VeraPDF

JHOVE

Link to external file

Test files

VeraPDF

JHOVE

Web Capture content

Test files

VeraPDF

JHOVE

Byte corruption

Test files

VeraPDF

JHOVE

Analysis Adobe Acrobat Engineering dataset

Results

Behaviour with corrupted files

Support for PDF 2.0

Discussion

JHOVE

File	Result
frogs-01.wav	File probably truncated: no
frogs-01-last-byte-missing.wav	File probably truncated: yes (missing 1 byte)
frogs-01-last-2032-bytes-missing.wav	File probably truncated: yes (missing 2032 bytes)
frogs-01-byte-missing-at-offset-811537.wav	File probably truncated: yes (missing 1 byte)
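
One simple way to arrive at results like the ones in this table is to compare the size declared in the RIFF header with the actual number of bytes. The sketch below does just that; whether the tool behind the table works this way is an assumption, and `riff_missing_bytes` is a made-up name. Note that such a check only looks at sizes, so it cannot tell whether a byte went missing at the end or somewhere in the middle, which fits the identical results for the two one-byte cases above.

```python
import io
import struct
import wave


def riff_missing_bytes(data):
    """Return how many bytes are missing relative to the RIFF header's size field."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    # The size field (little-endian, offset 4) excludes the first 8 bytes
    declared_size = struct.unpack("<I", data[4:8])[0] + 8
    return max(0, declared_size - len(data))


def make_test_wav(n_frames=100):
    """Build a small 16-bit mono WAV file in memory (silence only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(44100)
        w.writeframes(b"\x00\x00" * n_frames)
    return buffer.getvalue()


wav = make_test_wav()
print(riff_missing_bytes(wav))                 # intact file
print(riff_missing_bytes(wav[:-1]))            # last byte removed
print(riff_missing_bytes(wav[:60] + wav[61:])) # one byte removed in the middle
```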

File	Result
frogs-01.wav	-
frogs-01-last-byte-missing.wav	[pcm_s16le @ 0x3545380] Invalid PCM packet, data has size 3 but at least a size of 4 was expected Error while decoding stream #0:0: Invalid data found when processing input
frogs-01-last-2032-bytes-missing.wav	-
frogs-01-byte-missing-at-offset-811537.wav	[pcm_s16le @ 0x2768380] Invalid PCM packet, data has size 3 but at least a size of 4 was expected Error while decoding stream #0:0: Invalid data found when processing input

File	Result
frogs-01.flac	File probably truncated: no
frogs-01-last-byte-missing.flac	File probably truncated: no
frogs-01-last-1000-bytes-missing.flac	File probably truncated: no
frogs-01-byte-missing-at-offset-651202.flac	File probably truncated: no

File	Result
frogs-01.flac	-
frogs-01-last-byte-missing.flac	[flac @ 0x294b860] overread: 1 Error while decoding stream #0:0: Invalid data found when processing input
frogs-01-last-1000-bytes-missing.flac	[flac @ 0x3c5d860] overread: 1 Error while decoding stream #0:0: Invalid data found when processing input
frogs-01-byte-missing-at-offset-651202.flac	[flac @ 0x279faa0] overread: 1 Error while decoding stream #0:0: Invalid data found when processing input

File	Result
frogs-01.flac	-
frogs-01-last-byte-missing.flac	ERROR while decoding data state = FLAC__STREAM_DECODER_END_OF_STREAM
frogs-01-last-1000-bytes-missing.flac	ERROR while decoding data state = FLAC__STREAM_DECODER_END_OF_STREAM
frogs-01-byte-missing-at-offset-651202.flac	ERROR while decoding data state = FLAC__STREAM_DECODER_READ_FRAME

Extension	ID Tika	Remarks
cif	text/plain	Crystallographic Information File
csv	text/csv	Comma-separated values
mol	text/plain	MDL Molfile
tdb	text/plain	Thermo-Calc Database Format
r	text/x-rsrc	R source code
m	text/x-objcsrc	Objective-C source code

Category	Rendering software in reading rooms	Formats accessible in reading rooms?
Image formats	MS Paint, Windows Photoviewer	Yes
PDF	Adobe Acrobat	Yes
Web formats	Internet Explorer, Google Chrome	Yes
Office formats	Microsoft Office	Yes (support for old Office formats presently not clear)
Audio	Windows Media Player, VLC Media Player	No (hardware in reading rooms doesn’t support audio)
Video	Windows Media Player, VLC Media Player	Partially (hardware in reading rooms doesn’t support audio)
Metadata	Internet Explorer, Notepad, Wordpad	Yes
Executables, installers, system files	Not applicable	No
Containers	Windows Explorer, 7-Zip	Yes
Source code / scripts	Notepad, Wordpad	Partially: available software doesn’t support syntax highlighting
(Scientific) text-based data formats	Notepad, Wordpad	Partially: available software doesn’t support syntax highlighting; CSV files are not imported correctly by MS Excel
(Scientific) binary data formats	-	No

Test	Epub version	Description
Minimal	2	Basic file with one text resource and one image
Encryption	2	Fake encrypted file that includes encryption.xml resource in `META-INF`, indicating that main text resource is encrypted²
Font obfuscation	3	Includes fonts that are obfuscated (which results in hasEncryption in epubcheck). Taken from EPUB 3 Sample Documents (wasteland with OTF fonts, obfuscated).
Foreign resource without fallback	2	Includes JP2 image, which is a format that is not on the list of Core Media Types
Foreign resource with fallback 1	2	Includes JP2 image, which is a format that is not on the list of Core Media Types; fallback defined in manifest, identifier in content document
Foreign resource with fallback 2	2	Includes JP2 image, which is a format that is not on the list of Core Media Types; fallback defined in manifest, no identifier in content document
DTBook	2	Includes Digital Talking Book content. Adapted from threepress, published under BSD 3 license.

Symbol	Posting frequency
	High
	Occasional gaps
	Gaps of one or more days
	Lots of large gaps
	Almost dead
	Casualty of war

Codec	Command line
opj20	`opj_decompress -i krant_oj_4.jp2 -o krant_oj_4_oj.tif`
kakadu	`kdu_expand -i krant_oj_4.jp2 -o krant_oj_4_kdu.tif`
kakadu-precise	`kdu_expand -i krant_oj_4.jp2 -o krant_oj_4_kdu_precise.tif -precise`
irfan	Used GUI
im	`convert krant_oj_4.jp2 krant_oj_4_im.tif`
gm	`gm convert krant_oj_4.jp2 krant_oj_4_gm.tif`