At the heart of digital preservation is the original digital object, the “thing” we want to preserve. Organisations that preserve digital material have an ingest procedure in which they receive the digital objects and perform various quality checks, for example whether the received object, be it born-digital or digitized, is what they expected to receive. But this quality control is only possible up to a certain level and mainly covers technical aspects such as file format, structure or size. The quality control of the intellectual content of the object is often left to the creator of the object. Digitized material can be compared with the analogue original, and deviations can be identified. Once this process is finished, someone gives the green light that the digital object is “correct”. But what if errors can occur that one is not aware of and that are difficult to notice? My colleague Johan van der Knijff pointed me to this interesting article.
David Kriesel describes here his recent experience with Xerox WorkCentre machines. After scanning a document and later comparing the image with the original, he discovered that some figures in the image had been changed: “66” became “86” in several cases. This kind of deviation is not easy to detect when scanning many pages with figures, quite apart from the fact that no one expects to have to check for it! The error had nothing to do with OCR, as the OCR functionality was not active in this task. The current assumption is that it has to do with the use of JBIG2 for compression. Xerox has confirmed the error and will create a patch. However, the error might have been present in Xerox WorkCentres and other copiers for years, and we will never know how many documents have been “damaged” by it. Metadata in the original digital object about the scanning environment that was used and the date of creation might help to trace possibly faulty documents and support future digital detectives.
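To illustrate how such metadata could support a later search, here is a minimal sketch in Python. It assumes that each digitized object carries a record with the scanner model, the compression used and the creation date. The `ScanRecord` fields, the sample records and the cutoff date are all hypothetical and chosen for illustration only; they do not describe any real repository schema or the actual date of the Xerox patch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ScanRecord:
    """Hypothetical technical metadata stored with a digitized object."""
    identifier: str
    scanner_model: str
    compression: str
    created: date


def find_suspect_scans(records, model_keyword, compression, fixed_on):
    """Return records made on a matching device, with the given
    compression, before the date on which the device was fixed."""
    keyword = model_keyword.lower()
    wanted = compression.lower()
    return [
        r for r in records
        if keyword in r.scanner_model.lower()
        and r.compression.lower() == wanted
        and r.created < fixed_on
    ]


# Invented sample records, for illustration only.
records = [
    ScanRecord("doc-001", "Xerox WorkCentre", "JBIG2", date(2012, 5, 14)),
    ScanRecord("doc-002", "Xerox WorkCentre", "JPEG", date(2012, 5, 14)),
    ScanRecord("doc-003", "Other Scanner", "JBIG2", date(2011, 1, 1)),
    ScanRecord("doc-004", "Xerox WorkCentre", "jbig2", date(2015, 3, 2)),
]

# Placeholder cutoff: the date on which the local device was patched.
suspects = find_suspect_scans(records, "WorkCentre", "JBIG2", date(2014, 1, 1))
for record in suspects:
    print(record.identifier, record.scanner_model, record.created)
```

A query like this only narrows down the candidates. Whether a flagged document really contains altered figures can still only be established by comparing it with the analogue original.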