Corrupted PDF

The issue

In his blog, Restricted Data: The Nuclear Secrecy Blog, Alex Wellerstein describes a case of corrupted PDF files.

Example

Comparing the corrupted PDF file (figure 1) with the original source mentioned in the blog post (figure 2) shows blank pages where text should have been.


Figure 1: corrupted PDF

Figure 2: comparison-2

Description

Quoted from the blog post mentioned above:

Calculating the efficiency of the bomb as a function of how well you can hold it together is apparently the essence of the still mostly-classified Bethe-Feynman formula. It is described qualitatively in Samuel Glasstone, “Weapons Activities of Los Alamos Scientific Laboratory, Part I,” LA-1632 (January 1954), 34-37. My copy of this report comes from the NNSA’s FOIA Reading Room. I downloaded the file in 2009, and sometime since then all of their PDFs have gotten corrupted somehow, and so many of the pages of the PDFs now available on their site are unreadable. For those who are curious, at a technical level, the corruption involved a systematic stripping out of the carriage return (0D) ASCII characters from the PDFs — there are none in any of the files, and there should be several thousand of them. Here is a screenshot from a hex editor showing the corrupted file (on left) versus the uncorrupted one (on the right). There seems to be no easy fix for this problem. I have tried to contact the NNSA about this but have gotten no response. It is one of many troubling incidents revealing, in my view, the very low priority that public release of information, and poor understanding of public-facing information technology, with regards to the present nuclear agencies.
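The symptom described in the quote, a PDF that contains no carriage return (0D) bytes at all, can be checked for mechanically. The following is a minimal sketch in Python, not a validated tool. The function names, the `min_size` threshold and the synthetic sample data are assumptions made for illustration. The heuristic relies on the observation that compressed streams in a PDF look roughly like random bytes, so a file of several kilobytes would normally be expected to contain at least some 0D bytes.

```python
def line_ending_counts(data: bytes) -> dict:
    """Count CR (0x0D), LF (0x0A) and CRLF sequences in a byte string."""
    return {
        "cr": data.count(b"\r"),
        "lf": data.count(b"\n"),
        "crlf": data.count(b"\r\n"),
    }


def looks_cr_stripped(data: bytes, min_size: int = 4096) -> bool:
    """Heuristic flag: a reasonably large file without a single 0x0D byte.

    min_size is an arbitrary, illustrative threshold; very small files may
    legitimately contain no CR bytes, so they are never flagged.
    """
    return len(data) >= min_size and b"\r" not in data


def check_file(path: str) -> bool:
    """Read a file in binary mode and apply the heuristic."""
    with open(path, "rb") as handle:
        return looks_cr_stripped(handle.read())


# Synthetic stand-in for binary PDF content: every byte value, repeated.
original = bytes(range(256)) * 40

# Simulate the corruption described in the quote: drop every 0x0D byte.
corrupted = original.replace(b"\r", b"")

print(line_ending_counts(original))
print(line_ending_counts(corrupted))
print(looks_cr_stripped(original), looks_cr_stripped(corrupted))
```

The simulation also hints at why there is no easy fix. Once the 0D bytes are gone, nothing in the file records where they used to be inside the binary streams. In addition, the cross-reference table of a PDF stores byte offsets, which no longer point to the right positions after bytes have been removed.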

 

Can we avoid this?

This case seems to have the same cause as the one described in "Byte corruption in PDF".