There is a growing awareness among scientists that their data sets need attention if they are to remain usable in the future, and some of them have learnt this the hard way. An interesting article by T. Vines et al. in Current Biology, Volume 24, Issue 1, 94-97, 19 December 2013, describes a study into the availability of research data years after the related article was published. The authors selected 516 publicly available articles, published between 1991 and 2011, and tried to obtain the related data sets by contacting the authors at email addresses taken from the article or found by searching the web. Vines and his colleagues received 101 data sets, and another 20 data sets were reported to be still in use. The related data sets were no longer readily available, especially for older papers. When asked for the data, the original authors gave a variety of reasons why they could not supply them:
Responses included authors being sure that the data were lost (e.g., on a stolen computer) or thinking that they might be stored in some distant location (e.g., their parent’s attic) to authors having some degree of certainty that the data are on a Zip or floppy disk in their possession but no longer having the appropriate hardware to access it. In the latter two cases, the authors would have to devote hours or days to retrieving the data.
The article was discussed in Nature, which mentioned two other cases of lost data sets. They are quoted here because they are too small to include in the Stories part of the Atlas.
Both cases show that “benign neglect” is, after all, often not the way to preserve digital information.
Agricultural researcher Melvin McCarty, for instance, spent 15 years between 1958 and 1973 recording the life cycles of plants and grasses near Lincoln, Nebraska. Forty years later, ecologist Lizzie Wolkovich went searching for McCarty’s data as part of an effort to tie together experiments exploring how rising temperatures affect plant life cycles. But McCarty had died, and his raw data could not be found. “There is nothing we can replicate now. The loss of the long-term data set is very sad,” says Wolkovich, who works at the University of British Columbia in Vancouver.
A similar fate befell the raw data collected in the 1980s by Otto Solbrig, a biologist at Harvard University in Cambridge, Massachusetts, on species of violets in New England. Plant biologist Sydne Record at Michigan State University in East Lansing wrote to him in 2009 asking for the original data, to test out a mathematical analysis of population viability that she was developing — but Solbrig didn’t have them. “We had at least 20 big folders with those data, but nobody was interested in them so we threw them away,” he says.