The digital Dark Ages

Those Mac disks in your basement--the ones with your kids' baby pictures. Can you still open them?

Veronique Greenwood ’08 has written for the New York Times, the Atlantic, and National Geographic.

Richard Borge


The disks had been sitting in an attic for quite some time. Big black squares, like flexible plastic Pop-Tarts, they were a type of data storage that flourished briefly in the late 1970s and early ’80s: the 5¼-inch floppy disk. Many similar objects languish in garages and closets around the world, some of them bearing long-forgotten term papers, others old letters or painstakingly assembled genealogies. This particular cache of disks was suspected by the Yale linguistics department to contain something priceless: a dictionary for an endangered Native American language, compiled in the early 1980s.

The only problem was that the computer system capable of reading the disks’ proprietary format had been out of production for decades. It had been cutting-edge in 1982, when it was a competitor to IBM. But by the mid-1990s the company that made it had largely faded from public view.

The linguists brought the problem to Euan Cochrane, digital preservation manager at the Yale University Library. He was able to track down a person who used to rebuild and sell those particular computers. Three months later, the man wrote that he’d managed to arrange for one to be donated to the cause. Cochrane paid the shipping cost, waited hopefully, and at last unwrapped the ponderous, beige-plastic-encased monitor with its separate keyboard and processor unit. (There was no mouse. The system was too old.) Then the moment of truth: he slotted in a disk—and found that several keys on the keyboard, the only means of opening the disk drive, had broken some time before.

Without many people noticing, over a relatively short period of time it has become almost impossible to access information in many formerly common formats. And it isn’t just quaint old floppy disks or once-flashy Zip drives; many laptops don’t even have CD drives anymore. Data saved on the Web are not necessarily safer. Millions of sites on GeoCities, a hosting site popular in the early days of the Web, were deleted in October of 2009, with people scrambling to collect relics before they were gone. Often there isn’t any warning that the ephemeral nature of the World Wide Web—which was invented as an information-sharing service, not a museum—is going to assert itself. MySpace, the social networking site that saw the birth of many music groups, revealed last year that it had lost more than a decade’s worth of files, possibly as many as 53 million songs, some of which may have existed nowhere else.

History’s record has always been partial. As any historian will tell you, we have little idea of the internal life of a ninth-century peasant, nor has the precise reservoir of knowledge necessary to make Damascus steel survived. Empires fall, libraries burn, and to understand the past we make do with what remains. But historians of periods before this last century at least have the benefit of records on stone, on vellum, on photographic paper. The default used to be that if a record was created, some physical, legible artifact remained. We are in the peculiar situation as a civilization that most of our artifacts are not now created in a form that can be read with the naked eye; they require hardware, software, and know-how that, as the story of the dictionary shows, is increasingly fleeting.

As knowledge of our predicament spreads, data professionals, tech leaders, and journalists have raised the question of whether the late twentieth and early twenty-first centuries will become a digital dark age, a time from which few records will survive for our successors in the centuries and millennia to come. If we do nothing, it seems likely that much of what we write, think, and create will pass out of human knowledge. But a slowly growing team of data devotees, hardware nerds, librarians, and scrappy volunteers is hoping to take the first few steps to keep our era from oblivion.  

In data preservation circles, one commonly cited object lesson of the danger of not thinking ahead—really, of the danger of there not being someone assigned to pay attention—is the 1986 Domesday Project. The project was a commemoration of the 900th anniversary of the Domesday Book, a massive, parchment-paged record made in 1086 at the behest of William the Conqueror. The king commissioned a detailed survey of all landholdings in his kingdom, and the resulting document, bound in five books now held in the National Archives of the United Kingdom, is still perfectly legible.

The 1986 project, conceived by the BBC, was a vast undertaking. Over a million contributors, including children in about 9,000 schools across the UK, were asked to document their hometowns, and their crowdsourced information was bundled together with professional photographs, maps, video tours of historical locations, and the entire 1981 census, and loaded onto a pair of laser discs. Having images, video, and words stored on the same device had never been done before, and with a computer system specially designed and loaded with software to interpret it, the data could be seen by anyone who could afford the (fairly pricey) system.

Years passed. Few people noticed, but by the early 2000s, almost all the Domesday computer systems had been broken or lost. The data—all those snapshots and time capsules of what people thought and knew for a brief moment in Britain, and which had cost 2.5 million pounds to assemble—were inaccessible. The BBC and an academic coalition retrieved some of the information; the BBC also temporarily displayed some data from the discs online. But those projects eventually ended. Their sites are now defunct.

Then, a computer hobbyist in Stockholm named Simon Inns grew interested in the project. Most people think of information as existing in the virtual world of computers, but the truth is that it exists physically, too. It is written on the reflective surface of a laser disc or the magnetic substrate of a floppy disk. Inns realized that if he could find a way to take an accurate enough copy of the information, he could extract the analog data. He figured out how to capture information from a laser disc so that the message the laser sent was fed directly into a computer.

And yet, data is not the entirety of any digital experience: just imagine having each individual frame of a movie and no way to know how to string them together or how fast to move through them. In the case of the Domesday laser discs, the creators had invented a navigation system that was truly ahead of its time. It allowed users to search in a way that was a forerunner to modern search engines like Google (which includes some of the same algorithms). “It had never been done,” Inns says, “before Domesday.”

So Inns built a device called an emulator, a box whose software allows it to impersonate the Domesday computer system. On Inns’s system, the 1986 Domesday discs live again, with their records of schoolchildren’s slang in the British Midlands and their brilliant, early search protocol developed before CD-ROMs, let alone the internet, existed. Inns and his collaborator Chad Page and their fellow enthusiasts have extensively documented their process, and it may seem like the Domesday project has been rescued, at least for now. But Inns warns that this is just a drop in the bucket of what is necessary to keep today’s information from slipping into darkness. At the moment, there is almost nobody in the world explicitly paying attention to what happens to our data. One of the few groups that do, however, is that age-old guild of keen-eyed, long-term planners: librarians.                 

The Yale di Bonaventura Family Digital Archaeology and Preservation Lab is nestled in a red brick building on Winchester Street. For anyone who remembers the early 2000s and before, the hefty, elaborately corded relics that line its shelves and tables are the most unlikely madeleines. Their graphite-colored screens and chunky keyboards recall hours spent playing long-gone games, scrolling through chat rooms, exploring the zany world of the early Web. When the classic smiling computer icon suddenly pops up on the loading screen of the lab’s original Mac, it’s like seeing a childhood friend you haven’t thought of in years. “That’s what I call the digital patina,” says Euan Cochrane, one of the expert caretakers of this room—it’s all the little features, from Clippy the Microsoft office assistant to the ecstatic sigh of a Windows XP machine booting up, that make up our experience of computers. Cochrane and his colleagues are part of the library’s preservation and conservation services department, and they use these devices to transfer material in antique formats into more legible forms. 

In the case of the Native American dictionary and the broken keyboard, Cochrane did eventually get a lucky break. “I Googled the model number,” he says, “and the only hit I found was a guy that collects keyboards in Canada.” That collector graciously agreed to a swap, sending his functioning version to Yale, and with that, Cochrane was able to finally open the disks. The dictionary was there. But now he and his colleagues face another set of problems, the most pressing of which is that there is no way to get the information off the ticking time-bomb of a computer they are running it on. The machine could break at any moment, and there are no cables that can connect it to another device. 

Still, as important as these rescue missions are, they’re only a small part of Cochrane’s job. Much bigger things are going on at the Yale Library to ward off a digital dark age. With start-up grants from the Sloan and the Mellon Foundations, Cochrane and his team are doing something a bit like what Simon Inns did, but on an immense scale. The Yale group is using emulation to enable interaction with data created in thousands of different software programs, on computers no one’s seen in decades, using the original software accessed via a simple web page interface.

Emulation, or mimicking an old system so as to play host to otherwise inaccessible software, has long been an attractive option for data preservation. Many emulation pioneers were lovers of early video games who refused to let them slip into unplayability. But when a would-be designer is emulating software or hardware, they must take care to record everything they do. As part of the Library of Congress’s Preserving Virtual Worlds project, Kari Kraus, a professor of English and information studies at the University of Maryland, and her colleagues gathered and tested emulators made by fan communities (along with eight early video games, including 1980’s Mystery House) for the Library’s collection. But in addition to the software, PDF versions of the ancient manuals had to be entered as well. Emulators can go defunct when the modern computer system they run on becomes obsolete, so they must be maintained over time. And emulation can be made more difficult or even illegal by copyright restrictions on the software, intended to protect the rights of creators who’ve long since walked away.
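
For readers curious what "mimicking an old system" means in practice, here is a deliberately tiny sketch in Python. It is not how the Yale team's emulators or any fan-built emulator actually work; the machine, its five instructions, and the names `run` and `OLD_PROGRAM` are all invented for illustration. The idea it shows is the general one: a modern program reads an old program's instructions one at a time and does what the vanished hardware would have done.

```python
# A toy "emulator" for a made-up machine. The instruction set below is
# hypothetical and exists only for this illustration.

def run(program, max_steps=10_000):
    """Interpret a list of (opcode, operand) pairs; return the text it prints."""
    acc = 0        # the imaginary machine's single register
    pc = 0         # program counter: index of the next instruction
    output = []    # characters the old program "printed"
    steps = 0
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("program did not halt")
        op, arg = program[pc]
        pc += 1
        if op == "LOAD":       # put a constant in the register
            acc = arg
        elif op == "ADD":      # add a constant, wrapping like an 8-bit register
            acc = (acc + arg) % 256
        elif op == "PRINT":    # emit the register's value as a character
            output.append(chr(acc))
        elif op == "JNZ":      # jump to instruction `arg` if the register is non-zero
            if acc != 0:
                pc = arg
        elif op == "HALT":     # stop the machine
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return "".join(output)


# An "old program" written for the imaginary machine.
OLD_PROGRAM = [
    ("LOAD", 72), ("PRINT", None),   # character code 72 is 'H'
    ("ADD", 1), ("PRINT", None),     # 73 is 'I'
    ("HALT", None),
]

print(run(OLD_PROGRAM))  # prints HI
```

A real emulator has to reproduce thousands of such behaviors, along with the quirks of screens, disk drives, and keyboards, which is one reason careful documentation matters so much.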

Still, emulation has emerged as one of the best tools we have. It’s mostly been small-scale so far, but the Yale team aims to change that. “Everyone’s always said it’s too complicated, too expensive, and hard to scale,” Cochrane says. “We disagree.” In 2017, the team received two million-dollar grants to help scale up their emulation projects. The goal is to have more than 3,000 pieces of software—from old word-processing software like WordStar to early versions of Sibelius, a music composition tool—able to run within a web browser.

With the library’s catalog open, Cochrane clicks on a file. When it opens, it’s like a hole to 1995 has appeared. Music plays, eerie and MIDI-esque. You are in a dark chamber, its walls covered with scribbles. Hidden in each object you click there is a snippet of audio or text or video, and you can move from room to room, in this game-cum-interactive artwork made by artist Laurie Anderson early in the CD-ROM age. The game, called Puppet Motel, is an artistic relic of an earlier time, and like so much else, it had fallen into that hole of “too hard to access.” Now anyone who can get on the Yale system can play Puppet Motel as easily as Candy Crush, and be reassured that what they are seeing is a faithful copy of what its first users would have seen nearly 25 years ago.

“There’s a whole lot of work that goes into that, thousands of pieces of software that we need to preconfigure and document,” Cochrane says of the project. “But that’s the general idea.” Christine McCarthy, the director of Preservation and Conservation Services, notes that it was only in 2018 that the team’s work began in earnest. “We’ve made a tremendous amount of progress in a very short amount of time,” she says.

As the technological work gathers steam, a nascent national network of institutions is using what the Yale team has built in order to maintain their own access to digital data. When authors leave their papers to a university today, the bequest may include their entire email inbox, audio and text files in outmoded formats, and any number of other digital objects. With a large-scale emulation system, it should be possible to deal with these strings of data simply: they will be accessible to researchers without the need to seek out a special digital preservation lab, or to hope that the latest version of the software is still compatible. 

Of course, emulation doesn’t save the data in and of itself. Hard drives consistently fail, McCarthy and Cochrane point out—though a lot of information in preservation contexts is kept on magnetic tape, which has a much longer life as long as the devices for reading it are maintained. The Yale Library is starting to grapple with the fact that in the future, a great deal of its collection will need to be moved from one storage medium to another on a regular basis. The library will have to consider the continuing costs of replacing hard drives, maintaining tape readers, and supplying electricity, McCarthy explains. It’s a shift, she says, from capital expenditure to operating expenditures, but a worthwhile one.
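
Moving a collection from one storage medium to another is only safe if each copy can be shown to match its original. One common way to do that is to compare checksums before and after the move. The sketch below is a minimal illustration of that idea, not a description of the Yale Library's actual workflow; the function names `sha256_of` and `migrate` are invented here, and the code assumes the destination folder is not inside the source folder.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source_dir, dest_dir):
    """Copy every file to dest_dir and verify each copy.

    Returns a manifest mapping relative paths to checksums, which can be
    stored and re-checked at the next migration. Raises OSError if any
    copy does not match its original.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    manifest = {}
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)               # copy contents and timestamps
        before, after = sha256_of(src), sha256_of(dst)
        if before != after:
            raise OSError(f"checksum mismatch after copying {rel}")
        manifest[rel.as_posix()] = after
    return manifest
```

Calling `migrate("old_drive", "new_drive")` with two folder paths of your own would copy the files and hand back the manifest; keeping that manifest alongside the data lets a later check detect silent corruption.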

“It’s a big change for the library,” she says. But with funding, “it can be done. We can do it.”

In an age when so much information is produced every day, it’s easy to live in a kind of unrealized optimism that maybe everything will still be there when we need it—or to swim in the fear that nothing will survive. Neither of those, it turns out, is true. It’s within our grasp as a society to hang on to our records; we’re not fundamentally different from any other era in human history. It just takes a different process and a little more thought.

And thought is what is needed at the highest level. Richard Whitt, a former managing director at Google, wrote in a 2017 paper that—while libraries and other groups that selectively save data should be funded and supported—if society means to save on a grand scale, what’s required is a systematic method for protecting digital information. There should be some way, built into the bones of the Web and of software, for data to be collected and kept. “Unless you can deal with it at the source,” he told me, currently “there’s no way to catch it on the other end.”  One such option would be an open standard for data preservation, a public explanation of what designers and programmers can build into their work that will allow what’s made with it to be saved for the long term.

Openness—in software terms, that means publicly available and nonproprietary—is a big deal in data preservation, because quite a lot of information we might like to save is in formats that no one is allowed to break into and mimic. Data on Instagram and Facebook, for instance, would be very hard for emulation experts to save. Furthermore, when data is retrieved, if it was produced under a legal agreement that restricts its movement, that can effectively doom it over the long term. You still can’t see the data from the 1986 Domesday discs anywhere online. For legal reasons—even after having gone to all the work to get the data back—Inns can’t give them to anybody else, unless they themselves happen to own the original laser discs.

Another way of dealing with the problem proactively might be to think of data management as a new kind of fiduciary responsibility, Whitt suggests. An investment adviser or an accountant has an almost sacred duty to act for the benefit of their client, a duty that overrides their own interests and that makes them trustworthy in a way that transcends the average business relationship. What if, Whitt asks, there were entities tasked by their clients to take on a similar duty concerning data? Not the way Dropbox or Time Machine store data today, but something much more secure and ultimately lasting. In the same way that librarians save books, manuscripts, and information for future generations, these fiduciary entities would save our data for us. Part of the mission of the nonprofit Whitt created after he left Google, called GLIA Foundation, is to consider how such a system might be developed. These are grand, difficult visions, Whitt is quick to point out. But nothing ventured, nothing gained.

While change on the level of society may be needed to save data comprehensively, you can take steps to be your own digital archivist (see sidebar). And you can submit Web pages, books, audio, and video to the Internet Archive, a San Francisco nonprofit founded in 1996 that aims to preserve fleeting parts of the Web, simply by dragging and dropping files on its site. The Internet Archive has a glorious mix of antique credit-card sites, cantankerous restaurant owners’ declarations, and the front pages of venerable Pulitzer-winning newspapers now fallen before the ravages of economics: the trivial and the profound collected in a capacious attic trunk. “Preservation is not a binary,” says Kari Kraus, the University of Maryland professor who helped save early video games. “It’s always somewhere on a spectrum.” Save a little, and you’ll be doing more than what might have been done otherwise.

Kraus points out, furthermore, that the fondness we feel for our old data and games and long-forgotten software or hardware is a real driver of their rescue. “You can see how nostalgia and the affective role of one’s claim to the past serve as important almost-guarantors of preservation,” she reflects. “That psychological component is going to play a critical role.”

Indeed, in Yale’s Digital Archaeology and Preservation Lab, your own sense of nostalgia can surprise you. Examine an early PC’s keyboard button inscribed with a triangle, and you can’t remember what it did, but the places and the times when striking that key was something you did every day float in memory. The past has a kind of claim on you, whether it’s written down on paper or vellum or saved in the bits of a magnetic disk.

Technology and what we make with it don’t stand in isolation from us, a faceless void of numbers and code; it’s an intimate part of who we are and how our lives unfurl. And that may be its saving grace. 

Wikimedia Commons.



  • Maurice M Margulies, 3:22pm May 07 2020

    I have always been concerned about availability of digital data as programs, storage media and devices change. I have floppies that go back to 1986 that I cannot read because I do not remember the program used and lack an appropriate device. I should've saved my Commodore 64, as well as my 2000 iMac. Analog is much more available. I have family photographic prints that go back to the 1880s and are still in rather good condition. Also, I have shellac recordings from before WW I which are usable with my 78 rpm, 45 rpm and 33 1/3 rpm turntable.

  • Jean-Pierre de Villers, 12:26pm May 19 2020

    Disks and floppy disks are difficult to play on some of our machines today. What about videotapes made in the '60s, like Sony's 1/2-inch tapes? Little marvels were recorded on them. Nothing to do: they sleep in my basement. I was also very surprised to read in the New York Times of October 25, 2019 that the United States' nuclear arsenal will no longer rely on computer systems that use eight-inch floppy disks. The system called Strategic Automated Command and Control System, or SACCS, still in use today, will not use floppy disks anymore. I just threw mine in the wastebasket!!
