We have a pathetically tiny corpus of texts that predate the Roman collapse, and fewer still from before the Bronze Age Collapse. The ones we do have generally survived because they were recopied more often than the rest. Most of the works we have from classical antiquity derive from copies made in Charlemagne’s era, and countless more are referenced that have never been found. Aristotle’s Poetics is a good example — the volume on tragedy survived by blind luck via an Arabic translation, and the volume on comedy is lost forever (ecce Eco). Confucius’ Analects only survived the Qin dynasty because someone hid a copy behind a wall, most of his contemporaries’ work having been burned and buried along with said contemporaries their own selves.
Human history is rife with examples of literary canons largely destroyed by the simple attrition of civilizations rising and falling in their usual messy ways. The things that survive various Yugas Kali are obsessively copied and recopied, like the Masoretic Text. Modern technology means many, many orders of magnitude more copies of modern data, which one might think bodes well for its survival. The new problem is encoding.
Linear A documents are indecipherable: the script recorded the Minoan language, which was never written down anywhere else; the Mycenaeans just adapted the old script (as Linear B) to write their own archaic Greek, and the subsequent Greek Dark Ages forgot writing altogether. So when you pick up a Linear A tablet, you lack the techniques required to read it, even though the medium has survived for millennia. Minoan decoder rings are not common these days.
Modern information storage does not survive for millennia, as Brewster Kahle would affirm. But say it did. If you find a thousand-year-old hard drive in the 31st century (unwittingly used as a brick in a 24th-century temple, say), how do you decipher it? Assuming you have reinvented the scanning electron microscope, I mean. Once you’ve destroyed a few hundred figuring out just how we stored data back at the dawn of the American Empire, all you have is a sequence of bits. You might, if you’re clever, figure out that ASCII corresponds to the Latin alphabet. You’d be much less likely to figure out, say, Unicode from first principles.
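A toy sketch of the asymmetry (Python; the byte strings are my own hypothetical finds, not anything from an actual dig): ASCII is a fixed-width, one-byte-per-character mapping that yields to frequency analysis, while UTF-8 hides each code point behind variable-length framing that looks like noise without the spec.

```python
# Hypothetical recovered bytes: a pure-ASCII fragment.
ascii_bytes = b"HELLO WORLD"
# Every ASCII byte fits in 7 bits and maps one-to-one onto a character,
# so frequency analysis of fixed-width groups could recover the alphabet.
assert all(b < 0x80 for b in ascii_bytes)
print(ascii_bytes.decode("ascii"))

# The same name in Greek, encoded as UTF-8.
utf8_bytes = "Ἀριστοτέλης".encode("utf-8")
# UTF-8 packs one code point into 1-4 bytes behind lead/continuation
# bit patterns; the character count no longer matches the byte count,
# and without the spec the framing is far harder to guess.
print(len("Ἀριστοτέλης"), "code points ->", len(utf8_bytes), "bytes")
```

The 31st-century archaeologist sees only the second kind of stream, with no guarantee the text is in a script they know.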
Now how many old DVDs would it take you to derive the DVD video format standard? That won’t be a problem for long; the discs will only last a couple hundred years in perfect storage, and you’d have to keep 200-year-old machinery in working order to recopy them… I’m sure you could build a DVD player from the specs, but they’re probably stored in whatever CAD format was in vogue at the time.
Lots of contemporary data, lots of the accumulating history of our time, is stored in ways that require special programs to decipher — proprietary file formats, faddish databases… Urbit, heaven forfend. These programs are stored on the same essentially ephemeral media as the data itself. Losing a text is not a matter of forgetting an alphabet over a thousand years, it’s a matter of forgetting an obsolete program over a decade. Ever tried to read a WriteNow file from 1987 that you stored on a floppy? Even better, the programs you need might run on architectures that haven’t existed for a long time; can you read TERNAC assembly?
Perhaps you can find an intact binary for that CAD program to read the specs to build a DVD player on its… original DVD installation media! Good luck finding the source code, that was a trade secret and there weren’t many copies. And then you need a computer that will emulate a computer old enough to emulate a computer old enough to run it, of course.
Continual recopying takes effort and energy. Even if there is no collapse — and I challenge anyone to find a Holocene millennium in which there was nothing that deserves the name collapse — much falls by the wayside. Most early silent films are already lost forever. Without Alan Lomax most early American folk music would be lost, and without the Internet Archive much of the early web… Empirically, most information anywhere gets discarded.
(A book that builds its own translator is called a genome.)