About DATA and Literary Informatics
DATA is an acronym for 'digitally assisted text analysis'. The operative word here is 'assisted'. There is no claim that doing things digitally is a finer or cooler thing than plain old reading. There is merely a claim that in some situations the digital manipulation of texts comes in handy and lets you do things that otherwise would take much longer or might be impracticable altogether. Whenever it does, texts for a while become 'data', a word that grates on the humanist's ear, even though it is a perfectly good and simple Latin word for the 'given'. I have been told that quite recently a French medievalist said to his PhD student that "L'ordinateur est un instrument de déshumanisation de la recherche et de la désincarnation du vivant" ("The computer is an instrument of the dehumanization of research and of the disincarnation of the living"). But if we speak of the 'disincarnation of the living', reading and writing are much greater sins than digital textuality, a fact known not only to Plato in the Phaedrus but to Shakespeare's peasant rebel Jack Cade in his indictment of Lord Say:
Thou hast most traitorously corrupted the youth of the realm in erecting a grammar school ... It will be proved to thy face that thou hast men about thee that usually talk of a noun and a verb and such abominable words as no Christian ear can endure to hear. (2 Henry VI 4.7.30-37)
Whatever doubts one may harbour over the digital manipulation of texts, it will not do to think of it as a technologizing of something that should not be technologized: the written word has always already been technologized, and the distance between the written and the spoken is much more consequential than the distance between the printed and the digital word. It is more a matter of a familiar vs. an unfamiliar technology, with the attendant calculus, tacit or explicit, of what is quite literally 'worthwhile'.
To think of text as DATA in terms of this acronym is to move into the realm of Literary Informatics. This is not yet a common term of art. A Google search retrieves just nineteen hits, and half a dozen of them refer to my use of it. I did not, however, coin the term. More common is the term Cultural Informatics (5,900 hits), but this pales before Bioinformatics and its variant spellings, which add up to some 13 million hits. Note that this paragraph about Literary Informatics is itself an instance of it, although a very simple-minded one.
If you think of Literary Informatics as an intriguing topic, it may be more helpful to explore its relations with Bioinformatics than to think of it as a subset of Cultural Informatics. The reason is that much of Bioinformatics is a very peculiar form of text analysis, where the Book of Life is imagined as a very large text written in an alphabet of the four letters A, G, C, T, which stand for the building blocks of DNA. A human genome is a text of ~six billion such letters or 'base pairs', as the biologists call them.
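The kinship between the two kinds of text analysis can be made concrete with a minimal sketch in Python. It assumes nothing more than that a text is a string of characters: the same counting routine is applied to a DNA fragment and to a line of English. The function name `ngram_counts` and both sample strings are invented for illustration and stand for no particular tool or corpus.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every overlapping run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# An invented DNA fragment, written in the four-letter alphabet A, G, C, T.
dna = "GATTACAGATTACA"

# An ordinary line of English, written in a larger alphabet.
line = "to be or not to be"

# The same routine serves both texts; only the alphabet differs.
dna_counts = ngram_counts(dna, 3)
line_counts = ngram_counts(line, 5)

print(dna_counts.most_common(3))
print(line_counts.most_common(3))
```

Real work in either field goes far beyond counting, but the sketch shows why a biologist's string of letters and a literary scholar's string of letters can be handled with the same elementary tools.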
Can one usefully think of a 'cultural genome'? Some years ago, Peter Robinson published an article in Nature (Aug. 27, 1998) that describes the use of phylogenetic software to trace the relationships of the 58 different manuscripts of The Wife of Bath's Prologue. The approach is rooted in the family tree as a fundamental model of thinking in both the biological and the philological realms. This particular example comes from the highly specialized and technical subdiscipline of textual criticism, but there may be broader ways in which literary scholars, in their different ways with texts and genres, can learn from the biologist's ways with genomes and species.
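The family-tree idea rests on a simple intuition: witnesses that share more readings are likely to be more closely related. The sketch below illustrates only that first step, a table of pairwise disagreements, and is not the method of the Nature article or of any phylogenetic package. The four sigla (W1 to W4) and their readings are invented; each list records, for five hypothetical points of variation, which of two competing readings (0 or 1) a witness has.

```python
from itertools import combinations

# Invented witnesses: one 0/1 reading at each of five points of variation.
witnesses = {
    "W1": [0, 0, 1, 0, 1],
    "W2": [0, 0, 1, 1, 1],
    "W3": [1, 1, 0, 0, 0],
    "W4": [1, 1, 0, 1, 1],
}


def disagreements(a, b):
    """Number of points of variation at which two witnesses differ."""
    return sum(1 for r, s in zip(a, b) if r != s)


# Distance for every unordered pair of witnesses.
distances = {
    (p, q): disagreements(witnesses[p], witnesses[q])
    for p, q in combinations(sorted(witnesses), 2)
}

for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, d)

# The pair that disagrees least is the first candidate for a shared ancestor.
closest = min(distances, key=distances.get)
print("closest pair:", closest)
```

Tree-building software starts from evidence of this kind and then searches for the branching pattern that best explains all the agreements and disagreements at once, which is where the real difficulty lies.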
Martin Mueller
Professor of English and Classics
