Martin Mueller's blog
Verbs by prose and verse
Martin Mueller — Wed, 10/21/2009 - 14:24
Verbs are harder to talk about than nouns or adjectives. Especially with the most common verbs, their uses range so widely that differences in distribution are not easily accounted for. You have to dig deeper and look at the nouns and prepositions that give them their distinctive meanings. But that is a task for a later phase.
Adjectives in verse and prose
Martin Mueller — Wed, 10/21/2009 - 11:03
The story of adjectives adds some nuance to the story of nouns in Early Modern drama. If you have an interest in stories told by numbers, the first thing to note is that there are more verse markers than prose markers among the adjectives that discriminate sharply between verse and prose. The verse markers belong to a world of human aspiration: 'noble', 'fair', 'high', 'proud', 'gentle', 'strong', 'happy', 'bold', 'just', 'rich'.
Nouns in Early Modern drama by verse and prose
Martin Mueller — Wed, 10/21/2009 - 09:20
A cursory look at the distribution of nouns across verse and prose in the EMD corpus shows that the prosodic differences point to differences in genre and social register. This is hardly surprising, but it is instructive to observe how closely high-level generalizations about genre are mirrored in the low-level fabric of language. Verse with its lead words of 'king', 'blood', 'death' takes us into the world of tragedy.
Conjunctions, prepositions, and wh-words
Martin Mueller — Tue, 10/20/2009 - 21:22
There are about fifty words that occur at least 1,000 times and can be used as conjunctions, prepositions, adverbs or relative pronouns. Very roughly a quarter differ quite sharply in their use. Another quarter differ noticeably. Half do not differ very much. In the table below the words are classified as prose markers, verse markers or neither and grouped by four log value ranges that follow noticeable breaks between the log values. The figures next to each word mark rounded frequencies per 10K words for prose and verse.
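The normalization described above can be sketched in a few lines of Python. This is only an illustration: the excerpt does not define its "log values" (they may well be log-likelihood scores), so the base-2 log ratio, the threshold of 0.5, and the function names are all assumptions made for the example.

```python
import math


def per_10k(count, total_words):
    """Frequency of a word per 10,000 running words."""
    return 10_000 * count / total_words


def log_ratio(prose_rate, verse_rate):
    """Base-2 log of the prose/verse rate ratio (assumed measure, not the blog's).

    Positive means relatively more common in prose, negative in verse.
    Both rates must be greater than zero.
    """
    return math.log2(prose_rate / verse_rate)


def classify(prose_rate, verse_rate, threshold=0.5):
    """Label a word as a prose marker, a verse marker, or neither.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    lr = log_ratio(prose_rate, verse_rate)
    if lr >= threshold:
        return "prose marker"
    if lr <= -threshold:
        return "verse marker"
    return "neither"
```

With made-up rates, `classify(20, 10)` labels a word a prose marker and `classify(10, 10)` labels it neither. The cut-offs in the actual table follow breaks observed in the data rather than a fixed threshold.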
Pronouns in EMD prose and verse
Martin Mueller — Tue, 10/20/2009 - 20:53
With some interesting exceptions, personal pronouns are more common in prose, while possessive pronouns are more common in verse. 'Thou' is a strong verse marker, while 'you' is the strongest prose marker among pronouns. 'We' is also more common in verse, but in descending order of discriminatory force, 'they', 'she', 'he', 'it', and 'I' are prose markers.
How word distributions differ in prose and verse: the example of 'a' and 'from' in the EMD corpus
Martin Mueller — Tue, 10/20/2009 - 08:36
The following is a five-finger exercise about the distribution in verse and prose of the indefinite article 'a' and the preposition 'from'. On the basis of the G-test the former ranks first among words that are more common in prose, the latter second among words that are more common in verse. I deliberately skipped 'thy', which ranks first among words more common in verse, precisely because one might have an immediate explanation that even in Early Modern drama 'thy' is already more 'poetical'.
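For readers who have not met the G-test, here is a minimal sketch of the statistic for a single word compared across two sub-corpora. It uses the standard 2x2 log-likelihood formula, G = 2 Σ O ln(O/E); the function name and arguments are hypothetical, and nothing here is taken from the tooling actually used on the EMD corpus.

```python
import math


def g_test(count_a, total_a, count_b, total_b):
    """G statistic for one word in two sub-corpora (e.g. prose and verse).

    count_a, count_b: occurrences of the word in each sub-corpus
    total_a, total_b: total running words in each sub-corpus
    """
    # 2x2 table: rows are sub-corpora, columns are "this word" / "any other word".
    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    row_sums = [total_a, total_b]
    col_sums = [count_a + count_b, (total_a - count_a) + (total_b - count_b)]
    grand_total = total_a + total_b

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                e = row_sums[i] * col_sums[j] / grand_total
                g += o * math.log(o / e)
    return 2.0 * g
```

A word that occurs at the same rate in both sub-corpora scores zero, and the score grows as the rates diverge and as the counts get larger. The statistic itself carries no sign, so the direction (prose or verse) has to be read off the relative frequencies.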
Advice to the TCP: Be more like Project Gutenberg and practice corpus linguistics
Martin Mueller — Sun, 10/18/2009 - 11:35
Yesterday I wrote two long blog entries about the Text Creation Partnership, which very few people will read. This morning I have been pruning my arbor vitae hedge, and suddenly I found a way of putting the gist of those long entries into short advice. The various collections of the Text Creation Partnership will pass into the public domain some time after 2015, when they will be the largest full-text archive of Early Modern English in the world. In the meantime, the texts are freely available to members of the subscriber institutions.
Are the Text Creation Partnership texts sufficiently interoperable?
Martin Mueller — Sat, 10/17/2009 - 13:51
Are the TCP texts sufficiently interoperable to serve the research purposes of its scholarly users? This addresses the second criterion in my definition of retro-digitized texts good enough for scholarly purposes. The answer is 'no'. On the other hand, it would not be very difficult to move them into an interoperable format.
Are the Text Creation Partnership texts good enough for research purposes?
Martin Mueller — Sat, 10/17/2009 - 11:25
Are the Text Creation Partnership texts good enough for research purposes? By 'good enough' I mean two things:
"Fluent in Marlowe": Emily's and Sasha's successful adventures in data curation
Martin Mueller — Sat, 10/17/2009 - 10:23
The following is an excerpt from a report by Emily Anderson and Sasha Puchalla on a course assignment to check and correct the TCP EEBO transcription of Marlowe's Tamburlaine. They worked from a spreadsheet with a 'verticalized' representation of the text in which every word was a data row containing the spelling, the lemma, the part-of-speech tag, and five words of context preceding and following. This output was generated by Phil Burns' MorphAdorner program. The students were asked to check these data against the EEBO digital page image of the source text.
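To make the 'verticalized' layout concrete, the following sketch turns a string into one row per word with five words of context on either side. It is not MorphAdorner's output format: the field names are hypothetical, tokenization is a plain whitespace split, and the lemma and part-of-speech fields are left empty because filling them requires a real tagger.

```python
def verticalize(text, window=5):
    """Return one dict per word, with preceding and following context.

    Field names are illustrative only; lemma and pos are placeholders
    that a tagger would fill in.
    """
    tokens = text.split()  # naive whitespace tokenization
    rows = []
    for i, token in enumerate(tokens):
        rows.append({
            "spelling": token,
            "lemma": None,  # placeholder
            "pos": None,    # placeholder
            "left": " ".join(tokens[max(0, i - window):i]),
            "right": " ".join(tokens[i + 1:i + 1 + window]),
        })
    return rows
```

Each row can then be written out as a spreadsheet line, so that a reader checking one spelling against the page image sees enough surrounding text to locate it.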
