To continue the theme of corpus linguistics and digital humanities, the VARIENG Research Unit is hosting a guest lecture by Professor William A. Kretzschmar Jr. (University of Georgia, University of Glasgow, University of Oulu) on “Complex Systems for Corpus and Historical Linguistics”.
When: Friday 28 February 2014, at 2 p.m.
Where: Metsätalo lecture room 14 (Unioninkatu 40B, 3rd floor).
Everybody is warmly welcome to attend.
As shown in The Linguistics of Speech (2009), the basic elements of speech (i.e., language in use, what people actually say and write to and for each other) correspond to what has been called a “complex system” in sciences ranging from physics to ecology to economics. After a non-technical introduction to the principles of complexity science, this talk will apply properties of complexity to corpus linguistics and historical linguistics.
Complex systems are made up of massive numbers of components interacting with one another, and this interaction results in self-organization and emergent order. For speech, the order that emerges is simply the fact that our use of words and other linguistic features is significantly clustered in the spatial, social, and textual groups in which we actually communicate. In both texts and regional/social groups, the frequency distribution of features follows the same pattern: an asymptotic hyperbolic curve (or “A-curve”).
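As a rough illustration of the kind of frequency distribution described above, the following Python sketch ranks the word types in a list of tokens by frequency. The function name `rank_frequency` and the toy sentence are assumptions made for this example and are not taken from the lecture or from The Linguistics of Speech. In a real corpus, plotting these counts against rank would be one simple way to look for the steeply falling, long-tailed shape that the abstract calls the A-curve.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word_type, count) tuples, most frequent type first.

    Ties in frequency are broken alphabetically so the output is stable.
    """
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, n) for rank, (word, n) in enumerate(ranked, start=1)]


# Toy input invented for illustration; it is not corpus data.
sample = "the cat and the dog and the bird saw the fox".split()

for rank, word, n in rank_frequency(sample):
    print(rank, word, n)
```

Even in this tiny sample, one type accounts for several tokens while most types occur only once.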
These properties are easily observed in corpora, and should guide the analyses we make from corpora. In corpus and historical linguistics, first, the scaling property of complex systems tells us that there are no representative speakers, and so our observation of any small group of speakers is unlikely to represent any group at a larger scale. This matters because limited evidence is the necessary condition of many of our historical studies. Second, the fact that the underlying complex distributions follow the 80/20 rule, i.e. 80% of the word tokens in a data set will be instances of only 20% of the word types, gives us an effective tool for estimating the status of historical states of the language.
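The 80/20 relationship between tokens and types can be checked on any tokenized data set. The sketch below is a minimal illustration under stated assumptions: the function name `top_type_token_share`, its `type_fraction` parameter, and the constructed token list are all hypothetical, and the counts were chosen so that the arithmetic comes out evenly rather than drawn from any real corpus.

```python
from collections import Counter


def top_type_token_share(tokens, type_fraction=0.2):
    """Share of all tokens accounted for by the most frequent word types.

    type_fraction is the proportion of types to keep (0.2 = top 20%).
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    freqs = sorted(Counter(tokens).values(), reverse=True)
    # Always keep at least one type, even for very small vocabularies.
    n_top = max(1, round(len(freqs) * type_fraction))
    return sum(freqs[:n_top]) / sum(freqs)


# Constructed example (not real data): 10 types and 100 tokens in total.
# Two types occur 40 times each; the other eight share the remaining 20.
tokens = ["w1"] * 40 + ["w2"] * 40
for i, n in enumerate([3, 3, 3, 3, 2, 2, 2, 2], start=3):
    tokens += [f"w{i}"] * n

print(top_type_token_share(tokens))
```

Here the top 20% of types (2 of 10) cover 80 of the 100 tokens. A real historical data set would need to be tested rather than assumed to match this proportion exactly.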
Besides issues of sampling, the frequency-based approach also affects how we can think about change. The A-curve immediately translates to the S-curve now used to describe linguistic change, and explains why “change” cannot reasonably be considered to be a qualitative shift. The Great Vowel Shift, for example, is a useful generalization, but complexity science explains why we should not expect it ever to be “complete” or to appear in the same form in different places. Finally, complexity science helps us to see and understand how English continues to “emerge” around us in the ongoing complex system of our speech, so that any process of “standardization” does not just lead inevitably to Modern English, but must be understood as a limited and highly specialized part of the history of English.
Kretzschmar, William A., Jr. 2009. The Linguistics of Speech. Cambridge: Cambridge University Press.