William A. Kretzschmar Jr.: “Complex Systems for Corpus and Historical Linguistics”

To continue the theme of corpus linguistics and digital humanities, the VARIENG Research Unit is hosting a guest lecture by Professor William A. Kretzschmar Jr. (University of Georgia, University of Glasgow, University of Oulu) on “Complex Systems for Corpus and Historical Linguistics”.

When: Friday 28 February 2014, at 2 p.m.
Where: Metsätalo lecture room 14 (Unioninkatu 40B, 3rd floor).

Everybody is warmly welcome to attend.


As shown in The Linguistics of Speech (2009), the basic elements of speech (i.e., language in use, what people actually say and write to and for each other) correspond to what has been called a “complex system” in sciences ranging from physics to ecology to economics. After a non-technical introduction to the principles of complexity science, this talk will apply properties of complexity to corpus linguistics and historical linguistics.

Complex systems are made up of massive numbers of components interacting with one another, and this results in self-organization and emergent order. For speech, the order that emerges is simply the fact that our use of words and other linguistic features is significantly clustered in the spatial and social and textual groups in which we actually communicate. In both texts and regional/social groups, the frequency distribution of features occurs as the same pattern: an asymptotic hyperbolic curve (or “A-curve”).

These properties are easily observed in corpora and should guide the analyses we base on them. First, the scaling property of complex systems tells us that there are no representative speakers, so our observation of any small group of speakers is unlikely to represent any group at a larger scale, and limited evidence is the necessary condition of many of our historical studies. Second, the fact that underlying complex distributions follow the 80/20 rule, i.e. 80% of the word tokens in a data set will be instances of only 20% of the word types, gives us an effective tool for estimating the status of historical states of the language.
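
As a rough illustration of the 80/20 rule described above, the following Python sketch ranks word types by frequency (the ranked list is what an A-curve would plot) and measures what share of all tokens the most frequent 20% of types account for. It is only a minimal sketch: the function names `rank_frequency` and `top_share` and the toy token list are invented for this example and do not come from the talk or from any particular corpus tool.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return the frequency of each word type, most frequent first."""
    return sorted(Counter(tokens).values(), reverse=True)


def top_share(tokens, type_percent=20):
    """Share of all tokens covered by the top `type_percent` % of types."""
    freqs = rank_frequency(tokens)
    if not freqs:
        raise ValueError("token list is empty")
    # Number of types in the top slice (at least one type).
    k = max(1, len(freqs) * type_percent // 100)
    return sum(freqs[:k]) / sum(freqs)


# Toy data invented for illustration: 10 word types, 100 tokens.
toy_counts = {"the": 50, "of": 20, "and": 10, "to": 6, "a": 4,
              "in": 3, "that": 3, "is": 2, "was": 1, "he": 1}
tokens = [word for word, n in toy_counts.items() for _ in range(n)]

print(rank_frequency(tokens))  # steeply falling counts, i.e. an A-curve shape
print(top_share(tokens))       # 0.7
```

In this toy list the two most frequent types cover 70% of the tokens; for a real corpus the share would have to be measured, not assumed to be exactly 80%.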

Besides issues of sampling, the frequency-based approach also affects how we can think about change. The A-curve immediately translates to the S-curve now used to describe linguistic change, and explains that “change” cannot reasonably be considered to be a qualitative shift. The Great Vowel Shift, for example, is a useful generalization, but complexity science explains why we should not expect it ever to be “complete” or to appear in the same form in different places. Finally, complexity science helps us to see and understand how English continues to “emerge” around us in the ongoing complex system of our speech, so that any process of “standardization” does not just lead inevitably to Modern English, but must be understood as a limited and highly specialized part of the history of English.


Kretzschmar, William A., Jr. 2009. The Linguistics of Speech. Cambridge: Cambridge University Press.

Narratives of exile – guest lecture by Prof. Galin Tihanov, 26 February 2014 at 6 p.m.

Professor Galin Tihanov (Queen Mary, University of London) will visit the Department of Modern Languages at the University of Helsinki to give a lecture.


Time: Wednesday, February 26th, 2014 at 6–8 p.m.
Venue: room 6, Metsätalo (Unioninkatu 40B, 3rd floor)

About the lecturer: Galin Tihanov holds the George Steiner Chair of Comparative Literature at Queen Mary, University of London. He was previously Professor of Comparative Literature and Intellectual History and founding co-director of the Research Institute for Cosmopolitan Cultures at the University of Manchester. His most recent research has been on exile, cosmopolitanism, and transnationalism. He is the author of The Master and the Slave: Lukács, Bakhtin, and the Ideas of Their Time (2000) and the co-author of A Companion to the Works of Robert Musil (2010) and Critical Theory in Russia and the West (2011).

Digital Humanities, Shakespeare, and text visualisation: Events in February

Dear all,

In February, Heather Froehlich from the University of Strathclyde, Glasgow will visit the English subject and the Varieng Research Unit at the Department of Modern Languages. Heather studies representations of gender in Early Modern London plays as part of the Visualizing English Print 1470–1800 project, and she works with DocuScope, a text analysis program that provides interactive visualisation tools for corpus-based rhetorical analysis. In her research she applies corpus-linguistic methods to Early Modern printed texts and Early Modern drama, particularly with regard to gender.

On 17–18 February, Heather will give two lectures and a workshop on Digital Humanities, corpus linguistics, and text analysis software – you are warmly welcome to attend! The events will be of interest to linguists, literary scholars, and anyone interested in Digital Humanities, students and researchers alike.

The Twitter hashtag for the events is #daftpunkDH.

See below for abstracts and details. Welcome!

Anni Sairio
Tanja Säily
Terttu Nevalainen


Monday, 17 February, 2–4 pm, Metsätalo (Unioninkatu 40), lecture room 4

“Drag and Drop it, Zip-Unzip it, View It, Code It: What are Digital Humanities?”

‘Digital Humanities’ has been gaining much traction as a buzzword in higher education. Are the digital humanities a method, a theory, or a social movement? Or are they a shifting mode of humanistic inquiry in which traditional humanities work is supplemented by computer-assisted queries? In a roundtable format, we will discuss the role of the digital in, and alongside, more traditional humanities work.

Kirschenbaum’s essay ‘What Is Digital Humanities and What’s It Doing in English Departments?’ (2012) discusses the ways in which English departments are an ideal space for digitally inflected humanistic inquiry. But are English departments the only place in which it can thrive? Are they the best place for this kind of inquiry? What can the digital humanities learn from their contemporaries in computer science, history, and languages? What can these groups learn from the digital humanities?

Recommended reading:
Kirschenbaum, Matthew (2012). ‘What Is Digital Humanities and What’s It Doing in English Departments?’. Debates in the Digital Humanities, ed. M.K. Gold. http://dhdebates.gc.cuny.edu/debates/text/38
Underwood, Ted (2011). ‘Why we don’t actually want to be the next big thing in literary studies’. The Stone and the Shell. http://goo.gl/T0eCvJ
Wilkens, Matthew (2012). ‘Canons, Close Reading, and the Evolution of Method’. Debates in the Digital Humanities, ed. M.K. Gold. http://dhdebates.gc.cuny.edu/debates/text/17


Tuesday, 18 February, 10 am–12 noon, Main Building, auditorium XII

“What Do You Do with Millions of Words?”

With the rise of digitized collections of historical documents, we suddenly have far more information than we know how to address as linear readers. This talk takes a practical, hands-on approach to doing digital humanities research: What can a computer tell us that a human reader can’t, and how does the human inform and guide the computer-aided analysis? Building on principles of corpus linguistics, I will introduce DocuScope, a rhetorical analysis program, and address ways of approaching very large corpora with computer-aided analysis. We will also discuss other possibilities for DocuScope in use, such as custom dictionary building, visualization, ways other tools can supplement DocuScope’s output, and data management. Special attention will be given to issues surrounding scaling up from Shakespeare to a corpus of Early Modern drama to all of EEBO-TCP and developing subcorpora, all with regard to the difficulties and benefits of using progressively larger datasets.

Recommended Reading:
Ishizaki, Suguru and David Kaufer (2011). ‘Computer-aided Rhetorical Analysis’. Applied Natural Language Processing and content analysis: Identification, Investigation, and Resolution, ed. Philip McCarthy and Chutima Boonthum. Idea Group Inc (IGI), 275–291.
Hope, Jonathan and Michael Witmore (2010). ‘The hundredth psalm to the tune of “Green Sleeves”: Digital Approaches to the Language of Genre’. Shakespeare Quarterly, vol. 61, no. 3, pp. 357–90.
Witmore, Michael (2010). ‘Text, The massively addressable object’. Wine Dark Sea. http://winedarksea.org/?p=926


DocuScope workshop on Tuesday, 18 February, 1–4 pm, Metsätalo (Unioninkatu 40), room A114

In this workshop we will explore DocuScope and its functions by trying it out on the Corpus of Early English Correspondence (CEEC), a letter corpus developed in the Varieng Research Unit at the University of Helsinki: we will find out how the data in these letters can be visualised and what a rhetorical analysis tool can add to standard corpus-linguistic methods.

To learn more about DocuScope, see http://www.cmu.edu/hss/english/research/docuscope.html
To learn more about Heather, see http://hfroehlich.wordpress.com/ and @heatherfro on Twitter.