The Criminalized Poor and the Rise of Mega-Corpora

The poor have been degraded and criminalized in texts of all genres throughout the ages through denigrating adjectives and descriptions. Professor Tony McEnery of Lancaster University offers a unique and chilling view of how linguistic change reflects the changing treatment of the poor during the 17th century.

Tony McEnery. Photo: Tanja Säily.

Criminalizing the Poor in the 17th Century

On October 19-22, the Research Unit for Variation, Contacts and Change in English (VARIENG) organized the D2E – From data to evidence conference, gathering scholars of English language studies to discuss how big data, rich data, and uncharted data can affect, enhance, or hinder linguistic research.

The first plenary of the conference was given by Professor Tony McEnery, who spoke on using corpora to study the socioeconomic treatment of the poor through a linguistic lens. His talk offered a fascinating view of how large corpora support a sociolinguistic approach to both the use of derogatory terminology and the way that use changes decade by decade.

McEnery’s team explored over a billion words of 17th-century writing through the Early English Books Online (EEBO) corpus, which includes nearly every piece of literature printed in the UK, Ireland and British North America from the 15th to the 18th century. The team identified the most common words used to refer to the poor and examined their use in the texts to uncover patterns of meaning that reveal how the poor were treated, linguistically and socioeconomically, during the 17th century.

McEnery makes a compelling case by examining the evolution of terms such as rogue, beggar, vagrant, and vagabond in the 17th century, along with the language associated with them. The study paints an interesting picture of how literary and religious texts treated the terms and which modifiers were used with them.

Beggar, for example, was typically modified by adjectives showing an understanding of their poverty, such as poor, needy and miserable. Negative connotations were few at the beginning of the 17th century, but this changed sharply during the second decade, when the word sturdy came to be commonly attached to beggar, portraying beggars as able-bodied people who choose to beg rather than work. Likewise, drunkenness starts to be associated with them in the 1620s. Similarly, vagabonds were associated with negative modifiers or nearby words such as vile, loose, and whore, and rogues were closely associated with cheating, lying, and villain.

McEnery’s results demonstrate the prejudice shown towards the poor in our societies across the ages. They give an unsettlingly clear map of the frequency, dispersion, and connectivity of the vocabulary used to build a negative semiotics of poverty, tracing the changes decade by decade through the 17th century.
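To make the idea of tracking modifiers decade by decade concrete, here is a minimal Python sketch. It is not the method or tooling McEnery's team used: the function name `collocates_by_decade`, the `(year, text)` input format, the two-word window and the toy sentences are all assumptions made for illustration, and none of the text comes from EEBO.

```python
from collections import Counter, defaultdict


def collocates_by_decade(docs, node, window=2):
    """Count words found within `window` tokens of `node`, grouped by decade.

    `docs` is an iterable of (year, text) pairs; this toy format is an
    assumption for illustration, not the structure of any real corpus.
    """
    counts = defaultdict(Counter)
    for year, text in docs:
        decade = year // 10 * 10
        # Naive whitespace tokenization; real corpus tools do far more.
        tokens = text.lower().split()
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[decade][tokens[j]] += 1
    return counts


# Invented toy sentences, not historical data.
toy_docs = [
    (1603, "give alms to the poor beggar at the gate"),
    (1615, "the sturdy beggar refuses honest labour"),
    (1618, "a sturdy beggar was whipped"),
]

result = collocates_by_decade(toy_docs, "beggar")
print(result[1600].most_common(3))
print(result[1610].most_common(3))
```

Counts like these only point to passages worth reading closely. On their own they say nothing about what a modifier meant to writers of the period.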

Mega-corpora methodologies

In recent years, the evolution of corpora has given scholars access to larger masses of text than ever before. The rise of mega-corpora has been both a curse and a blessing: their vast size makes it possible to work routinely with texts on a scale never seen before, but their inclusiveness brings additional challenges of contextualization. When the boundaries of a corpus are not clearly mapped, or when a mega-corpus spans several well-contextualized corpora, extra effort is needed to keep the data representative.

According to McEnery, the methods of the study deserve even more scrutiny than the case results themselves. With his unique perspective on the field, he presents singularly convincing and well-considered insights on working with mega-corpora. Even though the results would have been worth many more plenaries, the essential core philosophy of McEnery’s methodology is both sobering and enthralling. “A corpus is more than a load of text. It needs linguistic tools to strip the essential parts of data.” Linguistic context is everything.

McEnery emphasizes that mega-corpora provide unique opportunities for synergy between the study of history and linguistics. While historians can help linguists by pointing out cultural contexts and frames, linguists can provide them with much needed linguistic context: whether a concept or change in semantics is relevant, and what it could mean within the linguistic frame.

While the vast size of modern corpora may tempt some researchers to rely on statistics and correlations, McEnery disagrees strongly with such methods. The most important factors in language are change and context, as he emphasizes: “Dynamism is the key…Close reading is the key. No mathematics will tell you what is happening there.” Meaning is never stable, and word frequency and connectivity can tell you only so much without a deep reading of the selected text segments.

Methods and results go hand in hand

McEnery’s main warning to academics concerns failed contextualization: not so much of the research texts and results themselves, but of the research texts in relation to the 20th century methodology used to study them. As our conceptual maps, and those of our societies, change, we must closely examine our own conceptual frames when doing research, because our assumptions about meaning may be fundamentally wrong when reading historical texts.

McEnery manages a rare feat: he touches on both methodological and sociolinguistic issues and raises important points in each. He shows how we can and should use mega-corpora and, topically, how the ways in which the poor were criminalized through language in the 17th century echo chillingly in our own time.

Text: Mika Loponen

Read more:

Funding for digital humanities research

Three researchers from the Faculty of Arts have received funding in the Academy of Finland's Digital Humanities 2015 call.

Terttu Nevalainen and Taru Nordlund received funding for their projects in the consortium Combining text and structured data in the sociolinguistic study of language change (Tekstin ja rakenteisen tiedon yhdistäminen kielenmuutoksen sosiolingvistisessä tutkimuksessa). Mikko Tolonen is part of the consortium Digital history and the transformation of the public sphere in Finland 1640–1910 (Digitaalinen historiantutkimus ja julkisuuden muutos Suomessa 1640–1910).

How do letters convey social meanings?

Private texts are a unique window onto the everyday language use of past times. Drawing on Finnish and English correspondence, Terttu Nevalainen and Taru Nordlund examine how spelling, morphological choices and new vocabulary convey social meanings such as influence or informality, and how these meanings change.

Because current tools do not readily allow text to be combined with language-external information, the project will build a series of interactive tools that let researchers examine the social nature of language use by linking text, metadata and visualization. Poika Isokoski from the University of Tampere and Eetu Mäkelä from Aalto University are taking part in the work.

The projects also investigate the reliability of source texts. Current letter editions, which are the most practical source in historical research among other fields, often modernize the spelling. The original form can be checked against the manuscripts; at the same time, it is possible to map which features are typically altered and thus make the editions more reliable for linguistic research.

Analysing public discourse by means of text mining

The consortium Digital history and the transformation of the public sphere in Finland 1640–1910 is based on cooperation among four partners: the Faculty of Arts at the University of Helsinki, Cultural History and Information Technology at the University of Turku, and the Digitisation Centre of the National Library of Finland.

The project studies and reassesses the scope, nature and transnational connections of Finnish public discourse in 1640–1910. It combines two complementary approaches, drawing on text mining of library metadata on the one hand and of digitised Finnish newspapers and periodicals on the other.

The consortium analyses how language boundaries, elite culture and popular debate, the reuse of texts and publication channels interacted with one another. As a significant new methodological innovation in the human sciences, it introduces the concept and workings of open scientific computing libraries, tailored to the specific features of a research problem, as a key research method in digital history.

Funding decisions of the Academy of Finland

Don’t save English, save the dying languages

England may have lost a mediocre cricket player in Peter Trudgill, but the world of linguistics gained a living legend.

Students of the Language Change Database Project course interviewed the noted sociolinguist and his wife, Jean Hannah, over coffee a day before his guest lecture at the Metsätalo building in Helsinki on March 18th. Their questions revolved around Trudgill’s experiences as a student at Cambridge and Edinburgh, his career choices, and the future of linguistics and academia.

In his youth Trudgill did indeed dream of playing cricket for England but ultimately his career in academia progressed quite organically. In hindsight, he said he could not “imagine anything better” and has enjoyed his time at several universities, including the University of Lausanne, Switzerland (home to “the best coffee served at a university”) and the University of Agder, Norway, where he is currently tenured.

Peter Trudgill and his wife Jean Hannah

The past is now

Unfortunately, a long career like Trudgill’s might be harder to achieve today than in the 1960s. In his view, one of the biggest challenges facing the modern academic world is zealous business thinking, which hurts the fields that are considered “unprofitable”, such as the Humanities.

Putting business first might prove shortsighted, as resources are needed now if the world wants to document languages that are under the threat of extinction. Studying those languages would shed light on the ways prehistoric languages have developed throughout human history.

Trudgill giving lecture (on the left), Professor Terttu Nevalainen (on the right)

The origins and evolution of human languages still remain obscured by lack of data.

As linguists, we can only hypothesise about how prehistoric languages may have functioned. The prevalent way to do that is through the Uniformitarian hypothesis: by observing languages in the present, we can understand what human languages must have been like in the past.

However, in his lecture titled “The Uniformitarian Hypothesis and Prehistoric Linguistics” Trudgill urged researchers to exercise caution when using the hypothesis to make generalisations. He stressed that linguists need to be mindful of chronological and geographical bias and not build models based solely on languages spoken in modern societies, which are highly atypical in the broader history of humankind. Living in “societies of strangers” where the vast majority of the people do not know each other is a very recent development in human history.

In the absence of time machines

Trudgill does not mean to say that observing modern languages is fruitless or that we should abandon the Uniformitarian hypothesis.

While we and our prehistoric ancestors share largely the same physiology and language faculties, we have no way of gathering any hard data on the prehistoric languages themselves (short of inventing a time machine). Thus we must hypothesise based on the workings of modern languages.

Effectively, Trudgill argues that more attention should be paid to the small, endangered and “remote” languages that are still spoken in a context that strongly resembles what prehistoric societies were like: small and tightly-knit groups of people who all know each other, or “societies of intimates”.

Indeed, research on such languages has yielded surprising nuggets of insight into what our prehistoric ancestors’ languages might have been like. Small communities give rise to language features that seem atypical and exotic if looked at from an Indo-European context, yet such features will most likely have been significantly more commonplace in prehistoric times.

Examples of such features abound in small languages.

One example given by Trudgill is Onya Darat, a language spoken on the island of Borneo, whose system of personal pronouns shows generational affiliation. In other words, its personal pronouns signal whether the person addressed belongs to the same generation as the speaker, a younger one or an older one.

Such a feature can only appear in a society where people know each other and are aware of everyone’s ancestry. Thus we can surmise that such complex structures linked to non-anonymity may have existed in prehistoric languages, even if they are exceedingly rare in modern ones.

Audience of Trudgill's lecture on March 18th

Trudgill likewise asserted during the coffee meetup that more linguists should be engaged in documenting small languages around the world, as they are verging on extinction in this era of globalisation and interconnectedness. Now is our last chance to record many of them for posterity.

Lamenting that he himself had not done more fieldwork in his career, Trudgill half-jokingly encouraged younger linguists to “forget about English” in favour of focusing on conserving these endangered languages. Hearing a Professor of English make such claims might sound strange, but it only illustrates the urgency of such conservationist efforts.

Contributors:

Text: Sofia Bergman and Toni Matikainen

Pictures: Saana Kallioinen and Ina Liukkonen

Interview questions, comments and proofreading: Sanna van Erk-Koivisto, Ida Mauko, Antti Siitonen and Ari Slioor

18 March Peter Trudgill: Sociolinguistic Typology and the Uniformitarian Hypothesis

Professor Peter Trudgill will visit the Department of Modern Languages and give a guest lecture entitled “Sociolinguistic Typology and the Uniformitarian Hypothesis”.

Date: Wednesday, 18th March, 14-16
Venue: Metsätalo auditorium 1 (ground floor)

Prof. Trudgill’s visit is hosted by the Academy project Reassessing Language Change at VARIENG.

Everyone is welcome to attend.

Abstract

One of the fundamental bases of modern historical linguistics is the uniformitarian principle. This principle states that knowledge of processes that operated in the past can be inferred by observing ongoing processes in the present. In this paper I present a sociolinguistic-typological perspective on this issue, where by “sociolinguistic typology” I mean a form of linguistic typology which is sociolinguistically informed and which investigates the extent to which it is possible to produce sociolinguistic explanations for why a particular language variety is like it is.

This work is based on the assumption that there is a possibility that certain aspects of social structure may be capable of having an influence on certain aspects of language structure. I argue that, insofar as the characteristics of individual human languages are due to the nature of the human language faculty, there cannot be any questioning of the uniformitarian principle. We have to assume that the nature of the human language faculty is the same the world over, and that it has been like that ever since humans became fully human. But what about if some of the characteristics of individual human languages are due to social factors?

Finnish-Russian cooperation on lexical typology

Last week, Russian Language and Literature hosted a group of five professors and fifteen students from the National Research University – Higher School of Economics (HSE), Moscow. The Faculty of Philology at HSE is one of the leading centers of theoretical and applied linguistics in Russia, comprising, among other things, three research laboratories.

The purpose of the visit was a three-day seminar on lexical typology. Special attention was given to methodological issues such as using corpora and questionnaires in data collection and developing semantic frames and maps for the analysis and interpretation of cross-linguistic observations. Researchers at HSE have created inventories of specific frames for a number of semantic fields, from temperature and pain to various types of motion, and applied the methodology to a wide range of languages.

Professor Rakhilina

According to Professor Ekaterina Rakhilina, lexical typology can also be studied from the perspective of non-native speakers. For example, when you learn a second language, your first language may affect your word choices in the second. A similar effect is observed with heritage language speakers, that is, people who live in an environment where their parents’ language is not the dominant language (e.g. Chinese speakers in the United States).

From personal contacts to formal agreement

In the recent past, there has been a lot of cooperation between individual researchers from the University of Helsinki and HSE, such as visits, meetings at conferences, and jointly prepared publications.

“The diversity of common research interests allows us to cooperate in many different fields of research, from Russian grammar to sociolinguistics,” says professor Rakhilina.

Now there is also a new agreement between the institutions on student exchange. “I was very impressed by the high quality of the posters and presentations given by the students from HSE,” says professor Ahti Nikunlassi. “In the spring term, two of our doctoral students will participate in the exchange programme, and we are looking forward to meeting the first exchange students from HSE at our department in January.”