University of Helsinki linguistics among the world’s best in the QS rankings

The University of Helsinki’s philosophy and linguistics disciplines climbed even higher this year in the QS international rankings by subject. Both are among the world’s top 50.

The University of Helsinki’s philosophy discipline ranked 30th this year (46th last year) and linguistics 43rd (51st last year).

Modern languages, English language and literature, and history kept their places among the top 100. Archaeology was included in the rankings for the first time this year. It, too, is placed in the 51–100 band.

Read more

Speech is processed in chunks

How do we understand a continuous stream of speech even though the capacity of our working memory is small? A joint project of linguists and neuroscientists is now investigating the question. The answer may lie in linguistic chunking.

Anna Mauranen hopes that the results of the research project will help in diagnosing language disorders and in language learning. Photo: Mika Federley

Read more about the project on the Faculty of Arts website

The Criminalized Poor and the Rise of Mega-Corpora

The poor have been degraded and criminalized in texts of all genres throughout the ages through denigrating adjectives and descriptives. Professor Tony McEnery from Lancaster University provides us with a unique and chilling view on how linguistic changes portray the changing treatment of the poor during the 17th century.

Tony McEnery. Photo: Tanja Säily.

Criminalizing the Poor in the 17th Century

On October 19-22, the Research Unit for Variation, Contacts and Change in English (VARIENG) organized the D2E – From data to evidence conference, gathering scholars of English language studies to discuss how big data, rich data, and uncharted data can affect, enhance, or hinder linguistic research.

The first plenary of the conference was given by Professor Tony McEnery, who spoke on using corpora to study the socioeconomic treatment of the poor through a linguistic lens. Professor McEnery’s speech provided a fascinating view of how large corpora can support a sociolinguistic approach both to the use of derogatory terminology and to the changes that happen decade by decade.

McEnery’s team explored over a billion words of writing from the 17th century through the Early English Books Online (EEBO) corpus, which includes nearly every piece of literature printed in the UK, Ireland and British North America from the 15th to the 18th century. The team identified the most common words used to refer to the poor and examined their use in the texts to uncover patterns of meaning that reveal how the poor were treated, linguistically and socioeconomically, during the 17th century.

McEnery makes a compelling case by examining the evolution of terms such as rogue, beggar, vagrant, and vagabond in the 17th century, as well as the language associated with these words. The study paints an interesting picture of how literary and religious texts treated the terms and which modifiers were used with them.

Beggar, for example, was typically modified by adjectives showing an understanding of their poverty, such as poor, needy and miserable. There were few negative connotations at the beginning of the 17th century, but this changed sharply during the second decade, when the word sturdy came to be commonly attached to beggar to portray beggars as able-bodied people who chose to beg rather than work. Likewise, drunkenness started to be attached to them in the 1620s. Similarly, vagabonds were associated with negative modifiers or close-proximity words such as vile, loose, and whore, and rogues with cheating, lying, and villain.

McEnery’s results demonstrate the prejudices held against the poor in our societies across the ages, providing us with an unsettlingly clear map of the frequency, dispersion, and connectivity of the negative vocabulary used to create the negative semiotics of poverty, and tracing the changes decade by decade through the 17th century.
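For readers unfamiliar with corpus methods, the kind of decade-by-decade collocate count described above can be sketched in a few lines of Python. This is only a minimal illustration, not McEnery’s actual procedure or tooling: the sentences in `TOY_CORPUS` are invented for the example and are not taken from EEBO, and the function name `collocates_by_decade` and the two-word window are assumptions made here.

```python
from collections import Counter, defaultdict
import re

# Hypothetical toy corpus: (year, text) pairs invented for illustration only.
# These are NOT quotations from EEBO.
TOY_CORPUS = [
    (1603, "a poor beggar asked for bread"),
    (1608, "the needy beggar was given alms"),
    (1615, "a sturdy beggar refused to work"),
    (1618, "the sturdy beggar was whipped"),
    (1624, "a drunken beggar lay in the street"),
]


def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1615 -> 1610."""
    return year - year % 10


def collocates_by_decade(corpus, node, window=2):
    """Count words within `window` tokens of `node`, grouped by decade."""
    counts = defaultdict(Counter)
    for year, text in corpus:
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start = max(0, i - window)
            # Words to the left and to the right of the node word.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts[decade_of(year)].update(neighbours)
    return counts


result = collocates_by_decade(TOY_CORPUS, "beggar")
for decade in sorted(result):
    print(decade, result[decade].most_common(3))
```

A real study would work with far more text, filter out function words such as "a" and "the", and use a statistical association measure rather than raw counts. Even then, in line with McEnery’s own emphasis on close reading, such counts only point to passages that still need to be read in context.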

Mega-corpora methodologies

In recent years, the evolution of corpora has given scholars access to larger volumes of text than ever before. The rise of mega-corpora has been both a curse and a blessing: their vast sizes make it possible to routinely work with texts on scales never seen before, but their inclusiveness has brought additional challenges with contextualization. When the boundaries of a corpus are not clearly mapped, or when a mega-corpus spans the boundaries of various well-contextualized corpora, extra effort is needed to maintain the representativeness of the data.

According to McEnery, the methods of the study deserve even more scrutiny than the case results themselves: with his unique perspective in the field, McEnery presents singularly convincing and well-thought-out insights on working with mega-corpora. Even though the results would have been worth many more plenaries, the core philosophy of McEnery’s methodology is both sobering and enthralling. “A corpus is more than a load of text. It needs linguistic tools to strip the essential parts of data.” Linguistic context is everything.

McEnery emphasizes that mega-corpora provide unique opportunities for synergy between the study of history and linguistics. While historians can help linguists by pointing out cultural contexts and frames, linguists can provide them with much needed linguistic context: whether a concept or change in semantics is relevant, and what it could mean within the linguistic frame.

While the vast sizes of modern corpora may sway some researchers to rely on statistics and correlations alone, McEnery disagrees strongly with such methods. The most important factors in language are change and context, as he emphasizes: “Dynamism is the key…Close reading is the key. No mathematics will tell you what is happening there.” Meaning is never stable, and there is only so much that word frequency and connectivity can tell you without a deep reading of the selected text segments.

Methods and results go hand in hand

McEnery’s main warning to academics concerns failed contextualization: not so much a failure to contextualize the research texts and results as a failure to see those texts in relation to the 20th century methodology used to study them. As our conceptual maps, and those of our societies, change, we must closely examine our own conceptual frames when doing research, as our concepts of meaning may be fundamentally wrong for reading historical texts.

McEnery manages a rare feat: touching on both methodological and sociolinguistic issues and bringing up important aspects of both. He shows how we can and should use mega-corpora and, topically, how the ways in which the poor were criminalized through linguistic means in the 17th century echo chillingly in our own time.

Text: Mika Loponen

Read more:

Big Data is a great new medium, but it is no silver bullet

For four exciting days in October the d2e: From Data to Evidence conference took place in the University Main Building. As conference assistants, we were able to take part in the adventure and absorb a multitude of information on the trends of current linguistic research.

Photo by Tanja Säily

Photo by Jukka Suomela.

The conference themes were Big Data, Rich Data, Uncharted Data. The aim was to stimulate discussion of the benefits and challenges of past, current and future research in corpus linguistics. The conference was hosted by the University of Helsinki and the Research Unit for Variation, Contacts and Change in English (VARIENG).

More is better – also when it comes to data?

In her opening words the director of VARIENG, Terttu Nevalainen, quoted the now almost clichéd truism: “there’s no data like more data” while simultaneously questioning the traditional meaning of “more data”.

In corpus linguistics, researchers seem to be continuously striving for ever vaster datasets and larger corpora. However, it is worth remembering that, in addition to big data and its huge volumes, rich data and uncharted data also meet the definition of “more data” in every sense of the expression.

Exploring big data, rich data and uncharted data

The numerous benefits of big data, and the way it allows research into areas and phenomena that were previously inaccessible, were at the heart of many sessions. All of the plenary speakers (Mark Davies, Tony McEnery, Päivi Pahta and Jane Winters) as well as many others, such as Antoinette Renouf and Jack Grieve, touched on the topic. They also drew attention to the shortcomings of big data and the importance of close reading, even while working with large datasets.

An innovative example of utilizing rich data was Marie-Louise Brunner, Stefan Diemer and Selina Schmidt’s corpus of Skype conversations, in which they had enriched their corpus of informal, academic dialogue by including the video component, orthographic transcription and pragmatic annotations. This kind of study and corpus can be very useful for studying, for example, communication in English as a Foreign Language and English as a Lingua Franca, as well as for diverse types of multimodal research.

Previously uncharted data, which has not yet been systematically mapped, was also used in inventive ways in several research projects. For example, no one has done as detailed a study as Lucia Siebers on African American letters from the 18th and 19th centuries. Susanna Mäkinen, on the other hand, examined how slaves were characterized in the advertising sections of newspapers in the same period. She presented her findings on how the terminology and characterization varied in Massachusetts, New York and South Carolina newspapers.

Close reading takes time, money and effort, but is indispensable

In many ways the conference closed on much the same note as it opened.

In his demonstration of the process of compiling and editing The Historical Thesaurus of English, Marc Alexander touched upon all three themes of the d2e conference. He reiterated the strong claim heard in many of the sessions and plenaries that big data does not excuse researchers from close reading. His team led by example: despite the huge undertaking that the colossal thesaurus represented, they used a bottom-up approach without a predetermined theory that might warp the results.

Alexander reminded the audience that it is our job as experts to convince financiers that spending time getting fully acquainted with your data is a worthwhile use of research hours. At the same time, researchers themselves must possess the energy and integrity that such a labour-intensive task requires.

As Tony McEnery articulated, a corpus is not just a lot of data: the context, the meaning and the structure of the data are of crucial importance. However interesting, voluminous and rich the data, it remains meaningless unless it is contextualised and translated into evidence.

Uncharted data waiting to be discovered

The variety of research topics presented at the conference was remarkable, as were the prevalence of, and enthusiasm for, multidisciplinary collaborations, especially with historians and digital humanities developers.

As seen at this conference, today’s linguists can study a plethora of different topics, ranging from the use of the word “perhaps” between 1500 and 1850 to Zimbabwe’s current language situation, and from Middle English alchemist texts to gender differences in Twitter English in Finland or the use of “pliis” in Finnish discourse.

Due to the relative novelty of the field, there is a whole uncharted world to explore, and anyone interested in corpus linguistics will find as many opportunities for research as they could wish for.

Text: Johanna Hirvensalo & Sofia Bergman

More about the d2e conference: