375 Humanists: Timo Honkela

Professor Timo Honkela has been described as a Renaissance man. He is an expert in human-centred computing who wants to explore fundamental questions of language, mind and society. A philosophical thinker who wants to put artificial intelligence in the service of humanism. A reflective researcher whose heart beats for art.

Get to know Timo Honkela on the 375 Humanists website

Timo Honkela giving a video presentation on the theme "On Knowledge" at Leppävaara Library in Espoo. Photo: Nelli Honkela

To mark the University of Helsinki's 375th anniversary year, the Faculty of Arts is highlighting 375 humanists. The people featured on the website offer a view of the societal and cultural significance of the humanities and give examples of the wide-ranging expertise of humanists.

Inauguration of new professors: Jörg Tiedemann

Before their formal appointment, all professors at the University of Helsinki give a public inaugural lecture. The lecture provides an overview of the major issues of the discipline and is aimed at both the academic community and the general public.

Jörg Tiedemann, the new Professor of Language Technology, will give his inaugural lecture on Wednesday 2 December 2015.

His lecture is a part of the Inaugural Ceremony in the Great Hall starting at 4.00 p.m. The title of the lecture is Recycling Data – The Amazing Utility of Human Translations in Language Technology.

Event programme (pdf)

Bio

Jörg Tiedemann was born on January 16, 1972 in the small town of Ilsenburg in the heart of Germany. He studied computer science at the Otto-von-Guericke University in Magdeburg, from which he received a "Diplom für Informatik" in 1997. During his undergraduate studies, he spent one year at the University of Southern Colorado in Pueblo, Colorado.

His master's thesis on the automatic extraction of lexical knowledge from bilingual corpora, written during a one-year visit to Uppsala University in Sweden, paved his way into computational linguistics. He returned to Uppsala after finishing his degree in Magdeburg and began working as a research assistant at the Department of Linguistics. In 2000, he started his doctoral research at the same department under the supervision of Prof. Anna Sågvall Hein. During that time, he spent the 2001/2002 academic year at the University of Edinburgh, before receiving his PhD in Computational Linguistics from Uppsala University in 2003.

After a short period as university lecturer and programme coordinator of the BA programme in language technology at Uppsala University, Tiedemann joined the University of Groningen in the Netherlands in 2004, where he stayed for five years as a post-doctoral research fellow working on question answering and information retrieval. In 2009, he returned to Uppsala to take up an appointment as visiting professor in computational linguistics at the Department of Linguistics and Philology, which he held until 2014. In 2015, he worked as a senior research fellow at the same department before moving to Helsinki for his current appointment.

Tiedemann has worked on a number of national and international projects on machine translation, bitext alignment, question answering and information extraction. Since 2004, Tiedemann has maintained OPUS, the world's largest collection of freely available parallel corpora, which is widely used in machine translation, multilingual lexicography and terminology, and translation studies. His recent research interests include discourse-oriented statistical machine translation and the projection of linguistic information across languages. He has published over 100 refereed scientific articles, including a textbook on bitext alignment, and has released various well-known open-source tools and data resources.

Doctor of Philosophy Jörg Tiedemann was appointed Professor of Language Technology from 1 August 2015.

375 Humanists: Anni Sinnemäki

Deputy Mayor Anni Sinnemäki dreams of a growing Helsinki and of city boulevards. Before this, Sinnemäki served in Parliament for almost 16 years. Her studies in Russian literature and philosophy gave her a critical mind, analytical skills and the ability to grasp large wholes. As a student, Sinnemäki began writing lyrics for the band Ultra Bra. She still writes poetry.

Get to know Anni Sinnemäki on the 375 Humanists website

Photo: Pertti Nisonen / City of Helsinki


375 Humanists: Fred Karlsson

Photo: Evy Nickström, Hbl.


Professor Emeritus Fred Karlsson is, at heart, a jack-of-all-trades of linguistics. As a field of study, general linguistics is universally broad. The University of Helsinki has a few dozen professorships in individual languages. In addition to these, general linguistics exists to cover all the other 6,900 languages, as well as linguistic theory and methodology.

Get to know Fred Karlsson on the 375 Humanists website


The Criminalized Poor and the Rise of Mega-Corpora

Throughout the ages, texts of all genres have degraded and criminalized the poor through denigrating adjectives and descriptions. Professor Tony McEnery from Lancaster University offers a unique and chilling view of how linguistic change reflects the changing treatment of the poor during the 17th century.

Tony McEnery. Photo: Tanja Säily.


Criminalizing the Poor in the 17th Century

On 19–22 October, the Research Unit for Variation, Contacts and Change in English (VARIENG) organized the D2E – From data to evidence conference, bringing together scholars of English language studies to discuss how big data, rich data and uncharted data can affect, enhance or hinder linguistic research.

The first plenary of the conference was given by Professor Tony McEnery, who spoke on using corpora to study the socio-economic treatment of the poor through a linguistic lens. His talk offered a fascinating view of how large corpora can support a sociolinguistic approach both to the use of derogatory terminology and to the changes in that terminology from decade to decade.

McEnery's team explored over a billion words of 17th-century writing through the Early English Books Online (EEBO) corpus, which includes nearly every work printed in England, Ireland, Scotland, Wales and British North America between 1473 and 1700. The team identified the words most commonly used to refer to the poor and examined their use in the texts to uncover patterns of meaning that reveal how the poor were treated, linguistically and socio-economically, during the 17th century.

McEnery presents an enticing case by examining the evolution of terms such as rogue, beggar, vagrant and vagabond in the 17th century, as well as the language associated with them. The study paints an interesting picture of how literary and religious texts treated these terms and which modifiers were used with them.

Beggar, for example, was typically modified by adjectives expressing sympathy for the beggar's poverty, such as poor, needy and miserable. There were few negative connotations at the beginning of the 17th century, until this changed sharply during the second decade, when the word sturdy came to be commonly attached to beggar to portray beggars as able-bodied people who chose to beg rather than work. Drunkenness, likewise, began to be associated with them in the 1620s. Similarly, vagabonds were associated with negative modifiers or nearby words such as vile, loose and whore, and rogues were closely associated with cheating, lying and villain.

McEnery's results demonstrate the prejudice against the poor in our societies across the ages. They provide an unsettlingly clear map of the frequency, dispersion and connectivity of the negative vocabulary used to construct the negative semiotics of poverty, tracing the changes decade by decade through the 17th century.
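To make the idea of tracking modifiers decade by decade more concrete, here is a minimal Python sketch. It is not McEnery's actual pipeline. It simply counts the word(s) immediately to the left of a node word such as beggar, grouped by decade. The helper name modifiers_by_decade, the naive tokenizer and the four dated sentences are hypothetical, invented purely for illustration; they are not EEBO data. A real study would also need lemmatization, normalization of early modern spelling and a measure of dispersion across texts, none of which this sketch attempts.

```python
import re
from collections import Counter, defaultdict

def modifiers_by_decade(docs, node, window=1):
    """Count the `window` words to the left of `node`, grouped by decade.

    Hypothetical helper for illustration. docs: iterable of (year, text) pairs.
    """
    counts = defaultdict(Counter)
    for year, text in docs:
        decade = year - year % 10  # e.g. 1615 -> 1610
        # Naive tokenizer: lowercase alphabetic runs, no lemmatization,
        # so "beggars" or variant spellings would NOT match "beggar".
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == node:
                # Left-hand context words: a crude stand-in for "modifiers".
                for left in tokens[max(0, i - window):i]:
                    counts[decade][left] += 1
    return counts

# Toy, invented sentences (NOT real EEBO data).
docs = [
    (1603, "Give alms to the poor beggar and the needy beggar."),
    (1615, "The sturdy beggar will not work."),
    (1618, "Beware the sturdy beggar at the gate."),
    (1624, "A drunken beggar lay in the street."),
]

for decade, counter in sorted(modifiers_by_decade(docs, "beggar").items()):
    print(decade, counter.most_common(3))
```

Counts like these only show where words cluster. As McEnery stresses, working out what such a shift means still requires close reading of the passages themselves.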

Mega-corpora methodologies

In recent years, the evolution of corpora has given scholars unprecedented access to text on a larger scale than ever before. The rise of mega-corpora has been both a curse and a blessing: their vast size has made it possible to work routinely with text on scales never seen before, but their inclusiveness has brought additional challenges of contextualization. When the boundaries of a corpus are not clearly mapped, or when a mega-corpus spans several well-contextualized corpora, extra effort is needed to maintain the representativeness of the data.

According to McEnery, the methods of the study deserve even more scrutiny than the case results themselves. With his unique perspective in the field, McEnery offers singularly convincing and well-thought-out insights on working with mega-corpora. Although the results alone would have been worth many more plenaries, the core philosophy of McEnery's methodology is both sobering and enthralling: "A corpus is more than a load of text. It needs linguistic tools to strip the essential parts of data." Linguistic context is everything.

McEnery emphasizes that mega-corpora provide unique opportunities for synergy between history and linguistics. While historians can help linguists by pointing out cultural contexts and frames, linguists can provide historians with much-needed linguistic context: whether a concept or a change in meaning is relevant, and what it could mean within the linguistic frame.

While the vast size of modern corpora may tempt some researchers to rely on statistics and correlations alone, McEnery strongly disagrees with such an approach. As he emphasizes, the most important factors in language are change and context: "Dynamism is the key…Close reading is the key. No mathematics will tell you what is happening there." Meaning is never stable, and word frequency and connectivity can only tell you so much without a deep reading of the selected text segments.

Methods and results go hand in hand

McEnery's main warning to academics concerns failed contextualization: not so much of the research texts and results themselves as of the 20th-century methodology used to study them. As our conceptual maps, and those of our societies, change, we must closely examine our own conceptual frames when doing research, because our concepts of meaning may be fundamentally wrong for reading historical texts.

McEnery manages a rare feat, addressing both methodological and sociolinguistic issues and raising important points about each: how we can and should use mega-corpora, and, topically, how the linguistic criminalization of the poor in the 17th century echoes chillingly in our own time.

Text: Mika Loponen

Read more: