The Russian sauna is in fact the kitchen, and other findings from #DHH18

From 23 May to 1 June 2018, the University of Helsinki (UH) hosted the Helsinki Digital Humanities Hackathon #DHH for the fourth time. The hackathon brings together students and researchers of computer science, the humanities and the social sciences to co-operate on multidisciplinary research. #DHH is a unique chance to invent new research methods and implement them. In addition, students learn to formulate research questions and gain experience that is relevant for working life. The hackathon schedule covered discussing research interests, writing scripts, waiting for the server to process them, drinking coffee and tea, analysing the results, and preparing presentations and a poster session.

This year, five groups took part in the hackathon, with more than 50 people involved, among them guest students from ITMO University (St. Petersburg) who shared their knowledge. One of the groups was led by Dr. Daria Gritsenko and Andrey Indukaev from the DRS research group. Gritsenko’s and Indukaev’s team worked on the theme Russia ⇔ Finland. The idea was to study the image of Finns, Finland and Finnish issues in Russian media and vice versa. The team brought together participants with different backgrounds: Computational Science, Cultural History, Data Science, Instrumentation Technologies, Russian Language and Literature, and Translation Studies.

The data used for the hackathon research included two corpora. The Russian corpus was based on Integrum and contained both regional newspapers (e.g. Delovoy Peterburg) and federal newspapers (e.g. Kommersant). The Finnish corpus was provided by Yle. Both corpora were filtered by words describing Finnish and Russian affairs. With more than 120,000 articles altogether, it would have been impossible to scrutinize the corpora in a week using only traditional methodological approaches. Hence, several digital humanities methods were introduced, including data cleanup, lemmatization for both languages, topic modelling, identifying locations per topic, creating yearly word clouds, and distributed representations.
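To give a flavour of two of these steps, here is a minimal sketch of keyword filtering followed by yearly word counts, the kind of table a word cloud could be drawn from. It is not the team's actual script: the keyword stems, the `(year, text)` article format and the function names are all assumptions made for illustration, and the simple tokenizer stands in for real lemmatization.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword stems; the filter terms actually used are not given in the post.
KEYWORDS = ("finland", "finnish", "finn")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def mentions_keywords(text, keywords=KEYWORDS):
    """Return True if any token starts with one of the keyword stems."""
    stems = tuple(keywords)
    return any(token.startswith(stems) for token in tokenize(text))


def yearly_word_counts(articles, keywords=KEYWORDS, stopwords=frozenset()):
    """Count words per year, using only articles that pass the keyword filter.

    `articles` is an iterable of (year, text) pairs, an assumed format.
    """
    counts = defaultdict(Counter)
    for year, text in articles:
        if not mentions_keywords(text, keywords):
            continue  # drop articles unrelated to the topic
        counts[year].update(t for t in tokenize(text) if t not in stopwords)
    return counts
```

Calling `yearly_word_counts(articles)[2017].most_common(50)` would then give the most frequent words for one year. A real pipeline would lemmatize the tokens first, so that inflected forms of the same Russian or Finnish word are counted together.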

Work in progress at #DHH18

As a result, the team discovered that the leading topics in Russian media coverage of Finland were sports, culture and the economy. In Finnish media, sports, politics and the economy were on the agenda throughout the period studied. Over time, the number of cities mentioned has grown and the geography has widened in Russian regional newspapers and Yle articles. In Russian federal newspapers, Finland remains represented only by its biggest cities.

The team was also interested in how neighbourhood is defined, searching for the word ‘neighbour’ in Russian and Finnish media and examining its distributed representations (Word2Vec). In Russian media, the contexts linked to neighbourhood carried mostly positive associations such as ‘ally’. Contexts in Finnish articles were less neutral and positive, with words such as ‘tension’ and ‘threat’ appearing in the texts.
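For readers unfamiliar with distributed representations, the sketch below shows how "neighbouring" words are usually found once every word has a vector: rank the other words by cosine similarity to the target. Training the Word2Vec model itself is left out, and the three vectors are invented toy numbers, not values or results from the hackathon; the function names are likewise only illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, vectors, topn=3):
    """Return up to `topn` (word, similarity) pairs closest to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Invented toy vectors for illustration only; a trained model would supply
# vectors with hundreds of dimensions.
TOY_VECTORS = {
    "neighbour": [1.0, 0.2, 0.0],
    "border": [0.9, 0.3, 0.1],
    "weather": [0.0, 0.1, 1.0],
}
```

With these toy numbers, `nearest_neighbours("neighbour", TOY_VECTORS)` ranks "border" ahead of "weather". Running the same ranking on vectors trained separately on each corpus is one way to compare what a word is associated with in each country's media.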

While exploring other concepts in their distributional representations, the team also checked the word ‘sauna’ in the corpora. It turned out that the kitchen has the same meaning for Russians as the sauna has for Finns: the same open, relaxed atmosphere.

#DHH18 poster: Russia ⇔ Finland

