Plagiarism detection algorithms, magazines in digital era and approaches to contexts

On 13 June, we had two speakers from UH, Dr. Mikhail Kopotev and Dr. Saara Ratilainen. In addition, a special guest from St. Petersburg State University, Maria Khokhlova, PhD, gave a talk.

Dr. Mikhail Kopotev, a researcher of the Russian language, presented Dissernet, a network community of volunteer experts who work against plagiarism in research and education, with a focus on economics, pedagogy, law and history. On the one hand, the community aims to expose dishonesty among politicians and university rectors. On the other hand, academic journals are also a target. According to statistics, dissertations now contain fewer elements of plagiarism than some years ago, because authors are aware that it is quickly detected.

Defining plagiarism

In his talk, Dr. Kopotev showed linguistic tools for investigating plagiarism. The first tool, ‘disserorubka’ (thesis-grinder), analyses identical chains of symbols and measures the distances between them, revealing direct copy-paste. Dictionary-based methods are used to detect paraphrasing, for example nominalization (i.e. changing verbs into nouns). As a pioneer in its field, Dissernet applies a special tool for English, Russian and Ukrainian to find out whether translated plagiarism has occurred. Furthermore, a deeper analysis is carried out to detect plagiarised parts by using distributional semantics, i.e. representing the contexts of words as vectors. There is also a visualizing tool that creates a semantic fingerprint for a whole text and fabric networks for visualizing relations between scientists. Dr. Kopotev’s topic generated a lot of interest among us. The legal consequences of plagiarism, the impact of the community’s work and ways to apply the same methods for other purposes were discussed.
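To make the idea of matching identical chains concrete, here is a minimal sketch in Python. It is not Dissernet's tool: it assumes that a "chain" is a run of n consecutive words, and the function names (shingles, shared_chains, overlap_ratio) are made up for this illustration. It records where each shared chain starts in both texts, so the distances between matches can be read off the positions.

```python
import re


def shingles(text, n=5):
    """Map every run of n consecutive words to the positions where it starts."""
    words = re.findall(r"\w+", text.lower())
    index = {}
    for i in range(len(words) - n + 1):
        index.setdefault(tuple(words[i:i + n]), []).append(i)
    return index


def shared_chains(source, suspect, n=5):
    """Return the n-word chains found in both texts, with start positions in each."""
    src, sus = shingles(source, n), shingles(suspect, n)
    return {chain: (src[chain], sus[chain]) for chain in src.keys() & sus.keys()}


def overlap_ratio(source, suspect, n=5):
    """Share of the suspect text's distinct chains that also occur in the source."""
    sus = shingles(suspect, n)
    if not sus:
        return 0.0
    return len(shared_chains(source, suspect, n)) / len(sus)


if __name__ == "__main__":
    # Invented toy sentences, not real dissertation text.
    a = "the quick brown fox jumps over the lazy dog"
    b = "a quick brown fox jumps over a fence"
    print(shared_chains(a, b, n=3))
    print(overlap_ratio(a, b, n=3))
```

A high overlap ratio only flags passages for a human expert to check. Paraphrased or translated text would slip through such exact matching, which is why the dictionary-based and semantic methods mentioned above are needed as well.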

The seminar continued with Dr. Saara Ratilainen, who has a background in philology and media studies. Nowadays, online magazines are described as a linked culture, covering the digital market and data. In her research, Dr. Ratilainen investigated the transformation of print media into digital forms from the perspective of algorithmic culture. Two case studies were presented: Afisha Daily, a Moscow-based commercial magazine, and Inde, a Kazan-based online magazine funded by Tatarstan. Ratilainen analysed the magazines and interviewed their directors and editors.

With their new concepts and dynamic approaches, the teams behind the magazines were inspired by the possibilities that digitalization has provided. The study also showed that the magazines were seeking cultural impact despite the time pressure of the web world. They were also aware of other challenges of the digital era, such as the appetite for visualization and video. Still, viewing the website as one channel among others and competing for audience, the magazines managed to diversify by using social media and creating platforms such as a book festival. After the talk, questions were raised about who has the authority to evaluate culture and about the distribution of power.

Since this DRS seminar closed the first batch, we celebrated it with a pizza round-table and announced that the next seminars will be held in fall 2018 and that their programme is already under preparation. The seminar culminated in a presentation on collocations by Maria Khokhlova, PhD, a computational linguist. Collocations can be defined as the words that usually appear in the context of a particular word, and there are different approaches and tools for measuring them, including dictionaries, statistical methods and linguistic models. Khokhlova showed different databases and a tool called Sketch Engine for investigating collocations. Participants were interested in using these linguistic methods in their research. At the end of the seminar, it was agreed that a workshop on corpus creation and management is needed to support researchers.
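As an illustration of the statistical approach to collocations, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), a classic association measure. This is an assumption made for the example: the talk summary does not say which measures were shown, and corpus tools typically offer several. The function name pmi_bigrams and the toy sentence are invented.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by PMI, skipping pairs seen fewer than min_count times."""
    word_freq = Counter(tokens)
    pair_freq = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_pairs = sum(pair_freq.values())
    scores = {}
    for (a, b), f in pair_freq.items():
        if f < min_count:
            continue  # rare pairs produce unreliable, inflated PMI values
        p_pair = f / n_pairs
        p_a = word_freq[a] / n_words
        p_b = word_freq[b] / n_words
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently of each other.
        scores[(a, b)] = math.log2(p_pair / (p_a * p_b))
    return scores


if __name__ == "__main__":
    # Invented toy text; a real study would use a lemmatized corpus.
    text = "strong tea and strong coffee but powerful computer and strong tea"
    ranked = sorted(pmi_bigrams(text.split()).items(), key=lambda kv: -kv[1])
    for pair, score in ranked:
        print(pair, round(score, 2))
```

The min_count threshold matters in practice, because PMI tends to overrate pairs of rare words. That is one reason why corpus tools usually let the researcher choose between several association measures.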

Collocations for ‘politician’ in CoCoCo

(Politics of) Digital Humanities in Eastern European Studies

On 10-11 September 2018, a joint workshop of the Herder Institute for Historical Research on East Central Europe and the Aleksanteri Institute (University of Helsinki) will be held in Helsinki.

Discourses about the essence of Digital Humanities (DH) have become very frequent over the last decade. While digital mega-projects increasingly attract large research funding at both the national and the European level, a large number of questions regarding the added value of DH tools, the robustness of methodological approaches and the vulnerabilities of infrastructure remain open.

This workshop – the first of a series on the challenges of DH in Europe, with a special focus on Eastern Europe – takes up the challenge of reflecting on the ‘digital turn’ in the context of area studies. In doing so, the event formulates questions about the concrete strategies, policies and main actors shaping and constructing this field.

In a world going ever more digital, ideas, images and practices necessitate a rethinking and reconceptualization to capture the changes of research methods and infrastructures both at the national and regional levels. To investigate these connections and interdependencies, scholars with methodological and theoretical approaches from various disciplines such as history, art history, political sciences, sociology and digital humanities are invited to submit their proposals.

Venue: Aleksanteri Institute, Unioninkatu 33, Helsinki, 2nd floor (meeting room)
Organization and Concept by Eszter Gantner, Daria Gritsenko

Check program here.

Russian sauna is in fact kitchen and other findings during #DHH18

From 23 May to 1 June 2018, the Helsinki Digital Humanities Hackathon #DHH was organized at UH for the fourth time. It brings together students and researchers of computer science, humanities and social sciences, with the aim of co-operating on multidisciplinary research. #DHH is a unique chance to invent new research methods and implement them. In addition, students learn to formulate research questions, and the experience is relevant for working life. The hackathon schedule covered discussing research interests, writing scripts, waiting for the server to process them, drinking coffee and tea, analysing the results, and preparing for the presentations and the poster session.

This year, five groups with more than 50 participants altogether took part in the hackathon, among them guest students from ITMO University (St. Petersburg) who shared their knowledge. One of the groups was led by Dr. Daria Gritsenko and Andrey Indukaev from the DRS research group. Gritsenko’s and Indukaev’s team worked on the theme Russia ⇔ Finland. The idea was to study the image of Finns, Finland and Finnish issues in Russian media and vice versa. The team brought together participants with different backgrounds: Computational Science, Cultural History, Data Science, Instrumentation Technologies, Russian Language and Literature, and Translation Studies.

The data used for the hackathon research included two corpora. The Russian corpus was based on Integrum, with both regional (e.g. Delovoy Peterburg) and federal newspapers (e.g. Kommersant). The Finnish corpus was provided by Yle. Both corpora were filtered by words describing Finnish and Russian affairs. With more than 120,000 articles altogether, it would have been impossible to scrutinize the corpora in a week’s time using only traditional methodological approaches. Hence, several digital humanities methods were introduced, including data cleanup, lemmatization for both languages, topic modelling, identifying locations per topic, creating yearly word clouds and distributed representations.

Work in process at #DHH18

As a result, the team discovered that the leading topics in Russian media coverage of Finland were sports, culture and economy. In Finnish media, sports, politics and economy were on the agenda throughout the period studied. In Russian regional newspapers and Yle articles, the number of cities mentioned has grown and the geography has widened over time. In Russian federal newspapers, Finland remains represented only by its biggest cities.

The team was also interested in how neighbourhood is defined, searching for the word ‘neighbour’ in Russian and Finnish media and examining its distributed representations (Word2Vec). In Russian media, the contexts linked to neighbourhood mostly carried positive associations such as ‘ally’. The contexts in Finnish articles were less neutral and positive, with words such as ‘tension’ and ‘threat’ appearing in the texts.
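The intuition behind comparing words through their contexts can be shown with a small sketch. Note the swap: the team used Word2Vec, which learns dense vectors with a neural model, while the code below uses plain co-occurrence counts and cosine similarity, a simpler count-based stand-in that needs only the standard library. The function names and the toy sentences are invented for illustration and are not taken from the hackathon corpora.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """For every word, count the words that appear within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            start = max(0, i - window)
            stop = min(len(sent), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(word, vectors, k=3):
    """Return the k words whose context vectors are most similar to `word`."""
    target = vectors[word]
    ranked = sorted(
        ((cosine(target, vec), other) for other, vec in vectors.items() if other != word),
        reverse=True,
    )
    return [(other, score) for score, other in ranked[:k]]


if __name__ == "__main__":
    # Invented toy sentences, already tokenized.
    toy = [
        ["our", "neighbour", "is", "a", "partner"],
        ["our", "ally", "is", "a", "partner"],
        ["the", "storm", "was", "loud"],
    ]
    print(nearest("neighbour", context_vectors(toy), k=2))
```

Words that share contexts end up with similar vectors, so the nearest words to ‘neighbour’ give a rough picture of the associations it carries in a given corpus. Running the same query on two corpora is one way to compare such associations across media.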

While exploring other concepts through their distributional representations, the team also checked the word ‘sauna’ in the corpora. The results suggested that the kitchen has the same meaning for Russians as the sauna has for Finns: the same open, relaxed atmosphere.

#DHH18 poster: Russia ⇔ Finland