Plagiarism detection algorithms, magazines in the digital era and approaches to contexts

On 13 June, we had two speakers from UH, Dr. Mikhail Kopotev and Dr. Saara Ratilainen. A special guest from St. Petersburg State University, Maria Khokhlova, PhD, also gave a talk.

Dr. Mikhail Kopotev, a researcher of the Russian language, presented Dissernet, a network community of volunteer experts who work against plagiarism in research and education, with a focus on economics, pedagogy, law and history. On the one hand, the community aims to expose dishonesty among politicians and university rectors; on the other hand, academic journals are also among its targets. According to statistics, dissertations nowadays contain fewer plagiarised elements than some years ago, owing to awareness that plagiarism is quickly detected.

Defining plagiarism

In his talk, Dr. Kopotev showed linguistic tools for investigating plagiarism. The first tool, ‘disserorubka’ (thesis-grinder), analyses identical chains of symbols and measures the distances between them, which reveals direct copy-paste. Dictionary-based methods are used to detect paraphrasing, for example nominalization (i.e. changing verbs into nouns). As a pioneer in its field, Dissernet applies a special tool for English, Russian and Ukrainian to find out whether translated plagiarism has occurred. Furthermore, a deeper analysis is carried out to detect plagiarised parts by using distributional semantics, i.e. representing the contexts of words as vectors. There is also a visualization tool that creates a semantic fingerprint for a whole text, and fabric networks for visualizing relations between scientists. Dr. Kopotev’s topic generated interest among us. The legal consequences of plagiarism, the impact of the community’s work and ways to apply the same methods for other purposes were discussed.
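To make the idea of comparing identical chains of symbols more concrete, here is a minimal sketch in Python. It is not Dissernet's actual code: the function names, the chain length `n` and the whitespace normalization are all assumptions made for illustration. It lists character chains that occur in both texts, together with their positions, and reports what share of the suspect text's chains also appear in the source.

```python
def char_shingles(text: str, n: int = 20) -> dict[str, int]:
    """Map every n-character chain to the position of its first occurrence."""
    # Lowercase and collapse whitespace so trivial formatting changes do not hide a match.
    normalized = " ".join(text.lower().split())
    positions: dict[str, int] = {}
    for i in range(len(normalized) - n + 1):
        positions.setdefault(normalized[i:i + n], i)
    return positions


def shared_chains(source: str, suspect: str, n: int = 20) -> list[tuple[int, int]]:
    """Return (source_position, suspect_position) pairs for chains found in both texts."""
    src = char_shingles(source, n)
    sus = char_shingles(suspect, n)
    return sorted((src[chain], sus[chain]) for chain in src.keys() & sus.keys())


def overlap_ratio(source: str, suspect: str, n: int = 20) -> float:
    """Share of the suspect text's chains that also occur in the source."""
    sus = char_shingles(suspect, n)
    if not sus:
        return 0.0
    src = char_shingles(source, n)
    return len(src.keys() & sus.keys()) / len(sus)
```

Runs of consecutive position pairs point to a copied passage, and the gaps between runs give a rough idea of how far apart the borrowed fragments lie. Paraphrase and translated plagiarism would not be caught by a sketch like this, which is why the other methods mentioned in the talk are needed.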

The seminar continued with Dr. Saara Ratilainen, who has a background in philology and media studies. Nowadays, online magazines are described as a linked culture, covering the digital market and data. In her research, Dr. Ratilainen investigated the transformation of print media into digital forms from the perspective of algorithmic culture. Two case studies were presented: Afisha Daily, a Moscow-based commercial magazine, and Inde, a Kazan-based online magazine funded by Tatarstan. Ratilainen analysed the magazines and interviewed their directors and editors.

With new concepts and dynamic approaches, the teams behind the magazines were inspired by the possibilities that digitalization has provided. The study also showed that the magazines were seeking cultural impact despite the time pressure of the web world. They were also aware of other challenges of the digital era, such as the demand for visualization and videos. Still, viewing the website as one channel among others and competing for an audience, the magazines managed to diversify by using social media and creating platforms such as a book festival. After the talk, questions arose about who has the authority to evaluate culture and about the distribution of power.

Since this DRS seminar closed the first batch, we celebrated with a pizza round-table and announced that the next seminars will be held in fall 2018; their programme is already under preparation. The seminar culminated in a presentation on collocations by Maria Khokhlova, PhD, a computational linguist. Collocations can be defined as the words that usually occur in the context of a particular word, and there are different approaches and tools for measuring them, including dictionaries, statistical methods and linguistic models. Khokhlova showed different databases and a tool called Sketch Engine for investigating collocations. The participants were interested in using these linguistic methods in their own research. At the end of the seminar, it was agreed that a workshop on corpus creation and management is needed to support researchers.
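As an illustration of the statistical approach to collocations, here is a small sketch that ranks the context words of a node word by pointwise mutual information (PMI). PMI is only one of many association measures and is our choice for the example, not necessarily what was shown in the talk; the function name, window size and frequency threshold are assumptions.

```python
import math
from collections import Counter


def pmi_collocations(tokens, node, window=2, min_count=2):
    """Rank words that co-occur with `node` within +/- `window` tokens by PMI."""
    total = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(total, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_freq[tokens[j]] += 1
    scores = []
    for word, joint in pair_freq.items():
        if joint < min_count:
            continue  # rare pairs give unreliable, inflated PMI values
        # PMI = log2( P(node, word) / (P(node) * P(word)) ), estimated from raw counts
        pmi = math.log2((joint * total) / (word_freq[node] * word_freq[word]))
        scores.append((word, pmi))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

On a real corpus the tokens would first be lemmatized, and the frequency threshold would be set much higher than in a toy example.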

Collocations for ‘politician’ in CoCoCo

(Politics of) Digital Humanities in Eastern European Studies

On 10-11 September 2018, a joint workshop of the Herder Institute for Historical Research on East Central Europe and the Aleksanteri Institute (University of Helsinki) will be held in Helsinki.

Discourses about the essence of Digital Humanities (DH) have become very frequent in the last decade. While digital mega-projects increasingly attract large research funding at both the national and the European level, a large number of questions regarding the added value of DH tools, the robustness of methodological approaches and the vulnerabilities of infrastructure remain open.

This workshop – the first of a series on the challenges of DH in Europe, with a special focus on Eastern Europe – takes up the challenge of reflecting on the ‘digital turn’ in the context of area studies. In doing so, the event formulates questions about the concrete strategies, policies and main actors shaping and constructing this field.

In a world going ever more digital, ideas, images and practices necessitate a rethinking and reconceptualization to capture the changes of research methods and infrastructures both at the national and regional levels. To investigate these connections and interdependencies, scholars with methodological and theoretical approaches from various disciplines such as history, art history, political sciences, sociology and digital humanities are invited to submit their proposals.

Venue: Aleksanteri Institute, Unioninkatu 33, Helsinki, 2nd floor (meeting room)
Organization and Concept by Eszter Gantner, Daria Gritsenko

Check program here.

Russian sauna is in fact a kitchen, and other findings during #DHH18

On 23 May – 1 June 2018, the Helsinki Digital Humanities Hackathon #DHH was organized at UH for the fourth time. By bringing together students and researchers of computer science, the humanities and the social sciences, it aims to foster co-operation and multidisciplinary research. #DHH is a unique chance to invent new research methods and implement them. In addition, students learn to formulate research questions, and the experience is relevant for working life. The hackathon schedule covered discussing research interests, writing scripts, waiting for the server to process them, drinking coffee and tea, analysing the results, and preparing presentations and a poster session.

This year, five groups took part in the hackathon, with more than 50 people involved, among them guest students from ITMO University (St. Petersburg) who shared their knowledge. One of the groups was led by Dr. Daria Gritsenko and Andrey Indukaev from the DRS research group. Their team worked on the theme Russia ⇔ Finland. The idea was to study the image of Finns, Finland and Finnish issues in Russian media and vice versa. The team brought together participants with different backgrounds: Computational Science, Cultural History, Data Science, Instrumentation Technologies, Russian Language and Literature, and Translation Studies.

The data used for the hackathon research comprised two corpora. The Russian corpus was based on Integrum and included both regional (e.g. Delovoy Peterburg) and federal newspapers (e.g. Kommersant). The Finnish corpus was provided by Yle. Both corpora were filtered by words describing Finnish and Russian affairs. With more than 120,000 articles altogether, it would have been impossible to scrutinize the corpora in a week’s time using only traditional methodological approaches. Hence, several digital humanities methods were applied, including data cleanup, lemmatization for both languages, topic modelling, identifying locations per topic, creating yearly word clouds and distributed representations.

Work in progress at #DHH18

As a result, the team discovered that the leading topics in Russian media coverage of Finland were sports, culture and economy. In Finnish media, sports, politics and economy were on the agenda throughout the period studied. Over time, the number of cities mentioned has grown and the geography has widened in Russian regional newspapers and Yle articles. In Russian federal newspapers, Finland is still represented only by its biggest cities.

The team was also interested in how neighbourhood is defined, searching for the word ‘neighbour’ in Russian and Finnish media and examining its distributed representations (Word2Vec). In Russian media, the contexts linked to neighbourhood mostly carried positive associations such as ‘ally’. Contexts in Finnish articles were less neutral and positive, with words such as ‘tension’ and ‘threat’ appearing in the texts.

While exploring other concepts through their distributional representations, the team also checked the word ‘sauna’ in the corpora. It turned out that the kitchen has the same meaning for Russians as the sauna has for Finns: the same open, relaxed atmosphere.
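For readers unfamiliar with distributional representations, the sketch below shows the underlying intuition: words that occur in similar contexts end up with similar vectors. The team used Word2Vec; this example swaps in plain co-occurrence counts and cosine similarity so that it runs with the standard library only. It is a simplified stand-in, not the hackathon code, and all names in it are our own.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Build a sparse count vector of context words for every word."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(vectors, target, top_n=3):
    """Return the words whose context vectors are closest to the target's."""
    target_vec = vectors.get(target, Counter())
    scores = [(word, cosine(target_vec, vec))
              for word, vec in vectors.items() if word != target]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]
```

Calling `nearest(vectors, "neighbour")` on vectors built from lemmatized sentences returns the words used in the most similar contexts, which is roughly the kind of list behind associations such as ‘ally’ or ‘threat’.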

#DHH18 poster: Russia ⇔ Finland


Hackathon presentation on Finnish-Russian media research at #DHH18

Dr. Daria Gritsenko and postdoctoral researcher Andrey Indukaev, members of the DRS research group, have been supervisors at the Helsinki Digital Humanities Hackathon #DHH18, which is organized on 23 May – 1 June 2018. The group led by them has been focusing on how the Finnish YLE represents Russia and how Russian regional and federal media (e.g. Kommersant, Izvestia, Delovoy Peterburg) represent Finland. In addition, the group has also taken a look at the YLE news in Russian.


Finding ways to study digital politics

On 11 May, the Department of Political Governance (St. Petersburg State University) hosted a research workshop and master class, ‘Digital Politics and Post-Network Society’, in which members of the DRS research group participated.

The workshop was opened by Dr. Daria Gritsenko, the founder of Digital Russia Studies at UH, who introduced DRS and presented her view on the process of datafication in Russia in digital-era governance and on methodological issues. She is interested in how to capture digital governance empirically and in what actually happens when governance is transformed in the digital era. The presentation met with a great response among the participants, and its topics were discussed at length in the workshop, with ideas for later collaboration.

The second speaker was Dr. Elena Tropinova, associate professor from St. Petersburg State University. Her research concerns the government’s ambition to become platform-based. According to Tropinova, government can be compared to a vending machine: it is visible and meets the needs of its citizens. The citizens become data miners and watchers and live in a self-regulated IT ecosystem that is managed both horizontally and vertically. Hence, the priority for governance is to create a strategic management approach. In Tropinova’s view, the challenge is a lack of digital mentality rather than digitalization itself.

After Tropinova’s presentation, Dr. Leonid Tomin, associate professor from Saint Petersburg State University, gave a talk on governance in the age of platform capitalism. The ownership of data nowadays resembles colonial conditions, while the platform enterprises are moving towards monopoly and data consumption is growing at an accelerating pace. Using the same algorithms as the platform enterprises, how should governance interact in the era of platform capitalism?

The research workshop continued with Dr. Aleksandr Sherstobitov, associate professor from Saint Petersburg State University. His research is based on policy networks and game theory. In his presentation, he introduced the concept of post-networks. Post-network refers to a new era in which networks are no longer ‘flat’ and tend to become multidimensional, owing to a lack of stable ties and the presence of several actors and nodes.

Discussing modeling cluster

The fifth speaker was Andrey Indukaev from UH, a postdoctoral researcher and member of the Digital Russia Studies research group. His presentation raised questions about ways to study the data economy. In his view, digitalization has the same three dimensions as markets do, and the effects of digital technologies resemble market effects. He is interested in researching what Russia’s Skolkovo, Rusnano and others actually are from the perspective of market instruments.

The research workshop was closed by Olga Parhimovich from ITMO University and ‘Informational Culture’, a Russian NGO focusing on open data and open content. Open data is machine-readable primary data that is licensed for use and published on the Internet with open access and free availability. In recent years, Russia has improved its position in the open budget data ranking by providing open data. During her master class, Parhimovich showed how to organize and analyze Russian open budget data with a tool called OpenRefine.

Russia’s news aggregators reacting to regulation

In the May edition of the DRS seminar, Dr. Mariëlle Wijermars, a postdoctoral researcher from UH and a DRS research group member, presented her study. She explored ways to measure the impact of Russia’s news aggregator regulation, which entered into force at the beginning of 2017. The aim of the regulation was to hold news aggregators liable for the veracity of the news they share. Links to news items that originate from registered media outlets are, however, exempt from liability. As a result, news aggregators such as Yandex News were quick to adapt their algorithms to avoid legal claims. Adopted under the pretense of combating the dissemination of fake news, the law thus effectively enables the Russian state to influence the dissemination of news online through already existing media regulation structures. Wijermars analysed to what extent the law creates a mechanism for censoring online news coverage of significant political events. The talk also gave an overview of the Russian online news and social media landscape and of research on patterns of news consumption among different generations of Russians. Wijermars assessed the measure’s potential impact on online news consumption in the short term, e.g. by narrowing the number of alternative views offered and consumed, as well as the long-term implications for the Russian online news landscape.

Screenshot from Yandex News with several media outlets

Counting propaganda and ethical conduct

The April edition of the Digital Russia Studies seminar was opened by Dr. Reeta Kangas, a scholar of art history from the University of Turku. She presented ‘A Quantitative Look at the Pravda Political Cartoons of the Great Patriotic War’. A decade ago, Reeta collected all the political cartoons published in the Pravda newspaper during the Great Patriotic War of 1941-45, 185 in total. Her Master’s thesis, based on these materials, sought to quantify the use of different themes during the three periods of the war using crisp-set categorization. Today, Dr. Kangas is looking for more advanced ways to analyse political cartoons quantitatively. She explores ways of applying digital humanities methods to compositional interpretation, as well as combining context, caption and code into a larger analysis using quantitative and qualitative methods. This is exciting work in progress, and seminar participants had a few suggestions for Reeta. We are looking forward to learning about the progress of this project!

Kukryniksy, 3 November 1944, Spanish-Portuguese neutrality

The second part of the seminar raised some acute questions of research ethics for scholars working with internet forums. Teemu Oivo, a doctoral candidate from the University of Eastern Finland, has had various experiences during his work on Karelianness in Runet discussions about nationalism. While internet forums seem like ‘easy data’ – open, free, and abundant – they can best be described as a semi-private sphere that often turns out to be problematic in terms of research ethics. While people may share their thoughts on the internet, they usually do not think of these posts as a potential object of someone’s research. Hence, informing the users that they have become ‘informants’ in a research project is crucial, as is obtaining their consent, even if from a legal perspective the data is ‘open’ and freely accessible. Another issue that Teemu has been wrestling with is the use of memes as research objects. Memes are fluid and the attribution of intellectual property rights is often complex – or even impossible. They also have a tendency to come and go. Creating and curating one’s own web archives may be a good way to preserve the memes and the context in which a researcher captured them.

Historical Brotherhoods and Quantified Film Music

The March edition of DRS seminar featured two talks by PhD researchers from the University of Helsinki.

The first presenter, Justyna Pierzynska from Media and Communication Studies at the Department of Social Research, asked in her research why historical brotherhoods are such an effective narrative in East European politics. She uses digital materials to discover how such ‘common history’ is produced and how it becomes popular social knowledge. A comparative analysis of four ‘historical brotherhoods’ – Polish-Georgian brotherhood, Serbian-Armenian brotherhood (Serbia), Serbian-South Ossetian brotherhood (Serbia), Serbian-South Ossetian brotherhood (Republika Srpska) – shows an interesting pattern. Brotherhood ideas do not spread to become ‘common knowledge’ without the mediating elite element!

”Poles and Georgian cousins be – new national truth”
User-generated image, 2010

In the second half, Ira Österberg, MA, from the Aleksanteri Institute, who will be defending her PhD dissertation “What Is That Song? Aleksej Balabanov’s Brother and Rock as Film Music in Russian Cinema” on Wednesday, May 9th, 2018, presented her proposal for a postdoctoral research project. The idea is to study the musical strategies of Soviet cinema by operationalizing a film’s use of music along various dimensions. The aim is to find out how much music there is in a given film, what types of music are used, and how the type and positioning of the music are connected. Using digital (semi)automated methods, Ira plans to scale up her study to uncover how these features change over time. The ultimate aim of this exciting project is to create an overall methodology for quantitative film music analysis!
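To give a feel for what operationalizing a film's use of music could look like in practice, here is a purely hypothetical sketch; it is not part of Ira's proposal. It assumes that each stretch of music has been annotated as a cue with a start time, an end time and a type. The `MusicCue` structure and the type labels are our own invention.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MusicCue:
    start: float  # seconds from the beginning of the film
    end: float
    kind: str     # hypothetical labels, e.g. "score" or "diegetic"


def music_share(cues, film_length):
    """Fraction of the running time covered by at least one cue (overlaps merged)."""
    covered = 0.0
    current_start = current_end = None
    for cue in sorted(cues, key=lambda c: c.start):
        if current_end is None or cue.start > current_end:
            # A gap before this cue: close the previous merged interval.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = cue.start, cue.end
        else:
            current_end = max(current_end, cue.end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / film_length


def seconds_by_kind(cues):
    """Total duration of music per type; overlapping cues are counted separately."""
    totals = defaultdict(float)
    for cue in cues:
        totals[cue.kind] += cue.end - cue.start
    return dict(totals)
```

With annotations like these for many films, figures such as the share of running time with music could be compared across decades, which is the kind of scaling the project aims for.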

Arto Mustajoki on how to work with Integrum and why we need CCCP

On 2.2.18, Professor Arto Mustajoki gave a presentation on the use of the Integrum database for research on the Russian language and Russia more broadly. Although public transport was not running in Helsinki and there were only four of us in the seminar room, twice as many participants joined online. It was very nice to see that Adobe provides us with a convenient way of enabling participation not only across our campuses but also across the world.

Integrum is the largest proprietary electronic archive of mass media from Russia and the CIS countries. Currently Integrum comprises more than 15,000 databases, including newspapers and journals, news agency texts, TV and radio texts, online news publications from central and regional mass media, as well as legal texts, together constituting a corpus with over 50 bn words. Full-text archives of many newspapers and magazines date back to the beginning of the 1990s.

While the company, also called Integrum, is a private provider of the database, scholars at the University of Helsinki and everyone logging in from the University of Helsinki have free access to all its mass media collections.

During a live demo, Professor Mustajoki showed different search queries that can be performed directly on Integrum and provided a wealth of real examples of the use of the database from his own research. For example, the advanced search functionality of Integrum made it possible to find 2,500 instances of a unique ‘semipassive’ voice; discover why people pretend (not) to understand; investigate attitudes towards ‘fashionable words’; and research the objects of Soviet nostalgia.

Professor Mustajoki emphasized that in advancing digital humanities, we need CCCP: cooperation, collaboration, curiosity and passion. While computational linguistics and social statistics have a long history, digital humanities seek to cross-cut these existing practices and add new functionality by leveraging the potential of big data.