Discourse analysis meets machine learning

On November 2nd, the Digital Russia Studies autumn seminars continued with presentations on the use of machine learning and discourse analysis as part of a mixed methodology. How should we conduct qualitative research in the digital era? Which digital tools are available to assist discourse analysis? Why should we care about machine learning if our goal is ‘close reading’? Our guest, MPhil Kristian Lundby Gjerde, a research fellow from the Norwegian Institute of International Affairs, gave a talk on a text analysis app he has created. His aim was to preserve the close reading aspect while applying digital tools at the same time. In our seminar, he presented an app prototype with a collection of 10,000 freely licensed documents from kremlin.ru. The app has a regular-expression search engine and a calendar view, and it automatically adds newly published documents to the collection. It is also possible to import PDF files via OCR and to merge different text sources. We had a chance to test the app on an example from Russian memory politics discourse, which Kristian has studied. The app is still under development, and we cannot wait to see the published open-source version!
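The app's code was not shown at the seminar. As a rough illustration of what a regular-expression search combined with a calendar view involves, here is a minimal Python sketch. The document list, its (date, text) structure and the function names `regex_search` and `calendar_view` are hypothetical; this is not Gjerde's implementation.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical stand-in for documents collected from kremlin.ru:
# each entry is a (publication date, text) pair.
DOCUMENTS = [
    (date(2018, 5, 9), "Speech at the military parade marking Victory Day"),
    (date(2018, 6, 22), "Address on the Day of Remembrance and Sorrow"),
    (date(2018, 9, 3), "Meeting with business representatives"),
]


def regex_search(documents, pattern):
    """Return the (date, text) pairs whose text matches a regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [(d, text) for d, text in documents if rx.search(text)]


def calendar_view(hits):
    """Group matching texts by (year, month), as a simple calendar view might."""
    by_month = defaultdict(list)
    for d, text in hits:
        by_month[(d.year, d.month)].append(text)
    return dict(by_month)


hits = regex_search(DOCUMENTS, r"victory|remembrance")
for (year, month), texts in sorted(calendar_view(hits).items()):
    print(f"{year}-{month:02d}: {len(texts)} document(s)")
```

Keeping search and grouping as separate functions means the same hits could feed other views (lists, timelines) without re-running the query.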


Another presenter was BA Julia Nikolaenko, our trainee from HSE (Moscow), who is now writing her Master’s thesis on political images of Ukraine in the Russian media space. In her research, Julia aims to study how discourses around the Ukrainian conflict differ between news and lifestyle media, using computational text mining methods such as topic modelling and sentiment analysis. In our seminar, she presented preliminary results of her ongoing research, focusing on the lifestyle outlet Wonderzine, which publishes within a feminist framework on topics including fashion and beauty. She found that Wonderzine articles mentioning Ukraine had a positive tone across all topics. A word cloud built from word frequencies is dominated by fashion, but human rights and social issues are also visible in the texts. The preliminary analysis suggests that Wonderzine’s framing and conceptual frameworks are not particularly affected by the interstate conflict. The seminar participants discussed the definitions of ideology and the choice of media. Julia received fruitful comments and will continue her study based on them.

Wordcloud based on Wonderzine texts mentioning Ukraine, 2014-2018
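A word cloud of this kind is typically drawn from a simple word-frequency table. The sketch below counts words after lowercasing and removing a small stop-word list. The sample sentences and the stop-word list are invented for illustration, and real Russian-language texts such as Wonderzine's would normally also need lemmatisation before counting.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real study would use a fuller, language-specific one.
STOPWORDS = {"a", "and", "from", "in", "its", "of", "the"}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count lowercased word tokens across texts, skipping stop words and numbers."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        counts.update(t for t in tokens if t not in stopwords and not t.isdigit())
    return counts


# Invented sample sentences, not Wonderzine data.
texts = [
    "A new fashion brand from Kyiv presents its collection",
    "Fashion week in Kyiv and the rights of women",
]
freqs = word_frequencies(texts)
print(freqs.most_common(2))
```

The resulting counts are what a word-cloud library would map to font sizes.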

Discussing Big Data

As part of the Aleksanteri Conference ‘Liberation – Freedom – Democracy? 1918 – 1968 – 2018’, a roundtable on Big Data and Area Studies was held on 24 October, chaired by Dr. Daria Gritsenko, the founder of the DRS network. Digital Humanities experts Dr. Mila Oiva (University of Turku), Dr. Jussi Pakkasvirta (UH), Dr. Ekaterina Kalinina (Södertörn University, Sweden) and M.Soc.Sc Markus Kainu (The Social Insurance Institution of Finland KELA) took part in the discussion.

Is big data a revolution or an evolution?

— It depends on the perspective: for whom and for which purposes, says Kainu. For existing data, big data is an evolution, a move from the micro to the macro level. Methodologically, it can also be seen as an evolution, given the qualitative approaches used before. Yet the era of big data can also be viewed as a revolution. Why? Because preparing research data differs greatly from traditional research.

Open science and digitalisation

In the discussion, participants also raised some peculiarities of digital humanities research. Pakkasvirta emphasised that research projects should follow the principles of open science whenever possible. For example, the Suomi24 forum data project he has been working on is open for free use.

— Digitalisation should be supported at the transnational level to enable collaborative projects, comments Oiva. She has been participating in the transnational Oceanic Exchanges research project, which focuses on the global connectedness of 19th-century newspapers. However, the study does not cover Russian or Eastern European newspapers, for example, because only a few digitised archive materials are available for research.

Discussing the possibilities of big data technologies, Kalinina mentioned an interactive, crowd-sourced monument project by the Russian search engine Yandex. One feature of the interactive map created by Yandex allows the public to mark monuments and other memorial objects on the national Yandex map. Yandex analysts have studied the data collected from users and noticed several trends that illustrate how people understand cultural heritage sites.

What are the challenges of aligning big data and area studies?

— Big data research methodologies could be very helpful, but there is a risk of reproducing existing stereotypes, starts Kalinina. That is why it is important to be critical towards the source of data, to know your data and to be ready to employ some “traditional” humanities and social science methodologies. It is important to know where the data is coming from and how companies form their data. It is also important to understand that the data collected might not be representative, due to digital divides in access to digital technologies and their uses. Another issue is data regulation and data protection laws, which may limit our access to data for research purposes.

— Ensuring that talented young researchers stay in academia instead of moving to the private or public sector is essential, adds Kainu. Academia should be interested in employing data experts to support digital research and should recognise their merits from a different perspective.

— For me, the biggest challenge is still the spread of propaganda, says Pakkasvirta. Adults often lack source criticism when they read or share something on the internet, compared with children, who, surprisingly often, are quite aware of it. Hence, the extent of propaganda in big data can be surprising.

What is the future of area studies in the digital era?

Kalinina wonders whether big data and big data methodologies might, given the availability of such data, lead to the emergence of studies increasingly focused on transnational issues.

— We will always have the historical context of each region, concludes Oiva.

Media framing of internet policy

Mariëlle, you just came back from the #AoIR2018 conference in Montréal. Could you briefly tell us what it was about?


#AoIR2018 is the annual conference of the Association of Internet Researchers, which brings together scholars from around the world and from a wide variety of disciplines who study the internet and, more broadly, information technologies. This year, the conference theme was ‘Transnational Materialities’, and many papers reflected on the importance of infrastructures, among other things. In addition to parallel panel sessions and two keynote events, the conference included various pre-conference workshops; I participated in the workshop on ‘Digital Methods in Internet Research’.

That sounds like an exciting programme! What did you present at the conference?


I presented a paper on how governmental policy concerning the internet is framed in Russian mass media. While there is a significant body of research on internet governance in Russia, much less is known about the role of the media in shaping public perceptions of the internet and of the extent to which it should be regulated by the state. The question of how the Russian government strives to create popular support for governmental regulation in this sphere has received little attention. At the moment, I draw upon my previous experience of researching Russian state television and political communication to shed light on this particular aspect of internet governance. In the paper, I focused on mass media framing of the decision to block the popular messaging app Telegram in April 2018. The attempts by the government agency Roskomnadzor to block Telegram were not very successful (indeed, the app is still accessible for many Russians), but they did negatively affect many other online services, even outside of Russia. In response, mass demonstrations took place across Russia to protest against the blocking and against restrictions of internet freedom more generally. The paper I presented is part of my current project ‘Selling Censorship: Affective Framing and the Legitimation of Internet Control in Russia’, which is funded by the Netherlands Organisation for Scientific Research (NWO).

What are the main findings of your research?


The research project is still at an early stage, so I presented some preliminary findings based on my analysis of television coverage of the topic on the main Russian state-funded TV channels, and explained why it is important to study these discourses. The television coverage I examined emphasised how the encrypted communication option offered by Telegram is used by terrorist groups, for example in preparing the 2017 terrorist attack in St. Petersburg. Indeed, Telegram’s refusal to give the Russian Federal Security Service access to such private communications, as required by an anti-terrorist legislation package known as the Yarovaya Law, is the official reason for its blocking. The fact that encrypted messaging is used not only by terrorists and criminals but also for legitimate purposes, for example by investigative journalists, is simply ignored or disregarded, as is Telegram’s increasingly important function as a platform for (independent) news. By instead creating a black-and-white opposition between privacy and security, this coverage effectively cuts short any open debate about citizens’ right to private communications.

That is very interesting! What are the next steps in this project?


The next step in this research is to compare the framing of internet policy in mass media to how it is discussed and legitimised in political debates, for example in the Russian parliament. In particular, I want to find out whether there are significant differences between the types of argumentation and degree of rationality and emotion used in these different spheres. What role do media play in communicating about internet regulation to the general public, and do they add their own frames or simply repeat those developed in policy circles?

Thoughts on Digital Humanities, part II

The workshop ‘(Politics of) Digital Humanities in Eastern European Studies’ was also attended by Felix Herrmann, a research associate for IT (Research Centre for East European Studies, Bremen), and Anastasiya Bonch-Osmolovskaya, an associate professor at the School of Linguistics (HSE, Moscow).

Herrmann presented the Discuss Data Project, which aims to facilitate research data management in East European Studies. The project will create an independent platform for discussing and peer-reviewing datasets, with links to literature and many other features. Bonch-Osmolovskaya, in turn, gave a talk on the challenges and advantages of big data sources. She showed how to define the context and discourse of a given subject by combining computational linguistic methods, such as finding collocations and word frequencies. Both scholars shared their thoughts on Digital Humanities with us.
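Defining the context of a subject often starts with counting which words occur within a fixed window around it. A minimal sketch, assuming pre-tokenised, lowercased text; the function name `context_counts` and the two-word window are illustrative choices, not details from the talk:

```python
from collections import Counter


def context_counts(tokens, target, window=2):
    """Count words occurring within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        # Skip the target occurrence itself, but keep every other word in the window.
        counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


# Invented, already tokenised and lowercased example.
tokens = "the diary of a soldier and the diary of a teacher".split()
print(context_counts(tokens, "diary").most_common(3))
```

Raw co-occurrence counts favour frequent function words such as ‘the’ and ‘of’, which is why collocation studies usually add a statistical association measure on top.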

Felix Herrmann presenting “Discuss Data Project”

What did you expect from the workshop?

Felix Herrmann: Organising this kind of event is a necessary step in developing Digital Humanities methods. In the field of digitalisation, there is still much to do and much knowledge to share.

Anastasiya Bonch-Osmolovskaya: I am pleased to say that the workshop exceeded my expectations: not only was it well organised, but we also made new contacts and achieved concrete results.

What are the challenges of Digital Humanities?

Felix Herrmann: In Germany, computer literacy is less supported in school education than in Finland, for example. Despite being digital natives, students lack a deep understanding of what lies behind digitalisation. And again, talented Digital Humanities graduates move into business instead of remaining at university.

Another challenge in Digital Humanities is the need for transnational funding, since most research projects are funded at the national level. It would be more efficient if funding were transnational and involved less bureaucracy.

Anastasiya Bonch-Osmolovskaya: Digital Humanities is developing very quickly on a somewhat unprepared foundation. Projects sometimes resemble biology laboratories, bringing together experts with various backgrounds. As a result, these projects are very difficult to manage and require specific organisational skills.

What are the prospects for Digital Humanities?

Felix Herrmann: In the future, Digital Humanities methods and traditional ones will continue in parallel. The choice of methodology depends on the research question.

Anastasiya Bonch-Osmolovskaya: In my opinion, the boundaries between different disciplines will be blurred. Science will be less descriptive and more evidence-based. Thus, generalisation will occur through numbers.

What potential does Digital Humanities possess for students?

Felix Herrmann: What I have observed among students is that motivation for DH should come from within. Digital Humanities courses should not be compulsory, since making them so does not achieve much. Still, acquiring basic coding skills is an advantage in working life today.

Anastasiya Bonch-Osmolovskaya: We should promote Digital Humanities courses, since students who do not attend them may miss out on knowledge and skills that are important for future experts.

Anastasiya Bonch-Osmolovskaya presenting computational linguistic methods

Thoughts on Digital Humanities, part I

On 10–11 September, a joint workshop with the Herder Institute for Historical Research on East Central Europe (Marburg, Germany) was organised at the Aleksanteri Institute. Digital humanities enthusiasts enjoyed two days of discussing Digital Humanities and networking with colleagues. A follow-up workshop will be held in the future.

How did participants find the workshop? What are the challenges and possibilities of Digital Humanities when applied in Eastern European Studies? What potential does Digital Humanities possess for students? What is the future of Digital Humanities? We interviewed two of the workshop’s presenters: Dr. Mila Oiva, a postdoctoral researcher in cultural history (University of Turku), and Misha Melnichenko, a historian and the founder of the Prozhito Project. Oiva gave a talk on economic and advertising discourse in the Polish newspaper “Życie Gospodarcze” between 1950 and 1980. By organising data, using topic modelling and studying collocations, she traced the presence of export strategies in the newspaper. Melnichenko, in turn, coordinates the Prozhito Project, which aims to digitise diaries and publish them on the internet.

Dr. Mila Oiva: The initiative to organise the workshop was important and warmly welcomed. I started applying Digital Humanities methods a few years ago, and ever since there has been a demand for academic events focusing on Eastern Europe and Digital Humanities.

The challenges I have faced are that the availability of digital resources is still limited and memory politics is selective. Moreover, Digital Humanities methods may seem too quick and easy a way to conduct research, as if the machine did everything on behalf of the human. Meanwhile, deep understanding is needed: how is data actually processed, and how does that affect research? Scholars sometimes lack digital skills and knowledge of Digital Humanities methods. In addition, DH projects need continuity, which should be part of the decision-making strategy when funding them. We still need a philosophy of research, and research methods should be developed into best practices. Yet Digital Humanities methods provide great advantages over traditional methods when big data is in question.

What will happen in the future? DH will be more closely linked with society, with no division between traditional and digital methods. Transnational and interdisciplinary cooperation should increase, including cooperation between institutions such as universities and archives. I highly recommend that students attend courses in Digital Humanities, because digital literacy, digital source criticism and an understanding of data are vital skills in our society. Knowledge of processing and analysing masses of data will certainly be needed in working life. Moreover, it is an exciting field of research for those interested in an academic career.

Mila Oiva presenting her research

Misha Melnichenko: I found the workshop great because of the opportunity to meet, network and make cooperation agreements with colleagues. The discussion on best practices of digitalisation was also very fruitful. From my point of view, the approaches of the Prozhito Project are slightly different from traditional research. We position ourselves as the opposite of traditional archives. Most archives have certain conditions under which they accept materials for storage. Hence, some material will never be stored and will disappear over time. I strongly support data availability and the free distribution of information. In our project, the accuracy and relevance of content are less important. Our target audience is mainly the media, but also scholars who might become interested in diaries.

To pursue our aims, we have made use of the advantages of digitalisation. Starting was the hardest part, since the amount of work was huge in the beginning. Luckily, we have many volunteers who are eager to support us, most of them with no academic background. They get instructions for their tasks, and we check their work from time to time. In addition, students from HSE (Moscow) do their traineeships in our project. For them, contributing to the project has been a source of inspiration. They start to see history from a different point of view, through the lens of individual stories instead of general facts. In addition, students acquire essential working-life skills in editing texts and processing source materials.

We are interested in widening our reach and hope to increase cooperation, since we have technical solutions for other languages too. Today we cooperate with the Herder Institute in order to support them with digitalisation.

Round table: Digital Archives in Russia with Marianna Muravyeva (University of Helsinki), Sofia Gavrilova (HSE, Moscow) and Misha Melnichenko (Prozhito Project, Moscow)

Continue reading the interview with two other participants, Felix Herrmann and Dr. Anastasiya Bonch-Osmolovskaya, in part II!

Organisers’ view on Digital Humanities: also read the interview with Dr. Markku Kangaspuro (Aleksanteri Institute) and Prof. Dr. Peter Haslinger (Herder Institute) in the Russian Media Lab blog.

100 years of ICT

On Friday, 7 September 2018, we opened the second season of DRS seminars with two talks spanning 100 years of ICT development, both attempting to answer one question: how can we capture the contingencies of new technologies?

Dr. Brendan Humphreys, a political historian from the University of Helsinki, started his talk by reminding the audience that, as a “boring historian”, one has to admit that “nothing is new”, or at least not as new as one may think. “Lenin’s Tweets: the Telegram Seen from the Age of Social Media” is a provocative comparison of the telegram with social media, in particular Twitter. Lenin’s telegrams were short, quite often aggressive in tone, and were reported in the mainstream media (newspapers) as a source, not unlike the tweets of some politicians today. At the same time, the ‘like and repost’ features enabled by modern technology did not exist 100 years ago. Nevertheless, it is useful to remember that the politics of short public statements is not peculiar to our digital age; rather, it has been reshaped by social media.

Dr. Wladimir Sgibnev, a Visiting Fellow at the Aleksanteri Institute and a senior researcher at the Leibniz Institute for Regional Geography, is interested in the production of space, peripheral urban regions and mobility in the post-Soviet area. His current research focuses on marshrutkas, private urban minibuses, as a major and highly contested mobility phenomenon throughout the former Soviet Union that has so far received barely any academic attention. His talk, titled “The Dark Side of Digitalisation. Spatial Justice and Informal Transport in the Age of Uber”, investigated the impact of digitalisation on marshrutka drivers’ working environments and passengers’ travel conditions. Building on ethnographic fieldwork in Central Asia, Dr. Sgibnev demonstrated that while digitalisation may be seen as a positive trend that makes it possible to formalise and order the messiness of marshrutkas in terms of routes, finances and governance, it may have unexpected consequences for mobility justice.

Bus stop in Turkmenistan (Wikipedia)

Plagiarism detection algorithms, magazines in digital era and approaches to contexts

On 13 June, we had two speakers from UH, Dr. Mikhail Kopotev and Dr. Saara Ratilainen. In addition, a special guest from St. Petersburg State University, PhD Maria Khokhlova, gave a talk.

Dr. Mikhail Kopotev, a researcher of the Russian language, presented Dissernet, a network community of volunteer experts who work against plagiarism in research and education, focusing on economics, pedagogy, law and history. The community aims to expose dishonesty among politicians and university rectors on the one hand, and in academic journals on the other. According to statistics, dissertations today contain less plagiarism than a few years ago, owing to awareness that it can be detected quickly.

Defining plagiarism

In his talk, Dr. Kopotev showed linguistic tools for investigating plagiarism. The first tool, ‘disserorubka’ (thesis-grinder), analyses identical chains of symbols and measures the distances between them, revealing direct copy-paste. Dictionary-based methods are used to detect paraphrasing, for example nominalisation (i.e. turning verbs into nouns). As pioneers in their field, Dissernet applies a special tool for English, Russian and Ukrainian to detect whether translated plagiarism has occurred. Furthermore, a deeper revision detects plagiarised passages using distributional semantics, i.e. representing the contexts of words as vectors. There is also a visualisation tool that creates a semantic fingerprint of a whole text and builds networks visualising relations between scientists. Dr. Kopotev’s topic generated considerable interest among us, and we discussed the legal consequences of plagiarism, the impact of the community’s work and ways to apply the same methods for other purposes.
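Dissernet's own tools were not shown in code here. As a hedged illustration of the first idea, finding identical chains shared by two texts, the following sketch uses Python's standard `difflib.SequenceMatcher` on word sequences and keeps matches of at least `min_words` words. The threshold, the function name `shared_chains` and the sample sentences are assumptions; this does not reproduce ‘disserorubka’ or its distance measure.

```python
from difflib import SequenceMatcher


def shared_chains(source, suspect, min_words=5):
    """Return word sequences that appear identically in both texts.

    Only matching blocks of at least `min_words` words are kept, so that
    common short phrases are not flagged as copying.
    """
    a, b = source.split(), suspect.split()
    # autojunk=False stops difflib from discarding frequent words in long texts.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]


source = ("the economic growth of the region depends on investment "
          "in infrastructure and education")
suspect = ("as shown earlier the economic growth of the region depends "
           "on investment in roads")
print(shared_chains(source, suspect))
```

Working on words rather than characters keeps the output readable; paraphrase and translation detection, as described above, need other methods.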

The seminar continued with Dr. Saara Ratilainen, who has a background in philology and media studies. Online magazines are nowadays described as part of a linked culture that encompasses the digital market and data. In her research, Dr. Ratilainen investigated the transformation of printed media into digital forms from the perspective of algorithmic culture. She presented two case studies: Afisha Daily, a Moscow-based commercial magazine, and Inde, a Kazan-based online magazine funded by Tatarstan. Ratilainen analysed the magazines and interviewed their directors and editors.

With new concepts and dynamic approaches, the teams behind the magazines were inspired by the possibilities that digitalisation has provided. The study also showed that the magazines sought cultural impact despite the time pressure of the web world. They were also aware of other challenges of the digital era, such as the appetite for visualisations and videos. Still, by viewing the website as one channel among others and competing for audiences, the magazines managed to diversify by using social media and creating platforms such as a book festival. After the talk, questions were raised about who has the authority to evaluate culture and how power is distributed.

Since this DRS seminar closed the first season, we celebrated with a pizza round table and announced that the next seminars will be held in fall 2018; their programme is already being prepared. The seminar culminated in a presentation on collocations by PhD Maria Khokhlova, a computational linguist. Defining collocations as the typical context words around a particular word, she noted that there are different approaches and tools to measure them, including dictionaries, statistical methods and linguistic models. PhD Khokhlova showed different databases and a tool called Sketch Engine for investigating collocations. Participants were interested in using these linguistic methods in their own research. At the end of the seminar, it was agreed that a workshop on corpus creation and management is needed to support researchers.

Collocations for ‘politician’ in CoCoCo
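Statistical collocation measures score how much more often two words occur together than chance would predict. One common measure is pointwise mutual information (PMI) over adjacent word pairs, PMI(x, y) = log2(P(x, y) / (P(x) P(y))). Sketch Engine offers several association scores, and this sketch does not reproduce its implementation; the function name `bigram_pmi`, the `min_count` cut-off and the toy sentence are invented for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (log base 2).

    Pairs seen fewer than `min_count` times are skipped, because PMI is
    unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x, p_y = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = ("corrupt politician said the corrupt politician lied "
          "and the honest voter said").split()
for pair, score in bigram_pmi(tokens):
    print(pair, round(score, 2))
```

With real corpora, a higher `min_count` and a larger context window are usually needed before the scores become meaningful.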

(Politics of) Digital Humanities in Eastern European Studies

On 10–11 September 2018, a joint workshop of the Herder Institute for Historical Research on East Central Europe and the Aleksanteri Institute (University of Helsinki) will be held in Helsinki.

Discourses about the essence of Digital Humanities (DH) have become very frequent over the last decade. While digital mega-projects increasingly attract large research funding at both the national and the European level, a large number of questions regarding the added value of DH tools, the robustness of methodological approaches and the vulnerabilities of infrastructure remain open.

This workshop – the first of a series on the challenges of DH in Europe, with a special focus on Eastern Europe – takes up the challenge of reflecting on the ‘digital turn’ in the context of area studies. In doing so, it formulates questions about the concrete strategies, policies and main actors shaping and constructing this field.

In a world going ever more digital, ideas, images and practices necessitate a rethinking and reconceptualization to capture the changes of research methods and infrastructures both at the national and regional levels. To investigate these connections and interdependencies, scholars with methodological and theoretical approaches from various disciplines such as history, art history, political sciences, sociology and digital humanities are invited to submit their proposals.

Venue: Aleksanteri Institute, Unioninkatu 33, Helsinki, 2nd floor (meeting room)
Organization and Concept by Eszter Gantner, Daria Gritsenko


Russian sauna is in fact the kitchen, and other findings from #DHH18

From 23 May to 1 June 2018, the Helsinki Digital Humanities Hackathon #DHH was organised at UH for the fourth time. By bringing together students and researchers from computer science, the humanities and the social sciences, it aims to foster co-operation and multidisciplinary research. #DHH is a unique chance to invent new research methods and implement them. In addition, students learn to formulate research questions, not to mention the relevance of this experience for working life. The hackathon schedule covered discussing research interests, writing scripts, waiting for the server to process them, drinking coffee and tea, analysing the results, and preparing presentations and a poster session.

This year, five groups took part in the hackathon, with more than 50 people involved, among them guest students from ITMO University (St. Petersburg) who shared their knowledge. One of the groups was led by Dr. Daria Gritsenko and Andrey Indukaev from the DRS research group. Gritsenko and Indukaev’s team analysed Russia ⇔ Finland: the idea was to study the image of Finns, Finland and Finnish issues in Russian media, and vice versa. The team brought together participants with different backgrounds: Computational Science, Cultural History, Data Science, Instrumentation Technologies, Russian Language and Literature, and Translation Studies.

The data used for the hackathon research comprised two corpora. The Russian corpus was based on Integrum, with both regional (e.g. Delovoy Peterburg) and federal newspapers (e.g. Kommersant). The Finnish corpus was provided by Yle. Both corpora were filtered by words describing Finnish and Russian affairs. With more than 120,000 articles altogether, it would have been impossible to scrutinise the corpora within a week using only traditional methodological approaches. Hence, several digital humanities methods were introduced, including data cleanup, lemmatisation for both languages, topic modelling, identifying locations per topic, creating yearly word clouds and distributed representations.
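The first steps of such a pipeline, keyword filtering of a corpus and per-year word counts for yearly word clouds, can be sketched as follows. The article records, the keyword pattern, the stop-word set and the function names are hypothetical, and the sketch skips lemmatisation, which for Russian and Finnish would require a dedicated morphological analyser.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword filter for Finland-related articles (English stand-in).
KEYWORDS = re.compile(r"\b(finland|finnish|finns?)\b", re.IGNORECASE)


def filter_articles(articles, pattern=KEYWORDS):
    """Keep only the articles whose text matches the keyword pattern."""
    return [a for a in articles if pattern.search(a["text"])]


def yearly_frequencies(articles, stopwords=frozenset()):
    """Build one word-frequency Counter per publication year."""
    per_year = defaultdict(Counter)
    for a in articles:
        tokens = re.findall(r"\w+", a["text"].lower())
        per_year[a["year"]].update(t for t in tokens if t not in stopwords)
    return dict(per_year)


# Invented sample records, not Integrum or Yle data.
articles = [
    {"year": 2015, "text": "Finnish hockey team wins against Sweden"},
    {"year": 2015, "text": "Oil prices fall again"},
    {"year": 2016, "text": "Finland and Russia discuss trade and hockey"},
]
relevant = filter_articles(articles)
by_year = yearly_frequencies(relevant, stopwords={"and", "against"})
for year, counts in sorted(by_year.items()):
    print(year, counts.most_common(3))
```

Each yearly Counter can then be rendered as a separate word cloud, which makes shifts in vocabulary over time visible.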

Work in process at #DHH18

As a result, the team discovered that the leading topics in Russian media coverage of Finland were sports, culture and the economy. In Finnish media, sports, politics and the economy dominated the agenda throughout the period. Over time, the number of cities mentioned grew and the geographical coverage widened in Russian regional newspapers and Yle articles. In Russian federal newspapers, Finland remains represented only by its biggest cities.

The team was also interested in defining the neighbourhood by searching for the word ‘neighbour’ in Russian and Finnish media and examining its distributed representations (Word2Vec). In Russian media, contexts linked to neighbourhood mostly had positive associations, such as ‘ally’. Contexts in Finnish articles were less neutral and less positive, with words such as ‘tension’ and ‘threat’ appearing in the texts.

While exploring other concepts through their distributional representations, the team also checked the word ‘sauna’ in the corpora. It turned out that the kitchen has the same meaning for Russians as the sauna has for Finns: a place with the same open, relaxed atmosphere.
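Nearest neighbours in a Word2Vec space are usually ranked by cosine similarity. The sketch below shows only that ranking step, with tiny hand-made three-dimensional vectors invented for illustration; real vectors would be trained on the corpus (for example with gensim's `Word2Vec`, whose `wv.most_similar` method performs this kind of lookup), and the numbers here say nothing about the team's actual results.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors, k=2):
    """Return the k words whose vectors are most similar to `word`'s vector."""
    target = vectors[word]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy three-dimensional vectors, invented purely for illustration.
vectors = {
    "neighbour": [0.9, 0.1, 0.3],
    "ally": [0.8, 0.2, 0.4],
    "threat": [0.1, 0.9, 0.2],
    "trade": [0.2, 0.1, 0.9],
}
print(nearest("neighbour", vectors))
```

Cosine similarity ignores vector length, so frequent and rare words with similar contexts can still end up as neighbours.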

#DHH18 poster: Russia ⇔ Finland


Hackathon presentation on Finnish-Russian media research at #DHH18

The members of the DRS research group, Dr. Daria Gritsenko and postdoctoral researcher Andrey Indukaev, have been supervisors at the Helsinki Digital Humanities Hackathon #DHH18, held on 23 May – 1 June 2018. The group led by them has been focusing on how the Finnish YLE represents Russia and how Russian regional and federal media (e.g. Kommersant, Izvestia, Delovoy Peterburg) represent Finland. In addition, the group has also looked into YLE news in Russian.
