Discussing Big Data

As part of the Aleksanteri Conference ‘Liberation – Freedom – Democracy? 1918 – 1968 – 2018’, a roundtable on Big Data and Area Studies was held on 24 October, chaired by Dr. Daria Gritsenko, the founder of the DRS network. Digital humanities experts Dr. Mila Oiva (University of Turku), Dr. Jussi Pakkasvirta (UH), Dr. Ekaterina Kalinina (Södertörn University, Sweden) and M.Soc.Sc. Markus Kainu (The Social Insurance Institution of Finland, KELA) took part in the discussion.

Is big data a revolution or an evolution?

— It depends on perspective: for whom and for which purposes, says Kainu. Big data is an evolution of existing data when moving from the micro to the macro level. In terms of methodology, it can also be seen as an evolution because of the qualitative approaches used before. Yet the era of big data can also be viewed more generally as a revolution. Why? Preparing research data differs greatly from traditional research.

Open science and digitalisation

Participants also brought up some peculiarities of digital humanities research. Pakkasvirta emphasized that research projects should follow the principles of open science whenever possible. For example, the Suomi24 forum data project he has been working on is open for free use.

— Digitalisation should be supported at the transnational level to enable collaboration projects, comments Oiva. She has been participating in the transnational Oceanic Exchanges research project, which focuses on the global connectedness of 19th-century newspapers. However, the study does not cover, for example, Russian or East European newspapers, because only a few digitised archive materials are available for research.

Discussing the possibilities of big data technologies, Kalinina mentioned an interactive crowd-sourced monument project by Yandex, the Russian search engine. One feature of the interactive map created by Yandex lets members of the public mark monuments and other memorial objects on the National Yandex map. Yandex’s analysts have studied the data collected from users and noticed several trends that illustrate how people understand cultural heritage sites.

What are the challenges of aligning big data and area studies?

— Big data research methodologies could be very helpful, but there is a risk of reproducing existing stereotypes, starts Kalinina. That is why it is important to be critical towards the source of the data, to know your data, and to be ready to employ some “traditional” humanities and social science methodologies. It is important to know where the data comes from and how companies compile it. It is also important to understand that the data collected might not be representative because of digital divides in access to digital technologies and in their uses. Another issue is data regulation and data protection laws, which might limit the access we have to data for research purposes.

— It is essential to ensure that talented young researchers stay in academia instead of moving to the private or public sector, adds Kainu. Academia should be interested in employing data experts to support digital research, and should view their merits from another perspective.

— For me, the biggest challenge is still the spread of propaganda, says Pakkasvirta. Adults often lack source criticism when they read or share something on the Internet, in contrast to children, who – surprisingly often – are quite aware of it. Hence, the extent of propaganda in big data can be surprising.

What is the future of area studies in the digital era?

Kalinina wonders whether the use of big data methodologies and big data might lead to the emergence of studies ever more focused on transnational issues given the availability of such data.

— We will always have the context of the history of each region, concludes Oiva.

Media framing of internet policy

Mariëlle, you just came back from the #AoIR2018 conference in Montréal. Could you briefly tell us what it was about?

#AoIR2018 is the annual conference of the Association of Internet Researchers. It brings together scholars from across the world and from a wide variety of disciplines who study the internet and, more broadly, information technologies. This year, the conference theme was ‘Transnational Materialities’, and many papers reflected on the importance of infrastructures, among other things. In addition to parallel panel sessions and two keynote events, the conference included various pre-conference workshops, among which I took part in the one on ‘Digital Methods in Internet Research’.

That sounds like an exciting programme! What did you present at the conference?

I presented a paper on how governmental policy concerning the internet is framed in Russian mass media. While there is quite a significant body of research on internet governance in Russia, much less is known about the role of the media in shaping public perceptions of the internet and the extent to which it should be regulated by the state. The question of how the Russian government strives to create popular support for governmental regulation in this sphere has received little attention. I currently draw upon my previous experience of researching Russian state television and political communication to shed light on this particular aspect of internet governance. In the paper, I focussed on mass media framing of the decision to block the popular messaging app Telegram in April 2018. The attempts by the government agency Roskomnadzor to block Telegram were not very successful (indeed, the app is still accessible for many Russians), but they did negatively affect many other online services, even outside of Russia. In response, mass demonstrations took place across Russia to protest against the blocking and against restrictions of internet freedom more generally. The paper I presented is part of my current project ‘Selling Censorship: Affective Framing and the Legitimation of Internet Control in Russia’, which is funded by the Netherlands Organisation for Scientific Research (NWO).

What are the main findings of your research?

The research project is still at an early stage, so I presented some preliminary findings based on my analysis of coverage of the topic on the main Russian state-funded TV channels, and explained why it is important to study these discourses. The television coverage I examined emphasised how the encrypted communication option offered by Telegram is used by terrorist groups, for example in the preparations for the 2017 terrorist attack in St. Petersburg. Indeed, Telegram’s refusal to give the Russian Federal Security Service access to such private communications, as required by the anti-terrorist legislation package known as the Yarovaya Law, is the official reason for its blocking. The fact that encrypted messaging is used not only by terrorists and criminals, but also for legitimate purposes, for instance by investigative journalists, is simply ignored or disregarded, as is Telegram’s increasingly important function as a platform for (independent) news. By instead creating a black-and-white opposition between privacy and security, this coverage effectively cuts short any open debate about citizens’ right to private communications.

That is very interesting! What are the next steps in this project?

The next step in this research is to compare the framing of internet policy in mass media to how it is discussed and legitimised in political debates, for example in the Russian parliament. In particular, I want to find out whether there are significant differences between the types of argumentation and degree of rationality and emotion used in these different spheres. What role do media play in communicating about internet regulation to the general public, and do they add their own frames or simply repeat those developed in policy circles?

Open Data: From Messy to Neat

The October session aimed to teach Digital Russia Studies enthusiasts how new research resources – such as open government data – can be cleaned and pre-processed in an easy and efficient manner. Ilona Repponen, a research assistant at the Digital Russia Studies research group, held a master-class inspired by the work of Olga Parkhimovich and her clearspending.ru project.

In the first part of the workshop, the participants got acquainted with Russian open budget resources and learned how to use their search engines and download the data. According to the Open Budget Index, Russia performs well in this field. Budget data on public procurement, local government, public services and subsidy agreements is freely available on the Internet. Repponen emphasized that although the published data is more or less correct, it is advisable to study it critically: the data is entered by people, so mistakes can occur.

While publishing open data has been mandated by an executive order of the Russian president since 2012, it remains notoriously difficult for researchers to collect and process these data. Luckily, there are tools that can help along the way. The second part of the workshop was devoted to OpenRefine, an open-source tool suitable for processing messy data that needs to be cleaned before any calculations can be made. The participants tried several functions and facets for cleaning and organizing the data step by step. Finally, based on her own research experience, Repponen explained what should be taken into consideration when assessing the results of data transformations and checking unclear items manually.
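For readers who want a feel for this kind of cleaning outside OpenRefine, here is a minimal Python sketch of one common idea: grouping spelling variants of the same name under a normalised key, loosely modelled on OpenRefine's "fingerprint" key-collision clustering. It is not the procedure used in the workshop. The function names and the sample organisation names are hypothetical, and the normalisation steps are a simplification.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalised key so that spelling variants collide (illustrative only)."""
    text = value.strip().lower()
    # Decompose characters and drop combining marks (note: this also maps "й" to "и").
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation with spaces; \w keeps letters (Latin and Cyrillic) and digits.
    text = re.sub(r"[^\w\s]", " ", text)
    # Sort and de-duplicate tokens so that word order does not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values by fingerprint and keep only groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: items for key, items in groups.items() if len(set(items)) > 1}


# Hypothetical sample of messy organisation names.
sample = [
    "Ministry of Finance",
    "ministry of finance ",
    "Finance, Ministry of",
    "Ministry of Health",
]

for key, variants in cluster(sample).items():
    print(key, "<-", variants)
```

As in OpenRefine, the clusters are only candidates for merging: each group still needs a manual check before the variants are replaced with a single value.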