Arto Mustajoki on how to work with Integrum and why we need CCCP

On 2.2.18, professor Arto Mustajoki has held a presentation on the use of Integrum database for research on Russian language and Russia more broadly. While the public transport was not running in Helsinki and we were only four people in the seminar room, twice as many participants joined online. Very nice to see that Adobe provides us a convenient way of enabling participation not only across our campuses but also across the world.

Integrum is the largest proprietary electronic archive of mass media from Russia and the CIS countries. Currently Integrum comprises more than 15000 databases, including newspapers and journals,   news agency texts,  TV and radio texts, online news publications from central and regional mass media, as well as legal texts, together constituting a corpus with over 50 bn words. Full-text archives of many newspapers and magazines date back to the beginning of the 1990s.

While the company, also called Integrum, is a private provider of the database, scholars at the University of Helsinki and everyone logging in from the University of Helsinki has free access to all its mass media collections.

During a live demo, professor Mustajoki showed different search queries that can be performed on Integrum directly, as well as provided a wealth of real examples on the use of the database stemming from his own research. For example, the advanced search functionality of Integrum allowed to find 2500 of a unique ‘semipassive’ voice; discover why people pretend (not) to understand; investigate attitudes towards ‘fashionable words’; and research the objects of Soviet nostalgia.

Professor Mustajoki emphasized that in advancing digital humanities, we need CCCPcooperation, collaboration, curiosity and passion. While computational linguistics and social statistics have a long history, digital humanities seek to cross-cut these existing practices and add new functionality by leveraging the potential of big data.