Open Data: From Messy to Neat

The October session aimed to teach Digital Russia Studies enthusiasts how new research resources – such as open government data – can be cleaned and pre-processed easily and efficiently. Ilona Repponen, a research assistant in the Digital Russia Studies research group, held a master class inspired by the work of Olga Parkhimovich and her clearspending.ru project.

In the first part of the workshop, the participants got acquainted with Russian open budget resources and learned how to use their search engines and download the data. According to the Open Budget Index, Russia has done well in making budget data open. Budget data on public procurement, local government, public services, and subsidy agreements is freely available on the Internet. Repponen emphasized that although the published data is more or less correct, it is advisable to examine it critically: the data is entered by people, so mistakes can occur.

Although publishing open data has been mandated by an executive order of the Russian president since 2012, it remains notoriously difficult for researchers to collect and process these data. Luckily, there are tools that can help along the way. The second part of the workshop was devoted to OpenRefine, an open-source tool suited to cleaning messy data before it is used in calculations. The participants tried several functions and facets for cleaning and organizing the data step by step. Finally, based on her own research experience, Repponen explained what should be taken into consideration when assessing the results of data transformations and checking unclear items manually.
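
For readers who want a feel for what a facet and a clustering step do, here is a minimal Python sketch. It is not OpenRefine's own code and not the workflow used in the workshop. It is a simplified illustration loosely modeled on the key-collision ("fingerprint") clustering idea that OpenRefine offers. The function names and the sample organization names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that spelling variants collide on one key."""
    # Lowercase and replace punctuation with spaces (\w is Unicode-aware,
    # so Cyrillic letters are kept).
    cleaned = re.sub(r"[^\w\s]", " ", value.lower())
    # Split into tokens, drop duplicates, sort, and join back together.
    return " ".join(sorted(set(cleaned.split())))


def text_facet(values):
    """Count how often each distinct raw value occurs, like a text facet."""
    return Counter(values)


def cluster(values):
    """Group distinct raw values that share the same fingerprint key."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    # Only groups with more than one variant need a manual look.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical sample values, invented for illustration only.
samples = [
    "Ministry of Finance",
    "ministry of finance ",
    "Finance, Ministry of",
    "Ministry of Health",
]

if __name__ == "__main__":
    print(text_facet(samples))
    print(cluster(samples))
```

The facet shows how many distinct spellings exist, and the cluster step proposes which of them may refer to the same thing. As in the workshop, the proposed groups are only candidates: each one should still be checked manually before the values are merged.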
