What is research data for a historian? How can opening and sharing the data be promoted? And above all, how can guidance for data management be improved so that historians benefit from it in the best possible way?
These questions and many others were discussed by historians from the University of Helsinki in the Helsinki University Library (Kaisa House) on Monday, June 19. The aim of the hackathon event organized by the Helsinki University Library (HULib) was to develop the guidance and instructions in the data management planning tool Tuuli (DMPTuuli) for historians (see RDM guidance in the Mildred M5 project). The library has hosted a similar workshop for researchers in the fields of arts, design and architecture.
The three-hour search for better guidance proceeded step by step, following the structure of DMPTuuli. The researchers thus shared their views on (1) data description, (2) data documentation, (3) data storage, (4) ethical and legal issues, and (5) data sharing and long-term preservation (see also Research Guide by HU).
The hackathon was carried out in the form of group work, and the historians raised a number of concrete issues and general ideas about DMPTuuli guidance. As Susanna Nykyri from HULib summarized, historians’ remarks challenged and encouraged the library to further develop the guidance. In this blog post we have put together some of the key findings of the intensive Monday session. So, here are 15 preliminary notes from the hackathon – you are free to share further ideas in the comment section!
(1) The data management model and terminology used in DMPTuuli appear somewhat distant to historians. Researcher Lauri Hirvonen assumes that the tool may be more suitable for social and natural scientists than for historians accustomed to archive materials. ”The DMPTuuli seems to be most applicable for research projects making use of distinctive and limited data sets that are collected for analysis. A historian may not be able to produce this kind of specific data set, in which case some elements in DMPTuuli may seem rather far-fetched,” Hirvonen says.
(2) However, DMPTuuli as well as the hackathon event have encouraged historians to reflect on their documentation practices. Researcher Maiju Wuokko: ”We are forced to think about data management now because funding institutions demand it, but at the same time this brings out many crucial principles and practices in the discipline. So at its best, data management planning can be much more than a compulsory obligation in the funding process!”
(3) The hackathon clarified the overall process of managing the data, says Tuula Pääkkönen, who works as an information systems specialist at the National Library of Finland. Pääkkönen: ”All the questions in DMPTuuli highlight the issues, and even though they can seem tricky at first glance, it is good to bring them into the open. One challenge is how deep into the data management plan topics one should go. The recommendation for the DMP is 1–2 pages, so finding the suitable detail level for oneself but useful for others is one consideration.” Pääkkönen participated in the hackathon to seek ideas for the National Library’s Digi project.
(4) DMPTuuli sample answers tailored for historians are at the top of researchers’ wish list. Maiju Wuokko believes that concrete sample answers would have a positive impact on the internal debate in the field of history. On the other hand, sample answers might steer data management planning too much. Either way, the need for discipline-specific workshops for developing the guidance and strengthening the practices is obvious. The ultimate goal is to have guidance that follows the research process, as Professor Anu Lahtinen from the Department of Philosophy, History, Culture and Art Studies says.
(5) The data produced by historians was divided into three types: the source material (e.g. archive materials as such), the data produced by the researcher (e.g. notes on archive materials), and the project management data (e.g. research permits). This division was considered relevant in principle. A researcher is solely responsible for the data produced by herself/himself.
(6) Due to the nature of the research, the data produced by a historian often consists of notes. Maiju Wuokko points out that this is a good thing to keep in mind when communicating with historians: ”It might be important to emphasize that the most urgent thing for us is to keep our own notes safe and to think about further use of the notes. Archives take care of the source material for us.”
(7) The core idea of the documentation is to ensure that the data collected by the researcher is also understandable to other researchers. As illustrated by Tuula Pääkkönen, the documentation makes sure that your data is understood – by fellow researchers during your holiday and by yourself after the holiday. In the field of history, where research is often an individual effort, this can be challenging because documenting for others is not a self-evident practice. On the other hand, historians might feel that comprehensive documentation takes too much time away from research.
(8) In the field of history, the documentation is often included in the research itself. In this respect, Lauri Hirvonen considers the documentation section of the DMP to overlap slightly with the research plan and research questions. Hirvonen: ”The actual publication should disclose all of the most relevant data in the form of references and data catalogs.”
(9) If the data produced by historians is mainly notes, the quality and integrity of the data can be ensured, according to Lauri Hirvonen, by readable notes that are secured by backups and reviewed with source-critical methods. Hirvonen: ”In practice, this is accomplished by meticulous referencing and by reviewing the data comprehensively in the introduction. Through the precise list of references and data sources, any researcher should be able to check the accuracy of the research.”
(10) Data storage practices vary from historian to historian. Some researchers have stored data in the Finnish Social Science Data Archive, others use a commercial cloud service. Problems may arise with external cloud services if the service is not supported by the University of Helsinki. In addition to archives and cloud services, historians have been using internal and external hard disk drives and network drives to store data.
Ethical and legal issues
(11) The most complicated ethical and legal issues are usually related to large data sets. Tuula Pääkkönen: ”In this data, there may be all kinds of special unique cases. One solution is to use old-enough material to minimize the risk.”
(12) Anonymizing the data is sometimes demanding because a person can be identified in many ways, even when the name is removed. On the other hand, if the anonymization goes too far, the data may lose its value and relevance for other researchers. The ethical evaluation included in the DMP process is considered particularly useful when there is uncertainty about the sensitivity of the data.
Data sharing and long-term preservation
(13) According to Lauri Hirvonen, sharing the data among fellow researchers may sound somewhat irrelevant to some historians. This is because of the nature of the data, explains Hirvonen: ”The research in the field of history usually doesn’t generate distinctive data sets that are analyzed by other researchers. The data produced is often notes based on a variety of historical records (e.g. documents, manuscripts, maps), and merely sharing those notes as open data is problematic. Of course, historians may also produce valuable qualitative or quantitative data, for example in the field of economic history, and this data can be shared as a distinctive data set.”
(14) At what stage can the data be published? A historian may be ready to share the data only when it is used up, that is, when a certain number of articles based on the data have been written. The historians in the hackathon also mentioned the possibility of sharing part of the data, while the rest would be reserved for the researcher until the complete data set is opened. However, there are also other reasons why data is only partially shared. For example, the same data set may have a wide variety of contents that must be shared or preserved in different ways.
(15) Sharing and preserving the data produced by historians may generate costs in at least two ways: the costs may arise from the storage itself (e.g. storage capacity) or from the production of the metadata required for sharing and preserving. Additionally, costs may arise when increasing the usability of the data (e.g. data visualizations). A question worth considering is when the extra documentation (for sharing and preservation purposes) requires extra funding.
Text by Juuso Ala-Kyyny, photos by Jussi Männistö. Remarks in this blog post are based on the notes by Juuso Ala-Kyyny (HULib) and Markku Roinila (HULib), and on the email interviews with historian Lauri Hirvonen, historian Maiju Wuokko and information systems specialist Tuula Pääkkönen. The hackathon event was organized by Helsinki University Library – Mari Elisa Kuusniemi, Susanna Nykyri, Jari Friman, Tuija Korhonen, Ville Träff, Mikko Ojanen and Dolf Assmann.