The key components of research data management are knowing and describing your data, following ethical and legal principles, and understanding the workflows for securing, storing, sharing, archiving, opening and publishing your data. Here we take a closer look at these components of RDM and their relationship with the scientific conduct of research and, in part, with the basic services provided by the home organization.
(This article is also available in Finnish.)
Text: Mikko Ojanen
Research data management (RDM), seen as a process separate from the scientific conduct of research, is often considered a burden and a bureaucratic exercise: a set of tasks that lie beyond the research project and are somebody else's responsibility. This is largely because researchers are unaware of what research data management is about.
Do I have data or not?
Most research projects involve research material and data. In the RDM context, the definition of data is very broad: it covers practically everything that research results are based on, whether measurements, physical samples, algorithms, works of art, humans, animals, organisms, buildings, thoughts or beliefs. For this reason, research material and data are treated as synonymous in the RDM context. Probably only a researcher conducting a completely theoretical or conceptual research project can consider that their project does not collect, reuse or produce data. Thus, RDM concerns more or less everyone involved in science and research. Knowing your data is the first key component of good RDM.
How to describe the data?
Describing the data is the second key component of RDM. Ultimately, it even helps in understanding what the project is about. Describing the data breaks down into different levels of data documentation. Describing the data at the level of data sets, where the documentation stays on a broader level, is different from describing the parameters and variables within a data set. The former facilitates the findability, accessibility, interoperability and reusability of data, while the latter is closely related to the methodological and analytical processes of the research project. In many cases, the documentation workflow and the metadata employed in the process need to be considered before the project starts and designed, at the latest, before data collection starts. For example, cataloguing 14 000 digital photographs or transcribing 100 hours of recorded interview material is very laborious if it is left until afterwards.
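The two levels of documentation can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class names, the field names and the example data set are hypothetical and do not follow any particular metadata standard. The data set record holds the broader description that helps others find and reuse the data, the variable records describe individual columns, and a small check lists the columns that still lack a description.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Variable-level documentation: what one column or parameter means."""
    name: str
    description: str
    unit: str = ""  # left empty for variables without a unit


@dataclass
class Dataset:
    """Data set level documentation: the broader description of the whole set."""
    title: str
    creator: str
    description: str
    variables: list = field(default_factory=list)


def undocumented(dataset, column_names):
    """Return the columns that have no variable-level description yet."""
    documented = {v.name for v in dataset.variables}
    return [name for name in column_names if name not in documented]


# Hypothetical data set, invented for this example
interviews = Dataset(
    title="Example interview recordings",
    creator="Example Researcher",
    description="Interview sound recordings, catalogued during collection.",
    variables=[
        Variable("recording_id", "Running identifier of the sound recording"),
        Variable("duration", "Length of the recording", unit="minutes"),
    ],
)

# Columns actually present in the (hypothetical) catalogue file
columns = ["recording_id", "duration", "interview_date"]
missing = undocumented(interviews, columns)
print(missing)
```

Running a check like this while the data is still being collected shows which descriptions are missing at a point when they are still easy to write.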
How to keep things ethical and legal?
The third important component of RDM is knowing and following the ethical and legal principles. They are relevant to every research project in one form or another. Following these principles requires the expertise to recognize when and how to inform research participants, when privacy notices are required, who the controller is if personal data is collected and processed, and how the GDPR governs the conduct of research. In addition, the researcher must consider who owns their data and on what grounds they have the right to use it. A researcher's core expertise covers questions such as whether their data contains personal information, sensitive information or sensitive personal information, and whether the research data includes material subject to intellectual property rights (IPR), such as copyrighted material. Ethical and legal issues are rarely black and white, and at times they can even contradict each other. Thus, risk assessment is an integral part of the process whenever different values and principles need to be prioritized.
Where and how to secure, store, share, archive, open, and publish data?
As can be concluded from the themes discussed above, RDM is a considerably larger part of research than the technical solutions for storing digital files. Here, it is important to distinguish between the active phase of the research project, when data is processed on a daily basis, and the static phase, when data sets are cleaned up for archiving or publishing, typically after the project has ended. It is also worth noting that sharing data within a secure storage system should not be confused with opening, publishing and archiving the data. Even though the tools used for sharing, opening, publishing and archiving can be very similar, from the RDM point of view these activities differ significantly.
The opening and publication phase of the RDM life cycle requires a researcher to choose a suitable repository and to decide under which license they open their data. As a rule of thumb, the selected repository should be curated and capable of providing data sets with persistent identifiers (PID). Choosing a repository can be a frustrating process. However, finding out where other researchers publish their data by reading the journals in your own field, or browsing suitable repositories on Re3data.org, can be an effective way to find a suitable data archive. Persistent identifiers facilitate the findability and citability of the data. A researcher should provide a data citation guide along with their data publications.
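As a small illustration of what a data citation guide might contain, the Python sketch below assembles a citation string from a creator, year, title, repository and persistent identifier. The element order is one common pattern, not a format prescribed by this article, and the function name, the repository name and the DOI are all placeholders invented for the example.

```python
def format_data_citation(creator, year, title, repository, doi):
    """Build a one-line citation string for a published data set."""
    return f"{creator} ({year}). {title}. {repository}. https://doi.org/{doi}"


citation = format_data_citation(
    creator="Researcher, E.",
    year=2020,
    title="Example interview recordings",
    repository="Example Repository",  # hypothetical repository name
    doi="10.1234/example",            # placeholder, not a registered DOI
)
print(citation)
```

Publishing a ready-made citation like this next to the data set makes it easy for others to cite the data consistently.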
Now that we have seen what the different RDM components are, some perhaps easier to grasp than others, we can see that different kinds of expertise are needed to manage research data effectively. Understanding the different components is a crucial part of any research, and it helps to avoid many pitfalls. In the next parts of the series, we dig deeper into why research data management has become more and more important for research and why planning in advance matters.
Research Data Management – know your data!