The key components of research data management (RDM) consist of the following: knowing and describing your data, following ethical and legal principles, and understanding the workflows related to securing, storing, sharing, archiving, opening, and publishing your data. Here, we take a closer look at these RDM components and their relationship with the scientific research process and the basic services provided by a researcher’s home organisation.
(This article is also available in Finnish.)
Text: Mikko Ojanen
As a process that is separate from the scientific aspects of a research project, research data management (RDM) is often considered burdensome and bureaucratic – a task that is external to the research project and somebody else’s responsibility. This view is largely held because researchers are unaware of the fundamentals of RDM. This blog post presents the components of RDM:
- Knowing your data
- Describing your data
- Understanding the ethical and legal aspects of your data
- Storing, sharing and opening your data
1. Do I have data or not?
Most research projects involve research material and data. In an RDM context, the definition of data is very broad, and it covers practically everything research results are based on, such as measurements, physical samples, algorithms, works of art, humans, animals, organisms, buildings, thoughts, or beliefs. For this reason, research material and data are thought of as synonymous in the RDM context. Only a researcher conducting a completely theoretical or conceptual research project could carry out a study that did not collect, reuse, or produce data. Thus, RDM is a focus for more or less everyone involved in science and research. Knowing your data is the first key component of good RDM.
2. How to describe the data?
The second key component of RDM is describing the data. This process will also lead to a better understanding of what the project is about. A variety of data descriptions are required at the different levels of data documentation. Describing data at the level of datasets, where the focus of the documentation is broader, is different to describing the parameters and variables within a dataset. The former relates to facilitating the findability, accessibility, interoperability, and re-usability of data, whereas the latter is closely related to the methodological and analytical processes of the research project. In most cases, the documentation workflow and the metadata employed in a process need to be considered before the project begins or, at the very latest, before the start of the data collection. For example, it is considerably more laborious to wait until the end of the data collection to complete the cataloguing of 14 000 digital photographs or the transcription of 100 hours of recorded interview material.
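The two levels of documentation described above can be sketched in code. The following is a minimal illustration, not a metadata standard: the class names, the field names, and the list of required fields are all assumptions made for this example, and a real project should follow the metadata schema recommended by its discipline or chosen repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VariableDescription:
    """Variable-level documentation: what a single column or parameter means."""
    name: str
    description: str
    unit: str = ""


@dataclass
class DatasetDescription:
    """Dataset-level documentation: the broader description that helps others find and reuse the data."""
    title: str
    creator: str
    description: str
    licence: str = ""
    variables: List[VariableDescription] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the dataset-level fields that are still empty."""
        # Hypothetical list of required fields, chosen for illustration only.
        required = ("title", "creator", "description", "licence")
        return [name for name in required if not getattr(self, name).strip()]


# Hypothetical dataset, described before data collection begins.
interviews = DatasetDescription(
    title="Interview recordings (hypothetical example)",
    creator="Example Researcher",
    description="Recorded and transcribed interview material.",
    variables=[
        VariableDescription("duration", "Length of the recording", "minutes"),
    ],
)

# Lists whatever still needs to be documented, here the licence.
print(interviews.missing_fields())
```

Writing the description down in a structured form at the start of the project makes the gaps visible early, when they are still cheap to fill.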
3. How to keep things ethical and legal?
The third important component of RDM is to understand and follow the relevant ethical and legal principles. These are applicable to every research project in some form and include the following areas of expertise: recognising when and how to inform research participants, determining when privacy notices are required, identifying the controller if personal data is collected and processed, and understanding how the GDPR governs the conduct of research. In addition, researchers must consider who owns their data or on what grounds they have a right to use the data. The core expertise of a researcher must also cover questions such as: Does the data include personal information, sensitive information, or sensitive personal information? Does the research data involve intellectual property rights (IPR), such as copyrighted material? Ethical and legal issues are rarely black and white and, at times, they can even be contradictory. Therefore, a risk assessment is an integral part of the RDM process when different values and principles need to be prioritised.
4. Where and how to secure, store, share, archive, open, and publish data?
It can be concluded from the themes discussed above that RDM contributes more to the research process than just the technical solutions for storing digital files. Here, it is important to make a distinction between the active phase of the research project – when data is processed on a daily basis – and the static phase – when datasets are cleaned up for archiving or publishing, typically after the project has ended. It is also worth noting that sharing the data within a secure storage system should not be confused with opening, publishing, and archiving the data. The tools used for sharing, opening, publishing, and archiving can be very similar; however, from the RDM point of view they differ significantly.
In the RDM life cycle, opening and publishing the data require a researcher to choose a suitable licence and repository. The general rule of thumb is that the selected repository should be curated and capable of providing datasets with persistent identifiers (PIDs), which facilitate the findability and citability of the data. Choosing a repository can be a frustrating process. However, an effective way to select a data archive is to review journals from your field to find where other researchers publish their data, or to browse suitable repositories in Re3data.org. A researcher should also provide a data citation guide along with their data publications.
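As an illustration of what a data citation guide might offer, the sketch below assembles a citation string from a dataset's basic details and its persistent identifier. The function name, the citation layout, and the example values (including the DOI) are hypothetical and chosen for this example only; the repository or journal you publish with may prescribe its own citation format.

```python
def format_data_citation(creator: str, year: int, title: str,
                         repository: str, doi: str) -> str:
    """Build a simple data citation that ends with a resolvable DOI link."""
    # The layout (creator, year, title, repository, identifier) is an
    # assumption for illustration, not a prescribed citation style.
    return f"{creator} ({year}). {title} [Data set]. {repository}. https://doi.org/{doi}"


# Hypothetical dataset details; the DOI below is a placeholder, not a real record.
citation = format_data_citation(
    creator="Researcher, E.",
    year=2020,
    title="Interview recordings",
    repository="Example Repository",
    doi="10.1234/example",
)
print(citation)
```

Because the identifier is persistent, the citation keeps working even if the repository later reorganises its web pages.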
Understanding the components is important
Now that we have described the different RDM components – some may be easier to grasp than others – we can see that different kinds of expertise are needed to manage research data effectively. Understanding the different components is a crucial part of any research project, and it helps avoid many pitfalls. In the next part of the series, we dig deeper into why RDM has become increasingly important for research and why planning in advance is advised.
Research Data Management – know your data!