At the beginning of 2019, University of Helsinki (UH) Data Support, together with the Faculty of Medicine, surveyed the faculty's Principal Investigators about where they store data during a project and where they make it available after the project. Almost 50 Principal Investigators participated in the survey. More than half of the researchers were affiliated only with the University of Helsinki, and around a quarter were affiliated with both the University and HUS.
(This article is also available in Finnish.)
Storing data during a project
The survey confirmed what was already known: researchers need help in protecting sensitive data. Sensitive and confidential data is information that could cause damage if revealed. It is impossible to make an exhaustive list of such information, but it includes, for example, sensitive personal information such as health information, risk of disease, sexual orientation, ethnic origin, religion or genetic information. Beyond personal data, sensitive and confidential data can include sensitive information about species, patents, military information or trade secrets. The researcher is responsible for identifying any data that, if revealed, could harm the data subject or another target.
Securing sensitive data requires special effort regarding data protection. In addition to using secure storage solutions, around 85% of the researchers said their sensitive data was protected: it was encrypted, pseudonymised or anonymised. Anonymisation is the best way to make data safe with respect to data protection, because an individual data unit (e.g., a person) is then no longer identifiable using reasonable efforts. Furthermore, the EU General Data Protection Regulation (GDPR) does not apply to anonymised data. More information about anonymisation and identifiers can be found in the FSD Data Management guidelines.
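As a rough illustration of what pseudonymisation can look like in practice, the Python sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256). It is only a minimal example under stated assumptions: the function names, the field name patient_id and the key are hypothetical, and the sketch does not describe any UH service or the methods used by the surveyed researchers.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the original value cannot be read from it.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated to 16 hex characters for readability (an illustrative choice).
    return digest.hexdigest()[:16]


def pseudonymise_records(records, id_field, secret_key):
    """Return copies of the records with the identifier field pseudonymised."""
    result = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        copy[id_field] = pseudonymise(str(record[id_field]), secret_key)
        result.append(copy)
    return result


if __name__ == "__main__":
    # Hypothetical key and data; a real key must be stored securely and
    # separately from the pseudonymised data.
    key = b"replace-with-a-securely-stored-key"
    rows = [
        {"patient_id": "A-001", "diagnosis_year": 2017},
        {"patient_id": "A-002", "diagnosis_year": 2018},
    ]
    for row in pseudonymise_records(rows, "patient_id", key):
        print(row)
```

Data treated this way is pseudonymised, not anonymised: whoever holds the key can re-create the link to the original identifiers, and the remaining fields may still identify a person in combination. Pseudonymised data therefore remains personal data under the GDPR.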
Protecting sensitive data also makes it easier to choose a storage solution. Depending on the protection and level of sensitivity of the data, researchers can use basic storage solutions offered by the University of Helsinki IT Services, such as the group storage space. Nevertheless, please contact UH IT experts when you are planning to store sensitive data (firstname.lastname@example.org).
The storage solutions for sensitive data at the university include UMPIO, as well as virtual and physical servers. In addition, CSC offers ePouta for researchers. ePouta and other storage solutions by CSC are described on the CSC webpages. When research funding is planned, resources for storing data are not always included in the budget. Storing sensitive data in particular can become expensive if protection or anonymisation is not an option, because the so-called basic IT options are then not adequate.
In addition, the survey revealed that surprisingly many researchers use external hard drives for saving data. The survey did not cover whether they use external hard drives as a backup method or as their only storage system. The storage solutions by UH IT Services are backed up automatically, and the group storage space, for example, can be enlarged by contacting IT Services. More information about storage solutions can be found on the Data Support webpages and by contacting email@example.com.
Making data openly available after publishing results
We asked the researchers where, in addition to storing data, they make their data openly available after publishing the research results. If the data was sensitive and hence not suitable to be made openly available, we asked whether the metadata, that is, the description of the data, had been made available. It was encouraging to see that half of the researchers participating in the survey had made data openly available. The data repositories mentioned included EMBL-EBI (EGA, ENA, ArrayExpress), NCBI (GEO, dbGaP), NIH, Open Science Framework, Zenodo, ResearchGate, GitHub and biobanks. More data repositories, especially for biomolecular data, can be found on the ELIXIR webpages.
Nevertheless, the survey results showed, worryingly, that the researchers were not familiar with making descriptions (metadata) of collected sensitive data openly available. For example, the Academy of Finland requires that: “If the research data cannot be made openly available, the metadata must be stored in a Finnish or international data finder.” Such finders or repositories include, for instance, the national Etsin and the international Zenodo. Open science principles do not expect everything to be made openly available; degrees of data openness may justifiably vary, ranging from fully open to strictly confidential. However, making data openly available to others makes data reuse easier, enhances new findings and innovations, and advances research cooperation.
Snapshot of researchers’ everyday life
The survey identified which services researchers use, which services are less well known, and where the service needs are. Identifying these matters is important for developing facilities both at the Faculty of Medicine and at Data Support. The most important findings were that making data openly available after publishing research results could be made more familiar to researchers, and that researchers need help with managing sensitive data. Clarity and easy solutions are especially needed for data storage. The medical faculty will respond to this need, together with the university IT Services, by developing a new storage service for sensitive data, which will be an addition to the solutions already available. We will hear more about this in due course.
Surveys like this give a good snapshot of how researchers currently manage their data and of the major issues where help is needed. Hence, repeating this kind of survey in a couple of years’ time could be really useful for support service providers, as we would then find out where improvements have been made and what should still be developed.