“It does not help if the data is open but impossible to understand” – Jaana Bäck’s thoughts on open science

“As a fundamental concept for helping in open science, the FAIR principle should be brought to the attention of everybody at the university”, writes Professor Jaana Bäck in this blog post. For Bäck, following the principles of open science is a natural way of doing research because it improves the impact of her scientific work. “Open access to the data allows efficient collaboration, co-authorship with researchers from other countries and continents, and overall, larger visibility and impact of the work we do.”

In this blog post, Professor Jaana Bäck writes about the impact of open science on scientific work, the importance of the FAIR principles, obstacles to open data, and the need for research data management training. Bäck is a professor of forest-atmosphere interactions at the Faculty of Agriculture and Forestry, University of Helsinki. She was granted the Open Science Award in October 2018, during Open Access Week. The award is granted annually at the University of Helsinki for the promotion of open science.

Jaana Bäck was granted this year’s Open Science Award. In her research, she is investigating the role of Nordic forests in climate change. Photo by Veikko Somerpuro

How did you get interested in open science and how do you implement it in your work?

The need for open science has always been very evident to me. In my MSc thesis I studied the impact of a pulp mill on nearby forests, and even that small study ended up as a published paper in 1987. Back then, however, nobody talked about open science, and unfortunately, even today that paper is not open access, although it is available in digital form.

It is evident that following the principles of open science improves the impact of scientific work. For many decades already, almost all the data we collect automatically at our field stations, as well as the related metadata, has been freely available and downloadable for scientific purposes. Data users are naturally expected to follow the principles of fair use and scientific ethics. Open access to the data allows efficient collaboration, co-authorships with researchers from other countries and continents, and overall, larger visibility and impact of the work we do.

How easy or laborious is it to implement open science practices in your field of science? Is there support for this (e.g. established open access publishing channels)?

The field of climate change and Earth systems research is very international, and the recognition that all humans share one planet and one atmosphere makes it evident that, wherever you work, your data and results are potentially relevant to other scientists in your field as well. Although there may be differences in disciplinary or methodological approaches, there is a general need to use large datasets and to find the interconnections and feedbacks that contribute to the behaviour of the system.

The need for such open data, however, is not always fulfilled. Data may be in the wrong format, lack important metadata, or simply be unavailable if it is a ‘legacy’ dataset that sits in somebody’s notebook and has not even been digitized yet. That’s why I am currently also pursuing the open science concept by participating in European and global initiatives in which environmental research infrastructures aim to harmonize their methods and data and make them accessible, thereby increasing the value of their data. The call for openness in science is also coming from the funding agencies: it is much more cost-efficient if we have wide awareness of what has been done and have access to all collected data, naturally provided that it is organized and annotated in a useful way.

Open publishing is a prerequisite for progress in science and for facilitating knowledge discovery as a whole. It is also a matter of equality: when data has been collected and somebody has analysed it, the results need to be made public and available to all who can use them. Openness also means that the results can be replicated and verified. It also means that you get cited for what you have published. And that is of course how we are evaluated, so open science promotes research careers in the best possible way.

Are there situations when it is not possible to open research results or data?

Openness is not always easy. There may be embargoes due to ownership issues or commercial reasons, but they should be kept to a minimum. Large research consortia may end up with datasets in which some parts are open and others are not, for various reasons. This should be avoided by agreeing jointly on data policies at the very beginning of the projects. In public research institutions such as universities, we should be aware that in the end everything we do should end up in the public domain to facilitate knowledge discovery.

Metadata and documentation have been identified as the bottlenecks in open data. What’s your point of view? What kind of skills does opening data require, and what kind of support do research groups need?

Good data management, including good metadata, is not a goal in itself, but is rather the principal tool leading to knowledge discovery, subsequent data integration and further, to reuse by the community after the data has ultimately been published. Proper metadata and annotation are crucial for accessing datasets – it does not help if the data is open but impossible to understand. Therefore, proper ontologies and vocabularies are needed from the moment data is collected and stored, as are good user interfaces that allow for searching and finding the proper data, and tools for identifying datasets when they are used, for example, in publications. To this end, persistent identifiers (PIDs) are being developed for datasets, which are accessible via many central data systems, and this is a really important step. One important aspect is that PIDs encourage researchers to open their data, because they can receive credit when other scientists use and cite the data. Training in proper data management should be an integral part of all researcher education. The training should also emphasise the principle of scientific working ethics that all components of the research process should be available for evaluation, to ensure transparency, reproducibility, and reusability.

How well does the University of Helsinki support researchers in implementing open science practices?

The University has recently adopted many good ways to support open science. The agreements with some big publishing companies on publishing fees, the general policy of openness of research and publications, and open access to doctoral theses in many faculties are just a few examples. As a fundamental concept for helping in open science, the FAIR principle (Wilkinson et al. 2015) should be brought to the attention of everybody at the university: research results and data should be Findable, Accessible, Interoperable, and Reusable. These principles are also a guideline for open science in general, and have been endorsed in many European data centres.