Not just storage as usual – Long-term storage of research data for the University of Helsinki (TPAT)

In 2022, the University of Helsinki decided to strengthen its research data storage services and data curation over the next five years, backed by 1.5 million euros in funding. The long-term storage of research data (Tutkimusdatan pitkäaikaistallennus, TPAT) differs from a conventional data storage service: it focuses on storing data for 5–15 years after the research is completed. TPAT answers the question: “Where can I save this valuable data for future use or as evidence of research already completed?”

Text: Ville Tenhunen (Center for Information Technology)

The University of Helsinki has identified the need to develop the long-term storage of research data and the services around it. The new service is intended for inactive data: data that are no longer being actively processed and that are not necessarily suitable for transfer to the national service for the long-term preservation of research data (Fairdata PAS).

We call the project TPAT, based on the Finnish name Tutkimusdatan pitkäaikaistallennus (Long-term storage of research data; see the project in Flamma intranet). The new service will create a space where a research project's valuable data can be saved for future use or as evidence of research already completed.

The purpose of the project is to make data management easier for researchers by creating practices for long-term data storage and by providing the support and tools needed to put them into practice.

More than just storage

TPAT is not just another storage service. It is much more than that.

The project creates an operating model for the selection, curation, provision of services and further development of research data stored long term. This helps researchers to manage their research data better and gives the University better information about the valuable data used by its communities.

The project also implements technical and functional solutions for the long-term storage of research data in the IT infrastructure of the University of Helsinki, and directs the procurement of equipment as needed.

Finally, the services produced in the project will be handed over to the University's regular operations. One objective of this process is to improve the quality of operations through continuous development and to define a service model that crosses organisational boundaries. The existing UH Data Support service, which is produced jointly by several University units, has a significant role here.

Inactive data and data lifecycles

Data stored long term have to be inactive. This means that data have to be static, one way or another. They do not have to be published or linked to a publication.

There are numerous use cases where data might be valuable enough to be stored. Here are some preliminary examples of the potential use of the new service:

  • Scientific devices produce raw data that are worth storing for some years before they are analysed with new technologies or in a later research project (one that will be funded later).
  • A large survey that will be used in part by several projects or publications. The base data of the survey might be valuable enough to store over a long period, for example to create time series.
  • Data from a research project which will end and whose output has to be stored somewhere because of the funder’s rules.
  • Data from an external source which will be used in research projects years later.

If the data can be stored in the national long-term preservation service (PAS), this option will also be investigated during the selection process. TPAT differs from PAS in a number of ways, including the selection process, the lifecycle status of the data and the criteria applied. The estimated lifecycle of data in TPAT is 15 years, whereas in PAS it is undefined (meaning: much longer). What the services have in common is that the data must be stored in a static format, i.e., TPAT storage cannot be used for computing. Both services also require that the data be curated before being stored.

TPAT storage is also not intended for data sharing or publishing.

Support and use cases

Timo Lahtinen, Matilda Mela and Niina Nurmi are the library’s new recruits for the TPAT project. Photo: Jussi Männistö

The project is based on real-life use cases and user testing. Collecting and describing these cases has already started.

Research support processes will also be developed together with the UH Data Support service. During the project, new staff will be recruited for the library and the IT Center.

The project also needs to clarify and organise the metadata of research data sets, by researcher or by unit. Additionally, existing solutions for metadata management will be evaluated and reported on during the project.

Selection process based on scientific assessment

The data stored in TPAT will be selected via scientific assessment. This means that data to be stored for a long period have to meet defined criteria and pass a value assessment before the technical process starts. Faculties have a role in carrying out the assessment and in deciding how their quota is used.

The University has also decided to invest in storage capacity. This capacity will be allocated to research groups and researchers based on quotas. The project will define an exact model for this, but the faculty-level quota will be based on the total budget of the faculty.

The financing model will include instructions for purchasing additional storage in cases where a group or unit needs a large amount of capacity.
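As a rough illustration only, a budget-based quota could be as simple as a proportional split. The sketch below is an assumption, not the project's model (which has not yet been defined): the function name, the faculty names, the capacity and the budget figures are all hypothetical.

```python
def allocate_quota(total_capacity_tb, faculty_budgets):
    """Split storage capacity between faculties in proportion to their budgets.

    Hypothetical rule for illustration; the actual TPAT quota model
    is still to be defined by the project.
    """
    total_budget = sum(faculty_budgets.values())
    if total_budget <= 0:
        raise ValueError("total budget must be positive")
    return {
        faculty: total_capacity_tb * budget / total_budget
        for faculty, budget in faculty_budgets.items()
    }


# Made-up figures: 1000 TB shared between three imaginary faculties.
quotas = allocate_quota(1000, {"Faculty A": 50, "Faculty B": 30, "Faculty C": 20})
for faculty, quota_tb in quotas.items():
    print(f"{faculty}: {quota_tb:.0f} TB")
```

A real model would probably need more than this, for example minimum quotas or rules for purchasing additional capacity.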

Generally, the project produces an overall service management system to organise these activities and operations.

Project schedule and organisation

The project and the operations that follow it will be funded by the University of Helsinki for the next five years. During this period, the next steps in managing research data storage will be clarified based on the management model. It is clear that the University has data which need longer storage periods. The project itself is planned to end in December 2023.

The project has been divided into five work packages (WPs):

  • WP1 is about “Developing the overall RDM process and service management”, led by Charlotte Granberg-Haakana from UH Research services.
  • WP2 is called “Use cases and user testing”, and it is led by Janne Markkanen from the IT Center.
  • WP3 is about “Selecting data for the service and library competencies”, led by Mari Elisa Kuusniemi from the UH library.
  • WP4 is “Increasing storage capacity and IT competencies”, and it is led by Mikko Hassinen from the IT Center.
  • WP5 is responsible for “Project management and communication”, led by Ville Tenhunen from the IT Center.

Vice-Rector Kai Nordlund chairs the project’s steering committee. The committee includes representatives from some of the faculties, research institutions and organisations involved in the project.

A few limitations – What TPAT is and what it isn’t

Even though the project has just started and the results are still forthcoming, some contextual assessments can be made.

TPAT differs from other projects and services in many ways. It does not create a classic data repository for data publishing where advanced data curation also takes place. Nor is it a storage place where researchers can store all kinds of data in various modes without any curation. TPAT also does not offer services for data sharing or processing.

Instead of these important phases of classic research data cycles, TPAT answers the question: “Where can I save this valuable data for future use or as evidence of research already completed?”

Combined with sustainable service management processes and data curation, this approach also makes FAIR data management possible and gives opportunities to catch data earlier on the road to research reproducibility.


Ville Tenhunen (ORCID, HY, @vtenhunen) works as Head of Development at the Center for Information Technology. In the TPAT project, he serves as project manager.