What was learned in Project Mildred in 2016–2018? A review of a project that aimed at reforming research data services at the University of Helsinki

In autumn 2015, under the leadership of the Rector, the University of Helsinki decided to launch a project to update the university’s research data infrastructure. Project Mildred started in spring 2016 and continued until spring 2018.

The aim of Project Mildred was to provide researchers with a state-of-the-art infrastructure and to design data-related services for them, thereby supporting open science and the sharing of research data. Project Mildred was also a part of the implementation of the University of Helsinki research data policy confirmed in February 2015.

Project Mildred was a joint venture between Research Services, the IT Center and the Helsinki University Library. Mildred consisted of five sub-projects. In addition to infrastructure construction, the project included solutions for storing and opening research data, as well as advisory services.

This is the final blog post of Project Mildred. In it, we give an overview of the project and look at what was learned from it. In the future, you can follow issues related to research data services and open science on the Think Open blog.

M1 – The guidance wizard for service searching was successful

The aim of Mildred’s Sub-project 1 was to make data services easily accessible and easy for researchers to acquire through digital solutions. M1’s output was an online service channel that makes various services available in a one-stop shop, and thus makes the services more discoverable. The online service channel still needs further development. Eeva Nyrövaara (Research Services) worked as Project Owner and Aija Kaitera (Library) as Project Manager for Mildred 1.

“The original idea was to digitise the service acquisition, helping researchers to obtain data services through an electronic process. However, there was not enough infrastructure for this in the project schedule. Even so, when implementing the Datasupport service channel, we learned what the university’s data services offer, how to combine different UH data services into a single interface and how to guide researchers in choosing services. In particular, the guidance wizard for service searching has been considered successful and has already been introduced to another service channel. We also learned that maintaining the service channel’s content in a constantly changing and complex environment is challenging. The ownership of the service channel has recently been transferred to the Helsinki University Library,” Aija Kaitera says.

M2 and M4 – New data services will be introduced during 2019

The aim of Mildred’s Sub-project 2 was to build the Data Repository Service, which will help researchers manage, share and store research data. M2 piloted EUDAT services and made a plan for how to implement a data service in the University of Helsinki’s own infrastructure.

The aim of Mildred’s Sub-project 4 was to introduce the data storage system infrastructure and the data backup service. M4 defined and acquired the equipment and systems required for the infrastructure and prepared for the implementation.

Minna Harjuniemi (IT Center) worked as Project Owner and Ville Tenhunen (IT Center) as Project Manager for Mildred 2 and 4.

“During Project Mildred, we acquired infrastructure to deploy services related to storing, sharing and publishing research data. Thus, as a result of the project, we can provide cost-effective data storage and basic storage-related services to researchers. Services will be introduced step by step during 2019,” Ville Tenhunen says.

M3 – The quality of metadata is crucial in data publishing services

The aim of Mildred’s Sub-project 3 was to build the data publishing and search service (metadata service) that makes the research data published by UH researchers easy to find and more widely exploited. M3 focused on updating the Think Open website (now the Open Science site) and harvesting the metadata of UH research data from publicly available metadata repositories (see the ATTX project). Pälvi Kaiponen (Library) worked as Project Owner and Pauli Assinen (Library) as Project Manager for Mildred 3.

“Although the planned goal of renewing the Think Open website was not achieved, the M3 Sub-project was useful: interaction between researchers and research support services’ stakeholders was strengthened in the handling of the use cases and the service concept; the descriptions of the use cases and the service concept could be utilised within the University of Helsinki and in national and international cooperation; and we also discovered how the poor quality of metadata for research data affects the building of services for publishing data,” Pauli Assinen says.

M5 – DMPTuuli is in use and DMP guidance is updated regularly

The aim of Mildred’s Sub-project 5 was to launch DMPTuuli, a tool designed for researchers’ data planning. DMPTuuli has been in use since the autumn of 2016, and M5 focused on developing guidance for the use of DMPTuuli. Pälvi Kaiponen (Library) worked as Project Owner and Mari Elisa Kuusniemi (Library) as Project Manager for Mildred 5.

“M5 focused on developing guidelines, and as a result, the first guidelines for data management planning (DMP) were made jointly by various UH stakeholders. On the basis of this work, it has been good to continue, and the guidance has been updated twice after Project Mildred. In 2019, a number of actors at University of Helsinki Datasupport have participated in the guidance work, which is coordinated by the library. Guidance for sensitive data management has also been developed as a part of the national Tuuli project,” Kuusniemi says.

In addition to the DMP guidelines, it is essential to integrate data management planning into good research practice. To achieve this goal, services need to be continuously developed.

“The importance of data management plans is already widely understood, but the requirement for a plan is not always remembered – and even less often is the DMP evaluated and given feedback. Datasupport provides a DMP commenting service. So far, it has focused on plans that are being made as part of calls from the Academy of Finland. However, the DMP commenting service might need to be implemented in other university activities, such as doctoral programmes,” Kuusniemi says.

Summary of Mildred and prospects for the future

The Project Mildred steering group was chaired by Assistant Professor Mikko Tolonen. According to him, Mildred opened up a broader view of the research services provided by the different stakeholders at the University of Helsinki.

“Project Mildred was a bold move by the University of Helsinki to start developing its own research data infrastructure. During the project, we learned how important it is to balance the infrastructure of one’s own organisation, solutions from other stakeholders, various opportunities for open science and future solutions. Personally, the main lesson of Project Mildred was to learn to understand how good the work done at the University of Helsinki is in supporting research infrastructure and how much more investment is required to remain competitive in the future. In addition, I understood better the excellence of the support offered by CSC (IT Center for Science) for developing research-based projects,” Tolonen says.

During Project Mildred, the University of Helsinki was engaged in co-operation negotiations, and the human resources for the project were insufficient. As a result, Mildred did not achieve all of its goals.

Mildred succeeded in bringing researcher-users into the service design process. User feedback also had an impact: it changed the direction of the sub-projects. Mildred also provided experience and opened up development opportunities for collaboration between UH organisations: Research Services, the IT Center and the Library.

There is an overall plan for digitising research data services: Digital research university – University of Helsinki Digitalisation Programme: Roadmap for 2018–2020 and vision for 2024 (2018, pdf file). However, the implementation of the roadmap requires the financing decisions proposed in the roadmap. In future, attention should also be paid to raising awareness of research support services, increasing human resources and developing skills. The continuous development of services is essential, and work cannot be left to development projects alone.

The Mildred basil was also planted during Project Mildred. The video below tells the story of the basil from September 2017 to April 2019:

Which license should I choose for publications, research data or source code?

Helsinki University Library has published a license guide.

In short:

“In general, the University of Helsinki recommends CC BY – license for sharing publications, unless there are other recommendations by the publisher.”
“For sharing of research data The University of Helsinki recommends CC 0 – license, where the author gives up all copyrights in order to advance open science.”
“For sharing of source code the University of Helsinki recommends MIT- or GNU-GPL (v2) –licenses which have been developed for just this purpose”

Further information is available in the Open Access: License Guide: http://libraryguides.helsinki.fi/oa/eng/license

Meet Mildred 5: In the beginning, there was DMP

The starting point for Project Mildred was the need to improve the quality of research data management (see the University of Helsinki research data policy). Funders, for example the Academy of Finland and the EU, require proper data management. A data management plan (DMP) is the basis on which all Mildred services are built. And in turn, the Mildred services provide researchers with the tools to implement a DMP in practice.

Mari Elisa Kuusniemi aka MEK. Photo by Jussi Männistö.

Mildred’s Sub-project 5 focuses on data management planning. The tool designed for researchers’ data planning, DMPTuuli, connects all Mildred services.

“DMPTuuli will market and provide links to other Mildred services. For example, data management during the research process requires Mildred 2 and Mildred 4 services,” says Mildred 5’s Project Manager, Mari Elisa Kuusniemi, also known as MEK.

DMPTuuli has been in use since the autumn of 2016. Currently, Mildred 5 focuses on developing discipline-specific guidance for the use of DMPTuuli, in co-operation with researchers (see DMP hackathon for historians).

“The tool itself is easy to use and doesn’t really need development. User ratings have been positive (see DMPTuuli user survey data in Figshare). Our challenge is the content. Research data management is a new issue for many researchers, and a researcher must take into account how the data are managed, collected, described and organized, already during the research. Then they need to know what is required to publish the data, and how to preserve it. Research data management involves all kinds of processes and agreements. This isn’t an easy task, even though the tool is easy to use,” says MEK.

Research data management practices vary by discipline, and various organisations within the University of Helsinki also provide guidance. The problem is that this guidance is scattered: legal advice is located in one place, research funding advice in another. DMPTuuli tries to offer all research data management guidance in one place.

“We are about to begin co-operation with Ethics Committees in order to provide guidance for the management of sensitive data. This is important, because legislation is about to change (see General Data Protection Regulation, GDPR),” says MEK.

Pälvi Kaiponen. Photo by Jussi Männistö.

DMPTuuli has been implemented in close co-operation with the Academy of Finland. In the autumn of 2016, the Academy received 1000 applications from the University of Helsinki, 800 of which were made via DMPTuuli.

“The strength of Project Mildred lies in the fact that it has involved research funding organisations right from the beginning. And every time the funder is involved, the researcher’s interest is aroused. This enables us to market university services,” says Mildred 5’s Project Owner, Pälvi Kaiponen.

A year ago, Academy funding made DMP better known among researchers and also provided information on the development of DMPTuuli. For example, researchers wanted to be able to log in using their organisation IDs, a wish which has now been granted. Exemplary DMPs are also published.

The DMP is part of the transformation process of research culture, in which proper research data management plays a significant role.

“It’s important to increase researchers’ understanding of why a DMP is relevant. A DMP is not only for the Academy of Finland; it’s for the researcher him/herself to better manage data,” claims Pälvi Kaiponen.

“Once the DMP has been made, its significance is usually understood. More and more, researchers ask for help with the matter itself, that is, managing the data, instead of asking for help to meet the funder’s requirements. Eighty per cent of the researchers who participated in the DMP workshops in the spring of 2017 did so because of data management. The change has been pretty quick,” says MEK.

DMPTuuli is also suitable for teaching purposes, and the goal is for the tool to be used in teaching as early as the bachelor level. This would foster an open data culture at the University of Helsinki. Research data management is one of the key points of open science.

“Teachers could use DMPTuuli in various courses; for example, in courses related to research methodology. Proper data management skills are also important in working life. You need to know where to save your files, you need to understand the importance of backing up, version control, and description. Such basic skills are needed in all academic professions,” says MEK.

Meet Mildred 4: Storage for big data researchers and safety for all

The technical infrastructure of the Mildred services is built at the IT Center. Whereas Mildred’s Sub-project 2 is primarily concerned with data storage, sharing and management, Mildred’s Sub-project 4 is mainly related to data storage (capacity) and backup.

Ville Tenhunen and Minna Harjuniemi in Viikki, Helsinki. Photo by Jussi Männistö.

The goal of Mildred 4 is to ensure that researchers can flexibly increase their data storage capacity and preserve their data in the event of potential technical problems. These data storage and backup services are important to all who use data storage services, but especially to researchers and research groups with high data volumes.

“We are building storage that suits big data researchers. Researchers with smaller data can use the services built by Mildred 2. Mildred 4 storage is useful for researchers who have hundreds of terabytes of data. The target group is clearly in data-intensive research,” explains Mildred 4’s Project Manager, Ville Tenhunen.

“The data storage built in Mildred 4 is perhaps related more to the data management during the research process. When the dataset is ready to be published, Mildred 2 and Mildred 3 services come into play,” says Mildred 4’s Project Owner Minna Harjuniemi.

Piloting is already complete: Ceph and GlusterFS have been tested for Mildred 4 data storage. The purchases should take place in the autumn of 2017.

The data backup service is basically ready to be bought. Several Mildred services are being piloted by researchers this autumn, but the backup service does not need to be tested in the same way. Technical piloting is enough for an “invisible background service”.

“No piloting is needed for the backup service because it’s a well-established standard service,” says Tenhunen.

According to Minna Harjuniemi and Ville Tenhunen, the Mildred 2 and 4 process has revealed no real surprises as regards content issues.

“We still feel we’re on the right track and that we’re doing the right things for both society and the planet. It’s a good idea to get involved in data sharing, and it’s a good idea to promote it and provide researchers with tools for it,” says Minna Harjuniemi.

Meet Mildred 2: The data fridge for data chefs

An essential part of Project Mildred is building technical infrastructure for data services. This is carried out in Mildred’s Sub-projects 2 and 4, under the co-ordination of the IT Center. Mildred 2 is responsible for building the Data Repository Service, which will help researchers manage, share, and store research data. The service looks like a website, but researchers can also use it via their own applications and file systems.

Ville Tenhunen and Minna Harjuniemi in Viikki, Helsinki. Photo by Jussi Männistö.

The repository service is divided into two parts: the implementation of EUDAT (European Data Infrastructure) tools and the building of our own repository. The EUDAT tools are for storing and sharing data and collaborating on it during the research process, as well as for publishing and describing finished datasets. The University of Helsinki’s own repository provides tools for researchers whose data are not suitable for a cloud service like EUDAT’s.

“The EUDAT pilots began in the autumn of 2017, and the tools are to be implemented this year. The university’s own repository service is to be technically tested, and it will be ready, in some form, by the end of the year,” promises Mildred 2’s Project Manager, Ville Tenhunen.

Researchers already have access to a variety of data services outside the University of Helsinki. Project Mildred aims to exploit existing services and link them to the research process in the best possible way.

“We don’t need to step on other players’ toes here. In some disciplines, it’s clear that research data will be stored in an international, rather than a national or a local data repository. On a national level, CSC (IT Center for Science) provides long-term storage, but we are heading in a different direction. Of course, we understand that some valuable research data require long-term preservation, but we don’t want to enter too deep into the problematics of this,” says Mildred 2’s Project Owner, Minna Harjuniemi.

When we talk about data, we must not forget metadata. Metadata and data are closely integrated and cannot be separated, especially in data management, which covers the entire life cycle of research data. Project Mildred deals with metadata in both Mildred 2 and Mildred 3. However, Mildred 2 is primarily about data.

“Mildred 3 introduces data story-telling [based on metadata], which is now a red-hot topic. Mildred 2 only makes the data available, and we’ll ensure that the data for data stories are at hand when needed,” explains Ville Tenhunen.

“You can compare our work in Mildred 2 to cooking. We give you the kitchen and the ingredients. It’s someone else’s task to prepare the food according to the recipe,” says Minna Harjuniemi. Ville Tenhunen continues:

“Mildred 2 is like a fridge, into which researchers put their carrots. Then the chef comes in and does something with them. We only supply a fridge. But there are many kinds of fridge. Fridges have different degrees of coldness, for example, and we can say ‘don’t put your fish in that one, put it in this one’.”

The data fridge, with its related data services, is created specifically for the needs of researchers at the University of Helsinki. At present, research data are stored everywhere; here and there, in the most diverse places (see Data Repository Survey).

“Until now, files of a certain size have been difficult for us. For example, 1–2 terabytes of data are too large to fit into existing systems at a reasonable cost, and too small for a more extensive repository system. A lot of data fall in between. Researchers with 300–400 terabytes of data are easier to handle, because they clearly need special solutions, and they have the money and expertise. Also, a small amount of data, such as 20 gigabytes for example, easily fits into Wiki, or almost anywhere, in fact,” says Tenhunen.

Ville Tenhunen, Minna Harjuniemi and Jussi Männistö. Photo by Juuso Ala-Kyyny.

Even when the repository services are technically finished and released, the work is not yet complete. Project Mildred represents a new kind of service thinking, in which the service provider and the customer work together to develop the service.

“Social interaction isn’t achieved by an administrational organization, such as the IT Center or the Helsinki University Library, producing a lot of material. We can’t say ‘Here’s the sandbox and some nice rules for you, off you go and play’. This won’t lead to social connectedness in 2017. Users have to be involved, and the service provider has to be part of the community. Communication and feedback channels can be used to respond to situations and to share information with users. The world is changing fast, and services need to change too. Competitive advantage is based on our ability to respond to these changing needs,” says Ville Tenhunen.

Meet Mildred 1: Data support from a one-stop shop

The goal of Mildred’s Sub-project 1 (there are four others) is to make data services easily accessible to researchers on the ThinkOpen website. This happens in two ways: by gathering the University of Helsinki’s existing data services onto one online service channel, and by designing self-service functions for researchers.

Eeva Nyrövaara and Aija Kaitera. Photo by Jussi Männistö

The service concept is more or less the same as that in Book Navigator: the services currently provided by several service providers are all available in a one-stop shop. Aggregation is sorely needed, as researchers are unaware of the university data services available to them.

“The idea is to bring together services such as storage and data publishing, to make it easier for a researcher to find them. At the same time they can get the required service for themselves,” says Mildred 1’s Project Manager, Aija Kaitera.

As well as browsing single services and service packages (e.g. Storing confidential data, Sharing data), the researcher can use the search engine or the guidance wizard for service searching. The wizard also teaches the user basic data management. Aija Kaitera believes that typical service needs are related to data protection, data publishing and long-term storage.

“The question of personal data is of interest to many researchers, and we have to help them manage personal data properly. This service may include expert consultancy and technical solutions related to security; for example, specific storage,” Kaitera explains.

Now it is clearer what the university can offer and how these services are offered to researchers. The university’s data services were described and listed during the summer, and the online service channel will be introduced in the autumn of 2017. The website is ready to be tested, and researchers have been invited to take part in the piloting.

“This autumn, we have to decide how to maintain the services. How can we ensure that they are kept up to date? And how can we add new services to what we already have?” asks Mildred 1’s Project Owner, Eeva Nyrövaara.

Project Mildred is a unique venture, because it can cut across organizational boundaries. The Data Support team, part of Mildred’s Sub-project 1, illustrates this: via Data Support, researchers can access data management specialists in the Helsinki University Library, IT Services, Central Archives, Research Affairs, Personnel Services, and Legal Affairs. What is important is that the researcher does not need to know anything about the organizational structure behind Data Support.

“Our challenge is to present services with different backgrounds in a coherent, comprehensible way to the researcher. The services must be understandable from the perspective of the research process,” stresses Aija Kaitera.

Jussi Männistö, Eeva Nyrövaara and Aija Kaitera. Photo by Juuso Ala-Kyyny

The first phase involves only the services provided by the University of Helsinki. But co-operation with different service providers is under negotiation.

“We aim to provide services from external service providers as well; for example, a service request to CSC (IT Center for Science) could be sent through our online service channel,” says Kaitera.

In the future, the online service channel may include automated self-service functions (e.g. acquiring disk space) and various personalization features (e.g. a shopping cart, suggestions based on the discipline).

“We can’t quite get there this year, but these features would offer researchers really good added value,” claims Kaitera.

Embedding and marketing the online service channel among researchers is a major challenge, and support from researchers is welcome.

“Of course, it’s not enough for the online service channel to be on ThinkOpen. Researchers won’t find it. The data service channel must be integrated into project guidance, and everywhere else where the data is mentioned,” Aija Kaitera emphasizes.

Meet the Mildred Five!

“The University of Helsinki provides researchers and research groups with a research data infrastructure that includes tools and services for supporting the management, use, findability and sharing of data as well as with the capacity for storage, preservation, computing and processing. This data infrastructure is built and developed together with national and international parties, taking into account the services and infrastructures that they offer.”

Above is the fifth point in the University of Helsinki research data policy that was confirmed in February 2015. This research data policy was also the starting point of Project Mildred, which was launched in spring 2016.

The goal of Project Mildred is to implement the research data policy in practice. This is accomplished by providing researchers with the tools to carry out the proper data management required by the University of Helsinki (see other points in research data policy) and by research funding institutions, for example the Academy of Finland.

As we know, Project Mildred is divided into five sub-projects:

  1. Digitalization of Research Data Services Delivery
  2. Data Repository Service
  3. Data Publishing and Metadata Service
  4. Data Storage and Backup
  5. Implementation of Data Management Planning Tool – Tuuli

The first phase of Project Mildred will end by the end of 2017. Next week, we will present each sub-project and its current situation in this blog. So follow the blog posts and tweets (#MildredFive), and meet the Mildred Five!

In the video above: The photo session for Project Mildred in Viikki, Helsinki. Ville Tenhunen (project manager) and Minna Harjuniemi (project owner) work in Mildred sub-projects 2 and 4. Photographer Jussi Männistö works in the Helsinki University Library. Photos will be published next week!

It is time to pilot Mildred services – volunteer researchers are being sought!

In Project Mildred, we have developed data management services for researchers. The work has been done in five Mildred sub-projects for over a year. Now most of the services are ready to be tested. We invite researchers to join in any of the following pilot schemes:

Searching the data service (Mildred 1)

What will be tested? The website that helps a researcher to find the most suitable data service. This is an online service channel, where all data-related services of the University of Helsinki can be found.
When? In August 2017. The web service provider will be chosen in the first week of August, and researchers are expected to be involved in the development of the service at an early stage.
More information & enrollment: Aija Kaitera (aija.kaitera@helsinki.fi), Helsinki University Library.
Important! The online service channel is intended for users of all levels but especially for researchers who have little experience with data issues. In this respect, data newbies are most welcome to help develop the service.
Good to know: The website developed here is expected to be published in the autumn of 2017.

Sharing and storing the data (Mildred 2)

What will be tested? The repository services designed for sharing and storing the data produced by researchers. The services make use of ready-made data tools developed by EUDAT, and the following services are introduced in the pilot: (1) B2DROP, a Dropbox-like service for data sharing and collaboration, (2) B2SHARE for data sharing and longer-term storage, and (3) B2SAFE for storing. Researchers use these services in their actual research. Researchers are expected to give feedback on the functionality of the services as well as on further needs regarding their use.
When? Researchers start with B2DROP, and the pilot of this service starts in August. The B2SHARE pilot will start in September. The B2SAFE pilot takes place by the end of the year, and the schedule will be announced later. The pilots will continue until the end of the year unless otherwise agreed.
More information & enrollment: Kimmo Koskinen (kimmo.koskinen@helsinki.fi), Helsinki University Library and Ville Tenhunen (ville.tenhunen@helsinki.fi), IT Center.
Important! Participation in the pilot is risk-free for researchers: firstly, the data management tools in the pilot are largely based on existing, finished products, and secondly, EUDAT services will remain available to researchers after the pilot. During the pilot phase, the use of the services is free of charge for the researchers. The distribution of the post-pilot costs will be clarified in the autumn. Furthermore, it is important to know that piloting involves only non-sensitive research data at this point. Solutions for sensitive data (e.g. personal data) will follow at a later point.
Good to know: The data handled in B2DROP can be transferred to B2SHARE, in which case the data will be complemented with metadata and other information required for permanent preservation. The technical realization of the services is mainly carried out by EUDAT, but the design and part of the technology are adjusted by the University of Helsinki. The service operator is IT Center for Science (CSC).

Searching and publishing the data (Mildred 3)

What will be tested? The user interface of the research data search service. In this service the user can search for useful data sets. The pilot is meant to provide information about the functionality and browsing opportunities of the search service. The data publishing service will also reach the pilot phase this year. This service is designed for opening the data produced by researchers.
When? The prototype of the search service is tested in August/September 2017. A more sophisticated version of the search service will be opened to the public by the end of the year. The research data publishing service comes into the pilot phase during the autumn and is reported separately. The pilot for the data publishing service will probably take place in October/November. A more accurate schedule will be announced later.
More information: Researchers interested in testing the search service can contact Pauli Assinen (pauli.assinen@helsinki.fi). Service provider Digitalist is responsible for implementing the pilot and recruiting.
Good to know: In the beginning, the Mildred search service crawls data from Zenodo (CERN), B2SHARE (EUDAT) and Etsin (ATT). Other sources for research data are being surveyed and the supply will expand later this year.

* * *

In addition to the pilots above, the storage and backup service for researchers (Mildred 4) will be piloted in the autumn of 2017. The pilot plan and schedule will be refined later. The data management planning tool Tuuli (Mildred 5) is already in use, and researchers can already use it in drafting research plans. Discipline-specific Tuuli guidance will continue to develop, and researchers are involved in this work (see DMPTuuli hackathon for historians).