Tag Archives: RDM

Meet Mildred 5: In the beginning, there was DMP

The starting point for Project Mildred was the need to improve the quality of research data management (see the University of Helsinki research data policy). Funders, for example the Academy of Finland and the EU, require proper data management. A data management plan (DMP) is the basis on which all Mildred services are built. And in turn, the Mildred services provide researchers with the tools to implement a DMP in practice.

Mari Elisa Kuusniemi aka MEK. Photo by Jussi Männistö.

Mildred’s Sub-project 5 focuses on data management planning. The tool designed for researchers’ data planning, DMPTuuli, connects all Mildred services.

“DMPTuuli will market and provide links to other Mildred services. For example, data management during the research process requires Mildred 2 and Mildred 4 services,” says Mildred 5’s Project Manager, Mari Elisa Kuusniemi, also known as MEK.

DMPTuuli has been in use since the autumn of 2016. Currently, Mildred 5 focuses on developing discipline-specific guidance for the use of DMPTuuli, in co-operation with researchers (see DMP hackathon for historians).

“The tool itself is easy to use and doesn’t really need development. User ratings have been positive (see DMPTuuli user survey data in Figshare). Our challenge is the content. Research data management is a new issue for many researchers, and a researcher must consider how the data are managed, collected, described and organized while the research is still under way. Then they need to know what is required to publish the data, and how to preserve them. Research data management involves all kinds of processes and agreements. This isn’t an easy task, even though the tool is easy to use,” says MEK.

Research data management practices vary by discipline, and various organisations within the University of Helsinki also provide guidance. The problem is that this guidance is scattered: legal advice is located in one place, research funding advice in another. DMPTuuli aims to bring all research data management guidance together in one place.

“We are about to begin co-operation with Ethics Committees in order to provide guidance for the management of sensitive data. This is important, because legislation is about to change (see General Data Protection Regulation, GDPR),” says MEK.

Pälvi Kaiponen. Photo by Jussi Männistö.

DMPTuuli has been implemented in close co-operation with the Academy of Finland. In the autumn of 2016, the Academy received 1000 applications from the University of Helsinki, 800 of which were made via DMPTuuli.

“The strength of Project Mildred lies in the fact that it has involved research funding organisations right from the beginning. And every time the funder is involved, the researcher’s interest is aroused. This enables us to market university services,” says Mildred 5’s Project Owner, Pälvi Kaiponen.

A year ago, Academy funding made DMP better known among researchers and also provided information on the development of DMPTuuli. For example, researchers wanted to be able to log in using their organisation IDs, a wish which has now been granted. Exemplary DMPs are also published.

The DMP is part of the transformation process of research culture, in which proper research data management plays a significant role.

“It’s important to increase researchers’ understanding of why a DMP is relevant. A DMP is not only for the Academy of Finland; it’s for the researcher him/herself to better manage data,” claims Pälvi Kaiponen.

“Once the DMP has been made, its significance is usually understood. More and more, researchers ask for help with the matter itself, that is, managing the data, instead of asking for help to meet the funder’s requirements. Eighty per cent of the researchers who participated in the DMP workshops in the spring of 2017 did so because of data management. The change has been pretty quick,” says MEK.

DMPTuuli is also suitable for teaching purposes, and the goal is for the tool to be used in teaching from the bachelor level onwards. This would foster an open data culture at the University of Helsinki. Research data management is one of the key points of open science.

“Teachers could use DMPTuuli in various courses; for example, in courses related to research methodology. Proper data management skills are also important in working life. You need to know where to save your files, you need to understand the importance of backing up, version control, and description. Such basic skills are needed in all academic professions,” says MEK.

Meet Mildred 4: Storage for big data researchers and safety for all

The technical infrastructure of the Mildred services is built at the IT Center. Whereas Mildred’s Sub-project 2 was primarily concerned with data storage, sharing and management, Mildred’s Sub-project 4 is mainly related to data storage (capacity) and backup.

Ville Tenhunen and Minna Harjuniemi in Viikki, Helsinki. Photo by Jussi Männistö.

The goal of Mildred 4 is to ensure that researchers can flexibly increase their data storage capacity and preserve their data in the event of potential technical problems. These data storage and backup services are important to all who use data storage services, but especially to researchers and research groups with high data volumes.

“We are building storage that suits big data researchers. Researchers with smaller data can use the services built by Mildred 2. Mildred 4 storage is useful for researchers who have hundreds of terabytes of data. The target group is clearly in data-intensive research,” explains Mildred 4’s Project Manager, Ville Tenhunen.

“The data storage built in Mildred 4 is perhaps related more to the data management during the research process. When the dataset is ready to be published, Mildred 2 and Mildred 3 services come into play,” says Mildred 4’s Project Owner Minna Harjuniemi.

Piloting is already complete: Ceph and GlusterFS have been tested for Mildred 4 data storage. The purchases should take place in the autumn of 2017.

The data backup service is basically now ready to be bought. Several Mildred services are being piloted by researchers this autumn, but the backup service does not need to be tested in the same way. Technical piloting is enough for an “invisible background service”.

“No piloting is needed for the backup service because it’s a well-established standard service,” says Tenhunen.

According to Minna Harjuniemi and Ville Tenhunen, the Mildred 2 and 4 process has revealed no real surprises as regards content issues.

“We still feel we’re on the right track and that we’re doing the right things for both society and the planet. It’s a good idea to get involved in data sharing, and it’s a good idea to promote it and provide researchers with tools for it,” says Minna Harjuniemi.

Meet Mildred 2: The data fridge for data chefs

An essential part of Project Mildred is building technical infrastructure for data services. This is carried out in Mildred’s Sub-projects 2 and 4, under the co-ordination of the IT Center. Mildred 2 is responsible for building the Data Repository Service, which will help researchers manage, share, and store research data. The service looks like a website, but researchers can also use it via their own applications and file systems.

Ville Tenhunen and Minna Harjuniemi in Viikki, Helsinki. Photo by Jussi Männistö.

The repository service is divided into two parts: the implementation of EUDAT (European Data Infrastructure) tools and the building of our own repository. The EUDAT tools are for storing and sharing data and collaborating on it during the research process, as well as for publishing and describing finished datasets. The University of Helsinki’s own repository provides tools for researchers whose data are not suitable for a cloud service like EUDAT’s.

“The EUDAT pilots began in the autumn of 2017, and the tools are to be implemented this year. The university’s own repository service is to be technically tested, and it will be ready, in some form, by the end of the year,” promises Mildred 2’s Project Manager, Ville Tenhunen.

Researchers already have access to a variety of data services outside the University of Helsinki. Project Mildred aims to exploit existing services and link them to the research process in the best possible way.

“We don’t need to step on other players’ toes here. In some disciplines, it’s clear that research data will be stored in an international, rather than a national or a local data repository. On a national level, CSC (IT Center for Science) provides long-term storage, but we are heading in a different direction. Of course, we understand that some valuable research data require long-term preservation, but we don’t want to enter too deep into the problematics of this,” says Mildred 2’s Project Owner, Minna Harjuniemi.

When we talk about data, we must not forget metadata. Metadata and data are closely integrated and cannot be separated, especially in data management, which covers the entire life cycle of research data. Project Mildred deals with metadata in both Mildred 2 and Mildred 3. However, Mildred 2 is primarily about data.

“Mildred 3 introduces data story-telling [based on metadata], which is now a red-hot topic. Mildred 2 only makes the data available, and we’ll ensure that the data for data stories are at hand when needed,” explains Ville Tenhunen.

“You can compare our work in Mildred 2 to cooking. We give you the kitchen and the ingredients. It’s someone else’s task to prepare the food according to the recipe,” says Minna Harjuniemi. Ville Tenhunen continues:

“Mildred 2 is like a fridge, into which researchers put their carrots. Then the chef comes in and does something with them. We only supply a fridge. But there are many kinds of fridge. Fridges have different degrees of coldness, for example, and we can say ‘don’t put your fish in that one, put it in this one’.”

The data fridge, with its related data services, is created specifically for the needs of researchers at the University of Helsinki. At present, research data are stored everywhere; here and there, in the most diverse places (see Data Repository Survey).

“Until now, files of a certain size have been difficult for us. For example, 1–2 terabytes of data are too large to fit into existing systems at a reasonable cost, and too small for a more extensive repository system. A lot of data fall in between. Researchers with 300–400 terabytes of data are easier to handle, because they clearly need special solutions, and they have the money and expertise. Also, a small amount of data, such as 20 gigabytes, easily fits into Wiki, or almost anywhere, in fact,” says Tenhunen.

Ville Tenhunen, Minna Harjuniemi and Jussi Männistö. Photo by Juuso Ala-Kyyny.

Even when the repository services are technically finished and released, the work is not yet complete. Project Mildred represents a new kind of service thinking, in which the service provider and the customer work together to develop the service.

“Social interaction isn’t achieved by an administrative organization, such as the IT Center or the Helsinki University Library, producing a lot of material. We can’t say ‘Here’s the sandbox and some nice rules for you, off you go and play’. This won’t lead to social connectedness in 2017. Users have to be involved, and the service provider has to be part of the community. Communication and feedback channels can be used to respond to situations and to share information with users. The world is changing fast, and services need to change too. Competitive advantage is based on our ability to respond to these changing needs,” says Ville Tenhunen.

Meet Mildred 1: Data support from a one-stop shop

The goal of Mildred’s Sub-project 1 (there are four others) is to make data services easily accessible to researchers on the ThinkOpen website. This happens in two ways: by gathering the University of Helsinki’s existing data services onto one online service channel, and by designing self-service functions for researchers.

Eeva Nyrövaara and Aija Kaitera. Photo by Jussi Männistö.

The service concept is more or less the same as that in Book Navigator: the services currently provided by several service providers are all available in a one-stop shop. Aggregation is sorely needed, as researchers are unaware of the university data services available to them.

“The idea is to bring together services such as storage and data publishing, to make it easier for a researcher to find them. At the same time they can get the required service for themselves,” says Mildred 1’s Project Manager, Aija Kaitera.

As well as single services and service packages (e.g. Storing confidential data, Sharing data), the researcher can use the search engine or the guidance wizard to find services. The wizard also teaches the user basic data management. Aija Kaitera believes that typical service needs are related to data protection, data publishing and long-term storage.

“The question of personal data is of interest to many researchers, and we have to help them manage personal data properly. This service may include expert consultancy and technical solutions related to security; for example, specific storage,” Kaitera explains.

Now it is clearer what the university can offer and how these services are offered to researchers. The university’s data services were described and listed during the summer, and the online service channel will be introduced in the autumn of 2017. The website is ready to be tested, and researchers have been invited to take part in the piloting.

“This autumn, we have to decide how to maintain the services. How can we ensure that they are kept up to date? And how can we add new services to what we already have?” asks Mildred 1’s Project Owner, Eeva Nyrövaara.

Project Mildred is a unique venture, because it can cut across organizational boundaries. The Data Support team, part of Mildred’s Sub-project 1, illustrates this: via Data Support, researchers can access data management specialists in the Helsinki University Library, IT Services, Central Archives, Research Affairs, Personnel Services, and Legal Affairs. What is important is that the researcher does not need to know anything about the organizational structure behind Data Support.

“Our challenge is to present services with different backgrounds in a coherent, comprehensible way to the researcher. The services must be understandable from the perspective of the research process,” stresses Aija Kaitera.

Jussi Männistö, Eeva Nyrövaara and Aija Kaitera. Photo by Juuso Ala-Kyyny.

The first phase involves only the services provided by the University of Helsinki. But co-operation with different service providers is under negotiation.

“We aim to provide services from external service providers as well; for example, a service request to CSC (IT Center for Science) could be sent through our online service channel,” says Kaitera.

In the future, the online service channel may include automated self-service functions (e.g. acquiring disk space) and various personalization features (e.g. a shopping cart, suggestions based on the discipline).

“We can’t quite get there this year, but these features would offer researchers really good added value,” claims Kaitera.

Embedding and marketing the online service channel among researchers is a major challenge, and support from researchers is welcome.

“Of course, it’s not enough for the online service channel to be on ThinkOpen. Researchers won’t find it. The data service channel must be integrated into project guidance, and everywhere else that data are mentioned,” Aija Kaitera emphasizes.