Meet Mildred 4: Storage for big data researchers and safety for all

The technical infrastructure of the Mildred services is built at the IT Center. Whereas Mildred’s Sub-project 2 was primarily concerned with data storage, sharing and management, Mildred’s Sub-project 4 focuses on data storage capacity and backup.

Ville Tenhunen and Minna Harjuniemi in Viikki, Helsinki. Photo by Jussi Männistö.

The goal of Mildred 4 is to ensure that researchers can flexibly increase their data storage capacity and preserve their data in the event of potential technical problems. These data storage and backup services are important to all who use data storage services, but especially to researchers and research groups with high data volumes.

“We are building storage that suits big data researchers. Researchers with smaller data can use the services built by Mildred 2. Mildred 4 storage is useful for researchers who have hundreds of terabytes of data. The target group is clearly in data-intensive research,” explains Mildred 4’s Project Manager, Ville Tenhunen.

“The data storage built in Mildred 4 is perhaps related more to the data management during the research process. When the dataset is ready to be published, Mildred 2 and Mildred 3 services come into play,” says Mildred 4’s Project Owner Minna Harjuniemi.

Piloting is already complete: Ceph and GlusterFS have been tested for Mildred 4 data storage. The purchases should take place in the autumn of 2017.

The data backup service is now essentially ready to be purchased. Several Mildred services are being piloted by researchers this autumn, but the backup service does not need to be tested in the same way. Technical piloting is enough for an “invisible background service”.

“No piloting is needed for the backup service because it’s a well-established standard service,” says Tenhunen.

According to Minna Harjuniemi and Ville Tenhunen, the Mildred 2 and 4 process has revealed no real surprises as regards content issues.

“We still feel we’re on the right track and that we’re doing the right things for both society and the planet. It’s a good idea to get involved in data sharing, and it’s a good idea to promote it and provide researchers with tools for it,” says Minna Harjuniemi.

Meet Mildred 2: The data fridge for data chefs

An essential part of Project Mildred is building technical infrastructure for data services. This is carried out in Mildred’s Sub-projects 2 and 4, under the co-ordination of the IT Center. Mildred 2 is responsible for building the Data Repository Service, which will help researchers manage, share, and store research data. The service looks like a website, but researchers can also use it via their own applications and file systems.

Ville Tenhunen and Minna Harjuniemi in Viikki, Helsinki. Photo by Jussi Männistö.

The repository service is divided into two parts: the implementation of EUDAT (European Data Infrastructure) tools and the building of our own repository. The EUDAT tools are for storing and sharing data and collaborating on it during the research process, as well as for publishing and describing finished datasets. The University of Helsinki’s own repository provides tools for researchers whose data are not suitable for a cloud service like EUDAT’s.

“The EUDAT pilots began in the autumn of 2017, and the tools are to be implemented this year. The university’s own repository service is to be technically tested, and it will be ready, in some form, by the end of the year,” promises Mildred 2’s Project Manager, Ville Tenhunen.

Researchers already have access to a variety of data services outside the University of Helsinki. Project Mildred aims to exploit existing services and link them to the research process in the best possible way.

“We don’t need to step on other players’ toes here. In some disciplines, it’s clear that research data will be stored in an international, rather than a national or a local data repository. On a national level, CSC (IT Center for Science) provides long-term storage, but we are heading in a different direction. Of course, we understand that some valuable research data require long-term preservation, but we don’t want to enter too deep into the problematics of this,” says Mildred 2’s Project Owner, Minna Harjuniemi.

When we talk about data, we must not forget metadata. Metadata and data are closely integrated and cannot be separated, especially in data management, which covers the entire life cycle of research data. Project Mildred deals with metadata in both Mildred 2 and Mildred 3. However, Mildred 2 is primarily about data.

“Mildred 3 introduces data story-telling [based on metadata], which is now a red-hot topic. Mildred 2 only makes the data available, and we’ll ensure that the data for data stories are at hand when needed,” explains Ville Tenhunen.

“You can compare our work in Mildred 2 to cooking. We give you the kitchen and the ingredients. It’s someone else’s task to prepare the food according to the recipe,” says Minna Harjuniemi. Ville Tenhunen continues:

“Mildred 2 is like a fridge, into which researchers put their carrots. Then the chef comes in and does something with them. We only supply a fridge. But there are many kinds of fridge. Fridges have different degrees of coldness, for example, and we can say ‘don’t put your fish in that one, put it in this one.’”

The data fridge, with its related data services, is created specifically for the needs of researchers at the University of Helsinki. At present, research data are scattered across the most diverse places (see Data Repository Survey).

“Until now, files of a certain size have been difficult for us. For example, 1–2 terabytes of data are too large to fit into existing systems at a reasonable cost, and too small for a more extensive repository system. A lot of data fall in between. Researchers with 300–400 terabytes of data are easier to handle, because they clearly need special solutions, and they have the money and expertise. Also, a small amount of data, such as 20 gigabytes, easily fits into Wiki, or almost anywhere, in fact,” says Tenhunen.

Ville Tenhunen, Minna Harjuniemi and Jussi Männistö. Photo by Juuso Ala-Kyyny.

Even when the repository services are technically finished and released, the work is not yet complete. Project Mildred represents a new kind of service thinking, in which the service provider and the customer work together to develop the service.

“Social interaction isn’t achieved by an administrative organization, such as the IT Center or the Helsinki University Library, producing a lot of material. We can’t say ‘Here’s the sandbox and some nice rules for you, off you go and play’. This won’t lead to social connectedness in 2017. Users have to be involved, and the service provider has to be part of the community. Communication and feedback channels can be used to respond to situations and to share information with users. The world is changing fast, and services need to change too. Competitive advantage is based on our ability to respond to these changing needs,” says Ville Tenhunen.