One One Two Day

Today is national One One Two Day, or 112 Day (in Finnish and Swedish). Today several government officials wish to direct attention to public safety and the proper use of the emergency phone number 112. The aim of the day is to make Finland the safest country in Europe. How can we prevent accidents from happening? How can we make everyday life safer and more secure? What kind of infrastructure promotes safety? How can we find help when needed?

Today is also the second birthday of the University of Helsinki Research Data Policy. Two years ago, on February 11, 2015, Rector Jukka Kola signed the policy. Its aim is to define high-level principles regarding the collection, storage, use, and management of research data. The ultimate goal is to make researchers’ everyday life and core business, that is, research, easier and – in a sense – safer.

Although the policy paper itself does not make anybody’s life easier or safer, the policy requires the University to take certain actions in order to safeguard professional research data management throughout the research life cycle. From the researcher’s point of view the key questions are: How do I find help when needed? What kind of infrastructure supports my research?

During the implementation phase of the Research Data Policy, the University of Helsinki has answered these questions by, first, setting up the Datasupport network and, second, launching Project MILDRED. The Datasupport network assists researchers in the management of research data. Researchers are offered tools, services, and training regarding the management, use, and sharing of research data and access to it. Via Datasupport, researchers can reach all data management specialists at the University. The specialists will help researchers in data-related emergencies. As such, the Datasupport email researchdata@helsinki.fi is a kind of 112 number for researchers.

But as with One One Two Day, the focus of the Datasupport network is not on emergency cases but on how to avoid them. How can research be planned to be safer and more secure data-wise? The Datasupport network’s aim is to prevent emergencies and help researchers plan the collection, storage, use, and management of research data in advance. Project MILDRED, on the other hand, builds a safe and well-planned infrastructure on which Datasupport and researchers can rely to support both data-intensive research and research based on so-called long-tail data. Infrastructure is the basis of a well-functioning society and of well-functioning research.

What is there for me in Mildred?

By Mikko Tolonen

The University of Helsinki has decided to update its research data infrastructure. This project is called Mildred – as hopefully you already know. Mildred includes five sub-projects that focus on storage, metadata creation, data management, the actual use of research data, sharing of the data, and interoperability. From an architectural perspective the project seems more or less straightforward. How this translates into practice, we are about to find out in the near future.

What were the reasons for starting this project? The apparent motivation from the University’s side is the fact that research data will become a very valuable asset in the future. It is in the interest of the University to provide the infrastructure that enables its researchers to store their data, work on it and disseminate it. It is just as important that other scholars, the general public and, for example, funding bodies can easily find the relevant research data and then put it to further use. It is a reasonable assumption that questions of intellectual property, data management and open science will be decisive factors for all the invested parties in the future as we move towards a more dynamic research culture. We already have good examples of different infrastructures competing on the quality of their research data and its openness. I think we can all agree that this direction of openness, reusability and reproducibility is the way to advance science, and it is therefore evident that providing a functional research data infrastructure is in the general interest of scholars.

If this is the motivation at the University level, what about individual researchers? Why do we need Mildred? Can’t we just rely on existing research data infrastructures? There are plenty of those to go around, are there not? Isn’t Zenodo enough? What do we get out of this as University of Helsinki researchers?

The first observation that I would like to make is that when we discuss research data infrastructures on a general level, we need to remember that there is no such thing as a monolithic concept of a “researcher”. As researchers we do not form a coherent whole with similar needs. We come from various fields, in all shapes and sizes and with multiple research infrastructure needs. Therefore, it is evidently a good idea that individual research projects from different fields can have an impact on the kind of research data infrastructure that the University is building. That is also why we are after concrete user cases.

When looking at the question of research data infrastructure from the perspective of the humanities, for example, we can even say that we cannot really see how our research data infrastructure needs will be shaped in ten years’ time. A plain fact is that much of the humanities research data is not currently in digital form. This is why we also need to focus, together with libraries and archives, on the digitization process, which might not be a concern for some other fields of science. Also, in one sense, in open science and in the sharing of research data, we in the humanities are perhaps behind the natural sciences.

At the same time, looking at the needs of the humanities, it is evident that some of our research data needs will follow those of science in general. As an example, the use of CSC computing services is easily adaptable to humanities needs. The Pouta cloud service, for example, functions quite well for our research group, but the question of backing up the data (or enabling reproducibility) is, to be honest, not that clear to us at the moment. However, there are also particular hermeneutic questions that we deal with in the humanities, which might mean that our data does not fit directly within the same scope as genome data; the level of detail can be different, and so forth. But these are not reasons not to be in the same boat as the natural sciences in the Mildred project. What Mildred also enables us to do is to see the similarities and recognize the differences, leading to the adoption of best practices from other fields of science when possible.

At the same time, it should not be forgotten that Mildred goes hand in hand with different faculties and disciplines updating – or in some cases formulating – their own research data policies. This has recently been undertaken at the Faculty of Arts by Dean Hanna Snellman. Since most humanities data has not yet been digitized, it is only natural that there is currently no research data policy at the Faculty of Arts – all of these questions have come to us quite recently, although language technology, for example, with its large digital datasets, has been developing at the University of Helsinki for the past fifty years or so. But there should be a research data policy for the humanities, and there soon will be.

This gives us an opportunity to think about what the right kinds of infrastructures are for different types of humanities data. Most of the relevant infrastructures exist already (or are being developed) – what is crucial is to use them in an efficient way. The idea is not that Mildred will replace the existing infrastructures – Mildred is not designed as one research data infrastructure to rule them all – but we need to deliberate what research data should be curated by the National Archives, what by the National or University Library, what by the Language Bank of Finland, what by Mildred and so forth. Just as humanities scholars come in all shapes and sizes, so does our data (also in different forms, for example, as sand from Egypt). Forming a research data policy is therefore crucial for humanities scholars, and it will help us considerably with questions of open science in the future. Let me underline that, from my perspective, this is certainly a process where Mildred is useful.

The ethos of Mildred, as I have understood it, is to engage as many real-life projects (with real research data needs) as possible. This is also the obvious reason why these user groups have been formed. I know that the Mildred people have done their best to spread the word about the project and that it is possible to get involved in shaping it, but unfortunately it is remarkably hard to get the word out at the University level, at least through Flamma. If you know someone who should be involved with Mildred, I would encourage you to get in touch with Ville Tenhunen or Eeva Nyrövaara, and I am sure that they will do their best to accommodate that person’s needs as well.

One part of Mildred is (along with research data policies in different fields) to negotiate the roles of the various existing research data infrastructures (both national and international). Collaboration with ATT, the national data committee, other universities, repositories, libraries, etc. is crucial. Luckily this is something that the project managers of Mildred are handling.

This includes integrating many of the global standard-setting processes that seem to be going on everywhere with respect to research data: How to refer to data sets? How to use PIDs? How to implement these persistent identifiers for datasets (something that CSC and the British Library, in my experience, are very much concerned with)? Luckily, in Finland the Open Science and Research initiative is involved in these questions, and Mildred is directly engaged in these processes as well. It would be hard for individual researchers focusing on their own research to stay on top of all these developments in many different directions and at many levels. This is also one place where the processes of the Mildred project can be beneficial for us as researchers – research data infrastructure should eventually guide us in the use of best practices with clear guidelines.

One thing that I do know about Mildred is that what we do not want to do as researchers is to build a large, hard-to-use, inflexible, one-size-fits-all system that is not modeled on any real user cases. At the same time, it is evident that we need storage space for different types and sizes of data.

Then there are of course a great many concrete questions that we should be thinking about quite hard, based on the knowledge that can be gained from different use cases:

What level of reproducibility are we aiming at? Compatibility, version control, software citations, tool development with respect to our research data? How are these best supported by Mildred?

It seems that in all of these lingering general questions, small details with respect to an infrastructure are important. Meaning, it is important to build clear processes and functionalities that researchers are able to – and want to – use. This aspect of service design together with actual user cases cannot be overlooked.

All of these are questions that can be answered only through one particular route: that is through implementing and testing the forthcoming Mildred infrastructure against concrete research cases where research questions rule. My wish is that some kind of “platform thinking” would come through in Mildred for various different fields. For me, at best Mildred offers a platform for your research data needs so that you can better answer your current questions and take your scholarship to the next level. It is to me quite evident that this includes taking care of the data management, storage, reproducibility, tool development and dynamic use of the research data across different borders.

The user groups are not meant to be mere lip service. Mildred itself ought to be seen as an open science project. Sure, there are hardware questions that need to be decided and then cannot be altered, but if we are able to implement the right kind of platform thinking, one that includes dynamic research, research collaboration beyond the University of Helsinki, and questions of intellectual property, I am sure that this is a process worth participating in.

One important theme in the project is your experiences of things going wrong with your current research data and the infrastructures that should support its use. I hope we get to concrete cases of implementing the structure of Mildred with respect to interoperability, dynamic data and so forth as soon as possible.

Request for Comments: User Stories and Scenarios

The ultimate goal in Mildred is to create services that are both useful and usable for its users. “Useful” in this context means something that is needed and that makes one’s life at least somewhat (preferably significantly) easier when it comes to handling and managing data. “Usable” and user experience will be a topic for another blog post, so let’s concentrate on the question of how to make useful services.

When developing new software, modern organizations more and more often want to use so-called agile methodologies. These promote incremental development of the product in close collaboration with the customer; as feedback is received constantly during the development process, the final product should meet the customer’s real needs.

Among the several practices within the agile framework are user stories and roles. User stories are written in everyday language and basically describe what task a user wants to achieve using the product. An example could be: “As an author of an article I want to publish my data set so that it can be found and cited by other researchers” or “As a researcher I want to find data sets published by other researchers so that I can make new observations combining the data sets”.
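
To make the shape of such stories concrete, here is a minimal Python sketch that splits a story written in the “As a … I want … so that …” form into its role, goal and benefit. The names `UserStory` and `parse_user_story` are made up for this illustration and are not part of any Mildred service; the sketch simply assumes that stories follow this template fairly strictly.

```python
import re
from typing import NamedTuple, Optional


class UserStory(NamedTuple):
    """Hypothetical container for the three parts of a user story."""
    role: str
    goal: str
    benefit: str


# Assumed template: "As a/an <role> I want <goal> so that <benefit>".
STORY_PATTERN = re.compile(
    r"^As an? (?P<role>.+?),? I want (?P<goal>.+?) so that (?P<benefit>.+?)\.?$",
    re.IGNORECASE,
)


def parse_user_story(text: str) -> Optional[UserStory]:
    """Return the parts of a story, or None if it does not follow the template."""
    match = STORY_PATTERN.match(text.strip())
    if match is None:
        return None
    return UserStory(
        role=match["role"].strip(),
        goal=match["goal"].strip(),
        benefit=match["benefit"].strip(),
    )


if __name__ == "__main__":
    story = parse_user_story(
        "As a researcher I want to find data sets published by other "
        "researchers so that I can make new observations combining the data sets"
    )
    print(story)
```

Splitting stories this way would make it easy to group them by role, for instance to see which user profiles have many stories and which have few. Stories that do not fit the template come back as `None` and would need a human reader.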

This is where we need you, dear future user of Mildred services! We have gathered some user stories based on the cases that our experts and data support group have solved over the years. Are they correct? Is something missing? Are some of them more important to you than others?

Read and comment directly here: https://wiki.helsinki.fi/display/Tutkimusdata/MILDRED+User+stories (requires authentication with UH or Haka credentials).

These user stories are used in most Mildred projects, so it is of the utmost importance that they are good representations of your challenges. Unfortunately we cannot promise to fulfil all needs in the coming year, but the development will continue even after Mildred’s project phase is over.

Programme of the next user group meeting

Coordinates of the next MILDRED User Group meeting are:

Date and time: Wednesday 1st February 10:00-12:00

Place: Meilahti Campus, Biomedicum Helsinki 1 (Haartmaninkatu 8),  Kokoushuone 3 (http://www.helsinki.fi/teknos/opetustilat/meilahti/h8/kok3.htm).

Registration is open until 31st January 2017 (for dietary restrictions 24th January 2017): https://elomake.helsinki.fi/lomakkeet/76215/lomake.html

Programme:

10:00 – 10:10 Coffee

10:10 – 10:20 Opening, Prof. Mikko Tolonen

10:20 – 10:30 Introduction to the theme and method: Current obstacles in data-centric research, Aija Kaitera & Mari Elisa Kuusniemi

10:30 – 11:10 Workshop session, participants

11:10 – 11:20 Summary, Aija Kaitera & Mari Elisa Kuusniemi

11:20 – 11:55 MILDRED 2: Example and demo of the possible data service, Kimmo Koskinen & Ville Tenhunen

11:55 – 12:00 Next steps and closing, Eeva Nyrövaara

Welcome!

Building better RDM guidance

The aim of the Mildred M5 group is the implementation of the data management planning tool DMPTuuli, which helps you write data management plans. In a data management plan you give a brief description of how you will collect, manage and store your data, and how the data can be used now and in the future. Many research funders, such as the Academy of Finland, require data management plans in their funding applications. DMPTuuli brings the funders’ requirements, the guidance and the support services provided by UH together in the same user interface.

In the M5 project the focus is on writing targeted research data management (RDM) guidance for researchers and students at the University of Helsinki. The current guidance in DMPTuuli consists of generic-level instructions for all disciplines. Obviously, one size doesn’t fit all, so we are planning to create discipline-specific or data-type-specific guidance to better meet your demands.

First we are planning to make simplified guidance aimed at those who are not so familiar with data management plans. So far we have defined the principles of good guidance and benchmarked the available RDM guidance produced by other organizations. At the moment we are writing a glossary of terms related to RDM.

In the next phase we will find out where more specific guidance is needed. For example, work with sensitive data that needs to be anonymized might benefit from RDM guidance that goes deeper than the generic-level instructions.

But we cannot write guidance without your contribution. Now that we are working on the beginner-level guidance aimed at students, for example, we would be happy to have volunteers help us by giving comments and suggestions on what kind of guidance would be useful for students. Furthermore, if your research group has a need for more specific RDM instructions, please let us know! Feel free to use our feedback form or leave a reply to this post.

Licenses and next meeting of the user group

First things first. The next MILDRED User Group meeting will take place on Wednesday 1st February 2017 from 10 to 12. The venue will be Biomedicum Helsinki 1 (Haartmaninkatu 8), Kokoushuone 3 (http://www.helsinki.fi/teknos/opetustilat/meilahti/h8/kok3.htm).

The last day to register is 26th January 2017, and you can register here: https://elomake.helsinki.fi/lomakkeet/76215/lomake.html

Then, some thoughts about licenses.

It is important to define the terms and conditions of data, code, materials etc. before someone else uses them. It makes life easier if users know these things.

Therefore we thought we would describe the principles of the Project MILDRED licenses before we have any services or software in place. The idea is that you can comment on this proposal. The steering group of the project will make decisions later, after the user group has had the opportunity to comment on these matters.

The proposal:

If there are no legal restrictions or other compelling arguments for using some other license, the following licenses will be used within Project MILDRED:

A legal restriction could be, for example, a so-called copyleft license or the use of commercially licensed software. A compelling argument could also be the wishes of the owner of the information or material.

We would like to hear your opinion about this proposal. You can leave a comment on this blog or use the feedback form of the project: https://elomake.helsinki.fi/lomakkeet/76062/lomake.html

EDIT: 21.12.2017 Venue and registration.

The MILDRED user group and research data service user profiles

Project Mildred held a user group meeting last Tuesday with nearly 20 participants from all campuses of the University of Helsinki. The main topics of the meeting were the user profiles and researchers’ needs.

The open space session produced a number of comments and ideas which are very useful for the project. On behalf of the project managers, I would like to thank everyone for their participation and discussions! The results are now in the wiki.

If you didn’t make the meeting, no worries. We have opened a wiki page where you can find the material and leave your comments. You have to log in to the wiki with the University’s credentials or HAKA credentials.

Here are links:

Newsletter 2/2016

The second MILDRED Newsletter was emailed today. You can also read it below.

1. The user group workshop

How do you use research data? What tools do you need? Which services should Project MILDRED build?

Project MILDRED is arranging a User Group meeting and workshop on Tuesday 15th November. The event will take place 14:00-16:00 at Siltavuorenpenger 5, K218. In this session we will discuss user profiles, services, requirements and the next steps of the project.

Come and tell what kind of services you need for your research.

Registration is open until Friday 11.11.2016: https://elomake.helsinki.fi/lomakkeet/74066/lomake.html

2. Project managers of the subprojects of the MILDRED

All the Project MILDRED subprojects are now up and running. Each has its own project manager, as listed below:

Storing, sharing and visualizing

Autumn is coming! And the MILDRED projects are finally accelerating to full speed. During the summer we have tried to figure out what we really want to do: we have learned what kind of data services our researchers are using at the moment, we have gathered general requirements for research data infrastructures, and we’ve tried to find out what kinds of research data services exist around the world nowadays.

The functional requirements and use cases for data repository and publishing services we have gathered during the summer can be put into three categories: Storing, Analyzing & Visualizing, and Sharing.

  1. Storing

    By storing we mean a service that provides scalable storage space throughout the research data lifecycle, so that users need to save their data only once and in only one place. The data should be secured, backed up and accessible from all kinds of clients, e.g. browsers, the command line or even desktop clients.

  2. Analyzing & Visualizing

    Research data is very valuable as such, but we can add even more value to data by analyzing and visualizing it. Therefore our research data services should provide all kinds of tools for users to enrich their data without moving or copying it to other systems.

  3. Sharing

    By sharing, we actually mean three different types of sharing:

  • Short-term sharing: As said in an earlier post by Anna Salmi, a large share of our researchers use file sync & share services like Dropbox and SugarSync. Researchers need an easy and fast way to share their files with colleagues all over the world without worrying about user permissions or federated authentication.
  • Shared storage: Researchers also need a data storage that is shared with their research groups or other colleagues. Additionally, research groups might need collaborative tools to ease their data workflow.
  • Data publication: Publishing research data involves all kinds of requirements that must be met to provide a trusted and accessible repository; that is, the publishing service must support discoverability, reproducibility, and reusability. That means, for example, standardized APIs, rich metadata services and automated text citations.
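
As a rough illustration of what “automated text citations” could mean in practice, the Python sketch below builds a citation string from a small metadata record. The record fields, the `DatasetRecord` and `format_citation` names and the example values are all hypothetical; they are assumptions made for this sketch and do not describe an existing Mildred service or any particular metadata standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record for a published dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    identifier: str  # a persistent identifier, e.g. a DOI given as a URL


def format_citation(record: DatasetRecord) -> str:
    """Build a citation in the form 'Creators (Year): Title. Publisher. Identifier'."""
    if not record.creators:
        raise ValueError("a citation needs at least one creator")
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}): {record.title}. "
        f"{record.publisher}. {record.identifier}"
    )


if __name__ == "__main__":
    # All values below are placeholders, not a real dataset.
    example = DatasetRecord(
        creators=["Example, A.", "Sample, B."],
        year=2016,
        title="Hypothetical Survey Dataset",
        publisher="Example Repository",
        identifier="https://doi.org/10.1234/example",
    )
    print(format_citation(example))
```

The point of the sketch is that once the metadata is rich enough, a citation can be generated instead of typed by hand, which is one reason the metadata services and the citation feature belong together.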

During the summer we’ve tried to find out-of-the-box open source software that would meet all the requirements we’ve gathered. Unfortunately, and not so surprisingly, there is no silver bullet. There are a few software packages and solutions that would fulfil some of the requirements, but in the end they all lack something crucial. Basically, what we are trying to develop is a kind of platform-driven e-infrastructure, which is a rising trend in the field of research data infrastructures. There are a few platform-driven repositories in the world today, for example the PURR repository at Purdue University and the CUSP Datahub at New York University. Unfortunately, the technology underneath these platforms is either obsolete or otherwise unsuitable.

It may be that we can’t build a world-class open science platform right away, but we want to create something to build on and be at least one step closer to our world-class platform. Still, the most important goal is to provide data services that our researchers want to use. We would like to offer a seamless user experience throughout the data life cycle in order to set the bar low for publishing research data. Hence, this autumn the user group will have the opportunity to comment on use cases and concepts before any decisions are made. We want to be sure we are delivering the right services for our researchers. And remember, any kind of feedback is appreciated.

MILDRED poster for International Data Week 2016

The poster about Project MILDRED was accepted and presented at International Data Week 2016 in Denver, CO, USA, 11.–17.9.2016.

You can find a PDF version of the poster here: RDA-poster-final-2016-09-01

Edit:

Survey data mentioned in the poster is now available on FigShare: https://figshare.com/articles/Project_MILDRED_Research_Data_Survey/3806394