Autumn is coming! And the MILDRED projects are finally accelerating to full speed. During the summer we have tried to figure out what we really want to do: we have learned what kind of data services our researchers are using at the moment, we have gathered general requirements for research data infrastructures, and we have tried to find out what kind of research data services exist around the world nowadays.
The functional requirements and use cases for data repository and publishing services that we have gathered during the summer fall into three categories: Storing, Analysing & Visualizing, and Sharing.
- Storing
By storing we mean a service that provides scalable storage space throughout the research data lifecycle, so that users need to save their data only once and in only one place. The data should be secured, backed up, and accessible from all kinds of clients, e.g. browsers, the command line, or even desktop clients.
- Analysing & Visualizing
Research data is very valuable as such, but we can add even more value by analysing and visualizing it. Therefore our research data services should provide a wide range of tools for users to enrich their data without moving or copying it to other systems.
- Sharing
By sharing, we actually mean three different types of sharing:
- Short-term sharing: As said in an earlier post by Anna Salmi, a large share of our researchers use file sync & share services like Dropbox and SugarSync. Researchers need an easy and fast way to share their files with colleagues all over the world without worrying about user permissions or federated authentication.
- Shared storage: Researchers also need data storage that is shared with their research group or other colleagues. Additionally, research groups might need collaborative tools to ease their data workflows.
- Data publication: Publishing research data places many requirements on a trusted and accessible repository: the publishing service must support discoverability, reproducibility, and reusability. In practice that means, for example, standardized APIs, rich metadata services, and automated citations.
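To make "rich metadata and automated citations" a little more concrete, here is a minimal, hypothetical Python sketch of how a publishing service might format a citation from a dataset's metadata record. The field names, the citation style, and all the example values (authors, title, DOI) are illustrative assumptions, not MILDRED specifics:

```python
# Hypothetical sketch: automated citation generation from a dataset
# metadata record. All field names and values are illustrative only.

def format_citation(record):
    """Build a simple citation string from a dataset metadata record."""
    authors = "; ".join(record["creators"])
    return (f"{authors} ({record['year']}). {record['title']} "
            f"[Data set]. {record['publisher']}. "
            f"https://doi.org/{record['doi']}")

# Example metadata record (placeholder values, including the DOI)
record = {
    "creators": ["Virtanen, M.", "Korhonen, A."],
    "year": 2016,
    "title": "Baltic Sea temperature measurements",
    "publisher": "University of Helsinki",
    "doi": "10.1234/example.5678",
}

print(format_citation(record))
```

The point of a sketch like this is that once the repository stores structured metadata, citations (and, by the same token, machine-readable discovery records) fall out of it automatically instead of being written by hand.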
During the summer we’ve tried to find an out-of-the-box open source software that would meet all the requirements we’ve gathered. Unfortunately, and not so surprisingly, there is no silver bullet. There are a few pieces of software or solutions that fulfil some of the requirements, but in the end they all lack something crucial. Basically, what we are trying to develop is a kind of platform-driven e-infrastructure, which is a rising trend in the field of research data infrastructures. There are a few platform-driven repositories in the world today, for example the PURR repository at Purdue University and the CUSP Datahub at New York University. Unfortunately, the technology underneath these platforms is either obsolete or otherwise unsuitable.
It may be that we can’t build a world-class open science platform right away, but we want to create something to build on and be at least one step closer to that world-class platform. Still, the most important goal is to provide data services that our researchers want to use. We would like to offer a seamless user experience throughout the data lifecycle to set the bar low for publishing research data. Hence, this autumn the user group will have the opportunity to comment on use cases and concepts before any decisions are made. We want to be sure we are delivering the right services for our researchers. And remember, any kind of feedback is appreciated.