Storing, sharing and visualizing

Autumn is coming! And the MILDRED-projects are finally accelerating to full speed. During the summer we have tried to figure out, what we really want to do: We have learned what kind of data services our researchers are using at the moment, we have gathered general requirements for research data infrastructures and we’ve tried to find out what kind of research data services there exits around the world nowadays.

The functional requirements and use cases for data repository and publishing services we have gathered during the summer can be put into three categories: Storing, Analyzing & Visualizing, and Sharing.

  1. Storing

    By storing we mean a service that provides a scalable storage space throughout the research data lifecycle so that users needs to save their data only once and only in one storage. The data should be secured, backed up and accessible from all different kind of clients e.g. browsers, command line or even desktop clients.

  2. Analysing & Visualizing

    Research data is very valuable as such, but we can add even more value to data by analysing and visualizing it. Therefore our research data services should provide all different kind of tools for users to enrich their data without moving or copying their data to other systems.

  3. Sharing

    By sharing, we actually mean three different types of sharing:

  • Short term sharing: As said in an earlier post by Anna Salmi, a big part of our researcher are using these file sync & share services like Dropbox and SugarSync. The researchers need a easy and fast way to share their files with colleagues all over the world without worrying the user permissions or federated authentication.
  • Shared storage: Researchers also need a data storage that is shared with their research groups or other colleagues. Additionally, research groups might need collaborative tools to ease their data workflow.
  • Data publication: For publishing research data, there are all kinds of requirements that is needed to provide a trusted and accessible repository, i.e. The publishing service must support discoverability, reproducibility, reusability. That means, for example, standardized APIs, rich metadata services and automated text citations.

During the summer we’ve tried to find an out-of-the-box open source software that would meet all the requirements we’ve gathered. Unfortunately and not so surprisingly there is no silver bullet. There are few software or solutions that would fulfil some requirements, but in the end they all lack something crucial. Basically, what we are trying to develop is some kind of platform-driven e-infrastructure, that is a rising trend in the field of research data infrastructures. There are few platform-driven repositories in the world today. For example Purr repository in Purdue University and The CUSP Datahub in New York University. Unfortunately the technology underneath these platforms are either obsolete or otherwise unsuitable.

Although, it may be that we can’t build a world-class open science platform right away, but we want to do something to build on and be, at least, one step closer to our world-class platform. Still the most important goal is to provide data services that our researchers want to use. We would like to offer a seamless user experience throughout the data life cycle to set the bar low for publishing research data. Hence this autumn user group will have a possibility to comment  use cases and concepts before any decisions. We want to be sure, we are delivering the right services for our researchers. And remember, any kind of feedback is appreciated.


One thought on “Storing, sharing and visualizing

  1. Elsa

    I think your goal #2, although obviously very noble, is also Really Brave (unless I totally misunderstood its description, or that of the Mildred project up in the banner). There are approximately three trillion tools and software programs that researchers use in different fields to analyze and visualize different kinds of data. Even when there are services focused on one field only (say, the FIMM GIU for example), even these tend to not manage to keep up with all the programs that are available and handy for one or more of their customers. So I seriously wonder whether goal #2 would ever work with any reasonable effort. If it is technically possible to provide a direct and smooth connection to an existing service of computational capacity and analysis programs, such as CSC, so that the programs could somehow be used without having to move the data about, it might work.

Comments are closed.