Author Archives: Deleted User

One One Two Day

Today is the national One One Two Day, 112 Day (information in Finnish and Swedish). Today several government officials wish to direct attention to public safety and the proper use of the emergency phone number 112. The aim of the day is to make Finland the safest country in Europe. How can accidents be prevented? How can everyday life be made safer and more secure? What kind of infrastructure promotes safety? How can help be found when needed?

Today is also the second birthday of the University of Helsinki Research Data Policy. Two years ago, on February 11, 2015, Rector Jukka Kola signed the Research Data Policy. The aim of the Research Data Policy is to define high-level principles regarding the collection, storing, use, and management of research data. The ultimate goal is to make researchers’ everyday life and core business, that is, research, easier and – in a sense – safer.

Although the policy paper itself does not make anybody’s life easier or safer, the policy requires the university to take certain actions in order to safeguard professional research data management throughout the research life cycle. From a researcher’s point of view the key questions are: How do I find help when needed? What kind of infrastructure supports my research?

During the implementation phase of the Research Data Policy, the University of Helsinki has answered these questions by, first, setting up the Datasupport network and, second, launching the Project MILDRED. The Datasupport network assists researchers in the management of research data. Researchers are offered tools, services, and training regarding the management, use, access to, and sharing of research data. Via Datasupport researchers can reach all data management specialists at the University. The specialists will help researchers in data-related emergencies. As such, the Datasupport email researchdata@helsinki.fi is a kind of 112 number for researchers.

But as with the One One Two Day, the focus of the Datasupport network is not on the emergency cases, but on how to avoid them. How can research be planned to be safer and more secure data-wise? The Datasupport network’s aim is to prevent emergencies and help researchers plan the collection, storing, use, and management of research data in advance. Project MILDRED, on the other hand, builds a safe and well-planned infrastructure on which Datasupport and researchers can rely to support both data-intensive research and research based on so-called long-tail data. Infrastructure is the basis of a well-functioning society and of research.

What is there for me in Mildred?

By Mikko Tolonen

The University of Helsinki has decided to update its research data infrastructure. This project is called Mildred – as hopefully you already know. Mildred includes five different sub-projects that place their emphases on storage, metadata creation, data management, the actual use of the research data, sharing of the data and interoperability. From an architectural perspective the project seems more or less straightforward. How this translates into practice, we are about to find out in the near future.

What were the reasons for starting this project? The apparent motivation from the University’s side is the fact that research data will become a very valuable asset in the future. It is in the interest of the University to provide the infrastructure that enables its researchers to store their data, work on it and disseminate it. It is just as important that other scholars, the general public and, for example, funding bodies can easily find the relevant research data and then put it to further use. It is a reasonable assumption that questions of intellectual property, data management and open science will be decisive factors for all the invested parties in the future as we move towards a more dynamic research culture. We already have good examples of different infrastructures competing on the quality of their research data and its openness. I think we can all agree that this direction of openness, reusability and reproducibility is the way to advance science, and it is therefore evident that providing a functional research data infrastructure is in the general interest of scholars.

If this is the motivation at the University level, what about individual researchers? Why do we need Mildred? Can’t we just rely on existing research data infrastructures? There are plenty of those to go around, are there not? Isn’t Zenodo enough? What do we get out of this as University of Helsinki researchers?

The first observation that I would like to make is that when we are discussing research data infrastructures on a general level, we need to remember that there is no such thing as a monolithic concept of a “researcher”. As researchers we do not form a coherent whole with similar needs. We come from various fields, in all shapes and sizes and with multiple research infrastructure needs. Therefore, it is evidently a good idea that individual research projects from different fields can have an impact on the kind of research data infrastructure that the university is building. That is also why we are after concrete use cases.

When looking at the question of research data infrastructure from the perspective of the humanities, for example, we can even say that we cannot really see how our research data infrastructure needs will be shaped in ten years’ time. A plain fact is that much of the humanities research data is not currently in a digital form. This is why we also need to focus, together with libraries and archives, on the digitization process, which might not be a concern for some other fields of science. Also, in one sense, in open science and in the sharing of research data, we in the humanities are perhaps behind the natural sciences.

At the same time, looking at the needs of the humanities, it is evident that some of our research data needs will follow those of science in general. As an example, the use of CSC computing services is easily adaptable to humanities needs. The Pouta cloud service, for example, functions quite well for our research group, but the question of backing up the data (or enabling reproducibility) is, to be honest, not that clear to us at the moment. However, there are also particular hermeneutic questions that we deal with in the humanities, which might mean that our data does not fit directly within the same scope as genome data; the level of detail can be different, and so forth. But these are not reasons not to be in the same boat as the natural sciences in the Mildred project. Mildred also enables us to see the similarities and recognize the differences, leading to the adoption of best practices from other fields of science when possible.

At the same time, what should not be forgotten is that what goes hand in hand with Mildred is different faculties and disciplines updating – or in some cases formulating – their own research data policies. This has recently been undertaken at the Faculty of Arts by Dean Hanna Snellman. Since most humanities data has not yet been digitized, it is only natural that there is currently no research data policy at the Faculty of Arts – all of these questions have come to us quite recently, although language technology, for example, with its large digital datasets, has been developing at the University of Helsinki for the past fifty years or so. But there should be a research data policy for the humanities and there soon will be.

This gives us an opportunity to think about what the right kinds of infrastructures are for different types of humanities data. Most of the relevant infrastructures exist already (or are being developed) – what is crucial is to use them in an efficient way. The idea is not that Mildred will replace the existing infrastructures – Mildred is not designed as one research data infrastructure to rule them all – but we need to deliberate what research data should be curated by the national archives, what by the National or University Library, what by the Language Bank of Finland, what by Mildred and so forth. Just as humanities scholars come in all shapes and sizes, so does our data (also in different forms, for example, as sand from Egypt). Forming a research data policy is therefore crucial for humanities scholars, and it will greatly help us with questions of open science in the future. Let me underline that, from my perspective, this is certainly a process where Mildred is useful.

The ethos of Mildred, as I have understood it, is to engage as many real-life projects (with real research data needs) as possible. This is also the obvious reason why these user groups have been formed. I know that the Mildred people have done their best to spread the word about the project and about the possibility of getting involved in shaping it, but unfortunately it is remarkably hard to get the word out at the University level, at least through Flamma. If you know someone who should be involved with Mildred, I would encourage you to get in touch with Ville Tenhunen or Eeva Nyrövaara; I am sure that they will do their best to accommodate that person’s needs as well.

One part of Mildred is (along with research data policies in different fields) to negotiate the different roles of the various existing research data infrastructures (both national and international). Collaboration with ATT, the national data committee, other universities, repositories, libraries etc. is crucial. Luckily this is something that the project managers of Mildred are handling.

This includes the integration of many of the global standard-setting processes that seem to be going on everywhere with respect to research data: How should data sets be referred to? How should PIDs be used? How should these persistent identifiers be implemented for datasets (something that CSC and the British Library, in my experience, are very much concerned with)? Luckily, in Finland the Open Science and Research initiative is involved in these questions and Mildred is directly engaged in these processes as well. It would be hard for individual researchers focusing on their own research to stay on top of all these developments in many different directions and at many levels. This is also one place where the processes of the Mildred project can be beneficial for us as researchers – research data infrastructure should eventually guide us in the use of best practices with clear guidelines.

One thing that I do know about Mildred is that what we do not want to do as researchers is to build a large, hard-to-use, inflexible, one-size-fits-all system that is not modeled on any real use cases. At the same time, it is evident that we need storage space for different types and sizes of data.

Then there are of course a great many concrete questions that we should be thinking about quite hard, based on the knowledge that can be gained from different use cases:

What level of reproducibility are we aiming at? Compatibility, version control, software citations, tool development with respect to our research data? How are these best supported by Mildred?

It seems that in all of these lingering general questions, small details with respect to an infrastructure are important. That is, it is important to build clear processes and functionalities that researchers are able to – and want to – use. This aspect of service design together with actual use cases cannot be overlooked.

All of these are questions that can be answered only through one particular route: implementing and testing the forthcoming Mildred infrastructure against concrete research cases where research questions rule. My wish is that some kind of “platform thinking” would come through in Mildred for various fields. For me, at best Mildred offers a platform for your research data needs so that you can better answer your current questions and take your scholarship to the next level. It is to me quite evident that this includes taking care of data management, storage, reproducibility, tool development and the dynamic use of research data across different borders.

The user groups are not meant to be lip service. Mildred itself ought to be seen as an open science project. Sure, there are hardware questions that need to be decided and then cannot be altered, but if we are able to implement the right kind of platform thinking, one that includes the aspects of dynamic research, research collaboration beyond the University of Helsinki and questions of intellectual property, I am sure that this is a process worth participating in.

One important theme in the project is experiences of things going wrong with your current research data and the infrastructures that should support its use. I hope we get to concrete cases of implementing the structure of Mildred with respect to interoperability, dynamic data and so forth as soon as possible.

Building better RDM guidance

The aim of the Mildred M5 group is the implementation of the data management planning tool DMPTuuli, which helps you write data management plans. In a data management plan you give a brief description of how you will collect, manage and store your data, and how the data can be used now and in the future. Many research funders, such as the Academy of Finland, require data management plans in their funding applications. DMPTuuli brings the funders’ requirements, the guidance and the support services provided by the UH together in the same user interface.

In the M5 project the focus is on writing targeted research data management (RDM) guidance for researchers and students at the University of Helsinki. The current guidance in DMPTuuli consists of generic-level instructions for all disciplines. Obviously, one size doesn’t fit all, and so we are planning to create discipline- or data-type-specific guidance to better meet your demands.

First we are planning to make simplified guidance aimed at those who are not so familiar with data management plans. So far we have defined the principles of good guidance and benchmarked the available RDM guidance made by other organizations. At the moment we are writing a glossary of terms related to RDM.

In the next phase we will identify the needs for more specific guidance. For example, research involving sensitive data that needs to be anonymized might benefit from RDM guidance that goes deeper than the generic-level instructions.

But we cannot write guidance without your contribution. Now that we are working on the beginner-level guidance aimed at students, for example, we would be happy to have volunteers help us by giving comments and suggestions on what kind of guidance would be useful for students. Furthermore, if your research group has a need for more specific RDM instructions, please let us know! Feel free to use our feedback form or leave a reply to this post.

Storing, sharing and visualizing

Autumn is coming! And the MILDRED projects are finally accelerating to full speed. During the summer we have tried to figure out what we really want to do: we have learned what kind of data services our researchers are using at the moment, we have gathered general requirements for research data infrastructures and we’ve tried to find out what kind of research data services exist around the world nowadays.

The functional requirements and use cases for data repository and publishing services that we have gathered during the summer can be put into three categories: Storing, Analysing & Visualizing, and Sharing.

Storing

By storing we mean a service that provides scalable storage space throughout the research data lifecycle so that users need to save their data only once and only in one storage. The data should be secured, backed up and accessible from all kinds of clients, e.g. browsers, the command line or even desktop clients.

Analysing & Visualizing

Research data is very valuable as such, but we can add even more value to data by analysing and visualizing it. Therefore our research data services should provide all kinds of tools for users to enrich their data without moving or copying it to other systems.

Sharing

By sharing, we actually mean three different types of sharing:

Short-term sharing: As said in an earlier post by Anna Salmi, a large share of our researchers use file sync & share services such as Dropbox and SugarSync. Researchers need an easy and fast way to share their files with colleagues all over the world without worrying about user permissions or federated authentication.
Shared storage: Researchers also need data storage that is shared with their research groups or other colleagues. Additionally, research groups might need collaborative tools to ease their data workflow.
Data publication: For publishing research data, there are all kinds of requirements that must be met to provide a trusted and accessible repository, i.e. the publishing service must support discoverability, reproducibility and reusability. That means, for example, standardized APIs, rich metadata services and automated text citations.
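
To make the idea of automated text citations a little more concrete, here is a minimal sketch in Python of how a citation string could be generated from a dataset's metadata. It is only an illustration under assumptions: the `DatasetRecord` fields, the `format_citation` helper, the citation layout and the example values are all hypothetical and are not taken from any MILDRED service or decision.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Minimal, hypothetical metadata record for a published dataset."""
    creators: list[str]
    year: int
    title: str
    publisher: str
    identifier: str  # a persistent identifier, e.g. a DOI


def format_citation(record: DatasetRecord) -> str:
    """Build a plain-text citation: Creators (Year). Title. Publisher. Identifier."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}). {record.title}. "
        f"{record.publisher}. {record.identifier}"
    )


# Made-up example values, used only to show the shape of the output.
example = DatasetRecord(
    creators=["Researcher, A.", "Scholar, B."],
    year=2016,
    title="Example survey data",
    publisher="Example Repository",
    identifier="doi:10.0000/example",
)
print(format_citation(example))
```

The point of the sketch is that once the metadata is rich and structured, a citation can be produced automatically and consistently instead of being typed by hand for each dataset.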

During the summer we’ve tried to find out-of-the-box open source software that would meet all the requirements we’ve gathered. Unfortunately, and not so surprisingly, there is no silver bullet. There are a few software packages or solutions that would fulfil some requirements, but in the end they all lack something crucial. Basically, what we are trying to develop is some kind of platform-driven e-infrastructure, which is a rising trend in the field of research data infrastructures. There are a few platform-driven repositories in the world today, for example the Purr repository at Purdue University and The CUSP Datahub at New York University. Unfortunately the technology underneath these platforms is either obsolete or otherwise unsuitable.

It may be that we can’t build a world-class open science platform right away, but we want to create something to build on and be at least one step closer to our world-class platform. Still, the most important goal is to provide data services that our researchers want to use. We would like to offer a seamless user experience throughout the data life cycle to set the bar low for publishing research data. Hence, this autumn the user group will have the opportunity to comment on use cases and concepts before any decisions are made. We want to be sure that we are delivering the right services for our researchers. And remember, any kind of feedback is appreciated.

Newsletter 1/2016

The first MILDRED Newsletter was emailed today. You can also read it below.

The aim of MILDRED is to provide researchers with a state-of-the-art research data infrastructure and to design data-related services to help researchers of different disciplines. The project tries to take into account the continually changing needs of researchers in a diverse community where participants have an influence on the results. Secondly, this kind of cooperation between various areas of research, e.g. from digital humanities to natural sciences, gives opportunities for surprising discoveries and creative insights.

Therefore we invite researchers to join this user group of the Project MILDRED.

1. What’s going on? Some words about the MILDRED Project and its subprojects.

Digitalization of Research Data Services Delivery
This project will start in the autumn. Currently we are preparing the project and collecting information. More about this project in the autumn.

Data Repository Service
This project is up and running. Currently we are sorting out possible solutions for the repository platform and describing use cases for the repository services. Next autumn the user group will have the opportunity to comment on these use cases and platform ideas before any decisions are made.

Data Publishing and Metadata
The project is in the planning phase and the project plan will be ready in the autumn.

Data Storage and Backup
This project is also up and running. Currently we are in the starting phase of the project, and preliminary discussions about co-operation with CSC have taken place. Other possibilities will be evaluated in the autumn, based on possible use cases.

Implementation of Data Management Planning Tool – Tuuli
The data management planning tool Tuuli (https://www.dmptuuli.fi/) will help you write data management plans for applications to the Academy of Finland, the European Commission (Horizon 2020), TEKES, the National Institutes of Health and other funders. This service is already available for use.

2. Survey

As part of subproject three (data publishing and metadata services development), a survey about the research data repositories currently in use was sent to University of Helsinki researchers before Midsummer. The survey also asked about reasons for not using a specific repository.

Delightfully, over 200 answers have been gathered so far. Warm thanks to all who answered! All information about depositing practices and preferences is extremely valuable in planning the future data services.

The survey is still open and all researchers at the University of Helsinki are welcome to participate. If you haven’t answered but would like to, please visit the survey: https://elomake.helsinki.fi/lomakkeet/71594/lomake.html. The survey will close on 15.7.2016.

3. Autumn 2016

The Project MILDRED will organize a meeting of the user group in the autumn, and you will be invited. More information will follow in August, when the next user group email is sent. Themes of the meeting include, for example, use cases and possible research data service ideas and concepts.

4. Do you have views you would like to share with everybody but don’t know how?

If yes, you can always offer texts to the MILDRED blog (https://blogs.helsinki.fi/mildred/). If you have an idea, please don’t hesitate to contact us via email (project-mildred@helsinki.fi).

Have a nice summer and don’t forget to rest!

MILDRED’s Birthday

The Project MILDRED was launched in a kick-off event at the end of April. With MILDRED the University of Helsinki will update its research data infrastructure to provide researchers with state-of-the-art services.

MILDRED will offer tools and services for supporting the management, use, discoverability and sharing of data.

Goal in the strategic core

The kick-off event focused on the goals of the project: research data management, open science, democratizing data mining, and decision-making based on facts and values. MILDRED is about true collaboration.

The event was opened by Vice Rector Keijo Hämäläinen, who stated that the paradigm is changing in many fields of science. The change is enabled by digitalization and the open science movement. There is an on-going transformation in how research is performed, researchers collaborate, knowledge is shared, and science is organized. Vice Rector Hämäläinen emphasized that open science is both the top and the middle of the strategy at the University of Helsinki. The university will invest in fostering open science and developing services for researchers.

The University of Helsinki is ranked number one in open science in Finland, but as Vice Rector Hämäläinen said: “We can become even better.”

Keijo Hämäläinen’s opening words

Secretary General Pirjo-Leena Forsström from the Open Science and Research Initiative described how openness is strengthening science and empowering scientists. Forsström also presented the so-called science accelerator, which is based on a process model of open science.

Professor Mikko Tolonen and Principal Investigator Minna Ruckenstein then discussed open data and the tools for data management in the humanities. Ruckenstein characterized an environment where people with different skills can come together and collaborate as the basic need for open research.

Minna Ruckenstein and Mikko Tolonen discussing data management in humanities.

“Science without data is merely an unsupported hypothesis”

Ari Asmi, Project Director of the Integrated Carbon Observation System Research Infrastructure (ICOS), presented his reasons for being all for open science. Asmi demanded more fact-based decisions, noting that science is about being open to the critical comments of others. As the saying goes, science without data is only an unsupported hypothesis.

Ari Asmi explaining why he’s for open science.

The Chair of the Steering Group of the project, Professor Mikko Tolonen, introduced the five subprojects of MILDRED: Digitalization of Research Data Services Delivery, Data Repository Service, Data Publishing and Metadata Service, Data Storage and Backup, and Implementation of Data Management Planning Tool – Tuuli. After the introduction participants had the opportunity to discuss the project topics in an Open Space session.

In his closing remarks the chair of the steering group encouraged the participants to engage their colleagues who were not able to attend the kick-off, and to spread the word on open science, research data and MILDRED.

Reach MILDRED online

The Project MILDRED has a blog and a Yammer group. MILDRED can be followed on Twitter: @ProjectMildred. News and updates on the project will also be delivered to a mailing list. Please contact the project service email if you want to join the mailing list. The service email for the project MILDRED is project-mildred@helsinki.fi.

MILDRED encourages all projects, groups, departments and faculties to contact the project team and discuss their data-related needs. Project MILDRED can also be invited to project group meetings.

MILDRED Kickoff

Would you like to participate in developing the UH research data infrastructure?

Yes? Great!

The development project – MILDRED – will be launched at a kickoff event. Come and learn more about the project and join the user groups!

Date and time: April 29, 2016, from 9.00 am until 12.00

Venue: Runeberg Hall, University of Helsinki Main Building, 2nd floor

Programme:

9.00	Coffee
9.15	Welcome	Vice Rector Keijo Hämäläinen
9.30	Open Science and Research Initiative (ATT)	Secretary General, Pirjo-Leena Forsström, CSC
9.40	Research Data Management in Everyday Life of the Researcher	Principal investigator, Minna Ruckenstein, Consumer Society Research Centre and Professor Mikko Tolonen, Department of Modern Languages
9.55	I’m For Open Data because…	Project Director Ari Asmi, Integrated Carbon Observation System
10.10	What is MILDRED?	Professor Mikko Tolonen, The Steering Group Chair
	Open Space – Digitalization of Research Data Services Delivery – Data Repository Service – Data Publishing and Metadata Service – Data Storage and Backup – Implementation of Data Management Planning Tool – Tuuli	Senior Adviser in Research Administration, Eeva Nyrövaara, Research Affairs Presentation
	Feedback from Open Space & Next Steps	Project Manager, Ville Tenhunen, IT Center
11.40	UH DataSupport & RDM guide & DMPTuuli	Service Director Pälvi Kaiponen, and Information Specialist Mari Elisa Kuusniemi, Helsinki University Library Presentation
11.50	Closing Remarks	Professor Mikko Tolonen, The Steering Group Chair