So, how did it all start in the publication and metadata services planning back in June? As preliminary groundwork, we conducted an inventory of the research data repositories that UH researchers currently use. The sample of 211 peer-reviewed articles was drawn from the open access journal family PLOS, which requires authors to provide a specific data availability statement as an integral part of the article. The articles had been published in one of the PLOS journals between January 1, 2015 and June 6, 2016. The journals included PLOS ONE, PLOS Computational Biology, PLOS Genetics, PLOS Medicine, PLOS Neglected Tropical Diseases, and PLOS Pathogens. A smaller sample was also gathered, with 20 articles each from Web of Science and the journal Scientific Reports. The aim was to compare results from different sources and to identify other data repositories that did not appear in the principal PLOS sample.
Of all the PLOS articles, only five lacked an explicit data repositing statement. Among generalist repository services, Figshare, Dryad and GitHub appeared in more than two cases. 85 % of all articles shared data partly or completely through Figshare, most commonly in the article’s supporting files with Figshare’s cloud service curation. In addition, PLOS offers authors server space for sharing data by embedding it within the article. Domain-specific repositories such as ArrayExpress, European Nucleotide Archive and Sequence Read Archive appeared in several articles. However, usage was notably dispersed: 29 of the 37 named repositories appeared only once. These other domain-specific repositories included, for example, the Cancer Genome Atlas Data Portal, The IUCN Red List of Threatened Species, The Natural Resources Institute Finland, and the UK Data Archive.
In accordance with the PLOS sharing policy, 45 % of all articles stated that all relevant data was included within the paper and its supporting files, and 13 % stated that all data was to be found in the article itself. 23 % of the articles expressly stated that all relevant data had been made available. 11 % stated that the data was partly behind restricted access, and in only 5 % of the cases was all data restricted or not shareable.
The inventory formed the basis for the research data repository survey sent to researchers later in June and again at the beginning of July. In the future, we will use our growing knowledge of repositing practices and preferences to harvest existing UH data. More news about the inventory and the survey will follow by August. Please stay tuned…