Collections as Data event on 19-20.2.2024

KBR (the Royal Library of Belgium) and the Europeana Foundation organized this event to share so-called data space development work across different national libraries, (G)LAMs and researchers. The two days were full of interesting keynotes, talks, pitches and even a couple of more interactive sessions, offering a peek at past, ongoing and future development projects in the cultural heritage space.

Dilawar Ali talking about how machine learning (ML) and computer vision can improve searchability

The presentations spanned a wide range of development work happening in national libraries across Europe and the UK. The material types were varied, from books to newspapers, and even though the methods differed, the familiar challenges were there: OCR, layout analysis and the general variability of historical data. This variability was also reflected in researchers' needs regarding the data held by national libraries.

The shape of the data space started to form, at least in the participants' minds, when we began discussing how to define a dataset and a potential generic workflow for creating datasets, on which we had the opportunity to give feedback. Europeana's Digital Cultural Heritage working group then introduced the datasheet concept: a datasheet contains metadata on the creation and provenance of a dataset, i.e. all the little details researchers typically ask about once they really start to use it.

We also identified plenty of challenges and tasks still to work on.

Open space flip chart on rights

For example, the open space discussion showed that rights statements, copyright, data labeling and the handling of datasets containing both in-copyright and copyright-free material all need further discussion. One suggestion was to educate ourselves and others about the needs of libraries so that we can better serve today's researchers.

All in all, two very good days. They gave us new ideas that could easily be piloted, adapted or experimented with, at least on a small scale. The workshop organizers will also produce a report on the discussions and information gathered. It could serve as an action plan for making collaboration across data spaces work better between cultural heritage and open science, e.g. via EOSC and the CLARIAH infrastructures, ensuring not only reproducibility but also the long-term “care” and versioning of datasets.