Quantitative analysis of text reuse: Towards a methodology – A Workshop on 23-24 November 2023 at Minerva Plaza, Helsinki

When: 23-24 November 2023
Where: University of Helsinki, Minerva Plaza, Siltavuorenpenger 5 A, Helsinki, Finland
Livestream: https://video.helsinki.fi/unitube/live-stream.html?room=l5

The detection of similar passages of text in a large corpus, often called text reuse detection, is gaining popularity as a methodological step in Digital Humanities research. Intertextual relationships have long been an object of interest for various humanities disciplines, but discovering them with traditional methods is usually very labour-intensive. Automatic text reuse detection provides results that are easy to interpret and can readily lead to new insights, especially for less-studied corpora.

A lot of research has been done on text reuse detection methods, but the understanding of the results and their application in answering research questions often seems very context-specific. Furthermore, the most typical application of text reuse is as a discovery method for finding examples on which a qualitative argument can be built. There seems to be a need for a more developed theory of the phenomenon of text similarity and of the insights that can be gained from a large-scale quantitative analysis of it.

The purpose of this workshop is to bring together researchers from various DH projects that have employed text reuse, to answer the leading question: Is text reuse more than a discovery method? Can we progress towards a more formal theory of what kinds of patterns can be found in text reuse data and how to interpret them?

Proposed topics for discussion include, but are not limited to:

Measures to quantify text similarity and their distribution
Kinds and degrees of similarity (verbatim copy, paraphrase, translation, oral transmission, etc.) and their quantitative characteristics
Quantitative methods to analyse the spreading of texts in time and space
Measuring the influence and reception of a text
Network analysis of graphs of text similarity
Bridging quantitative and qualitative analysis

The workshop is organised by Maciej Janicki (University of Helsinki, FILTER project) and Mikko Tolonen (University of Helsinki, head of the Computational History group, HPC-HD project) in collaboration with Dariah-FI and the project Informationsflöden över Östersjön funded by SLS.