When: 23-24 November 2023
Where: University of Helsinki, Minerva Plaza, Siltavuorenpenger 5 A, Helsinki, Finland
Livestream: https://video.helsinki.fi/unitube/live-stream.html?room=l5

The detection of similar passages of text in a large corpus, often called text reuse detection, is gaining popularity as a methodological step in Digital Humanities research. Intertextual relationships have long been an object of interest for various humanities disciplines, but discovering them with traditional methods is usually very labour-intensive. Automatic text reuse detection provides results that are easy to interpret and can readily lead to new insights, especially for less-studied corpora.

Much research has been done on text reuse detection methods, but the understanding of the results and their application to research questions often seem very context-specific. Furthermore, the most typical application of text reuse is as a discovery method: finding examples on which a qualitative argument can be built. There seems to be a need for a more developed theory of the phenomenon of text similarity, and of the insights that can be gained from its large-scale quantitative analysis.

The purpose of this workshop is to bring together researchers from various DH projects that have employed text reuse detection, in order to address the leading question: Is text reuse more than a discovery method? Can we progress towards a more formal theory of what kinds of patterns can be found in text reuse data and how to interpret them?

Proposed topics to discuss include but are not limited to:

  • Measures to quantify text similarity and their distribution
  • Kinds and degrees of similarity (verbatim copy, paraphrase, translation, oral transmission, etc.) and their quantitative characteristics
  • Quantitative methods to analyse the spreading of texts in time and space
  • Measuring the influence and reception of a text
  • Network analysis of graphs of text similarity
  • Bridging quantitative and qualitative analysis

The workshop is organised by Maciej Janicki (University of Helsinki, FILTER project) and Mikko Tolonen (University of Helsinki, head of the Computational History group, HPC-HD project) in collaboration with DARIAH-FI and the project Informationsflöden över Östersjön, funded by SLS.