NLP in the Humanities

Digital Humanities is a research area that combines computing, digital resources, and the traditional disciplines of the humanities. Human knowledge is to a large extent stored and communicated through natural language, so language technology is a natural ally of the digital humanities. Disciplines in the humanities and social sciences, such as history, sociology, and even philosophy, provide new challenges and ideas for natural language processing and text mining. Computational linguistics itself combines findings from general linguistics, computer science, mathematics, and statistics, among other research fields. The language technology group at the University of Helsinki emphasizes the development of NLP methodology and technology in the humanities. Read news and posts related to NLP in the humanities.

Relevant Resources:

  • FIN-CLARIN: Integration of the Language Resources in Finland
  • Kielipankki: The Language Bank of Finland
  • Korp: A Web-Based Concordance Tool
  • LAT: Language Archive Tools
  • Natas: A Python library for processing historical English

Related Publications:

  • Publications by Krister Lindén
  • Publications by Yves Scherrer
  • Publications by Tommi Jauhiainen
  • Publications by Niko Partanen
  • Publications by Jack Rueter
  • Lindh-Knuutila, T., & Honkela, T. (2015). Exploratory analysis of semantic categories: comparing data-driven and human similarity judgments. Computational Cognitive Science, 1(1), 1-25.
  • Hardwick, S., Silfverberg, M., & Lindén, K. (2015). Extracting Semantic Frames using hfst-pmatch. In Proceedings of the 20th Nordic Conference of Computational Linguistics NODALIDA 2015.
  • Kettunen, K., Honkela, T., Lindén, K., Kauppinen, P., Pääkkönen, T., & Kervinen, J. (2014). Analyzing and improving the quality of a historical news collection using language technology and statistical machine learning methods. In IFLA World Library and Information Congress Proceedings 80th IFLA General Conference and Assembly.
  • Honkela, T., Korhonen, J., Lagus, K., & Saarinen, E. (2014). Five-dimensional sentiment analysis of corpora, documents and words. In Advances in Self-Organizing Maps and Learning Vector Quantization (pp. 209-218). Springer International Publishing.
  • Honkela, T., Raitio, J., Lagus, K., Nieminen, I. T., Honkela, N., & Pantzar, M. (2012). Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity. In The 2012 International Joint Conference on Neural Networks (IJCNN) (pp. 1-9). IEEE.