Enthusiasm about the new horizons – automatic data science adapted to European history and its everyday documents: newspapers

The NewsEye project is building an automatic Research Assistant based on artificial intelligence. Researchers will benefit from tools for searching and browsing huge amounts of text material. The kick-off meeting of the project was held in France in May.

About 30 computer scientists, digital humanities scholars and library professionals gathered in the historical city of La Rochelle (France) in late May to discuss the start of a Horizon project and new ways of finding answers to questions about European history.

NewsEye in a nutshell
  • Time: 1.5.2018 – 30.4.2021.
  • Participants: Computer science (University of Helsinki / Hannu Toivonen & group, Universities of Innsbruck, La Rochelle (coordinator) and Rostock); Digital Humanities (University of Helsinki / Mikko Tolonen & group, Universities of Innsbruck, Montpellier and Vienna); National libraries (Finland, Austria, France).
  • Budget: about 3 million euros, University of Helsinki’s share about 900 000 euros.
  • The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 770299.

The NewsEye project focuses on source material of interest to scholars in many research fields: digitised historical newspapers. The aim of the project is to develop methods and tools for exploring and exploiting this material effectively by means of new technologies and “big data” approaches. The historical phenomena and developments in focus are nationalism and revolutions, gender issues, media and journalism, and migration.

Artificial intelligence will be used to build an automatic Research Assistant. The Assistant is to find references and connections between topics in the newspapers and to explain them to the user of the service. The novelty of the NewsEye project lies in analysing historical materials with new methods across corpora in different languages. In addition, the texts will be automatically enriched by recognising names and sentiments.

The modern University of La Rochelle – the main building in the background – offered a lovely venue for the first meeting of the NewsEye project experts in May.

The project is coordinated by the University of La Rochelle. The university, which is very modern in the French context, and its host city offered a lovely venue for the first meeting. The research challenges, which are expected to result in significant scientific advances, were vividly discussed. The datasets to be used are the key to results.

The project aims to improve processes in text recognition, text analysis, natural language processing, computational creativity and natural language generation. Research will benefit from better Optical Character Recognition (OCR) / automatic text recognition, article separation, and useful tools for searching and browsing huge amounts of text material.

How does it work and who is it for?

Professor Hannu Toivonen highlights the automatic Research Assistant, which will take advantage of the tools created in the project, search for relevant results and suggest them to the user. It can also report its findings and document its own behaviour.

The old port of La Rochelle was a landmark on the way from the old centre to the campus area. Photo by Juha Rautiainen.

A user might, for example, want to look for information on the history of their family and give their own name as a search term. The Assistant will find out that the search term is indeed a family name, compare its context to that of other family names and report on the special connections of the given name.

For the national libraries, improvements in processing the historical data mean better usability of digitised resources and stronger cooperation with scholars from both computer science and digital humanities, comments Minna Kaukonen from the National Library of Finland. The Finnish materials in the project will cover all the Finnish newspapers published in 1771–1917 and digitised at the Library’s own unit in Mikkeli.

Join in – testers are being sought!

An important part of the project is dissemination and the practical testing of the text mining and analysis tools by scholars, students and citizens. The tools to be developed need users, and you will have the chance to become a tester in due course. The contact persons of the NewsEye project in Finland are: Hannu Toivonen (Computer Science), Juha Rautiainen and Minna Kaukonen (National Library of Finland), Jani Marjanen and Mikko Tolonen (Digital Humanities). Join the team and follow us: #NewsEye / @NewsEyeEU newseye.eu.

Minna Kaukonen
Minna Meriläinen-Tenhu
Juha Rautiainen

Kaukonen works as head of planning at the National Library of Finland. Meriläinen-Tenhu works as a science communicator in Communications and Community Relations at the University of Helsinki. Rautiainen works as an information systems specialist at the National Library of Finland.