Notes on the Digital Humanities Hackathon 2016

You know it is nearly summer when the digital humanities hackathon appears on the calendar. This year it was organized on 16–20 May 2016 together with Aalto University and the University of Helsinki. The hackathon is the culmination of the digital humanities studies: a project work that brings together students from various fields to formulate a research question and explore possible answers with the available materials.

This year the material for the hackathon came from multiple sources: TV program metadata from the broadcasting company Yle, newspaper content and metadata from the National Library of Finland, EEBO-TCP transcribed early English books, and data from Helsinki, e.g. via Museum Finna, HRI and Helsinki GeoServer.

Data, team, go!

This year the groups also had some orientation days to prepare for the full week of intensive work. There were four groups, each targeting specific questions and data sets.

Team1 targeted the newspapers, focusing on emigration from Finland around the turn of the century (1870–1910). What was interesting here was the iterative way of generating training data and improving it gradually. Comparing actual emigration with the emigration discussion in the newspapers was also intriguing: the two flows followed each other, but the discussion started to decrease in the papers even though emigration was at its highest. See their slides for details.

Team2 was the Yle group, which investigated the metadata of TV programs. The content was classified by program type, and the result was then visualized. NER and structural topic modelling were considered as next steps. Their poster.

Team3 explored the changing Helsinki through the case of Länsi-Pasila, where the hard data of demographics was combined with the soft data of photographic evidence of the area. Both old and new street names were used to retrieve relevant photos of the area via the Finna API, and when combined these formed an interesting story of the re-creation of a neighbourhood. Presentation of the group.
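To give an idea of what searching photos by street name could look like, here is a minimal sketch that only builds the search URLs and does not call the service. It is not the team's code. The endpoint and the parameter names (lookfor, type, filter[], limit) are my assumptions about the public Finna search API, the function name is hypothetical, and the street names are placeholders to be replaced with real old and new names.

```python
from urllib.parse import urlencode

# Assumed base URL of the public Finna search API.
FINNA_SEARCH_URL = "https://api.finna.fi/v1/search"


def build_photo_query(street_name, limit=20):
    """Build a search URL for images matching a street name (hypothetical helper)."""
    params = {
        "lookfor": street_name,             # free-text search term
        "type": "AllFields",                # search in all fields
        "filter[]": ['format:"0/Image/"'],  # assumed filter value for images
        "limit": limit,                     # number of records per page
    }
    # doseq=True expands list values into repeated query parameters.
    return FINNA_SEARCH_URL + "?" + urlencode(params, doseq=True)


# Placeholders: replace with the actual old and new street names.
street_names = ["OLD_STREET_NAME", "NEW_STREET_NAME"]
urls = [build_photo_query(name) for name in street_names]

for url in urls:
    print(url)
```

Querying both the old and the new name of the same street and merging the results is what lets photos from before and after the rebuilding end up in the same set.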

Team4 analyzed the early modern English books: where they were published, and especially how the texts were recycled. There were a couple of pilot cases where, e.g. in the civil war case, the same texts appeared on both sides. There was a novel solution for finding overlapping regions, by ‘straightening’ and cleaning the phrases in a unique way. Presentation of the group.
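The team's actual method is described in their presentation, not here, so the following is only a generic sketch of the idea of cleaning phrases before looking for overlap. It assumes that the cleaning means lowercasing, replacing the long s and dropping punctuation, and that overlap means shared runs of four words. The function names and both sample sentences are made up for illustration.

```python
import re


def normalize(text):
    """Lowercase, replace the long s, drop non-letters and split into words."""
    text = text.lower().replace("ſ", "s")
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def shingles(tokens, n=4):
    """Return the set of all runs of n consecutive words."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_phrases(text_a, text_b, n=4):
    """Return the n-word runs that occur in both texts after cleaning."""
    return shingles(normalize(text_a), n) & shingles(normalize(text_b), n)


# Invented sample sentences, not taken from EEBO-TCP.
text_a = "The King, and his Parliament, ſhall defend the true Religion."
text_b = "they say the king and his parliament shall defend it"

overlap = shared_phrases(text_a, text_b)
for phrase in sorted(overlap):
    print(" ".join(phrase))
```

Without the cleaning step the two sentences would share no four-word run at all, because of the capitals, the commas and the long s. That is why normalizing the phrases first matters when looking for recycled text.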

Methods in use

Named entity recognition, topic modelling and network analysis were readily available in the participants' toolbox when analyzing the materials. In addition, all the code is available in one location on GitHub, which gives a good starting point for continuing from the existing work. Having the code openly available makes it possible to experiment with it and see whether it could be applied to one's own questions and data sets.


