#dhh19: The day of truth

“You ask, what is our aim? I can answer in one word: Victory.” (Winston Churchill)

00:29 - Poster completed. This very productive last hackathon day starts with us sending our sophisticated poster to the organizers, after hours of rearranging pictures and discussing content.

comparing printed & digital poster size

09:15 - Hot preparation phase. Right on time, the group gets together at 9:15 in the morning to work out the missing steps of our battle plan. The final presentation is completed by 13:00, and we even have time for a rehearsal.

13:20 - We’re live (streaming)! The moment of truth is approaching. We listen to the presentations of the Brexit and Parliament groups and do some digital humanities yoga exercises. Then our time has come.

15:00 - Finally presenting. All the blood, sweat and tears of the past ten days come together in this one final presentation. The whole team stands in front of the DHH19 audience, ready to share our findings with the rest of the world. Not only have we found answers to our research questions, we have also opened up a new path for identifying genres based on named entities. Plus, we received valuable feedback that can inform our further work.

16:15 - Wine and poster session. The Newspapers group has finished its presentation as well, and we move on to the poster session. Wine and snacks create a pleasant atmosphere for open discussion of results, questions and philosophical concepts, both with external visitors and among participants.

#teamspirit @ poster

18:00 - Socializing. In the company of countless thirsty scholars, we recap this last day at the “Thirsty Scholar”. Later on, some very persistent members of the #genreandstyle group experience authentic Finnish karaoke, joining in on classic Finnish songs with complex character combinations.

Conclusion. In the end, the Digital Humanities Hackathon 2019 meant a lot more to us than “just another hackathon”. We received all the essential resources for our project (data, rooms, technical equipment, coffee (!) and experts), and it was up to us to use them wisely. From this perspective, the Digital Humanities Hackathon was also part of an extensive learning process: about the other group members as well as ourselves, and about how to communicate and coordinate in such an interdisciplinary and diverse environment.

Therefore, we would like to say “thank you” to the organizers, sponsors, participants and everyone else who was involved in making this event possible. We hope to see you again someday (DHH20?) and look forward to exchanging fond memories.

This blog post was written by Sophie Schneider (@BibWiss), who graduated with a B.A. in library science from Potsdam University of Applied Sciences. She intends to enroll in a master’s programme in information science in the upcoming semester.

The Genre and Style Datastorm

Some of the graphs generated from our data

Prepare your cyber-umbrellas, it’s raining data on the 7th day of the hackathon!

After letting our algorithms process the data all night, we managed to extract 110k named entities from our three subsets (History, Religion, Social Sciences).

Those entities had to prove their worth to be included in our final data. Their first trial was matching against the DBpedia database to make sure that our name-dropped people had some kind of corresponding ID, like a passport check at the airport (sorry, “the coffee man”, you didn’t make it past this trial).

The second trial consisted of extracting birthdates via the entities’ DBpedia URIs, to see whether they were old enough to actually be written about in the 18th century, like checking people’s IDs before letting them into a disco (sorry, Obama, only the over-200s get in this time!).
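The two trials can be sketched in a few lines of Python. This is only an illustration under assumptions, not the group’s actual pipeline: the `CANDIDATES` data is made up, each entity is assumed to arrive with its birth date already looked up from DBpedia as a `YYYY-MM-DD` string (or `None` if there was no match), and `filter_entities` with a `cutoff_year` of 1800 is a hypothetical helper.

```python
# Hypothetical sample: entity string -> birth date from its DBpedia record,
# or None when the entity could not be matched to DBpedia at all.
CANDIDATES = {
    "Jonathan Swift": "1667-11-30",
    "Charles II": "1630-05-29",
    "Barack Obama": "1961-08-04",
    "the coffee man": None,
}


def birth_year(iso_date):
    """Return the year of a 'YYYY-MM-DD' string, or None if there is no date."""
    if not iso_date:
        return None
    return int(iso_date.split("-")[0])


def filter_entities(candidates, cutoff_year=1800):
    """Keep entities that matched DBpedia (trial 1) and were born before cutoff_year (trial 2)."""
    kept = {}
    for name, iso_date in candidates.items():
        year = birth_year(iso_date)
        if year is None:
            continue  # trial 1 failed: no DBpedia correspondence
        if year >= cutoff_year:
            continue  # trial 2 failed: too young to appear in 18th-century texts
        kept[name] = year
    return kept


print(filter_entities(CANDIDATES))  # {'Jonathan Swift': 1667, 'Charles II': 1630}
```

Real DBpedia dates can be missing, partial or malformed, so a production version would need more defensive parsing than this sketch.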

The 5k names that survived thought this extremely brutal selection was over but, guess what, it wasn’t. The remaining entities were then presented to the final, indisputable, strict committee of Veera, Selina and Annika (i.e. the name-dropping police). Jonathan Swift’s lawyers have already contacted our team: his popularity in the corpus dropped a lot after this qualitative analysis. I guess we’ll see ya in court, Jo!

Once the struggle to filter the data was over, we started creating new networks, graphs, statistics, word clouds and (of course) historical gossip across our subsets, clustered by time period.

Inspired by the subject of our project, we think it’s important to name-drop too:

This work has been made possible by funding from FIN-CLARIN, DARIAH, HIIT, and the Faculty of Arts at the University of Helsinki.


You can close your umbrellas, my fellow readers: the storm is over.

Now it’s time to fit our results into the final poster, which will be revealed tomorrow.

Just like after every storm, we have data-rainbows too 🙂

Word cloud of words correlated with Charles II, one of the most popular entities in the Social Sciences subgenre

This blog post was written by Bruno Sartini, a second-year master’s student in Digital Humanities and Digital Knowledge at the University of Bologna, Italy.

Guess what day it is?

We are now more than halfway through the hackathon. Hump day is marked by an interim presentation on each group’s progress in the afternoon.

In the morning, we extracted subsets of the data containing keywords and used collaborative data verification technology (i.e. the group eyeballing a Google Sheet) to check the results.

The keyword of the day is “keywords”. We extracted keywords for each title, then used them to discover patterns for topic clustering.
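The post does not say which keyword method was used. As one simple possibility, assumed here purely for illustration, the keywords of a title can be taken to be its non-stop-words, which are then counted across titles and used to pull out subsets. The `STOP_WORDS` list, the sample `TITLES` and the `title_keywords` helper are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "upon", "with"}


def title_keywords(title):
    """Lower-case a title, split it into alphabetic words, drop stop words and very short tokens."""
    words = re.findall(r"[a-z]+", title.lower())
    return [w for w in words if w not in STOP_WORDS and len(w) > 2]


# Made-up 18th-century-style titles for illustration.
TITLES = [
    "The history of the reign of King Charles the Second",
    "A sermon preached on the anniversary of King Charles's restoration",
    "An essay on the history of civil society",
]

# Count how often each keyword occurs across all titles.
keyword_counts = Counter(kw for title in TITLES for kw in title_keywords(title))

# Titles sharing a keyword form a candidate subset for closer inspection.
history_titles = [t for t in TITLES if "history" in title_keywords(t)]

print(keyword_counts["history"], keyword_counts["charles"])  # 2 2
```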

The level of intrigue featured in 18th-century history circles #spoilers

In the latter half of the day, we also got the top named entities mentioned in publications, decade by decade. The idea is that books of a similar nature would refer to a similar set of entities. Some colourful and fabulous graphs for the topics are coming up in the very near future.
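Counting the most frequent entities per decade is a straightforward aggregation. A minimal sketch, assuming each title has already been reduced to a publication year and a list of recognised person entities (the `RECORDS` below are invented and `top_entities_by_decade` is a hypothetical helper):

```python
from collections import Counter, defaultdict

# Invented records: (publication year, person entities recognised in that title).
RECORDS = [
    (1712, ["Charles II", "Cromwell", "Charles II"]),
    (1718, ["Charles II", "Cromwell", "Luther"]),
    (1765, ["Luther", "Calvin"]),
    (1769, ["Luther"]),
]


def top_entities_by_decade(records, n=2):
    """Count entity mentions per decade and return the n most frequent for each decade."""
    counts = defaultdict(Counter)
    for year, entities in records:
        decade = year // 10 * 10  # e.g. 1718 -> 1710
        counts[decade].update(entities)
    return {decade: counter.most_common(n) for decade, counter in sorted(counts.items())}


print(top_entities_by_decade(RECORDS))
# {1710: [('Charles II', 3), ('Cromwell', 2)], 1760: [('Luther', 2), ('Calvin', 1)]}
```

Comparing such per-decade lists across subsets is one way to check whether books of a similar nature really do refer to a similar set of entities.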

Presenting some initial results, with impressive visual effects

At 4pm, it was presentation time. Each group gave an update on its progress and presented intermediate results. Saving the best for last, our very animated presentation brought the day to a lively close.

Well, the working day, that is. At 6pm, as is the tradition at every edition of the hackathon, we all headed to the marvelous main building of the university to wine and dine in the archive where collections of theses on Fennistics are enshrined.

Social evening at the Morphological Archive in the main building of the university at the Senate Square

Afterwards, also by tradition, the socialising continued at the aptly named “Thirsty Scholar”, with its beer garden and a view of the Helsinki Dome.


This blog post was written by David Rosson, a soon-to-be graduate of a double master’s program in Human-Computer Interaction at TU Berlin and Aalto University.

The End is Nigh, It’s Day Five!

Another day of hackathoning is nearing its end, and the Genre and Style group (tweeting @GenreAndStyle) is operating at full speed. Our NER (named-entity recognition) approach is providing us with data to analyze as well as a great many choices to make. Today we’ve been looking at the names recognised in our subsets of the ECCO data. We are using external databases (mainly DBpedia) to verify names from the list of entities, which includes a rather large variety of strings; it is 18th-century text OCR’d from various microfilm copies, after all. The goal for today was to create preliminary networks to look at, and that was accomplished in the afternoon. Tomorrow, among other things, we will try to incorporate another subset of data to compare the NER profiles of the datasets.

From the perspective of a DH novice, the week has been wonderful and bewildering. Many aspects of the research we are doing seem enticing, but the time limits and intensity of the hackathon largely rule out immersing oneself in a technical area one is not acquainted with. However, the discussions and planning sessions (as well as the input of friendly visiting Hackathon researchers) present a wonderful opportunity to bring together research interests, types of data and humanities substance on the one hand, and computational and digital methods on the other. Furthermore, the interesting task of visualizing and communicating the project’s crucial message and findings is still ahead of us (visualization in particular being an aspect of data-science-ish research I am personally not well versed in). The Hackathon is certainly an experience I would recommend to all driven humanists.

18th-century writers could teach pseudonymity-seeking social media aspirants a thing or two

Hopefully we will be able to really immerse ourselves in the results of our entity linking efforts before the week is over. After all, inspecting our object of study, historical texts, in depth and in all its interactions is what many of us aim for.

Yours Sincerely,
Gentleman Who Has Made Reading His Diversion Upwards Of Twenty-eight Years

This blog post was written by Aleksi Jalavala, an MA student at the University of Helsinki who is pushing himself into new territory at the Hackathon, as well as in his Digital Humanities studies in general.

Let’s get in formation

View from the Uni cafeteria. Not bad, one might say.

And so it’s a wrap on day 4 of Digital Humanities Hackathon 2019. After Friday’s presentation, in which we introduced our preliminary research plan and the research question itself, it was clear that we needed to do some serious reformulating. Today we have continued to work hard on fine-tuning the research question and the actual research.

As mentioned in Friday’s post, the Genre and Style group decided to take a new path with regard to our focus and our approach. Our research stems from a newly found interest in exploring, firstly, which named entities can be found in 18th-century publications and, secondly, whether these texts can be classified by genre on the basis of patterns formed by clusters of named entities. At the moment, our primary focus is on named entities that are proper nouns referring to people. Further details and updates on this can be expected on the blog as the group’s work proceeds.


The group members doing their thing.

In addition to this, the day consisted mainly of independent work; the computer scientists among us were busy with, for example, compiling sample sets of the data retrieved from ECCO so that the applicability and accuracy of the named entity recognition (NER) algorithm can be examined. More on that will probably follow in tomorrow’s post. While waiting to get our hands on the quantitative results in the making, others did various preparatory work, such as a literature review of articles discussing genres in early modern English texts, as well as NER and text mining. It feels good to have this show on the road!
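Examining NER accuracy on a sample set usually means comparing the recogniser’s output with a hand-checked version and computing precision and recall. The sketch below is an assumption about how such a check could look, not the group’s actual procedure: it compares sets of entity strings (ignoring where and how often they occur), `evaluate_ner` is a hypothetical helper, and both lists are invented.

```python
def evaluate_ner(predicted, gold):
    """Precision, recall and F1 of predicted entity strings against a hand-checked gold sample."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: what the recogniser found vs. what a human annotator marked.
predicted = ["Charles II", "Cromwell", "the coffee man", "Luther"]
gold = ["Charles II", "Cromwell", "Luther", "Calvin", "Jonathan Swift"]

p, r, f = evaluate_ner(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.75 recall=0.60 f1=0.67
```

Set-based matching is the simplest option; with noisy OCR, a looser comparison (for example, normalised spellings) would likely give a fairer picture.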

The day shall end with popcorn and a group viewing of the season finale of Game of Thrones in the ComHis group’s office. So as you can understand, I gotta go now.

This post is written by Annika Pensola, a first-year student in the Master’s Programme in English Studies, Faculty of Arts, University of Helsinki. She is currently feeling inspired by the international and multidisciplinary composition of the hackathon and excited about its results.

DH Helsinki Hackathon is a rollercoaster

A quick update on the Genre and Style group at DHH19. Yesterday’s discussion sessions ran late, so this post arrives with a small delay.

Day 3 of the hackathon (or day 2 of the blog) was a great example of why these gatherings can lead to unexpected results: we started the day with one question in mind, but along the way decided on something quite different. In our group, we are trying to use digital humanities tools to study variation in genre and style in digitized 18th-century English texts.

On Friday, we had to come up with a clearly formulated research question and a research plan based on it. A well-formulated plan would naturally keep us focused and help us do solid and reasonable work during the week.

Our task for Friday

Of course, this is not how our team works! 🙂 We arrived diligently at 9.15 AM to study genre in relation to gender. However, by 10.15 we had found out that only a very small proportion of the texts available to us (less than 5%) was written by women, which would make comparisons with other texts quite difficult.

Additionally, we have been made aware that the metadata on the texts is pretty simple and unreliable, and the OCR quality of the texts makes it difficult to do many types of comparisons (although many analyses work surprisingly well even for bad quality texts).

So we started thinking once again about how we could compare these texts and how we could get at the genre within them. By now we already knew each other a bit, what we were interested in and what we could do, so we could discuss an interesting topic to study more comfortably.

So, by 11.15 we came up with a different topic: what if we could look at 18th-century texts as a kind of network, with similar texts connected to each other? Texts on the same topics would mention the same people and could be discovered this way. We could then talk about genres independently of the metadata, which was of uncertain quality and quite unevenly marked up.
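The idea of connecting texts through the people they mention can be sketched as a small graph. Assumptions, for illustration only: each text is represented by the set of person entities found in it (`TEXT_ENTITIES` is invented), similarity is measured with the Jaccard index (shared entities divided by all entities of the pair), and the `threshold` of 0.3 is arbitrary. Groups of connected texts then act as candidate genres.

```python
from itertools import combinations

# Invented input: text ID -> set of person entities recognised in that text.
TEXT_ENTITIES = {
    "history_a": {"Charles II", "Cromwell", "Monmouth"},
    "history_b": {"Charles II", "Cromwell"},
    "sermon_a": {"Luther", "Calvin"},
    "sermon_b": {"Luther", "Calvin", "Cromwell"},
}


def jaccard(a, b):
    """Share of entities two texts have in common: 0 = none, 1 = identical sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_text_network(text_entities, threshold=0.3):
    """Return weighted edges between texts whose entity overlap reaches the threshold."""
    edges = {}
    for t1, t2 in combinations(sorted(text_entities), 2):
        weight = jaccard(text_entities[t1], text_entities[t2])
        if weight >= threshold:
            edges[(t1, t2)] = round(weight, 2)
    return edges


def connected_groups(nodes, edges):
    """Split the network into connected components, i.e. candidate genre clusters."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first traversal from start
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


edges = build_text_network(TEXT_ENTITIES)
print(edges)  # {('history_a', 'history_b'): 0.67, ('sermon_a', 'sermon_b'): 0.67}
print(connected_groups(TEXT_ENTITIES, edges))  # [['history_a', 'history_b'], ['sermon_a', 'sermon_b']]
```

Normalising by the union rather than using raw counts of shared names keeps long texts, which simply mention more people, from linking to everything.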

Named Entity Recognition finds entities from text through various heuristics and algorithms.

By 12.15 we were testing named-entity recognizers on the old texts, and by 13.15 we had almost forgotten to eat lunch because of it. Eventually we came up with a plan that seemed to make sense from a humanities perspective, seemed technologically feasible, and, most importantly, seemed within reach of our group and interesting enough to try.

So, by 14.15 we had come up with a research strategy and run initial tests, and by 15.15 we had our slides ready for a presentation. We got some tough (but necessary) questions from the instructors and the audience, and found a good way forward. Now, unless basic steps of the plan fail, we will each have something interesting to do with the data.

Our research question.

We planned to have a working meeting after the presentations at 16.15, but once we stepped outside, even briefly (it was 18 degrees of warmth, i.e. it felt like 30!), it turned into a meeting in the park, which gradually moved into more informal discussions (see illustration below).

The overtime working team avoiding being captured on the photo.

So, within just a few hours, we explored a lot of data, came up with another research plan, formulated it, and made plans for next week. Now that we have got to know each other a bit, discussing plans and topics has become easier and easier, and it looks like we should get some very interesting results!

So, turbulent times in the hackathon. Catch up with us here next week* with more info!

Kirnu roller coaster in Linnanmäki amusement park in Helsinki

* Technically, even this post is cheating, since we have been given strict instructions not to do any work during the weekend. However, as we prepare our minds for the week ahead, it’s good to give a quick overview of where we got to.

This blog post was written by Peeter Tinits, a final-year PhD student at Tallinn University and a digital humanities grunt at the University of Tartu, in Estonia. He is attending the hackathon from abroad for the learnings and the funs.

This blog post is also published on Medium (19/5/2019): https://medium.com/@GenreAndStyle

Digital Humanities Hackathon 2019 kicked off!

The Helsinki Digital Humanities Hackathon, or DHH, is celebrating its fifth year. The week-and-a-half-long hackathon brings together researchers and students from computer science, data science, the humanities and social science to work on an interdisciplinary research project. This year DHH welcomed international participants to take part in the event!

In the series of blog posts coming this week from our Genre and Style in Early Modern Publications group, we will recap our experience in the hackathon (including all triumphs and tribulations) with short introductions of ourselves, so stay tuned!

The event started on Wednesday 15th of May with general introductions and division into groups. The day began with brainstorming and formulating a research question while getting to know each other on the side. Today, on the second day of the hackathon, we continued our brainstorming while getting familiar with our eighteenth-century publications data. At this point we don’t have a specific research question to present to you yet, but we have been spitballing ideas about genre and gender, e.g. comparing the writing styles of female and male authors and possibly creating some kind of classifier that predicts the gender of a publication’s author.


Our day started with a little tea-on-laptop incident, but the laptop survived!

We continued by extracting the data from the ECCO database through the Octavo API. The server crashed a few times, but that certainly did not discourage us, and we continue to work hard on this. Next, we are planning to create some simple statistics on the data in order to explore it more efficiently.

At the end of the day, we were treated to an interesting lecture on 4D Modeling and SBIM by Anthony Caldwell from HumTech UCLA.

This is our first hackathon and we are looking forward to the coming days with excitement!

This blog post was written by Selina Lehtoranta, a first-year Data Science master’s student at the University of Helsinki, Faculty of Science, and Veera Oksala, a first-year student in the Master’s Programme in English Studies, Faculty of Arts, University of Helsinki.

This blog post was published first on Medium (17/5/2019): https://medium.com/@GenreAndStyle