So, after the workshops, DH2015 finally took off. These are my observations from the first conference day.
The first speech of the day was given by Jeffrey T. Schnapp. His keynote lecture was titled Small Data (the intimate lives of cultural objects). I usually tend to miss the key messages of keynote lectures, and this time was no exception, though this time I blame my flu more than the speaker. If I got it right to any extent, his key message to the audience was that the discourse over big and small data should be replaced with a big and small data orientation. After all, big data is built of small data, and the two are firmly linked to each other. This wasn’t a unique statement, but it was a very welcome one indeed. I remember Stanley Fish arguing almost the same thing as Schnapp back in 2012. There’s still a firm dichotomy in my home organization when it comes to big and small data, but times will change, I believe, as we become more experienced in this field. In the meanwhile, I keep my fingers crossed.
After the keynote lecture and the “coffee” break, the parallel sessions finally began. I had chosen to attend a session in which three different platforms for text alignment of medieval manuscripts were highlighted. Manuscript annotation, or HTR (Handwritten Text Recognition) in general, is becoming a hot topic in Finland too, and we need to face the question sooner or later, so I didn’t mind expanding my knowledge in this field.
Dominique Stutzmann (From Text and Image to Historical Resource: Text-Image Alignment for Digital Humanist) was the first to present their solutions, created in the Oriflamms project. They had managed to apply their approach to two medieval datasets: Graal, which contained 130 pages of manuscripts from the 13th century, and 104 pages from the Fontenay collection from the 12th and 13th centuries. Some results on word- and character-level alignment can be studied here.
Fouad Slimane (Text Line Detection and Transcription Alignment: a Case Study on the Statuti del Doge Tiepolo) from the École polytechnique fédérale de Lausanne presented a fully automatic system for the transcription alignment of historical documents and introduced the ‘Statuti del Doge Tiepolo’ dataset, which includes images as well as transcriptions from the 14th century written in Gothic script. Nice and neat work.
Another Swiss-based speaker, Hao Wei (DIVADIAWI – A Web-based Interface for Semi-automatic Labeling of Historical Document Images) from the University of Fribourg, presented their DivaDiaWI tool, which, at a glance, had plenty of benefits over the others. Their solution is a web application, so no installation is needed. That’s the approach I like. I could foresee this solution being used back in Finland too, though the everlasting question of overlapping polygons seems to remain unresolved. Maybe TILT is the solution here? Or maybe co-operation between the two Swiss projects? To me, two ponies doing the same trick seems like a waste of resources. Anyway, have a look at the video presentation for more detailed information on DivaDiaWI and other Diva services here.
The afternoon session was dedicated to languages. I expected to hear about revitalization, and I wasn’t wrong.
Nick Thieberger and Conal Tuohy from the University of Melbourne (Encoding Vocabularies of Australian Indigenous Languages) spoke about their encoding solutions in the Digital Daisy Bates project. The story is a good one. There’s a collection of native vocabulary, “collected” by Daisy Bates in 1904. The collection contains about 8.16 shelf metres of questionnaires, interviews and manuscripts in Aboriginal languages from a wide geographic area in Australia. Thieberger and Tuohy have used TEI to enable the geotagging of words in each (traceable) individual language. The outcome is a rather nice geomapping of words. Here’s an example of the word for emu across Aboriginal languages: http://bates.conaltuohy.com/word-maps/#emu
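To make the idea concrete, here is a minimal sketch of how geotagged TEI vocabulary entries could be queried to produce word maps like the emu example. The element names, attributes, and word forms below are illustrative assumptions on my part, not the Digital Daisy Bates project’s actual schema or data:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-style fragment. Element names, language
# codes, word forms, and coordinates are invented for illustration.
TEI_SAMPLE = """
<div>
  <entry xml:lang="lang-a">
    <form>weji</form>
    <gloss>emu</gloss>
    <location lat="-31.95" lon="115.86"/>
  </entry>
  <entry xml:lang="lang-b">
    <form>yankirri</form>
    <gloss>emu</gloss>
    <location lat="-20.0" lon="132.5"/>
  </entry>
</div>
"""

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

def words_for_gloss(xml_text, gloss):
    """Collect (language, form, lat, lon) for every entry matching a gloss."""
    root = ET.fromstring(xml_text)
    hits = []
    for entry in root.findall("entry"):
        if entry.findtext("gloss") == gloss:
            loc = entry.find("location")
            hits.append((entry.get(XML_NS + "lang"),
                         entry.findtext("form"),
                         float(loc.get("lat")),
                         float(loc.get("lon"))))
    return hits

# Each hit could then be plotted as a point on a map of Australia.
print(words_for_gloss(TEI_SAMPLE, "emu"))
```

The key point is that once each word carries both a language tag and coordinates, producing a map per gloss is just a filter-and-plot step.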
Have you ever been interested in historical named entities in Chinese? Well, then the paper by Chao-Lin Liu from National Chengchi University (Mining and Discovering Biographical Information in Difangzhi with a Language-Model-based Approach) could have been a top hit. He presented the results of expanding the contents of the China Biographical Database by text mining historical local gazetteers, difangzhi. The gazetteers are the single most important collection of names and offices covering the Song through Qing periods, says Liu. The goal of the database is to see how people are connected, through kinship, social connections, and the places and offices in which they served. This is quite a complex way of matching the elements together, but it seems they have succeeded in clearing the hurdles.
Cathy Bow from Charles Darwin University (Bringing to life the Living Archive of Aboriginal Languages) informed us about the Living Archive of Aboriginal Languages, a digital archive of endangered literature in Australian Indigenous languages from around the Northern Territory. There were once around 450 Aboriginal languages in Australia, but only 120 are spoken today, and of those, only 13 can be classified as strong. During the era of bilingual education in the Northern Territory (from the 1970s to 2000), many books were produced at 20 Literature Production Centres in more than 25 languages. These materials are both widely dispersed and endangered, and contain interesting and significant stories in Indigenous Australian languages. The project has tried to collect, digitize and publish this material online within the restrictions of copyright and so on. The collections are nicely organized by region on a map; click on a book to retrieve it as a PDF. Find additional information on the archive and its functions here. Language revitalization par excellence!
I intended to attend the poster session in the late afternoon, and eventually I did, but about half an hour before it began. No one was at the stands, and many questions were left unanswered. Around 4 pm, when the poster session took off, I had already found a couple of old friends in the auditorium foyer, where I spent the next two hours chatting about this and that, thus missing the poster session completely. Shame on me.
Tomorrow, I am contributing to this event by presenting a long paper. I will be discussing the concept of nichesourcing in our project’s context. The location is the EA Building, Room G.34, and the session will start at 11.00 am. See you there.