DH2015 is taking place this week in Sydney, Australia. The Digitization Project of Kindred Languages is represented here, as I have been given the opportunity to present a long paper on nichesourcing of Uralic languages later this week. Yesterday and today I attended the pre-conference workshops. This is a brief summary of my experiences in three of them.
On Monday morning, I participated in the Search for Needles in DH Haystacks workshop, which was led by Andreas Witt from the Institute of German Language. As usual, my approach to conferences and workshops is quite simple, and I tend to bear in mind one broad guideline: how could our project (or other functions at the National Library of Finland) benefit from this?
Witt’s workshop was split into two parts. The first focused on XQuery, which was demonstrated with the help of the oXygen XML Editor. Although we didn’t have much time to play around with the editor, I could foresee that the tool could support us in creating cross-checked wordlists in various Uralic languages. The core problem in our case is that we have sets of both enhanced and uncorrected words, which presumably contain the same mistakes within each language. Since we don’t have a “find and replace” function available in our own XML editor (Revizor), this tool could ease the pain of locating the repetitive mistakes in the data.
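A minimal sketch of such a cross-check, written in Python rather than XQuery and not taken from the workshop, might look like the following. It assumes that the uncorrected and enhanced versions of a text are available as token lists that are aligned position by position; the function name `recurring_corrections` and the sample tokens are hypothetical placeholders.

```python
from collections import Counter


def recurring_corrections(uncorrected, enhanced, min_count=2):
    """Count (wrong, right) token pairs that recur between two aligned token lists.

    Assumption: both lists have the same length and are aligned position by position.
    """
    if len(uncorrected) != len(enhanced):
        raise ValueError("token lists must be aligned and of equal length")
    pairs = Counter(
        (raw, fixed) for raw, fixed in zip(uncorrected, enhanced) if raw != fixed
    )
    # Keep only corrections seen at least min_count times, most frequent first.
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]


# Placeholder tokens invented for illustration (capital I misread for lowercase l).
raw_tokens = ["aaa", "bIb", "aaa", "bIb", "cOc", "bIb"]
fixed_tokens = ["aaa", "blb", "aaa", "blb", "coc", "blb"]
result = recurring_corrections(raw_tokens, fixed_tokens)
print(result)
```

The resulting list of frequent corrections could then serve as a starting point for a per-language list of mistakes to search for in the not-yet-corrected material.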
The second leg of the workshop was dedicated to the linguistic work done at the Institute of German Language. Andreas Witt presented the KorAP project, which aims to develop an innovative corpus analysis platform to tackle the increasing demands of modern linguistic research. The KorAP platform will facilitate new linguistic findings by making it possible to manage and analyse primary data and annotations in the petabyte range, while at the same time allowing an undistorted view of the primary linguistic data, thus fully satisfying the demands of a scientific tool. Have a look at the tutorial here.
The afternoon workshop, Visualizing data for Digital Humanities, included three brief presentations followed by some tutorials. Pablo Ruiz gave a talk about entity linking (presentation here). Steven Gray demonstrated the possibilities of the Big Data Toolkit by linking it with real-time online data; his slides can be retrieved here. Glenn Roe addressed the identification and visualization of text reuse in unstructured corpora. Even though the first two presentations made a great contribution to the workshop, my mind was set on text reuse.
In his briefing and tutorial session, Glenn Roe showcased ViTA (Visualization for Text Alignment), a web-based tool for exploring and identifying shared passages between two sample texts. The tool has been developed for the project Commonplace Cultures: Mining Shared Passages in the 18th Century using Sequence Alignment and Visual Analytics. The project explored the paradigm shift in 18th-century culture from the perspective of commonplaces and their textual and historical deployment in the contexts of collecting, reading, writing, classifying, and learning.
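The basic idea of spotting shared passages between two texts can be illustrated with a very rough Python sketch. This is not how ViTA works (the project name points to sequence alignment); it is a much simpler word n-gram overlap, and the function names and sample sentences are hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of all word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(text_a, text_b, n=3):
    """Return the word n-grams that occur in both texts.

    Texts are lowercased and split on whitespace; no other normalization is done.
    """
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    return ngrams(tokens_a, n) & ngrams(tokens_b, n)


# Sample sentences invented for illustration.
a = "the quick brown fox jumps over the lazy dog"
b = "a quick brown fox sleeps near the lazy dog"
shared = shared_ngrams(a, b, n=3)
print(sorted(shared))
```

Exact n-gram matching only finds passages that are reused word for word within one language; variant spellings or translated parallel texts would need some form of fuzzy matching or alignment.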
We also had a chance to try ViTA. Regardless of my weak attempt to analyse one chapter in two different languages (the same textbook and sequence in Komi-Permyak and Hill Mari), I would predict that this tool could be useful for those who are interested in locating Russified words, dialectal forms, anomalies etc. in parallel texts, since we have digitized quite a lot of parallel texts (mainly translations of Russian textbooks into Uralic languages) in our project. In my view, a slight modification of the tool is needed, since it supports only certain dictionaries. I have to play around with this application a bit more before making a final judgment, but it sounds promising to me.
On Tuesday morning I attended the HuNI: Building and Linking Research Collections Online workshop. HuNI (/ˈhʌni/), i.e. Humanities Networked Infrastructure, combines data from many Australian cultural websites into the biggest humanities and creative arts database ever assembled in Australia. This sounds a bit like our Finna to me, but it actually offers functions that are missing from Finna for the time being.
HuNI users may create their own collections out of the aggregated data, make connections and create socially linked data, save and share their data and findings, and curate and import their own data. But what makes HuNI superior to Finna is that HuNI users may annotate the data, express the relationships they see in the data, and define multiple relationships between the same entities, even if those relationships are contradictory. One is able to create relations (links) between the records. The linked data in a collection are also visualized rather nicely (hit the head of Cadel Evans to open up the chart and drag the mouse on the screen to).
Also have a look at the HuNI video for additional information.
That’s all for the DH2015 workshops; the DH2015 conference begins tomorrow.