ScanTent – quick review (part I)

Posted on 13.3.2020 by TuulaP

We’ve had a ScanTent available for evaluation and experimentation in the DH projects for some time. We wanted to see a) what ScanTent could do (or facilitate), b) how well it could support researchers in digitizing different kinds of materials, c) what level of mobile camera quality would be sufficient, and d) how the process would continue after the ScanTent step.

ScanTent in its pouch

Introduction – What is ScanTent?

As the name suggests, ScanTent is a tiny tabletop ‘tent’ (a small cloth-covered area). You place a mobile phone or a light-weight camera on top of it to take pictures of the material inside the tent.

Unboxing

As we got one of the early prototypes from the first manufacturing batch, the package contained a pouch for the ScanTent, the base fabric (black) and cover cloth (white), the support ‘sticks’ and a battery for the LED lights. It is quite a small and light-weight package that you can easily carry with you.

Setup

There was no leaflet or setup document in the package, but the setup was quite self-evident (with some learning on the fly). The sticks go into the corners of the cloth to hold it in the proper position, and then into the camera ‘plate’ at the top. Personally, I believe the setup is a two-person job, as extra hands are needed to keep the earlier sticks in place while assembling the rest, but in any case it can be done in a few minutes.

It can be done by one person, but that caused some noise in the office when sticks dropped while I was trying to fix the others to the top part. With practice it gets easier every time.

After the tent is set up, you can plug the LED lights into the external battery included in the package(*) and start the actual scanning. The LED lights are quite light-weight, so make sure that all of them point downwards and are not tilted upwards, which can easily happen after the assembly.

ScanTent assembled, battery pack within the tent(*)

Quick Experiments of Scanning Different Kinds of Materials

Ephemera and leaflets

The first items we tried were ephemera, which had been collected locally during the elections and were easily available. The concerns with them were glossiness, i.e. reflections appeared easily, and the varying background colors and fonts.

Books

The second use case was books. What if there is an exam coming, the study book is the only copy in the library and the loan time is running out, what do you do? Or in short: could one scan a book with the help of ScanTent? This was interesting in the sense that it gave an opportunity to experiment with the “multishot” or image series that DocScan provides. One thing we noticed was that with any bigger book, or one with a tight binding, the challenge was keeping the book open (there was no support for that). A second thing was making sure that the image stays straight while turning pages. The phone sits on a flat surface on top of the tent with no ‘holds’, so it can very easily tilt or move a bit out of place.

Papers

Single-sheet papers were in a way the easiest case: just put the paper inside the ScanTent and take the image. But a normal scanner of any kind can do the same, so it depends on what you have available.

An open book within ScanTent

The support sticks are quite thin, so one would have to think twice before putting an expensive system camera on top. There is no cradle for the camera; this version has only a flat surface, and I accidentally bumped even the mobile phone off its intended location a couple of times. Luckily the phone only dropped from the top of the tent to the table….

After the image scan

In fact, the real magic happens in the phase after scanning. You can use ScanTent purely as a ‘stand’ for a mobile phone and be well off. There are already apps that can take the image and run OCR on it on the fly.

With ScanTent, the image-to-data phase was done with the DocScan application, which is available for Android and Apple devices. In short: you create an account on the Transkribus platform, log in to that account via the DocScan app, and then choose to send the images to the Transkribus server for further processing.

Phone screen on top of the ScanTent

But more about that in the following post….

P.S. (*) When taking images for this post I also realized that I had made a couple of errors in the second assembly, but this is one of the things one learns with more usage. When assembling ScanTent for the first time, I had to rely on the ScanTent videos online, which gave hints on how to do things correctly.

Heldig Summit 2019

Posted on 19.11.2019 by TuulaP

Heldig Digital Humanities Summit 2019 was held on 7.11.2019. See here for the program and slides: https://www.helsinki.fi/en/helsinki-centre-for-digital-humanities/heldig-digital-humanities-summit-2019 . The day brought together both research infrastructure people and researchers, so it was a good place for both parties to learn about novelties on either side.

A big cat on the left and small cats in a row

Tie vapauteen, 01.09.1922, no. 9, p. 9
https://digi.kansalliskirjasto.fi/aikakausi/binding/1360361/articles/3278605?page=9
National Library of Finland, Digital Collections

The keynotes expanded our thinking and covered the latest developments. Dr Marieke van Erp presented research and work done with Dutch newspapers. For example, some interesting work had been done in analysing food recipes in old newspapers and what they tell about food culture. With this example, the quality of digitisation and text recognition errors were also mentioned as things that the researcher should be aware of.

There was also interesting work done with fiction books: a network analysis of the characters of a book, where the ‘ mark in a name can confuse the algorithm and even skew the results.

Time Machine – Now!

Tomi Ahoranta from the National Archives of Finland presented the Time Machine project. He explained that the National Library and the National Museum have also joined the Finnish Time Machine planning team. The Time Machine concept is now being fine-tuned based on a recent user survey, and it is still going strong towards the next EU funding period. The question is whether this is our last chance to get digitisation levels up, as developing the techniques and doing the work itself will take time.

P.S. We are hiring, so check out the job posting here: https://www.helsinki.fi/fi/avoimet-tyopaikat/suunnittelija

Names, names, names

Posted on 30.8.2019 by TuulaP

One goal of the National Library’s subproject in Digitalia has been to study how names could be found in large newspaper collections. Names here mean personal names, place names and organisation names. Names are interesting because most searches target names, and with them you can find stories about your own relatives, notable figures of their time, or local history. As a result of the research, a new experimental tool, ‘Nimiapuri’, has made it all the way to Digi, and you can try it at https://digi.kansalliskirjasto.fi/name-search

For the names, a method was developed for locating names in an ALTO XML file and marking them there as an additional piece of information alongside certain words or word combinations. You can read more about the method in the research article:

Ruokolainen, Teemu Petteri; Kettunen, Kimmo Tapio. À la recherche du nom perdu – Searching for Named Entities with Stanford NER in a Finnish Historical Newspaper and Journal Collection. Presented at: IAPR International Workshop on Document Analysis Systems, Vienna, Austria. 2 pages.
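The idea of marking recognised names alongside words in an ALTO XML file can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the method of the article: it assumes the names have already been recognised (the `entities` list is hypothetical input), it uses the `NamedEntityTag` element and the `TAGREFS` attribute that the ALTO schema offers for this purpose, and the sample page and the helper `tag_names` are made up for the example.

```python
import xml.etree.ElementTree as ET

# ALTO v3 namespace; other ALTO versions use a different namespace URI.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v3#"
NS = {"a": ALTO_NS}

# A made-up, heavily simplified ALTO page (real files carry coordinates etc.).
SAMPLE = f"""<alto xmlns="{ALTO_NS}">
  <Tags/>
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String ID="s1" CONTENT="Runoilija"/>
    <String ID="s2" CONTENT="Johan"/>
    <String ID="s3" CONTENT="Runeberg"/>
    <String ID="s4" CONTENT="Porvoossa"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def tag_names(alto_xml, entities):
    """Mark each recognised name on the String elements that spell it out.

    entities: list of (tokens, entity_type) pairs, for example
    (["Johan", "Runeberg"], "PERSON"). Hypothetical input format.
    """
    ET.register_namespace("", ALTO_NS)  # keep ALTO as the default namespace
    root = ET.fromstring(alto_xml)
    tags = root.find("a:Tags", NS)
    if tags is None:
        raise ValueError("this sketch assumes a <Tags> element is present")

    strings = root.findall(".//a:String", NS)
    words = [s.get("CONTENT") for s in strings]

    for n, (tokens, entity_type) in enumerate(entities, start=1):
        tag_id = f"NE{n}"
        found = False
        # Slide a window over the words and look for the exact token sequence.
        for i in range(len(words) - len(tokens) + 1):
            if words[i:i + len(tokens)] == tokens:
                for s in strings[i:i + len(tokens)]:
                    s.set("TAGREFS", tag_id)
                found = True
        if found:
            # Attribute usage (LABEL = name, TYPE = entity type) is an assumption.
            ET.SubElement(
                tags,
                f"{{{ALTO_NS}}}NamedEntityTag",
                {"ID": tag_id, "LABEL": " ".join(tokens), "TYPE": entity_type},
            )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(tag_names(SAMPLE, [(["Johan", "Runeberg"], "PERSON")]))
```

Running the script prints the sample page with a `NamedEntityTag` added under `Tags` and a `TAGREFS` attribute on the two words of the name. A search index could then treat the tagged words as one name rather than as separate words.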

Nimiapuri, Name Acolyte!

To improve name search, we have built a helper tool for Digi called ‘Nimiapuri’, which you can reach from the search. Nimiapuri shows the results of the name recognition run on Uusi Suometar, and from them you can pick the name forms that are most useful for your own purposes. For example, looking up Runeberg in the name search can produce a search set like the one in the example query. In a way, you can think of Uusi Suometar as an oracle for all of the material, because proximity search works across all materials even where name recognition has not yet been run. Nimiapuri can thus help you both to remember how a proximity search was written and, through that, to find those last, rarest hits.

Note, note!

Please note that name extraction has so far been done for one newspaper only, so the amount of material is very limited relative to the whole collection. The search nevertheless hits all newspapers, so it can still bring up new finds. Name extraction may also contain errors, so not even all names on the same page will necessarily be found. The accuracy is about 80%, which can vary depending on, for example, the structure of the page.

So send feedback and comment if you find unusual hits…

LIBER’s digital humanities survey for research libraries

Posted on 2.8.2019 by TuulaP

Earlier this spring, LIBER, a community of international research libraries, published the results of a survey on how research in the digital humanities (DH) is approached in different libraries. The survey was answered by 56 people from 54 institutions spread across 20 countries.

Interestingly, digital humanities work was funded mainly from the library’s own budget, and about 34-39% had received funding for projects from research funding, European funding or other appropriations. The reason is probably that this is a new kind of “service need”, so it takes a while before people learn to apply for funding and before it is granted. Getting started therefore seems to require the library’s own effort, but libraries appeared to need support with funding.

Key takeaways (p.29)

Digitisation and physical collections

Digitisation and physical collections were seen as intertwined in importance. The physical collection creates the foundation on which digitisation can build. Digitisation is considered an outright critical task, as is long-term preservation, which secures the situation for future generations of researchers.

As a result, the demands on data production also grow: providing data brings new tasks, which also require expertise.

DH data creation activities

There is still work to do in increasing the amount of continuous or ad hoc activity, although it may of course be that some things cannot be done in every situation and that situations vary from year to year. In one year the main focus may be one thing and in another something else, which can also affect data creation.

Collaboration, collaboration, collaboration

Impact and metrics were also emphasised. Impact attracts partners, with whom one can then learn about methods together and also share the knowledge held inside libraries about the collections and their characteristics. Collaboration was also hoped to build skills, which in turn helps in understanding customers’ wishes better. Even with few staff, it may still be possible to find ways to do some things better.

Uusi Suometar, 01.04.1915, no. 88, p. 3 https://digi.kansalliskirjasto.fi/sanomalehti/binding/1197802?page=3 National Library of Finland, Digital Collections

Interesting developments are thus ahead. LIBER’s DH working group was looking for new members to consider the common themes found in the report, which will then be addressed systematically. If you want to read the whole report, it can be found at: https://doi.org/10.5281/zenodo.3247285

DH 2019 – notes

Posted on 19.7.2019 by TuulaP

Thanks to the help of the Mikkeli University Consortium, the National Library of Finland was present at the DH 2019 conference in Utrecht. The conference is organized annually by the Alliance of Digital Humanities Organizations (ADHO), usually every other year in Europe or in the Americas. The Netherlands was a good choice of venue, since the National Library of the Netherlands is doing excellent work with library lab development and with its researcher-in-residence program, and consequently gets many requests for visits, as we got to hear in the library labs session during one lunch break. At the Building Library Labs pre-conference workshop we had a paper outlining the current work we do with researchers (morning coffees, data clinics etc.) and briefly sketching possible future directions for improving tools and services.

A poster on a recent survey about DH in European research libraries

About the conference

The conference was large and very popular: at the very last minute the number of participants grew to a bit over 1000, so the venue was packed. This was despite four parallel tracks that nicely divided the participants between different parts of the venue. The venue was a music hall, with halls and rooms for 30-1000 people in different wings. Usually it was simplest to sit through a session in full to secure a spot where you wanted it.

Researchers, scholars, academics, go!

I tended to pick mostly long paper sessions, with academics from various universities and research institutes from all over the world. The fields of study also varied: data, quantitative and qualitative methods, and machine learning were used, and many different languages and collections were targeted. The materials were mostly digital, but microfilm was mentioned as well.

The multitude of tracks meant that there was a cornucopia of good sessions to choose from. Whichever session you picked, it felt like you were missing out on something, but that is how it usually goes at these events.

One side discussion on Twitter concerned the woes of the lack of digitisation. The situation naturally varies between countries: some have had millions of pages of their collections digitized, and some are only starting out.

Most striking feature of the discussion of newspaper digitisation at #dh2019 so far is the low percentage of newspapers so far digitised – between about 4% and 6.5% various colleagues suggest.

— Andrew Prescott (@Ajprescott) July 10, 2019

As expected, I heard a few mentions of OCR quality, and at least one quick remark on OCR post-correction. Newspapers, books, libraries and archives of many kinds were used (even if not all of them were named). One interesting one was the Impresso project, which talks about “newspapers as an Eldorado”; with the addition of the NewsEye and READ (Transkribus) development, they could be that even more in the future.

Again impressed by the work of @ImpressoProject as presented by @maudehrmann with shout-out to CfP for conference 'Digitised newspapers – a new Eldorado for historians ?' https://t.co/WUwOCZw5pX #DH2019

— Martijn Kleppe (@MartijnKleppe) July 10, 2019

Many libraries were developing platforms where machine learning capabilities are part of the solution offered to researchers. The British Library had ‘Curatr’, where word embeddings work in the background and allow a researcher to build their own small subcorpora for their research question, as the tools were integrated into the presentation system. One main feature there was the ability to move between distant reading and close reading: you could get back to the original page, which could explain why a certain item was included in the search results. Then there was authorship verification work, work with dictionaries and DH, maps, image capturing of various material types, segmentation, fake news, and multi-language woes (i.e. why is so much research done on English corpora?). There were also more concrete questions: how to get the new generation to read, how collections should be made available online, and how to combine different media? Another question was what kind of features would be needed from newspaper collections, which initiated an ad hoc meetup on Friday to collect researcher needs and work on them further at the coming DH2020 conference and in between.

Finland

Finnish universities were well represented at the conference. The Comhis group had several papers and posters that talked about utilizing the national bibliography or newspaper collections. They tweeted the slides and abstracts for anyone to use:

All the slides & abstracts of our group’s six #dh2019 papers and a poster can be found here: https://t.co/P2A5sPykol Thanks everyone for a great conference! #helsinkiDH

— Helsinki Computational History Group (@COMHISgroup) July 12, 2019

There were also interesting, novel ways of using digitisation process metadata to show and tell the changes in the newspaper landscape.

If you'd have wanted to see my presentation on analysing the material dimensions of newspapers but didn't fit, the presentation is at https://t.co/tJB8b5JFDm. It isn't completely parseable without the talk, but I'm happy to answer questions #DH2019

— Eetu Mäkelä (@jiemakel) July 11, 2019

The University of Turku also had Comhis and Oceanic Exchanges related papers presented.

Trans-Atlantic Platform panel set to begin in #DH2019 conference @TiVre_Utrecht – @SuomenAkatemia #DIGIHUM programme represented by @MilaOiva & @hannusalmi @UniTurku pic.twitter.com/YBJqIrYKsT

— Risto Vilkko (@RistoVilkko) July 10, 2019