Our trip to India, Part I: Preparations

Hello, world!
We are a group of students from the Linguistic Diversity and Digital Humanities program at the University of Helsinki (UH), attending the Experimental Laboratory course in Phonetics. Here, we document our progress on the Digital Language Typology project. The project involves collecting and analyzing speech data from Angami, a language spoken in North-East India, and it is a collaboration between UH and our colleagues at the Indian Institute of Technology in Guwahati (IITG).

After weeks of preparations and weekly meetings with our colleagues at IITG, we packed our bags and set off for India. We documented everything on Instagram and in our project diary, and in the following posts we will bring the two together.

From September to December 2022, we held regular weekly remote meetings attended by both teams (UH and IITG). Part of each meeting was used to plan the fieldwork and data collection excursion, which took place from 5 to 17 December 2022. During the excursion, the UH team traveled to Guwahati, Assam, India, and both teams then went together to Kohima and its surrounding areas in Nagaland, India. In the meetings, we discussed the many practical preparations for traveling, such as visa applications, vaccinations, flights, and accommodation. We also created an Instagram page, which we have actively updated ever since.

Another part of the meetings was devoted to planning the data collection methods. We debated how much data it was realistic to collect, where the fieldwork would take place, who the participants would be, and what kind of stimuli we would use to build an experiment. Most importantly, we discussed the research questions we might explore. We realised there were many questions we could attempt to answer, but our focus soon narrowed to Angami tones and their realizations across different dialects.

In addition, the meetings included introductions to the Angami language and people, as well as to machine learning and data processing. We decided early on that the data we collected should be compatible with an existing corpus of standard Angami compiled by the IITG team. We also examined samples from this corpus to find suitable tools for annotation and automatic segmentation.

It was also agreed during these meetings that one of the UH team members, Anna Busheva, would conduct her MA thesis research in parallel with this project, using some of the data collected during the excursion.
