Our trip to India part II. – first days

Our trip to India took place from the 5th to the 17th of December, 2022. The first two days consisted of planning and preparatory discussions between the leaders of team UH and team IITG, Juraj Šimko and Priyankoo Sarmah, respectively. Juraj travelled to India ahead of the group for this purpose, as logistical preparations needed to be discussed and administrative work done. This included ensuring that both teams could travel to Nagaland and back successfully.

On December 7th, the rest of the UH team arrived in Guwahati. The following two days were spent continuing the preparations mentioned above. Data collection, responsibilities, and the essential aspects of data management were also discussed. Data pre-processing was planned with regard to the type of metadata required by the machine learning (ML) systems intended for data analysis, and parts of the code for ML-based data analysis were introduced. On the 8th, both teams toured the IITG campus, including the Phonetics and Phonology laboratory. We also successfully completed and submitted the applications for the essential permits for the visit to Nagaland.

On December 9th, Juraj Šimko led a workshop at IITG on the capabilities of machine learning for data and speech analysis, with a focus on analyzing speech prosody using deep learning methods. The central questions concerned the data pre-processing and speech data types that ML requires for a given input to produce the desired output. Other questions discussed included what kinds of parameters the ML systems extract from a speech signal, how data is fed into the systems in general, why file naming and embeddings matter, and why well-controlled data collection and compatible datasets are important.

The latter portion of the workshop turned to the practicalities of data collection. The division of tasks and responsibilities was planned, as well as how the fieldwork would be conducted. It was decided that the phoneticians with some experience, Anna Busheva and Ida-Lotta Myllylä, would take the lead; if the team needed to be split, they would each lead a group. Data management was also discussed. Most importantly, we talked about the realities of fieldwork and realistic expectations about how much data could be collected; estimates of 8 speakers were presented. The type of stimuli to be used was debated, and ultimately two sets of stimuli were decided upon.

The first task planned was a naming task, in which participants were asked to name an object in a picture (e.g., a cat, house, snake, or shoe) using their own dialect of Angami and then repeat the word within two carrier sentences. The word list was designed so that all tones of Angami would be represented, with a diverse distribution of vowels and tones. An example of the task:

I said ___
He / She sees one ___
(Angami doesn’t specify gender)

This task included 26 words in two carrier sentences, meaning the experiment consisted of 52 instances of picture naming. This task was created to collect data for Busheva’s MA thesis.

The other task repeated a previous experiment designed to collect the standard Angami corpus: participants read 25 sentences written in standard Angami. Both tasks were based on the expertise of native speakers, our colleagues at IITG, who deemed all words usable.

At the end of the workshop, the recording equipment was prepared and checked. One of our colleagues at IITG gave instructions to ensure that the entire team could use the equipment correctly. We used two Tascam audio recording devices and two headset microphones.

Yesterday we experienced India for the first time. We visited the museum of the river Brahmaputra, saw the campus of the Indian Institute of Technology in Guwahati, and met our Indian colleagues in person for the first time. The campus itself looked like a small town. Around 12 000 students study and live there, along with the teaching staff. We ate a lot of Indian food, especially typical Assamese specialties, such as Assamese rice “Thaali” and the alkaline puree called “Khaar.”
