Mapping the linguistic landscape of the Helsinki metropolitan area

The discipline of linguistics has a long-standing interest in language use in cities, because cities bring together speakers of different languages from different backgrounds. Examples range from William Labov’s classic empirical research on the relationship between social class and the pronunciation of American English in New York City to modern theories of multilingualism in cities by Alastair Pennycook and Emi Otsuji. Regardless of scope or focus, the consensus is that interactions between different languages and their speakers drive linguistic variation and change, and that this effect is particularly strong in densely populated cities.

Within sociolinguistics, a subfield of linguistics broadly concerned with language in society, one emerging approach to the study of languages in cities concerns linguistic landscapes. The study of linguistic landscapes focuses mainly on the visibility and presence of languages in the built environment, and typically involves qualitative analyses of the languages used in signs, advertisements, billboards and other media.

However, the physical spaces in which languages exist are being rapidly transformed by technological development. These spaces increasingly extend into the digital realm through the widespread use of positioning technology in smartphones and other mobile devices, which allows users to create content and associate it with physical locations via geotagging. Social media platforms with geotagged content are a hallmark example of this development, and they also offer new opportunities for linguistic research.

In our new project, funded by a three-year project grant from the Emil Aaltonen Foundation, we will map the linguistic landscape of the Helsinki metropolitan area using register and social media data. Whereas the register data provides a static view of the linguistic landscape, social media data provides a dynamic view of how and when speakers of different languages move around the city. We tackle this combination of data using methods from geoinformatics and natural language processing. We expect this approach to provide a new, quantitative perspective on linguistic landscapes that complements previous qualitative approaches.

Our initial analyses of social media data show that the Helsinki metropolitan area is indeed multilingual.

The number of unique languages in the Helsinki metropolitan area, as observed in monolingual Instagram captions from 2014–2016, which have been aggregated into a grid of 250 × 250 m cells. The languages were detected automatically using fastText. Some cells in downtown Helsinki feature up to 55 unique languages during this period. It is also worth noting that the majority of the grid cells feature more than one language.

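The aggregation described in the caption above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: it assumes that each post already carries a detected language code and coordinates in a projected, metre-based coordinate system, and the function names, the tuple format and the example points are all hypothetical.

```python
from collections import defaultdict
from math import floor

CELL_SIZE_M = 250  # cell edge length in metres, as stated in the caption


def cell_index(x, y, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a point.

    Assumes x and y are given in a projected, metre-based coordinate system.
    """
    return (floor(x / cell_size), floor(y / cell_size))


def unique_languages_per_cell(posts, cell_size=CELL_SIZE_M):
    """Map each grid cell to the set of languages observed in it.

    `posts` is an iterable of (x, y, language_code) tuples; this input
    format is an assumption made for the sake of the example.
    """
    cells = defaultdict(set)
    for x, y, lang in posts:
        cells[cell_index(x, y, cell_size)].add(lang)
    return dict(cells)


# Made-up example points, not project data.
posts = [
    (100.0, 100.0, "fi"),
    (120.0, 240.0, "en"),
    (249.9, 10.0, "fi"),
    (260.0, 10.0, "sv"),
    (600.0, 700.0, "ru"),
]
cells = unique_languages_per_cell(posts)
counts = {cell: len(langs) for cell, langs in cells.items()}
print(counts)
```

Using a set per cell means that repeated posts in the same language do not inflate the count, which matches the idea of counting unique languages rather than posts.
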
The diversity of the linguistic landscape, as measured using Shannon entropy, based on observations of unique languages and their respective counts in Instagram captions from 2014–2016. The clusters, which have been detected using Moran’s I, reveal areas with high linguistic diversity in downtown Helsinki, on the Aalto University campus and in the multicultural suburbs of eastern Helsinki and Espoo. Clusters with low linguistic diversity, in turn, can be observed mainly in Espoo and Vantaa.
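
To make the diversity measure concrete: Shannon entropy for a single cell is H = -Σ p_i log(p_i), where p_i is the share of observations in language i. The sketch below computes it from per-language counts. The natural logarithm is an assumption, since the base is not stated above, and the function name and example counts are invented for illustration only.

```python
from collections import Counter
from math import log


def shannon_entropy(language_counts):
    """Shannon entropy (in nats) of a mapping from language to count."""
    total = sum(language_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in language_counts.values():
        if count > 0:
            p = count / total  # share of observations in this language
            entropy -= p * log(p)
    return entropy


# Made-up counts for three hypothetical grid cells.
mono = Counter({"fi": 10})             # one language only
mixed = Counter({"fi": 5, "en": 5})    # two languages, evenly split
skewed = Counter({"fi": 9, "en": 1})   # two languages, one dominant

for name, cell in [("mono", mono), ("mixed", mixed), ("skewed", skewed)]:
    print(name, round(shannon_entropy(cell), 3))
```

A cell with a single language has zero entropy, and an even split between two languages scores higher than a skewed split. This is why entropy captures diversity better than the number of unique languages alone.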

By the end of the project in 2022, we hope to have learned something new about urban multilingualism in the Helsinki metropolitan area, and we hope that the results can inform the planning of city services. The current estimate is that by 2030 almost 25% of the population will speak a first language other than Finnish. This does not mean that these speakers will not speak Finnish at all, as they are likely to speak Finnish as a second language.

For more information about the project, contact Tuomo Hiippala.