Väiski’s Lectio Praecursoria

Tuomas Väisänen successfully defended his PhD thesis “Diversity of places and people: Using big data to understand languages and activities across geographical space” on Friday, the 10th of November. His opponent was Associate Professor Grant McKenzie from McGill University, Canada. In case you missed the event and want to read the Lectio Praecursoria, you can find it below.


Väiski’s Lectio Praecursoria:

Cities are home to over half of the human population. 

The number of people living in cities is increasing at an unprecedented rate due to accelerating urbanization, international migration, and mobility. These global megatrends are further intensified by climate change and biodiversity loss. 

Today, 56 % of the world’s population lives in cities. The United Nations estimates that by 2050, this percentage will have increased to 70 %. This will place immense pressure on cities to provide housing, employment, and services for a growing number of inhabitants. 

At the same time, cities are not only becoming more populous; their populations are also becoming more diverse.  

More people of increasingly varied cultural, ethnic, and socio-economic backgrounds are interacting in cities than ever before. Accordingly, researchers in the last 15 years have recognized that variables commonly used to describe population diversity in the past, such as countries of birth or origin, or ethnicities of the individuals, are not adequate for assessing the new patterns of diversity present in contemporary urban populations. 

Recent research has thus called for characterizing urban populations as being “super-diverse.” That is, the populations are diverse across multiple variables at the same time, such as ethnicities and countries of origin, but also religions, languages, gender, age, socio-economic and immigration statuses.  

As you might have observed from the title of my work, in my thesis I focus on exploring diversity from the perspectives of languages and activities. 


“Why languages?” you might ask. 

Language provides the basis for all human interaction and communication. Languages mediate every social interaction in urban areas and enable sharing of information in various personal, communal, and international contexts. Languages also constitute a major part of individual and group identities, and unlike a country of birth, the individual has some agency over which language they use and which language is registered as their first language. Despite the evident importance of language, it has remained an underexplored variable in the study of urban diversity within geography. 


“What about the activities?” you might continue.  

The activities of diverse groups of people also contribute to the diversity of an urban area, and these activities can vary between population groups. There might be people walking, biking, working, commuting, participating in an event, enjoying their day off or something else entirely.  


So, what does this all have to do with Finland? 

Finland has always been a multicultural and multilingual country; however, it only became a net in-migration country in the 1980s. Since then, the diversity of the population has increased rapidly in Finland, especially in the Helsinki Metropolitan Area. The fall of the Soviet Union, Finland’s membership in the European Union, and conflicts in the Middle East have been clear watershed moments for diversity in Finland and the Helsinki Metropolitan Area. 


You might ask: “Alright, does this make the Helsinki Metropolitan Area a highly diverse metropolis?” 

It depends. 

In the context of Finland, yes. 

However, in comparison to other Nordic and European cities, the Helsinki Metropolitan Area has only recently started becoming diverse and cannot be considered similarly diverse just yet. For example, the native population makes up over 80 % of the inhabitants in the Helsinki Metropolitan Area, unlike in Amsterdam where the native Dutch make up half. 

This recency presents an opportunity for Finland and the Helsinki Metropolitan Area to avoid the mistakes made elsewhere while adopting what has worked. To do this, we need more information and understanding about urban diversity in the area. 

Luckily, the availability and amount of data about people and places has exploded. The devices we carry in our pockets, the applications we use, and the sensors embedded in urban space are generating immense volumes and varieties of data on a continuous basis. 

This flood of data is commonly known as big data. 

With big data, many phenomena can be detected as they happen, while also pinpointing where they happen. For example, today you might have shared a picture of your morning coffee on social media alongside its location. Digital traces such as these provide researchers with information on the dynamic side of urban diversity. By the dynamic side I mean where people are, what they are doing, and who they are with when they are not at home.  


How does big data factor into urban diversity? 

Well, traditional sources of data, such as population registers, cannot capture information of this dynamic kind. These data are updated once a year or at longer intervals and the spatial information therein is connected to residential locations. As people are home mostly at night, the dynamic perspective provided by big data sources is key to understanding what happens elsewhere and at other times. 

Conversely, traditional sources are better suited to exploring the long-term and structural changes in urban diversity, especially in residential areas. Unlike social media data, which can capture daily rhythms and activities, population registers capture changes in residential areas that play out over several years or decades. 

From the variety of big data sources, social media data in particular has received wide attention in geographical research due to its rich user-generated content and availability, but even more so as it can be geotagged. A geotagged post contains information on the geographical location from where the post was shared. This enables geographers, such as me, to map where and when social media users are, but also what they are up to, based on the content and location information.  

Speaking of the content, social media data often consists of textual and visual information. This information can be used to reveal the languages, activities, and attitudes of social media users. This multimodality of the content also necessitates interdisciplinary methodological approaches, which I use throughout my work, as analyzing just the textual or visual content can omit crucial information.  

In summary, increasing urban diversity brings about new socio-spatial patterns and complexities which need to be understood so that the social sustainability of our cities can be supported. This can be pursued by using both traditional and big data sources, as they capture different spatio-temporal aspects of urban diversity.  

These topics form the core of my research interests. As a geographer I am interested in how various phenomena are distributed across geographical space. Why is the phenomenon more intense in one area, but less so somewhere else? How do the distributions of intensity vary in time? 

As I said previously, I explore urban diversity from the perspective of languages and activities. By urban diversity, I mean a combination of the super-diversity of the population and the diversity of their activities.  

Such an endeavor requires an interdisciplinary approach. I thus draw conceptually on the fields of GIScience, urban geography, and linguistic landscape studies to contextualize and interpret my findings. 

Studying urban diversity also benefits from using a diverse selection of data sources. In my thesis I use big data and traditional data sources, as they provide different but complementary perspectives on urban diversity and its spatio-temporal patterns. 

The big data I use consists of geotagged social media content from Twitter, Instagram, and Flickr, and mobile phone data from a large Finnish mobile phone operator. These sources of data provide information on the dynamic side of urban diversity.  

The traditional sources of data that I use consist of individual-level population register data and the statistical grid database. These data contain annual demographic and socio-economic information on the inhabitants of Finland. The location information is based on home locations; thus the data provide a perspective on the structural side of urban diversity. I also use these sources of data to contextualize my findings from big data sources.  

My work explores the patterns of linguistic and activity diversity on national and regional scales across Finland, and also at the local neighbourhood level within the Helsinki Metropolitan Area. Temporally, my work considers urban diversity from times of day and weekly rhythms to annual and decade-long trends.  

Interdisciplinarity is also at the heart of the methods I use throughout this work. I use a combination of methods from natural language processing, machine learning, diversity measurement and spatial analysis to identify languages used in social media texts, detect objects and landscapes from photographs, quantify linguistic diversity, and analyze their spatio-temporal patterns. 

Finally, as I believe science and research should be available and accessible to everyone, all the analysis scripts I have used throughout this work are openly available online. The study of urban diversity inextricably concerns population groups who are disadvantaged societally and socio-economically, so opening the scripts is also an ethical step I take to ensure the transparency of my work. 


In summary, the study of urban diversity with traditional and big data sources is inherently an interdisciplinary endeavor. As urban diversity is complex and intersectional, reliance on any single source of data or methodological tradition is bound to leave blind spots, so interdisciplinarity is necessary for a more balanced analysis. 


So, how is my thesis tackling this? 

My thesis has three objectives that I address through four distinct articles. Each article addresses several objectives from varying perspectives and spatio-temporal scales. The objectives are: 

  1. To reveal the urban diversity of the Helsinki Metropolitan Area and its dynamism using social media and population register data 
  2. To explore the potential of applying both traditional and novel sources of data with interdisciplinary methods to the study of urban diversity through languages and activities 
  3. To advance the methodological framework for studying linguistic diversity and activities in GIScience

In my first article, I provide an analysis of the spatio-temporal linguistic diversity and richness of Finnish Twitter users from regional and user-based perspectives. In the article, I identified the languages used by Finnish Twitter users from geotagged and non-geotagged Tweets. I then quantified the linguistic diversity of the users to understand their individual linguistic repertoires, and the linguistic diversity of the regions they tweeted from. 
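As a rough illustration of what quantifying linguistic diversity can look like, the sketch below computes two common measures for a list of language labels: richness (the number of distinct languages) and the Shannon entropy of the language distribution. This is only a minimal example under stated assumptions: the lecture does not name the indices used in the article, and the function names and the sample labels are hypothetical.

```python
import math
from collections import Counter


def language_richness(languages):
    """Return the number of distinct language labels observed."""
    return len(set(languages))


def shannon_diversity(languages):
    """Return the Shannon entropy (natural log) of the language distribution.

    Higher values mean that posts are spread more evenly over more languages;
    a single-language list gives zero.
    """
    counts = Counter(languages)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical language labels detected from one user's posts (ISO 639-1 codes).
user_posts = ["fi", "fi", "en", "sv"]

print("richness:", language_richness(user_posts))
print("Shannon diversity:", round(shannon_diversity(user_posts), 3))
```

The same two functions could be applied to all posts shared from one region instead of one user, which is one way to read the user-based and regional perspectives described above.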

In my second article, I provide an example of using computer vision as an additional tool for extracting information from social media data. I examine differences in landscape preferences, activities, and broad visual themes from Flickr photographs between two groups of people, Finnish nationals and foreign tourists, visiting Finnish national parks. To do this, I use computer vision techniques to detect objects commonly associated with activities, classify photographs based on the landscapes, and cluster semantically similar content together to understand the visual themes in the content. 

In my third article, I investigate the variations of linguistic diversity in the Helsinki Metropolitan Area using population register and social media data. I use geotagged social media data from Twitter and Instagram, first language information from the population register, and mobile phone data to understand how language use, linguistic diversity and population presence vary across the area and across times of day.  

In my fourth and final article, I explore the spatio-temporal development of linguistic diversity in the Helsinki Metropolitan Area from 1987 to 2019. I use population register data to do this and focus on two language groups, speakers of Somali and Estonian. These groups arrived in the Helsinki Metropolitan Area around the same time, but for different reasons. I examine how the linguistic diversity and socio-economic characteristics of their residential environment have changed during this period, and what this reveals about their integration into Finnish society. 


Ok, but what can be said based on these articles?  

The results of my thesis illustrate how the Helsinki Metropolitan Area is rapidly becoming more diverse. This diversity is apparent from both social media and population register data. The geographical distribution of linguistic diversity changes during the day and across several decades. This means that the places where people are likely to encounter linguistic diversity also vary depending on the time of day and have changed over the years, which is crucial information for urban planning. 

My work also shows the value of language for the study of urban diversity in geography, as it can produce a more fine-grained understanding of the socio-spatial patterns compared to more commonly used variables. Conversely, my work shows the value of spatio-temporal approaches to the study of linguistic diversity and linguistic landscapes, which rarely adopt geographical approaches. 

My results also show that computer vision can be used as an additional tool to extract information on activities from social media data. Computer vision techniques are especially helpful when the textual content does not provide useful information, which is often the case with Flickr data. This approach can also be used to circumvent other issues arising from textual content. 

My results demonstrate that using interdisciplinary approaches and diverse sources of data, and openly sharing how the work was done, can provide a viable path towards an increased understanding of urban diversity and a more socially sustainable future. 

In summary, diversity presents challenges, but also opportunities for cities. It needs to be understood so that the opportunities can be taken, and the challenges mitigated and overcome. My thesis provides a methodological framework and empirical examples as a starting point towards this goal. 

Human civilization is now an urban civilization and will not stop being one in the foreseeable future. Cities, urban planning, and decision-makers must keep up with the emerging social and spatial patterns of urban diversity to be able to effectively respond to challenges and seize the opportunities. This will support social sustainability, well-being, and cohesion in our cities. 

My thesis represents a small step in this direction: I have provided a methodological framework for the study of urban diversity by using information on languages and activities, and I have demonstrated its effectiveness across several spatial and temporal scales. 

But this work is just one step, and many more are needed. 

As they say, “The world is not finished.” 

Cities and their populations keep growing and changing. Much work remains. 

I want to end my Lectio by coming back to the prediction that by 2050, 70 % of the world’s population will live in cities. This underscores the importance of focusing on the people living in cities, because, to paraphrase Shakespeare: 

“What is the city, but the people?”


– – – – –

The Digital Geography Lab is an interdisciplinary research team focusing on spatial Big Data analytics for fair and sustainable societies at the University of Helsinki.

