Geotagged Twitter Data to Reveal Cross-Border Mobility of People

Overview of Samuli Massinen’s MSc thesis:

How can Big Data sources, such as georeferenced social media, be used in cross-border research? What kind of cross-border mobility patterns can be detected geographically over time? How can daily cross-border movements be separated from other movements? These were the main questions I set out to answer in my Master’s thesis “Modeling Cross-Border Mobility Using Geotagged Twitter in the Greater Region of Luxembourg”.

The Greater Region of Luxembourg is the largest cross-border labor market in the European Union, with the greatest number of cross-border workers. European integration, the Schengen Area, and socio-economic divergences between neighboring countries have been the main factors facilitating cross-border movement in the region and thus the emergence and expansion of the borderland community. Despite the freedom of movement, country borders still exist, as do the socio-economic differences between the countries. A growing number of people are moving to the other side of the border while continuing to work in Luxembourg. This generates daily cross-border mobility, which is still poorly understood. Thus, there is a distinct need to understand cross-border mobility dynamics in the region, especially border crossings on a daily basis.

Thus far, cross-border mobility studies have mainly relied on national registers, surveys, and census data. However, these datasets have mostly been too sparse to capture the complexities of cross-border mobility in time and space. Many studies have focused only on aggregate-level movement patterns, and the viewpoint of individuals has been missing due to a lack of suitable data. One promising way to bring individual-level data to cross-border mobility research is the use of novel data sources, such as mobile phone positioning and social media data.

In my thesis, I employed a person-based approach using geotagged Twitter data to study cross-border mobility in the Greater Region of Luxembourg. The aim was to examine how social media can be used in cross-border mobility research and how to move beyond aggregate-level inspections. As this is one of the first studies of its kind, I used a heuristic, programmatic approach. To my knowledge, social media data sources have not previously been applied to distinguish different cross-border mobility types. For this reason, I have published all scripts and algorithms developed in the study on the Digital Geography Lab’s GitHub pages.

Figure 1. Daily cross-border mover activity location density distribution in

The results show that social media can be used in cross-border mobility research, and that social media data can provide a relatively good proxy for the daily cross-border mobility of people at a regional level. Aggregate-level cross-border mobility patterns and activity location densities (Figure 1) correspond closely with previous studies, and the temporal frequency inspections indicate that the approach is promising for identifying and classifying border crossers: Twitter users classified as daily cross-border movers are more mobile on working days, whereas infrequent border crossers (potentially leisure and tourism-related) are mobile on weekends (Figure 2). Daily cross-border mobility patterns also provided new information about the spatial extent of the movements, indicating home-work commuting (Figure 3).

Figure 2. Weekday variation of cross-border trips to Luxembourg (both directions) by cross-border mover type within the Greater Region of Luxembourg in 2010–2018.
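
The weekday comparison summarized in Figure 2 can be illustrated with a minimal Python sketch. It assumes that border-crossing trips have already been detected and that each trip carries the mover type of its user; the `weekday_shares` function, the `(date, mover_type)` input format, and the type labels are hypothetical and are not taken from the published scripts.

```python
from collections import Counter, defaultdict
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_shares(trips):
    """Return, per mover type, the share of trips falling on each weekday.

    trips: iterable of (datetime.date, mover_type) pairs (hypothetical format).
    """
    counts = defaultdict(Counter)
    for day, mover_type in trips:
        # date.weekday() returns 0 for Monday ... 6 for Sunday
        counts[mover_type][WEEKDAYS[day.weekday()]] += 1

    shares = {}
    for mover_type, per_day in counts.items():
        total = sum(per_day.values())
        shares[mover_type] = {wd: per_day[wd] / total for wd in WEEKDAYS}
    return shares


# Made-up example trips, for illustration only
example_trips = [
    (date(2018, 1, 1), "daily"),       # Monday
    (date(2018, 1, 2), "daily"),       # Tuesday
    (date(2018, 1, 3), "daily"),       # Wednesday
    (date(2018, 1, 6), "daily"),       # Saturday
    (date(2018, 1, 6), "infrequent"),  # Saturday
    (date(2018, 1, 7), "infrequent"),  # Sunday
]

for mover_type, per_day in weekday_shares(example_trips).items():
    print(mover_type, per_day)
```

Using shares rather than raw counts keeps the mover types comparable even when one group makes far more trips than the other.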

To obtain meaningful information from a Big Data source, several data processing steps are required, one crucial stage being the detection of where social media users originate from. I used a heuristic approach to home location detection, which resulted in high accuracy: the “unique weeks” algorithm introduced in this study reached an accuracy of 88.6 % against the ground truth (i.e. the home location that Twitter users themselves reported).
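
As a rough illustration of week-based home detection, the sketch below assigns a user to the region where they posted during the largest number of distinct calendar weeks, and then scores the result against self-reported homes. This reading of the “unique weeks” rule is an assumption made for illustration, and the function names and input formats are hypothetical; the published scripts remain the authoritative implementation.

```python
from collections import defaultdict
from datetime import date


def detect_home_region(posts):
    """Pick the region with posts in the most distinct ISO weeks.

    posts: iterable of (datetime.date, region_code) pairs (hypothetical format).
    Returns None when there are no posts.
    """
    weeks_by_region = defaultdict(set)
    for day, region in posts:
        iso = day.isocalendar()
        # (ISO year, ISO week) identifies a calendar week unambiguously
        weeks_by_region[region].add((iso[0], iso[1]))
    if not weeks_by_region:
        return None
    # Sorting first makes ties resolve alphabetically, so results are repeatable
    return max(sorted(weeks_by_region), key=lambda r: len(weeks_by_region[r]))


def accuracy(detected, reported):
    """Share of users whose detected home equals their self-reported home."""
    common = [user for user in detected if user in reported]
    if not common:
        return 0.0
    hits = sum(1 for user in common if detected[user] == reported[user])
    return hits / len(common)


# Made-up example: many posts in one week in LU, fewer posts over three weeks in FR
example_posts = [
    (date(2018, 1, 1), "LU"), (date(2018, 1, 2), "LU"), (date(2018, 1, 3), "LU"),
    (date(2018, 1, 8), "FR"), (date(2018, 1, 15), "FR"), (date(2018, 1, 22), "FR"),
]
print(detect_home_region(example_posts))
```

Counting distinct weeks instead of raw posts is meant to keep a short burst of activity, such as a holiday trip, from outweighing a place where the user appears regularly.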

Figure 3. Cross-border trips for identified daily cross-border movers in 2010–2018. Movements cover both ways between Luxembourg and neighboring areas. Note: each map has a different flow intensity scale.

Although the applied approach is promising in providing new knowledge about spatio-temporal patterns and trends, the results should be considered with some caution. For example, this study did not consider population densities or Twitter use activity, which might cause bias: both attributes vary spatio-temporally and affect where and when Twitter data is recorded. Also, the sample size was rather limited in this study (~3200 Twitter users).

Hence, further research and method development with larger sample sizes are needed to draw sounder conclusions about cross-border mobility. Still, in my research, I was able to establish that the coverage of geotagged Twitter data depends on the data acquisition process and that the Twitter Streaming API can provide valuable information for cross-border mobility research. For future studies, I recommend that multi-level data acquisition processes be applied jointly with a person-based approach that combines both spatio-temporal and content analysis methodologies.

My research was part of the cross-border mobilities project at the Digital Geography Lab.


Text by: Samuli Massinen

My MSc thesis can be found here: E-Thesis

All my developed scripts and algorithms can be found here: DGL@GitHub

How to infer complex dynamics from present-day landscape patterns? New publication provides a method

Present-day landscape patterns may provide information on past dynamics of the landscape. Spatial ecologists have taken advantage of this for a long time, for instance to infer colonization rates and dispersal distances of species from their present-day spatial distributions and occurrence patterns.

Inferring past dynamics from present-day patterns gets more complicated if multiple landscape elements have been simultaneously on the move. In this case, it may be helpful to reconstruct and simulate past landscape dynamics, to understand how different dynamic elements must have interacted in the past to produce the present-day pattern. Methods that reconstruct past interactions may help us to understand complex dynamics, without having to wait for years for the accumulation of time series data.

In our recent publication in the field of spatial ecology, we tested this using data on a well-studied epiphytic lichen, Lobaria pulmonaria. For our study area, fire scar data existed on the timings and locations of forest fires over a 400-year period. Given this known landscape disturbance history, we simulated and calibrated the dynamics of L. pulmonaria host trees and L. pulmonaria colonizations so that they resulted in patterns that match present-day data (locations of L. pulmonaria occurrences and host trees). Our resulting colonization model of L. pulmonaria performed well against a model fitted to time series data.

We hope to inspire further studies on complex dynamics that utilize multiple types of information contained in present-day landscapes.

Fabritius et al. (2019): Estimation of metapopulation colonization rates from disturbance history and occurrence pattern data. Ecology 100: e2814.

How green are the streets of Helsinki?

Overview of Akseli Toikka’s MSc thesis: Mapping the green view of Helsinki through Google street view images

Interactive webmap for the Green View Index across Helsinki

Click to browse the interactive GVI map of Helsinki.

Urban vegetation has traditionally been mapped with remote sensing methods such as laser scanning and aerial photography. However, it has been argued that a bird’s-eye view of vegetation cannot fully represent the amount of green vegetation that citizens observe at street level. Recent studies have introduced human-perspective methods, such as street view images and green view measurement, alongside the more traditional ways of mapping vegetation. The green view index (GVI) gives the percentage of green vegetation in the street view at a certain location. The purpose of my thesis was to create a green view dataset of the city of Helsinki using Google street view (GSV) imagery and to reveal the differences between the human perspective and the aerial perspective in vegetation mapping.

Toikka (2019): Downloading Google street view panoramas.

Figure 1. Summertime Google street view panoramas of Helsinki were downloaded in six horizontal images. The GVI value of a panorama is the average of these 6 images.

Street view imagery of Helsinki was downloaded from the GSV application programming interface (API). The spatial extent of the data was limited by the availability of street view images from the summer months. Every GSV panorama was downloaded as six images (Figure 1). The amount of vegetation in the images was calculated based on the spectral characteristics of green vegetation (Figure 2). The GVI value of each panorama is the average of the six images that make up the panorama.

Figure 2. From left to right: original, classified and overlay image. GVI was calculated based on the spectral characteristics of green surfaces. The GVI value of this street view is 43.97%.
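
The panorama-level calculation can be sketched in a few lines of Python. The pixel rule used here (the green channel must exceed both red and blue by a margin) is only a stand-in assumption for "spectral characteristics of green vegetation" and is not the classifier used in the thesis; the function names, the `margin` parameter, and the nested-list image format are likewise hypothetical.

```python
def green_fraction(pixels, margin=10):
    """Fraction of pixels classified as green in one image.

    pixels: list of rows, each row a list of (r, g, b) tuples with values 0-255.
    A pixel counts as green when g exceeds both r and b by more than `margin`
    (an illustrative rule, not the thesis's classification method).
    """
    total = 0
    green = 0
    for row in pixels:
        for r, g, b in row:
            total += 1
            if g > r + margin and g > b + margin:
                green += 1
    return green / total if total else 0.0


def panorama_gvi(images, margin=10):
    """GVI of a panorama in percent: the mean green fraction of its images."""
    fractions = [green_fraction(image, margin) for image in images]
    return 100.0 * sum(fractions) / len(fractions)


# Made-up 1x2 pixel "images": three fully green, three without any green
leafy = [[(30, 120, 40), (20, 200, 10)]]
paved = [[(128, 128, 128), (200, 190, 180)]]
print(panorama_gvi([leafy, paved, leafy, paved, leafy, paved]))
```

Averaging per-image fractions, rather than pooling all pixels, follows the description above that a panorama's GVI is the average of its six images; the two give the same result when all images have the same size.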

Several green view maps of Helsinki were created based on the calculated GVI values (Figure 3). To understand the differences between the human perspective and the aerial view, the GVI values were compared with the regional land cover dataset of Helsinki using linear regression. Areas with large differences between the datasets were examined visually through the street view imagery. The green view of Helsinki was also compared internationally with other cities for which the same kind of data is available on the Treepedia website of the Senseable City Lab, MIT.
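
A comparison of this kind can be sketched as an ordinary least squares fit followed by a residual check. The sketch below is a minimal illustration under assumed inputs: one vegetation-cover percentage and one GVI value per area, with areas whose residual exceeds a chosen threshold flagged for visual inspection. The function names, the threshold, and the sample numbers are hypothetical and do not come from the thesis.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def large_residuals(xs, ys, slope, intercept, threshold):
    """Indices of observations whose absolute residual exceeds `threshold`."""
    flagged = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        residual = y - (slope * x + intercept)
        if abs(residual) > threshold:
            flagged.append(i)
    return flagged


# Made-up values: vegetation cover (%) from land cover data and GVI (%) per area
cover = [0.0, 10.0, 20.0, 30.0, 40.0]
gvi = [5.0, 11.0, 14.0, 40.0, 24.0]

slope, intercept = fit_line(cover, gvi)
print(slope, intercept)
print(large_residuals(cover, gvi, slope, intercept, threshold=10.0))
```

Flagged areas are only candidates for closer inspection; a large residual can reflect vegetation hidden from the street as easily as an error in either dataset.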

Figure 3. GVI values aggregated to the YKR statistical grid. The downtown and industrial areas are easily distinguished from the rest of the city by their lower GVI values.

It appeared that the green view of Helsinki is distributed unequally across the city. The lowest green view values are found in the downtown, the industrial areas, and the business centers of the suburbs. The highest values are found in the residential suburbs. In particular, older areas of low-rise housing such as Kuusisaari, Lehtisaari and Laajasalo stand out with relatively high GVI values. Newer residential areas such as Arabianranta, Latokartano and Herttoniemenranta have relatively low GVI values because their greenery has not yet developed.

When compared with the land cover data, the green view was found to have a weak correlation with low vegetation and a relatively high correlation with taller vegetation such as trees. Differences between the datasets were mainly concentrated in areas where, for various reasons, the vegetation was not visible from the street. The main sources of error were the oldest street view images and flaws in image classification caused by other green objects and shadows.

Even though Helsinki has many parks and other green spaces, the amount of greenery visible from the streets isn’t always that high. The green view dataset created in this study helps to understand the spatial distribution of street greenery and brings the human perspective alongside the more traditional ways of mapping city vegetation. When combined with previous city greenery datasets, the green view dataset can help to build a more holistic understanding of the greenery in Helsinki.

My thesis was produced in cooperation between the Department of Geoinformatics and Cartography at the Finnish Geospatial Research Institute and the Digital Geography Lab at the University of Helsinki. For the computing, we made use of geospatial computing resources provided by CSC and the Open Geospatial Information Infrastructure for Research (oGIIR, urn:nbn:fi:research-infras-2016072513) funded by the Academy of Finland.

Text by: Akseli Toikka

Akseli’s thesis (only in Finnish) can be found here.
The data processing scripts are available at Geoportti GitHub.

Environmental dialogues: how to plan for urban biodiversity?

Earlier this week, Henna Fabritius took part in the Environmental Dialogues event to discuss biodiversity-related modelling tools in urban greenery planning.

More information and a recording of the event are available here (in Finnish):

Ympäristödialogeja: Miten suunnitella monimuotoista luontoa?

The event was organized by the Forum for Environmental Information (Ympäristötiedon foorumi in Finnish), which is a non-profit organization that aims at increasing interaction between the producers and users of environmental information in order to support national policy making in Finland, while keeping in mind the global significance of environmental problems.

Read also Henna’s blog about getting better at supporting urban biodiversity (in Finnish):

Minä väitän: Luonnon monimuotoisuutta voitaisiin tukea kaupungeissa nykyistä enemmän



DigiGeoLab back at the office!

Most of the Digital Geography Lab researchers are back at the office after relaxing summer holidays. We are happy to welcome Age Poom to the team!