Vuokko’s lectio 9.12.2020

User-generated Geographic Information for Understanding Human Activities in Nature

Lectio Praecursoria, in the public examination of MSc Vuokko Heikinheimo’s doctoral dissertation
the 9th of December 2020


Nature contributes to human well-being in countless ways. Many of us enjoy spending time in nature, going for a walk or a picnic and observing species and seasons. Nature-based tourism and outdoor recreation are evident examples of direct benefits of nature to people.

National parks are protected areas that are dedicated to safeguarding biodiversity and providing people the opportunity to enjoy nature.

Urban green spaces include the network of parks, forests and other green areas in the urban structure. Green spaces in cities offer opportunities for contact with nature in our everyday lives while protecting urban biodiversity.

We are also willing to travel far in order to experience and enjoy nature. In many places, visitors to protected areas – both domestic and international – are a significant source of income for park management and local communities. Information about protected area visitors is important for planning and management on regional, national and international scales.

The ongoing pandemic has emphasized the importance of access to green spaces in everyday life.

In Finland and many other countries, national parks and green spaces are attracting record numbers of visitors. However, in many regions, visiting green spaces has become more restricted due to regulations, and international travel has decreased dramatically – at least for now – having an impact on global nature-based tourism.

The global protected area network on land and sea is a key instrument in conserving biological diversity and ecosystem services on Earth. Despite active efforts to protect the global environment, human activities continue to cause significant declines in nature. Understanding how nature contributes to people’s well-being is a key aspect for finding successful solutions that support the protection of biodiversity and sustainable and equitable land-use planning.

In order to understand the interactions between people and the environment on different scales, we need data.

Remote sensing data such as satellite images have revolutionized the collection of data about the natural and built environment.

But how about data on human activities?

Information about visits to protected areas can be collected, for example, using surveys and on-site counters. However, gaps remain even in places with the most advanced visitor monitoring systems and some regions lack such data completely.

More recently, data generated through the interactions of people and location-based technologies (such as mobile phones and GPS-devices) have provided new opportunities for analyzing human activities all around the world. For example, many of us share our observations, activities and even location publicly online through various social media applications. These user-generated data offer new opportunities for understanding spatial and temporal patterns of human activities.

Elements of location-based data, such as geotags, timestamps, text and images, provide new possibilities for studying human activities in nature and can help fill some of the information gaps in this field.

This thesis links to a broader context of understanding human-nature interactions. From a holistic perspective, social and ecological systems are interlinked in complex systems. This thesis zooms into the human side of the system by looking into human activities – human spatial behavior in nature.

Social media, and other user-generated data sources provide information first and foremost about people and their observations and activities. These data can also provide information about the natural environment such as species observations and land cover information. In my thesis, I have mainly focused on people’s movements, activities and values represented in the data.

In this work, I investigated how well user-generated data reflect actual visits to nature. Digital data, in general, provide various perspectives about human-nature interactions beyond actual visits.

By definition, social media are channels for interaction and discussions in addition to platforms for sharing content. It is possible to raise awareness and to show interest in particular regions or animal species online without actually visiting nature.

In my thesis, I focus mostly on those digital data that are potentially linked to actual visits through geotags or other references to location. For example, it is relevant to ask: if someone has geotagged a photograph to a national park, did they actually visit the place?

As a concept, User-generated geographic information refers to data created through interactions of people and location-based technologies. Examples of such data include geotagged social media posts, GPS-tracks from sports applications, mobile network data and other location-based data generated by regular people instead of government authorities or experts.

In the geographic information science literature, the concept of volunteered geographic information and the idea of citizens as sensors, introduced by Goodchild in 2007, are often used to refer to geographic data generated by non-professionals.

It is a good concept for describing actively contributed data sets, such as OpenStreetMap, which is an open world map that anyone can edit, or data contributed to citizen science campaigns and applications such as iNaturalist, where everyone can share their nature observations and comment on others’ observations.

However, I would argue that the concept of volunteered data does not perfectly capture all kinds of user-generated data sets.

For example, when analyzing publicly shared content on social media, or data from mobile network operators, we are using these data for purposes other than those originally intended. From a research point of view, it is important to acknowledge that these data might not be actively volunteered for such purposes. The same applies to new data sets about human mobility that big technology companies have recently released to support analysis related to the COVID-19 pandemic.

In this doctoral dissertation, I have investigated the use of user-generated data sets for studying human-nature interactions. I focused on human activities and preferences in national parks and urban green spaces. The main objectives were to

    1. Describe and critically evaluate the available data sources and to
    2. Discover spatial and temporal patterns of human activities in nature in the study areas.

To achieve these objectives, I investigated the elements of the user-generated data through these questions:

    • Where and when are people visiting national parks and green spaces?
    • What are people doing and valuing in nature, and perhaps why do people visit specific places?
    • Who are the users who have shared their data from these areas?

This thesis consists of four articles and a summary section. The first three articles focus on social media data, and the fourth article also includes other sources of user-generated data.

    • In Article I, I reviewed recent scientific literature using social media data in nature conservation research. The article provides an overview of relevant data sources and analysis methods that could further contribute to using social media data in conservation science.
    • In Article II, I compared social media data and visitor survey data from Pallas-Yllästunturi National Park. We observed similar trends in both data sources regarding popular activities and visited locations.
    • Article III focused on understanding who the users are. I compared several methods for identifying countries of residence of visitors to the Kruger National Park in South Africa, and evaluated the performance of these methods.
    • Article IV compares social media data, sports application data, mobile phone data and participatory geographic information for investigating the use of urban green spaces.
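To make the question in Article III more concrete, here is a minimal Python sketch of one conceivable heuristic for guessing a user's country of residence: pick the country where the user has posted on the most distinct days. This is an illustration only and not necessarily one of the methods compared in the article. The function name `infer_home_country`, the tuple layout and the sample posts are all assumptions.

```python
from collections import defaultdict
from datetime import date


def infer_home_country(posts):
    """Return, per user, the country with the most distinct posting days.

    `posts` is an iterable of (user_id, country_code, day) tuples.
    Ties are broken alphabetically so the result is deterministic.
    This is a hypothetical heuristic, not the thesis method.
    """
    days = defaultdict(lambda: defaultdict(set))
    for user_id, country, day in posts:
        days[user_id][country].add(day)

    home = {}
    for user_id, by_country in days.items():
        # Most posting days first; alphabetical order settles ties.
        home[user_id] = min(by_country, key=lambda c: (-len(by_country[c]), c))
    return home


# Made-up sample data for illustration.
posts = [
    ("u1", "FI", date(2019, 1, 3)),
    ("u1", "FI", date(2019, 2, 10)),
    ("u1", "ZA", date(2019, 7, 1)),
    ("u1", "ZA", date(2019, 7, 1)),  # same day, counted once
    ("u2", "ZA", date(2019, 7, 2)),
]
home = infer_home_country(posts)
```

Counting days rather than posts keeps a single photo-heavy holiday from outweighing everyday activity at home, which is one reason such heuristics can disagree with each other.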

Extracting meaningful information from user-generated data sets is an iterative process.

In this thesis, I used spatial and temporal analysis methods and visual and textual content analysis approaches in addition to thorough data exploration.
During the analysis workflow, there are several important questions to consider, such as:

    • Which data sources are best fit for the purpose?
    • What data to collect and store?
    • Does the data contain personal and sensitive information?
    • How to identify irrelevant data?
    • What type of analysis approaches, tools and skills are needed?
    • How to best visualize the results?
    • Can the observed patterns be validated?

So, where and when are people visiting national parks and green spaces? And how can we analyze these patterns based on user-generated data?

    • Results in my thesis show that visitor flows and hot-spots are visible in social media and other user-generated data sets on different scales.
    • These data work best in popular destinations and lack of data does not necessarily mean the absence of visitors.
    • The investigated data sets contain unique information about temporal trends of human activities. For example, these data can fill information gaps in between less frequent surveys.
    • Aggregating data over space and time not only preserves privacy but also often reveals regional and periodical trends even if the original data are sporadic. For example, relative changes in posting activity can reveal interesting and even surprising patterns.
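The aggregation idea in the last point can be illustrated with a minimal Python sketch. It assumes that only a timestamp is kept for each post. The function names and the sample timestamps are hypothetical and not taken from the thesis.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(timestamps):
    """Count posts per (year, month)."""
    return Counter((t.year, t.month) for t in timestamps)


def relative_change(counts, month, baseline_month):
    """Relative change in post count between two months, as a fraction."""
    base = counts[baseline_month]
    if base == 0:
        raise ValueError("baseline month has no posts")
    return (counts[month] - base) / base


# Made-up sample data for illustration.
timestamps = [
    datetime(2019, 6, 1, 12), datetime(2019, 6, 15, 9),
    datetime(2019, 7, 2, 18), datetime(2019, 7, 9, 8),
    datetime(2019, 7, 20, 14),
]
counts = monthly_counts(timestamps)
change = relative_change(counts, (2019, 7), (2019, 6))  # July vs. June
```

Working with relative changes rather than raw counts makes sparse data from different places or platforms easier to compare, although a small baseline makes the ratio unstable.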

Content analysis of social media texts and images allows understanding activities and preferences from a new perspective.

    • Manual content analysis in this thesis found that most content shared from national parks and green spaces contained relevant information.
    • Activities detected from social media content reflected surveyed activities.
    • This thesis also emphasizes the potential of automated content analysis approaches for acquiring further insights from large quantities of textual and visual content.

Finally, understanding who has shared their data from national parks and green spaces is important but challenging.

    • In this work, I identified approaches for deriving information about the users through further spatial and temporal analysis as well as content analysis.
    • For example, information about users’ places of residence can have a decisive importance for the meaningfulness of the entire analysis.
    • Data comparisons from national parks highlight that social media does not represent all visitor groups.

Main challenges related to using user-generated geographic information include data quality, limited access to data and privacy issues.

    • Limited access to data affects the repeatability of the analysis and the execution of longitudinal research projects.
    • Privacy issues cannot be neglected when analyzing data where individual people can be identified.
    • Minimizing the amount of data analyzed, and aggregating the results (for example, over time and to larger geographical units) helps using these data in a privacy-preserving way.
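One way to picture the aggregation step is the following Python sketch. It counts distinct users per area and month and drops any unit with too few users. The threshold `min_users=5`, the function name and the sample records are assumptions made for illustration, not values or procedures from the thesis.

```python
from collections import defaultdict


def users_per_unit(records, min_users=5):
    """Count distinct users per (area, month) and drop sparse units.

    `records` is an iterable of (user_id, area, month) tuples.
    `min_users` is a hypothetical disclosure threshold.
    """
    users = defaultdict(set)
    for user_id, area, month in records:
        users[(area, month)].add(user_id)
    # Only aggregated counts leave this function, never user identifiers.
    return {unit: len(ids) for unit, ids in users.items() if len(ids) >= min_users}


# Made-up sample data for illustration.
records = [("u%d" % i, "Park A", "2019-07") for i in range(6)]
records += [("u1", "Park A", "2019-07")]  # repeat post by the same user
records += [("u9", "Park B", "2019-07")]  # lone visitor, suppressed
summary = users_per_unit(records)
```

A suitable threshold depends on the data and the applicable regulations, so the value used here should be read as a placeholder.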

Overall, researchers, practitioners and decision-makers need to consider several limitations if aiming to extract meaningful information from user-generated data sources.

Different sources of user-generated geographic data complement each other in answering questions related to where, when, what, why and who.

In this thesis I propose that social media mostly captures being in nature covering leisure time activities, while sports tracking data and mobile phone data reflect moving through parks and green spaces (including daily commuting patterns and sports activities). Participatory geographic information (such as map-based surveys) can be used to acquire more in-depth information related to values and preferences.

Social media and other user-generated sources of geographic information are not able to fill in all information gaps related to understanding human activities in nature. However, if we understand the limitations of these data, they offer additional layers of geographic information to complement existing data sources and may inform further data collection efforts in less monitored areas.

The data used in this thesis represent time before the coronavirus pandemic, but the presented approaches and outlined limitations continue to be relevant for analyzing the ongoing changes in visits to national parks and green spaces.
This thesis contributes to understanding how to extract meaningful information about human activities in nature from the huge volumes of available data. User-generated data sets might be messy and unpredictable, which makes their analysis challenging. At the same time, one of the strengths of these new data sources is the potential to discover emerging patterns and trends efficiently.

Even imperfect data, if understood as such, can contribute to integrating the value of nature into decision-making processes on different scales.

Data comparisons from national parks and green areas presented in this thesis provide insights about the properties of user-generated geographic information also to other fields of research.

Despite many limitations, there is a lot of potential to extract meaningful knowledge based on user-generated data sets for the benefit of people and the environment.

Vuokko Heikinheimo, MSc, defended her doctoral thesis entitled ‘User-Generated Geographic Information for Understanding Human Activities in Nature’ on 9 December 2020 at 10.00 at the Faculty of Science, University of Helsinki.

Professor Catherine Pickering from Griffith University, Australia, served as the opponent and Professor Tuuli Toivonen as the custos.

Vuokko completed her doctoral thesis in the Digital Geography Lab research group under the Social Media Data for Conservation Science project, which has received funding from the Kone Foundation.

PhD defence 2020
The Custos (Professor Tuuli Toivonen) and the Doctoral Candidate (MSc Vuokko Heikinheimo) in front of Athena building at Siltavuorenpenger on the 9th of December 2020.


Inaugural lecture by Professor Toivonen

Tuuli Toivonen is now a full professor in geoinformatics at the Faculty of Science!

Newly appointed professors at the University of Helsinki are celebrated twice a year. As part of these celebrations, the professors hold an inaugural lecture. This autumn, all festivities were (understandably) held online, which allowed everyone interested to watch the lectures. You can watch Tuuli’s lecture here (Finnish audio, Finnish and English subtitles available):

Congrats once more to Tuuli!

NoRSA 2019 Keynote: Tuuli Toivonen

We will also start sharing our presentations in the Digital Geography Lab blog!

To start out, we will share Tuuli’s keynote presentation from the NoRSA 2019 Conference in Seinäjoki, Finland, on 19 June 2019, which summarizes much of our ongoing work related to socio-spatial interactions and Big Data:



A new paper on understanding the use of urban green spaces from user-generated data

Parks and other green spaces are an important part of a sustainable, healthy and socially equal urban environment. Urban planning and green space management benefit from information about green space use and values, but such data are often scarce and laborious to collect. Temporally dynamic geographic information generated by users of different mobile devices and social media platforms is a promising source of data for studying green spaces.

Examples of social media data, sports tracking data, mobile phone data and PPGIS data from green spaces in Helsinki, Finland.

In a recent article published in the journal Landscape and Urban Planning, we compare the ability of different user-generated data sets to provide information on where, when and how people use and value urban green spaces. We compare four types of data: social media, sports tracking, mobile phone operator and public participation geographic information systems (PPGIS) data in a case study from Helsinki, Finland, and ask: 1) where the spatial hot-spots of green space use are, 2) when people use green spaces, 3) what activities are present in green spaces and 4) who is using green spaces based on available sample data sets.

Green spaces in Helsinki, Finland

Being in, moving through and perceiving urban green spaces

Our results show that user-generated geographic information provides dynamic information about the use of urban green spaces. Social media data highlight patterns of leisure time activities and allow further content analysis. Language detection allows further understanding of the different user groups. Sports tracking data and mobile phone data capture green space use at different times of the day, including commuting through the parks. PPGIS studies allow asking specific questions, including relevant background information from active participants.

Spatial distribution of the different data sources in 250 m x 250 m grid squares (see more results in the original publication).
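As a rough illustration of how point observations can be summarized in 250 m x 250 m grid squares, the following Python sketch snaps each point to the lower-left corner of its cell and counts points per cell. It assumes coordinates in a projected, metre-based system. The function names and sample coordinates are hypothetical and do not come from the article.

```python
from collections import Counter

CELL_SIZE_M = 250  # grid resolution mentioned in the figure caption


def grid_cell(x, y, cell_size=CELL_SIZE_M):
    """Return the lower-left corner of the grid cell containing (x, y).

    Coordinates are assumed to be in a projected, metre-based system.
    """
    return (int(x // cell_size) * cell_size, int(y // cell_size) * cell_size)


def count_per_cell(points):
    """Count how many points fall into each grid cell."""
    return Counter(grid_cell(x, y) for x, y in points)


# Made-up sample coordinates for illustration.
points = [(385010.0, 6672120.0), (385240.0, 6672249.0), (385251.0, 6672120.0)]
cells = count_per_cell(points)
```

The same cell identifiers can be used for every data source, which makes it straightforward to compare their spatial distributions cell by cell.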


Daily pattern of activity in green spaces (social media data on the left, sports tracking data on the right).

Each data source has its limitations, which need to be acknowledged in further analyses. For example, social media data are mainly produced by young adults, and the popularity of different platforms might change quickly over time. Sports tracking data are predominantly produced by men and focus on physical activities such as biking and jogging. Mobile network data are often difficult to access at fine granularity, and PPGIS data are limited in duration and extent. In all cases, user-generated data should be processed and reported in a privacy-preserving manner.

Despite evident limitations, these data might often be the best available information about the use of green spaces. Combining information from multiple user-generated data sets complements traditional data sources and provides a more comprehensive understanding of green space use and preferences.

Potential of different user-generated data sets for answering different questions related to green space use. The further towards the corner of the radar chart, the better the quality.

New data have become available during the COVID-19 pandemic

Near-real-time information about human activities in different areas became extremely relevant during the COVID-19 pandemic in spring 2020, and new data sources about people’s mobility patterns have become available for research.

The Digital Geography Lab recently analyzed changes in population distribution and mobility patterns in Finland based on aggregated and anonymized mobile network data acquired from a local operator. The local newspaper Helsingin Sanomat had access to more detailed mobile network data, which allowed comparing activity in green spaces between spring 2019 and 2020 and showed a significant increase in the number of people, for example, in national parks in the Helsinki Region.

Google and Apple have also shared previously inaccessible information about people’s mobility patterns openly online (for a limited time period). These mobility data sets show a general trend of increased activity in green spaces in the Helsinki Region, and decreased activity in transit, retail and workplaces. Data from Google and Apple contribute to understanding questions related to where, when and what, with the most accurate information about the temporal dimension.

These data sets are highly aggregated and anonymized – individual users, or even individual parks, are not visible in these mobility data sets.

Main limitations for using these data are still the same as we highlight in our paper (Heikinheimo et al. 2020): the representativeness of, access to and ethical use of user-generated content.

Overall, there is clearly an increasing need for versatile information about crowded (and silent) places in urban areas, and the various benefits of urban green spaces to people.

Read the full article (written before the covid-crisis) at:

Heikinheimo, V., Tenkanen, H., Bergroth, C., Järv, O., Hiippala, T., & Toivonen, T. (2020). Understanding the use of urban green spaces from user-generated geographic information. Landscape and Urban Planning, 201, [103845].


Congratulations Claudia, Hertta and Elias for the City of Helsinki Master’s thesis award!

The yearly awards for the best Master’s theses were again presented by the City of Helsinki on Monday, the 9th of December 2019. We at the Digital Geography Lab would like to give special congrats to three awarded GIS wizards from the Department of Geography and Geosciences, University of Helsinki: Claudia Bergroth, Hertta Sydänlammi and Elias Willberg!

Bergroth Claudia (2019): Uncovering population dynamics using mobile phone data: the case of Helsinki Metropolitan Area.

The estimated hourly distribution of people on an average weekday in the Finnish Capital Region based on network-driven cellular mobile phone data. Read more about Claudia’s work in this blog post. 

Sydänlammi Hertta (2019): Strategic districting for the mitigation of educational segregation: A pilot model for school district optimization in Helsinki.

Original and re-calculated school districts.  Read more about Hertta’s work from Helsingin Sanomat (in Finnish).

Willberg Elias (2019): Bike sharing as part of urban mobility in Helsinki: a user perspective.


All the trips (n~7200) made by Helsinki bike-sharing system bikes on Monday 15.5.2017. Routes have been modeled. Read more about Elias’ work in this blog post.

You can find more information, and the other recipients of the prize, here (in Finnish): Congratulations to all 9 recipients!

How green are the streets of Helsinki?

Overview of Akseli Toikka’s MSc thesis: Mapping the green view of Helsinki through Google street view images

Interactive webmap for the Green View Index across Helsinki

Click to browse the interactive GVI map of Helsinki.

Urban vegetation has traditionally been mapped with remote sensing methods such as laser scanning and aerial photography. However, it has been argued that a bird’s-eye view of vegetation cannot fully represent the amount of green vegetation that citizens observe at street level. Recent studies have introduced human-perspective methods, such as street view images and green view measurement, alongside the more traditional ways of mapping vegetation. The green view index (GVI) states the percentage of green vegetation in the street view at a certain location. The purpose of my thesis was to create a green view dataset of the city of Helsinki using Google Street View (GSV) imagery and to reveal the differences between the human perspective and the aerial perspective in vegetation mapping.

Toikka (2019): Downloading Google street view panoramas.

Figure 1. Summertime Google street view panoramas of Helsinki were downloaded in six horizontal images. The GVI value of a panorama is the average of these 6 images.

Street view imagery of Helsinki was downloaded from the GSV application programming interface. The spatial extent of the data was limited by the availability of street view images from the summer months. Every GSV panorama was downloaded as six images (Figure 1). The amount of vegetation in the images was calculated based on the spectral characteristics of green vegetation (Figure 2). The GVI value of each panorama is the average of the six images that make up the panorama.
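The calculation described above can be sketched in a few lines of Python. The pixel rule used here (the green channel exceeds both red and blue) is a simplifying assumption standing in for the spectral classification of the thesis, which is not specified in this post. The function names and the tiny sample images are likewise hypothetical.

```python
def is_green(pixel):
    """Hypothetical rule: the green channel exceeds both red and blue."""
    r, g, b = pixel
    return g > r and g > b


def image_gvi(pixels):
    """Percentage of pixels classified as green in one image."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    green = sum(1 for p in pixels if is_green(p))
    return 100.0 * green / len(pixels)


def panorama_gvi(images):
    """Mean GVI over the images that make up one panorama."""
    values = [image_gvi(img) for img in images]
    return sum(values) / len(values)


# Made-up 4-pixel "images" as lists of (R, G, B) tuples.
leafy = [(40, 120, 30)] * 3 + [(128, 128, 128)]  # 3 of 4 pixels green
paved = [(128, 128, 128)] * 4                    # no green pixels
panorama = [leafy, leafy, leafy, paved, paved, paved]
gvi = panorama_gvi(panorama)
```

Averaging over the six images gives every viewing direction the same weight, so a park on one side of the street and a wall on the other produce a middling value.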

Figure 2. From left to right: original, classified and overlay image. GVI was calculated based on the spectral characteristics of green surfaces. The GVI value of this street view is 43.97%.

Several green view maps of Helsinki were created based on the calculated GVI values (Figure 3). In order to understand the differences between the human perspective and the aerial view, the GVI values were compared with the regional land cover dataset of Helsinki using linear regression. Areas with big differences between the datasets were examined visually through the street view imagery. Helsinki’s green view was also compared internationally with other cities for which the same kind of data is available on the Treepedia website of the Senseable City Lab, MIT.
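For readers unfamiliar with the comparison step, a simple linear regression between a land cover variable and GVI could look like the Python sketch below. It uses ordinary least squares with a single predictor and also returns the Pearson correlation. The variable names and all numbers are illustrative assumptions and are not values from the thesis.

```python
def linear_regression(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Illustrative numbers only, not values from the thesis.
tree_cover_pct = [0.0, 10.0, 20.0, 30.0, 40.0]
gvi_pct = [5.0, 10.0, 15.0, 20.0, 25.0]
slope, intercept, r = linear_regression(tree_cover_pct, gvi_pct)
```

Locations with large residuals from such a fit are natural candidates for the kind of visual inspection mentioned above.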

Figure 3. GVI values aggregated to the YKR statistical grid. The downtown and industrial areas are easily distinguished from the rest of the city by their lower GVI values.

It appeared that the green view of Helsinki is distributed unequally across the city. The lowest green view values can be found in the downtown, industrial areas and the business centers of the suburbs. The highest values were located in the residential suburbs. Especially the older low-rise residential areas like Kuusisaari, Lehtisaari and Laajasalo stand out with relatively high GVI values. Younger residential areas like Arabianranta, Latokartano and Herttoniemenranta have relatively low GVI values because their greenery is still undeveloped.

When compared with the land cover data, it was found that the green view has a weak correlation with low vegetation and a relatively high correlation with taller vegetation such as trees. Differences between the datasets were mainly concentrated in areas where the vegetation was not visible from the street for several reasons. The main sources of error were the oldest street view images and flaws in image classification caused by other green objects and shadows.

Even though Helsinki has many parks and other green spaces, the amount of greenery visible from the streets isn’t always that high. The green view dataset created in this study helps to understand the spatial distribution of street greenery and brings the human perspective alongside more traditional ways of mapping city vegetation. When combined with previous city greenery datasets, the green view dataset can help to build a more holistic understanding of the city greenery in Helsinki.

My thesis was produced in cooperation between the Department of Geoinformatics and Cartography at the Finnish Geospatial Research Institute and the Digital Geography Lab at the University of Helsinki. In the computing, we made use of geospatial computing resources provided by CSC and the Open Geospatial Information Infrastructure for Research (oGIIR, urn:nbn:fi:research-infras-2016072513) funded by the Academy of Finland.

Text by: Akseli Toikka

Akseli’s thesis (only in Finnish) can be found here.
The data processing scripts are available at Geoportti GitHub.

Environmental dialogues: how to plan for urban biodiversity?

Earlier this week, Henna Fabritius took part in the Environmental Dialogues event to discuss biodiversity-related modelling tools in urban greenery planning.

More information, and a recording of the event is available here (in Finnish):

Ympäristödialogeja: Miten suunnitella monimuotoista luontoa?

The event was organized by the Forum for Environmental Information (Ympäristötiedon foorumi in Finnish), which is a non-profit organization that aims at increasing interaction between the producers and users of environmental information in order to support national policy making in Finland, while keeping in mind the global significance of environmental problems.

Read also Henna’s blog about getting better at supporting urban biodiversity (in Finnish):

Minä väitän: Luonnon monimuotoisuutta voitaisiin tukea kaupungeissa nykyistä enemmän



DigiGeoLab back at the office!

Most of the Digital Geography Lab researchers are back at the office after relaxing summer holidays. We are happy to welcome Age Poom to the team!