Vuokko’s lectio 9.12.2020

User-generated Geographic Information for Understanding Human Activities in Nature

Lectio Praecursoria, in the public examination of MSc Vuokko Heikinheimo’s doctoral dissertation
the 9th of December 2020


Nature contributes to human well-being in countless ways. Many of us enjoy spending time in nature, going for a walk or a picnic and observing species and seasons. Nature-based tourism and outdoor recreation are evident examples of direct benefits of nature to people.

National parks are protected areas that are dedicated to safeguarding biodiversity and providing people the opportunity to enjoy nature.

Urban green spaces include the network of parks, forests and other green areas in the urban structure. Green spaces in cities offer opportunities for contact with nature in our everyday lives while protecting urban biodiversity.

We are also willing to travel far in order to experience and enjoy nature. In many places, visitors to protected areas – both domestic and international – are a significant source of income for park management and local communities. Information about protected area visitors is important for planning and management on regional, national and international scales.

The ongoing pandemic has emphasized the importance of access to green spaces in everyday life.

In Finland and many other countries, national parks and green spaces are attracting record numbers of visitors. However, in many regions, visiting green spaces has become more restricted due to regulations, and international travel has decreased dramatically – at least for now – having an impact on global nature-based tourism.

The global protected area network on land and sea is a key instrument in conserving biological diversity and ecosystem services on Earth. Despite active efforts to protect the global environment, human activities continue to cause significant declines in nature. Understanding how nature contributes to people’s well-being is a key aspect for finding successful solutions that support the protection of biodiversity and sustainable and equitable land-use planning.

In order to understand the interactions between people and the environment on different scales, we need data.

Remote sensing data such as satellite images have revolutionized the collection of data about the natural and built environment.

But how about data on human activities?

Information about visits to protected areas can be collected, for example, using surveys and on-site counters. However, gaps remain even in places with the most advanced visitor monitoring systems and some regions lack such data completely.

More recently, data generated through the interactions of people and location-based technologies (such as mobile phones and GPS-devices) have provided new opportunities for analyzing human activities all around the world. For example, many of us share our observations, activities and even location publicly online through various social media applications. These user-generated data offer new opportunities for understanding spatial and temporal patterns of human activities.

Elements of location-based data, such as geotags, timestamps, text, and images, provide new possibilities for studying human activities in nature and can help fill some of the information gaps in this field.

This thesis links to a broader context of understanding human-nature interactions. From a holistic perspective, social and ecological systems are interlinked in complex ways. This thesis zooms into the human side of the system by looking into human activities – human spatial behavior in nature.

Social media and other user-generated data sources provide information first and foremost about people and their observations and activities. These data can also provide information about the natural environment, such as species observations and land cover information. In my thesis, I have mainly focused on people’s movements, activities and values represented in the data.

In this work, I investigated how well user-generated data reflect actual visits to nature. Digital data, in general, provide various perspectives about human-nature interactions beyond actual visits.

By definition, social media are channels for interaction and discussions in addition to platforms for sharing content. It is possible to raise awareness and to show interest in particular regions or animal species online without actually visiting nature.

In my thesis, I focus mostly on those digital data that are potentially linked to actual visits through geotags or other references to location. For example, it is relevant to ask: if someone has geotagged a photograph in a national park, did they actually visit the place?

As a concept, user-generated geographic information refers to data created through interactions of people and location-based technologies. Examples of such data include geotagged social media posts, GPS tracks from sports applications, mobile network data and other location-based data generated by regular people instead of government authorities or experts.

In the geographic information science literature, the concept of volunteered geographic information and the idea of citizens as sensors, introduced by Goodchild in 2007, are often used to refer to geographic data generated by non-professionals.

It is a good concept for describing actively contributed data sets, such as OpenStreetMap, which is an open source world map that anyone can edit, or data contributed to citizen science campaigns and applications such as iNaturalist, where everyone can share their nature observations and comment on others’ observations.

However, I would argue that the concept of volunteered data does not perfectly capture all kinds of user-generated data sets.

For example, when analyzing publicly shared content on social media, or data from mobile network operators, we are using these data for purposes other than those originally intended. From a research point of view, it is important to acknowledge that these data might not be actively volunteered for such purposes. The same applies to new data sets about human mobility that big technology companies have recently released to support analysis related to the COVID-19 pandemic.

In this doctoral dissertation, I have investigated the use of user-generated data sets for studying human-nature interactions. I focused on human activities and preferences in national parks and urban green spaces. The main objectives were to

    1. Describe and critically evaluate the available data sources and to
    2. Discover spatial and temporal patterns of human activities in nature in the study areas.

To achieve these objectives, I investigated the elements of the user-generated data through these questions:

    • Where and when are people visiting national parks and green spaces?
    • What are people doing and valuing in nature, and perhaps why do they visit specific places?
    • Who are the users who have shared their data from these areas?

This thesis consists of four articles and a summary section. The first three articles focus on social media data, and the fourth article also includes other sources of user-generated data.

    •  In Article I, I reviewed recent scientific literature using social media data in nature conservation research. The article provides an overview of relevant data sources and analysis methods that could further contribute to using social media data in conservation science.
    •  In Article II, I compared social media data and visitor survey data from Pallas-Yllästunturi National Park. We observed similar trends in both data sources regarding popular activities and visited locations.
    •  Article III focused on understanding who the users are. I compared several methods for identifying countries of residence of visitors to the Kruger National Park in South Africa, and evaluated the performance of these methods.
    • Article IV compares social media data, sports application data, mobile phone data and participatory geographic information for investigating the use of urban green spaces.

Extracting meaningful information from user-generated data sets is an iterative process.

In this thesis, I used spatial and temporal analysis methods and visual and textual content analysis approaches in addition to thorough data exploration.
During the analysis workflow, there are several important questions to consider, such as:

    •  Which data sources are best fit for the purpose?
    • What data to collect and store?
    • Does the data contain personal and sensitive information?
    • How to identify irrelevant data?
    • What type of analysis approaches, tools and skills are needed?
    • How to best visualize the results?
    • Can the observed patterns be validated?

So, where and when are people visiting national parks and green spaces? And how can we analyze these patterns based on user-generated data?

    • Results in my thesis show that visitor flows and hot-spots are visible in social media and other user-generated data sets on different scales.
    • These data work best in popular destinations, and a lack of data does not necessarily mean the absence of visitors.
    • The investigated data sets contain unique information about temporal trends of human activities. For example, these data can fill information gaps in between less frequent surveys.
    • Aggregating data over space and time not only preserves privacy but also often reveals regional and periodical trends even if the original data are sporadic. For example, relative changes in posting activity can reveal interesting and even surprising patterns.
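
The aggregation mentioned in the last point can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow used in the thesis: each post is assumed to be reduced to a (user, date, latitude, longitude) record, and the function names, the 0.1-degree grid and the sample records are all hypothetical.

```python
from collections import defaultdict
from datetime import date
import math


def grid_cell(lat, lon, cell_size=0.1):
    """Map a coordinate to the index of the grid cell that contains it."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def monthly_user_counts(posts, cell_size=0.1):
    """Count unique users per (grid cell, year, month).

    Counting users rather than posts keeps one very active user
    from dominating a cell.
    """
    users = defaultdict(set)
    for user_id, day, lat, lon in posts:
        key = (grid_cell(lat, lon, cell_size), day.year, day.month)
        users[key].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


def relative_change(current, baseline):
    """Relative change from baseline to current; None if the baseline is zero."""
    if baseline == 0:
        return None
    return (current - baseline) / baseline


# Hypothetical sample records: (user, date, latitude, longitude)
posts = [
    ("a", date(2019, 7, 1), 68.07, 24.03),
    ("a", date(2019, 7, 2), 68.08, 24.04),  # same user, same cell, same month
    ("b", date(2019, 7, 5), 68.07, 24.03),
    ("a", date(2020, 7, 1), 68.07, 24.03),
    ("b", date(2020, 7, 3), 68.08, 24.04),
    ("c", date(2020, 7, 9), 68.07, 24.03),
]

counts = monthly_user_counts(posts)
cell = grid_cell(68.07, 24.03)
change = relative_change(counts[(cell, 2020, 7)], counts[(cell, 2019, 7)])
print(change)
```

Comparing the same month across years, as in this sketch, is one simple way to look at relative changes in posting activity without exposing individual posts.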

Content analysis of social media texts and images allows understanding activities and preferences from a new perspective.

    • Manual content analysis in this thesis found that most content shared from national parks and green spaces contained relevant information.
    • Activities detected from social media content reflected surveyed activities.
    • This thesis also emphasizes the potential of automated content analysis approaches for acquiring further insights from large quantities of textual and visual content.

Finally, understanding who has shared their data from national parks and green spaces is important but challenging.

    • In this work, I identified approaches for deriving information about the users through further spatial and temporal analysis as well as content analysis.
    • For example, information about users’ places of residence can be decisive for the meaningfulness of the entire analysis.
    • Data comparisons from national parks highlight that social media does not represent all visitor groups.

Main challenges related to using user-generated geographic information include data quality, limited access to data and privacy issues.

    • Limited access to data affects the repeatability of the analysis and the execution of longitudinal research projects.
    • Privacy issues cannot be neglected when analyzing data where individual people can be identified.
    • Minimizing the amount of data analyzed and aggregating the results (for example, over time and to larger geographical units) help to use these data in a privacy-preserving way.
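
One possible reading of the last point, as a rough Python sketch: keep only the fields the analysis needs, replace user identifiers with pseudonyms, and report only counts that cover enough distinct users. The record fields, the salt handling and the threshold of five users are hypothetical choices for illustration and do not come from the thesis.

```python
import hashlib
from collections import defaultdict


def pseudonymize(user_id, salt):
    """Replace a user identifier with a salted SHA-256 digest.

    This is pseudonymization, not anonymization: the same user
    still maps to the same digest within one analysis.
    """
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()


def minimize(record, salt):
    """Keep only what the analysis needs; drop text, images and exact coordinates."""
    return {
        "user": pseudonymize(record["user_id"], salt),
        "month": record["timestamp"][:7],  # "YYYY-MM" from an ISO 8601 string
        "area": record["area"],            # e.g. a park name instead of a point
    }


def suppressed_counts(records, min_users=5):
    """Unique users per (area, month), omitting groups below min_users."""
    users = defaultdict(set)
    for r in records:
        users[(r["area"], r["month"])].add(r["user"])
    return {key: len(ids) for key, ids in users.items() if len(ids) >= min_users}


# Hypothetical raw records: five users in one park, two in another
raw = [
    {"user_id": f"u{i}", "timestamp": "2019-07-10T10:00:00",
     "area": "Park A", "text": "example", "lat": 60.2, "lon": 24.9}
    for i in range(5)
] + [
    {"user_id": f"v{i}", "timestamp": "2019-07-11T12:00:00",
     "area": "Park B", "text": "example", "lat": 60.3, "lon": 25.0}
    for i in range(2)
]

minimal = [minimize(r, salt="example-salt") for r in raw]
print(suppressed_counts(minimal))
```

The threshold trades detail for privacy: a higher value hides more of the sparsely visited areas, which is also where such data are least reliable.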

Overall, researchers, practitioners and decision-makers need to consider several limitations when aiming to extract meaningful information from user-generated data sources.

Different sources of user-generated geographic data complement each other in answering questions related to where, when, what, why and who.

In this thesis, I propose that social media mostly captures being in nature, covering leisure time activities, while sports tracking data and mobile phone data reflect moving through parks and green spaces (including daily commuting patterns and sports activities). Participatory geographic information (such as map-based surveys) can be used to acquire more in-depth information related to values and preferences.

Social media and other user-generated sources of geographic information are not able to fill in all information gaps related to understanding human activities in nature. However, if we understand the limitations of these data, they offer additional layers of geographic information to complement existing data sources and may inform further data collection efforts in less monitored areas.

The data used in this thesis represent the time before the coronavirus pandemic, but the presented approaches and outlined limitations continue to be relevant for analyzing the ongoing changes in visits to national parks and green spaces.

This thesis contributes to understanding how to extract meaningful information about human activities in nature from the huge volumes of available data. User-generated data sets might be messy and unpredictable, making their analysis challenging. At the same time, one of the strengths of these new data sources is the potential to discover emerging patterns and trends efficiently.

Even imperfect data, if understood as such, can contribute to integrating the value of nature into decision-making processes on different scales.

Data comparisons from national parks and green areas presented in this thesis provide insights about the properties of user-generated geographic information that are also relevant to other fields of research.

Despite many limitations, there is a lot of potential to extract meaningful knowledge based on user-generated data sets for the benefit of people and the environment.

Vuokko Heikinheimo, MSc, defended her doctoral thesis entitled ‘User-Generated Geographic Information for Understanding Human Activities in Nature’ on 9 December 2020 at 10.00 at the Faculty of Science, University of Helsinki.

Professor Catherine Pickering from Griffith University, Australia, served as the opponent and Professor Tuuli Toivonen as the custos.

Vuokko completed her doctoral thesis in the Digital Geography Lab research group under the Social Media Data for Conservation Science project, which has received funding from the Kone Foundation.

The Custos (Professor Tuuli Toivonen) and the Doctoral Candidate (MSc Vuokko Heikinheimo) in front of the Athena building at Siltavuorenpenger on the 9th of December 2020.