Sports activity research with Twitter data

Why study sports activities?

Being physically active affects us positively in many ways: it helps prevent many diseases and supports our mental wellbeing. As sedentary lifestyles become more common, the effects of physical inactivity are becoming more pronounced at both the individual and societal levels. Rates of obesity are on the rise, and the cost to society is seen in growing health service bills, longer sick leaves and lost productivity (Lundqvist et al., 2018; Vasankari et al., 2018).

Need for spatial data

Despite the importance of the topic, spatial information about physical activities is quite limited. Most studies rely on people self-reporting their activities and lack a spatial aspect (Sterdt et al., 2014). As there are no official statistics about people’s physical activities, user-generated data, such as social media posts, can serve as a proxy. The number of social media posts from national parks has been shown to follow the same trends as official visitor statistics (Heikinheimo et al., 2017; Tenkanen et al., 2017). Similarly, I will try to approximate sports activities in different parts of the Helsinki Metropolitan Area by analysing social media data from Twitter.

Increasing popularity of social media data

The use of social media data in research has become increasingly popular over the last decade. Researchers have used social media data to study, for example, the movement of tourists, the impact of natural disasters and public discussion about diseases (Middleton et al., 2018; Scholz & Jeznik, 2020; Viguria et al., 2020). Some large social media platforms, such as Facebook and Instagram, have stopped sharing their data, but Twitter has kept its data available to researchers, which is why I am using it in my MSc thesis. Twitter is a free microblogging platform where people can share short messages, links and media content with their followers. From a Twitter database collected by the Digital Geography Lab, I will extract all sports-related tweets and examine how they are distributed across the Helsinki area.

Geoparsing produces more spatial data

On social media platforms, you can geotag a post, which means attaching location information, such as coordinates, to it. Only around 1% of tweets are geotagged, but many tweets mention place names in the text (Lee et al., 2013; MacEachren et al., 2011). I will use Natural Language Processing to analyse the text, extract the place names and convert them to coordinates. This process is called geoparsing, and it gives me more geographical data to work with. After attaching location information to sports-related tweets where applicable, I will look for spatial patterns in the data. My final results will shed light on questions such as:

  • Where are the hotspots and cold spots of sports-related tweets in the Helsinki Metropolitan Area?
  • Does the number of sports facilities in a neighbourhood affect tweeting activity?
  • Do the socio-economic and educational indicators of an area affect tweeting activity?

And last but not least:

  • Is Twitter data a suitable indicator of sports activities?
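
As a rough illustration of the geoparsing step described above, the Python sketch below filters sports-related tweets and looks up place names mentioned in them. It is a simplified stand-in, not the thesis method: instead of a trained Natural Language Processing model it uses plain keyword and name matching, and the gazetteer, the keyword list, the example tweets and the function names are all hypothetical. The coordinates are approximate and only for illustration.

```python
import re

# Hypothetical mini-gazetteer: place name -> approximate (lat, lon).
# A real workflow would rely on a full gazetteer or geocoding service.
GAZETTEER = {
    "Töölönlahti": (60.18, 24.93),
    "Kumpula": (60.21, 24.96),
    "Otaniemi": (60.19, 24.83),
}

# Hypothetical keyword list used to decide whether a tweet is about sports.
SPORT_KEYWORDS = {"run", "running", "jogging", "swim", "football", "gym"}


def is_sports_related(text):
    """Return True if the tweet contains at least one sports keyword."""
    words = set(re.findall(r"\w+", text.lower()))
    return bool(words & SPORT_KEYWORDS)


def geoparse(text):
    """Return a list of (place name, (lat, lon)) for gazetteer names in the text."""
    matches = []
    for name, coords in GAZETTEER.items():
        # Whole-word, case-insensitive match of the place name.
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            matches.append((name, coords))
    return matches


# Made-up example tweets.
tweets = [
    "Morning jogging around Töölönlahti, perfect weather!",
    "Great coffee in Kumpula today",
    "Football practice tonight",
]

# Keep only sports-related tweets and attach any locations found in them.
located = [(tweet, geoparse(tweet)) for tweet in tweets if is_sports_related(tweet)]

for tweet, places in located:
    print(tweet, "->", places)
```

Exact name matching like this misses inflected Finnish forms such as "Kumpulassa" and cannot tell apart places that share a name, which is why the actual analysis needs proper language processing and location disambiguation.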


Sonja Koivisto, University of Helsinki


Heikinheimo, V., Di Minin, E., Tenkanen, H., Hausmann, A., Erkkonen, J., & Toivonen, T. (2017). User-generated geographic information for visitor monitoring in a national park: A comparison of social media data and visitor survey. ISPRS International Journal of Geo-Information, 6(3).

Lee, K., Ganti, R., Srivatsa, M., & Mohapatra, P. (2013). Spatio-temporal provenance: Identifying location information from unstructured text. 2013 IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2013, March, 499–504.

Lundqvist, A., Männistö, S., Jousilahti, P., Kaartinen, N., Mäki, P., & Borodulin, K. (2018). Terveys, toimintakyky ja hyvinvointi Suomessa – FinTerveys 2017 -tutkimus. Terveyden ja hyvinvoinnin laitos (THL), Raportti 4/2018. In P. Koponen, K. Borodulin, A. Lundqvist, K. Sääksjärvi, & S. Koskinen (Eds.) (pp. 38–41). Terveyden ja hyvinvoinnin laitos.

MacEachren, A. M., Jaiswal, A., Robinson, A. C., Pezanowski, S., Savelyev, A., Mitra, P., Zhang, X., & Blanford, J. (2011). SensePlace2: GeoTwitter analytics support for situational awareness. VAST 2011 – IEEE Conference on Visual Analytics Science and Technology 2011, Proceedings, October, 181–190.

Middleton, S. E., Kordopatis-Zilos, G., Papadopoulos, S., & Kompatsiaris, Y. (2018). Location extraction from social media: Geoparsing, location disambiguation, and geotagging. ACM Transactions on Information Systems, 36(4).

Scholz, J., & Jeznik, J. (2020). Evaluating Geo-Tagged Twitter Data to Analyze Tourist Flows in Styria, Austria. ISPRS International Journal of Geo-Information, 9(11), 681.

Sterdt, E., Liersch, S., & Walter, U. (2014). Correlates of physical activity of children and adolescents: A systematic review of reviews. Health Education Journal, 73(1), 72–89.

Tenkanen, H., Di Minin, E., Heikinheimo, V., Hausmann, A., Herbst, M., Kajala, L., & Toivonen, T. (2017). Instagram, Flickr, or Twitter: Assessing the usability of social media data for visitor monitoring in protected areas. Scientific Reports, 7(1), 1–11.

Vasankari, T., Kolu, P., Kari, J., Pehkonen, J., Havas, E., Tammelin, T., Jalava, J., Koski, H., Pihlainen, K., Kyröläinen, H., Santtila, M., Sievänen, H., Raitanen, J., & Kari, T. (2018). Costs of physical activity are increasing – the societal costs of physical inactivity and poor physical fitness.

Viguria, I., Alvarez-Mon, M. A., Llavero-Valero, M., del Barco, A. A., Ortuño, F., & Alvarez-Mon, M. (2020). Eating disorder awareness campaigns: Thematic and quantitative analysis using Twitter. Journal of Medical Internet Research, 22(7), 1–11.