Authors: Bryan Vallejo, Olle Järv
We developed the OptiSS tool to optimize geodetic spatial joining for assigning geographical attributes to social media data in the BORDERSPACE project at the Digital Geography Lab. The tool comes with a user-friendly local app, and its Python script can also be used directly in your own workflow.
Why did we develop the tool?
In the BORDERSPACE project, we need to assign hierarchical spatial attributes (municipality, region, country) to each geo-located tweet. Most geo-located tweets obtained from Twitter’s API already carry geographical information such as an administrative unit and a country, in addition to exact coordinates. Yet not all tweets have such information and, most importantly, some tweets are not located on land: they are just off the coast or somewhere at sea (Figure 1). Such tweets can only be assigned to the closest administrative unit through a geodetic spatial join, which requires considerable computational resources and is time-consuming, especially when we have 100+ million geo-located tweets to handle. Thus, we created the OptiSS tool to make the computation more efficient. The tool works for any social media data that include at least geographical coordinates.
Figure 1. The OptiSS tool efficiently assigns geographical attributes such as municipality or country to social media posts. This is particularly useful when some posts are located off the coast rather than on land (highlighted in red circles).
How efficient is the tool?
We reduce the computational resources needed for a geodetic spatial join by dividing the calculation into two parts: 1) a spatial join of social media posts that are located on land; and 2) a spatial join of social media posts that are located off the coast. Only for the latter does the tool calculate a geodetic spatial join to the closest point on the desired spatial layer (the local app uses the Global Administrative Areas layer). Finally, it combines the results of both parts. This way, the OptiSS tool enables faster geodetic spatial joining than the ArcGIS Pro console or Geodatabase.
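The two-part split described above can be illustrated with a minimal sketch. This is not the OptiSS code itself: the function names, the toy square "regions", and the sample posts are all hypothetical, and two simplifications are assumed. Great-circle (haversine) distance stands in for a true geodetic distance, and the closest polygon vertex stands in for the closest point on the boundary.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km (a stand-in for a true geodetic distance)."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def point_in_polygon(lon, lat, ring):
    """Planar ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) cross the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def assign_regions(posts, regions):
    """Return {post_id: (region_name, method)} using the two-part strategy."""
    assigned, offshore = {}, []

    # Part 1: cheap containment test for posts that fall inside a region.
    for post_id, lon, lat in posts:
        for name, ring in regions.items():
            if point_in_polygon(lon, lat, ring):
                assigned[post_id] = (name, "within")
                break
        else:
            offshore.append((post_id, lon, lat))

    # Part 2: costly nearest-region search, only for the leftover posts.
    for post_id, lon, lat in offshore:
        nearest = min(
            regions,
            key=lambda name: min(haversine_km(lon, lat, x, y) for x, y in regions[name]),
        )
        assigned[post_id] = (nearest, "nearest")

    return assigned


# Hypothetical toy data: two square "countries" and four posts.
regions = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(5, 0), (7, 0), (7, 2), (5, 2)],
}
posts = [("p1", 1, 1), ("p2", 6, 1), ("p3", 4.5, 1), ("p4", 2.3, 1)]
result = assign_regions(posts, regions)
```

In this sketch only `p3` and `p4`, which fall outside both squares, reach the expensive distance step. The saving comes from the same idea at scale: the large majority of posts on land never enter the nearest-feature search.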
To give an example, we tested the OptiSS tool on a global dataset of geo-located tweets that includes 50,244 posts from 500 unique users over a 9-year period (see the dataset here). The OptiSS tool reduced the calculation time by over 60% compared to the time needed by the ArcGIS Pro console and Geodatabase (Figure 2).
Figure 2. Comparison of the time needed to process a geodetic spatial join for our dataset (a sample of tweets, n=50,244) between the ArcGIS Pro console, Geodatabase and OptiSS.
How does the tool work?
More detailed instructions for both using the local app and applying the Python script in your workflow can be found on GitHub.
As an output, the tool delivers an enriched dataset containing spatial attributes for each geo-located tweet (Figure 3).
Figure 3. An example of the output received from the local app of the OptiSS tool. The color indicates the country to which each Twitter post is assigned.
The BORDERSPACE project focuses on studying cross-border mobilities and interactions, transnational people, and functional transnational spaces. The novelty of the project stems from the use of new big data sources to provide valuable insights for cross-border research and practice. The project is carried out at the Digital Geography Lab, an interdisciplinary research team at the University of Helsinki focusing on spatial Big Data analytics for fair and sustainable societies.