Author: Tatu Leppämäki
In a nutshell: A geoparser recognizes place names and locates them in a coordinate space. I explored this topic in my thesis and developed an open source geoparser for Finnish texts: find it in this GitHub repo.
As geographers, we are interested in the spatial aspects of data: where something is located is a prerequisite to the follow-up questions of whys and hows. Of the almost innumerable data sources available online – news articles, social media feeds, digital libraries – a good portion are wholly or partly text-based. Texts and the opinions and sentiments within are often related to space through toponyms (place names). For us humans, it’s very easy to understand a sentence like “I’m enjoying currywurst in Alexanderplatz, Berlin” and the spatial reference there, but geographical information systems process data in unambiguous coordinates. To bridge this gap between linguistic and geospatial information, the text must be analyzed and transformed: in other words, it must be parsed. This is the motivation for the development of geoparsers.
Geoparsing: what and why
Geoparsing can be divided into two sub-tasks: toponym recognition and toponym resolution. The former finds the toponyms in running text; the latter assigns each recognized toponym to its correct location in a coordinate space. A geoparser wraps both steps and outputs structured geodata.
Geoparsing: a top-level view.
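To make the two sub-tasks concrete, here is a minimal sketch in Python. It is not the geoparser developed in the thesis: the tiny gazetteer, its approximate coordinates, and the function names (`recognize_toponyms`, `resolve_toponym`, `geoparse`) are all assumptions made up for illustration. Recognition is done by plain string matching, where a real geoparser would more likely use a trained language model, and resolution is a simple dictionary lookup with no disambiguation.

```python
from dataclasses import dataclass

# Hypothetical toy gazetteer: toponym -> approximate (latitude, longitude), WGS84.
GAZETTEER = {
    "Alexanderplatz": (52.5219, 13.4132),
    "Berlin": (52.5200, 13.4050),
    "Helsinki": (60.1699, 24.9384),
}


@dataclass
class Toponym:
    name: str
    start: int  # character offset where the toponym begins
    end: int    # character offset just past the toponym


def recognize_toponyms(text):
    """Sub-task 1: find toponyms in the text (here: naive string matching)."""
    found = []
    for name in GAZETTEER:
        start = text.find(name)
        while start != -1:
            found.append(Toponym(name, start, start + len(name)))
            start = text.find(name, start + 1)
    # Return the matches in the order they appear in the text.
    return sorted(found, key=lambda t: t.start)


def resolve_toponym(toponym):
    """Sub-task 2: locate a recognized toponym (here: gazetteer lookup)."""
    return GAZETTEER.get(toponym.name)


def geoparse(text):
    """Wrap both sub-tasks and return structured geodata as a list of dicts."""
    results = []
    for toponym in recognize_toponyms(text):
        coords = resolve_toponym(toponym)
        if coords is None:
            continue  # recognized but not resolvable: skip it
        lat, lon = coords
        results.append({
            "name": toponym.name,
            "start": toponym.start,
            "end": toponym.end,
            "lat": lat,
            "lon": lon,
        })
    return results


if __name__ == "__main__":
    for row in geoparse("I'm enjoying currywurst in Alexanderplatz, Berlin"):
        print(row)
```

Keeping recognition and resolution as separate functions mirrors the division above, so either step could be swapped for something more capable without touching the other.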