Geocoding for texts with fine-grain toponyms


Authors: Ludovic Moncla, Walter Renteria-Agualimpia, Javier Nogueras-Iso, and Mauro Gaio
Abstract: Geoparsing and geocoding are two essential middleware services that support end-user applications such as location-aware search and other location-based services. This work proposes a method for building a processing chain that supports the geoparsing and geocoding of text documents describing events strongly linked with space and making frequent use of fine-grain toponyms. The geoparsing part is a Natural Language Processing approach that combines part-of-speech information with syntactico-semantic patterns (a cascade of transducers). However, the real novelty of this work lies in the geocoding method. The geocoding algorithm is unsupervised and takes advantage of clustering techniques to disambiguate the toponyms found in gazetteers and, at the same time, to estimate the spatial footprint of the fine-grain toponyms not found in gazetteers. The feasibility of the proposal has been tested with a corpus of hiking descriptions in French, Spanish and Italian.
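
The idea of using spatial proximity to disambiguate gazetteer candidates can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a simple density heuristic (each candidate is scored by how many other toponyms have a candidate within a fixed radius), and it approximates the footprint of toponyms missing from the gazetteer by the bounding box of the resolved ones. The function names, the 20 km radius, and the toponym names and coordinates are all hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def disambiguate(candidates, radius_km=20.0):
    """For each toponym, keep the candidate that has the most other
    toponyms with at least one candidate within radius_km (assumed heuristic)."""
    chosen = {}
    for name, points in candidates.items():
        def support(p):
            return sum(
                any(haversine_km(p, q) <= radius_km for q in other_points)
                for other, other_points in candidates.items()
                if other != name
            )
        chosen[name] = max(points, key=support)
    return chosen

def bounding_box(points):
    """Return (min_lat, min_lon, max_lat, max_lon) for a list of points."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (min(lats), min(lons), max(lats), max(lons))

# Hypothetical gazetteer output: two toponyms are ambiguous, one is not.
candidates = {
    "Village A": [(45.90, 6.10), (43.60, 1.40)],
    "Col B": [(45.95, 6.15)],
    "Lac C": [(45.92, 6.20), (48.80, 2.30)],
}

resolved = disambiguate(candidates)
# Rough footprint for a fine-grain toponym absent from the gazetteer,
# assumed here to lie within the extent of the resolved toponyms.
footprint = bounding_box(list(resolved.values()))
```

In this toy input, the candidates that lie close to each other are preferred over the isolated ones, and the resulting bounding box gives a coarse search area for any toponym mentioned in the same text but absent from the gazetteer.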