Talks and presentations

From Text to Graph: Mapping Geographic Knowledge in the Encyclopédie and Beyond

June 04, 2026

Invited talk, Computational Approaches to Eighteenth-Century Studies Symposium, ARTFL Chicago, Chicago, IL, USA

Talk: “From Text to Graph: Mapping Geographic Knowledge in the Encyclopédie and Beyond” at the Computational Approaches to Eighteenth-Century Studies Symposium organized by ARTFL at the University of Chicago. This talk will explore the use of computational methods to extract geographic knowledge from the Encyclopédie and structure it as a knowledge graph.

Automatic Construction of a Geo-Historical Knowledge Graph from Early Modern Encyclopedic Texts

April 02, 2026

Talk, Fourth International Workshop on Geographic Information Extraction from Texts (GeoExT), Delft, Netherlands

Authors: Bin Yang, Ludovic Moncla, Fabien Duchateau, Frédérique Laforest
Abstract: Early modern encyclopedias, such as Diderot and d’Alembert’s Encyclopédie (1751–1772), offer a valuable resource for studying the evolution of geographical knowledge, yet their sheer scale complicates manual analysis. This paper presents an automated method for constructing a geo-historical knowledge graph from these texts. We propose spatial and provenance ontologies tailored to the corpus and introduce a gold standard of 2,750 geographical articles. The pipeline combines supervised learning and Large Language Models (LLMs) for article classification, entity typing, and spatial relation extraction. Performance reaches F1 = 92% for relations and F1 > 97% for classification, resulting in an RDF graph of 35,000 entities and 46,000 relations. This work paves the way for the computational analysis of early geographical knowledge. Data, models, and code are available on Hugging Face and GitLab.

Segmentation de corpus lexicographiques numérisés à l’aide de LLMs : étude du Dictionnaire Universel François-Latin et de La Grande Encyclopédie

November 06, 2025

Invited talk, Journée IXXI 2025, ENS Lyon, Lyon, France

Invited presentation at the Journée IXXI 2025. The full programme of the day is available here: https://www.ixxi.fr/evenements/journee-ixxi-2025

L’usage de l’IA pour une étude interdisciplinaire de la géographie dans l’Encyclopédie de Diderot et d’Alembert

October 16, 2025

Invited talk, Atelier Pratiques d’intelligence artificielle appliquées aux données spatiales et géographiques, RnMSH, Paris, France

Invited presentation at the workshop “Pratiques d’intelligence artificielle appliquées aux données spatiales et géographiques” (artificial intelligence practices applied to spatial and geographic data), organized by the RnMSH (Réseau national des Maisons des Sciences sociales et des Humanités).

Evaluation of Transformer Models (from BERT to GPT) for Geographic Information Recognition

December 11, 2024

Invited talk, Final conference of the Digital Humanities and Artificial Intelligence Thematic Semester, Observatoire de Paris, Paris, France

Invited talk at the final conference of the Digital Humanities and Artificial Intelligence Thematic Semester supported by the CNRS center of Artificial intelligence for science, science for artificial intelligence (AISSAI): https://semtemiahn.hypotheses.org/final-conference

Évaluation des Grands Modèles de Langage pour la Reconnaissance d’Entités Nommées

November 26, 2024

Talk, Un chat à la fac de lettres?, ENS Lyon, Aubervilliers, France

Talk at the conference “Un chat à la fac de lettres?” organized by the Huma-Num ARIANE consortium: https://csthn-ariane.sciencesconf.org

Evaluating Named Entity Recognition Using Few-Shot Prompting with Large Language Models

November 14, 2024

Talk, A Conversation between AI and the Humanities, ENS Lyon, Lyon, France

Talk at the conference “A Conversation between AI and the Humanities”: https://caih.sciencesconf.org

From BERT Fine-Tuning to LLM Prompting

September 10, 2024

Invited talk, GeoLiaison PHC Project Workshop 1, Maison de la Recherche, Paris, France

Invited talk at the first workshop of the GeoLiaison PHC project organized by Davide Buscaldi (LIPN) and Jochen L. Leidner (Coburg U.): https://sites.google.com/view/geoliaison

A digital exploration of geographic knowledge in Diderot and d’Alembert’s Encyclopédie

July 05, 2024

Talk, 30th International Conference on the History of Cartography (ICHC), Lyon, France

Authors: Thierry Joliveau, Ludovic Moncla, Antoine Taroni, Denis Vigier and Katie McDonough
How was geography communicated in Diderot and d’Alembert’s Encyclopédie (1751-72)? In this presentation, the interdisciplinary GEODE team will investigate the role of geographical knowledge within this encyclopedia, as part of our larger project to study these themes across French encyclopedias from the eighteenth to the twenty-first centuries. The Encyclopédie consists of 17 volumes of text (~74k articles, or 22M words) and 11 volumes of plates. And yet, the latter contain no maps. Apart from some pages (vol. 5 of plates) related to the “construction of globes”, all the geography of the Encyclopédie is contained in the volumes of text. Addressing critics of the approach to the selection of knowledge shared throughout the work in his “Encyclopédie” article (vol. 5 of text), Diderot argues that an Encyclopedia might be seen as “dry”, but that its role is to share geographical knowledge of places that is “scientific”. It should be able to be used to “create good maps”. We take up Diderot’s call, using information retrieval and spatial analysis to create a dataset of all place names in the Encyclopédie, identify historical spatial coordinates as reported in the text, and connect all named places to modern coordinates through entity linking. The resulting dataset allows us to map the Encyclopédie. We explore the spatial coverage of the text, including the outsized representation of certain parts of the world, like France. In addition to this explicit geospatial approach to the data, we use network analysis to explore references to places across articles and volumes. Using such a variety of methods, for the first time, we name, define, classify, locate, and map places in this key Enlightenment text.

Propositions pour une étude interdisciplinaire de la géographie dans un dictionnaire universel et une encyclopédie du XVIIIe siècle

June 06, 2024

Talk, 1er colloque du réseau METALEX, Cergy, France

Authors: Ludovic Moncla, Denis Vigier, Thierry Joliveau
Designed from an interdisciplinary perspective, this talk highlights how technical advances in digital humanities, natural language processing, and geographic information systems improve our understanding of eighteenth-century encyclopedic discourse on geography.

Spring Data/Culture Workshop: Search inside maps with MapReader

May 01, 2024

Workshop, The Alan Turing Institute, London, UK

MapReader, which received the 2023 Roy Rosenzweig Prize for Innovation in Digital History from the American Historical Association, is a software library designed for humanities research with large digitised map collections. It was developed on the recently concluded Living with Machines project, but it has been created with the wider community of historians in mind as future users.

GeoEDdA: A Gold Standard Dataset for Geo-semantic Annotation of Diderot & d’Alembert’s Encyclopédie

March 24, 2024

Talk, Second International Workshop on Geographic Information Extraction from Texts (GeoExT), Glasgow, Scotland, UK

Authors: Ludovic Moncla, Denis Vigier, Katherine McDonough
Abstract: This paper describes the methodology for creating GeoEDdA, a gold standard dataset of geo-semantic annotations from entries in Diderot and d’Alembert’s eighteenth-century Encyclopédie. Aiming to explore spatial information beyond toponyms identified with the commonly used Named Entity Recognition (NER) task, we test the newer span categorization task as an approach for retrieving complex references to places, generic spatial terms, other entities, and relations. We test an active learning method, using the Prodigy web-based tool to iteratively train a machine learning span categorization model. The resulting dataset includes labeled spans from 2,200 paragraphs. As a preliminary experiment, a custom spaCy spancat model demonstrates strong overall performance, achieving an F-score of 86.42%. Evaluations for each span category reveal strengths in recognizing spatial entities and persons (including nominal entities, named entities and nested entities).

PhD jury member

January 24, 2024

PhD jury member, INRIA, Sophia-Antipolis, France

Jury member for Lucie Cadorel’s Ph.D. Defense at INRIA Sophia-Antipolis, France.

Atelier HNS MAGIS 2023

November 24, 2023

Workshop organization, Journées MAGIS 2023, Bordeaux, France

Workshop on Spatial Digital Humanities (Atelier Humanités Numériques Spatialisées), co-organized with Carmen Brando during the Journées MAGIS 2023 in Bordeaux.

7th ACM SIGSPATIAL International Workshop on Geospatial Humanities

November 13, 2023

Workshop organization, 7th ACM SIGSPATIAL International Workshop on Geospatial Humanities, Hamburg, Germany

Following the success of previous editions, this workshop is concerned with the use of geographic information systems and other spatial technologies in humanities research, placing an emphasis on new methodologies that leverage the aforementioned technical developments. The standard tools from geographic information systems, as well as more advanced methods such as text- and image-based geographical analysis or spatial simulation, can all benefit from innovative approaches leveraging machine learning, parallel and/or distributed computation, semantic technologies, etc. on humanities sources like archival manuscripts, maps, encyclopedias, newspapers, correspondence collections and more. These kinds of documents pose new challenges for identifying and analyzing spatial information. The workshop aims to bring together researchers and practitioners from different sub-fields of computer science and the geographical information sciences interested in the application of spatial methods and technology to the humanities to discuss how to address these issues in ways that generate new knowledge in multiple disciplines. Participants will demonstrate their contributions and explore how modern GIS and other technologies can inform, and be inspired by, the digital humanities.
Organized by Ludovic Moncla, Bruno Martins, Katherine McDonough, and Xuke Hu

Perdido: Python library for geoparsing and geocoding French texts

April 02, 2023

Talk, First International Workshop on Geographic Information Extraction from Texts (GeoExT), Dublin, Ireland

Authors: Ludovic Moncla, Mauro Gaio
Abstract: This paper introduces the Perdido Python library for geoparsing and geocoding French texts. The architecture of the Perdido Geoparser, which includes three layers: back-office, API, and Python library, is outlined. We also provide details on the methods used in the development of the processing chain and the various tasks covered, such as named entity recognition and classification (NERC), and toponym resolution. Lastly, we showcase the different features of the Python library and explain how to use it. The library is built as an overlay using API services, enabling users to manipulate, visualize, and export the results of geoparsing and geocoding. A Jupyter notebook is also provided to demonstrate all the functionalities implemented in the library.

Un projet cartographique pour l’Encyclopédie

March 16, 2023

Talk, Observatoire de Paris, Paris, France

Séminaire d’histoire des sciences astronomiques

Presentation of the work carried out in the GEODE project on the processing of named entities and geographic coordinates.
Video of the seminar: https://syrteplay.obspm.fr/w/av4sK33GmPgamWVQTd7Mko

Vers une cartographie de l’Encyclopédie de Diderot et d’Alembert

March 16, 2023

Talk, ObTIC, Institut d'études avancées de Paris, Paris, France

La littérature au prisme des humanités numériques

Study day organized by Glenn Roe and Motasem Alrahabi (ObTIC - Sorbonne Université) on March 16, 2023 at the Institut d’études avancées de Paris.
Presentation of the work carried out in the GEODE project.

6th ACM SIGSPATIAL International Workshop on Geospatial Humanities

November 02, 2022

Workshop organization, 6th ACM SIGSPATIAL International Workshop on Geospatial Humanities, Seattle, WA, USA

Following the success of previous editions, this workshop is concerned with the use of geographic information systems and other spatial technologies in humanities research, placing an emphasis on new methodologies that leverage the aforementioned technical developments. The standard tools from geographic information systems, as well as more advanced methods such as text- and image-based geographical analysis or spatial simulation, can all benefit from innovative approaches leveraging machine learning, parallel and/or distributed computation, semantic technologies, etc. on humanities sources like archival manuscripts, maps, encyclopedias, newspapers, correspondence collections and more. These kinds of documents pose new challenges for identifying and analyzing spatial information. The workshop aims to bring together researchers and practitioners from different sub-fields of computer science and the geographical information sciences interested in the application of spatial methods and technology to the humanities to discuss how to address these issues in ways that generate new knowledge in multiple disciplines. Participants will demonstrate their contributions and explore how modern GIS and other technologies can inform, and be inspired by, the digital humanities.
Organized by Ludovic Moncla, Bruno Martins, and Katherine McDonough

Formation ANF TDM CNRS 2022

October 05, 2022

Talk, CNRS, Paris, France

Workshop on Python libraries and web services for named entity recognition and toponym resolution, organized as part of the CNRS ANF TDM 2022 training course (Exploration documentaire et extraction d’information).

The training material is available here: https://gitlab.liris.cnrs.fr/lmoncla/tutoriel-anf-tdm-2022-python-geoparsing

Presentation:
This workshop presents the use of Python libraries (e.g., NLTK, spaCy, Stanza) and web services (e.g., Perdido) for extracting named entities from texts. We focus in particular on identifying place names and locating them on a map. We highlight how easy these tools are to use, but also their limitations.
Programme:
Introduction to and comparison of different NER tools: Python libraries (NLTK, spaCy, and Stanza) and web services (Perdido)
Selecting tools according to the corpus (nature of the texts, choice of language, etc.)
Experiments on two use cases: hiking descriptions and encyclopedia articles
Online notebook (Google Colab) for developing easy-to-use, intuitive application prototypes in Python

Atelier TALN 2022

June 27, 2022

Workshop organization, TALN 2022, Avignon, France

Workshop on NLP and Digital Humanities (Atelier TAL & Humanités Numériques), co-organized with Carmen Brando as part of the TALN 2022 conference in Avignon.

Tutorial - Natural Language Processing (NLP) for historical texts

June 23, 2022

Tutorial, online

Materials for the SunoikisisDC Summer 2022 Course on Natural Language Processing (NLP) for historical texts (Session 9)

Tutorial: https://github.com/ludovicmoncla/SunoikisisDC-Summer2022-Session9
Youtube link: https://youtu.be/7NK2KyP2BYs

In this tutorial, we demonstrate how to use a custom version of the Perdido geoparser Python library developed in the framework of the GEODE project. We will use texts from Diderot and d’Alembert’s Encyclopédie as a case study for querying a corpus and wrangling geoparsed data. We will also compare Perdido’s NER annotations (i.e., its output) to the results of other well-known Python NER libraries (spaCy and Stanza).
Organized by Ludovic Moncla and Katherine McDonough

Atelier GAST EGC 2022

January 25, 2022

Workshop organization, EGC 2022, Blois, France

GAST workshop (Gestion et Analyse de données Spatiales et Temporelles, i.e., management and analysis of spatial and temporal data), co-organized with Thomas Guyet, Eric Kergosien, and Christian Sallaberry during the EGC 2022 conference in Blois.

Séminaire au laboratoire ICAR (ENS Lyon)

January 17, 2022

Talk, online

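Seminar at the ICAR laboratory on “Combinaison d’approches qualitative et quantitative pour le repérage et la classification des entités nommées dans l’Encyclopédie de Diderot et d’Alembert (1751-1772)” (combining qualitative and quantitative approaches to identify and classify named entities in Diderot and d’Alembert’s Encyclopédie).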

Workshop Données et discours géographiques en France au 18e siècle

June 15, 2021

Workshop organization, UChicago Center in Paris (online event)

TAL et apprentissage automatique pour le geoparsing de textes historiques

May 10, 2021

Talk, Journée SciDoLySE (online)

Résolution de toponymes par apprentissage profond à partir de cooccurrences et de relations spatiales

May 06, 2021

Talk, SAGEO 2021 (online)

Séminaire au laboratoire ERIC (Lyon)

November 16, 2020

Talk, online

Seminar at the ERIC laboratory on NLP and machine learning applied to geoparsing and the geo-semantic analysis of texts.

Meeting of the ‘Digital Spatial Humanities’ working group

February 13, 2020

Meeting organization, online

Meeting of the ‘Digital Spatial Humanities’ working group of the GDR CNRS MAGIS.

Launch meeting of the ‘Digital Spatial Humanities’ working group

February 13, 2020

Meeting organization, EHESS, Paris, France

Launch meeting of the ‘Digital Spatial Humanities’ working group of the GDR CNRS MAGIS.

Extraction et visualisation d’information géographique à partir de textes

January 22, 2020

Talk, SoundCityve, Lyon, France

I gave a talk on “Extraction et visualisation d’information géographique à partir de textes”.

Workshop organized for the end of the SOUNDCITYVE project - Archaeology of the sound landscape: a sensitive restoring of the Sounds of the past in the City of Lyon.
Organized by Véronique Eglin

Workshop GAST 2020

January 18, 2020

Workshop organization, Conférence Extraction et Gestion des Connaissances (EGC) 2020, Brussels, Belgium

The sixth GAST workshop (Gestion et Analyse des données Spatiales et Temporelles) will be held during EGC 2020. Building on the GAST working group, the workshop aims to bring together researchers from academia and industry who are interested in issues related to taking temporal or spatial information, whether quantitative or qualitative, into account in their data management and analysis processes (methods and applications for extracting, managing, representing, analyzing, and visualizing information).

Adapting and integrating existing open source projects

January 09, 2020

Talk, University of Nevada, Reno, NV, USA

I led the session on ‘Adapting and integrating existing open source projects’.

Workshop: Ethical Visualization in the Age of Big Data. Contemporary Cultural Implications of Pre-Twentieth-Century French Texts. A workshop to seek interdisciplinary expert perspectives on ethically and visually representing the historical place of misrepresented peoples and locales.

13th Workshop on Geographic Information Retrieval (GIR)

December 08, 2019

Workshop organization, 13th Workshop on Geographic Information Retrieval (GIR), Lyon, France

The 13th Workshop on Geographic Information Retrieval will be held in Lyon, France, on 28-29 November 2019. This workshop will address all aspects of Geographic Information Retrieval, including but not limited to methods to retrieve and analyse geospatial textual content, identify geographic scope, and rank documents or other resources by relevance, from both unstructured and partially structured collections.
Organized by Ross Purves, Chris Jones, Ludovic Moncla and Mauro Gaio

GeoDISCO: Encyclopedic Geographical Discourse in France from the Enlightenment to Wikipedia

December 08, 2019

Talk, 13th Workshop on Geographic Information Retrieval (GIR), Lyon, France

Authors: Denis Vigier, Thierry Joliveau, Ludovic Moncla, Katherine McDonough, and Alice Brenon
Abstract: The GeoDISCO project aims at studying the major changes in encyclopedic geographical discourse in France between 1751 (when the first volume of the Encyclopédie ou dictionnaire raisonné des sciences, des arts et des métiers, by Diderot and D’Alembert, was published) and today (Wikipedia-France, 2018). Using linguistic and GIS methods to investigate patterns in geographical content will help us understand why authors deployed language in such ways that use place as a scaffold for ideas and practices. The spatial history of French encyclopedias is a foundation for asking broader questions about the relationship between early modern geographical information and digital geographical resources.

Workshop: 13th Workshop on Geographic Information Retrieval (GIR)
Organized by Ross Purves, Chris Jones, Ludovic Moncla and Mauro Gaio

Journée OpenDataCamp ‘Comment constituer un outil de recherche performant par interaction entre les solutions existantes, les moteurs de recherche génériques et une nouvelle brique à inventer ?’

November 21, 2019

Talk, DREAL Auvergne-Rhône-Alpes, Lyon, France

DREAL Open Data Camp 2

Spatial Entity Matching with GeoAlign (demo paper)

November 07, 2019

Talk, 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Chicago, IL, USA

Authors: Nelly Barret, Fabien Duchateau, Franck Favetta, and Ludovic Moncla
Abstract: Points of interest (POI) are central in many applications such as tourism, itinerary search, crisis management. Cartographic providers usually represent these POI with a spatial entity. However, the description of these entities may significantly vary from one provider to another (e.g., missing properties, outdated information, conflicting values). Spatial entity matching (or record linkage) aims at detecting correspondences between entities referring to the same POI. Most existing approaches have a fixed function for combining similarity measures, thus limiting customization. Besides, evaluating the matching quality is a difficult task since a ground truth dataset cannot be built for all entities and providers. In this paper, we describe GeoAlign, an application that allows fine-grained tuning for spatial entity matching. A merging step is also provided using different strategies. Finally, we propose to estimate the quality of correspondences based on the differences between combination functions and to visualize this estimation in GeoAlign.
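The idea of a tunable combination of similarity measures can be illustrated with a minimal sketch. This is not GeoAlign's implementation: the two measures, the weights, the distance cutoff, the function names, and the sample entities are all assumptions chosen for illustration.

```python
import math
from difflib import SequenceMatcher

def name_similarity(a, b):
    """String similarity in [0, 1] between two entity names (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def distance_similarity(p, q, max_km=0.5):
    """1.0 for identical (lat, lon) points, decreasing linearly to 0.0 at max_km."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dx = math.radians(q[1] - p[1]) * math.cos(mean_lat)
    dy = math.radians(q[0] - p[0])
    km = 6371.0 * math.hypot(dx, dy)  # equirectangular approximation
    return max(0.0, 1.0 - km / max_km)

def match_score(e1, e2, w_name=0.5, w_dist=0.5):
    """Weighted combination of the two measures; the weights are the tunable part."""
    return (w_name * name_similarity(e1["name"], e2["name"])
            + w_dist * distance_similarity(e1["coords"], e2["coords"]))

# Hypothetical entities from two providers (coordinates are approximate).
poi_a = {"name": "Musée des Confluences", "coords": (45.7327, 4.8181)}
poi_b = {"name": "Musee des Confluences", "coords": (45.7329, 4.8183)}
poi_c = {"name": "Parc de la Tête d'Or", "coords": (45.7772, 4.8554)}

same_poi = match_score(poi_a, poi_b)       # near-identical name, a few metres apart
different_poi = match_score(poi_a, poi_c)  # different name, several kilometres apart
```

Changing `w_name` and `w_dist` shifts how much the decision relies on labels versus geometry, which is the kind of customization a fixed combination function does not allow.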

3rd ACM SIGSPATIAL International Workshop on Geospatial Humanities

November 05, 2019

Workshop organization, 3rd ACM SIGSPATIAL International Workshop on Geospatial Humanities, Chicago, IL, USA

Following the success of previous editions in 2017 and 2018, this workshop is concerned with the use of geographic information systems and other spatial technologies in humanities research, placing a strong emphasis on new methodologies that leverage the aforementioned technical developments (e.g., the above-mentioned standard tools from geographic information systems, as well as more advanced methods such as text-based geographical analysis or spatial simulation, can all benefit from innovative approaches leveraging machine learning, parallel and/or distributed computation, semantic technologies, etc.). The workshop aims to bring together researchers and practitioners from different sub-fields of computer science and the geographical information sciences, interested in the application of spatial methods and technology to the humanities, to discuss progress in the field. Participants will explore and demonstrate the contributions to knowledge that modern GIS technologies can enable within and beyond the digital humanities.
Organized by Bruno Martins, Ludovic Moncla and Patricia Murrieta-Flores

Toponym Disambiguation in Historical Documents Using Network Analysis of Qualitative Relationships

November 05, 2019

Talk, 3rd ACM SIGSPATIAL International Workshop on Geospatial Humanities, Chicago, IL, USA

Authors: Ludovic Moncla, Katherine McDonough, Denis Vigier, Thierry Joliveau, and Alice Brenon
Abstract: In this paper we use network analysis to identify qualitative “neighbors” for toponyms in an eighteenth-century French encyclopedia; the method could apply to any entry-based text with annotated toponyms. This method draws on relations in a corpus of articles, which improves disambiguation at a later stage with an external resource. We suggest the network as an alternative to geospatial representation, a useful proxy when no historical gazetteer exists for the source material’s period. Our first experiments have shown that this approach goes beyond a simple text analysis and is able to find relations between toponyms that are not co-occurring in the same documents. Network relations are also usefully compared with disambiguated toponyms to evaluate geographical coverage, and the ways that geographical discourse is expressed, in historical texts.
Organized by Bruno Martins, Ludovic Moncla and Patricia Murrieta-Flores

Towards the geoparsing and geocoding of environmental narratives

April 10, 2019

Talk, Environmental Narratives Workshop, Stels, Switzerland

Abstract: In this talk I briefly describe some of our previous and current work on geographic information retrieval. Then, I introduce some first results that show how our work can be applied to English narratives, and particularly how it can be used for geoparsing and geocoding environmental narratives.
Organized by Ross Purves, Olga Koblet, and Ben Adams

Plateforme de services pour l’extraction automatique d’information géographique

February 15, 2019

Talk, Journée d'études HumaSpatia, Dijon, France

Cartographier les odonymes de Paris cités dans les romans du XIXème siècle

November 06, 2018

Talk, Atelier Humanités Numériques Spatialisées, SAGEO 2018, Montpellier, France

Authors: Ludovic Moncla, Mauro Gaio, Thierry Joliveau
Abstract: In this article, we address two gaps in NLP research: working with historical French and working with complex textual structures moving beyond running text or lists of place names. Our methodology is based on the evaluation of the results of two spatial named entity recognition tools in the context of early modern document analysis structured as dictionaries.
Organized by Carmen Brando, Francesca Frontini, and Mathieu Roche.

Expérimentation de méthodes d’extraction d’informations géographiques pour les documents historiques.

November 06, 2018

Talk, Atelier Humanités Numériques Spatialisées, SAGEO 2018, Montpellier, France

Authors: Katherine McDonough, Ludovic Moncla, and Matje van de Camp
Abstract: In this article, we address two gaps in NLP research: working with historical French and working with complex textual structures moving beyond running text or lists of place names. Our methodology is based on the evaluation of the results of two spatial named entity recognition tools in the context of early modern document analysis structured as dictionaries.
Organized by Carmen Brando, Francesca Frontini, and Mathieu Roche.

Automated geoparsing of Paris street names in 19th century novels.

November 07, 2017

Talk, 1st ACM SIGSPATIAL International Workshop on Geospatial Humanities, Redondo Beach, CA, USA

Authors: Ludovic Moncla, Mauro Gaio, Thierry Joliveau, and Yves-François Le Lay
Abstract: Our project involves building a platform able to retrieve, map and analyze the occurrences of place names in fictional novels published between 1800 and 1914 and whose action occurs wholly or partly in Paris. We describe a proof of concept using queries made via the TXM textual analysis platform for the extraction of street names. Then, we propose a fully automatic process using the named entity recognition (NER) components of the PERDIDO platform. This paper describes some encouraging initial results obtained by combining NLP approaches (NER methods) with textometric tools for the automated geoparsing of street names.
Organized by Bruno Martins and Patricia Murrieta-Flores

Extended Named Entity Recognition Using Finite-State Transducers: An Application To Place Names.

November 07, 2017

Talk, 9th International Conference on Advanced Geographic Information Systems, Applications, and Services, Nice, France

Authors: Mauro Gaio, Ludovic Moncla
Abstract: Textual geographical information is frequently organized around spatial named entities. Such entities have intrinsic ambiguities, and Named Entity Recognition and Classification methods should be improved in order to handle this problem. This article describes a knowledge-based method implementing a full process with the aim of annotating the spatial information in textual documents more precisely. This gain in accuracy guarantees a better analysis of the spatial information and a better disambiguation of places. The backbone of our proposal is a construction grammar and a cascade of finite-state transducers. The evaluation shows that the introduced concept of hierarchical overlapping is very helpful for detecting the local context associated with Named Entities.

Pluridisciplinary aspects of NLP and GIS: an application to itinerary reconstruction

September 20, 2017

Poster, RDA 10th Plenary Meeting, Montréal, Canada

Authors: Ludovic Moncla
Abstract: One of the main challenges of this work is to connect text with geographic space and to provide a map-based representation of itineraries described in textual documents. The main objectives are:

data mining for Geographic Information Retrieval (GIR),
toponym resolution and disambiguation,
extracting and retrieving displacements from textual documents.

Geocoding for texts with fine-grain toponyms

November 05, 2014

Talk, 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, TX, USA

Authors: Ludovic Moncla, Walter Renteria-Agualimpia, Javier Nogueras-Iso, and Mauro Gaio
Abstract: Geoparsing and geocoding are two essential middleware services to facilitate final user applications such as location-aware searching or different types of location-based services. The objective of this work is to propose a method for establishing a processing chain to support the geoparsing and geocoding of text documents describing events strongly linked with space and with a frequent use of fine-grain toponyms. The geoparsing part is a Natural Language Processing approach which combines the use of part of speech and syntactico-semantic combined patterns (cascade of transducers). However, the real novelty of this work lies in the geocoding method. The geocoding algorithm is unsupervised and takes advantage of clustering techniques to provide a solution for disambiguating the toponyms found in gazetteers, and at the same time estimating the spatial footprint of those other fine-grain toponyms not found in gazetteers. The feasibility of the proposal has been tested with a corpus of hiking descriptions in French, Spanish and Italian.
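The intuition behind using spatial grouping to disambiguate toponyms can be sketched as follows. This is a deliberately simplified stand-in, not the algorithm of the paper: each ambiguous toponym keeps the gazetteer candidate that lies near the candidates of the most other toponyms in the same text. The radius, function names, and coordinates are assumptions made for illustration.

```python
import math

def km_between(p, q):
    """Approximate distance in km between two (lat, lon) points (equirectangular)."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dx = math.radians(q[1] - p[1]) * math.cos(mean_lat)
    dy = math.radians(q[0] - p[0])
    return 6371.0 * math.hypot(dx, dy)

def disambiguate(candidates, radius_km=30.0):
    """Pick, for each toponym, the candidate supported by the most other toponyms.

    candidates maps a toponym to its list of candidate (lat, lon) locations.
    A candidate is supported by another toponym when at least one of that
    toponym's candidates lies within radius_km of it.
    """
    chosen = {}
    for name, options in candidates.items():
        def support(option):
            return sum(
                any(km_between(option, other) <= radius_km for other in others)
                for other_name, others in candidates.items()
                if other_name != name
            )
        chosen[name] = max(options, key=support)
    return chosen

# Hypothetical gazetteer lookups for one hiking description (made-up coordinates).
candidates = {
    "Saint-Martin": [(48.90, 2.30), (42.85, -0.05)],  # ambiguous: two homonyms
    "Gavarnie": [(42.73, -0.01)],
    "Luz": [(42.87, 0.00)],
}
resolved = disambiguate(candidates)
```

In this toy example the second "Saint-Martin" candidate is kept because it sits within the radius of both other places, while the first has no nearby support.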

Automatic itinerary reconstruction from texts

September 25, 2014

Talk, 8th International Conference on Geographic Information Science (GIScience 2014), Vienna, Austria

Authors: Ludovic Moncla, Mauro Gaio, and Sébastien Mustière
Abstract: This paper proposes an approach for the reconstruction of itineraries extracted from narrative texts. This approach is divided into two main tasks. The first extracts geographical information with natural language processing. Its outputs are annotations of so-called expanded entities and expressions of displacement or perception from hiking descriptions. In order to reconstruct a plausible footprint of an itinerary described in the text, the second task uses the outputs of the first task to compute a minimum spanning tree.
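The second task can be illustrated with a minimal sketch that computes a minimum spanning tree over already geocoded places. It assumes plain great-circle distances as edge weights and uses Prim's algorithm; the paper's actual weighting and use of displacement expressions are not reproduced here, and the waypoint names and coordinates are hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def minimum_spanning_tree(places):
    """Prim's algorithm on the complete graph of places.

    places maps a place name to (lat, lon). Returns a list of
    (name_a, name_b, distance_km) edges connecting every place.
    """
    names = list(places)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge going from the current tree to a place outside it.
        dist, u, v = min(
            (haversine_km(places[u], places[v]), u, v)
            for u in in_tree for v in names if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges

# Hypothetical geocoded waypoints from a hiking description (approximate coordinates).
waypoints = {
    "Chamonix": (45.92, 6.87),
    "Les Houches": (45.89, 6.80),
    "Argentière": (45.98, 6.93),
    "Vallorcine": (46.03, 6.93),
}
tree = minimum_spanning_tree(waypoints)
```

The resulting edges connect all waypoints with the smallest total length, which gives a plausible footprint even when the text does not state the order of the places explicitly.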

Topographic subtyping of place named entities: a linguistic approach

May 15, 2013

Talk, 16th AGILE conference on Geographic Information Science, Leuven, Belgium

Authors: Van Tien Nguyen, Mauro Gaio, and Ludovic Moncla
Abstract: The aim of this work is to find sub-types for Place Named Entities, from the analysis of relations between Place Names and a nominal group within a specific phrasal context. The proposed method combines the use of specific intra-sentential lexico-syntactic relations and external resources like gazetteers, thesauri, or ontologies. It relies on expanded spatial named entities recognition transcribed into a symbolic representation expressed in terms of semantic features. This symbolic representation will then be associated with a geo-coded representation, depending on the available resources. Our method is completely implemented and has been tested on a corpus of travelogues.