Posts by Collection

Authors: Van Tien Nguyen, Mauro Gaio, and Ludovic Moncla
Abstract: The aim of this work is to find sub-types for Place Named Entities, from the analysis of relations between Place Names and a nominal group within a specific phrasal context. The proposed method combines the use of specific intra-sentential lexico-syntactic relations and external resources like gazetteers, thesauri, or ontologies. It relies on expanded spatial named entity recognition transcribed into a symbolic representation expressed in terms of semantic features. This symbolic representation will then be associated with a geo-coded representation, depending on the available resources. Our method is completely implemented and has been tested on a corpus of travelogues.

Automatic itinerary reconstruction from texts

Published: September 25, 2014

Authors: Ludovic Moncla, Mauro Gaio, and Sébastien Mustière
Abstract: This paper proposes an approach for the reconstruction of itineraries extracted from narrative texts. This approach is divided into two main tasks. The first extracts geographical information with natural language processing. Its outputs are annotations of so-called expanded entities and expressions of displacement or perception from hiking descriptions. In order to reconstruct a plausible footprint of an itinerary described in the text, the second task uses the outputs of the first task to compute a minimum spanning tree.
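
The second task can be illustrated with a minimal sketch. It assumes the places extracted by the first task have already been geocoded, and it uses Kruskal's algorithm with great-circle distances as edge weights. The waypoint names and coordinates are made up, and the abstract does not say which weights or algorithm the authors use, so this is only an illustration of the general idea.

```python
import math
from itertools import combinations

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def minimum_spanning_tree(places):
    """Kruskal's algorithm over the complete graph of places.

    `places` maps a place name to its (lat, lon).
    Returns a list of (name_a, name_b, distance_km) edges.
    """
    parent = {name: name for name in places}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges, shortest first.
    edges = sorted(
        (haversine_km(places[a], places[b]), a, b)
        for a, b in combinations(places, 2)
    )
    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree

# Hypothetical waypoints (made-up coordinates, for illustration only).
waypoints = {
    "A": (45.00, 6.00),
    "B": (45.01, 6.00),
    "C": (45.02, 6.00),
    "D": (45.02, 6.02),
}
mst = minimum_spanning_tree(waypoints)
for a, b, dist in mst:
    print(f"{a} - {b}: {dist:.2f} km")
```

The resulting tree connects every waypoint with the smallest total length. Turning it into an ordered route would additionally need the displacement expressions mentioned in the abstract.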

Geocoding for texts with fine-grain toponyms

Published: November 05, 2014

Authors: Ludovic Moncla, Walter Renteria-Agualimpia, Javier Nogueras-Iso, and Mauro Gaio
Abstract: Geoparsing and geocoding are two essential middleware services to facilitate final user applications such as location-aware searching or different types of location-based services. The objective of this work is to propose a method for establishing a processing chain to support the geoparsing and geocoding of text documents describing events strongly linked with space and with a frequent use of fine-grain toponyms. The geoparsing part is a Natural Language Processing approach which combines the use of part of speech and syntactico-semantic combined patterns (cascade of transducers). However, the real novelty of this work lies in the geocoding method. The geocoding algorithm is unsupervised and takes advantage of clustering techniques to provide a solution for disambiguating the toponyms found in gazetteers, and at the same time estimating the spatial footprint of those other fine-grain toponyms not found in gazetteers. The feasibility of the proposal has been tested with a corpus of hiking descriptions in French, Spanish and Italian.
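
A rough sketch of the clustering idea follows. The abstract does not name the clustering algorithm, so a simple single-linkage grouping with a distance threshold stands in for it. The toponyms, coordinates, the `max_km` threshold, and the bounding-box footprint are all assumptions made for illustration.

```python
import math

def distance_km(a, b):
    """Approximate distance in km between two (lat, lon) points (equirectangular)."""
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dx = math.radians(b[1] - a[1]) * math.cos(mean_lat)
    dy = math.radians(b[0] - a[0])
    return 6371.0 * math.hypot(dx, dy)

def cluster_points(points, max_km):
    """Single-linkage clustering: points within max_km of each other are chained."""
    clusters = []
    unvisited = set(range(len(points)))
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {j for j in unvisited
                    if distance_km(points[current], points[j]) <= max_km}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(cluster)
    return clusters

def disambiguate(candidates, max_km=20.0):
    """Keep, for each toponym, the candidate lying in the best cluster.

    `candidates` maps a toponym to its gazetteer candidates as (lat, lon)
    tuples (an empty list when the gazetteer has no entry). At least one
    toponym must have a candidate.
    Returns (resolved, bbox, unresolved).
    """
    flat = [(name, coord) for name, coords in candidates.items() for coord in coords]
    clusters = cluster_points([coord for _, coord in flat], max_km)
    # Best cluster: the one covering the most distinct toponyms.
    best = max(clusters, key=lambda c: len({flat[i][0] for i in c}))
    resolved = {}
    for i in sorted(best):
        name, coord = flat[i]
        resolved.setdefault(name, coord)
    lats = [c[0] for c in resolved.values()]
    lons = [c[1] for c in resolved.values()]
    # The bounding box serves as an estimated footprint for unresolved toponyms.
    bbox = (min(lats), min(lons), max(lats), max(lons))
    unresolved = [name for name in candidates if name not in resolved]
    return resolved, bbox, unresolved

# Hypothetical toponyms and coordinates, for illustration only.
candidates = {
    "Lac Blanc": [(45.00, 6.00), (48.10, 7.10)],  # ambiguous
    "Col Vert": [(45.05, 6.02)],
    "Refuge X": [(45.02, 6.05), (43.00, 1.00)],   # ambiguous
    "Cabane Y": [],                               # not in the gazetteer
}
resolved, bbox, unresolved = disambiguate(candidates)
print(resolved, bbox, unresolved)
```

The ambiguous names resolve to the candidates that sit near the other places in the text, and the fine-grain toponym missing from the gazetteer inherits the cluster's bounding box as a coarse footprint.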

Pluridisciplinary aspects of NLP and GIS: an application to itinerary reconstruction

Published: September 20, 2017

Authors: Ludovic Moncla
Abstract: One of the main challenges of this work is to connect text with geographic space and to provide a map-based representation of itineraries described in textual documents. The main objectives are:

data mining for Geographic Information Retrieval (GIR),
toponym resolution and disambiguation,
extraction and retrieval of displacements from textual documents.

Extended Named Entity Recognition Using Finite-State Transducers: An Application To Place Names.

Published: November 07, 2017

Authors: Mauro Gaio, Ludovic Moncla
Abstract: Textual geographical information is frequently organized around spatial named entities. Such entities have intrinsic ambiguities, and Named Entity Recognition and Classification methods should be improved in order to handle this problem. This article describes a knowledge-based method implementing a full process with the aim of annotating spatial information in textual documents more precisely. This gain in accuracy guarantees a better analysis of the spatial information and a better disambiguation of places. The backbone of our proposal is a construction grammar and a cascade of finite-state transducers. The evaluation shows that the introduced concept of hierarchical overlapping is very helpful to detect a local context associated with Named Entities.

Automated geoparsing of Paris street names in 19th century novels.

Published: November 07, 2017

Authors: Ludovic Moncla, Mauro Gaio, Thierry Joliveau, and Yves-François Le Lay
Abstract: Our project involves building a platform able to retrieve, map and analyze the occurrences of place names in fictional novels published between 1800 and 1914 and whose action occurs wholly or partly in Paris. We describe a proof of concept using queries made via the TXM textual analysis platform for the extraction of street names. Then, we propose a fully automatic process using the named entity recognition (NER) components of the PERDIDO platform. This paper describes some encouraging initial results obtained by combining NLP approaches (NER methods) with textometric tools for the automated geoparsing of street names.
Organized by Bruno Martins and Patricia Murrieta-Flores

Expérimentation de méthodes d’extraction d’informations géographiques pour les documents historiques.

Published: November 06, 2018

Authors: Katherine McDonough, Ludovic Moncla, and Matje van de Camp
Abstract: In this article, we address two gaps in NLP research: working with historical French and working with complex textual structures moving beyond running text or lists of place names. Our methodology is based on the evaluation of the results of two spatial named entity recognition tools in the context of early modern document analysis structured as dictionaries.
Organized by Carmen Brando, Francesca Frontini, and Mathieu Roche.

Cartographier les odonymes de Paris citées dans les romans du XIXème siècle

Published: November 06, 2018

Authors: Ludovic Moncla, Mauro Gaio, Thierry Joliveau
Abstract: In this article, we address two gaps in NLP research: working with historical French and working with complex textual structures moving beyond running text or lists of place names. Our methodology is based on the evaluation of the results of two spatial named entity recognition tools in the context of early modern document analysis structured as dictionaries.
Organized by Carmen Brando, Francesca Frontini, and Mathieu Roche.

Plateforme de services pour l’extraction automatique d’information géographique

Published: February 15, 2019

Towards the geoparsing and geocoding of environmental narratives

Published: April 10, 2019

Abstract: In this talk I briefly describe some of our previous and current work on geographic information retrieval. Then, I introduce some first results that show how our work can be applied to English narratives, and in particular how it can be used for geoparsing and geocoding environmental narratives.
Organized by Ross Purves, Olga Koblet, and Ben Adams

Toponym Disambiguation in Historical Documents Using Network Analysis of Qualitative Relationships

Published: November 05, 2019

Authors: Ludovic Moncla, Katherine McDonough, Denis Vigier, Thierry Joliveau, and Alice Brenon
Abstract: In this paper we use network analysis to identify qualitative “neighbors” for toponyms in an eighteenth-century French encyclopedia; the method could apply to any entry-based text with annotated toponyms. This method draws on relations in a corpus of articles, which improves disambiguation at a later stage with an external resource. We suggest the network as an alternative to geospatial representation, a useful proxy when no historical gazetteer exists for the source material’s period. Our first experiments have shown that this approach goes beyond a simple text analysis and is able to find relations between toponyms that are not co-occurring in the same documents. Network relations are also usefully compared with disambiguated toponyms to evaluate geographical coverage, and the ways that geographical discourse is expressed, in historical texts.
Organized by Bruno Martins, Ludovic Moncla and Patricia Murrieta-Flores
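
As a toy illustration of how a network can surface relations between toponyms that never co-occur in the same document, the sketch below links toponyms annotated in the same entry and then looks two hops away. The entries and the two-hop rule are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict

def build_network(articles):
    """Link every pair of toponyms annotated in the same article."""
    graph = defaultdict(set)
    for toponyms in articles.values():
        for a in toponyms:
            for b in toponyms:
                if a != b:
                    graph[a].add(b)
    return graph

def indirect_neighbors(graph, toponym):
    """Toponyms reachable in two hops that never co-occur with `toponym`."""
    direct = graph.get(toponym, set())
    two_hop = set().union(*(graph.get(n, set()) for n in direct))
    return two_hop - direct - {toponym}

# Hypothetical annotated entries (entry id -> toponyms found in it).
articles = {
    "entry_1": ["Lyon", "Rhône"],
    "entry_2": ["Rhône", "Avignon"],
    "entry_3": ["Avignon", "Provence"],
}
graph = build_network(articles)
print(indirect_neighbors(graph, "Lyon"))
```

Here "Lyon" and "Avignon" never appear in the same entry, yet the network relates them through "Rhône".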

3rd ACM SIGSPATIAL International Workshop on Geospatial Humanities

Published: November 05, 2019

Following the success of previous editions in 2017 and 2018, this workshop is concerned with the use of geographic information systems and other spatial technologies in humanities research, placing a strong emphasis on new methodologies that leverage the aforementioned technical developments. The standard tools from geographic information systems, as well as more advanced methods such as text-based geographical analysis or spatial simulation, can all benefit from innovative approaches leveraging machine learning, parallel and/or distributed computation, semantic technologies, etc. The workshop aims to bring together researchers and practitioners from different sub-fields of computer science and the geographical information sciences, interested in the application of spatial methods and technology to the humanities, to discuss progress in the field. Participants will explore and demonstrate the contributions to knowledge that modern GIS technologies can enable within and beyond the digital humanities.
Organized by Bruno Martins, Ludovic Moncla and Patricia Murrieta-Flores

Spatial Entity Matching with GeoAlign (demo paper)

Published: November 07, 2019

Authors: Nelly Barret, Fabien Duchateau, Franck Favetta, and Ludovic Moncla
Abstract: Points of interest (POI) are central in many applications such as tourism, itinerary search, crisis management. Cartographic providers usually represent these POI with a spatial entity. However, the description of these entities may significantly vary from one provider to another (e.g., missing properties, outdated information, conflicting values). Spatial entity matching (or record linkage) aims at detecting correspondences between entities referring to the same POI. Most existing approaches have a fixed function for combining similarity measures, thus limiting customization. Besides, evaluating the matching quality is a difficult task since a ground truth dataset cannot be built for all entities and providers. In this paper, we describe GeoAlign, an application that allows fine-grained tuning for spatial entity matching. A merging step is also provided using different strategies. Finally, we propose to estimate the quality of correspondences based on the differences between combination functions and to visualize this estimation in GeoAlign.
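
To make the idea of a tunable combination function concrete, here is a minimal sketch of a weighted combination of two similarity measures. It is not GeoAlign's implementation: the two measures, the `scale_m` parameter, the weights, and the sample entities are hypothetical.

```python
import math
from difflib import SequenceMatcher

def name_similarity(a, b):
    """String similarity of the two names, in [0, 1]."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

def proximity_similarity(a, b, scale_m=200.0):
    """1.0 for identical coordinates, decaying exponentially with distance."""
    mean_lat = math.radians((a["lat"] + b["lat"]) / 2)
    dx = math.radians(b["lon"] - a["lon"]) * math.cos(mean_lat)
    dy = math.radians(b["lat"] - a["lat"])
    distance_m = 6_371_000 * math.hypot(dx, dy)
    return math.exp(-distance_m / scale_m)

MEASURES = {"name": name_similarity, "proximity": proximity_similarity}

def combined_score(a, b, weights):
    """Weighted average of the selected measures; the weights are user-tunable."""
    total = sum(weights.values())
    return sum(w * MEASURES[key](a, b) for key, w in weights.items()) / total

# Hypothetical entities from two providers (made-up names and coordinates).
poi_a = {"name": "Café du Pont", "lat": 45.7500, "lon": 4.8500}
poi_b = {"name": "Cafe du Pont", "lat": 45.7501, "lon": 4.8502}
poi_c = {"name": "Garage Martin", "lat": 45.7800, "lon": 4.9000}

weights = {"name": 0.7, "proximity": 0.3}
print(combined_score(poi_a, poi_b, weights))  # likely the same POI
print(combined_score(poi_a, poi_c, weights))  # likely different POIs
```

Changing `weights`, or adding measures to `MEASURES`, changes which pairs count as correspondences. That is the kind of customization a fixed combination function rules out.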

Journée OpenDataCamp ‘Comment constituer un outil de recherche performant par interaction entre les solutions existantes, les moteurs de recherche génériques et une nouvelle brique à inventer ?’

Published: November 21, 2019

DREAL Open Data Camp 2

GeoDISCO: Encyclopedic Geographical Discourse in France from the Enlightenment to Wikipedia

Published: December 08, 2019

Authors: Denis Vigier, Thierry Joliveau, Ludovic Moncla, Katherine McDonough, and Alice Brenon
Abstract: The GeoDISCO project aims at studying the major changes in encyclopedic geographical discourse in France between 1751 (when the first volume of the Encyclopédie ou dictionnaire raisonné des sciences, des arts et des métiers, by Diderot and D’Alembert, was published) and today (Wikipedia-France, 2018). Using linguistic and GIS methods to investigate patterns in geographical content will help us understand why authors deployed language in such ways that use place as a scaffold for ideas and practices. The spatial history of French encyclopedias is a foundation for asking broader questions about the relationship between early modern geographical information and digital geographical resources.

Workshop: 13th Workshop on Geographic Information Retrieval (GIR)
Organized by Ross Purves, Chris Jones, Ludovic Moncla and Mauro Gaio

13th Workshop on Geographic Information Retrieval (GIR)

Published: December 08, 2019

The 13th Workshop on Geographic Information Retrieval will be held in Lyon, France, on 28-29 November 2019. This workshop will address all aspects of Geographic Information Retrieval, including but not limited to the provision of methods to retrieve and analyse geo-spatial textual content, identify geographic scope, and relevance-rank documents or other resources from both unstructured and partially structured collections.
Organized by Ross Purves, Chris Jones, Ludovic Moncla and Mauro Gaio

Adapting and integrating existing open source projects

Published: January 09, 2020

I led the session about ‘Adapting and integrating existing open source projects’.

Workshop: Ethical Visualization in the Age of Big Data. Contemporary Cultural Implications of Pre-Twentieth-Century French Texts. A workshop to seek interdisciplinary expert perspectives on ethically and visually representing the historical place of misrepresented peoples and locales.

Workshop GAST 2020

Published: January 18, 2020

The sixth workshop on the Management and Analysis of Spatial and Temporal Data (GAST, Gestion et Analyse des données Spatiales et Temporelles) will be held at EGC 2020. Building on the GAST working group, the workshop aims to bring together researchers from academia and industry who are interested in issues related to handling temporal or spatial information, whether quantitative or qualitative, in their data management and analysis processes (methods and applications for the extraction, management, representation, analysis, and visualization of information).

Extraction et visualisation d’information géographique à partir de textes

Published: January 22, 2020

I gave a talk on “Extraction et visualisation d’information géographique à partir de textes”.

Workshop organized for the end of the SOUNDCITYVE project - Archaeology of the sound landscape: a sensitive restoring of the Sounds of the past in the City of Lyon.
Organized by Véronique Eglin

Launch meeting of the ‘Digital Spatial Humanities’ working group

Published: February 13, 2020

Launch meeting of the ‘Digital Spatial Humanities’ working group of the GDR CNRS MAGIS.

Meeting of the ‘Digital Spatial Humanities’ working group

Published: February 13, 2020

Meeting of the ‘Digital Spatial Humanities’ working group of the GDR CNRS MAGIS.

Séminaire au laboratoire ERIC (Lyon)

Published: November 16, 2020

Seminar at the ERIC laboratory on NLP and machine learning applied to geoparsing and the geo-semantic analysis of texts.

Résolution de toponymes par apprentissage profond à partir de cooccurrences et de relations spatiales

Published: May 06, 2021

TAL et apprentissage automatique pour le geoparsing de textes historiques

Published: May 10, 2021

Workshop Données et discours géographiques en France au 18e siècle

Published: June 15, 2021

Séminaire au laboratoire ICAR (ENS Lyon)

Published: January 17, 2022

Seminar at the ICAR laboratory on “Combining qualitative and quantitative approaches for the detection and classification of named entities in Diderot and d’Alembert’s Encyclopédie (1751-1772)”.

Atelier GAST EGC 2022

Published: January 25, 2022

GAST workshop (Gestion et Analyse de données Spatiales et Temporelles, i.e. Management and Analysis of Spatial and Temporal Data), co-organized with Thomas Guyet, Eric Kergosien, and Christian Sallaberry at the EGC 2022 conference in Blois.

Tutorial - Natural Language Processing (NLP) for historical texts

Published: June 23, 2022

Materials for the SunoikisisDC Summer 2022 Course on Natural Language Processing (NLP) for historical texts (Session 9)

Tutorial: https://github.com/ludovicmoncla/SunoikisisDC-Summer2022-Session9
Youtube link: https://youtu.be/7NK2KyP2BYs

In this tutorial, we demonstrate how to use a custom version of the Perdido geoparser Python library developed in the framework of the GEODE project. We will use texts from Diderot and d’Alembert’s Encyclopédie as a case study for querying a corpus and wrangling geoparsed data. We will also compare Perdido’s NER annotations (i.e., its output) to the results of other well-known Python NER libraries (spaCy and Stanza).
Organized by Ludovic Moncla and Katherine McDonough

Atelier TALN 2022

Published: June 27, 2022

NLP & Digital Humanities workshop (Atelier TAL & Humanités Numériques), co-organized with Carmen Brando as part of the TALN 2022 conference in Avignon.

Formation ANF TDM CNRS 2022

Published: October 05, 2022

Workshop on Python libraries and web services for named entity recognition and toponym resolution, organized as part of the CNRS ANF TDM 2022 training course (document exploration and information extraction).

The training materials are available here: https://gitlab.liris.cnrs.fr/lmoncla/tutoriel-anf-tdm-2022-python-geoparsing

Overview:
This workshop presents the use of Python libraries (NLTK, spaCy, Stanza) and web services (PERDIDO) for extracting named entities from texts. We focus in particular on detecting place names and locating them on a map. We highlight how easy these tools are to use, as well as their limitations.
Program:
Introduction to and comparison of different NER tools: Python libraries (NLTK, spaCy, and Stanza) and web services (Perdido)
Selecting tools according to the corpus (nature of the texts, choice of language, etc.)
Experiments carried out on two use cases: hiking descriptions and encyclopedia articles
Online notebooks (Google Colab) for developing easy-to-use, intuitive application prototypes in Python

6th ACM SIGSPATIAL International Workshop on Geospatial Humanities

Published: November 02, 2022

Following the success of previous editions, this workshop is concerned with the use of geographic information systems and other spatial technologies in humanities research, placing an emphasis on new methodologies that leverage the aforementioned technical developments. The standard tools from geographic information systems, as well as more advanced methods such as text- and image-based geographical analysis or spatial simulation, can all benefit from innovative approaches leveraging machine learning, parallel and/or distributed computation, semantic technologies, etc. on humanities sources like archival manuscripts, maps, encyclopedias, newspapers, correspondence collections and more. These kinds of documents pose new challenges for identifying and analyzing spatial information. The workshop aims to bring together researchers and practitioners from different sub-fields of computer science and the geographical information sciences interested in the application of spatial methods and technology to the humanities to discuss how to address these issues in ways that generate new knowledge in multiple disciplines. Participants will demonstrate their contributions and explore how modern GIS and other technologies can inform, and be inspired by, the digital humanities.
Organized by Ludovic Moncla, Bruno Martins, and Katherine McDonough

Vers une cartographie de l’Encyclopédie de Diderot et d’Alembert

Published: March 16, 2023

La littérature au prisme des humanités numériques

Study day organized by Glenn Roe and Motasem Alrahabi (ObTIC - Sorbonne Université) on March 16, 2023 at the Institut d’études avancées de Paris.
Presentation of the work carried out within the GEODE project.

Un projet cartographique pour l’Encyclopédie

Published: March 16, 2023

Séminaire d’histoire des sciences astronomiques

Presentation of the work carried out within the GEODE project on the processing of named entities and geographic coordinates.
Video of the seminar: https://syrteplay.obspm.fr/w/av4sK33GmPgamWVQTd7Mko

Perdido: Python library for geoparsing and geocoding French texts

Published: April 02, 2023

Authors: Ludovic Moncla, Mauro Gaio
Abstract: This paper introduces the Perdido Python library for geoparsing and geocoding French texts. The architecture of the Perdido Geoparser, which includes three layers: back-office, API, and Python library, is outlined. We also provide details on the methods used in the development of the processing chain and the various tasks covered, such as named entity recognition and classification (NERC), and toponym resolution. Lastly, we showcase the different features of the Python library and explain how to use it. The library is built as an overlay using API services, enabling users to manipulate, visualize, and export the results of geoparsing and geocoding. A Jupyter notebook is also provided to demonstrate all the functionalities implemented in the library.

7th ACM SIGSPATIAL International Workshop on Geospatial Humanities

Published: November 13, 2023

Following the success of previous editions, this workshop is concerned with the use of geographic information systems and other spatial technologies in humanities research, placing an emphasis on new methodologies that leverage the aforementioned technical developments. The standard tools from geographic information systems, as well as more advanced methods such as text- and image-based geographical analysis or spatial simulation, can all benefit from innovative approaches leveraging machine learning, parallel and/or distributed computation, semantic technologies, etc. on humanities sources like archival manuscripts, maps, encyclopedias, newspapers, correspondence collections and more. These kinds of documents pose new challenges for identifying and analyzing spatial information. The workshop aims to bring together researchers and practitioners from different sub-fields of computer science and the geographical information sciences interested in the application of spatial methods and technology to the humanities to discuss how to address these issues in ways that generate new knowledge in multiple disciplines. Participants will demonstrate their contributions and explore how modern GIS and other technologies can inform, and be inspired by, the digital humanities.
Organized by Ludovic Moncla, Bruno Martins, Katherine McDonough, and Xuke Hu

Atelier HNS MAGIS 2023

Published: November 24, 2023

Spatial Digital Humanities workshop (Atelier Humanités Numériques Spatialisées), co-organized with Carmen Brando at the MAGIS 2023 meeting in Bordeaux.

PhD jury member

Published: January 24, 2024

Jury member for Lucie Cadorel’s Ph.D. Defense at INRIA Sophia-Antipolis, France.

GeoEDdA: A Gold Standard Dataset for Geo-semantic Annotation of Diderot & d’Alembert’s Encyclopédie

Published: March 24, 2024

Authors: Ludovic Moncla, Denis Vigier, Katherine McDonough
Abstract: This paper describes the methodology for creating GeoEDdA, a gold standard dataset of geo-semantic annotations from entries in Diderot and d’Alembert’s eighteenth-century Encyclopédie. Aiming to explore spatial information beyond toponyms identified with the commonly used Named Entity Recognition (NER) task, we test the newer span categorization task as an approach for retrieving complex references to places, generic spatial terms, other entities, and relations. We test an active learning method, using the Prodigy web-based tool to iteratively train a machine learning span categorization model. The resulting dataset includes labeled spans from 2,200 paragraphs. As a preliminary experiment, a custom spaCy spancat model demonstrates strong overall performance, achieving an F-score of 86.42%. Evaluations for each span category reveal strengths in recognizing spatial entities and persons (including nominal entities, named entities and nested entities).
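
For readers unfamiliar with how an F-score for span categorization is obtained, the sketch below computes exact-match precision, recall, and F-score over labeled spans. The spans and label names are invented for illustration, and the code does not reproduce the evaluation pipeline used in the paper.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F-score over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical annotations: (start_token, end_token, label).
gold_spans = [(0, 2, "PLACE"), (5, 9, "RELATION"), (12, 14, "PERSON")]
predicted_spans = [(0, 2, "PLACE"), (5, 9, "RELATION"), (20, 22, "PERSON")]

precision, recall, f_score = span_prf(gold_spans, predicted_spans)
print(f"P={precision:.2f} R={recall:.2f} F={f_score:.2f}")
```

Because spans may overlap or nest in span categorization, each (start, end, label) triple is scored independently, with no requirement that spans be disjoint.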

Spring Data/Culture Workshop: Search inside maps with MapReader

Published: May 01, 2024

MapReader, which received the 2023 Roy Rosenzweig Prize for Innovation in Digital History from the American Historical Association, is a software library that was designed for humanities research with big digitised map collections. It was developed on the recently concluded Living with Machines project, but it has been created with the wider community of historians in mind as future users.

Propositions pour une étude interdisciplinaire de la géographie dans un dictionnaire universel et une encyclopédie du XVIIIe siècle

Published: June 06, 2024

Authors: Ludovic Moncla, Denis Vigier, Thierry Joliveau
Designed from an interdisciplinary perspective, this talk highlights how technical developments in digital humanities, natural language processing, and geographic information systems advance our knowledge of encyclopedic discourse on geography in the eighteenth century.

A digital exploration of geographic knowledge in Diderot and d’Alembert’s Encyclopédie

Published: July 05, 2024

Authors: Thierry Joliveau, Ludovic Moncla, Antoine Taroni, Denis Vigier and Katie McDonough
How was geography communicated in Diderot and d’Alembert’s Encyclopédie (1751-72)? In this presentation, the interdisciplinary GEODE team will investigate the role of geographical knowledge within this encyclopedia, as part of our larger project to study these themes across French encyclopedias from the eighteenth to the twenty-first centuries. The Encyclopédie consists of 17 volumes of text (~74k articles, or 22M words) and 11 volumes of plates. And yet, the latter contain no maps. Apart from some pages (vol. 5 of plates) related to the “construction of globes”, all the geography of the Encyclopédie is contained in the volumes of text. Addressing critics of the approach to the selection of knowledge shared throughout the work in his “Encyclopédie” article (vol. 5 of text), Diderot argues that an Encyclopedia might be seen as “dry”, but that its role is to share geographical knowledge of places that is “scientific”. It should be able to be used to “create good maps”. We take up Diderot’s call, using information retrieval and spatial analysis to create a dataset of all place names in the Encyclopédie, identify historical spatial coordinates as reported in the text, and connect all named places to modern coordinates through entity linking. The resulting dataset allows us to map the Encyclopédie. We explore the spatial coverage of the text, including the outsized representation of certain parts of the world, like France. In addition to this explicit geospatial approach to the data, we use network analysis to explore references to places across articles and volumes. Using such a variety of methods, for the first time, we name, define, classify, locate, and map places in this key Enlightenment text.

From BERT Fine-Tuning to LLM Prompting

Published: September 10, 2024

Invited talk at the first workshop of the GeoLiaison PHC project organized by Davide Buscaldi (LIPN) and Jochen L. Leidner (Coburg U.): https://sites.google.com/view/geoliaison

Evaluating Named Entity Recognition Using Few-Shot Prompting with Large Language Models

Published: November 14, 2024

Talk at the conference “A Conversation between AI and the Humanities”: https://caih.sciencesconf.org

Évaluation des Grands Modèles de Langage pour la Reconnaissance d’Entités Nommées

Published: November 26, 2024

Talk at the conference “Un chat à la fac de lettres?” organized by the Huma-Num ARIANE consortium: https://csthn-ariane.sciencesconf.org

Evaluation of Transformer Models (from BERT to GPT) for Geographic Information Recognition

Published: December 11, 2024

Invited talk at the final conference of the Digital Humanities and Artificial Intelligence Thematic Semester supported by the CNRS center of Artificial intelligence for science, science for artificial intelligence (AISSAI): https://semtemiahn.hypotheses.org/final-conference

L’usage de l’IA pour une étude interdisciplinaire de la géographie dans l’Encyclopédie de Diderot et d’Alembert

Published: October 16, 2025

Invited talk at the workshop “Pratiques d’intelligence artificielle appliquées aux données spatiales et géographiques” (Artificial intelligence practices applied to spatial and geographic data), organized by the RnMSH (Réseau national des Maisons des Sciences sociales et des Humanités).

Segmentation de corpus lexicographiques numérisés à l’aide de LLMs : étude du Dictionnaire Universel François-Latin et de La Grande Encyclopédie

Published: November 06, 2025

Invited talk at the IXXI 2025 day. The full program is available here: https://www.ixxi.fr/evenements/journee-ixxi-2025