Published:
Authors: Van Tien Nguyen, Mauro Gaio, and Ludovic Moncla
Abstract: The aim of this work is to find sub-types for Place Named Entities, from the analysis of relations between Place Names and a nominal group within a specific phrasal context. The proposed method combines the use of specific intra-sentential lexico-syntactic relations and external resources like gazetteers, thesauri, or ontologies. It relies on expanded spatial named entities recognition transcribed into a symbolic representation expressed in terms of semantic features. This symbolic representation will then be associated with a geo-coded representation, depending on the available resources. Our method is completely implemented and has been tested on a corpus of travelogues.
Published:
Authors: Ludovic Moncla, Mauro Gaio, and Sébastien Mustière
Abstract: This paper proposes an approach for the reconstruction of itineraries extracted from narrative texts. This approach is divided into two main tasks. The first extracts geographical information with natural language processing. Its outputs are annotations of so-called expanded entities and expressions of displacement or perception from hiking descriptions. In order to reconstruct a plausible footprint of an itinerary described in the text, the second task uses the outputs of the first task to compute a minimum spanning tree.
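As an illustration of the second task, the following minimal sketch computes a minimum spanning tree over a handful of located places. The abstract does not say which algorithm, distance, or edge weighting the paper uses, so Prim's algorithm with planar Euclidean distance is an assumption here, and the place names and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Edge = Tuple[str, str, float]


def minimum_spanning_tree(places: Dict[str, Point]) -> List[Edge]:
    """Prim's algorithm on the complete graph of places.

    Edge weights are planar Euclidean distances (an assumption made for
    this sketch, not necessarily the weighting used in the paper).
    """
    names = list(places)
    if not names:
        return []
    in_tree = {names[0]}
    edges: List[Edge] = []
    while len(in_tree) < len(names):
        best = None
        # Look for the cheapest edge leaving the current tree.
        for u in names:
            if u not in in_tree:
                continue
            for v in names:
                if v in in_tree:
                    continue
                d = math.dist(places[u], places[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical places with planar coordinates (arbitrary units).
places = {
    "refuge": (0.0, 0.0),
    "lake": (3.0, 0.0),
    "pass": (3.0, 4.0),
    "summit": (10.0, 4.0),
}
tree = minimum_spanning_tree(places)
for start, end, dist in tree:
    print(f"{start} -> {end}: {dist:.1f}")
```

With real data, the planar distance would typically be replaced by a geodesic distance between geocoded coordinates.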
Published:
Authors: Ludovic Moncla, Walter Renteria-Agualimpia, Javier Nogueras-Iso, and Mauro Gaio
Abstract: Geoparsing and geocoding are two essential middleware services to facilitate final user applications such as location-aware searching or different types of location-based services. The objective of this work is to propose a method for establishing a processing chain to support the geoparsing and geocoding of text documents describing events strongly linked with space and with a frequent use of fine-grain toponyms. The geoparsing part is a Natural Language Processing approach which combines the use of part of speech and syntactico-semantic combined patterns (cascade of transducers). However, the real novelty of this work lies in the geocoding method. The geocoding algorithm is unsupervised and takes profit of clustering techniques to provide a solution for disambiguating the toponyms found in gazetteers, and at the same time estimating the spatial footprint of those other fine-grain toponyms not found in gazetteers. The feasibility of the proposal has been tested with a corpus of hiking descriptions in French, Spanish and Italian.
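To give an idea of how clustering can help disambiguate toponyms, here is a minimal sketch and not the authors' algorithm. It assumes that each toponym comes with a list of candidate locations from a gazetteer, groups all candidates with a simple single-linkage clustering (distance threshold `eps`), and keeps the cluster that covers the most distinct toponyms. The function name, the threshold, the planar coordinates, and the sample data are all hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def disambiguate(candidates: Dict[str, List[Point]], eps: float) -> Dict[str, Point]:
    """Pick one candidate per toponym using single-linkage clustering."""
    # Flatten to (toponym, point) pairs.
    items = [(name, pt) for name, pts in candidates.items() for pt in pts]
    if not items:
        return {}

    # Single-linkage clustering: connected components of the graph that
    # links two candidates when they are at most eps apart.
    labels = [-1] * len(items)
    cluster = 0
    for i in range(len(items)):
        if labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            for k in range(len(items)):
                if labels[k] == -1 and math.dist(items[j][1], items[k][1]) <= eps:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1

    # Keep the cluster that covers the most distinct toponyms.
    coverage: Dict[int, Set[str]] = {}
    for (name, _), lab in zip(items, labels):
        coverage.setdefault(lab, set()).add(name)
    best = max(coverage, key=lambda lab: len(coverage[lab]))

    # For each toponym, keep its first candidate inside that cluster.
    resolved: Dict[str, Point] = {}
    for (name, pt), lab in zip(items, labels):
        if lab == best and name not in resolved:
            resolved[name] = pt
    return resolved


# Hypothetical gazetteer candidates (planar coordinates, arbitrary units).
candidates = {
    "Saint-Martin": [(100.0, 100.0), (2.0, 3.0)],
    "Col Vert": [(1.0, 1.0)],
    "Lac Bleu": [(3.0, 2.0), (-50.0, 40.0)],
}
resolved = disambiguate(candidates, eps=5.0)
print(resolved)
```

Toponyms with no candidate in the selected cluster are simply left unresolved in this sketch. The abstract also mentions estimating a footprint for toponyms missing from gazetteers, which is not covered here.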
Published:
Authors: Ludovic Moncla
Abstract: One of the main challenges of this work is to connect text with geographic space and to provide a map-based representation of itineraries described in textual documents. The main objectives are:
Published:
Authors: Mauro Gaio, Ludovic Moncla
Abstract: Textual geographical information is frequently organized around spatial named entities. Such entities have intrinsic ambiguities and Named Entity Recognition and Classification methods should be improved in order to handle this problem. This article describes a knowledge-based method implementing a full process with the aim of annotating in a more precise way the spatial information in the textual documents. This gain in accuracy guarantees a better analysis of the spatial information and a better disambiguation of places. The backbone of our proposal is a construction grammar and cascaded finite-state transducers. The evaluation shows that the introduced concept of hierarchical overlapping is very helpful to detect a local context associated with Named Entities.
Published:
Authors: Ludovic Moncla, Mauro Gaio, Thierry Joliveau, and Yves-François Le Lay
Abstract: Our project involves building a platform able to retrieve, map and analyze the occurrences of place names in fictional novels published between 1800 and 1914 and whose action occurs wholly or partly in Paris. We describe a proof of concept using queries made via the TXM textual analysis platform for the extraction of street names. Then, we propose a fully automatic process using the named entity recognition (NER) components of the PERDIDO platform. This paper describes some encouraging initial results obtained by combining NLP approaches (NER methods) with textometric tools for the automated geoparsing of street names.
Organized by Bruno Martins and Patricia Murrieta-Flores
Published:
Authors: Katherine McDonough, Ludovic Moncla, and Matje van de Camp
Abstract: In this article, we address two gaps in NLP research: working with historical French and working with complex textual structures moving beyond running text or lists of place names. Our methodology is based on the evaluation of the results of two spatial named entity recognition tools in the context of early modern document analysis structured as dictionaries.
Organized by Carmen Brando, Francesca Frontini, and Mathieu Roche.
Published:
Authors: Ludovic Moncla, Mauro Gaio, Thierry Joliveau
Abstract: In this article, we address two gaps in NLP research: working with historical French and working with complex textual structures moving beyond running text or lists of place names. Our methodology is based on the evaluation of the results of two spatial named entity recognition tools in the context of early modern document analysis structured as dictionaries.
Organized by Carmen Brando, Francesca Frontini, and Mathieu Roche.
Published:
Abstract: In this talk I briefly describe some of our previous and current work on geographic information retrieval. Then, I introduce some first results that show how our work can be applied to English narratives and, in particular, how it can be used for geoparsing and geocoding environmental narratives.
Organized by Ross Purves, Olga Koblet, and Ben Adams
Published:
Authors: Ludovic Moncla, Katherine McDonough, Denis Vigier, Thierry Joliveau, and Alice Brenon
Abstract: In this paper we use network analysis to identify qualitative “neighbors” for toponyms in an eighteenth-century French encyclopedia, though the method could apply to any entry-based text with annotated toponyms. This method draws on relations in a corpus of articles, which improves disambiguation at a later stage with an external resource. We suggest the network as an alternative to geospatial representation, a useful proxy when no historical gazetteer exists for the source material’s period. Our first experiments have shown that this approach goes beyond a simple text analysis and is able to find relations between toponyms that are not co-occurring in the same documents. Network relations are also usefully compared with disambiguated toponyms to evaluate geographical coverage, and the ways that geographical discourse is expressed, in historical texts.
Organized by Bruno Martins, Ludovic Moncla and Patricia Murrieta-Flores
Published:
Following the success of previous editions in 2017 and 2018, this workshop is concerned with the use of geographic information systems and other spatial technologies in humanities research, placing a strong emphasis on new methodologies that leverage the aforementioned technical developments (e.g., the above-mentioned standard tools from geographic information systems, as well as more advanced methods such as text-based geographical analysis or spatial simulation, can all benefit from innovative approaches leveraging machine learning, parallel and/or distributed computation, semantic technologies, etc.). The workshop aims to bring together researchers and practitioners from different sub-fields of computer science and the geographical information sciences, interested in the application of spatial methods and technology to the humanities, to discuss progress in the field. Participants will explore and demonstrate the contributions to knowledge that modern GIS technologies can enable within and beyond the digital humanities.
Organized by Bruno Martins, Ludovic Moncla and Patricia Murrieta-Flores
Published:
Authors: Nelly Barret, Fabien Duchateau, Franck Favetta, and Ludovic Moncla
Abstract: Points of interest (POI) are central in many applications such as tourism, itinerary search, crisis management. Cartographic providers usually represent these POI with a spatial entity. However, the description of these entities may significantly vary from one provider to another (e.g., missing properties, outdated information, conflicting values). Spatial entity matching (or record linkage) aims at detecting correspondences between entities referring to the same POI. Most existing approaches have a fixed function for combining similarity measures, thus limiting customization. Besides, evaluating the matching quality is a difficult task since a ground truth dataset cannot be built for all entities and providers. In this paper, we describe GeoAlign, an application that allows fine-grained tuning for spatial entity matching. A merging step is also provided using different strategies. Finally, we propose to estimate the quality of correspondences based on the differences between combination functions and to visualize this estimation in GeoAlign.
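The idea of a tunable combination function can be sketched as follows. This is not GeoAlign's implementation: the two similarity measures (string similarity on names, distance-based similarity on locations), the weighted average, the `max_dist` cut-off, and the sample entities are all assumptions chosen for illustration.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, Tuple

Point = Tuple[float, float]


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def distance_similarity(p: Point, q: Point, max_dist: float) -> float:
    """1.0 for identical positions, decreasing linearly to 0.0 at max_dist."""
    return max(0.0, 1.0 - math.dist(p, q) / max_dist)


def match_score(e1: dict, e2: dict, weights: Dict[str, float],
                max_dist: float = 100.0) -> float:
    """Weighted average of similarity measures; the weights are user-tunable."""
    sims = {
        "name": name_similarity(e1["name"], e2["name"]),
        "location": distance_similarity(e1["xy"], e2["xy"], max_dist),
    }
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * sims[key] for key, w in weights.items()) / total


# Hypothetical entities from two providers (planar coordinates in metres).
a = {"name": "Café de la Gare", "xy": (0.0, 0.0)}
b = {"name": "Cafe de la Gare", "xy": (10.0, 0.0)}
c = {"name": "Pharmacie Centrale", "xy": (500.0, 0.0)}

balanced = {"name": 1.0, "location": 1.0}
print(match_score(a, b, balanced))  # likely the same POI
print(match_score(a, c, balanced))  # likely different POIs
```

Changing the weights (for example `{"name": 3.0, "location": 1.0}`) changes which pairs rank highest, which is the kind of customization the abstract describes.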
Published:
Authors: Denis Vigier, Thierry Joliveau, Ludovic Moncla, Katherine McDonough, and Alice Brenon
Abstract: The GeoDISCO project aims at studying the major changes in encyclopedic geographical discourse in France between 1751 (when the first volume of the Encyclopédie ou dictionnaire raisonné des sciences, des arts et des métiers, by Diderot and D’Alembert, was published) and today (Wikipedia-France, 2018). Using linguistic and GIS methods to investigate patterns in geographical content will help us understand why authors deployed language in such ways that use place as a scaffold for ideas and practices. The spatial history of French encyclopedias is a foundation for asking broader questions about the relationship between early modern geographical information and digital geographical resources.
Workshop: 13th Workshop on Geographic Information Retrieval (GIR)
Organized by Ross Purves, Chris Jones, Ludovic Moncla and Mauro Gaio
Published:
The 13th Workshop on Geographic Information Retrieval will be held in Lyon, France, on 28-29 November 2019. This workshop will address all aspects of Geographic Information Retrieval, including but not limited to methods to retrieve and analyse geo-spatial textual content, identify geographic scope, and rank documents or other resources from both unstructured and partially structured collections by relevance.
Organized by Ross Purves, Chris Jones, Ludovic Moncla and Mauro Gaio
Published:
I led the session about ‘Adapting and integrating existing open source projects’.
Workshop: Ethical Visualization in the Age of Big Data. Contemporary Cultural Implications of Pre-Twentieth-Century French Texts. A workshop to seek interdisciplinary expert perspectives on ethically and visually representing the historical place of misrepresented peoples and locales.
Published:
The sixth workshop on the Management and Analysis of Spatial and Temporal Data (GAST, Gestion et Analyse des données Spatiales et Temporelles) will be held during EGC 2020. Building on the GAST working group, this workshop aims to bring together researchers from academia and industry who are interested in issues related to taking temporal or spatial information, whether quantitative or qualitative, into account in their data management and analysis processes (methods and applications for the extraction, management, representation, analysis, and visualization of information).
Published:
I gave a talk on “Extraction et visualisation d’information géographique à partir de textes”.
Workshop organized for the end of the SOUNDCITYVE project - Archaeology of the sound landscape: a sensitive restoring of the Sounds of the past in the City of Lyon.
Organized by Véronique Eglin
Published:
Launch meeting of the ‘Digital Spatial Humanities’ working group of the GDR CNRS MAGIS.
Published:
Meeting of the ‘Digital Spatial Humanities’ working group of the GDR CNRS MAGIS.
Published:
Seminar of the ERIC laboratory on NLP and machine learning applied to geoparsing and the geo-semantic analysis of texts.
Published:
Toponym resolution with deep learning based on co-occurrences and spatial relations
Published:
NLP and machine learning for geoparsing historical texts
Published:
Workshop on geographical data and discourse in eighteenth-century France
Published:
Seminar of the ICAR laboratory on “Combining qualitative and quantitative approaches for the detection and classification of named entities in Diderot and d’Alembert’s Encyclopédie (1751-1772)”
Published:
GAST workshop (Management and Analysis of Spatial and Temporal Data), co-organized with Thomas Guyet, Eric Kergosien and Christian Sallaberry at the EGC 2022 conference in Blois.
Published:
Materials for the SunoikisisDC Summer 2022 Course on Natural Language Processing (NLP) for historical texts (Session 9)
Tutorial: https://github.com/ludovicmoncla/SunoikisisDC-Summer2022-Session9
Youtube link: https://youtu.be/7NK2KyP2BYs
In this tutorial, we demonstrate how to use a custom version of the Perdido geoparser Python library developed in the framework of the GEODE project. We will use texts from Diderot and d’Alembert’s Encyclopédie as a case study for querying a corpus and wrangling geoparsed data. We will also compare Perdido’s NER annotations (i.e. its output) to the results of other well-known Python NER libraries (spaCy and Stanza).
Organized by Ludovic Moncla and Katherine McDonough
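Comparing the annotations of two NER tools, as this tutorial does, can be sketched with plain Python once each tool's output is reduced to a set of `(start, end, label)` spans. The spans below are hand-written for illustration and are not actual output from Perdido, spaCy, or Stanza. Exact-match scoring is one possible choice among others, and the function name is hypothetical.

```python
from typing import Dict, Set, Tuple

# (start offset, end offset, label)
Span = Tuple[int, int, str]


def compare_annotations(reference: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 of `predicted` against `reference`."""
    true_positives = len(reference & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hand-written spans for the sentence
# "Lyon est une ville de France sur le Rhône."
tool_a = {(0, 4, "LOC"), (22, 28, "LOC"), (36, 41, "LOC")}
tool_b = {(0, 4, "LOC"), (22, 28, "LOC")}

scores = compare_annotations(reference=tool_a, predicted=tool_b)
print(scores)
```

Here one tool is arbitrarily treated as the reference. With a manually annotated gold standard, the same function gives the usual evaluation scores for each tool.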
Published:
NLP & Digital Humanities workshop co-organized with Carmen Brando as part of the TALN 2022 conference in Avignon.
Published:
Workshop on Python libraries and web services for named entity recognition and toponym resolution, organized as part of the CNRS ANF TDM 2022 training (document exploration and information extraction).
The training material is available here: https://gitlab.liris.cnrs.fr/lmoncla/tutoriel-anf-tdm-2022-python-geoparsing
Presentation:
This workshop aims to present the use of Python libraries (i.e. NLTK, spaCy, Stanza) and web services (i.e. PERDIDO) for extracting named entities from texts. We will focus in particular on identifying place names and locating them on a map. We will highlight how easy these tools are to use, but also their limitations.
Program:
Introduction and comparison of different NER tools: Python libraries (NLTK, spaCy and Stanza) and web services (Perdido)
Selection of tools depending on the corpus (nature of the texts, choice of language, etc.)
Experiments will be carried out on two use cases: hiking descriptions and encyclopedia articles
Online notebook (Google Colab) for developing easy-to-use and intuitive application prototypes in Python
Published:
6th ACM SIGSPATIAL International Workshop on Geospatial Humanities
Following the success of previous editions, this workshop is concerned with the use of geographic information systems and other spatial technologies in humanities research, placing an emphasis on new methodologies that leverage the aforementioned technical developments. The standard tools from geographic information systems, as well as more advanced methods such as text- and image-based geographical analysis or spatial simulation, can all benefit from innovative approaches leveraging machine learning, parallel and/or distributed computation, semantic technologies, etc. on humanities sources like archival manuscripts, maps, encyclopedias, newspapers, correspondence collections and more. These kinds of documents pose new challenges for identifying and analyzing spatial information. The workshop aims to bring together researchers and practitioners from different sub-fields of computer science and the geographical information sciences interested in the application of spatial methods and technology to the humanities to discuss how to address these issues in ways that generate new knowledge in multiple disciplines. Participants will demonstrate their contributions and explore how modern GIS and other technologies can inform, and be inspired by, the digital humanities.
Organized by Ludovic Moncla, Bruno Martins, and Katherine McDonough
Published:
Literature through the lens of digital humanities
Study day organized by Glenn Roe and Motasem Alrahabi (ObTIC - Sorbonne Université) on 16 March 2023 at the Institut d’études avancées de Paris
Presentation of the work carried out as part of the GEODE project.
Published:
Seminar on the history of astronomical sciences
Presentation of the work carried out as part of the GEODE project on the processing of named entities and geographic coordinates.
Seminar video: https://syrteplay.obspm.fr/w/av4sK33GmPgamWVQTd7Mko
Published:
Authors: Ludovic Moncla, Mauro Gaio
Abstract: This paper introduces the Perdido Python library for geoparsing and geocoding French texts. The architecture of the Perdido Geoparser, which includes three layers: back-office, API, and Python library, is outlined. We also provide details on the methods used in the development of the processing chain and the various tasks covered, such as named entity recognition and classification (NERC), and toponym resolution. Lastly, we showcase the different features of the Python library and explain how to use it. The library is built as an overlay using API services, enabling users to manipulate, visualize, and export the results of geoparsing and geocoding. A Jupyter notebook is also provided to demonstrate all the functionalities implemented in the library.
Published:
7th ACM SIGSPATIAL International Workshop on Geospatial Humanities
Following the success of previous editions, this workshop is concerned with the use of geographic information systems and other spatial technologies in humanities research, placing an emphasis on new methodologies that leverage the aforementioned technical developments. The standard tools from geographic information systems, as well as more advanced methods such as text- and image-based geographical analysis or spatial simulation, can all benefit from innovative approaches leveraging machine learning, parallel and/or distributed computation, semantic technologies, etc. on humanities sources like archival manuscripts, maps, encyclopedias, newspapers, correspondence collections and more. These kinds of documents pose new challenges for identifying and analyzing spatial information. The workshop aims to bring together researchers and practitioners from different sub-fields of computer science and the geographical information sciences interested in the application of spatial methods and technology to the humanities to discuss how to address these issues in ways that generate new knowledge in multiple disciplines. Participants will demonstrate their contributions and explore how modern GIS and other technologies can inform, and be inspired by, the digital humanities.
Organized by Ludovic Moncla, Bruno Martins, Katherine McDonough, and Xuke Hu
Published:
Digital Spatial Humanities workshop co-organized with Carmen Brando during the MAGIS 2023 days in Bordeaux.
Published:
Jury member for Lucie Cadorel’s Ph.D. Defense at INRIA Sophia-Antipolis, France.
Published:
Authors: Ludovic Moncla, Denis Vigier, Katherine McDonough
Abstract: This paper describes the methodology for creating GeoEDdA, a gold standard dataset of geo-semantic annotations from entries in Diderot and d’Alembert’s eighteenth-century Encyclopédie. Aiming to explore spatial information beyond toponyms identified with the commonly used Named Entity Recognition (NER) task, we test the newer span categorization task as an approach for retrieving complex references to places, generic spatial terms, other entities, and relations. We test an active learning method, using the Prodigy web-based tool to iteratively train a machine learning span categorization model. The resulting dataset includes labeled spans from 2,200 paragraphs. As a preliminary experiment, a custom spaCy spancat model demonstrates strong overall performance, achieving an F-score of 86.42%. Evaluations for each span category reveal strengths in recognizing spatial entities and persons (including nominal entities, named entities and nested entities).
Published:
Spring Data/Culture Workshop: Search inside maps with MapReader
MapReader, which received the 2023 Roy Rosenzweig Prize for Innovation in Digital History from the American Historical Association, is a software library that was designed for humanities research with big digitised map collections. It was developed on the recently concluded Living with Machines project, but it has been created with the wider community of historians in mind as future users.
Published:
Authors: Ludovic Moncla, Denis Vigier, Thierry Joliveau
Designed from an interdisciplinary perspective, this talk aims to highlight how technical developments in the digital humanities, natural language processing, and geographic information systems help advance our knowledge of encyclopedic discourse on geography in the eighteenth century.
Published:
Authors: Thierry Joliveau, Ludovic Moncla, Antoine Taroni, Denis Vigier and Katie McDonough
How was geography communicated in Diderot and d’Alembert’s Encyclopédie (1751-72)? In this presentation, the interdisciplinary GEODE team will investigate the role of geographical knowledge within this encyclopedia, as part of our larger project to study these themes across French encyclopedias from the eighteenth to the twenty-first centuries. The Encyclopédie consists of 17 volumes of text (~74k articles, or 22M words) and 11 volumes of plates. And yet, the latter contain no maps. Apart from some pages (vol. 5 of plates) related to the “construction of globes”, all the geography of the Encyclopédie is contained in the volumes of text. Addressing critics of the approach to the selection of knowledge shared throughout the work in his “Encyclopédie” article (vol. 5 of text), Diderot argues that an Encyclopedia might be seen as “dry”, but that its role is to share geographical knowledge of places that is “scientific”. It should be able to be used to “create good maps”. We take up Diderot’s call, using information retrieval and spatial analysis to create a dataset of all place names in the Encyclopédie, identify historical spatial coordinates as reported in the text, and connect all named places to modern coordinates through entity linking. The resulting dataset allows us to map the Encyclopédie. We explore the spatial coverage of the text, including the outsized representation of certain parts of the world, like France. In addition to this explicit geospatial approach to the data, we use network analysis to explore references to places across articles and volumes. Using such a variety of methods, for the first time, we name, define, classify, locate, and map places in this key Enlightenment text.
Published:
Invited talk at the first workshop of the GeoLiaison PHC project organized by Davide Buscaldi (LIPN) and Jochen L. Leidner (Coburg U.): https://sites.google.com/view/geoliaison
Published:
Talk at the conference “A Conversation between AI and the Humanities”: https://caih.sciencesconf.org
Published:
Talk at the conference “Un chat à la fac de lettres?” organized by the Huma-Num ARIANE consortium: https://csthn-ariane.sciencesconf.org
Published:
Invited talk at the final conference of the Digital Humanities and Artificial Intelligence Thematic Semester supported by the CNRS center of Artificial intelligence for science, science for artificial intelligence (AISSAI): https://semtemiahn.hypotheses.org/final-conference