Vol 1 - Issue 1

Information Retrieval, Document and Semantic Web

List of Articles

Keyword Representations in Semantic Vector Space: a Keyword Assignment Method for Automatic Document Indexing

Jean-François Chartier, Dominic Forest

With the extremely rapid growth in the number of digital documents in our societies, automatic keyword
indexing has become a central research issue in information retrieval and document management. Several scientific
competitions dealing with automatic indexing tasks have emerged in recent years. This article reports on our participation in
one of them, the 2016 edition of Défi Fouille de Textes (DEFT-2016). First, we present the state of the art regarding the
importance, issues and challenges of automatic keyword indexing. After presenting the context and the task of
DEFT-2016, we introduce the method we have developed, which is based on the construction of a keyword
semantic vector space. The evaluation of our method and the analysis of the results suggest that our approach is
particularly well suited to automatic keyword indexing tasks that require assigning a large proportion of controlled
keywords that are absent from the text content of the documents.
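
As an illustration only, the general idea of placing keywords and documents in the same vector space can be sketched as follows. This is not the authors' implementation: the bag-of-words vectors, the summed-vector representation of each keyword, and the names `bow`, `cosine`, `build_keyword_space` and `rank_keywords` are all assumptions made for this sketch, and the training records are toy data.

```python
import math
from collections import Counter, defaultdict


def bow(text):
    """Bag-of-words vector for a text (assumed representation, not from the article)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_keyword_space(training):
    """Represent each controlled keyword by the sum of the vectors of the
    documents it was assigned to. Cosine similarity ignores vector length,
    so the sum points in the same direction as the centroid."""
    space = defaultdict(Counter)
    for text, keywords in training:
        for kw in keywords:
            space[kw].update(bow(text))
    return space


def rank_keywords(text, space, top_n=2):
    """Return the top_n keywords whose vectors are closest to the document."""
    query = bow(text)
    ranked = sorted(space, key=lambda kw: cosine(query, space[kw]), reverse=True)
    return ranked[:top_n]


# Hypothetical toy training records: (text, manually assigned keywords).
training = [
    ("syntax and morphology of french verbs", ["linguistics"]),
    ("french verbs in spoken corpora", ["linguistics", "corpus"]),
    ("catalysts for polymer synthesis", ["chemistry"]),
]
space = build_keyword_space(training)
suggested = rank_keywords("morphology of spoken french", space)
```

In this toy run the keyword "linguistics" can be suggested even though the word never occurs in the new text, which is the situation the abstract describes for controlled keywords absent from the document content.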

Document vector embeddings for bibliographic records indexing

Morgane Marchand, Geoffroy Fouquier, Emmanuel Marchand, Guillaume Pitel

This article presents the eXenSa contribution to the 2016 DEFT shared task. The proposed task consists of indexing
bibliographic records with keywords chosen by professional indexers. We propose a statistical approach that combines graphical and
semantic approaches. The first approach defines a document's keywords as thesaurus terms graphically similar to terms contained in
the title or the abstract of this document. The second approach assigns to a document the keywords associated with semantically similar
documents in the training corpora. Both approaches use vector space models generated using NC-ISC, a stochastic matrix factorisation
algorithm. Our system obtains the best F-score on two of the four test corpora and ranks second on the other two.
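
The second approach described above, borrowing keywords from similar training documents, can be illustrated with a minimal sketch. It does not use NC-ISC: plain bag-of-words cosine similarity is swapped in as a stand-in, and the function names, the parameters `k` and `top_n`, and the toy records are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector (a stand-in for the embeddings used in the article)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def assign_keywords(text, training, k=2, top_n=3):
    """Collect keywords from the k most similar training documents,
    weighting each keyword by the similarity of the document it came from."""
    query = bow(text)
    scored = [(cosine(query, bow(doc_text)), keywords) for doc_text, keywords in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for similarity, keywords in scored[:k]:
        for kw in keywords:
            votes[kw] += similarity
    return [kw for kw, _ in votes.most_common(top_n)]


# Hypothetical toy training records: (text, manually assigned keywords).
training = [
    ("roman pottery found in excavation site", ["archaeology", "ceramics"]),
    ("excavation of a roman villa", ["archaeology", "architecture"]),
    ("synthesis of polymer catalysts", ["chemistry", "polymers"]),
]
assigned = assign_keywords("roman excavation pottery", training)
```

Weighting votes by similarity is one possible design choice; a simple count of how often each keyword appears among the neighbours would also fit the description in the abstract.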

Automatic indexing of scientific papers: presentation and results of the DEFT 2016 text mining challenge

Béatrice Daille, Sabine Barreaux, Adrien Bougouin, Florian Boudin, Damien Cram, Amir Hazem

This paper presents the 2016 edition of the DEFT text mining challenge. This edition addresses the keyword-based
indexing of scientific papers with the aim of simulating a professional indexer. The corpus is composed of French bibliographic records
from four domains: linguistics, information science, archaeology and chemistry. The results have been evaluated in terms of precision,
recall and F-measure, computed on stemmed texts against a manual reference indexing.
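
The evaluation measures named above can be sketched as follows for a single record. This is a simplified illustration, not the official DEFT scoring script: `normalize` is a crude, hypothetical stand-in for a real stemmer, and the example keywords are invented.

```python
def normalize(keyword):
    """Crude stand-in for stemming: lowercase each token and drop a final 's'
    from tokens longer than three characters."""
    tokens = keyword.lower().split()
    return " ".join(t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens)


def precision_recall_f1(predicted, reference):
    """Compare predicted keywords with the manual reference after normalization."""
    pred = {normalize(k) for k in predicted}
    ref = {normalize(k) for k in reference}
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Invented example: three system keywords against four reference keywords.
predicted = ["Semantic Webs", "indexing", "thesaurus"]
reference = ["semantic web", "indexing", "information retrieval", "archives"]
precision, recall, f1 = precision_recall_f1(predicted, reference)
```

Here two of the three predicted keywords match the reference after normalization, so precision is 2/3, recall is 2/4, and the F-measure is their harmonic mean.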

A graph-based ranking approach for indexing in specialised domains

Adrien Bougouin, Florian Boudin, Béatrice Daille

This article presents the participation of the TALN group at LINA in the Défi Fouille de Textes (DEFT) 2016. We propose
a new method, TopicCoRank, developed specifically for automatic keyphrase annotation. It extracts the most important phrases from a
document and also provides keyphrases that do not occur in the document. Our system ranked third out of a total of five systems.

Notes about the Computational Journalism workshop 2017

Laurent Amsaleg, Vincent Claveau

This short paper gives an overview of the presentations and discussions held during the "Computational Journalism" workshop. This workshop was proposed by Laurent Amsaleg (CNRS, IRISA), Vincent Claveau (CNRS, IRISA) and Xavier Tannier (LIMSI, Univ. Paris Sud). It took place during the EGC 2017 conference in Grenoble, France.
