With the extremely rapid growth in the number of digital documents in our societies, automatic keyword
indexing has become a central research issue in information retrieval and document management. Several scientific
competitions dealing with automatic indexing tasks have emerged in recent years. This article reports on our participation in
one of them, the 2016 edition of Défi Fouille de Texte (DEFT-2016). First, we review the state of the art regarding the
importance, the issues and the challenges of automatic keyword indexing. After presenting the context and the task of
DEFT-2016, we introduce the method we have developed. This method is based on the construction of a keyword
semantic vector space. The evaluation of our method and the analysis of the results suggest that our approach is
particularly well suited to automatic keyword indexing tasks that require assigning a large proportion of controlled
keywords that are absent from the text content of the documents.
This article presents the eXenSa contribution to the 2016 DEFT shared task. The proposed task consists of indexing
bibliographic records with keywords chosen by professional indexers. We propose a statistical approach which combines graphical and
semantic approaches. The first approach defines a document's keywords as the thesaurus terms graphically similar to terms contained in
the title or the abstract of this document. The second approach assigns to a document the keywords associated with semantically similar
documents in the training corpora. Both approaches use vector space models generated using NC-ISC, a stochastic matrix factorisation
algorithm. Our system obtains the best F-score on two of the four test corpora and ranks second on the other two.
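The second approach described above, assigning to a document the keywords of semantically similar training documents, can be sketched as a weighted nearest-neighbour vote. This is only a minimal illustration under stated assumptions: plain bag-of-words cosine similarity stands in for the NC-ISC vector space, and the function names, the parameters `k` and `top_n`, and the toy training records are all hypothetical rather than taken from the eXenSa system.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector (a stand-in for an NC-ISC embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_keywords(doc, training, k=2, top_n=3):
    """Let the k most similar training records vote for their keywords.

    `training` is a list of (text, keywords) pairs; each vote is weighted
    by the similarity between the record and the document to index.
    """
    query = bow(doc)
    scored = [(cosine(query, bow(text)), keywords) for text, keywords in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for similarity, keywords in scored[:k]:
        for keyword in keywords:
            votes[keyword] += similarity
    return [kw for kw, score in votes.most_common(top_n) if score > 0]


# Hypothetical training records, invented for illustration only.
training = [
    ("syntax of french relative clauses", ["syntax", "french"]),
    ("dating pottery from a roman site", ["pottery", "roman period"]),
    ("word order and syntax in old french", ["syntax", "word order"]),
]
result = assign_keywords("syntax and word order in french", training)
```

Note that a keyword can be assigned this way even when it never appears in the title or abstract of the document being indexed, since it is inherited from the neighbouring records.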
This paper presents the 2016 edition of the DEFT text mining challenge. This edition addresses the keyword-based
indexing of scientific papers with the aim of simulating a professional indexer. The corpus is composed of French bibliographic records
from four domains: linguistics, information science, archaeology and chemistry. The results have been evaluated in terms of precision,
recall and F-measure computed on stemmed texts against a reference manual indexing.
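As an illustration of the evaluation measures named above, the following sketch computes precision, recall and F-measure for one record after stemming both the predicted and the reference keywords. It is written under assumptions: `toy_stem` is a crude, hypothetical suffix stripper standing in for whatever stemmer the challenge actually used, the example keywords are invented, and the averaging scheme used across records in the official evaluation is not reproduced here.

```python
def toy_stem(word):
    """Crude suffix stripper; a hypothetical stand-in for a real stemmer."""
    for suffix in ("es", "s", "e"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def normalise(keyword):
    """Lowercase a keyword and stem each of its words."""
    return " ".join(toy_stem(w) for w in keyword.lower().split())


def prf(predicted, reference):
    """Precision, recall and F-measure over sets of stemmed keywords."""
    pred = {normalise(k) for k in predicted}
    ref = {normalise(k) for k in reference}
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Invented example: "langues" and "langue" match once stemmed.
p, r, f = prf(
    ["langues", "syntaxe", "corpus"],
    ["langue", "syntaxe", "morphologie", "lexique"],
)
```

Comparing stemmed forms means that inflectional variants of the same keyword, such as a singular and a plural, count as a match rather than as an error.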
This article presents the participation of the TALN group at LINA in the Défi Fouille de Textes (DEFT) 2016. We propose
a new method, TopicCoRank, developed specifically for automatic keyphrase annotation, which extracts the most important phrases from a
document and also provides keyphrases that do not occur in the document. Our system ranked third out of a total of five systems.
This short paper gives an overview of the presentations and discussions held during the "Computational Journalism" workshop. This workshop was organised by Laurent Amsaleg (CNRS, IRISA), Vincent Claveau (CNRS, IRISA) and Xavier Tannier (LIMSI, Univ. Paris Sud). It took place during the EGC2017 conference in Grenoble, France.