Title: RFreeStem: A language and rule-free stemmer
Authors: Xavier Baril, Oihana Coustié, Josiane Mothe, Olivier Teste
Journal: Open Journal in Information Systems Engineering
Issue: 1
Volume: 2
Date: 2021/01/19
DOI: 10.21494/ISTE.OP.2021.0605
ISSN: 2634-1468
Abstract: With the large expansion of available textual data, text mining has become of special interest. Because such data are unstructured, they require substantial preprocessing. One such step is stemming, which conflates the variants of a word into a common stem. However, the most popular stemming algorithms are rule-based, and therefore highly language-dependent. Corpus-based stemmers, in contrast, often have high algorithmic complexity, which makes them inefficient. Nor do they necessarily output the extracted stems, which certain text mining tasks require. We propose a new approach, RFreeStem, that is corpus-based and can therefore be applied to many languages. The implementation of our method is flexible and efficient, since it relies on a single pass over the words' n-grams. We also detail a method to extract the stems. Our experiments show that RFreeStem improves the results of text mining tasks, even more than the Porter reference stemmer does, while also providing a stemming solution for low-resource languages for which no version of Porter is available.
Publisher: ISTE OpenScience
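
The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of what "a single pass over the words' n-grams" could look like in a corpus-based setting. It is not the RFreeStem method. The function names `char_ngrams` and `build_ngram_index`, the n-gram length, and the toy vocabulary are all assumptions introduced here for illustration.

```python
from collections import defaultdict


def char_ngrams(word, n=3):
    """Return the contiguous character n-grams of `word`.

    Words shorter than `n` are returned whole (an assumption made
    here so that short words are still indexed).
    """
    if len(word) < n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_ngram_index(vocabulary, n=3):
    """Map each n-gram to the set of words containing it.

    Each word is visited once, so the cost is linear in the total
    number of n-grams in the vocabulary.
    """
    index = defaultdict(set)
    for word in vocabulary:
        for gram in char_ngrams(word, n):
            index[gram].add(word)
    return index


# Hypothetical toy vocabulary, not taken from the paper.
vocabulary = ["connect", "connected", "connection", "stem", "stemming"]
index = build_ngram_index(vocabulary, n=4)

# Words sharing an n-gram are candidates for conflation.
print(sorted(index["conn"]))
print(sorted(index["stem"]))
```

Such an index needs no language-specific rules, only a word list, which is the property the abstract attributes to corpus-based stemmers. How candidate groups are then scored and how a stem is chosen for each group is left open here, since the abstract does not specify it.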