ARLSTem2 Arabic Light Stemmer

The implementation of this algorithm is described in: K. Abainia and H. Rebbani, Comparing the Effectiveness of the Improved ARLSTem Algorithm with Existing Arabic Light Stemmers, International Conference on Theoretical and Applicative Aspects of Computer Science (ICTAACS’19), Skikda, Algeria, December 15-16, 2019.

ARLSTem2 is an Arabic light stemmer that removes affixes (i.e. prefixes, suffixes and infixes) from words. It improves on the previous Arabic light stemmer, ARLSTem. The new version was compared with the original algorithm and several existing Arabic light stemmers, and the results showed that it considerably reduces the under-stemming errors that are common to light stemmers. Both ARLSTem and ARLSTem2 can be run online and do not use any dictionary.
- class nltk.stem.arlstem2.ARLSTem2
Return a stemmed Arabic word after removing affixes. This is an improved version of the previous algorithm that reduces under-stemming errors. Typically used in Arabic search engines, information retrieval and NLP.
>>> from nltk.stem.arlstem2 import ARLSTem2
>>> stemmer = ARLSTem2()
>>> word = stemmer.stem('يعمل')
>>> print(word)
token (unicode) – The input Arabic word (unicode) to be stemmed
A unicode Arabic word
Call this function to get the first stem.
Strip affixes from the token and return the stem.
token (str) – The token that should be stemmed.
Normalize the word by removing diacritics, replacing hamzated Alif with bare Alif, replacing Alif Maqsura with Yaa, and removing Waaw at the beginning of the word.
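The normalization step can be illustrated with a short, self-contained sketch. This is not the NLTK implementation: the function name `normalize`, the diacritic range, and the `min_len_after_waaw` guard are assumptions made for illustration only.

```python
import re

# Assumed character sets for this illustration (not taken from NLTK).
DIACRITICS = re.compile(r"[\u064B-\u0652]")          # fathatan .. sukun
HAMZATED_ALIF = re.compile(r"[\u0622\u0623\u0625]")  # alif with madda / hamza above / hamza below
ALIF = "\u0627"          # bare alif
ALIF_MAQSURA = "\u0649"
YAA = "\u064A"
WAAW = "\u0648"


def normalize(token: str, min_len_after_waaw: int = 3) -> str:
    """Apply the four normalization rules in the order they are listed."""
    # 1. Remove diacritics (short vowels, tanween, shadda, sukun).
    token = DIACRITICS.sub("", token)
    # 2. Replace hamzated alif forms with bare alif.
    token = HAMZATED_ALIF.sub(ALIF, token)
    # 3. Replace alif maqsura with yaa.
    token = token.replace(ALIF_MAQSURA, YAA)
    # 4. Drop a leading waaw, but only if enough letters remain
    #    (the threshold is a hypothetical guard against over-stripping).
    if token.startswith(WAAW) and len(token) - 1 >= min_len_after_waaw:
        token = token[1:]
    return token
```

With these assumptions, `normalize("\u0648\u0643\u062A\u0627\u0628")` drops the leading Waaw, while a three-letter word that begins with Waaw is left unchanged because stripping it would leave fewer than three letters.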
Remove prefixes from the beginning of the word.
Remove the infixes from adjectives.
Remove suffixes from the end of the word.
Transform the word from the feminine form to the masculine form.
Transform the word from the plural form to the singular form.
Stem the verb prefixes, suffixes, or both.
Stem the co-occurring present-tense prefixes and suffixes.
Stem the co-occurring future-tense prefixes and suffixes.
Stem the present-tense suffixes.
Stem the present-tense prefixes.
Stem the future-tense prefixes.
Stem the imperative prefixes.
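The idea of stripping co-occurring verb prefixes and suffixes, mentioned above for the present tense, can be sketched as follows. This is a minimal illustration under stated assumptions, not ARLSTem2's actual rule tables: the `PRESENT_PAIRS` list, the `strip_cooccurring` name, and the `min_stem_len` threshold are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical (prefix, suffix) pairs for the present tense.
# These are examples chosen for illustration, not ARLSTem2's tables.
PRESENT_PAIRS: List[Tuple[str, str]] = [
    ("\u064A", "\u0648\u0646"),  # yaa ... waaw-noon  (they, masculine plural)
    ("\u062A", "\u0648\u0646"),  # taa ... waaw-noon  (you, masculine plural)
    ("\u062A", "\u064A\u0646"),  # taa ... yaa-noon   (you, feminine singular)
]


def strip_cooccurring(
    token: str,
    pairs: List[Tuple[str, str]] = PRESENT_PAIRS,
    min_stem_len: int = 3,
) -> Optional[str]:
    """Strip a prefix and suffix only when both occur on the same token.

    Returns the stripped stem, or None when no pair matches or the
    remaining stem would be shorter than min_stem_len.
    """
    for prefix, suffix in pairs:
        remaining = len(token) - len(prefix) - len(suffix)
        if (
            token.startswith(prefix)
            and token.endswith(suffix)
            and remaining >= min_stem_len
        ):
            return token[len(prefix):len(token) - len(suffix)]
    return None
```

Returning `None` when nothing matches lets a caller try the next rule (for example suffix-only or prefix-only stripping) before falling back to the unmodified token. That fallback order is a design choice of this sketch.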