nltk.stem.arlstem2 module

ARLSTem2 Arabic Light Stemmer.

The details about the implementation of this algorithm are described in: K. Abainia and H. Rebbani, Comparing the Effectiveness of the Improved ARLSTem Algorithm with Existing Arabic Light Stemmers, International Conference on Theoretical and Applicative Aspects of Computer Science (ICTAACS’19), Skikda, Algeria, December 15-16, 2019.

ARLSTem2 is an Arabic light stemmer based on removing affixes (i.e. prefixes, suffixes and infixes) from words. It is an improvement of the previous Arabic light stemmer (ARLSTem). The new version was compared to the original algorithm and to several existing Arabic light stemmers, and the results showed that it considerably reduces the under-stemming errors that are common to light stemmers. Both ARLSTem and ARLSTem2 can be run online and do not use any dictionary.

class nltk.stem.arlstem2.ARLSTem2

Bases: StemmerI

Return a stemmed Arabic word after removing affixes. This is an improved version of the previous algorithm (ARLSTem) that reduces under-stemming errors. It is typically used in Arabic search engines, information retrieval and NLP.

>>> from nltk.stem.arlstem2 import ARLSTem2
>>> stemmer = ARLSTem2()
>>> word = stemmer.stem('يعمل')
>>> print(word)
عمل
Parameters

token (unicode) – The input Arabic word (unicode) to be stemmed

Returns

A unicode Arabic word

__init__()
adjective(token)

Remove the infixes from adjectives.
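The docstring does not say which infixes are removed. As a rough illustration of infix removal in general, the sketch below assumes two common four-letter patterns: فاعل (Alif in second position, e.g. كاتب) and فعيل (Yaa in third position, e.g. كبير). The function name and both rules are hypothetical and are not taken from NLTK's source.

```python
ALIF = "\u0627"  # ا
YAA = "\u064a"   # ي


def strip_adjective_infix(token: str) -> str:
    """Hypothetical infix rule (not NLTK's code): reduce a four-letter
    word to three letters by dropping a long-vowel infix."""
    if len(token) == 4:
        # Assumed pattern فاعل, e.g. كاتب -> كتب: Alif in second position.
        if token[1] == ALIF:
            return token[0] + token[2:]
        # Assumed pattern فعيل, e.g. كبير -> كبر: Yaa in third position.
        if token[2] == YAA:
            return token[:2] + token[3]
    # Any other shape is returned unchanged.
    return token


print(strip_adjective_infix("\u0643\u0628\u064a\u0631"))  # كبير
```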

fem2masc(token)

Transform the word from the feminine form to the masculine form.
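A minimal sketch of feminine-to-masculine conversion, assuming the only rule is dropping a final Taa Marbuta (ة) when at least three letters remain (e.g. معلمة becomes معلم). The function name and the length threshold are assumptions; the actual method may handle other feminine endings as well.

```python
TAA_MARBUTA = "\u0629"  # ة


def fem_to_masc(token: str) -> str:
    """Hypothetical simplified rule (not NLTK's code): drop a final
    Taa Marbuta if the remaining stem has at least three letters."""
    if token.endswith(TAA_MARBUTA) and len(token) > 3:
        return token[:-1]
    # Short words such as سنة keep their final letter.
    return token


print(fem_to_masc("\u0645\u0639\u0644\u0645\u0629"))  # معلمة -> معلم
```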

norm(token)

Normalize the word by removing diacritics, replacing hamzated Alif with bare Alif, replacing Alif Maqsura with Yaa, and removing the Waaw at the beginning of the word.
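The normalization steps listed above can be sketched with the standard library. This assumes that the diacritics are the code points U+064B to U+0652 and that the leading Waaw is removed only when at least three letters remain; both details, and the function name, are assumptions rather than NLTK's exact code.

```python
import re

# Assumption: Arabic diacritics (tashkeel) occupy U+064B..U+0652.
DIACRITICS = re.compile("[\u064b-\u0652]")
HAMZATED_ALIF = re.compile("[\u0622\u0623\u0625]")  # آ أ إ
ALIF = "\u0627"          # ا
ALIF_MAQSURA = "\u0649"  # ى
YAA = "\u064a"           # ي
WAAW = "\u0648"          # و


def normalize(token: str) -> str:
    token = DIACRITICS.sub("", token)          # remove diacritics
    token = HAMZATED_ALIF.sub(ALIF, token)     # hamzated Alif -> bare Alif
    token = token.replace(ALIF_MAQSURA, YAA)   # Alif Maqsura -> Yaa
    # Hypothetical guard: strip a leading Waaw only if 3+ letters remain.
    if token.startswith(WAAW) and len(token) > 3:
        token = token[1:]
    return token


print(normalize("\u064a\u064e\u0639\u0652\u0645\u064e\u0644\u064f"))  # يَعْمَلُ
```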

plur2sing(token)

Transform the word from the plural form to the singular form.
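A simplified sketch covering only sound plurals and duals, which are formed by suffixation (ون, ين, ات, ان), assuming a minimum stem length of three letters. Broken (internal) plurals, such as كتب as the plural of كتاب, cannot be handled by suffix stripping alone. The suffix list and the function name are hypothetical.

```python
# Hypothetical suffix set for sound plurals and duals (not NLTK's list).
PLURAL_SUFFIXES = (
    "\u0648\u0646",  # ون  masculine sound plural (nominative)
    "\u064a\u0646",  # ين  masculine sound plural (oblique)
    "\u0627\u062a",  # ات  feminine sound plural
    "\u0627\u0646",  # ان  dual (nominative)
)


def plural_to_singular(token: str, min_stem: int = 3) -> str:
    for suffix in PLURAL_SUFFIXES:
        # Only strip when enough letters remain, so that e.g. بين is kept.
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


print(plural_to_singular("\u0645\u0639\u0644\u0645\u0648\u0646"))  # معلمون
```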

pref(token)

Remove prefixes from the beginning of the word.
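A minimal prefix-stripping sketch, assuming a small hypothetical list of article and particle prefixes and a minimum stem of three letters. Prefixes are tried longest first so that وال ("and the") wins over the single-letter و ("and") in a word such as والكتاب; otherwise the result would be الكتاب.

```python
# Hypothetical prefix list (not NLTK's), sorted longest first.
PREFIXES = sorted(
    [
        "\u0627\u0644",        # ال  "the"
        "\u0648\u0627\u0644",  # وال "and the"
        "\u0628\u0627\u0644",  # بال "with the"
        "\u0643\u0627\u0644",  # كال "like the"
        "\u0641\u0627\u0644",  # فال "so the"
        "\u0644\u0644",        # لل  "for the"
        "\u0648",              # و   "and"
    ],
    key=len,
    reverse=True,
)


def strip_prefix(token: str, min_stem: int = 3) -> str:
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_stem:
            return token[len(prefix):]
    return token


print(strip_prefix("\u0648\u0627\u0644\u0643\u062a\u0627\u0628"))  # والكتاب -> كتاب
```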

stem(token)

Strip affixes from the token and return the stem.

Parameters

token (str) – The token that should be stemmed.

stem1(token)

Call this function to get the first stem.

suff(token)

Remove suffixes from the end of the word.
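A regex-based sketch of suffix removal, assuming a hypothetical list of attached pronoun suffixes (هما, هم, ها, كم, نا, ه, ك) and a three-letter minimum stem. Because the pattern is anchored at the end and re.search returns the leftmost match, the longest matching suffix is found regardless of the order of the alternatives. As a simplification, if that longest suffix would leave too short a stem, the token is returned unchanged rather than trying a shorter suffix.

```python
import re

# Hypothetical attached-pronoun suffixes (not NLTK's list).
SUFFIX_RE = re.compile(
    "(?:\u0647\u0645\u0627|\u0647\u0645|\u0647\u0627|\u0643\u0645|\u0646\u0627|\u0647|\u0643)$"
)


def strip_suffix(token: str, min_stem: int = 3) -> str:
    match = SUFFIX_RE.search(token)
    # match.start() is the length of the stem that would remain.
    if match and match.start() >= min_stem:
        return token[: match.start()]
    return token


print(strip_suffix("\u0643\u062a\u0627\u0628\u0647\u0645\u0627"))  # كتابهما -> كتاب
```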

verb(token)

Stem the verb by removing its prefixes, its suffixes, or both.
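One way to organize a verb stemmer built from several sub-rules, such as the verb_t1 to verb_t6 methods of this class, is to try each rule in turn and keep the first result. The sketch below assumes that convention (a rule returns None when it does not apply); the two example rules for the future prefix سي and the present prefix ي, their length thresholds, and all names are hypothetical.

```python
from typing import Callable, Optional, Sequence

Rule = Callable[[str], Optional[str]]


def future_prefix(token: str) -> Optional[str]:
    # Hypothetical rule: strip the future prefix سي if 3+ letters remain.
    if token.startswith("\u0633\u064a") and len(token) > 4:
        return token[2:]
    return None


def present_prefix(token: str) -> Optional[str]:
    # Hypothetical rule: strip the present prefix ي if 3+ letters remain.
    if token.startswith("\u064a") and len(token) > 3:
        return token[1:]
    return None


def apply_first_rule(token: str, rules: Sequence[Rule]) -> str:
    """Return the result of the first rule that fires, else the token."""
    for rule in rules:
        result = rule(token)
        if result is not None:
            return result
    return token


VERB_RULES = [future_prefix, present_prefix]
print(apply_first_rule("\u0633\u064a\u0639\u0645\u0644", VERB_RULES))  # سيعمل -> عمل
```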

verb_t1(token)

Remove co-occurring present-tense prefixes and suffixes.

verb_t2(token)

Remove co-occurring future-tense prefixes and suffixes.
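Co-occurring affixes can be sketched as (prefix, suffix) pairs that are removed only when both parts are present; for example, يعملون ("they work") and سيعملون ("they will work") both reduce to عمل. The pair table, the three-letter minimum and the function name are assumptions for illustration, not NLTK's tables. Returning None when no pair matches lets a caller fall through to other rules.

```python
from typing import Optional

# Hypothetical (prefix, suffix) pairs, not NLTK's tables.
CIRCUMFIXES = [
    ("\u0633\u064a", "\u0648\u0646"),  # سي...ون  future, "they will ..."
    ("\u0633\u062a", "\u0648\u0646"),  # ست...ون  future, "you (pl.) will ..."
    ("\u064a", "\u0648\u0646"),        # ي...ون   present, "they ..."
    ("\u062a", "\u0648\u0646"),        # ت...ون   present, "you (pl.) ..."
    ("\u062a", "\u064a\u0646"),        # ت...ين   present, "you (f. sg.) ..."
]


def strip_circumfix(token: str, min_stem: int = 3) -> Optional[str]:
    for prefix, suffix in CIRCUMFIXES:
        core = len(token) - len(prefix) - len(suffix)
        # Both affixes must be present and leave at least min_stem letters.
        if token.startswith(prefix) and token.endswith(suffix) and core >= min_stem:
            return token[len(prefix): len(token) - len(suffix)]
    return None


print(strip_circumfix("\u0633\u064a\u0639\u0645\u0644\u0648\u0646"))  # سيعملون -> عمل
```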

verb_t3(token)

Remove present-tense suffixes.

verb_t4(token)

Remove present-tense prefixes.

verb_t5(token)

Remove future-tense prefixes.

verb_t6(token)

Remove imperative prefixes.