nltk.stem.arlstem module

ARLSTem Arabic Stemmer

The implementation details of this algorithm are described in: K. Abainia, S. Ouamour and H. Sayoud, "A Novel Robust Arabic Light Stemmer", Journal of Experimental & Theoretical Artificial Intelligence (JETAI’17), Vol. 29, No. 3, 2017, pp. 557-573.

ARLSTem is a light Arabic stemmer that removes affixes (prefixes, suffixes and infixes) from the word. It was evaluated against several other stemmers using Paice’s parameters (under-stemming index, over-stemming index and stemming weight), and the results showed that ARLSTem is promising and performs well. Because it does not rely on any dictionary, it can be used effectively online.
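Paice’s parameters can be computed by counting word pairs. The sketch below is one pair-counting formulation, assuming the evaluation data is a list of concept groups (words that should share a stem) plus a mapping from each word to the stem a stemmer produced. The function name `paice_indices` and the toy placeholder words are hypothetical; the evaluation corpus and setup used in the paper are not reproduced here.

```python
from itertools import combinations


def paice_indices(groups, stems):
    """Hypothetical helper returning (UI, OI, SW).

    UI: share of same-group pairs that received different stems (under-stemming).
    OI: share of cross-group pairs that received the same stem (over-stemming).
    SW: OI / UI (stemming weight).
    """
    # Label every word with the index of its concept group.
    group_of = {w: g for g, words in enumerate(groups) for w in words}

    same_total = same_split = cross_total = cross_merged = 0
    for a, b in combinations(group_of, 2):
        same_stem = stems[a] == stems[b]
        if group_of[a] == group_of[b]:
            same_total += 1
            if not same_stem:
                same_split += 1
        else:
            cross_total += 1
            if same_stem:
                cross_merged += 1

    ui = same_split / same_total if same_total else 0.0
    oi = cross_merged / cross_total if cross_total else 0.0
    # Design choice for this sketch: with no under-stemming, SW is unbounded.
    sw = oi / ui if ui else float("inf")
    return ui, oi, sw


# Toy data: two concept groups and the stems a stemmer might have produced.
groups = [["a1", "a2", "a3"], ["b1", "b2"]]
stems = {"a1": "a", "a2": "a", "a3": "x", "b1": "x", "b2": "b"}
ui, oi, sw = paice_indices(groups, stems)
print(f"UI={ui:.3f} OI={oi:.3f} SW={sw:.3f}")  # UI=0.750 OI=0.167 SW=0.222
```

A lower UI and OI are both better; SW summarizes how "strong" (over-conflating) a stemmer is relative to how much it under-conflates.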

class nltk.stem.arlstem.ARLSTem

Bases: StemmerI

ARLSTem stemmer: a light Arabic stemming algorithm that does not use a dictionary. Department of Telecommunication & Information Processing, USTHB University, Algiers, Algeria. ARLSTem.stem(token) returns the Arabic stem of the input token. The ARLSTem stemmer requires all tokens to be Unicode strings.
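A minimal usage sketch, assuming NLTK is installed. The two sample words are illustrative choices; their stems are printed rather than stated here, since the exact output depends on the installed NLTK version.

```python
from nltk.stem.arlstem import ARLSTem

stemmer = ARLSTem()

# Illustrative tokens; in Python 3 every str is already a Unicode string.
words = [
    "\u0627\u0644\u0643\u062A\u0627\u0628",  # الكتاب ("the book")
    "\u0627\u0644\u0645\u0639\u0644\u0645\u0648\u0646",  # المعلمون ("the teachers")
]

for word in words:
    print(word, "->", stemmer.stem(word))
```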

__init__()

fem2masc(token): transform the word from the feminine form to the masculine form.
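A small sketch of the idea, assuming the feminine marker handled is a final taa marbuta (ة) and that words shorter than four letters are left alone. The name `fem2masc_sketch`, the `min_len` threshold and the `None` return for "no change" are hypothetical choices, not taken from NLTK.

```python
TAA_MARBUTA = "\u0629"  # ة, the usual feminine ending


def fem2masc_sketch(token, min_len=4):
    """Hypothetical: drop a final taa marbuta if the word is long enough."""
    if len(token) >= min_len and token.endswith(TAA_MARBUTA):
        return token[:-1]
    return None  # no feminine ending removed


print(fem2masc_sketch("\u0637\u0627\u0644\u0628\u0629"))  # طالبة (female student) -> طالب
```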

norm(token): normalize the word by removing diacritics, replacing hamzated Alif with bare Alif, replacing Alif Maqsura with Yaa, and removing a Waaw at the beginning of the word.
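The four normalization steps map directly onto regular expressions and string replacements. This is a sketch under stated assumptions: "diacritics" is taken to mean the Unicode range U+064B to U+0652 (tanween, short vowels, shadda, sukun), and the leading Waaw is removed only when more than three letters remain; `norm_sketch` and that threshold are hypothetical.

```python
import re

DIACRITICS = re.compile("[\u064B-\u0652]")  # tanween, harakat, shadda, sukun
HAMZATED_ALIF = re.compile("[\u0622\u0623\u0625]")  # آ أ إ
ALIF = "\u0627"  # ا
ALIF_MAQSURA = "\u0649"  # ى
YAA = "\u064A"  # ي
WAAW = "\u0648"  # و


def norm_sketch(token):
    """Hypothetical normalizer mirroring the steps described for norm()."""
    token = DIACRITICS.sub("", token)  # strip diacritics
    token = HAMZATED_ALIF.sub(ALIF, token)  # hamzated Alif -> bare Alif
    token = token.replace(ALIF_MAQSURA, YAA)  # Alif Maqsura -> Yaa
    # Drop a leading Waaw ("and") only if the word is long enough (assumed guard).
    if token.startswith(WAAW) and len(token) > 3:
        token = token[1:]
    return token


print(norm_sketch("\u0623\u064E\u062D\u0652\u0645\u064E\u062F"))  # أَحْمَد -> احمد
```

The length guard matters because some words genuinely start with Waaw; for example, the three-letter ولد ("boy") is left intact.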

plur2sing(token): transform the word from the plural form to the singular form.
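A sketch covering only suffix-marked forms (sound plurals and duals), assuming a minimum stem of three letters. Broken (internal) plurals, which the infix handling mentioned above would address, are not covered. `plur2sing_sketch`, its suffix list and its `None` return for "no plural found" are hypothetical.

```python
# Hypothetical list of plural/dual endings.
PLURAL_SUFFIXES = (
    "\u0648\u0646",  # ون  sound masculine plural (nominative)
    "\u064A\u0646",  # ين  sound masculine plural / dual (oblique)
    "\u0627\u0646",  # ان  dual (nominative)
    "\u0627\u062A",  # ات  sound feminine plural
)


def plur2sing_sketch(token, min_stem=3):
    """Hypothetical: strip a plural/dual ending if enough letters remain."""
    for suffix in PLURAL_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return None  # no plural ending recognized


print(plur2sing_sketch("\u0645\u0639\u0644\u0645\u0648\u0646"))  # معلمون -> معلم
```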

pref(token): remove prefixes from the beginning of the word.
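Prefix removal is usually guarded by a minimum stem length so that short words are not destroyed. The sketch assumes a small hypothetical list of article-based noun prefixes and a minimum stem of three letters; `pref_sketch` and the list are illustrations, not NLTK's actual prefix sets.

```python
# Hypothetical noun prefixes built around the article ال.
NOUN_PREFIXES = (
    "\u0648\u0627\u0644",  # وال  "and the"
    "\u0628\u0627\u0644",  # بال  "with/by the"
    "\u0643\u0627\u0644",  # كال  "like the"
    "\u0641\u0627\u0644",  # فال  "so the"
    "\u0644\u0644",  # لل   "for the"
    "\u0627\u0644",  # ال   "the"
)


def pref_sketch(token, min_stem=3):
    """Hypothetical: strip the first listed prefix that leaves >= min_stem letters."""
    for prefix in NOUN_PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_stem:
            return token[len(prefix):]
    return None  # no prefix removed


print(pref_sketch("\u0628\u0627\u0644\u0642\u0644\u0645"))  # بالقلم -> قلم
```

The guard is what keeps a word such as الم (ألم, "pain", after normalization) from losing its first two letters.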

stem(token): return the word’s stem computed with ARLSTem.
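One way `stem` could chain the helper methods on this page is sketched below: normalize, strip noun prefixes, strip suffixes, then try the plural, feminine and verb rules. The ordering and the rule "only try verb rules when no noun prefix was found" are assumptions for illustration, not NLTK's code. The steps are passed in as functions, and the toy rules in `toy_steps` are deliberately tiny and hypothetical.

```python
import re
from typing import Callable, Optional


def stem_pipeline(
    token: str,
    norm: Callable[[str], str],
    pref: Callable[[str], Optional[str]],
    suff: Callable[[str], str],
    plur2sing: Callable[[str], Optional[str]],
    fem2masc: Callable[[str], Optional[str]],
    verb: Callable[[str], str],
) -> str:
    """Chain the helper steps; this ordering is an assumption."""
    token = norm(token)
    without_prefix = pref(token)  # None when no noun prefix matched
    if without_prefix is not None:
        token = without_prefix
    token = suff(token)  # the suffix step always returns a string
    singular = plur2sing(token)
    if singular is not None:
        return singular
    masculine = fem2masc(token)
    if masculine is not None:
        return masculine
    if without_prefix is None:  # no noun prefix found: try verb rules
        return verb(token)
    return token


# Hypothetical toy rules, just enough to exercise each branch.
toy_steps = dict(
    norm=lambda t: re.sub("[\u064B-\u0652]", "", t),  # drop diacritics
    pref=lambda t: t[2:] if t.startswith("\u0627\u0644") and len(t) > 4 else None,  # ال
    suff=lambda t: t[:-2] if t.endswith("\u0647\u0627") and len(t) > 4 else t,  # ها
    plur2sing=lambda t: t[:-2] if t.endswith("\u0648\u0646") and len(t) > 4 else None,  # ون
    fem2masc=lambda t: t[:-1] if t.endswith("\u0629") and len(t) > 3 else None,  # ة
    verb=lambda t: t[1:] if t.startswith("\u064A") and len(t) > 3 else t,  # ي
)

for word in (
    "\u0627\u0644\u0637\u0627\u0644\u0628\u0629",  # الطالبة -> طالب (feminine branch)
    "\u0627\u0644\u0645\u0639\u0644\u0645\u0648\u0646",  # المعلمون -> معلم (plural branch)
    "\u064A\u0643\u062A\u0628",  # يكتب -> كتب (verb branch)
):
    print(stem_pipeline(word, **toy_steps))
```

Injecting the steps keeps the control flow separate from the affix lists, so each rule set can be tested on its own.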

suff(token): remove suffixes from the word’s end.
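Suffix removal mostly targets attached pronouns. The sketch assumes a hypothetical list of pronoun suffixes and a minimum stem of three letters; unlike the prefix case, it always returns a string (the input unchanged when nothing applies). `suff_sketch` and its list are illustrations only.

```python
# Hypothetical attached-pronoun suffixes, tried in order (first match wins).
PRONOUN_SUFFIXES = (
    "\u0643\u0645\u0627",  # كما  "your" (dual)
    "\u0647\u0645\u0627",  # هما  "their" (dual)
    "\u0643\u0645",  # كم   "your" (masc. plural)
    "\u0647\u0645",  # هم   "their" (masc. plural)
    "\u0647\u0627",  # ها   "her / its"
    "\u0643",  # ك    "your" (singular)
    "\u0647",  # ه    "his / its"
)


def suff_sketch(token, min_stem=3):
    """Hypothetical: strip one pronoun suffix if enough letters remain."""
    for suffix in PRONOUN_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token  # nothing to strip


print(suff_sketch("\u0643\u062A\u0627\u0628\u0647\u0645"))  # كتابهم (their book) -> كتاب
```

The length guard keeps a three-letter word such as ملك ("king") from losing its final ك.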

verb(token): strip verb prefixes, suffixes, or both.

verb_t1(token): strip present-tense prefixes and suffixes.

verb_t2(token): strip future-tense prefixes and suffixes.

verb_t3(token): strip present-tense suffixes.

verb_t4(token): strip present-tense prefixes.

verb_t5(token): strip future-tense prefixes.

verb_t6(token): strip imperative (order) prefixes.
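The six `verb_t*` helpers read like a cascade of rules. The sketch below assumes `verb` tries them in the order listed and returns the first result, and encodes that reading as an ordered rule table. The affix lists, the minimum stem of three letters and all names are hypothetical. In this sketch ا is handled only by the imperative rule, because after normalization أكتب ("I write") and اكتب ("write!") are spelled the same.

```python
PRESENT_PREFIXES = ("\u064A", "\u062A", "\u0646")  # ي ت ن (hypothetical subset)
FUTURE_PREFIXES = tuple("\u0633" + p for p in PRESENT_PREFIXES)  # سي ست سن
PRESENT_SUFFIXES = ("\u0648\u0646", "\u0627\u0646")  # ون ان
IMPERATIVE_PREFIXES = ("\u0627",)  # ا (normalized hamzat al-wasl)

# Ordered like verb_t1 .. verb_t6 (assumed reading of the method list).
VERB_RULES = (
    (PRESENT_PREFIXES, PRESENT_SUFFIXES),  # t1: present prefix + suffix
    (FUTURE_PREFIXES, PRESENT_SUFFIXES),  # t2: future prefix + suffix
    ((), PRESENT_SUFFIXES),  # t3: present suffix only
    (PRESENT_PREFIXES, ()),  # t4: present prefix only
    (FUTURE_PREFIXES, ()),  # t5: future prefix only
    (IMPERATIVE_PREFIXES, ()),  # t6: imperative prefix
)


def _strip(token, prefixes, suffixes, min_stem=3):
    """Strip one prefix and/or one suffix; None if no combination applies."""
    for pre in prefixes or ("",):
        for suf in suffixes or ("",):
            if (
                token.startswith(pre)
                and token.endswith(suf)
                and len(token) - len(pre) - len(suf) >= min_stem
            ):
                return token[len(pre): len(token) - len(suf)]
    return None


def verb_sketch(token):
    """Hypothetical: return the result of the first verb rule that applies."""
    for prefixes, suffixes in VERB_RULES:
        stripped = _strip(token, prefixes, suffixes)
        if stripped is not None:
            return stripped
    return token  # no verb rule applied


print(verb_sketch("\u0633\u064A\u0643\u062A\u0628"))  # سيكتب (he will write) -> كتب
```

Slicing with `len(token) - len(suf)` rather than `-len(suf)` keeps the empty-suffix case correct, since `token[:-0]` would return an empty string.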
