nltk.tokenize.word_tokenize
- nltk.tokenize.word_tokenize(text, language='english', preserve_line=False)
Return a tokenized copy of text, using NLTK’s recommended word tokenizer (currently an improved TreebankWordTokenizer along with PunktSentenceTokenizer for the specified language).
- Parameters
text (str) – text to split into words
language (str) – the model name in the Punkt corpus
preserve_line (bool) – if True, skip sentence tokenization and word-tokenize the text as a single unit; if False (the default), split the text into sentences first.
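
A minimal usage sketch of the parameters above. The sample strings and variable names are illustrative assumptions, not part of the NLTK documentation. The default call relies on a Punkt sentence model being installed locally, so it is wrapped in a `LookupError` handler here; the `preserve_line=True` calls skip sentence splitting and should not need that data.

```python
from nltk.tokenize import word_tokenize

text = "Good muffins cost $3.88 in New York."

# preserve_line=True: the text is word-tokenized as one unit,
# with no sentence splitting beforehand.
tokens = word_tokenize(text, preserve_line=True)
print(tokens)

two_sentences = "Hello there. How are you?"

# Without sentence splitting, only the final period of the whole
# string is separated, so a mid-text period stays attached to its word.
kept = word_tokenize(two_sentences, preserve_line=True)
print(kept)

# Default behaviour: split into sentences first, then into words.
# This path needs the Punkt data for the chosen language.
try:
    split = word_tokenize(two_sentences, language="english")
    print(split)
except LookupError:
    # Assumption: the data has not been downloaded yet. The error
    # message names the resource to fetch with nltk.download(...).
    print("Punkt data not found; see the LookupError message for the resource name.")
```

With `preserve_line=True`, sentence-internal periods are not split off, which can matter when downstream code expects one token per punctuation mark.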