NLTK 3.0 is the first version of NLTK to support Python 3. This version of NLTK is likely to contain bugs. The alpha releases are for people who would like to help us identify and fix them (by submitting bug reports and patches on GitHub). All contributions will be acknowledged.
This work is being led by Steven Bird and Mikhail Korobov, supported by the Python Software Foundation.
The NLTK 3.0.X series will support Python 2.6 and 2.7 as well as Python 3.
The most visible changes to the API are that Unicode strings are now used throughout and that many NLTK methods that used to return large lists now return iterators. The simplified tagset has been replaced with the Universal Tagset.
Some properties have become methods, e.g.:
    Tree: node -> label()
    Relextract: show_raw_rtuple() -> rtuple(), show_clause() -> clause()
    WordNet: definition -> definition(), lemmas -> lemmas(), etc.
The following are dropped: babelize, sourcedstrings, clean_html,
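Code that has to run against both the old and the new API may need a small bridge for the property-to-method and list-to-iterator changes described above. The following is a minimal sketch, not part of NLTK: tree_label, first_n and the two stand-in classes are hypothetical names invented for illustration, and the stand-ins only mimic the node property and label() method mentioned above so that the example runs without NLTK installed.

```python
from itertools import islice


def tree_label(tree):
    """Return a tree's label under either API (hypothetical helper)."""
    label = getattr(tree, "label", None)
    if callable(label):
        return label()   # new style: label() is a method
    return tree.node     # old style: node is a plain attribute


def first_n(results, n):
    """Return the first n items of a list or an iterator as a list."""
    # islice works on both, so callers need not care which one they got.
    return list(islice(results, n))


# Stand-in classes (assumptions for illustration, not NLTK classes).
class OldStyleTree:
    def __init__(self, node):
        self.node = node


class NewStyleTree:
    def __init__(self, label):
        self._label = label

    def label(self):
        return self._label


old_label = tree_label(OldStyleTree("S"))
new_label = tree_label(NewStyleTree("NP"))
head = first_n(iter(["a", "b", "c"]), 2)
```

Slicing with islice rather than indexing avoids building a full list when only the first few results are needed, which is the case the iterator-returning methods are meant to help with.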
Most of the effort has gone into implementing the syntactic changes required for Python 3 and improving our test suite. The latter will make it easier to maintain NLTK into the future.