This site is maintained by Steven Bird.

FAQ

Answers to Frequently Asked Questions about NLTK

Do you have a question that is not answered here? Please post it to the nltk-users or nltk-dev mailing list.

  1. What license does NLTK use?

    NLTK is open source software. The source code is distributed under the terms of the Apache License Version 2.0. The documentation is distributed under the terms of the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States license. The corpora are distributed under various licenses, as documented in their respective README files.

  2. What are the plans for further development of NLTK?

    NLTK is undergoing continual development as new modules are added and existing ones are improved. Now that NLTK 2.0 (beta) has been released, we will follow a conservative upgrade process and ensure that all code examples published in the NLTK book continue to work as advertised. We will follow the recommended migration path for Python 3.0.

  3. I think I found a bug; where do I report it?

    Please search the Issue Tracker to check whether an issue report has already been filed. If not, please report the problem, giving as much detail as possible, and include a code sample that lets us reproduce it.

  4. What data sources does NLTK use and how can more be added?

    Dozens of corpora are available for use with NLTK (see the list of available datasets and the Corpus HOWTO). NLTK can also be interfaced to other corpora; for instructions, see section 2.1 of the NLTK book and consult the code in the corpus module. Requests for advice on developing corpus readers for new formats should be posted to the nltk-dev mailing list. Completed corpus readers should be submitted via the Issue Tracker; please specify the location of the corpus and whether it can be redistributed with NLTK.

  5. I'm planning some long-term research using NLTK; how long is the toolkit going to be supported?

    We plan to continue supporting the toolkit for as long as possible. We published the NLTK book in 2009 and plan to support the toolkit for several years while the book is in active use and while the developers are employed to teach natural language processing. Bug reports will be attended to as quickly as possible.

  6. Why is Python giving me a syntax error when I use NLTK?

    NLTK requires Python version 2.5, 2.6, or 2.7. If you use an earlier version of Python, you will see many syntax errors.

  7. What is the difference between NLTK and NLTK-Lite?

    Starting in mid-2005, the NLTK developers created a lightweight version of NLTK, called NLTK-Lite, which is simpler and faster than the original NLTK. NLTK-Lite was designed to provide the same functionality as NLTK, and by version 0.9 all of NLTK's functionality had been incorporated (the package is called nltk). Unlike the old NLTK, NLTK-Lite does not impose a heavy burden on the programmer. Wherever possible, standard Python objects are used instead of custom NLP versions, so that students learning to program for the first time learn to program in Python with some useful libraries, rather than learning to program in NLTK. Once it reached version 1.0 (in mid-2009), NLTK-Lite took over the original NLTK name and became NLTK 2.0.

  8. How can I install NLTK from the source code repository?

    Most users should install NLTK from a distribution; please see the installation instructions. However, if you need an up-to-the-minute version, you will have to install NLTK from the source repository. Once you have downloaded it, run the top-level setup.py script to install that version of NLTK on your machine.

  9. How can I find out where NLTK is installed on my system?

    Run the following in a Python interpreter session. In this example, NLTK is installed in /Library/Python/2.5/site-packages/nltk:

        >>> import nltk
        >>> nltk.__path__
        ['/Library/Python/2.5/site-packages/nltk']

  10. What papers have been published about NLTK?

    NLTK has been presented at several academic conferences and reviewed in online forums. Please see the Documentation page for more information.

  11. How is NLTK development supported?

    NLTK is an open source project that depends mainly on the efforts of volunteers. Occasionally we have funds for a summer intern or TA to work on specified projects. Students and teachers also donate code. In 2008, we received support from Google Summer of Code. We strongly encourage volunteers to get involved: find out more about contributing to NLTK. If you find the toolkit useful, please make a donation to support further development.

  12. How did NLTK start?

    The NLTK project began in 2001, when Steven Bird was teaching CIS-530 at the University of Pennsylvania and hired Edward Loper, his star student from the previous offering of the course, as teaching assistant (TA). They agreed on a plan for developing software infrastructure for NLP teaching that could be easily maintained over time. Edward wrote up the plan, and both began work on it right away. Here is the Version 0.2 release announcement that appeared in September 2001.

  13. If I just "use" NLTK via import statements in Python, am I obliged to publish my source code as well?

    No, there is no such obligation. You can use and modify NLTK without making any code available (see question 1).

  14. What is Natural Language Processing?

    Please see our book, or http://en.wikipedia.org/wiki/Natural_language_processing
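The answer on reporting bugs asks for as much detail as possible. It can help to state exactly which versions you are running alongside your code sample. The sketch below is a suggestion, not an official NLTK requirement. It collects the Python version, the platform string and, if NLTK can be imported, its `__version__` attribute. A `getattr` fallback covers the case where that attribute is missing. The helper name `environment_report` is hypothetical.

```python
import platform
import sys


def environment_report():
    """Collect version details worth pasting into a bug report (hypothetical helper)."""
    report = {
        # First token of sys.version, e.g. "2.7.18" or "3.11.4".
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import nltk
        # Fall back to "unknown" in case this attribute is not defined.
        report["nltk"] = getattr(nltk, "__version__", "unknown")
    except ImportError:
        report["nltk"] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("%s: %s" % (key, value))
```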
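The answer on data sources points to section 2.1 of the NLTK book for working with your own corpora. Here is a minimal sketch of that approach, assuming NLTK is installed. It uses `PlaintextCorpusReader` from `nltk.corpus.reader` to treat every `.txt` file in a directory as one document. The temporary directory, the file name `sample.txt` and the helper names `make_sample_corpus` and `load_corpus` are illustrative assumptions. Only `words()` is used, because it relies on the reader's default regular-expression word tokenizer and so should not need any downloaded data packages.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader


def make_sample_corpus():
    """Write a one-file plain-text corpus to a fresh temporary directory."""
    root = tempfile.mkdtemp()
    # Plain ASCII bytes, so no text encoding has to be chosen here.
    with open(os.path.join(root, "sample.txt"), "wb") as f:
        f.write(b"Hello world. Goodbye.")
    return root


def load_corpus(root):
    # The second argument is a regular expression selecting the file ids.
    return PlaintextCorpusReader(root, r".*\.txt")


if __name__ == "__main__":
    reader = load_corpus(make_sample_corpus())
    print(reader.fileids())
    print(reader.words("sample.txt"))
```

`reader.words(...)` returns a lazily loaded corpus view rather than a plain list. Wrap it in `list()` if you need an ordinary list.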
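The answer on syntax errors comes down to which interpreter version you are running. A small check along these lines compares the major and minor components of `sys.version_info` against a list of supported versions. It assumes the supported versions are exactly those listed in this FAQ (2.5, 2.6 and 2.7); later NLTK releases require different, Python 3 versions. The names `SUPPORTED_VERSIONS` and `is_supported` are hypothetical.

```python
import sys

# Versions listed in this FAQ for NLTK 2.0 (beta). This is an assumption
# that does not hold for later NLTK releases, which target Python 3.
SUPPORTED_VERSIONS = [(2, 5), (2, 6), (2, 7)]


def is_supported(version_info=None):
    """Return True if (major, minor) of version_info is in SUPPORTED_VERSIONS."""
    if version_info is None:
        version_info = sys.version_info
    # Only the first two components matter, e.g. (2, 6, 5) -> (2, 6).
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if is_supported():
        print("Python %d.%d is in the supported list." % (major, minor))
    else:
        print("Python %d.%d is not in the supported list." % (major, minor))
```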