NLTK c-i setup

This is an overview of how our continuous integration setup works. It begins with a quick introduction to the tasks it runs; the later sections detail how each of these tasks is set up.

Our continuous integration is currently hosted by ShiningPanda, free of charge through their FLOSS program. The setup is not specific to their service; it could be moved to any Jenkins instance. The URL of our current instance is https://jenkins.shiningpanda.com/nltk/

Base tasks

The base tasks of the c-i instance are as follows:

  • Check out the NLTK project when VCS changes occur
  • Build the project using setup.py
  • Run our test suite
  • Make packages for all platforms
  • Build these web pages

Because the NLTK build environment is highly customized, we run tests on only one configuration: the lowest supported Python version. NLTK 2 supports Python versions down to 2.5, so all tests are run in a Python 2.5 virtualenv. Setting up the virtualenv is slightly simpler on ShiningPanda machines, because ShiningPanda has compiled all relevant Python versions and its custom virtualenv builders use them.
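A minimal sketch of how a job could pin every step to a Python 2.5 virtualenv when no custom builder is available. The helper names and the env directory are hypothetical, a POSIX virtualenv layout is assumed, and so is a virtualenv release that still supports Python 2.5.

```python
import os


def virtualenv_command(env_dir, python="python2.5"):
    """Return the command that creates a virtualenv at env_dir.

    --python selects the interpreter the environment is built on
    (assumes a virtualenv release that still supports Python 2.5).
    """
    return ["virtualenv", "--python=" + python, env_dir]


def env_python(env_dir):
    """Return the path of the interpreter inside a POSIX-layout virtualenv."""
    return os.path.join(env_dir, "bin", "python")
```

A job would run subprocess.check_call(virtualenv_command("py25")) once and then call env_python("py25") for every later step, so setup.py and the tests never touch the system interpreter.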

VCS setup/integration

All operations are done against the NLTK repos on GitHub. The Jenkins instance on ShiningPanda has a daily limit on build time, so it polls the main NLTK repo only once a day, using the Poll SCM option in Jenkins. It reads the main code repo through public (read-only) access only, and it pushes to the nltk.github.com repo using the key of the nltk-webdeploy user.
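Jenkins handles the polling itself; the daily schedule lives in the job's Poll SCM setting. Purely to illustrate what one poll amounts to, here is a minimal sketch that asks the remote for a branch head with git ls-remote and compares it with the last built commit. It is not how the Jenkins Git plugin is implemented, and the branch name master and the helper names are assumptions.

```python
import subprocess


def parse_ls_remote(output):
    """Return the SHA from the first line of `git ls-remote` output, or None.

    Each line of that output is the SHA, a tab, then the ref name.
    """
    lines = output.strip().splitlines()
    if not lines:
        return None
    return lines[0].split("\t", 1)[0]


def remote_head(repo_url, branch="master"):
    """Ask the remote which commit `branch` points to, without cloning."""
    out = subprocess.check_output(["git", "ls-remote", repo_url, "refs/heads/" + branch])
    return parse_ls_remote(out.decode("ascii"))


def needs_build(remote_sha, last_built_sha):
    """A build is due when the branch exists and has moved since the last build."""
    return remote_sha is not None and remote_sha != last_built_sha
```

Because ls-remote only exchanges ref names and hashes, a poll like this is cheap and uses no build time until something has actually changed.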

The base build

To build the project, the following tasks are run:

  1. Create a VERSION file
     A VERSION file is created by running git describe --tags --match '*.*.*' > nltk/VERSION. This makes the most recent matching tag (with a commit count and abbreviated hash appended when the checked-out commit is not itself tagged) available in nltk.__version__.
  2. python setup.py build
     This essentially copies the files required to run NLTK into build/.
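The two build steps can be sketched in Python as follows. The helper names are hypothetical (the real job runs the shell command directly), and read_version only shows one way a package could turn the VERSION file into __version__. Without a shell, the '*.*.*' pattern is passed to git unquoted.

```python
import os
import subprocess


def git_describe(repo_dir="."):
    """Return `git describe --tags --match '*.*.*'` output for repo_dir."""
    # No shell is involved, so the glob needs no quoting here.
    out = subprocess.check_output(
        ["git", "describe", "--tags", "--match", "*.*.*"], cwd=repo_dir
    )
    return out.decode("ascii").strip()


def write_version_file(version, pkg_dir="nltk"):
    """Write version to <pkg_dir>/VERSION, as the shell redirect would."""
    path = os.path.join(pkg_dir, "VERSION")
    with open(path, "w") as f:
        f.write(version + "\n")
    return path


def read_version(pkg_dir="nltk"):
    """Read VERSION back, e.g. to set __version__ in the package's __init__."""
    with open(os.path.join(pkg_dir, "VERSION")) as f:
        return f.read().strip()
```

In the job this would be write_version_file(git_describe()) followed by subprocess.check_call([sys.executable, "setup.py", "build"]).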

The test suite

The tests require all dependencies to be installed. These have been installed beforehand, and a series of extra environment variables are set so that the tests can find them. The dependencies themselves are only detailed in the last section.
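A minimal sketch of preparing such an environment for the test process. NLTK_DATA is the variable NLTK reads to locate its data packages; the paths and any names passed in extra are hypothetical placeholders, since the real variables depend on which dependencies are installed.

```python
import os


def build_test_env(nltk_data_dir, extra=None):
    """Return a copy of the current environment prepared for the test run.

    NLTK_DATA tells NLTK where to look for its data packages. Entries in
    `extra` (e.g. locations of third-party tools) are hypothetical.
    """
    env = dict(os.environ)  # copy, so the job's own environment is untouched
    env["NLTK_DATA"] = nltk_data_dir
    if extra:
        env.update(extra)
    return env
```

The result is meant to be passed as env= to subprocess.check_call when starting the test runner, which keeps the settings scoped to that one process.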

The test suite itself consists of doctests, found both in module docstrings and in the .doctest files under the test folder of the nltk repo. We run these tests with nose, measure code coverage with coverage.py, and check for coding-standard violations (PEP 8 and others) with pylint.
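nose collects the .doctest files automatically. As a dependency-free illustration of what that collection does, the sketch below uses the standard library's doctest module directly instead of nose, and covers only the .doctest files (docstring doctests would be run per module with doctest.testmod). The function name is hypothetical.

```python
import doctest
import glob
import os


def run_doctest_files(test_dir):
    """Run every *.doctest file under test_dir and return (failed, attempted)."""
    failed = attempted = 0
    pattern = os.path.join(test_dir, "**", "*.doctest")
    for path in sorted(glob.glob(pattern, recursive=True)):
        # module_relative=False: treat path as a filesystem path, not a package path
        result = doctest.testfile(path, module_relative=False)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted
```

A CI step would treat any nonzero failed count as a failed build.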

All these tools are easily installed through pip or your favourite OS's package manager. For testing, only nose is really needed. It is also the only tool that does not work properly out of the box: to use the +ELLIPSIS and +NORMALIZE_WHITESPACE options in our doctests, we installed nose from source with a patch applied that enables them.
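For reference, this is what the two options do in a doctest. The sketch runs under the standard library's doctest module, which supports both directives natively; it does not reproduce the patched-nose setup, and the tokens function is a made-up example.

```python
import doctest


def tokens(text):
    """Split text on whitespace.

    NORMALIZE_WHITESPACE lets a long expected output wrap across lines:

    >>> tokens("the quick brown fox jumps over the lazy dog")  # doctest: +NORMALIZE_WHITESPACE
    ['the', 'quick', 'brown', 'fox', 'jumps',
     'over', 'the', 'lazy', 'dog']

    ELLIPSIS lets an ellipsis stand in for output we do not want to spell out:

    >>> tokens("a b c d e f g h")  # doctest: +ELLIPSIS
    ['a', 'b', ..., 'h']
    """
    return text.split()


if __name__ == "__main__":
    doctest.testmod()
```

Without the directives, both examples would fail, since the expected text no longer matches the repr character for character.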

The results of these programs are parsed and published by the Jenkins instance, giving us pretty graphs :)
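The exact flags the instance uses are not recorded here. The following is an assumed set of commands that produce reports Jenkins can parse: a JUnit-style XML test report from nose, a Cobertura-style XML coverage report from coverage.py, and pylint output in its line-oriented parseable format. The function name and report file names are hypothetical.

```python
def report_commands(package="nltk"):
    """Return assumed commands whose output Jenkins can parse, as argument lists."""
    return [
        # doctests via nose, collecting coverage data and writing a JUnit-style XML report
        ["nosetests", "--with-doctest", "--doctest-extension=doctest",
         "--with-coverage", "--cover-package=" + package,
         "--with-xunit", "--xunit-file=nosetests.xml"],
        # turn the collected .coverage data into Cobertura-style XML
        ["coverage", "xml", "-o", "coverage.xml"],
        # one violation per line, in the parseable file:line: format
        ["pylint", "--output-format=parseable", package],
    ]
```

pylint exits with a nonzero status whenever it reports messages, so a job would typically run that last command with subprocess.call rather than check_call and leave judging the violations to Jenkins.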

The builds

The packages are built using make dist. The resulting packages are all placed in our Jenkins workspace and should be safe to distribute. Mac-specific builds are not available. File names are derived from the __version__ string, so they change whenever a new commit is built.
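Because the version string changes with each new commit, a job that archives or publishes packages has to pick out the files from the current build. A minimal sketch, assuming archive names follow the common <name>-<version><suffix> pattern; the suffix list and helper names are hypothetical placeholders for whatever make dist actually produces.

```python
import os

# Hypothetical: the archive formats make dist is assumed to produce.
DEFAULT_SUFFIXES = (".tar.gz", ".zip")


def artifact_names(version, name="nltk", suffixes=DEFAULT_SUFFIXES):
    """Expected archive names, assuming the <name>-<version><suffix> pattern."""
    return ["%s-%s%s" % (name, version, s) for s in suffixes]


def artifacts_for_version(dist_dir, version, name="nltk", suffixes=DEFAULT_SUFFIXES):
    """Return the expected archives for this version that exist in dist_dir."""
    present = set(os.listdir(dist_dir))
    return [f for f in artifact_names(version, name, suffixes) if f in present]
```

Matching exact names rather than a prefix keeps a tagged release such as 2.0.1 from picking up archives of later, untagged commits like 2.0.1-5-gabc1234.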

Web page builder

The web pages are built using Sphinx, which pulls all code documentation directly from the docstrings. After building the pages with make web, the job pushes them to the nltk.github.com repo on GitHub. Pushing requires write access to that repo; because this cannot be done with a deploy key, the job uses the SSH key of the nltk-webdeploy user.
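One way to make git push with a specific user's key, without touching the build account's global SSH configuration, is the GIT_SSH_COMMAND environment variable (honoured by git 2.3 and later). A minimal sketch; the key path, checkout directory, remote name origin and commit message are hypothetical.

```python
import os
import shlex
import subprocess


def push_env(key_path):
    """Environment that makes git use key_path for SSH, for one command only."""
    env = dict(os.environ)
    # IdentitiesOnly stops ssh from offering other keys, e.g. from an agent.
    env["GIT_SSH_COMMAND"] = "ssh -i %s -o IdentitiesOnly=yes" % shlex.quote(key_path)
    return env


def publish_site(site_repo_dir, key_path, message="Update web pages"):
    """Commit the freshly built pages in site_repo_dir and push them.

    Returns False without committing when the build changed nothing.
    """
    subprocess.check_call(["git", "add", "--all", "."], cwd=site_repo_dir)
    status = subprocess.check_output(["git", "status", "--porcelain"], cwd=site_repo_dir)
    if not status.strip():
        return False  # git commit would fail with nothing to commit
    subprocess.check_call(["git", "commit", "-m", message], cwd=site_repo_dir)
    subprocess.check_call(["git", "push", "origin", "HEAD"],
                          cwd=site_repo_dir, env=push_env(key_path))
    return True
```

Only the push needs the key; reading the main code repo keeps using public access.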