nltk.twitter.twitterclient module

NLTK Twitter client

This module offers methods for collecting and processing Tweets. Most of the functionality depends on access to the Twitter APIs, and this is handled via the third party Twython library.

If one of the methods below returns an integer, it is probably a Twitter error code. For example, a response of ‘420’ means that you have reached the limit on the number of requests you can currently make to the Twitter API. Rate limits for the search API are currently divided into 15-minute windows.
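
As a minimal sketch of how a caller might act on this convention, the helper below treats any integer result as an error code and anything else as data. The name `describe_result` and its message strings are hypothetical and not part of NLTK; only the 420 code and the 15-minute window come from the text above.

```python
RATE_LIMIT_CODE = 420      # error code mentioned in the text above
WINDOW_SECONDS = 15 * 60   # length of one search API rate-limit window


def describe_result(result):
    """Classify a return value as an error code or as ordinary data.

    Hypothetical helper: it only illustrates the "integer means error code"
    convention and never calls the Twitter API.
    """
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(result, int) and not isinstance(result, bool):
        if result == RATE_LIMIT_CODE:
            return f"rate limited; wait up to {WINDOW_SECONDS} s"
        return f"error code {result}"
    return "data"
```

Waiting for a full window is a conservative assumption; the actual reset time may be shorter.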

class nltk.twitter.twitterclient.Query[source]

Bases: Twython

Retrieve data from the Twitter REST API.

__init__(app_key, app_secret, oauth_token, oauth_token_secret)[source]
Parameters:
  • app_key – (optional) Your application's key

  • app_secret – (optional) Your application's secret key

  • oauth_token – (optional) When using OAuth 1, combined with oauth_token_secret to make authenticated calls

  • oauth_token_secret – (optional) When using OAuth 1, combined with oauth_token to make authenticated calls

expand_tweetids(ids_f, verbose=True)[source]

Given a file object containing a list of Tweet IDs, fetch the corresponding full Tweets from the Twitter API.

The API call statuses/lookup will fail to retrieve a Tweet if the user has deleted it.

This call to the Twitter API is rate-limited. See <https://dev.twitter.com/rest/reference/get/statuses/lookup> for details.

Parameters:

ids_f – input file object consisting of Tweet IDs, one to a line

Returns:

iterable of Tweet objects in JSON format
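
The following sketch shows one way to prepare the `ids_f` argument: any file-like object with one Tweet ID per line should do, so an in-memory `io.StringIO` stands in for a real file here. The IDs are made up, `read_tweet_ids` and `fetch_full_tweets` are hypothetical helper names, and the use of `credsfromfile` to load credentials is an assumption about the local setup. The network call is wrapped in a function that is defined but not invoked.

```python
import io


def read_tweet_ids(ids_f):
    """Return the non-empty, stripped lines of a Tweet ID file object."""
    return [line.strip() for line in ids_f if line.strip()]


def fetch_full_tweets(ids_f):
    """Sketch of a real lookup; needs network access and credentials."""
    # Imported lazily so the rest of this example runs without NLTK installed.
    from nltk.twitter import Query, credsfromfile

    client = Query(**credsfromfile())
    return client.expand_tweetids(ids_f)


# An in-memory stand-in for a file of IDs, one per line (made-up IDs).
ids_f = io.StringIO("1000000000000000001\n1000000000000000002\n")
ids = read_tweet_ids(ids_f)
```

Reading the IDs consumes the file object, so call `ids_f.seek(0)` before passing the same object on to `fetch_full_tweets`.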

register(handler)[source]

Register a method for handling Tweets.

Parameters:

handler (TweetHandlerI) – method for viewing or writing Tweets to a file.

search_tweets(keywords, limit=100, lang='en', max_id=None, retries_after_twython_exception=0)[source]

Call the REST API 'search/tweets' endpoint with some plausible defaults. See the Twitter search documentation for more information about admissible search parameters.

Parameters:
  • keywords (str) – A list of query terms to search for, written as a comma-separated string

  • limit (int) – Number of Tweets to process

  • lang (str) – language

  • max_id (int) – id of the last tweet fetched

  • retries_after_twython_exception (int) – number of retries when searching Tweets before raising an exception

Return type:

python generator
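
Because the return value is a generator, results are produced lazily and can be consumed a few at a time. The sketch below uses `itertools.islice` for that and runs offline against a stand-in generator of made-up data. The helper names `first_texts` and `search_texts` are hypothetical, and both the `text` key on each Tweet and the `credsfromfile` credential loading are assumptions.

```python
from itertools import islice


def first_texts(tweets, n):
    """Collect the 'text' field of at most n Tweets from any iterable."""
    return [tweet["text"] for tweet in islice(tweets, n)]


def search_texts(keywords, n=10):
    """Sketch of a real search; needs network access and credentials."""
    # Imported lazily so the offline part of this example needs no NLTK.
    from nltk.twitter import Query, credsfromfile

    client = Query(**credsfromfile())
    return first_texts(client.search_tweets(keywords=keywords, limit=n), n)


# Stand-in generator so the example runs offline (made-up data).
fake_results = ({"text": f"tweet {i}"} for i in range(5))
texts = first_texts(fake_results, 3)
```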

user_info_from_id(userids)[source]

Convert a list of userIDs into a variety of information about the users.

See <https://dev.twitter.com/rest/reference/get/users/show>.

Parameters:

userids (list) – A list of integer strings corresponding to Twitter userIDs

Return type:

list(json)

user_tweets(screen_name, limit, include_rts='false')[source]

Return a collection of the most recent Tweets posted by the user.

Parameters:
  • screen_name (str) – The user’s screen name; the initial ‘@’ symbol should be omitted

  • limit (int) – The number of Tweets to recover; 200 is the maximum allowed

  • include_rts (str) – Whether to include statuses which have been retweeted by the user; possible values are ‘true’ and ‘false’

class nltk.twitter.twitterclient.Streamer[source]

Bases: TwythonStreamer

Retrieve data from the Twitter Streaming API.

The streaming API requires OAuth 1.0 authentication.

__init__(app_key, app_secret, oauth_token, oauth_token_secret)[source]

Streaming class for a friendly streaming user experience. Authentication is required to use the Twitter Streaming API.

Parameters:
  • app_key – (required) Your application's key

  • app_secret – (required) Your application's secret key

  • oauth_token – (required) Used with oauth_token_secret to make authenticated calls

  • oauth_token_secret – (required) Used with oauth_token to make authenticated calls

  • timeout – (optional) How long (in secs) the streamer should wait for a response from Twitter Streaming API

  • retry_count – (optional) Number of times the API call should be retried

  • retry_in – (optional) Amount of time (in secs) to wait before the previous API call is tried again

  • client_args – (optional) Accepts some requests Session parameters and some requests Request parameters. See http://docs.python-requests.org/en/latest/api/#sessionapi and requests section below it for details. [ex. headers, proxies, verify(SSL verification)]

  • handlers – (optional) Array of message types for which corresponding handlers will be called

  • chunk_size – (optional) Define the buffer size before data is actually returned from the Streaming API. Default: 1

filter(track='', follow='', lang='en')[source]

Wrapper for ‘statuses / filter’ API call
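
A possible way to drive this wrapper is sketched below: build a comma-separated `track` string, register a handler, then start filtering. The helper names `join_terms` and `stream_keywords` are hypothetical. The comma-separated form of `track`, the `limit` argument to `TweetViewer`, and the `credsfromfile` credential loading are assumptions. The streaming call is defined but not invoked, since it needs network access and credentials.

```python
def join_terms(terms):
    """Join search terms into one comma-separated string."""
    return ",".join(term.strip() for term in terms)


def stream_keywords(terms, limit=10):
    """Sketch of live filtering; needs network access and credentials."""
    # Imported lazily so join_terms can be used without NLTK installed.
    from nltk.twitter import Streamer, TweetViewer, credsfromfile

    client = Streamer(**credsfromfile())
    client.register(TweetViewer(limit=limit))  # print Tweets to the terminal
    client.filter(track=join_terms(terms))


track = join_terms(["love", " hate "])
```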

on_error(status_code, data)[source]
Parameters:
  • status_code – The status code returned by the Twitter API

  • data – The response from Twitter API

on_success(data)[source]
Parameters:

data – response from Twitter API

register(handler)[source]

Register a method for handling Tweets.

Parameters:

handler (TweetHandlerI) – method for viewing or writing Tweets to a file.

sample()[source]

Wrapper for ‘statuses / sample’ API call

class nltk.twitter.twitterclient.TweetViewer[source]

Bases: TweetHandlerI

Handle data by sending it to the terminal.

handle(data)[source]

Direct data to sys.stdout

Returns:

return False if processing should cease, otherwise return True.

Return type:

bool

Parameters:

data – Tweet object returned by Twitter API

on_finish()[source]

Actions when the tweet limit has been reached
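
The handler contract described above (`handle()` returns False to stop, `on_finish()` runs at the end) can be illustrated with a stand-in class. This is a sketch only: `CountingHandler` and `feed` are hypothetical names, the class does not subclass TweetHandlerI, and NLTK's own dispatch loop may differ in detail.

```python
class CountingHandler:
    """Stand-in handler that stops after `limit` items."""

    def __init__(self, limit):
        self.limit = limit
        self.seen = []
        self.finished = False

    def handle(self, data):
        """Store one Tweet's text; return False once the limit is reached."""
        self.seen.append(data["text"])
        return len(self.seen) < self.limit

    def on_finish(self):
        """Record that processing has ended."""
        self.finished = True


def feed(handler, tweets):
    """Pass Tweets to the handler until it asks to stop."""
    for tweet in tweets:
        if not handler.handle(tweet):
            break
    handler.on_finish()


handler = CountingHandler(limit=2)
feed(handler, [{"text": "a"}, {"text": "b"}, {"text": "c"}])
```

With a limit of 2, the third Tweet is never passed to the handler.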

class nltk.twitter.twitterclient.TweetWriter[source]

Bases: TweetHandlerI

Handle data by writing it to a file.

__init__(limit=2000, upper_date_limit=None, lower_date_limit=None, fprefix='tweets', subdir='twitter-files', repeat=False, gzip_compress=False)[source]

The difference between the upper and lower date limits depends on whether Tweets are coming in an ascending date order (i.e. when streaming) or descending date order (i.e. when searching past Tweets).

Parameters:
  • limit (int) – number of data items to process in the current round of processing.

  • upper_date_limit (tuple) – The date at which to stop collecting new data. This should be entered as a tuple which can serve as the argument to datetime.datetime. E.g. upper_date_limit=(2015, 4, 1, 12, 40) for 12:40 pm on April 1 2015.

  • lower_date_limit (tuple) – The date at which to stop collecting new data. See upper_date_limit for formatting.

  • fprefix (str) – The prefix to use in creating file names for Tweet collections.

  • subdir (str) – The name of the directory where Tweet collection files should be stored.

  • repeat (bool) – flag to determine whether multiple files should be written. If True, the length of each file will be set by the value of limit. See also handle().

  • gzip_compress – if True, output files are compressed with gzip.
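
To make the date-limit tuples concrete, the sketch below unpacks one into `datetime.datetime` and compares it with a Tweet's timestamp. The helper `past_limit` is hypothetical and only one reading of the description above: an upper limit stops collection when dates ascend (streaming), a lower limit when they descend (searching).

```python
from datetime import datetime

date_limit = (2015, 4, 1, 12, 40)   # year, month, day, hour, minute
limit_dt = datetime(*date_limit)    # 12:40 on 1 April 2015


def past_limit(tweet_dt, upper=None, lower=None):
    """Return True if tweet_dt falls outside the given limit tuples."""
    if upper is not None and tweet_dt > datetime(*upper):
        return True   # streaming: newer than the upper limit
    if lower is not None and tweet_dt < datetime(*lower):
        return True   # searching: older than the lower limit
    return False
```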

do_continue()[source]

Returns False if the client should stop fetching Tweets.

handle(data)[source]

Write Twitter data as line-delimited JSON into one or more files.

Returns:

return False if processing should cease, otherwise return True.

Parameters:

data – tweet object returned by Twitter API
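
Line-delimited JSON means one complete JSON document per line, so a file can be read back one Tweet at a time. A minimal sketch of the format follows. It uses an in-memory buffer instead of a real file and made-up Tweets, and it does not reproduce TweetWriter's own file naming or compression.

```python
import io
import json

tweets = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]

buffer = io.StringIO()  # stands in for an output file
for tweet in tweets:
    # One JSON document per line, terminated by a newline.
    buffer.write(json.dumps(tweet) + "\n")

# Reading back: parse each non-empty line independently.
restored = [json.loads(line) for line in buffer.getvalue().splitlines() if line]
```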

on_finish()[source]

Actions when the tweet limit has been reached

timestamped_file()[source]
Returns:

timestamped file name

Return type:

str

class nltk.twitter.twitterclient.Twitter[source]

Bases: object

Wrapper class with restricted functionality and fewer options.

__init__()[source]

tweets(keywords='', follow='', to_screen=True, stream=True, limit=100, date_limit=None, lang='en', repeat=False, gzip_compress=False)[source]

Process some Tweets in a simple manner.

Parameters:
  • keywords (str) – Keywords to use for searching or filtering

  • follow (list) – UserIDs to use for filtering Tweets from the public stream

  • to_screen (bool) – If True, display the tweet texts on the screen, otherwise print to a file

  • stream (bool) – If True, use the live public stream, otherwise search past public Tweets

  • limit (int) – The number of data items to process in the current round of processing.

  • date_limit (tuple) – The date at which to stop collecting new data. This should be entered as a tuple which can serve as the argument to datetime.datetime. E.g. date_limit=(2015, 4, 1, 12, 40) for 12:40 pm on April 1 2015. Note that, in the case of streaming, this is the maximum date, i.e. a date in the future; if not, it is the minimum date, i.e. a date in the past

  • lang (str) – language

  • repeat (bool) – A flag to determine whether multiple files should be written. If True, the length of each file will be set by the value of limit. Use only if to_screen is False. See also handle().

  • gzip_compress – if True, output files are compressed with gzip.
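
The parameters above interact: for instance, `repeat` is only meaningful when `to_screen` is False. The sketch below collects a set of keyword arguments and checks that one constraint before any call is made. The helpers `build_tweets_kwargs` and `run_tweets` are hypothetical, and the ValueError is this example's own choice rather than NLTK behaviour. The real call is defined but not invoked, since it needs network access and credentials.

```python
def build_tweets_kwargs(keywords, to_screen=True, stream=True,
                        limit=100, repeat=False):
    """Assemble keyword arguments for Twitter.tweets()."""
    if repeat and to_screen:
        # Documented constraint: use repeat only if to_screen is False.
        raise ValueError("repeat=True requires to_screen=False")
    return {
        "keywords": keywords,
        "to_screen": to_screen,
        "stream": stream,
        "limit": limit,
        "repeat": repeat,
    }


def run_tweets(kwargs):
    """Sketch of the real call; needs network access and credentials."""
    # Imported lazily so the argument check runs without NLTK installed.
    from nltk.twitter import Twitter

    Twitter().tweets(**kwargs)


# Search past Tweets and write them to a file instead of the screen.
search_to_file = build_tweets_kwargs(
    "love, hate", to_screen=False, stream=False, limit=50
)
```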