nltk.twitter.twitterclient module

NLTK Twitter client

This module offers methods for collecting and processing Tweets. Most of the functionality depends on access to the Twitter APIs, which is handled via the third-party Twython library.

If one of the methods below returns an integer, it is probably a Twitter error code. For example, a response of ‘420’ means that you have reached the limit on the number of requests you can currently make to the Twitter API. Rate limits for the search API are currently divided into 15-minute windows.
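As a minimal sketch of the point above, a caller might check whether a result is an integer before treating it as data. The helper names are hypothetical, and waiting out a full 15-minute window after a ‘420’ is a conservative assumption rather than documented behaviour.

```python
RATE_LIMIT_CODE = 420        # the code named in the text above
WINDOW_SECONDS = 15 * 60     # one 15-minute rate-limit window


def is_error_code(result):
    """Return True if a method result looks like an error code, not data."""
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(result, int) and not isinstance(result, bool)


def wait_hint(result):
    """Return a suggested wait in seconds (assumption: one full window)."""
    if is_error_code(result) and result == RATE_LIMIT_CODE:
        return WINDOW_SECONDS
    return 0
```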

class nltk.twitter.twitterclient.Query

Bases: Twython

Retrieve data from the Twitter REST API.

__init__(app_key, app_secret, oauth_token, oauth_token_secret)

Parameters:

app_key – (optional) Your application's key
app_secret – (optional) Your application's secret key
oauth_token – (optional) When using OAuth 1, combined with oauth_token_secret to make authenticated calls
oauth_token_secret – (optional) When using OAuth 1, combined with oauth_token to make authenticated calls

expand_tweetids(ids_f, verbose=True)

Given a file object containing a list of Tweet IDs, fetch the corresponding full Tweets from the Twitter API.

The API call statuses/lookup will fail to retrieve a Tweet if the user has deleted it.

This call to the Twitter API is rate-limited. See <https://dev.twitter.com/rest/reference/get/statuses/lookup> for details.

Parameters: ids_f – input file object containing Tweet IDs, one per line
Returns: iterable of Tweet objects in JSON format
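The input only needs to be a file-like object with one ID per line, so an in-memory file works for small lists. The sketch below is illustrative: make_ids_file and expand_ids are hypothetical helper names, and expand_ids assumes that nltk and twython are installed and that credsfromfile() can locate a credentials file.

```python
import io


def make_ids_file(tweet_ids):
    """Return an in-memory text file with one Tweet ID per line."""
    return io.StringIO("".join(f"{tweet_id}\n" for tweet_id in tweet_ids))


def expand_ids(tweet_ids):
    """Fetch full Tweets for the given IDs (needs nltk, twython, credentials)."""
    # Imported here so make_ids_file stays usable without nltk installed.
    from nltk.twitter import Query, credsfromfile

    client = Query(**credsfromfile())
    return client.expand_tweetids(make_ids_file(tweet_ids))
```

Tweets that have been deleted will be missing from the result, so the output may be shorter than the input.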

register(handler)

Parameters: handler (TweetHandlerI) – method for viewing or writing Tweets to a file.

search_tweets(keywords, limit=100, lang='en', max_id=None, retries_after_twython_exception=0)

Call the REST API 'search/tweets' endpoint with some plausible defaults. See the Twitter search documentation for more information about admissible search parameters.

Parameters:

keywords (str) – A list of query terms to search for, written as a comma-separated string
limit (int) – Number of Tweets to process
lang (str) – language
max_id (int) – id of the last tweet fetched
retries_after_twython_exception (int) – number of retries when searching Tweets before raising an exception

Return type:

python generator
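Because the keywords are passed as one comma-separated string and the result is a generator, a thin wrapper can make both explicit. This is a sketch under assumptions: search_texts is a hypothetical helper, client is any object exposing search_tweets() as documented above (for example a Query instance), and each yielded Tweet is assumed to be a dict with a "text" key.

```python
from itertools import islice


def search_texts(client, terms, limit=100, lang="en"):
    """Return the text of at most `limit` Tweets matching the given terms."""
    keywords = ", ".join(terms)  # comma-separated string, as documented
    results = client.search_tweets(keywords=keywords, limit=limit, lang=lang)
    # islice guards against a generator that yields more than `limit` items.
    return [tweet.get("text", "") for tweet in islice(results, limit)]
```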

user_info_from_id(userids)

Look up a variety of information about the users identified by a list of userIDs.

See <https://dev.twitter.com/rest/reference/get/users/show>.

Parameters: userids (list) – A list of integer strings corresponding to Twitter userIDs
Return type: list(json)

user_tweets(screen_name, limit, include_rts='false')

Return a collection of the most recent Tweets posted by the user.

Parameters:

screen_name (str) – The user’s screen name; the initial ‘@’ symbol should be omitted
limit (int) – The number of Tweets to retrieve; 200 is the maximum allowed
include_rts (str) – Whether to include statuses which have been retweeted by the user; possible values are ‘true’ and ‘false’
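The three constraints above (no leading ‘@’, at most 200 Tweets, and a string flag rather than a boolean) are easy to get wrong, so one option is to normalise the arguments first. The helper name user_tweets_args is hypothetical; only the parameter names come from the signature above.

```python
MAX_USER_TWEETS = 200  # maximum stated in the parameter description


def user_tweets_args(screen_name, limit, include_retweets=False):
    """Build keyword arguments that respect the documented constraints."""
    return {
        "screen_name": screen_name.lstrip("@"),          # drop any leading '@'
        "limit": max(0, min(limit, MAX_USER_TWEETS)),    # clamp to 0..200
        "include_rts": "true" if include_retweets else "false",
    }
```

The resulting dictionary could then be unpacked into the call, as in client.user_tweets(**user_tweets_args("@example", 500)), where client is a Query instance and "@example" is a placeholder name.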

class nltk.twitter.twitterclient.Streamer

Bases: TwythonStreamer

Retrieve data from the Twitter Streaming API.

The streaming API requires OAuth 1.0 authentication.

__init__(app_key, app_secret, oauth_token, oauth_token_secret)

Streaming class for a friendly streaming user experience. Authentication is required to use the Twitter Streaming API.

Parameters:

app_key – (required) Your application's key
app_secret – (required) Your application's secret key
oauth_token – (required) Used with oauth_token_secret to make authenticated calls
oauth_token_secret – (required) Used with oauth_token to make authenticated calls
timeout – (optional) How long (in secs) the streamer should wait for a response from Twitter Streaming API
retry_count – (optional) Number of times the API call should be retried
retry_in – (optional) Amount of time (in secs) to wait before the previous API call is tried again
client_args – (optional) Accepts some requests Session parameters and some requests Request parameters. See http://docs.python-requests.org/en/latest/api/#sessionapi and requests section below it for details. [ex. headers, proxies, verify(SSL verification)]
handlers – (optional) Array of message types for which corresponding handlers will be called
chunk_size – (optional) Define the buffer size before data is actually returned from the Streaming API. Default: 1

filter(track='', follow='', lang='en')

Wrapper for ‘statuses/filter’ API call

on_error(status_code, data)

Parameters:

status_code – The status code returned by the Twitter API
data – The response from Twitter API

on_success(data)

Parameters: data – response from Twitter API

register(handler)

Parameters: handler (TweetHandlerI) – method for viewing

sample()

Wrapper for ‘statuses/sample’ API call
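A typical flow with the methods above is to register a handler and then start a filtered stream. The sketch below is an assumption about usage order, not a statement of how the library must be used; start_filtered_stream is a hypothetical helper, and streamer and handler are expected to be a Streamer instance and a TweetHandlerI implementation such as TweetViewer.

```python
def start_filtered_stream(streamer, handler, terms, lang="en"):
    """Register a handler, then open a stream filtered on the given terms."""
    streamer.register(handler)
    # track takes the search terms as a single comma-separated string.
    streamer.filter(track=", ".join(terms), lang=lang)
```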

class nltk.twitter.twitterclient.TweetViewer

Bases: TweetHandlerI

Handle data by sending it to the terminal.

handle(data)

Direct data to sys.stdout

Returns: False if processing should cease, otherwise True.
Return type: bool
Parameters: data – Tweet object returned by Twitter API

on_finish()

Actions when the tweet limit has been reached

class nltk.twitter.twitterclient.TweetWriter

Bases: TweetHandlerI

Handle data by writing it to a file.

__init__(limit=2000, upper_date_limit=None, lower_date_limit=None, fprefix='tweets', subdir='twitter-files', repeat=False, gzip_compress=False)

Whether the upper or the lower date limit is the relevant one depends on whether Tweets arrive in ascending date order (i.e. when streaming) or descending date order (i.e. when searching past Tweets).

Parameters:

limit (int) – number of data items to process in the current round of processing.
upper_date_limit (tuple) – The date at which to stop collecting new data. This should be entered as a tuple which can serve as the argument to datetime.datetime. E.g. upper_date_limit=(2015, 4, 1, 12, 40) for 12:40 pm on April 1 2015.
lower_date_limit (tuple) – The date at which to stop collecting new data. See upper_date_limit for formatting.
fprefix (str) – The prefix to use in creating file names for Tweet collections.
subdir (str) – The name of the directory where Tweet collection files should be stored.
repeat (bool) – flag to determine whether multiple files should be written. If True, the length of each file will be set by the value of limit. See also handle().
gzip_compress – if True, output files are compressed with gzip.
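The date-limit tuples are positional arguments to datetime.datetime, so they can be checked before being passed in. The sketch below shows only that conversion; to_datetime is a hypothetical helper, and any time zone handling the class applies internally is not covered here.

```python
from datetime import datetime


def to_datetime(date_limit):
    """Convert a (year, month, day, hour, minute) tuple to a datetime."""
    return datetime(*date_limit)


upper = to_datetime((2015, 4, 1, 12, 40))
print(upper.isoformat())  # 2015-04-01T12:40:00
```

An invalid tuple such as (2015, 13, 1) raises ValueError at this point, which is an easy way to catch a malformed limit early.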

do_continue()

Returns False if the client should stop fetching Tweets.

handle(data)

Write Twitter data as line-delimited JSON into one or more files.

Returns: False if processing should cease, otherwise True.
Parameters: data – tweet object returned by Twitter API

on_finish()

Actions when the tweet limit has been reached
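Line-delimited JSON means one JSON object per line, so a written file can be read back with the standard library alone. This is a minimal sketch: iter_tweets and read_tweet_file are hypothetical helper names, path stands for whichever output file was produced, and UTF-8 encoding is an assumption.

```python
import gzip
import json


def iter_tweets(lines):
    """Yield one parsed Tweet per non-empty line of line-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_tweet_file(path, gzip_compressed=False):
    """Yield Tweets from a file, optionally one written with gzip_compress=True."""
    opener = gzip.open if gzip_compressed else open
    with opener(path, "rt", encoding="utf-8") as tweet_file:
        yield from iter_tweets(tweet_file)
```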

timestamped_file()

Returns: timestamped file name
Return type: str

class nltk.twitter.twitterclient.Twitter

Bases: object

Wrapper class with restricted functionality and fewer options.

__init__()

tweets(keywords='', follow='', to_screen=True, stream=True, limit=100, date_limit=None, lang='en', repeat=False, gzip_compress=False)

Process some Tweets in a simple manner.

Parameters:

keywords (str) – Keywords to use for searching or filtering
follow (list) – UserIDs to use for filtering Tweets from the public stream
to_screen (bool) – If True, display the tweet texts on the screen, otherwise print to a file
stream (bool) – If True, use the live public stream, otherwise search past public Tweets
limit (int) – The number of data items to process in the current round of processing.
date_limit (tuple) – The date at which to stop collecting new data. This should be entered as a tuple which can serve as the argument to datetime.datetime. E.g. date_limit=(2015, 4, 1, 12, 40) for 12:40 pm on April 1 2015. Note that, in the case of streaming, this is the maximum date, i.e. a date in the future; if not, it is the minimum date, i.e. a date in the past
lang (str) – language
repeat (bool) – A flag to determine whether multiple files should be written. If True, the length of each file will be set by the value of limit. Use only if to_screen is False. See also handle().
gzip_compress – if True, output files are compressed with gzip.
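The flags combine in a few common ways; one of them is searching past Tweets and writing compressed files instead of printing. The sketch below only fixes that combination of documented parameters. archive_search is a hypothetical helper, and client is expected to be a Twitter instance, which in turn needs nltk, twython and valid credentials.

```python
def archive_search(client, keywords, limit=100, date_limit=None):
    """Search past public Tweets and write them to gzip-compressed files."""
    client.tweets(
        keywords=keywords,
        to_screen=False,        # write to file rather than the terminal
        stream=False,           # search past Tweets, not the live stream
        limit=limit,
        date_limit=date_limit,  # when searching, a minimum date in the past
        gzip_compress=True,
    )
```

With the library set up, a call might look like archive_search(Twitter(), "love, hate", limit=50), where the keywords are placeholders.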