nltk.cluster.util module
- class nltk.cluster.util.Dendrogram
Bases: object
Represents a dendrogram, a tree with a specified branching order. Initialise it with the leaf items, then call merge once for each branch. The resulting tree records the order of the calls to merge.
- __init__(items=[])
- Parameters:
items (sequence of (any)) – the items at the leaves of the dendrogram
- groups(n)
Finds the n-groups of items (leaves) reachable from a cut at depth n.
- Parameters:
n (int) – number of groups
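The merge-then-cut idea can be illustrated with a small self-contained sketch. It is not NLTK's implementation: the names `Node` and `TinyDendrogram` are hypothetical, and the cutting rule (undo the most recent merges first until n groups exist) is an assumption chosen to match the description above.

```python
class Node:
    """Tree node: a leaf holds a value, an inner node holds children."""

    def __init__(self, step, children=(), value=None):
        self.step = step              # merge counter; 0 for leaves
        self.children = list(children)
        self.value = value

    def leaves(self):
        if not self.children:
            return [self.value]
        return [leaf for child in self.children for leaf in child.leaves()]


class TinyDendrogram:
    """Hypothetical stand-in for illustration, not nltk.cluster.util.Dendrogram."""

    def __init__(self, items=()):
        self._roots = [Node(0, value=item) for item in items]
        self._step = 1

    def merge(self, *indices):
        # Join the subtrees at the given positions into one new subtree.
        if len(indices) < 2:
            raise ValueError("merge needs at least two indices")
        node = Node(self._step, [self._roots[i] for i in indices])
        self._step += 1
        self._roots[indices[0]] = node
        # Delete absorbed positions from the right so the remaining
        # indices still point at the intended subtrees.
        for i in sorted(indices[1:], reverse=True):
            del self._roots[i]

    def groups(self, n):
        # Start from the top of the tree and undo the latest merge
        # repeatedly until there are n groups (or nothing left to split).
        frontier = list(self._roots)
        while len(frontier) < n:
            splittable = [node for node in frontier if node.children]
            if not splittable:
                break
            latest = max(splittable, key=lambda node: node.step)
            position = frontier.index(latest)
            frontier[position:position + 1] = latest.children
        return [node.leaves() for node in frontier]


tree = TinyDendrogram(["a", "b", "c", "d"])
tree.merge(0, 1)   # top level is now (a b), c, d
tree.merge(1, 2)   # top level is now (a b), (c d)
tree.merge(0, 1)   # a single root remains
two_groups = tree.groups(2)
three_groups = tree.groups(3)
```

Indices passed to merge refer to positions in the current top-level list, which shrinks after every call, so the same index can name different subtrees at different steps.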
- class nltk.cluster.util.VectorSpaceClusterer
Bases: ClusterI
Abstract clusterer which takes tokens and maps them into a vector space. Optionally performs singular value decomposition to reduce the dimensionality.
- __init__(normalise=False, svd_dimensions=None)
- Parameters:
normalise (boolean) – whether vectors should be normalised to length 1
svd_dimensions (int) – number of dimensions to use when reducing vector dimensionality with SVD
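As a rough illustration of what normalising to length 1 means, the sketch below scales a vector by its Euclidean norm. The function name `normalise_vector` is hypothetical, and plain Python lists are used only to keep the example dependency-free. The SVD reduction is not shown.

```python
import math


def normalise_vector(vector):
    """Scale a vector to unit Euclidean length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction; return it unchanged.
        return list(vector)
    return [x / norm for x in vector]


unit = normalise_vector([3.0, 4.0])
unit_length = math.sqrt(sum(x * x for x in unit))
```

After normalisation, distances between vectors depend only on their direction, not on their magnitude.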
- classify(vector)
Classifies the token into a cluster, setting the token’s CLUSTER parameter to that cluster identifier.
- abstract classify_vectorspace(vector)
Returns the index of the appropriate cluster for the vector.
- cluster(vectors, assign_clusters=False, trace=False)
Assigns the vectors to clusters, learning the clustering parameters from the data. Returns a cluster identifier for each vector when assign_clusters is true.
- abstract cluster_vectorspace(vectors, trace)
Finds the clusters using the given set of vectors.
- likelihood(vector, label)
Returns the likelihood (a float) of the token having the corresponding cluster.
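The split between the public methods (cluster, classify, likelihood) and the abstract ones (cluster_vectorspace, classify_vectorspace) can be sketched as a template-method pattern. The code below is a standalone illustration, not NLTK's code: `SketchVectorSpaceClusterer` and `EndpointClusterer` are hypothetical names, the SVD step is omitted, and the hard 0/1 likelihood is an assumption of this sketch.

```python
import math
from abc import ABC, abstractmethod


class SketchVectorSpaceClusterer(ABC):
    """Hypothetical base class mirroring the method split described above."""

    def __init__(self, normalise=False):
        self._should_normalise = normalise

    def _prepare(self, vector):
        # Optional preprocessing shared by training and classification.
        if not self._should_normalise:
            return list(vector)
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector] if norm else list(vector)

    def cluster(self, vectors, assign_clusters=False, trace=False):
        prepared = [self._prepare(v) for v in vectors]
        self.cluster_vectorspace(prepared, trace)   # subclass learns here
        if assign_clusters:
            return [self.classify(v) for v in vectors]
        return None

    def classify(self, vector):
        return self.classify_vectorspace(self._prepare(vector))

    def likelihood(self, vector, label):
        # Assumption: 1.0 for the predicted cluster, 0.0 otherwise.
        return 1.0 if self.classify(vector) == label else 0.0

    @abstractmethod
    def cluster_vectorspace(self, vectors, trace):
        """Learn cluster parameters from preprocessed vectors."""

    @abstractmethod
    def classify_vectorspace(self, vector):
        """Return the index of the cluster for a preprocessed vector."""


class EndpointClusterer(SketchVectorSpaceClusterer):
    """Toy subclass: uses the first and last training vectors as centroids."""

    def cluster_vectorspace(self, vectors, trace):
        self._centroids = [vectors[0], vectors[-1]]
        if trace:
            print("centroids:", self._centroids)

    def classify_vectorspace(self, vector):
        distances = [
            math.sqrt(sum((a - b) ** 2 for a, b in zip(vector, centroid)))
            for centroid in self._centroids
        ]
        return distances.index(min(distances))


clusterer = EndpointClusterer()
labels = clusterer.cluster(
    [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0]], assign_clusters=True
)
near_second = clusterer.likelihood([4.9, 5.1], 1)
near_first = clusterer.likelihood([4.9, 5.1], 0)
```

Keeping preprocessing in the base class means a subclass only has to implement the two vector-space methods and never sees unnormalised input.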