nltk.cluster.util module
- class nltk.cluster.util.VectorSpaceClusterer
Bases: nltk.cluster.api.ClusterI
Abstract clusterer that takes tokens and maps them into a vector space. It can optionally perform singular value decomposition (SVD) to reduce the dimensionality of the vectors.
- __init__(normalise=False, svd_dimensions=None)
- Parameters
normalise (boolean) – whether vectors should be normalised to unit length (length 1)
svd_dimensions (int) – number of dimensions to keep when reducing vector dimensionality with SVD
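The two preprocessing options can be pictured with the sketch below. It assumes NumPy and a hypothetical helper named `preprocess`; it illustrates the idea (scale each vector to unit length, then project onto the top `svd_dimensions` singular directions) and is not NLTK's own code. The returned projection matrix is what you would apply to any later vector before classifying it.

```python
import numpy as np


def preprocess(vectors, normalise=False, svd_dimensions=None):
    """Hypothetical helper: optional unit-length scaling, then optional SVD reduction."""
    X = np.asarray(vectors, dtype=float)  # one row per vector
    if normalise:
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        # Leave zero vectors unchanged instead of dividing by zero.
        X = X / np.where(norms == 0, 1.0, norms)
    if svd_dimensions and svd_dimensions < X.shape[1]:
        # Decompose the (features x samples) matrix: X.T = U diag(s) Vt.
        U, s, Vt = np.linalg.svd(X.T, full_matrices=False)
        T = U[:, :svd_dimensions]  # basis of the top singular directions
        # Project every vector onto that basis; Tt @ v does the same for a new v.
        return X @ T, T.T
    return X, None
```

For example, `preprocess([[3, 4], [6, 8]], normalise=True)` maps both rows to `[0.6, 0.8]`, and reducing three vectors that lie in the x/y plane of 3-D space to two dimensions keeps their lengths unchanged.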
- cluster(vectors, assign_clusters=False, trace=False)
Assigns the vectors to clusters, learning the clustering parameters from the data. If assign_clusters is true, returns a cluster identifier for each vector.
- abstract cluster_vectorspace(vectors, trace)
Finds the clusters using the given set of vectors.
- classify(vector)
Classifies the token into a cluster, setting the token’s CLUSTER parameter to that cluster identifier.
- abstract classify_vectorspace(vector)
Returns the index of the appropriate cluster for the vector.
- likelihood(vector, label)
Returns the likelihood (a float) that the token belongs to the cluster given by label.
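The split between the public methods (cluster, classify, likelihood) and the abstract *_vectorspace hooks is a template-method design. The stdlib-only sketch below illustrates that split; the class names `ToyVectorSpaceClusterer` and `TwoMeans` are hypothetical, SVD is left out, and the hard 0/1 likelihood is an assumption made for the sketch. It is not NLTK's source.

```python
import math
from abc import ABC, abstractmethod


class ToyVectorSpaceClusterer(ABC):
    """Hypothetical stand-in showing how the public methods delegate to the hooks."""

    def __init__(self, normalise=False):
        self._normalise = normalise

    def _prep(self, vector):
        # Optional unit-length normalisation (SVD omitted in this sketch).
        if not self._normalise:
            return list(vector)
        norm = math.sqrt(sum(x * x for x in vector)) or 1.0
        return [x / norm for x in vector]

    def cluster(self, vectors, assign_clusters=False, trace=False):
        vectors = [self._prep(v) for v in vectors]
        self.cluster_vectorspace(vectors, trace)  # subclass learns its parameters
        if assign_clusters:
            return [self.classify_vectorspace(v) for v in vectors]
        return None

    def classify(self, vector):
        return self.classify_vectorspace(self._prep(vector))

    def likelihood(self, vector, label):
        # Assumed hard assignment: 1.0 for the predicted cluster, 0.0 otherwise.
        return 1.0 if self.classify(vector) == label else 0.0

    @abstractmethod
    def cluster_vectorspace(self, vectors, trace): ...

    @abstractmethod
    def classify_vectorspace(self, vector): ...


class TwoMeans(ToyVectorSpaceClusterer):
    """Hypothetical k-means subclass with k=2, seeded with the first two vectors."""

    def cluster_vectorspace(self, vectors, trace, iterations=10):
        self.means = [list(vectors[0]), list(vectors[1])]
        for _ in range(iterations):
            groups = [[], []]
            for v in vectors:
                groups[self.classify_vectorspace(v)].append(v)
            for i, group in enumerate(groups):
                if group:  # keep the old mean if a cluster becomes empty
                    self.means[i] = [sum(xs) / len(group) for xs in zip(*group)]

    def classify_vectorspace(self, vector):
        # Index of the nearest mean (Euclidean distance).
        dists = [math.dist(vector, m) for m in self.means]
        return dists.index(min(dists))
```

With this sketch, `TwoMeans().cluster([[0, 0], [10, 10], [1, 0], [9, 10]], assign_clusters=True)` returns `[0, 1, 0, 1]`. Only the two hooks are cluster-specific; preprocessing and the public interface live in the base class.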
- nltk.cluster.util.euclidean_distance(u, v)
Returns the Euclidean distance between vectors u and v. This is equivalent to the length of the vector (u - v).
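As an illustration of the definition above, here is a stdlib-only re-implementation on plain Python sequences (a sketch, not NLTK's own code). The explicit length check is an addition for this sketch, since `zip` would otherwise silently truncate the longer vector.

```python
import math


def euclidean_distance(u, v):
    """Length of the difference vector (u - v)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

For example, `euclidean_distance([0, 0], [3, 4])` is `5.0`, the length of `(-3, -4)`.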
- nltk.cluster.util.cosine_distance(u, v)
Returns 1 minus the cosine of the angle between vectors u and v. This is equal to $1 - \frac{u \cdot v}{\|u\|\,\|v\|}$.
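The formula translates directly into the stdlib-only sketch below (illustrative, not NLTK's code). Raising `ValueError` for a zero vector is a choice made for this sketch, because the cosine is undefined when either norm is zero.

```python
import math


def cosine_distance(u, v):
    """1 - (u . v) / (|u| |v|): 0 for parallel, 1 for orthogonal, 2 for opposite vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1 - dot / (norm_u * norm_v)
```

Because only the angle matters, `cosine_distance([1, 0], [2, 0])` is `0.0` even though the vectors differ in length; `[1, 0]` and `[0, 1]` give `1.0`, and `[1, 0]` and `[-1, 0]` give `2.0`.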
- class nltk.cluster.util.Dendrogram
Bases: object
Represents a dendrogram, a tree with a specified branching order. It must be initialised with the leaf items; merge is then called iteratively for each branch. The class constructs a tree that records the order of the calls to merge.
- __init__(items=[])
- Parameters
items (sequence of (any)) – the items at the leaves of the dendrogram
- merge(*indices)
Merges the nodes at the given indices in the dendrogram. The nodes are combined into a new node, which replaces the first node specified; all other nodes involved in the merge are removed.
- Parameters
indices (seq of int) – indices of the items to merge (at least two)
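The merge semantics can be traced with the stdlib-only sketch below. `ToyDendrogram` and `Node` are hypothetical names used for illustration, not NLTK's classes. Each merge creates a node that records its 1-based merge order, puts it at the first index given, and deletes the other merged nodes. Deleting from the highest index down is a choice of this sketch so that earlier deletions do not shift positions still to be removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical internal node: `order` is the 1-based position of the merge call."""
    order: int
    children: tuple


class ToyDendrogram:
    def __init__(self, items=()):
        self._items = list(items)  # current top-level nodes (the leaves at first)
        self._merges = 0  # number of merge calls so far

    def merge(self, *indices):
        if len(indices) < 2:
            raise ValueError("merge needs at least two indices")
        self._merges += 1
        node = Node(self._merges, tuple(self._items[i] for i in indices))
        self._items[indices[0]] = node  # the new node replaces the first index
        # Remove the remaining merged nodes, highest index first.
        for i in sorted(indices[1:], reverse=True):
            del self._items[i]

    def nodes(self):
        return list(self._items)
```

Starting from the leaves `a, b, c, d`, the calls `merge(0, 1)`, `merge(1, 2)` and `merge(0, 1)` leave a single root node with `order == 3` whose children are the `(a, b)` and `(c, d)` subtrees.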