nltk.translate.api module

class nltk.translate.api.AlignedSent[source]

Bases: object

Return an aligned sentence object, which encapsulates two sentences along with an Alignment between them.

Typically used in machine translation to represent a sentence and its translation.

>>> from nltk.translate import AlignedSent, Alignment
>>> algnsent = AlignedSent(['klein', 'ist', 'das', 'Haus'],
...     ['the', 'house', 'is', 'small'], Alignment.fromstring('0-3 1-2 2-0 3-1'))
>>> algnsent.words
['klein', 'ist', 'das', 'Haus']
>>> algnsent.mots
['the', 'house', 'is', 'small']
>>> algnsent.alignment
Alignment([(0, 3), (1, 2), (2, 0), (3, 1)])
>>> from nltk.corpus import comtrans
>>> print(comtrans.aligned_sents()[54])
<AlignedSent: 'Weshalb also sollten...' -> 'So why should EU arm...'>
>>> print(comtrans.aligned_sents()[54].alignment)
0-0 0-1 1-0 2-2 3-4 3-5 4-7 5-8 6-3 7-9 8-9 9-10 9-11 10-12 11-6 12-6 13-13

Parameters

  • words (list(str)) – Words in the target language sentence

  • mots (list(str)) – Words in the source language sentence

  • alignment (Alignment) – Word-level alignments between words and mots. Each alignment is represented as a 2-tuple (words_index, mots_index).

__init__(words, mots, alignment=None)[source]
property alignment
invert()[source]

Return the aligned sentence pair, reversing the directionality.

Return type

AlignedSent

property mots
property words
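
The index convention is easiest to see by resolving the alignment from the example above into word pairs. The following is a plain-Python sketch, not NLTK's implementation: `aligned_word_pairs` and `invert_pair` are hypothetical helpers, and every alignment entry is assumed to be a plain 2-tuple (words_index, mots_index).

```python
def aligned_word_pairs(words, mots, alignment):
    """Resolve (words_index, mots_index) pairs into the words they link."""
    return [(words[i], mots[j]) for i, j in sorted(alignment)]


def invert_pair(words, mots, alignment):
    """Swap the two sentences and flip every alignment pair to match."""
    flipped = sorted((j, i) for i, j in alignment)
    return mots, words, flipped


words = ["klein", "ist", "das", "Haus"]
mots = ["the", "house", "is", "small"]
alignment = [(0, 3), (1, 2), (2, 0), (3, 1)]

# Each pair links a word in `words` to a word in `mots`.
pairs = aligned_word_pairs(words, mots, alignment)

# Reversing the direction swaps the sentences and flips each index pair.
inv_words, inv_mots, inv_alignment = invert_pair(words, mots, alignment)
inv_pairs = aligned_word_pairs(inv_words, inv_mots, inv_alignment)
```

Here `pairs` links 'klein' to 'small' and 'Haus' to 'house', and the inverted pair links the same words in the opposite direction.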
class nltk.translate.api.Alignment[source]

Bases: frozenset

A storage class for representing alignment between two sequences, s1, s2. In general, an alignment is a set of tuples of the form (i, j, …) representing an alignment between the i-th element of s1 and the j-th element of s2. Tuples are extensible (they might contain additional data, such as a boolean to indicate sure vs possible alignments).

>>> from nltk.translate import Alignment
>>> a = Alignment([(0, 0), (0, 1), (1, 2), (2, 2)])
>>> a.invert()
Alignment([(0, 0), (1, 0), (2, 1), (2, 2)])
>>> print(a.invert())
0-0 1-0 2-1 2-2
>>> a[0]
[(0, 1), (0, 0)]
>>> a.invert()[2]
[(2, 1), (2, 2)]
>>> b = Alignment([(0, 0), (0, 1)])
>>> b.issubset(a)
True
>>> c = Alignment.fromstring('0-0 0-1')
>>> b == c
True
static __new__(cls, pairs)[source]
classmethod fromstring(s)[source]

Read a giza-formatted string and return an Alignment object.

>>> Alignment.fromstring('0-0 2-1 9-2 21-3 10-4 7-5')
Alignment([(0, 0), (2, 1), (7, 5), (9, 2), (10, 4), (21, 3)])

Parameters

s (str) – the positional alignments in giza format

Return type

Alignment

Returns

An Alignment object corresponding to the string representation s.
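
The giza format used here is a whitespace-separated list of `i-j` index pairs. As a rough illustration of that format, the sketch below parses and re-serialises such a string in plain Python. It is not NLTK's parser: `parse_giza` and `format_giza` are hypothetical names, and only two-index tokens are handled.

```python
def parse_giza(s):
    """Parse 'i-j' tokens separated by whitespace into a frozenset of int pairs."""
    pairs = set()
    for token in s.split():
        i, j = token.split("-")
        pairs.add((int(i), int(j)))
    return frozenset(pairs)


def format_giza(pairs):
    """Serialise pairs back to giza format, sorted by first then second index."""
    return " ".join(f"{i}-{j}" for i, j in sorted(pairs))


parsed = parse_giza("0-0 2-1 9-2 21-3 10-4 7-5")
ordered = sorted(parsed)
round_trip = format_giza(parsed)
```

Because the pairs are stored as a set, the order of tokens in the input string does not matter, and sorting the pairs numerically gives the same order as the `Alignment` repr in the doctest above.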


invert()[source]

Return an Alignment object that is the inverted mapping.


range(positions=None)[source]

Work out the range of the mapping from the given positions. If no positions are specified, compute the range of the entire mapping.
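
To make "inverted mapping" and "range of the mapping" concrete, here is a minimal plain-Python sketch over a set of 2-tuples. It is an illustration under stated assumptions rather than NLTK's code: `invert_pairs` and `mapping_range` are hypothetical helpers, extra tuple fields are not handled, and omitting `positions` is taken to mean "all positions".

```python
def invert_pairs(pairs):
    """Flip every (i, j) pair to (j, i)."""
    return frozenset((j, i) for i, j in pairs)


def mapping_range(pairs, positions=None):
    """Sorted second indices reachable from the given first indices."""
    if positions is None:
        # No positions given: use every first index that occurs in the mapping.
        positions = {i for i, _ in pairs}
    wanted = set(positions)
    return sorted({j for i, j in pairs if i in wanted})


a = frozenset([(0, 0), (0, 1), (1, 2), (2, 2)])

inverted = sorted(invert_pairs(a))
full_range = mapping_range(a)
range_of_0 = mapping_range(a, [0])
range_of_1_2 = mapping_range(a, [1, 2])
```

Position 0 maps to both 0 and 1, while positions 1 and 2 both map to 2, so a range can be smaller or larger than the set of positions it is computed from.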

class nltk.translate.api.PhraseTable[source]

Bases: object

In-memory store of the translations for a given phrase, and the log probabilities of those translations

add(src_phrase, trg_phrase, log_prob)[source]

Parameters

log_prob (float) – Log probability that trg_phrase is the translation of src_phrase


translations_for(src_phrase)[source]

Get the translations for a source language phrase.

Parameters

src_phrase (tuple(str)) – Source language phrase of interest

Returns

A list of target language phrases that are translations of src_phrase, in decreasing order of likelihood. Each list element is a tuple of the target phrase and its log probability.

Return type

list(PhraseTableEntry)

class nltk.translate.api.PhraseTableEntry

Bases: tuple

PhraseTableEntry(trg_phrase, log_prob)

static __new__(_cls, trg_phrase, log_prob)

Create new instance of PhraseTableEntry(trg_phrase, log_prob)


log_prob

Alias for field number 1


trg_phrase

Alias for field number 0
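
The behaviour described for the phrase table (entries stored per source phrase, returned most likely first, each entry a named tuple) can be sketched in plain Python. This is an illustration only, not NLTK's implementation: `SketchPhraseTable` and `Entry` are hypothetical names, the probabilities are made up for the example, and returning an empty list for an unknown phrase is a choice made by this sketch.

```python
from collections import namedtuple
from math import log

# Field 0 is the target phrase, field 1 its log probability.
Entry = namedtuple("Entry", ["trg_phrase", "log_prob"])


class SketchPhraseTable:
    """Toy in-memory phrase table keyed by source phrase."""

    def __init__(self):
        self._table = {}

    def add(self, src_phrase, trg_phrase, log_prob):
        entries = self._table.setdefault(src_phrase, [])
        entries.append(Entry(trg_phrase, log_prob))
        # Keep the most likely translation first.
        entries.sort(key=lambda e: e.log_prob, reverse=True)

    def translations_for(self, src_phrase):
        # Unknown phrases yield an empty list in this sketch.
        return list(self._table.get(src_phrase, []))


table = SketchPhraseTable()
table.add(("das", "Haus"), ("the", "home"), log(0.2))
table.add(("das", "Haus"), ("the", "house"), log(0.8))

candidates = table.translations_for(("das", "Haus"))
best = candidates[0]
```

Since each entry is a named tuple, `best.trg_phrase` and `best[0]` refer to the same value, as do `best.log_prob` and `best[1]`.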