nltk.translate.api module

class nltk.translate.api.AlignedSent[source]

Bases: object

An aligned sentence object, which encapsulates two sentences along with an Alignment between them.

Typically used in machine translation to represent a sentence and its translation.

>>> from nltk.translate import AlignedSent, Alignment
>>> algnsent = AlignedSent(['klein', 'ist', 'das', 'Haus'],
...     ['the', 'house', 'is', 'small'], Alignment.fromstring('0-3 1-2 2-0 3-1'))
>>> algnsent.words
['klein', 'ist', 'das', 'Haus']
>>> algnsent.mots
['the', 'house', 'is', 'small']
>>> algnsent.alignment
Alignment([(0, 3), (1, 2), (2, 0), (3, 1)])
>>> from nltk.corpus import comtrans
>>> print(comtrans.aligned_sents()[54])
<AlignedSent: 'Weshalb also sollten...' -> 'So why should EU arm...'>
>>> print(comtrans.aligned_sents()[54].alignment)
0-0 0-1 1-0 2-2 3-4 3-5 4-7 5-8 6-3 7-9 8-9 9-10 9-11 10-12 11-6 12-6 13-13

Parameters
  • words (list(str)) – Words in the target language sentence

  • mots (list(str)) – Words in the source language sentence

  • alignment (Alignment) – Word-level alignments between words and mots. Each alignment is represented as a 2-tuple (words_index, mots_index).

__init__(words, mots, alignment=None)[source]
property words
property mots
property alignment
invert()[source]

Return the aligned sentence pair with its direction reversed: words and mots are swapped, and the alignment is inverted.

Return type

AlignedSent
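As a brief illustration of invert(), the sketch below reuses the German/English pair from the class example. It assumes nltk is installed; the variable names are illustrative only.

```python
from nltk.translate import AlignedSent, Alignment

sent = AlignedSent(
    ['klein', 'ist', 'das', 'Haus'],
    ['the', 'house', 'is', 'small'],
    Alignment.fromstring('0-3 1-2 2-0 3-1'),
)

# invert() swaps words and mots and flips every (i, j) pair to (j, i)
inverted = sent.invert()

print(inverted.words)
print(inverted.mots)
print(inverted.alignment)
```

The original object is left unchanged; invert() builds a new AlignedSent.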

class nltk.translate.api.Alignment[source]

Bases: frozenset

A storage class for representing alignment between two sequences, s1, s2. In general, an alignment is a set of tuples of the form (i, j, …) representing an alignment between the i-th element of s1 and the j-th element of s2. Tuples are extensible (they might contain additional data, such as a boolean to indicate sure vs possible alignments).

>>> from nltk.translate import Alignment
>>> a = Alignment([(0, 0), (0, 1), (1, 2), (2, 2)])
>>> a.invert()
Alignment([(0, 0), (1, 0), (2, 1), (2, 2)])
>>> print(a.invert())
0-0 1-0 2-1 2-2
>>> a[0]
[(0, 1), (0, 0)]
>>> a.invert()[2]
[(2, 1), (2, 2)]
>>> b = Alignment([(0, 0), (0, 1)])
>>> b.issubset(a)
True
>>> c = Alignment.fromstring('0-0 0-1')
>>> b == c
True

static __new__(cls, pairs)[source]
classmethod fromstring(s)[source]

Read a GIZA-formatted string (whitespace-separated i-j pairs) and return an Alignment object.

>>> Alignment.fromstring('0-0 2-1 9-2 21-3 10-4 7-5')
Alignment([(0, 0), (2, 1), (7, 5), (9, 2), (10, 4), (21, 3)])

Parameters

s (str) – The positional alignments in GIZA format

Return type

Alignment

Returns

An Alignment object corresponding to the string representation s.
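Since printing an Alignment produces the same i-j notation that fromstring() reads, the two can be combined into a round trip. The following is a minimal sketch, assuming nltk is installed and that the alignment holds plain 2-tuples (any extra tuple fields are not part of the printed form).

```python
from nltk.translate import Alignment

a = Alignment([(0, 0), (0, 1), (1, 2), (2, 2)])

# str() renders the pairs as sorted, space-separated "i-j" tokens
text = str(a)

# fromstring() parses that text back into an equal Alignment
restored = Alignment.fromstring(text)

print(text)
print(restored == a)
```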

invert()[source]

Return a new Alignment object representing the inverted mapping, in which each pair (i, j) becomes (j, i).

range(positions=None)[source]

Compute the range of the mapping for the given positions, that is, the sorted list of s2 indices aligned to those s1 positions. If no positions are specified, compute the range of the entire mapping.
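The sketch below shows range() on the small alignment from the class example, both for the whole mapping and for selected positions. It assumes nltk is installed and that the alignment contains only 2-tuples; the variable names are illustrative.

```python
from nltk.translate import Alignment

a = Alignment([(0, 0), (0, 1), (1, 2), (2, 2)])

# Range of the entire mapping: every s2 index that appears in some pair
full = a.range()

# Range restricted to s1 position 0, which is aligned to s2 indices 0 and 1
from_zero = a.range([0])

# s1 positions 1 and 2 both map to s2 index 2, so it is reported once
from_one_two = a.range([1, 2])

print(full, from_zero, from_one_two)
```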

class nltk.translate.api.PhraseTableEntry

Bases: tuple

PhraseTableEntry(trg_phrase, log_prob)

static __new__(_cls, trg_phrase, log_prob)

Create new instance of PhraseTableEntry(trg_phrase, log_prob)

log_prob

Alias for field number 1

trg_phrase

Alias for field number 0
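Because PhraseTableEntry is a named tuple, its fields can be read by name, by index, or by unpacking. A minimal sketch, assuming nltk is installed; the phrase and probability are made-up values for illustration.

```python
from math import log

from nltk.translate.api import PhraseTableEntry

# Hypothetical entry: target phrase as a tuple of tokens, plus a log probability
entry = PhraseTableEntry(trg_phrase=('the', 'house'), log_prob=log(0.5))

# Access by field name
print(entry.trg_phrase, entry.log_prob)

# Access by tuple unpacking, in field order (trg_phrase, log_prob)
trg_phrase, log_prob = entry
```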

class nltk.translate.api.PhraseTable[source]

Bases: object

In-memory store of translations for a given phrase, and the log probabilities of those translations.

__init__()[source]
translations_for(src_phrase)[source]

Get the translations for a source language phrase.

Parameters

src_phrase (tuple(str)) – Source language phrase of interest

Returns

A list of target language phrases that are translations of src_phrase, sorted in decreasing order of likelihood. Each list element is a tuple of the target phrase and its log probability.

Return type

list(PhraseTableEntry)

add(src_phrase, trg_phrase, log_prob)[source]

Parameters

log_prob (float) – Log probability that given src_phrase, trg_phrase is its translation
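To show how add() and translations_for() fit together, here is a minimal sketch that fills a table with a few candidate translations and reads them back. It assumes nltk is installed; the phrases and probabilities are invented for illustration, and phrases are given as tuples of tokens as in the parameter description of translations_for().

```python
from math import log

from nltk.translate import PhraseTable

table = PhraseTable()
src = ('das', 'Haus')

# Entries are deliberately added out of order
table.add(src, ('the', 'home'), log(0.2))
table.add(src, ('the', 'house'), log(0.7))
table.add(src, ('this', 'house'), log(0.1))

# translations_for() returns PhraseTableEntry items, most likely first
candidates = table.translations_for(src)
for entry in candidates:
    print(entry.trg_phrase, entry.log_prob)
```

Log probabilities are used rather than raw probabilities, so values closer to zero indicate more likely translations.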