nltk.translate.ibm3 module

Translation model that considers how a word can be aligned to multiple words in another language.

IBM Model 3 improves on Model 2 by directly modeling the phenomenon where a word in one language may be translated into zero or more words in another. This is expressed by the fertility probability, n(phi | source word).
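To make fertility concrete, the sketch below counts how many target words each source position produces under a single alignment. The function name `fertilities` and the alignment encoding (a list where entry j holds the source position aligned to target position j, with entry 0 as a placeholder and the value 0 meaning NULL) are assumptions made for illustration, not part of the NLTK API.

```python
from collections import Counter


def fertilities(alignment, l):
    """Return phi for each source position 0..l (position 0 is NULL).

    alignment[j] is the source position aligned to target position j.
    alignment[0] is a placeholder because target positions start at 1.
    """
    counts = Counter(alignment[1:])
    return [counts.get(i, 0) for i in range(l + 1)]


# Source: 'summarize' (l = 1); target: 'fasse', 'zusammen' (m = 2).
# Both target words align to source position 1, so its fertility is 2.
print(fertilities([0, 1, 1], 1))

# Source: 'the book is small' (l = 4); target: 'das buch ist ja klein'.
# 'ja' (target position 4) aligns to NULL, so NULL has fertility 1.
print(fertilities([0, 1, 2, 3, 0, 4], 4))
```

A fertility model n(phi | source word) would then be estimated from counts like these, collected over many sentence pairs.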

If a source word translates into more than one word, it is possible to generate sentences that have the same alignment in multiple ways. This is modeled by a distortion step. The distortion probability, d(j|i,l,m), predicts a target word position, given its aligned source word’s position. The distortion probability replaces the alignment probability of Model 2.
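As a rough illustration of "the same alignment in multiple ways": a source word with fertility phi can emit its phi target words in phi! different orders, so the number of generation orders for a whole alignment is the product of the factorials of the fertilities. The helper name `generation_orders` is hypothetical, and this is only a counting sketch under that reading of the statement above.

```python
from math import factorial, prod


def generation_orders(fertility_by_source_word):
    """Number of orders in which the source words can emit their target words.

    Each source word with fertility phi contributes a factor of phi!.
    """
    return prod(factorial(phi) for phi in fertility_by_source_word)


# One source word producing two target words: 2 possible orders.
print(generation_orders([2]))

# Every source word produces exactly one target word: a single order.
print(generation_orders([1, 1, 1, 1]))
```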

The fertility probability is not applicable for NULL. Target words that align to NULL are assumed to be distributed uniformly in the target sentence. The existence of these words is modeled by p1, the probability that a target word produced by a real source word requires another target word that is produced by NULL.
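The NULL-insertion term can be sketched as a binomial choice, following the formulation in Brown et al. (1993): each of the m - phi_0 words produced by real source words either triggers a NULL-generated word (probability p1) or does not (probability p0 = 1 - p1). The function name `null_insertion_probability` is an assumption for illustration and is not part of the NLTK API.

```python
from math import comb


def null_insertion_probability(m, null_fertility, p1):
    """Probability term for `null_fertility` NULL-aligned words
    in a target sentence of length m."""
    p0 = 1.0 - p1
    # Target words that were produced by real (non-NULL) source words.
    real_words = m - null_fertility
    if null_fertility > real_words:
        # Each NULL-generated word must accompany a distinct real word.
        return 0.0
    return (
        comb(real_words, null_fertility)
        * p1 ** null_fertility
        * p0 ** (real_words - null_fertility)
    )


# Five target words, one of them aligned to NULL, with p1 = 0.5.
print(null_insertion_probability(5, 1, 0.5))
```

Note that the exponent of p0 is m - 2 * phi_0, written here as `real_words - null_fertility`.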

The EM algorithm used in Model 3 is:

E step

In the training data, collect counts, weighted by prior probabilities.

    1. count how many times a source language word is translated into a target language word

    2. count how many times a particular position in the target sentence is aligned to a particular position in the source sentence

    3. count how many times a source word is aligned to phi number of target words

    4. count how many times NULL is aligned to a target word

M step

Estimate new probabilities based on the counts from the E step
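The two steps can be sketched for the lexical translation table alone. This is a simplified illustration, not the NLTK implementation: the function names, the `(alignment, probability)` input format, and the restriction to a single sentence pair are all assumptions. Each alignment's probability is normalized over the alignments considered, so the counts are fractional.

```python
from collections import defaultdict


def collect_translation_counts(source, target, scored_alignments):
    """E step (lexical counts only).

    source[0] is None (NULL); target[0] is a placeholder so that
    target positions start at 1. scored_alignments is a list of
    (alignment, probability) pairs, where alignment[j] is the source
    position aligned to target position j.
    """
    total = sum(p for _, p in scored_alignments)
    counts = defaultdict(lambda: defaultdict(float))
    for alignment, p in scored_alignments:
        weight = p / total  # normalize over the alignments considered
        for j in range(1, len(target)):
            counts[target[j]][source[alignment[j]]] += weight
    return counts


def normalize(counts):
    """M step: divide each count(t, s) by the total count for s."""
    totals = defaultdict(float)
    for by_source in counts.values():
        for s, c in by_source.items():
            totals[s] += c
    return {
        t: {s: c / totals[s] for s, c in by_source.items()}
        for t, by_source in counts.items()
    }


source = [None, "the", "book"]
target = ["UNUSED", "das", "buch"]
scored = [([0, 1, 2], 0.6), ([0, 2, 1], 0.2)]
table = normalize(collect_translation_counts(source, target, scored))
print(table["buch"]["book"])
```

The distortion, fertility, and NULL counts listed above would be accumulated in the same pass with the same weights.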

Because there are too many possible alignments, only the most probable ones are considered. First, the best alignment is determined using prior probabilities. Then, a hill climbing approach is used to find other good candidates.
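A minimal sketch of such a hill-climbing search follows. The neighbourhood used here (alignments that differ by moving one target word to another source position, or by swapping the source positions of two target words) is an assumption based on Brown et al. (1993); the function names and the `score` callback are hypothetical and do not mirror the NLTK API.

```python
def neighbors(alignment, l):
    """Alignments that differ from `alignment` by one move or one swap.

    alignment[j] is the source position (0 = NULL) of target position j;
    alignment[0] is a placeholder.
    """
    m = len(alignment) - 1
    result = []
    for j in range(1, m + 1):
        # Move: re-align target position j to a different source position.
        for i in range(l + 1):
            if i != alignment[j]:
                moved = list(alignment)
                moved[j] = i
                result.append(tuple(moved))
        # Swap: exchange the source positions of target positions j and k.
        for k in range(j + 1, m + 1):
            if alignment[j] != alignment[k]:
                swapped = list(alignment)
                swapped[j], swapped[k] = swapped[k], swapped[j]
                result.append(tuple(swapped))
    return result


def hill_climb(start, l, score):
    """Follow the best-scoring neighbor until no neighbor improves."""
    current = tuple(start)
    current_score = score(current)
    while True:
        best = max(neighbors(current, l), key=score, default=None)
        if best is None or score(best) <= current_score:
            return current
        current, current_score = best, score(best)


# Toy scoring function: the fewer mismatches against `goal`, the better.
goal = (0, 1, 2, 3)
toy_score = lambda a: -sum(x != y for x, y in zip(a, goal))
print(hill_climb((0, 3, 3, 3), 3, toy_score))
```

In training, the score would be the model probability of the alignment, and the alignments visited along the way would serve as the sample over which counts are collected.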



Notations

i: Position in the source sentence. Valid values are 0 (for NULL), 1, 2, …, length of source sentence

j: Position in the target sentence. Valid values are 1, 2, …, length of target sentence

l: Number of words in the source sentence, excluding NULL

m: Number of words in the target sentence

s: A word in the source language

t: A word in the target language

phi: Fertility, the number of target words produced by a source word

p1: Probability that a target word produced by a source word is accompanied by another target word that is aligned to NULL

p0: 1 - p1


References

Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), 263-311.

class nltk.translate.ibm3.IBMModel3

Bases: nltk.translate.ibm_model.IBMModel

Translation model that considers how a word can be aligned to multiple words in another language

>>> from nltk.translate import AlignedSent, IBMModel3
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> ibm3 = IBMModel3(bitext, 5)
>>> print(round(ibm3.translation_table['buch']['book'], 3))
>>> print(round(ibm3.translation_table['das']['book'], 3))
>>> print(round(ibm3.translation_table['ja'][None], 3))
>>> print(round(ibm3.distortion_table[1][1][2][2], 3))
>>> print(round(ibm3.distortion_table[1][2][2][2], 3))
>>> print(round(ibm3.distortion_table[2][2][4][5], 3))
>>> print(round(ibm3.fertility_table[2]['summarize'], 3))
>>> print(round(ibm3.fertility_table[1]['book'], 3))
>>> print(ibm3.p1)
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])

__init__(sentence_aligned_corpus, iterations, probability_tables=None)

Train on sentence_aligned_corpus and create a lexical translation model, a distortion model, a fertility model, and a model for generating NULL-aligned words.

Translation direction is from AlignedSent.mots to AlignedSent.words.

  • sentence_aligned_corpus (list(AlignedSent)) – Sentence-aligned parallel corpus

  • iterations (int) – Number of iterations to run training algorithm

  • probability_tables (dict[str]: object) – Optional. Use this to pass in custom probability values. If not specified, probabilities will be set to a uniform distribution, or some other sensible value. If specified, all the following entries must be present: translation_table, alignment_table, fertility_table, p1, distortion_table. See IBMModel for the type and purpose of these tables.


set_uniform_probabilities(sentence_aligned_corpus)

Initialize probability tables to a uniform distribution.

Derived classes should implement this accordingly.
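For the distortion table, one plausible reading of "uniform" is d(j|i,l,m) = 1/m for every target position j, so that the probabilities over j sum to 1 for each (i, l, m). The sketch below builds such a table. The function name is hypothetical, and the index order `[j][i][l][m]` is inferred from lookups such as `distortion_table[2][2][4][5]` in the doctest above.

```python
from collections import defaultdict


def uniform_distortion_table(length_pairs):
    """Build d[j][i][l][m] = 1 / m for every (l, m) sentence-length pair."""
    d = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for l, m in set(length_pairs):
        for j in range(1, m + 1):        # target positions 1..m
            for i in range(0, l + 1):    # source positions 0 (NULL)..l
                d[j][i][l][m] = 1.0 / m
    return d


# (l, m) pairs as they might be gathered from a parallel corpus.
d = uniform_distortion_table([(4, 5), (2, 2), (2, 2)])
print(d[2][2][4][5])
```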


prob_t_a_given_s(alignment_info)

Probability of target sentence and an alignment given the source sentence.
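In the notation of this module, the standard Model 3 formula from Brown et al. (1993) can be written as below. This is a sketch of the textbook formula, not a transcription of the NLTK source. The symbols a_j (the source position aligned to target position j), phi_i (the fertility of source position i, with phi_0 for NULL), and tr (the lexical translation probability) are introduced here for illustration.

```latex
P(t, a \mid s) =
  \binom{m - \phi_0}{\phi_0} \, p_1^{\phi_0} \, p_0^{\,m - 2\phi_0}
  \prod_{i=1}^{l} \phi_i! \; n(\phi_i \mid s_i)
  \prod_{j=1}^{m} \mathrm{tr}(t_j \mid s_{a_j})
  \prod_{j \,:\, a_j \neq 0} d(j \mid a_j, l, m)
```

The first group is the NULL-insertion term, the second the fertility term, the third the lexical translation term, and the last the distortion term, which skips NULL-aligned target words because their positions are assumed to be uniformly distributed.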

class nltk.translate.ibm3.Model3Counts

Bases: nltk.translate.ibm_model.Counts

Data object to store counts of various parameters during training. Includes counts for distortion.

update_distortion(count, alignment_info, j, l, m)
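How such an update might work can be sketched with a small stand-in class. Everything here is an assumption for illustration: `DistortionCounts` is hypothetical, it takes a plain alignment tuple rather than an `alignment_info` object, and the `probability` method shows one way the counts could later be normalized into d(j|i,l,m).

```python
from collections import defaultdict


class DistortionCounts:
    """Hypothetical stand-in for the distortion part of a counts object."""

    def __init__(self):
        # distortion[j][i][l][m]: weighted count of target position j
        # being aligned to source position i.
        self.distortion = defaultdict(
            lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
        )
        # The same counts summed over all j, used as the normalizer.
        self.distortion_for_any_j = defaultdict(
            lambda: defaultdict(lambda: defaultdict(float))
        )

    def update_distortion(self, count, alignment, j, l, m):
        i = alignment[j]  # source position aligned to target position j
        self.distortion[j][i][l][m] += count
        self.distortion_for_any_j[i][l][m] += count

    def probability(self, j, i, l, m):
        total = self.distortion_for_any_j[i][l][m]
        return self.distortion[j][i][l][m] / total if total else 0.0


counts = DistortionCounts()
counts.update_distortion(0.75, (0, 1, 2), j=1, l=2, m=2)
counts.update_distortion(0.25, (0, 2, 1), j=2, l=2, m=2)
print(counts.probability(1, 1, 2, 2))
```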