nltk.translate.ibm2 module

Lexical translation model that considers word order.

IBM Model 2 improves on Model 1 by accounting for word order. An alignment probability, a(i | j, l, m), is introduced; it predicts a source word position given its aligned target word's position.

The EM algorithm used in Model 2 is:

E step

In the training data, collect counts, weighted by prior probabilities.

    1. count how many times a source language word is translated into a target language word

    2. count how many times a particular position in the source sentence is aligned to a particular position in the target sentence

M step

Estimate new probabilities based on the counts from the E step
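The two steps above can be illustrated with a toy, dictionary-based EM iteration. This is a simplified sketch, not NLTK's implementation: the function name, the tuple-keyed alignment table, and the flat corpus format are all invented for illustration.

```python
from collections import defaultdict

# Toy sketch of one Model 2 EM iteration (not NLTK's implementation).
# t_table[t][s] = t(t | s); a_table[(i, j, l, m)] = a(i | j, l, m).
# Each source sentence carries NULL at position 0.
def em_iteration(corpus, t_table, a_table):
    # E step: collect expected counts, weighted by current probabilities.
    count_t = defaultdict(lambda: defaultdict(float))  # lexical counts n(t, s)
    total_t = defaultdict(float)
    count_a = defaultdict(float)                       # alignment counts c(i | j, l, m)
    total_a = defaultdict(float)
    for src, trg in corpus:
        l, m = len(src) - 1, len(trg)                  # l excludes NULL
        for j, t in enumerate(trg, start=1):
            # Normalization: total probability of generating t at position j
            denom = sum(t_table[t][src[i]] * a_table[(i, j, l, m)]
                        for i in range(l + 1))
            for i in range(l + 1):
                delta = t_table[t][src[i]] * a_table[(i, j, l, m)] / denom
                count_t[t][src[i]] += delta            # step 1: lexical counts
                total_t[src[i]] += delta
                count_a[(i, j, l, m)] += delta         # step 2: position counts
                total_a[(j, l, m)] += delta
    # M step: re-estimate probabilities from the expected counts.
    for t in count_t:
        for s in count_t[t]:
            t_table[t][s] = count_t[t][s] / total_t[s]
    for (i, j, l, m) in count_a:
        a_table[(i, j, l, m)] = count_a[(i, j, l, m)] / total_a[(j, l, m)]
```

Starting from uniform tables, repeated calls concentrate probability mass on consistent word pairs and positions; after each iteration the translation probabilities for each source word, and the alignment probabilities for each (j, l, m), sum to 1.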



Notations

i: Position in the source sentence
    Valid values are 0 (for NULL), 1, 2, …, length of source sentence

j: Position in the target sentence
    Valid values are 1, 2, …, length of target sentence

l: Number of words in the source sentence, excluding NULL

m: Number of words in the target sentence

s: A word in the source language

t: A word in the target language
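These notations index the model's probability tables: translation_table[t][s] holds t(t | s) and alignment_table[i][j][l][m] holds a(i | j, l, m), as the doctest below shows. A minimal sketch of that nested-dictionary shape follows; the default values and sample numbers here are illustrative, not NLTK's actual initialization.

```python
from collections import defaultdict

# Nested defaultdicts mirroring the table shapes used by the IBM models.
# translation_table[t][s] = t(t | s): probability that source word s
# translates to target word t.  None plays the role of NULL.
translation_table = defaultdict(lambda: defaultdict(float))
translation_table['house']['haus'] = 0.8        # illustrative value
translation_table['house'][None] = 0.1          # NULL-generated target word

# alignment_table[i][j][l][m] = a(i | j, l, m): probability that target
# position j aligns to source position i, given source length l and
# target length m.
alignment_table = defaultdict(
    lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(float))))
alignment_table[2][2][4][5] = 0.6               # j=2 aligns to i=2 when l=4, m=5

print(translation_table['house']['haus'])       # 0.8
print(alignment_table[2][2][4][5])              # 0.6
```

Because every level is a defaultdict, unseen entries read as 0.0 instead of raising KeyError, which keeps the training loops free of existence checks.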


References

Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), 263-311.

class nltk.translate.ibm2.IBMModel2[source]

Bases: IBMModel

Lexical translation model that considers word order

>>> from nltk.translate import AlignedSent, IBMModel2
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'ist', 'ja', 'groß'], ['the', 'house', 'is', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> ibm2 = IBMModel2(bitext, 5)
>>> print(round(ibm2.translation_table['buch']['book'], 3))
>>> print(round(ibm2.translation_table['das']['book'], 3))
>>> print(round(ibm2.translation_table['buch'][None], 3))
>>> print(round(ibm2.translation_table['ja'][None], 3))
>>> print(round(ibm2.alignment_table[1][1][2][2], 3))
>>> print(round(ibm2.alignment_table[1][2][2][2], 3))
>>> print(round(ibm2.alignment_table[2][2][4][5], 3))
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)])
__init__(sentence_aligned_corpus, iterations, probability_tables=None)[source]

Train on sentence_aligned_corpus and create a lexical translation model and an alignment model.

Translation direction is from AlignedSent.mots to AlignedSent.words.

Parameters

  • sentence_aligned_corpus (list(AlignedSent)) – Sentence-aligned parallel corpus

  • iterations (int) – Number of iterations to run training algorithm

  • probability_tables (dict[str]: object) – Optional. Use this to pass in custom probability values. If not specified, probabilities will be set to a uniform distribution, or some other sensible value. If specified, all the following entries must be present: translation_table, alignment_table. See IBMModel for the type and purpose of these tables.


align(sentence_pair)[source]

Determines the best word alignment for one sentence pair from the corpus that the model was trained on.

The best alignment will be set in sentence_pair when the method returns. In contrast with the internal implementation of IBM models, the word indices in the Alignment are zero-indexed, not one-indexed.


Parameters

sentence_pair (AlignedSent) – A sentence in the source language and its counterpart sentence in the target language

prob_alignment_point(i, j, src_sentence, trg_sentence)[source]

Probability that position j in trg_sentence is aligned to position i in src_sentence

prob_all_alignments(src_sentence, trg_sentence)[source]

Computes the probability of all possible word alignments, expressed as a marginal distribution over target words t

Each entry in the return value represents the contribution to the total alignment probability by the target word t.

To obtain probability(alignment | src_sentence, trg_sentence), simply sum the entries in the return value.


Returns

Probability of t for all s in src_sentence

Return type

dict(str): float
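The two methods above can be sketched in plain Python. This is a toy sketch only: NLTK's real methods read self.translation_table and self.alignment_table rather than taking tables as arguments, and the flat tuple-keyed tables here are an assumption made for brevity.

```python
from collections import defaultdict

# Sentences carry a dummy entry at index 0 so that word positions are
# 1-based, matching the IBM model notation; src[0] stands for NULL.

def prob_alignment_point(i, j, src, trg, t_table, a_table):
    """t(t_j | s_i) * a(i | j, l, m): probability that target position j
    is aligned to source position i."""
    l, m = len(src) - 1, len(trg) - 1
    return t_table[(trg[j], src[i])] * a_table[(i, j, l, m)]

def prob_all_alignments(src, trg, t_table, a_table):
    """Marginal alignment probability, keyed by target word.  Each value
    is the total alignment-point probability contributed by that word,
    summed over all source positions it could align to."""
    l, m = len(src) - 1, len(trg) - 1
    prob_for_t = defaultdict(float)
    for j in range(1, m + 1):
        for i in range(l + 1):
            prob_for_t[trg[j]] += prob_alignment_point(i, j, src, trg,
                                                       t_table, a_table)
    return dict(prob_for_t)
```

With uniform tables, each target word's marginal is just the point probability times the number of source positions, which is why the entries sum to a simple normalization constant.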


prob_t_a_given_s(alignment_info)[source]

Probability of target sentence and an alignment given the source sentence
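Under Model 2 this probability decomposes into a product over target positions. The sketch below illustrates the decomposition with a plain alignment mapping and tuple-keyed tables; NLTK's actual method works on an AlignmentInfo object instead, so the signature here is an assumption.

```python
def prob_t_a_given_s(alignment, src, trg, t_table, a_table):
    """P(trg, alignment | src) = prod over j of t(t_j | s_a(j)) * a(a(j) | j, l, m).

    alignment[j] = i means target position j is aligned to source
    position i (0 stands for NULL).  Sentences carry a dummy entry at
    index 0 so that word positions are 1-based.
    """
    l, m = len(src) - 1, len(trg) - 1
    prob = 1.0
    for j in range(1, m + 1):
        i = alignment[j]
        prob *= t_table[(trg[j], src[i])] * a_table[(i, j, l, m)]
    return prob
```

Each target position contributes one lexical factor and one alignment factor, which is exactly the pair of counts collected in the E step.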


set_uniform_probabilities(sentence_aligned_corpus)[source]

Initialize probability tables to a uniform distribution

Derived classes should implement this accordingly.

class nltk.translate.ibm2.Model2Counts[source]

Bases: Counts

Data object to store counts of various parameters during training. Includes counts for alignment.

update_alignment(count, i, j, l, m)[source]

update_lexical_translation(count, s, t)[source]