nltk.translate.ibm_model module

Common methods and classes for all IBM models. See IBMModel1, IBMModel2, IBMModel3, IBMModel4, and IBMModel5 for specific implementations.

The IBM models are a series of generative models that learn lexical translation probabilities, p(target language word|source language word), given a sentence-aligned parallel corpus.

The models increase in sophistication from model 1 to 5. Typically, the output of lower models is used to seed the higher models. All models use the Expectation-Maximization (EM) algorithm to learn various probability tables.
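
As a rough illustration of how EM can estimate lexical translation probabilities, the sketch below trains a Model 1-style table on a toy corpus. It is not NLTK's implementation: the function name `train_lexical_probs`, the `(target_words, source_words)` pair layout, and the toy sentences are all assumptions made for this example.

```python
from collections import defaultdict


def train_lexical_probs(corpus, iterations=5):
    """Estimate t(target word | source word) with Model 1-style EM.

    corpus: list of (target_words, source_words) pairs (hypothetical layout).
    Returns a dict keyed by (target_word, source_word).
    """
    trg_vocab = {t for trg, _ in corpus for t in trg}
    uniform = 1.0 / len(trg_vocab)
    t_prob = defaultdict(lambda: uniform)  # start from a uniform table

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (t, s) link
        total = defaultdict(float)  # expected count of s being linked at all
        for trg, src in corpus:
            src_with_null = [None] + list(src)  # None plays the NULL token
            for t in trg:
                # E-step: split one unit of count for t among the source words
                norm = sum(t_prob[(t, s)] for s in src_with_null)
                for s in src_with_null:
                    delta = t_prob[(t, s)] / norm
                    count[(t, s)] += delta
                    total[s] += delta
        # M-step: renormalize the expected counts per source word
        t_prob = defaultdict(float)
        for (t, s), c in count.items():
            t_prob[(t, s)] = c / total[s]
    return t_prob


toy_corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
probs = train_lexical_probs(toy_corpus)
```

On this toy corpus, "the" co-occurs with "das" more consistently than "house" or "book" do, so its estimated probability given "das" ends up the largest of the three.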

Words in a sentence are one-indexed. The first word of a sentence has position 1, not 0. Index 0 is reserved in the source sentence for the NULL token. The concept of position does not apply to NULL, but it is indexed at 0 by convention.

Each target word is aligned to exactly one source word or the NULL token.
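
The indexing convention can be made concrete with plain tuples. This is a minimal sketch, assuming `None` stands in for the NULL token and a placeholder string fills slot 0 of the target sentence; the sentences and the helper name `aligned_pairs` are hypothetical.

```python
# Index 0 of the source sentence holds the NULL token (None here).
src_sentence = (None, "das", "haus", "ist", "klein")
# Index 0 of the target sentence is a dummy so that real words start at 1.
trg_sentence = ("UNUSED", "the", "house", "is", "small")

# alignment[j] is the source position aligned to target position j.
# A value of 0 would mean "aligned to NULL". Slot 0 itself is unused.
alignment = (0, 1, 2, 3, 4)


def aligned_pairs(alignment, src_sentence, trg_sentence):
    """Pair every real target word with the single source word it aligns to."""
    return [
        (trg_sentence[j], src_sentence[alignment[j]])
        for j in range(1, len(trg_sentence))
    ]


pairs = aligned_pairs(alignment, src_sentence, trg_sentence)
```

Because `alignment` is a function from target positions to source positions, every target word has exactly one partner, while a source word may have zero, one, or several.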

References:

Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), 263-311.

class nltk.translate.ibm_model.AlignmentInfo[source]

Bases: object

Helper data object for training IBM Models 3 and up

Read-only. For a source sentence and its counterpart in the target language, this class holds information about the sentence pair’s alignment, cepts, and fertility.

Warning: Alignments are one-indexed here, in contrast to nltk.translate.Alignment and AlignedSent, which are zero-indexed. This class is not meant to be used outside of IBM models.

__init__(alignment, src_sentence, trg_sentence, cepts)[source]

alignment

tuple(int): Alignment function. alignment[j] is the position in the source sentence that is aligned to position j in the target sentence.


center(i)[source]

The ceiling of the average position of the words in the tablet of cept i, or 0 if i is None
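
A minimal sketch of this computation, assuming the cepts are stored as a list of lists of target positions indexed by source position. The standalone function and the sample data are illustrative only.

```python
from math import ceil


def center(cepts, i):
    """Ceiling of the mean target position in the tablet of cept i."""
    if i is None:
        return 0
    tablet = cepts[i]  # assumed non-empty; an empty tablet would divide by zero
    return int(ceil(sum(tablet) / len(tablet)))


# Hypothetical cepts for a 4-word source sentence (index 0 is NULL).
cepts = [[5], [1, 4], [6], [], [2, 3, 7]]
center_of_1 = center(cepts, 1)  # mean of 1 and 4 is 2.5, rounded up
center_of_4 = center(cepts, 4)  # mean of 2, 3 and 7 is exactly 4
```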


cepts

list(list(int)): The positions of the target words, in ascending order, aligned to a source word position. For example, cepts[4] = [2, 3, 7] means that the words in positions 2, 3 and 7 of the target sentence are aligned to the word in position 4 of the source sentence.
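
The cepts are the inverse view of the alignment function. The sketch below derives them from a one-indexed alignment tuple; the helper name `build_cepts` and the sample alignment are assumptions for illustration.

```python
def build_cepts(alignment, src_length):
    """Group target positions by the source position they align to.

    alignment: one-indexed tuple, alignment[0] is an unused dummy slot.
    src_length: number of source words, not counting NULL.
    """
    cepts = [[] for _ in range(src_length + 1)]  # slot 0 collects NULL-aligned words
    for j in range(1, len(alignment)):
        cepts[alignment[j]].append(j)  # j increases, so each list stays sorted
    return cepts


# Target positions 2, 3 and 7 all point at source position 4.
alignment = (0, 1, 4, 4, 2, 0, 3, 4)
cepts = build_cepts(alignment, src_length=4)
```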


fertility_of_word(i)[source]

Fertility of the word in position i of the source sentence, that is, the number of target words aligned to it


is_head_word(j)[source]

Whether the word in position j of the target sentence is a head word, that is, the first word in its tablet


previous_cept(j)[source]

The previous cept of j, or None if j belongs to the first cept


previous_in_tablet(j)[source]

The position of the previous word that is in the same tablet as j, or None if j is the first word of the tablet
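
The fertility, head-word, previous-cept, and previous-in-tablet queries described above can all be answered from the alignment tuple and the cepts. The following is a sketch under that assumption, written as free functions with hypothetical names rather than as the class's actual methods.

```python
def fertility(cepts, i):
    """Number of target words aligned to source position i."""
    return 0 if i is None else len(cepts[i])


def is_head_word(alignment, cepts, j):
    """True if j is the first (lowest) position in its tablet."""
    return cepts[alignment[j]][0] == j


def previous_in_tablet(alignment, cepts, j):
    """Position just before j in the same tablet, or None if j comes first."""
    tablet = cepts[alignment[j]]
    k = tablet.index(j)
    return None if k == 0 else tablet[k - 1]


def previous_cept(alignment, cepts, j):
    """Nearest earlier source position with non-zero fertility, or None.

    Assumes j is not aligned to NULL, since NULL has no position.
    """
    i = alignment[j] - 1
    while i > 0 and not cepts[i]:
        i -= 1  # skip source words that generated nothing
    return i if i > 0 else None


# Hypothetical data: source position 3 has fertility 0.
alignment = (0, 1, 4, 4, 2, 0, 1, 4)
cepts = [[5], [1, 6], [4], [], [2, 3, 7]]
```

In this data, target position 2 heads the tablet of source position 4, and its previous cept is 2 because source position 3 is skipped for having no aligned words.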


score

float: Optional. Probability of alignment, as defined by the IBM model that assesses this alignment


src_sentence

tuple(str): Source sentence referred to by this object. Should include the NULL token (None) in index 0.


trg_sentence

tuple(str): Target sentence referred to by this object. Should have a dummy element in index 0 so that the first word starts from index 1.


zero_indexed_alignment()[source]

Returns

Zero-indexed alignment, suitable for use in external nltk.translate modules like nltk.translate.Alignment

Return type

list(tuple)
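
Converting from the one-indexed convention to zero-indexed pairs amounts to shifting both positions down by one and mapping NULL to `None`. A minimal sketch, assuming the output is a list of `(target_index, source_index)` tuples; the function name is hypothetical.

```python
def to_zero_indexed(alignment):
    """Convert a one-indexed alignment tuple to zero-indexed (j, i) pairs."""
    pairs = []
    for j in range(1, len(alignment)):  # skip the dummy slot 0
        i = alignment[j] - 1
        # Source position 0 was NULL, which has no zero-indexed counterpart.
        pairs.append((j - 1, i if i >= 0 else None))
    return pairs


zero_indexed = to_zero_indexed((0, 1, 0, 2))
```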


class nltk.translate.ibm_model.Counts[source]

Bases: object

Data object to store counts of various parameters during training

update_fertility(count, alignment_info)[source]

update_lexical_translation(count, alignment_info, j)[source]

update_null_generation(count, alignment_info)[source]

class nltk.translate.ibm_model.IBMModel[source]

Bases: object

Abstract base class for all IBM models

MIN_PROB = 1e-12

best_model2_alignment(sentence_pair, j_pegged=None, i_pegged=0)[source]

Finds the best alignment according to IBM Model 2

Used as a starting point for hill climbing in Models 3 and above, because it is easier to compute than the best alignments in higher models

  • sentence_pair (AlignedSent) – Source and target language sentence pair to be word-aligned

  • j_pegged (int) – If specified, the alignment point of j_pegged will be fixed to i_pegged

  • i_pegged (int) – Alignment point to j_pegged
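
Under Model 2, each target word can be aligned independently, which is what makes this starting point cheap to compute. The sketch below picks, for every target position, the source position that maximizes translation probability times alignment probability, and honors an optional pegged point. The flat dictionaries `t_table` and `a_table` and their key layouts are simplifying assumptions, not NLTK's table structure.

```python
MIN_PROB = 1e-12  # floor for entries missing from the toy tables


def best_model2_alignment(src, trg, t_table, a_table, j_pegged=None, i_pegged=0):
    """Return a one-indexed alignment tuple for one sentence pair.

    src: source words with None (NULL) at index 0.
    trg: target words with a dummy element at index 0.
    t_table[(trg_word, src_word)]: translation probability (hypothetical layout).
    a_table[(i, j)]: alignment probability for this sentence pair's lengths.
    """
    alignment = [0] * len(trg)
    for j in range(1, len(trg)):
        if j == j_pegged:
            alignment[j] = i_pegged  # fixed by the caller, not searched
            continue
        alignment[j] = max(
            range(len(src)),
            key=lambda i: t_table.get((trg[j], src[i]), MIN_PROB)
            * a_table.get((i, j), MIN_PROB),
        )
    return tuple(alignment)


src = (None, "das", "haus")
trg = ("UNUSED", "the", "house")
t_table = {
    ("the", "das"): 0.7, ("the", "haus"): 0.1,
    ("house", "das"): 0.1, ("house", "haus"): 0.8,
}
a_table = {(i, j): 1.0 / 3 for i in range(3) for j in (1, 2)}

free = best_model2_alignment(src, trg, t_table, a_table)
pegged = best_model2_alignment(src, trg, t_table, a_table, j_pegged=1, i_pegged=2)
```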

hillclimb(alignment_info, j_pegged=None)[source]

Starting from the alignment in alignment_info, look at neighboring alignments iteratively for the best one

There is no guarantee that the best alignment in the alignment space will be found, because the algorithm might be stuck in a local maximum.


Parameters

j_pegged (int) – If specified, the search will be constrained to alignments where j_pegged remains unchanged

Returns

The best alignment found from hill climbing

Return type

AlignmentInfo
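
The search loop can be sketched generically: keep moving to the best-scoring neighbor until no neighbor improves on the current state. The version below works on any state type and takes the scoring and neighbor functions as arguments. Those parameters and the toy landscape are assumptions chosen to show how a local maximum can trap the search.

```python
def hillclimb(start, score, neighbors):
    """Greedy local search: stop when no neighbor scores higher."""
    best, best_score = start, score(start)
    while True:
        previous = best
        for candidate in neighbors(previous):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        if best == previous:  # no improvement in this round
            return best


# Toy one-dimensional landscape with a local peak at index 1
# and the global peak at index 5.
landscape = [1, 3, 2, 0, 5, 9]


def toy_score(x):
    return landscape[x]


def toy_neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n < len(landscape)]


stuck = hillclimb(0, toy_score, toy_neighbors)
found = hillclimb(3, toy_score, toy_neighbors)
```

Starting from index 0, the climb stops at the local peak, while starting from index 3 reaches the global one. This dependence on the starting point is why trying several starting alignments is useful.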


neighboring(alignment_info, j_pegged=None)[source]

Determine the neighbors of alignment_info, obtained by moving or swapping one alignment point


Parameters

j_pegged (int) – If specified, neighbors that have a different alignment point from j_pegged will not be considered

Returns

A set of neighboring alignments represented by their AlignmentInfo

Return type

set(AlignmentInfo)
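
The two kinds of neighbor can be sketched on plain alignment tuples: a move reassigns one target position to a different source position, and a swap exchanges the source positions of two target positions. The function below is an illustration with a hypothetical signature; it returns tuples rather than AlignmentInfo objects.

```python
def neighboring(alignment, src_length, j_pegged=None):
    """All alignments one move or one swap away from `alignment`.

    alignment: one-indexed tuple with a dummy at index 0.
    src_length: number of source words, not counting NULL.
    """
    result = set()
    trg_length = len(alignment) - 1

    # Moves: point target position j at a different source position.
    for j in range(1, trg_length + 1):
        if j == j_pegged:
            continue
        for i in range(0, src_length + 1):  # 0 means NULL
            if i != alignment[j]:
                moved = list(alignment)
                moved[j] = i
                result.add(tuple(moved))

    # Swaps: exchange the alignment points of two target positions.
    for j in range(1, trg_length + 1):
        if j == j_pegged:
            continue
        for other_j in range(j + 1, trg_length + 1):
            if other_j == j_pegged or alignment[j] == alignment[other_j]:
                continue
            swapped = list(alignment)
            swapped[j], swapped[other_j] = swapped[other_j], swapped[j]
            result.add(tuple(swapped))
    return result


free_neighbors = neighboring((0, 1, 2), src_length=2)
pegged_neighbors = neighboring((0, 1, 2), src_length=2, j_pegged=1)
```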



prob_t_a_given_s(alignment_info)[source]

Probability of target sentence and an alignment given the source sentence

All required information is assumed to be in alignment_info and self.

Derived classes should override this method


sample(sentence_pair)[source]

Sample the most probable alignments from the entire alignment space

First, determine the best alignment according to IBM Model 2. With this initial alignment, use hill climbing to determine the best alignment according to a higher IBM Model. Add this alignment and its neighbors to the sample set. Repeat this process with other initial alignments obtained by pegging an alignment point.

Hill climbing may get stuck in a local maximum, hence the pegging and the trying out of different initial alignments.


Parameters

sentence_pair (AlignedSent) – Source and target language sentence pair to generate a sample of alignments from

Returns

A set of best alignments represented by their AlignmentInfo, and the best alignment of the set for convenience

Return type

set(AlignmentInfo), AlignmentInfo
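
The sampling procedure described above can be outlined as a loop over every possible pegged alignment point. This sketch takes the starting-point, hill-climbing, neighbor, and scoring steps as callables, and the toy stand-ins exist only to make it runnable. All names and signatures here are assumptions, and adding each climbed alignment to the sample alongside its neighbors is a choice made for this example.

```python
def sample_alignments(m, l, start, climb, neighbors, score):
    """Collect promising alignments for a pair with m target and l source words.

    start(j_pegged, i_pegged) -> initial alignment tuple
    climb(alignment, j_pegged) -> locally best alignment
    neighbors(alignment, j_pegged) -> set of nearby alignments
    score(alignment) -> number, higher is better
    """
    best = climb(start(None, 0), None)  # unconstrained run first
    sampled = {best} | neighbors(best, None)
    for j in range(1, m + 1):  # then peg every (j, i) point in turn
        for i in range(0, l + 1):
            candidate = climb(start(j, i), j)
            sampled.add(candidate)
            sampled |= neighbors(candidate, j)
            if score(candidate) > score(best):
                best = candidate
    return sampled, best


# Toy stand-ins for a pair with 2 target words and 2 source words.
def toy_score(a):
    return sum(1 for j in range(1, len(a)) if a[j] == j)  # prefers the diagonal


def toy_start(j_pegged, i_pegged):
    a = [0, 1, 2]
    if j_pegged is not None:
        a[j_pegged] = i_pegged
    return tuple(a)


def toy_climb(a, j_pegged):
    return a  # no-op placeholder for hill climbing


def toy_neighbors(a, j_pegged):
    result = set()
    for j in range(1, len(a)):
        if j == j_pegged:
            continue
        for i in range(0, 3):
            if i != a[j]:
                result.add(a[:j] + (i,) + a[j + 1:])
    return result


sampled, best = sample_alignments(2, 2, toy_start, toy_climb, toy_neighbors, toy_score)
```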


set_uniform_probabilities(sentence_aligned_corpus)[source]

Initialize probability tables to a uniform distribution

Derived classes should implement this accordingly.
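
What "uniform" means depends on the table each derived model maintains. For a lexical translation table, one plausible choice is to give every target word the same probability under each source word. The sketch below shows that single case, and its function name and nested-dictionary layout are assumptions.

```python
def uniform_translation_table(src_vocab, trg_vocab):
    """Build t[src_word][trg_word] with equal mass on every target word."""
    p = 1.0 / len(trg_vocab)
    return {s: {t: p for t in trg_vocab} for s in src_vocab}


src_vocab = {None, "das", "haus"}  # None plays the NULL token
trg_vocab = {"the", "house", "a", "book"}
table = uniform_translation_table(src_vocab, trg_vocab)
```

Each row sums to one, so the table is a valid conditional distribution before any EM iteration has run.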


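nltk.translate.ibm_model.longest_target_sentence_length(sentence_aligned_corpus)[source]

Parameters

sentence_aligned_corpus (list(AlignedSent)) – Parallel corpus under consideration

Returns

Number of words in the longest target language sentence of sentence_aligned_corpus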
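
A minimal sketch of this computation, assuming the corpus is represented as `(target_words, source_words)` pairs of plain lists rather than AlignedSent objects. The function name and data are illustrative.

```python
def longest_target_length(corpus):
    """Length of the longest target sentence, or 0 for an empty corpus."""
    return max((len(trg) for trg, _src in corpus), default=0)


toy_corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "small", "house"], ["das", "kleine", "haus"]),
    (["a", "book"], ["ein", "buch"]),
]
longest = longest_target_length(toy_corpus)
```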