nltk.metrics.edit_distance_align

nltk.metrics.edit_distance_align(s1, s2, substitution_cost=1)[source]

Calculate the minimum Levenshtein edit-distance based alignment mapping between two strings. The alignment finds the mapping from string s1 to s2 that minimizes the edit distance cost. For example, mapping “rain” to “shine” would involve 2 substitutions, 2 matches and an insertion resulting in the following mapping: [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (4, 5)] NB: (0, 0) is the start state without any letters associated See more: https://web.stanford.edu/class/cs124/lec/med.pdf

In case of multiple valid minimum-distance alignments, the backtrace has the following operation precedence:

  1. Substitute s1 and s2 characters

  2. Skip s1 character

  3. Skip s2 character

The backtrace is carried out in reverse string order.

This function does not support transposition.

Parameters

s2 (str) – The strings to be aligned

Return type

List[Tuple(int, int)]