Hirschberg's Algorithm

Hirschberg's a dynamic programming algorithm algorithm developed by Dan Hirschberg has the capability of finding the best sequence alognment of two sequences. The algorithm utilizes Levenshtein edit distance which is distance between two strings is the number of single character deletions, insertions, or substitutions required to transform one string into the other. It can also be defined as an modified version of Needleman-Wunsh algorithm, it is more space efficient and uses the divide and conquer strategy.

Time and space calculations:
The major application of this algorithm is to search optimal sequence alignment between two nucleotide or amino acid sequences. There are also suboptimal heuristics approaches of sequence alignment e.g BLAST and FASTA. Finding optimal sequence alignment requires computational resources in terms of time and space. For example, Needleman-Wunsh algorithm will take O(nm) time and O(nm) space for finding an optimal sequence alignment of two strings of x and y, where length(x) = n and length(y) = m. On the other hand Hirschberg's algorithm still takes O(nm) time, but needs only O(min{n,m}) space. Hirschberg's algorithm is also a space-efficient way to calculate the longest common subsequence between two sets of data such as with the common diff tool.

How we compute Levenshtein edit distance:
For calculating the least score (Edit(x,y)) required for change x into y by using insertions, substitutions and deletions, we give a score to each of the event. For example, we give a score for inserting a character "A" into substring, a score for substituting character "B" with "A" in the string and similarly a score for deleting a character "A" from the string. The standard score can be Ins(a) = Del(a) = 1 for each character a, Sub(a,a) = 0, and Sub(a,b) = 1 if a is not equal to b.

Differences with Needleman-Wunsch algorithm:
Levenshtein edit distances can be computed using linear space. What we call the "forward subprogram" computes the values of Edit(Prefix[x,i],Prefix[y,j]) for all i and j, just as the Needleman-Wunsch and returns the array {Edit(x,Prefix[y,j])}0 = j = m. The "backward subprogram" is similar, except that the dynamic program is done in the opposite direction, i.e., starting from the right ends of the strings. It returns the array {Edit(x,Suffix[y,j])}0 = j = m, where Suffix[y,j] is the suffix of y of length j.

The Algorithm:
Compute V(A, B) while saving the values of the -th row. Denote D(A,B) as the Forward Matrix F.
Compute V(Ar, Br) while saving the -th row. Denote D(Ar, Br) as the Backward Matrix B.
Find the column k* so that the crossing point ( , k*) satisfies:

Now that k* is found, recursively partition the problem to two sub problems:
Find the path from (0,0) to ( , k*).
Find the path from (n, m) to ( , m - k*)



High-level description of the forwards subprogram

Forwards[x,y] is

1. n = length(x); m = length(y)
2. Edit[Prefix[0,0]] = 0;
3. For all i from 1 to n:
Edit[Prefix[x,i],Prefix[y,0]] = Edit[Prefix[x,i-1],Prefix[y,0]]
+ Del(x_i)
4. For all j from 1 to m:
A. Edit[Prefix[x,0],Prefix[y,j]] = Edit[Prefix[x,0],Prefix[y,j-1]]
+ Ins(y_j)
B. For all i from 1 to n, execute the following steps:
i. Edit[Prefix[x,i],Prefix[y,j]] =
min{Edit[Prefix[x,i-1],Prefix[y,j]] + Del(x_i),
Edit[Prefix[x,i-1],Prefix[y,j-1]] + Sub(x_i,y_j),
Edit[Prefix[x,i],Prefix[y,j-1]] + Ins(y_j)}
ii. Erase Edit[Prefix[x,i-1],Prefix[y,j-1]]
C. Erase Edit[Prefix[x,i-1],Prefix[y,j]]
5. RETURN Edit[x] %% an array of length m+1

High-level description of the backwards subprogram

Backwards[x,y] is

1. n = length(x); m = length(y)
2. For all i from 1 to n:
Edit[Suffix[x,i],Suffix[y,0]] = 0
3. For all j from 1 to m:
A. Edit[Suffix[x,0],Suffix[y,j]] = Edit[Suffix[x,n],Suffix[y,j-1]] +
B. For all i from 1 to n:
i. Edit[Suffix[x,i],Suffix[y,j]] =
min{Edit[Suffix[x,i-1],Suffix[y,j]] + Del(x_{n-i-1}),
Edit[Suffix[x,i-1],Suffix[y,j-1]] +
Edit[Suffix[x,i],Suffix[y,j-1]] + Ins(y_{m-j+1})}
ii. Erase Edit[Suffix[x,i-1],Suffix[y,j-1]]
C. Erase Edit[Suffix[x,i-1],Suffix[y,j]]
4. RETURN Edit[x] %% an array of length m+1

High level description of Hirschberg's algorithm:

Hirschberg(x,y) is
1. n = length(x); m = length(y)
2. If n <= 1 or m <= 1:
OUTPUT Alignment(x,y) using Needleman-Wunsch.
A. mid = floor(n/2)
B. x_left = Prefix[x,mid]
C. x_right = Suffix[x,n-mid]
D. Edit[x_left] = Forwards(x_left,y)
%% an array of length m+1
E. Edit[x_right] = Backwards(x_right,y)
%% an array of length m+1
F. cut = ArgMin{Edit[x_left,Prefix[y,j]] +
Edit[x_right,Suffix[y,m-j]]} %% j ranges from 1 to m-1
G. Hirschberg(x_left,Prefix[y,cut])
H. Hirschberg(x_right,Suffix[y,m-cut])

You have no rights to post comments



© 2018 BioinfoGuide. All Rights Reserved.