Smith-Waterman Algorithm

The Smith–Waterman algorithm is a well-known algorithm for performing local sequence alignment; that is, for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure. This algorithm is a variation of Needleman-Wunsch Algorithm developed by  Temple F. Smith and Michael S. Waterman in 1981, it is also a dynamic programming algorithm to find the optimal local alignment with respect to the scoring system being used. The major difference from Needleman-Wunsch algorithm includes:

  1. In order to highlight the best local alignments; negative scoring matrix cells are set to zero
  2. Traceback procedure starts at the highest scoring matrix cell and proceeds until a cell with score zero is encountered

This algorithm was designed to sensitively detect similarities in highly diverged sequences. For the purpose of explanation, first summarizing the algorithm in to simple steps and then I will move forwards with examples and explanations.

Algorithm is similar to global alignment with modified boundary conditions and recurrence rules

  1. The top row and left column are now filled with 0
  2. If the (sub-)alignment score becomes negative, restart the search:Smith-Waterman_Algorithm
  3. Traceback is from the maximum of F (i, j) in the whole matrix to the first 0.
  4. Example: the optimal local alignment between HEAGAWGHEE and PAWHEAE is AWGHE::AW-HE
  5. Issue: In gapped alignments, the expected score for a random match may be positive

Example of  Smith–Waterman Algorithm


1. Start with a N x N integer matrix where N is sequence length (both s and t). Compute M[i ][j ] based on Score Matrix and optimum score compute so far (DP). 


2. Understanding the matrix

  • Alignment
    t : - - - - - - - -
    s : c c c t a g g t


  • Alignment
    t : c g g g t a t ...
    s : - - - - - - - ...


3. Computing cell scores: Finding m[i][j]: There are three ways to finish the alignment of s0..i and t0..j



4. Scoring process: Element Computation M[i ][j ]:




5. Backtracking Process: For finding the BEST local alignment, find the highest score and then traceback to first 0

Scoring used in this example:


  • diag - the letters from two sequences are aligned
  • left - a gap is introduced in the left sequence
  • up - a gap is introduced in the top sequence


You have no rights to post comments


Latest Articles

09 February 2018
26 January 2018
09 January 2018


© 2018 BioinfoGuide. All Rights Reserved.