Monday, April 4, 2016

Bioinformatics: Linear Gap Function vs Affine Gap Function

The main difference between the two is:
  • A gap function is called linear if it is in the form of:
    $g(k) = \beta * k$, for some parameter $\beta$.
    $k$ refers to the length of the gap.
  • A gap function is called affine if it is in the form of:
    $g(k) = \alpha + \beta * k$, for some parameters $\alpha$ and $\beta$.
    $\alpha$ is called the gap open cost and $\beta$ is called the gap extend cost.
    $k$ refers to the length of the gap.
An affine gap function prefers one long gap over several short gaps with the same total length, because every separate gap pays the gap open cost $\alpha$ again. A linear gap function has no gap open cost, so its penalty depends only on the total number of gap positions and it cannot distinguish one long gap from several short ones. Affine gap functions are more common and give more realistic results than linear gap functions.
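To make the comparison concrete, here is a small Python sketch of the two gap functions. The parameter values (`ALPHA = 3`, `BETA = 1`) are assumptions chosen only for illustration; they are not taken from any particular tool.

```python
def linear_gap(k, beta):
    """Cost of a single gap of length k under a linear gap function."""
    return beta * k


def affine_gap(k, alpha, beta):
    """Cost of a single gap of length k under an affine gap function."""
    return alpha + beta * k


# Illustrative (assumed) parameters.
ALPHA, BETA = 3, 1

# Four gap positions in total, arranged as one gap of length 4
# or as four separate gaps of length 1.
one_long_linear = linear_gap(4, BETA)
four_short_linear = 4 * linear_gap(1, BETA)

one_long_affine = affine_gap(4, ALPHA, BETA)
four_short_affine = 4 * affine_gap(1, ALPHA, BETA)

print(one_long_linear, four_short_linear)  # same cost either way
print(one_long_affine, four_short_affine)  # one long gap is cheaper
```

With the linear function both arrangements cost the same, while with the affine function the four short gaps pay the gap open cost four times.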

The Needleman-Wunsch algorithm is used to do global alignment with a linear gap function.
The Gotoh algorithm is used to do global alignment with an affine gap function.
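As a rough sketch of how the two algorithms differ, the Python functions below compute only the optimal global alignment score (no traceback). The function names, default scores, and the three-matrix layout used for Gotoh are my own choices for illustration; gap parameters are treated as positive costs that are subtracted from the score.

```python
NEG_INF = float("-inf")


def needleman_wunsch_score(a, b, match=2, mismatch=-1, beta=1):
    """Optimal global alignment score with a linear gap cost beta * k."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -beta * i
    for j in range(1, m + 1):
        score[0][j] = -beta * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                score[i - 1][j] - beta,   # gap in b
                score[i][j - 1] - beta,   # gap in a
            )
    return score[n][m]


def gotoh_score(a, b, match=2, mismatch=-1, alpha=3, beta=1):
    """Optimal global alignment score with an affine gap cost alpha + beta * k."""
    n, m = len(a), len(b)
    # M: last column is a match/mismatch
    # X: last column is a gap in b (a character of a over '-')
    # Y: last column is a gap in a ('-' over a character of b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(alpha + beta * i)
    for j in range(1, m + 1):
        Y[0][j] = -(alpha + beta * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a gap costs alpha + beta; extending one costs only beta.
            X[i][j] = max(M[i - 1][j] - alpha - beta,
                          X[i - 1][j] - beta,
                          Y[i - 1][j] - alpha - beta)
            Y[i][j] = max(M[i][j - 1] - alpha - beta,
                          Y[i][j - 1] - beta,
                          X[i][j - 1] - alpha - beta)
    return max(M[n][m], X[n][m], Y[n][m])


nw = needleman_wunsch_score("ATCG", "ATGCCG")
go = gotoh_score("ATCG", "ATGCCG")
print(nw, go)
```

The key design difference is that Gotoh keeps separate matrices for "currently inside a gap" states, so it can charge the open cost once and the extend cost for each further position.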

For example: we have two sequences, ATCG and ATGCCG, with match score = 2 and mismatch score = -1. If we align the two sequences using a linear gap function (with the Needleman-Wunsch algorithm), we get two possible optimal results:
  1. AT--CG
     ATGCCG
  2. AT-C-G
     ATGCCG
However, if you align the two sequences with an affine gap function and the same match/mismatch scores (using the Gotoh algorithm), you will get only the first result. The two separate gaps in the second result incur a cost of $2 * (\alpha + \beta * 1) = 2\alpha + 2\beta$, while the single gap of length 2 in the first result incurs a cost of $\alpha + \beta * 2 = \alpha + 2\beta$. For any $\alpha > 0$, $2\alpha + 2\beta > \alpha + 2\beta$, so the second result pays the larger gap cost. Since the number of matches is the same (4 matches), the alignment score of the first result is strictly higher than that of the second.
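The arithmetic for this example can be checked with a short Python sketch that scores an already-gapped alignment. The helper `alignment_score` and the values $\alpha = 3$, $\beta = 1$ are assumptions for illustration; each maximal run of gap characters is treated as one gap of length $k$.

```python
from itertools import groupby


def alignment_score(top, bottom, gap_cost, match=2, mismatch=-1):
    """Score a gapped alignment; gap_cost(k) is the cost of one gap of length k."""
    assert len(top) == len(bottom)

    def label(x, y):
        if x == "-":
            return "gap_top"
        if y == "-":
            return "gap_bottom"
        return "match" if x == y else "mismatch"

    labels = [label(x, y) for x, y in zip(top, bottom)]
    score = 0
    # Consecutive columns with the same label form one run (e.g. one gap).
    for kind, run in groupby(labels):
        k = len(list(run))
        if kind.startswith("gap"):
            score -= gap_cost(k)
        elif kind == "match":
            score += match * k
        else:
            score += mismatch * k
    return score


ALPHA, BETA = 3, 1  # assumed gap open and gap extend costs


def linear(k):
    return BETA * k


def affine(k):
    return ALPHA + BETA * k


first = ("AT--CG", "ATGCCG")
second = ("AT-C-G", "ATGCCG")

linear_scores = [alignment_score(t, b, linear) for t, b in (first, second)]
affine_scores = [alignment_score(t, b, affine) for t, b in (first, second)]
print(linear_scores)  # both alignments tie under the linear gap function
print(affine_scores)  # the first alignment wins under the affine gap function
```

Under these assumed parameters the two alignments tie with the linear gap function, and the first one scores higher with the affine gap function, which matches the reasoning above.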
