The Smith-Waterman algorithm identifies the similar regions in two genome sequences. In this assignment, you will implement an MPI version of the Smith-Waterman algorithm to measure the similarity between two strings.
The pseudocode of the Smith-Waterman algorithm is shown in Algorithm 1. Given string A = a1,a2,…,an and string B = b1,b2,…,bm, we first construct a scoring matrix H of size (n + 1) ∗ (m+1) and set the first row and column to zero. Then, we iteratively compute the scoring matrix using the following dynamic programming formula:
| Hi−1,j−1 + s(ai,bj) Hi−1,j − w
Hij = max Hi,j−1 − w 0 |
(1 ≤ i ≤ n,1 ≤ j ≤ m) |
where s is the substitution matrix is the match score, v is the
mismatch score, and w is the gap penalty. The value of u, v and w are constant and are give as part of the input to the algorithm. Finally, we return the highest score in the scoring matrix as the similarity score between A and B.
Input and Output
The input file will be in the following format:
- The first line contains two integers a_len and b_len (1 ≤ a_len ≤ 20000,1 ≤ b_len ≤ 20000), separated by a space. They represent the length of string a and b, respectively.
- The second line is a string a of length a_len, consisting of four letters A,T,G,C.
- The third line is a string b of length b_len, consisting of four letters A,T,G,C.
Assignment 1 COMP5112, 2020 Spring
Algorithm 1: The Smith-Waterman Algorithm
Input : string A = a1,a2,…,an and B = b1,b2,…,bm, match score u, mismatch score v, gap penalty w
Output: the similarity score between A and B
- int score[|A| + 1][|B| + 1];
- for i ← 0 to |A| do
3score[i][0] ← 0;
- end
- for i ← 0 to |B| do
6score[0][i] ← 0;
- end
- int max_score ← 0;
- for i ← 1 to |A| do
- for j ← 1 to |B| do
11score[i][j] ← max(0,
12score[i − 1][j] − w,
13score[i][j − 1] − w,
14score[i − 1][j − 1] + sub_mat(ai,bj));
15max_score ← max(max_score,score[i][j]);
- end
- end
The output of the program consists of two lines:
- The similarity score between a and b, i.e., the highest score in the scoring matrix.
- The elapsed time in seconds of the execution.
We assume that the match score is 3, the mismatch score is -3, and the gap penalty is 2.
Here is an example input/output for your reference:
Input:
9 8
GGTTGACTA TGTTACGG
Output:
13
Time: 0.00001 s



