Longest Common Subsequence
A. First Edition
This is just a simple practice of algorithms of Longest Common Subsequence.
How to find the longest common subsequence? Here the common subsequence is defined to be common string such that
you can omit one or more character within the string. i.e. "string" and "satirainsg" have the common subsequence
of "string" even the second string has so many garbage within it. You just ignore them and continue to move
your cursor.
What I am doing here is try to test my idea that I can compare and get the longest common subsequence among
THREE strings by comparing the result common subsequence of two with the third string.
This is the standard algorithm from textbook. The only thing I misunderstand is that when you re-construct the
common string from the matrix, you need to add the character into your stack unless it is surely not from
"different" case. (Do you understand? See the algorithm and you know that it will write down the max of its
neighbour if the two character from two string are different. If same, you write down the upper-left number plus
one.)
กก
E.Further improvement
กก
F.File listing
1. LCS.h
2. LCS.cpp
3. main.cpp
file name: LCS.h
class LCS { private: int** matrix; char* rowStr; char* colStr; int row, col; int curRow, curCol; void initialize(char* str1, char* str2); void diffCase(int r, int c); void sameCase(int r, int c); void printMatrix(); void uninitialize(); public: char* stack; int compStr(const char* str1, const char* str2); void print(); LCS(); ~LCS(); };
file name: LCS.cpp
#include <stdlib.h> #include <stdio.h> #include <string.h> #include "LCS.H" LCS::~LCS() { uninitialize(); } void LCS::uninitialize() { if (matrix!=NULL) { for (int i=0; i<row; i++) { delete [] matrix[i]; } delete [] matrix; matrix=NULL; } if (stack!=NULL) { delete [] stack; } stack=NULL; } LCS::LCS() { matrix=NULL; rowStr=colStr=stack=NULL; curRow=curCol=0; } void LCS::initialize(char* str1, char* str2) { uninitialize(); row=strlen(str1); col=strlen(str2); rowStr=(char*)str1; colStr=(char*)str2; matrix=new int*[row]; for (int i=0; i<row; i++) { matrix[i]=new int[col]; for (int j=0; j<col; j++) { //because every row by default has maximum of this number matrix[i][j]=i+1; } } } int LCS::compStr(const char* str1, const char* str2) { initialize((char*)str1, (char*)str2); for (int r=0; r<row; r++) { for (int c=0; c<col; c++) { if (str1[r]!=str2[c]) { diffCase(r, c); } else { sameCase(r, c); } //if exceeds the limit, no need for further if (matrix[r][c]==r+1) { break; } } } return matrix[row-1][col-1]; } void LCS::printMatrix() { for (int r=0; r<col; r++) { printf("%2d ", r+1); } printf("\n"); for ( r=0; r<row; r++) { for (int c=0; c<col; c++) { printf("%2d ", matrix[r][c]); } printf("\n"); } } void LCS::print() { int r=row-1, c=col-1, len=matrix[row-1][col-1]; stack=new char[len+1]; stack[len]='\0'; printMatrix(); printf("str1=%s\nstr2=%s\n", rowStr, colStr); while (r!=0&&c!=0) { if (matrix[r][c]==matrix[r-1][c]) { r--; } else { if (matrix[r][c]==matrix[r][c-1]) { c--; } else { //it must be same for the last one stack[--len]=rowStr[r]; r--; c--; } } } if (len>0) { stack[--len]=rowStr[r]; } printf("the common subsequence is:%s\n", stack); } void LCS::sameCase(int r, int c) { if (r!=0&&c!=0) { matrix[r][c]=matrix[r-1][c-1]+1; } else { matrix[r][c]=1; } } void LCS::diffCase(int r, int c) { if (r!=0&&c!=0) { if (matrix[r][c-1]>matrix[r-1][c]) { matrix[r][c]=matrix[r][c-1]; } else { matrix[r][c]=matrix[r-1][c]; } } else { matrix[r][c]=0; } }
file name: main.cpp
#include "LCS.H" #include <stdio.h> #include <stdlib.h> #include <string.h> char* str1="010010100100111100"; char* str2="10001111010010101001001"; char* str3="01010010100101101"; int main() { char buffer[30]; LCS L; printf("The length is %d\n", L.compStr(str1, str2)); L.print(); strcpy(buffer, L.stack); printf("The length is %d\n", L.compStr(buffer, str3)); L.print(); printf("The length is %d\n", L.compStr(str1, str3)); L.print(); strcpy(buffer, L.stack); printf("The length is %d\n", L.compStr(buffer, str2)); L.print(); printf("The length is %d\n", L.compStr(str2, str3)); L.print(); strcpy(buffer, L.stack); printf("The length is %d\n", L.compStr(buffer, str1)); L.print(); return 0; }
How to run?
I try to get the longest common subsequence from THREE strings. First, I compare two strings and get the result
of LCS. Then I use the result to compare with the third because the common subsequence must be subsequence of
the longest common subsequence from previous two strings. Can you test it for me?
1. The following I am just comparing str1 and str2. Then I compare the result with str3.
2. Secondly I change the comparing sequence: I compare str1 with str3. Then compare the result with str2.
3. Thirdly I change the sequence again. str2 and str3 first and then result with str1.
4. The results of 1) and 2) are different. And result of 1) and 3) are same.
The length is 15 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 1 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 3 3 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 0 2 2 4 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 1 2 2 4 5 5 5 5 5 6 6 6 6 6 7 7 7 7 7 7 7 7 7 0 2 3 3 5 5 5 5 6 6 7 7 7 7 7 8 8 8 8 8 8 8 8 0 1 3 4 5 5 5 5 6 6 7 8 8 8 8 8 8 9 9 9 9 9 9 1 1 3 4 5 6 6 6 6 7 7 8 9 9 9 9 9 9 9 10 10 10 10 0 2 2 4 5 6 6 6 7 7 8 8 9 10 10 10 10 10 10 10 11 11 11 0 1 3 3 5 6 6 6 7 7 8 9 9 10 10 11 11 11 11 11 11 12 12 1 1 3 3 4 6 7 7 7 8 8 9 10 10 11 11 12 12 12 12 12 12 13 1 1 3 3 4 5 7 8 8 8 8 9 10 10 11 11 12 12 12 13 13 13 13 1 1 3 3 4 5 6 8 8 9 9 9 10 10 11 11 12 12 12 13 13 13 14 1 1 3 3 4 5 6 7 8 9 9 9 10 10 11 11 12 12 12 13 13 13 14 0 2 2 4 4 5 6 7 8 9 10 10 10 11 11 12 12 13 13 13 14 14 14 0 1 3 3 4 5 6 7 8 9 10 11 11 11 11 12 12 13 14 14 14 15 15 str1=010010100100111100 str2=10001111010010101001001 the common subsequence is:100101001001100 The length is 13 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 0 2 2 3 3 3 4 4 4 4 4 4 4 4 4 4 4 1 2 3 3 4 4 4 5 5 5 5 5 5 5 5 5 5 0 2 3 4 4 4 5 5 6 6 6 6 6 6 6 6 6 1 2 3 4 5 5 5 6 6 7 7 7 7 7 7 7 7 1 2 3 4 5 6 6 6 6 7 8 8 8 8 8 8 8 0 2 3 4 5 6 7 7 7 7 8 9 9 9 9 9 9 1 2 3 4 5 6 7 8 8 8 8 9 10 10 10 10 10 1 2 3 4 5 6 7 8 8 9 9 9 10 10 10 11 11 0 2 3 4 5 6 7 8 9 9 9 10 10 11 11 11 12 0 1 3 4 5 6 7 8 9 9 9 10 10 11 12 12 12 1 1 2 4 5 6 7 8 9 10 10 10 11 11 12 13 13 1 1 2 4 5 6 7 8 9 10 11 11 11 11 12 13 13 str1=100101001001100 str2=01010010100101101 the common subsequence is:1001010010110 The length is 14 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 2 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 0 2 3 4 4 4 5 5 5 5 5 5 5 5 5 5 5 1 2 3 4 5 5 5 6 6 6 6 6 6 6 6 6 6 0 2 3 4 5 5 6 6 7 7 7 7 7 7 7 7 7 1 2 3 4 5 6 6 7 7 8 8 8 8 8 8 8 8 1 2 3 4 5 6 6 7 7 8 9 9 9 9 9 9 9 0 2 3 4 5 6 7 7 8 8 9 10 10 10 10 10 10 1 2 3 4 5 6 7 8 8 9 9 10 11 11 11 11 11 1 2 3 4 5 6 7 8 8 9 10 10 11 11 11 12 12 0 2 3 4 5 6 7 8 9 9 10 11 11 12 12 12 13 0 1 3 4 5 6 7 8 9 9 10 11 11 12 13 13 13 0 1 3 4 5 6 7 8 9 9 10 11 11 12 13 13 14 0 1 3 4 5 6 7 8 9 9 10 11 11 12 13 13 14 1 1 2 4 5 6 7 8 9 10 10 11 12 12 13 14 14 1 1 2 4 5 6 7 8 9 10 11 11 12 12 13 14 14 str1=010010100100111100 str2=01010010100101101 the common subsequence is:01001010010111 The length is 13 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 1 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 3 3 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 0 2 2 4 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 1 2 2 4 5 5 5 5 5 6 6 6 6 6 7 7 7 7 7 7 7 7 7 0 2 3 3 5 5 5 5 6 6 7 7 7 7 7 8 8 8 8 8 8 8 8 0 1 3 4 5 5 5 5 6 6 7 8 8 8 8 8 8 9 9 9 9 9 9 1 1 3 4 5 6 6 6 6 7 7 8 9 9 9 9 9 9 9 10 10 10 10 0 2 2 4 5 6 6 6 7 7 8 8 9 10 10 10 10 10 10 10 11 11 11 1 2 2 4 5 6 7 7 7 8 8 8 9 10 11 11 11 11 11 11 11 11 12 1 2 2 4 5 6 7 8 8 8 8 8 9 10 11 11 12 12 12 12 12 12 12 1 2 2 4 5 6 7 8 8 9 9 9 9 10 11 11 12 12 12 13 13 13 13 str1=01001010010111 str2=10001111010010101001001 the common subsequence is:1001010010111 The length is 15 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2 2 3 4 4 4 4 4 4 4 4 4 4 4 4 0 2 2 3 3 4 5 5 5 5 5 5 5 5 5 5 5 0 1 2 3 3 4 5 5 6 6 6 6 6 6 6 6 6 0 1 2 3 3 4 5 5 6 6 6 7 7 7 7 7 7 0 1 2 3 3 4 5 5 6 6 6 7 7 8 8 8 8 1 1 2 3 4 4 5 6 6 7 7 7 8 8 8 9 9 0 2 2 3 4 4 5 6 7 7 7 8 8 9 9 9 10 1 2 3 3 4 5 5 6 7 8 8 8 9 9 9 10 10 1 2 3 3 4 5 5 6 7 8 9 9 9 9 9 10 10 0 2 3 4 4 5 6 6 7 8 9 10 10 10 10 10 11 1 2 3 4 5 5 6 7 7 8 9 10 11 11 11 11 11 0 2 3 4 5 5 6 7 8 8 9 10 11 12 12 12 12 1 2 3 4 5 6 6 7 8 9 9 10 11 12 12 13 13 0 2 3 4 5 6 7 7 8 9 9 10 11 12 13 13 14 1 2 3 4 5 6 7 8 8 9 10 10 11 12 13 14 14 1 2 3 4 5 6 7 8 8 9 10 10 11 12 13 14 14 0 2 3 4 5 6 7 8 9 9 10 11 11 12 13 14 15 1 2 3 4 5 6 7 8 9 10 10 11 12 12 13 14 15 1 2 3 4 5 6 7 8 9 10 11 11 12 12 13 14 15 0 2 3 4 5 6 7 8 9 10 11 12 12 13 13 14 15 str1=10001111010010101001001 str2=01010010100101101 the common subsequence is:100010100101101 The length is 13 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 0 2 2 3 4 4 5 5 5 5 5 5 5 5 5 5 5 5 1 2 3 3 4 5 5 6 6 6 6 6 6 6 6 6 6 6 0 2 3 3 4 5 6 6 6 7 7 7 7 7 7 7 7 7 1 2 3 4 4 5 6 7 7 7 8 8 8 8 8 8 8 8 1 2 3 4 4 5 6 7 8 8 8 9 9 9 9 9 9 9 0 2 3 4 5 5 6 7 8 9 9 9 10 10 10 10 10 10 1 2 3 4 5 6 6 7 8 9 10 10 10 10 10 10 11 11 0 2 3 4 5 6 7 7 8 9 10 10 11 11 11 11 11 11 0 1 3 4 5 6 7 7 8 9 10 10 11 12 12 12 12 12 1 1 2 4 5 6 7 8 8 9 10 11 11 12 12 12 13 13 0 2 2 4 5 6 7 8 8 9 10 11 12 12 13 13 13 13 str1=100010100101101 str2=010010100100111100 the common subsequence is:1001010010110 Press any key to continue