BACK TO ALL POSTS

RCP #2: Longest Common Subsequence

September 5th, 2019 • 6 min read

1. Longest Common Subsequence

Given two strings s1 and s2, return the length of their longest common subsequence. If there is no common subsequence, return 0.

Example:

Input: s1 = "abac", s2 = "cab"
Output: 2

Before we start working on the problem, we first must understand the definition of a subsequence. A subsequence of a string is a new string generated from the original string with some characters(can be none) deleted without changing the relative order of the remaining characters. (eg, "ace" is a subsequence of "abcde" while "aec" is not). A common subsequence of two strings is a subsequence that is common to both strings.

From the definition, we must recognize that a common subsequence of the substrings of two given strings will also be a common subsequence of the two strings themselves.

Hence, given string s1 with length m and s2 with length n, by comparing their last characters, if they are equal, LCS (Longest Common Subsequence) of these two strings will be 1 + lcs(s1.substr(0, m - 1), s2.substr(0, n - 1)). Else, if the last 2 characters are different, LCS of s1 and s2 will be maximum of lcs(s1.substr(0, m - 1), s2) and lcs(s1, s2.substr(0, n - 1)).

By defining the optimal substructure of the problem, we can easily construct a Dynamic Programming solution.

// Time: O(m * n) where m is length of s1, n is length of s2
// Space: O(m * n)
function lcs(s1, s2) {
  const m = s1.length;
  const n = s2.length;

  if (m === 0 || n === 0) {
    return 0;
  }

  // cache[i][j] represents the length of longest common subsequence
  // for s1[0...i - 1] and s2[0...j -1]
  const cache = [...Array(m + 1)].map(() => [...Array(n + 1)]);

  for (let i = 0; i <= m; i++) {
    cache[i][0] = 0;
  }

  for (let i = 0; i <= n; i++) {
    cache[0][i] = 0;
  }

  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      if (s1[i - 1] === s2[j - 1]) {
        cache[i][j] = 1 + cache[i - 1][j - 1];
      } else {
        cache[i][j] = Math.max(
          cache[i - 1][j],
          cache[i][j - 1],
        );
      }
    }
  }

  return cache[m][n];
}

2. Longest Common Subsequence among 3 strings

Given three strings s1, s2 and s3, return the length of their longest common subsequence. If there is no common subsequence, return 0.

Example:

Input: s1 = "aabcde", s2 = "bcdeaa", s3 = "aa"
Output: 2

For this extension from the original problem, we can just apply the same logic from the original problem. The only difference now is that the cache array will have 3 dimensions instead.

// Time: O(m * n * o) where m is length of s1, n is length of s2, o is length of s3
// Space: O(m * n * o)
function lcs(s1, s2, s3) {
  const m = s1.length;
  const n = s2.length;
  const o = s3.length;

  if (m === 0 || n === 0 || o === 0) {
    return 0;
  }

  // cache[i][j][k] represents the length of longest common subsequence
  // for s1[0...i - 1], s2[0...j - 1] and s3[0...k - 1]
  const cache = [...Array(m + 1)]
  .map(
    () => [...Array(n + 1)]
      .map(() => [...Array(o + 1)]),
  );

  for (let i = 0; i <= m; i++) {
    for (let j = 0; j <= n; j++) {
      cache[i][j][0] = 0;
    }
  }

  for (let i = 0; i <= m; i++) {
    for (let j = 0; j <= o; j++) {
      cache[i][0][j] = 0;
    }
  }

  for (let i = 0; i <= n; i++) {
    for (let j = 0; j <= o; j++) {
      cache[0][i][j] = 0;
    }
  }

  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      for (let k = 1; k <= o; k++) {
        if (
          s1[i - 1] === s2[j - 1] &&
          s2[j - 1] === s3[k - 1]
        ) {
          cache[i][j][k] = 1 + cache[i - 1][j - 1][k - 1];
        } else {
          cache[i][j][j] = Math.max(
            cache[i - 1][j][k],
            cache[i][j - 1][k],
            cache[i][j][k - 1],
          );
        }
      }
    }
  }

  return cache[m][n][o];
}

3. Shortest Common Supersequence

Given two strings s1 and s2, return the shortest supersequence that has both s1 and s2 as subsequences. If multiple answers exist, you may return any of them.

Example:

Input: s1 = "abac", s2 = "cab"
Output: "cabac"

Again, we first must understand what a supersequence is. A supersequence of a string is a string generated from the original string with some characters(can be none) added without changing the relative order of the original characters. (eg, "abcde" is a supersequence of "ace" while "abecd" is not). A common supersequence of two strings is a supersequence that is common to both strings.

So how can we construct the Shortest Common Supersequence for two given strings? To make the common supersequence as short as possible, we need to reuse the duplicate characters in both strings as much as we can. However, we also need to make sure that the reused characters are in the original order they are put in the two strings. So what's the longest chain of duplicate characters in the two strings for us to use? That is, in fact, the LCS (Longest Common Subsequence) of the two strings! 🤩

Hence, the approach here is to get the LCS of the two given strings, then insert characters that are missing from the two strings in the LCS. The final result will guarantee to be the shortest common supersequence of the two strings. Since we only return the length of the LCS for the original problem, we will also need to extend the original solution to reconstruct the LCS from the cache 2D array.

// Time: O(m * n)
// Space: O(m * n)
function shortestCommonSupersequence(s1, s2) {
  const m = s1.length;
  const n = s2.length;

  if (m === 0) {
    return s2;
  }
  if (n === 0) {
    return s1;
  }

  // cache[i][j] represents the length of longest common subsequence
  // for s1[0...i - 1] and s2[0...j -1]
  const cache = [...Array(m + 1)].map(() => [...Array(n + 1)]);

  for (let i = 0; i <= m; i++) {
    cache[i][0] = 0;
  }

  for (let i = 0; i <= n; i++) {
    cache[0][i] = 0;
  }

  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      if (s1[i - 1] === s2[j - 1]) {
        cache[i][j] = 1 + cache[i - 1][j - 1];
      } else {
        cache[i][j] = Math.max(
          cache[i - 1][j],
          cache[i][j - 1],
        );
      }
    }
  }

  // Reconstruct the longest common subsequence
  const arr = [];
  let right1 = m;
  let right2 = n;
  while (cache[right1][right2] !== 0) {
    if (s1[right1 - 1] === s2[right2 - 1]) {
      arr.push(s1[right1 - 1]);
      right1--;
      right2--;
    } else if (
      cache[right1][right2] === cache[right1 - 1][right2]
    ) {
      right1--;
    } else {
      right2--;
    }
  }
  arr.reverse();

  // Fill in missing characters from both strings
  // into the longest common subsequence
  const res = [];
  let left1 = 0;
  let left2 = 0;
  for (let i = 0; i < arr.length; i++) {
    const char = arr[i];
    while (s1[left1] !== char) {
      res.push(s1[left1]);
      left1++;
    }
    while (s2[left2] !== char) {
      res.push(s2[left2]);
      left2++;
    }
    res.push(char);
    left1++;
    left2++;
  }
  while (left1 < m) {
    res.push(s1[left1]);
    left1++;
  }
  while (left2 < n) {
    res.push(s2[left2]);
    left2++;
  }

  return res.join("");
}

Final Thoughts

So from the original problem of just finding the length of the longest common subsequence, we have gone through a few interesting extensions of it. We also had a try at reconstructing the actual data from a Dynamic Programming cache by backtracking it. In the next RCP (Randing Coding Problem) post, I will share about another type of subsequence problem, which is also very "common". 😏

But until the next sharing comes, see you when I see you! ✌️