Skillvalue Solution: Find the Link

18 Dec 2022 Robert Eisele

Given a set of strings, find the longest common subtring. It is assumed that the input set of strings will never have more than one substring in common.

Input: file containing a vector of strings on each line.
Output: a string representing the common string. If the strings have no common substring, the output should be 0.

Example:

For:
orchestra
check
chelsea

The output is che (common sub-sequences are not considered valid outputs; that is, ch and he in the given example, are not considered valid outputs).

Solution

Given only two strings, a very fast dynamic programming approach is known, which solves the problem in \(O(\ell^2)\), where \(\ell\) is the length of the longest string. Generalizing the problem to a finite set of strings with cardinality \(m\) is a little harder since remembering every valid path would require a tensor and filling it would lead to a complexity of \(O(\ell^m)\). A faster and more intuitive way of solving this problem is using a suffix-tree as well as a bit vector.

When we insert every suffix of a given string into a tree, like "orchestra", "rchestra", "chestra", ... "a" and mark every node as touched by the kth word, using a bit vector, we can start at the root node and check for the longest path for which all bits are set. Creating the tree can be done in \(O(m\cdot \ell^2)\). We avoid the higher computational complexity by storing every combination in the tree and therefore going into space complexity. Since we expect \(\ell<32\) we can use an integer as the bit vector and don't need to use an infinity bit set implementation like BitSet.js.

function insert(T, str, start, iter) {

  for (var i = start; i < str.length; i++) {
    var c = str[i];

    if (T[c] === undefined) {
      T[c] = {next: {}, mask: 1 << iter};
    } else {
      T[c].mask|= 1 << iter;
    }
    T = T[c].next;
  }
}

var arr = ["orchestra", "check", "chelsea"];
var T = {};
for (var k = 0; k < arr.length; k++) {
  var str = arr[k];
  for (var i = 0; i < str.length; i++) {
    insert(T, str, i, k);
  }
}

Having the filled tree allows us to search for all valid substrings and extract the longest:

function findLCS(T, goal) {

  var ret = [];
  for (var k in T) {
    if (T[k].mask == goal) {
      ret.push(k + findLCS(T[k].next, goal));
    }
  }
  return ret;
}
var ret = findLCS(T, Math.pow(2, arr.length) - 1);
if (ret.length === 0) {
  console.log(0);
} else {
  console.log(ret.reduce((a, b) => a.length > b.length ? a : b))
}

Skillvalue Solution: Find the Link

Solution

Related Stuff

Project Euler Solution #32: Pandigital products

Codingame Solution: Genome Sequencing

Codingame Solution: The Resistance