Given a set of strings, find the longest common subtring. It is assumed that the input set of strings will never have more than one substring in common.
Input: file containing a vector of strings on each line.
Output: a string representing the common string. If the strings have no common substring, the output should be 0.
Example:
For:
orchestra
check
chelsea
The output is che (common sub-sequences are not considered valid outputs; that is, ch and he in the given example, are not considered valid outputs).
Solution
Given only two strings, a very fast dynamic programming approach is known, which solves the problem in \(O(\ell^2)\), where \(\ell\) is the length of the longest string. Generalizing the problem to a finite set of strings with cardinality \(m\) is a little harder since remembering every valid path would require a tensor and filling it would lead to a complexity of \(O(\ell^m)\). A faster and more intuitive way of solving this problem is using a suffix-tree as well as a bit vector.
When we insert every suffix of a given string into a tree, like "orchestra", "rchestra", "chestra", ... "a" and mark every node as touched by the kth word, using a bit vector, we can start at the root node and check for the longest path for which all bits are set. Creating the tree can be done in \(O(m\cdot \ell^2)\). We avoid the higher computational complexity by storing every combination in the tree and therefore going into space complexity. Since we expect \(\ell<32\) we can use an integer as the bit vector and don't need to use an infinity bit set implementation like BitSet.js.
function insert(T, str, start, iter) {
for (var i = start; i < str.length; i++) {
var c = str[i];
if (T[c] === undefined) {
T[c] = {next: {}, mask: 1 << iter};
} else {
T[c].mask|= 1 << iter;
}
T = T[c].next;
}
}
var arr = ["orchestra", "check", "chelsea"];
var T = {};
for (var k = 0; k < arr.length; k++) {
var str = arr[k];
for (var i = 0; i < str.length; i++) {
insert(T, str, i, k);
}
}
Having the filled tree allows us to search for all valid substrings and extract the longest:
function findLCS(T, goal) {
var ret = [];
for (var k in T) {
if (T[k].mask == goal) {
ret.push(k + findLCS(T[k].next, goal));
}
}
return ret;
}
var ret = findLCS(T, Math.pow(2, arr.length) - 1);
if (ret.length === 0) {
console.log(0);
} else {
console.log(ret.reduce((a, b) => a.length > b.length ? a : b))
}