Codesignal Solution: TwoFive

18 Dec 2022 Robert Eisele

Base 32 is often chosen to represent binary data in a compressed form that can be easily read and handled by humans (for example, product keys or activation codes). It's more compact than Base 10 and easier to comprehend than Base 16 (hexadecimal) or Base 64.

To generate Base 32, the input data is processed as a bit stream. Output values for each group of five bits are produced until all input bits are processed. The last 5-bit group is extended with '0' bits if it is incomplete.

Finally, each 5-bit value is mapped into a dictionary of 32 alphanumeric characters to generate a human readable string.

The dictionary omits ambiguous characters such as '0', '1', 'O', and 'I', so only the following characters remain:

d = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ".

Your task is to generate Base 32 from the given input data a.

Example

For a = [1, 2], the output should be

TwoFive(a) = "3J22".
Represented as bits right-to-left, the input data is transformed as follows:
```
Index: 1 0
Value: 2 1
Bits: 00000010 00000001
```
For Base 32 we're interpreting the same bits as groups of five, padding the last group with 0 bits:
```
Index: 3 2 1 0
Value: 0 0 16 1
Bits: 00000 00000 10000 00001
```
We get the result sequence [1, 16, 0, 0], which we can now map into the dictionary:
```
1 => '3'
16 => 'J'
0 => '2'
0 => '2'
```
The resulting Base 32 string is "3J22".
For a = [170, 170], the output should be

TwoFive(a) = "CPC3".
Represented as bits right-to-left, the input data is transformed as follows
```
Index: 1 0
Value: 170 170
Bits: 10101010 10101010
```
For Base 32 we're interpreting the same bits as groups of five, again padding the last group with 0 bits:
```
Index: 3 2 1 0
Value: 1 10 21 10
Bits: 00001 01010 10101 01010
```
The result sequence is [10, 21, 10, 1]. Mapped into the dictionary:
```
10 => 'C'
21 => 'P'
10 => 'C'
 1 => '3'
```
The resulting Base 32 string is "CPC3".

Input/Output

[time limit] 4000ms (js)

[input] array.integer a
Array of 8-bit unsigned integers.
Constraints:

0 ≤ a.length < 16.
[output] string
Character-mapped Base 32 representation of the input data.

Solution

The first idea that came to my mind was creating a linear mapping from the indexes of the 8 bit digits to the 5 bit digits. That means that I was looking for mapping that was looking like this for a 2 byte value:

7654321076543210
43210432104321043210

I created slicing windows from \(a\) to \(b\). The first window is from 0 to 4, the next from 5 to 7 and 0 to 1 and so on. The following algorithm produces exactly this:

a = 0
b = a + 5 - 1
n = 0

for (var i=0; i < 8; i++) {

  if (a < b)
    console.log(n, a, b)
  else
    console.log(n++, 7, a, "##",n,b, 0)

  a = (b + 1) % 8
  b = (a + 5 - 1) % 8
}

With this index creating algorithm, it's easy to create the masks to compute the actual base 32 values and map the array finally to the given dictionary:

arr = [170, 170]
r = []

a = 0
b = a + 5 - 1
n = 0

for (; n < arr.length; ) {

  if (a < b) {
    r.push((arr[n] >> a) & 31)
    console.log(n, a, b, r)
    if (b == 7) n++
  } else {
    v = (arr[n] >> a) & 31
    if (n+1<arr.length)
      v|=(arr[n+1] << (7-a+1)) & 31
    r.push(v)
    console.log(n, 7, a, "##",n+1,b, 0, r)
    n++;
  }
  a = (b + 1) % 8
  b = (a + 5 - 1) % 8
}

r.map(x => "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"[x]).join``

When simplified, this algorithm looks as follows:

TwoFive = X => {
  r = ""
  for (a = n = 0; X[n]; ) {
    b = (a + 4) % 8
    v = X[n] >> a & 31
    b == 7 && n++
    b < a && X[++n] && (v|= X[n] << 8 - a & 31)
    a = (b + 1) % 8
    r+= "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"[v]
  }
  return r
}

As this algorithm is not readable anymore and not as small as I want it to be, I started all over. The only thing we need is a little buffer to overcome the downgrade from 8 to 5 bit. So why not writing 8 bit to a buffer, flush 5 bit, read in another 8 bit and flush all 5-digit chunks again. I wanted to give this idea a trial, which worked quite well:

TwoFive = a => {

  s = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"
  n = m = 0
  r = ""

  for (x of a) {
    m|= x << n
    n+= 8
    for (; n > 4; )
      r+= s[m & 31],
      m>>= 5,
      n-= 5
  }
  return n ? r + s[m & 31] : r
}

Codesignal Solution: TwoFive

Solution

Related Stuff

Codesignal Solution: Grouped Bits

Codingame Solution: Binary Permutations

Codingame Solution: Vortex