This is an integer hash function written by Thomas Wang:
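For reference, a C transcription of the function as it is commonly circulated under the name `hash6432shift` is sketched below. Treat it as an assumption rather than a verified copy of the original listing: the split into a `wang_mix64` helper, the `check_truncation` helper, and the `stdint.h` fixed-width types are choices made here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The six mixing steps before truncation (helper name is hypothetical). */
uint64_t wang_mix64(uint64_t key)
{
    key = (~key) + (key << 18);
    key = key ^ (key >> 31);
    key = key * 21;
    key = key ^ (key >> 11);
    key = key + (key << 6);
    key = key ^ (key >> 22);
    return key;
}

/* The hash keeps only the low 32 bits of the mixed value. */
uint32_t hash6432shift(uint64_t key)
{
    return (uint32_t)wang_mix64(key);
}

/* Sanity check: the hash is exactly the low half of the mixed state. */
void check_truncation(uint64_t key)
{
    assert(hash6432shift(key) == (wang_mix64(key) & 0xffffffffu));
}
```

Keeping the untruncated 64-bit state in its own function makes it easier to talk about the 32 bits that the final cast throws away.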

It hashes a 64-bit input down to a 32-bit output. It mixes well: basic statistical analysis shows a reasonably good avalanche effect, though it is lacking in bit independence (certain pairs of output bits like to flip at the same time when the input changes in a specific way, in some cases more than 99% of the time). However, this hash function has a more glaring flaw: a lack of fan-out.

Fan-out is necessary for a hash algorithm; otherwise you could simply trace backwards and generate an input that produces a specific hash (this is called a preimage). To visualize fan-out, imagine a hash function $H$ that uses three smaller independent functions $A$, $B$, and $C$ to produce a hash like so:

\[H(i) = A(i) + B(i) + C(i)\]

If an attacker wanted to manipulate the output value $H(i)$, they might start by tweaking $i$ to output a specific value in $A(i)$. However, in doing so $B(i)$ and $C(i)$ now also produce different values. It is very difficult to control the outputs of all three of these functions at once (unless perhaps they are not so independent). The input value has “fanned out” into several functions, which come together at the end.

A common pattern is to use some invertible convolution $C$ and compute $C(i) + i$. This way, reversing the algorithm requires first taking a guess at what $i$ may have been, then calculating $C^{-1}(h - i)$ for the given output $h$. This value then must match the guessed $i$, but in all likelihood it will not. Algorithms like MD5, SHA1 and SHA2, and Salsa20/ChaCha20 use this pattern.

In this algorithm, the input value is only used once; it can therefore be traced backwards from the output without any issues. The only thing needed is to take a guess at what the 32 bits truncated at the last step may have been. Conveniently, each of the steps taken to mix the input is fully invertible, and so *every single output hash* coupled with *all possible truncated bits* can each be traced back to a *single* valid preimage.

In this article we will step through the process of writing an inversion of this hash function capable of generating preimages extremely quickly.

So let’s look at the line just before the truncation, `key = key ^ (key >> 22)`.

Looking at it, we can see that the uppermost 22 bits of `key` are unaffected by the operation. Consider a visualization of a simpler example, say `x ^ (x >> 7)` for some 16-bit wide string `x`:

```
  1011101011010010 - Some input value
^        101110101 - Shifted right by 7
------------------
  1011101110100111 - Output value
  *******          - Unaffected bits
```

Therefore, those upper 22 bits can be xor-ed against bits 22-44 to obtain another group of 22 bits (remember xor is its own inverse). And then those 44 bits can be xor-ed against the next bits, and so on to the end of any length bitstring with any right shift amount. In the simpler example:

```
  1011101110100111
^        1011101   - Upper bits shifted right
------------------
  10111010110100   - Now 14 bits recovered

  1011101110100111
^        101110101 - Known 14 bits shifted right
------------------
  1011101011010010 - Fully recovered input!
```

This also works when you try to analyze it algebraically. Let’s say instead of `key = key ^ (key >> a)`, we create a new variable `key2` and write `key2 = key ^ (key >> a)`, to make it easier to see what’s going on. Using $\oplus$ to denote xor:

\[k_2 = k_1 \oplus (k_1 \gg a)\]

In order to invert this, we must remove the $k_1 \gg a$ term by xor-ing it in again. Since $k_2$ contains a $k_1$ term, shifting $k_2$ itself supplies one:

\[k_3 = k_2 \oplus (k_2 \gg a)\]

This works because shifts (in either direction) distribute over xor:

\[(x \oplus y) \gg a = (x \gg a) \oplus (y \gg a)\]

Or visually with some 8-bit strings:

```
- Xor then shift    - Shift then xor
    10011010            100110(10)
  ^ 01111001          ^ 011110(01)
    --------            ------
    11100011            111000
 >> 2
    --------
    111000(11)
```

Now we have xor-ed another copy of $k_1 \gg a$ against the xor-sum. This can be seen by substituting $k_2$ for its value as defined in terms of $k_1$:

\[\begin{align*}
k_3 &= k_2 \oplus (k_2 \gg a)\\
&= k_1 \oplus (k_1 \gg a) \oplus ((k_1 \oplus (k_1 \gg a)) \gg a)\\
&= k_1 \oplus (k_1 \gg a) \oplus (k_1 \gg a) \oplus (k_1 \gg 2a)\\
&= k_1 \oplus (k_1 \gg 2a)
\end{align*}\]

However, now we have added another term to the sum, but with a doubled shift amount. This is good, though, as this process can be repeated until the shift amount exceeds the width of the bitstring, at which point the extra xor-ed term becomes zero. That is, for a 16-bit string, a shift of 16 or greater results in a value of zero regardless of the input.

We can try this in code and see that it works:
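As a minimal sketch in C, the repeated doubling could be written as a loop. The names `xor_shr`, `invert_xor_shr`, and `check_xor_shr` are illustrative choices, and a 64-bit width is assumed to match the hash:

```c
#include <assert.h>
#include <stdint.h>

/* Forward step: k ^ (k >> a). Requires 0 < a < 64. */
uint64_t xor_shr(uint64_t k, unsigned a)
{
    return k ^ (k >> a);
}

/* Inverse: xor in a shifted copy, doubling the shift each time,
 * until the leftover term has been shifted out entirely. */
uint64_t invert_xor_shr(uint64_t k, unsigned a)
{
    for (unsigned s = a; s < 64; s *= 2)
        k ^= k >> s;
    return k;
}

/* Round-trip check for a few shift amounts. */
void check_xor_shr(uint64_t k)
{
    assert(invert_xor_shr(xor_shr(k, 22), 22) == k);
    assert(invert_xor_shr(xor_shr(k, 11), 11) == k);
    assert(invert_xor_shr(xor_shr(k, 31), 31) == k);
}
```

For `a = 22` the loop body runs twice (shifts of 22 and 44), which is the same as unrolling it by hand into `k ^= k >> 22; k ^= k >> 44;`.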

This method can be applied to every line of the function that xors the key with a right shift of itself.

Although the function only uses xor with right shifts, this same method works just the same for left shifts. Consider reversing the order of the bits of a string, then xor-ing a right shift, and then restoring the original order. Since xor has no borrow or carry out, it is agnostic to shift direction.

The xor method does not extend to addition (or subtraction) of a right shift. Something like `k += (k >> 7)` is not invertible, because it’s not even bijective.

This makes sense, because in this context a right shift is equivalent to taking the floor of a division by a power of 2, which of course does not distribute over addition:

\[\begin{align*}
\left\lfloor \frac{3}{4} \right\rfloor + \left\lfloor \frac{2}{4} \right\rfloor &= 0 + 0 &= 0\\
\left\lfloor \frac{3 + 2}{4} \right\rfloor &= \left\lfloor \frac{5}{4} \right\rfloor &= 1
\end{align*}\]

More practically, addition can be viewed as just xor with carry out. So looking at a particular example of 8-bit `k + (k >> 4)`:

```
  11111111
+     1111
----------
  00001110
```

If we try to do the same trick of extracting the upper 4 bits, assuming they have not been touched by the operation, we find that it is possible for the carry out of the addition to flip bits there. Given only `00001110` as the output value, we could assume that the upper 4 bits are either `0000` or `1111`, since 1 could’ve been added from the carry out of the addition to the lower bits. We can then subtract this upper half from the lower half to recover it, and:

```
  00001110      00001110
-     1111    -     0000
----------    ----------
  11111111      00001110
```

We now get two different answers. Which is the correct one? When you compute `k + (k >> 4)` for each of those answers:

```
  11111111      00001110
+     1111    +     0000
----------    ----------
  00001110      00001110
```

They’re both correct. This is not always the case, but this function is no longer one-to-one and so this operation is not (or at least shouldn’t be) commonly included in algorithms as it will pigeonhole the value passing through it. In this specific operation, 15 outputs can be produced by two different inputs, and correspondingly 15 other outputs cannot be produced no matter what the input.
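The counts quoted above are small enough to confirm by brute force. The following sketch, with hypothetical helper names `count_collisions` and `check_counts`, tallies how often each 8-bit output of `k + (k >> 4)` is produced across all 256 inputs:

```c
#include <assert.h>
#include <stdint.h>

/* Counts the 8-bit outputs of k + (k >> 4) that are produced by
 * exactly two inputs, and those produced by no input at all. */
void count_collisions(int *twice, int *never)
{
    int hits[256] = {0};
    for (int k = 0; k < 256; k++) {
        uint8_t out = (uint8_t)(k + (k >> 4));   /* wraps modulo 256 */
        hits[out]++;
    }
    *twice = 0;
    *never = 0;
    for (int v = 0; v < 256; v++) {
        if (hits[v] == 2) (*twice)++;
        if (hits[v] == 0) (*never)++;
    }
}

/* The article's claim: 15 doubled outputs, 15 unreachable ones. */
void check_counts(void)
{
    int twice, never;
    count_collisions(&twice, &never);
    assert(twice == 15 && never == 15);
}
```

The doubled outputs are the values 0 through 14, which are reachable both from small inputs and from inputs whose upper nibble is `1111` and whose sum wraps around.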

However, *left* shifts do distribute over addition, being equivalent to multiplication by $2^a$ for shift amount $a$, modulo $2^N$. This applies to the next line from the initial hashing function, `key = key + (key << 6)`.

This can be modeled algebraically in the same way:

\[k_2 = k_1 + (k_1 \ll 6)\]

We now must subtract the $k_1 \ll 6$ term to remove it from the sum, so:

\[k_3 = k_2 - (k_2 \ll 6)\]

And substituting the value for $k_2$:

\[\begin{align*}
k_3 &= k_2 - (k_2 \ll 6)\\
&= k_1 + (k_1 \ll 6) - ((k_1 + (k_1 \ll 6)) \ll 6)\\
&= k_1 + (k_1 \ll 6) - (k_1 \ll 6) - (k_1 \ll 12)\\
&= k_1 - (k_1 \ll 12)
\end{align*}\]

As a side effect of the subtraction, the negative distributes and we now are left with a subtraction of a doubled shift amount at the end.

When this operation is repeated to remove it, we switch to adding again:

\[\begin{align*}
k_4 &= k_3 + (k_3 \ll 12)\\
&= k_1 - (k_1 \ll 12) + ((k_1 - (k_1 \ll 12)) \ll 12)\\
&= k_1 - (k_1 \ll 12) + (k_1 \ll 12) - (k_1 \ll 24)\\
&= k_1 - (k_1 \ll 24)
\end{align*}\]

The remaining term remains negative from here, having been propagated down from the initial subtraction. As earlier, we continue doing this until the magnitude of the shift exceeds the bitwidth of `k`, and thus the value being shifted becomes zero. Doing this in code:
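A minimal sketch of that subtract-once-then-add procedure in C, assuming a 64-bit `k`; the names `invert_add_shl` and `check_add_shl` are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Undo k + (k << a). Requires 0 < a < 64.
 * The first step subtracts; every later step adds, because the
 * leftover term stays negative while its shift amount doubles. */
uint64_t invert_add_shl(uint64_t k, unsigned a)
{
    k -= k << a;                              /* k1 - (k1 << 2a) */
    for (unsigned s = 2 * a; s < 64; s *= 2)
        k += k << s;                          /* k1 - (k1 << 2s) */
    return k;
}

/* Round-trip check against the forward step with a = 6. */
void check_add_shl(uint64_t k)
{
    assert(invert_add_shl(k + (k << 6), 6) == k);
}
```

With `a = 6` the loop uses shifts of 12, 24, and 48; after the last one the leftover term would be shifted by 96 bits and has vanished.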

This also works the same for subtraction of a left shift, where the subtraction itself propagates through the iterations:
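For the subtraction case, a sketch under the same assumptions (64-bit `k`, hypothetical names `invert_sub_shl` and `check_sub_shl`) needs only additions, since the leftover term is already negative after the forward step:

```c
#include <assert.h>
#include <stdint.h>

/* Undo k - (k << a). Requires 0 < a < 64.
 * Each addition turns k1 - (k1 << s) into k1 - (k1 << 2s). */
uint64_t invert_sub_shl(uint64_t k, unsigned a)
{
    for (unsigned s = a; s < 64; s *= 2)
        k += k << s;
    return k;
}

/* Round-trip check against the forward step with a = 3. */
void check_sub_shl(uint64_t k)
{
    assert(invert_sub_shl(k - (k << 3), 3) == k);
}
```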

However, since a left shift is multiplication modulo $2^N$, `k + (k << 6)` is simply `k * 65`, and we can instead invert it by multiplying by the multiplicative inverse of 65 modulo $2^N$:

And of course this also works the same for subtraction of a left shift, where `k - (k << 3)` is equal to `k * -7`:
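As a sketch of the multiplicative approach, the inverse can also be computed in code rather than looked up. The helper below uses Newton's iteration, which is a substitute technique chosen here for brevity and not the extended Euclidean method; `mul_inverse64` and `check_mul_inverse` are hypothetical names and a 64-bit width is assumed:

```c
#include <assert.h>
#include <stdint.h>

/* Inverse of an odd m modulo 2^64 by Newton's iteration. */
uint64_t mul_inverse64(uint64_t m)
{
    uint64_t x = m;           /* m * m == 1 (mod 8): 3 bits correct */
    for (int i = 0; i < 5; i++)
        x *= 2 - m * x;       /* each pass doubles the correct bits */
    return x;
}

/* Both shift-and-add forms are undone by a single multiplication. */
void check_mul_inverse(uint64_t k)
{
    uint64_t inv65 = mul_inverse64(65);              /* k + (k << 6) */
    uint64_t invm7 = mul_inverse64((uint64_t)-7);    /* k - (k << 3) */
    assert((k + (k << 6)) * inv65 == k);
    assert((k - (k << 3)) * invm7 == k);
}
```

Five passes take the 3 initially correct bits to 6, 12, 24, 48, and then 96, which covers all 64. In practice the result would be computed once and stored as a constant.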

At this point I have to say thanks to this extended Euclidean algorithm calculator, which happily deals with numbers exceeding $2^{64}$ and which I used for every multiplicative inverse here.

Note that to have a multiplicative inverse modulo $2^N$, the multiplier must be coprime with $2^N$, that is, odd. For `k += k << a` and `k -= k << a` this is always the case, as the unshifted `k` term contributes 1 to a multiplier that is otherwise $\pm 2^a$. Multiplying by an even number, written with shifts, would have no unshifted term, for example `k = (k << 1) + (k << 2)` for a multiplication by 6.

This is obviously uninvertible because the shifts remove bits with no trace of the original `k` value. For instance an 8-bit `k` value of `10001100` when replaced with a left shift of itself deletes the high bit, and has the same result as shifting the value `00001100`.

Looking back at the original hash function, it’s clear how to proceed with inverting most of it:

Our code so far might look like this:

The last thing to do is invert the addition of a left shift to the complement. This is a bit tricky, because now we have two transformations of the data instead of one. Intuitively, complementing does not distribute over addition like it does with xor, because it changes where the carry-outs occur.

However, we can still do the trick of looking at the least significant 18 bits, which are unaffected by the entire operation. If we complement these bits, then subtract them from the next 18 bits, and then complement the result, we can extract another 18 bits, and then repeat this process until the entire input is recovered.

Looking at a simpler example of 8-bit `k = ~k + (k << 3)`:

```
- Forward:
    11010100 - Input value
    00101011 - Complement of input
  + 10100    - Input shifted left
    --------
    11001011 - Output value
         *** - Unaffected bits
- Reverse:
  - First iteration:
    11001011 - Output value
  -   100    - Subtract complement of lowest 3 bits, shifted left
    --------
  ~ 10101011 - Undo complement of input
    01010100
      ****** - Correct input bits
  - Second iteration:
    11001011 - Output value
  - 10100    - Subtract known 6 bits, shifted left
    --------
  ~ 00101011 - Undo complement of input
    11010100 - Recovered input value!
```

This must be done in pieces while slowly assembling the original input, because otherwise the partial subtraction could cause borrowing in upper bits that could taint them for later iterations.

The code to do this might look like this:
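A minimal sketch of this bit-extracting loop, assuming a 32-bit value and a shift of 9. The variable name `ktmp` is kept; the function names and the `guess` parameter are illustrative additions:

```c
#include <assert.h>
#include <stdint.h>

/* Forward step on 32 bits: ~k + (k << 9). */
uint32_t not_add_shl9(uint32_t k)
{
    return ~k + (k << 9);
}

/* Inverse: refine a guess at the input. If the low n bits of ktmp
 * are right, the low n + 9 bits of ~(k - (ktmp << 9)) are right. */
uint32_t invert_not_add_shl9(uint32_t k, uint32_t guess)
{
    uint32_t ktmp = guess;
    for (int i = 0; i < 4; i++)        /* 4 * 9 = 36 >= 32 bits */
        ktmp = ~(k - (ktmp << 9));
    return ktmp;
}

/* The starting guess does not affect the final result. */
void check_extract(uint32_t x)
{
    uint32_t k = not_add_shl9(x);
    assert(invert_not_add_shl9(k, 0) == x);
    assert(invert_not_add_shl9(k, 0xabcdef97u) == x);
}
```

The same loop shape would carry over to a 64-bit value with a shift of 18, where four passes also suffice because 4 × 18 = 72 covers all 64 bits.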

This also works just the same for subtraction of the left shift, by changing the subtraction to an addition in each iteration.

Printing the value of `ktmp` after each iteration shows how it is extracting bits in groups of 9:

| `ktmp` (hex) | `ktmp` (binary) |
|---|---|
| `00000000` | `00000000000000000000000000000000` |
| `832fe0ef` | `10000011001011111110000011101111` |
| `e2f1beef` | `11100010111100011011111011101111` |
| `66adbeef` | `01100110101011011011111011101111` |
| `deadbeef` | `11011110101011011011111011101111` |

The bits higher than the ones extracted so far are essentially garbage from the previous iteration being incorrect, and can be ignored. These bits could be masked out at each iteration, but it’s equally correct to just ignore them.

Another interesting thing about this solution is that the initialization to 0 is not necessary. What is happening here is that 0 correct bits are turned into 9, which are then turned into 18, and then into 27, and finally into all 32. Because of the left shift, we know that the rightmost 9 bits of the shifted term are 0, and so each iteration “corrects” 9 more bits of `ktmp`, which is a “guess” as to the input. Picking any other value for the initialization changes only the garbage data:

| `ktmp` (hex) | `ktmp` (binary) |
|---|---|
| `abcdef97` | `10101011110011011110111110010111` |
| `1f0f0eef` | `00011111000011110000111011101111` |
| `a14dbeef` | `10100001010011011011111011101111` |
| `1eadbeef` | `00011110101011011011111011101111` |
| `deadbeef` | `11011110101011011011111011101111` |

Should some of the bits of the “guess” in `ktmp` already be correct, even fewer iterations are needed:

| `ktmp` (hex) | `ktmp` (binary) |
|---|---|
| `0000beef` | `00000000000000001011111011101111` |
| `84adbeef` | `10000100101011011011111011101111` |
| `deadbeef` | `11011110101011011011111011101111` |
| `deadbeef` | `11011110101011011011111011101111` |
| `deadbeef` | `11011110101011011011111011101111` |

Despite how interesting this solution is, like before with multiplicative inverses, it’s simpler to just solve this mathematically. Note that:

\[\overline{x} = -x - 1\]

Where here $\overline{x}$ denotes C `~x` and for unsigned values $-x$ means multiplying $x$ by $2^N - 1$ for bit width $N$. This is true for unsigned integers as well as two’s complement signed ones, where most people would be familiar with this equation.

Substituted into the full line:

\[\begin{align*}
key &= \overline{key} + (key \ll 18)\\
&= -key - 1 + (key * 2^{18})\\
&= key * \left(2^{18} - 1\right) - 1
\end{align*}\]

So while complementing as a black box does not distribute over addition, all that needs to be done to resolve this is to add one. The code then only has to add one and multiply by the multiplicative inverse of $2^{18} - 1$.

Likewise with subtraction, where the multiplier becomes $-(2^a + 1)$:
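Both closed forms can be sketched in C as follows, assuming a 64-bit `k`. All function names are hypothetical, and the inverse is computed with Newton's iteration purely to keep the example free of hard-coded constants:

```c
#include <assert.h>
#include <stdint.h>

/* Inverse of an odd m modulo 2^64 (Newton's iteration). */
static uint64_t mul_inverse64(uint64_t m)
{
    uint64_t x = m;                   /* correct to 3 bits */
    for (int i = 0; i < 5; i++)
        x *= 2 - m * x;               /* doubles the correct bits */
    return x;
}

/* Undo k = ~k + (k << a), i.e. k * (2^a - 1) - 1. 0 < a < 64. */
uint64_t invert_not_add_shl(uint64_t k, unsigned a)
{
    uint64_t m = (UINT64_C(1) << a) - 1;
    return (k + 1) * mul_inverse64(m);
}

/* Undo k = ~k - (k << a), i.e. k * -(2^a + 1) - 1. 0 < a < 64. */
uint64_t invert_not_sub_shl(uint64_t k, unsigned a)
{
    uint64_t m = 0 - ((UINT64_C(1) << a) + 1);
    return (k + 1) * mul_inverse64(m);
}

/* Round-trip check with the hash's shift amount of 18. */
void check_not_shl(uint64_t k)
{
    assert(invert_not_add_shl(~k + (k << 18), 18) == k);
    assert(invert_not_sub_shl(~k - (k << 18), 18) == k);
}
```

Both multipliers are odd for any shift amount of at least 1, so the inverse always exists.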

The clever bit-extracting solution is not necessary for xor-ing shifts against the complement, either. Complements do distribute with xor, so all that is needed is to re-flip the complemented bits, and then proceed as normal with shift inversion, like so:
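A short sketch of that idea in C, assuming a 64-bit `k` and right shifts as in the hash; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Undo k = ~k ^ (k >> a). Requires 0 < a < 64.
 * Flipping every bit back leaves a plain xor-shift, which is then
 * inverted by doubling the shift amount. */
uint64_t invert_not_xor_shr(uint64_t k, unsigned a)
{
    k = ~k;                               /* now k1 ^ (k1 >> a) */
    for (unsigned s = a; s < 64; s *= 2)
        k ^= k >> s;
    return k;
}

/* Round-trip check against the forward step. */
void check_not_xor_shr(uint64_t k)
{
    assert(invert_not_xor_shr(~k ^ (k >> 18), 18) == k);
    assert(invert_not_xor_shr(~k ^ (k >> 7), 7) == k);
}
```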

And so the final inversion of the hash function is:
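Putting the pieces together, a full inversion might be sketched as below. This assumes the forward function is the commonly circulated `hash6432shift` (repeated here as `wang_mix64`, a hypothetical name) and computes the three multiplicative inverses with Newton's iteration instead of hard-coding them; `wang_unmix64`, `preimage`, and `check_inverse` are likewise illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Forward mix before truncation (assumed form of the hash). */
static uint64_t wang_mix64(uint64_t key)
{
    key = (~key) + (key << 18);
    key = key ^ (key >> 31);
    key = key * 21;
    key = key ^ (key >> 11);
    key = key + (key << 6);
    key = key ^ (key >> 22);
    return key;
}

/* Inverse of an odd m modulo 2^64 (Newton's iteration). */
static uint64_t mul_inverse64(uint64_t m)
{
    uint64_t x = m;
    for (int i = 0; i < 5; i++)
        x *= 2 - m * x;
    return x;
}

/* Undo the six steps in reverse order. */
uint64_t wang_unmix64(uint64_t key)
{
    key ^= key >> 22; key ^= key >> 44;          /* ^ (key >> 22) */
    key *= mul_inverse64(65);                    /* + (key << 6)  */
    key ^= key >> 11; key ^= key >> 22; key ^= key >> 44;
    key *= mul_inverse64(21);                    /* * 21          */
    key ^= key >> 31; key ^= key >> 62;          /* ^ (key >> 31) */
    key = (key + 1) * mul_inverse64((UINT64_C(1) << 18) - 1);
    return key;
}

/* One preimage per guess at the 32 truncated bits. */
uint64_t preimage(uint32_t hash, uint32_t truncated)
{
    return wang_unmix64(((uint64_t)truncated << 32) | hash);
}

/* Every guess at the truncated bits must hash back to the target. */
void check_inverse(uint64_t key)
{
    assert(wang_unmix64(wang_mix64(key)) == key);
    for (uint32_t hi = 0; hi < 4; hi++)
        assert((uint32_t)wang_mix64(preimage(0xdeadbeefu, hi)) == 0xdeadbeefu);
}
```

Looping `truncated` over all $2^{32}$ values enumerates every preimage of a given hash. For that kind of bulk run the three inverses would be computed once up front rather than on every call.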

This code can churn through every single possible preimage for a given hash in about 19 minutes. If you want to play with the C code, here is a download of the file.

The original hash function could be improved drastically by applying the $C(i) + i$ structure mentioned at the start, like so:
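One possible reading of this modification, sketched in C, feeds the original input forward into the mixed value before truncating. The exact placement of the addition is an assumption, as are the names `wang_mix64`, `hash6432shift_ff`, and `check_feed_forward`:

```c
#include <assert.h>
#include <stdint.h>

/* Forward mix before truncation (assumed form of the hash). */
static uint64_t wang_mix64(uint64_t key)
{
    key = (~key) + (key << 18);
    key = key ^ (key >> 31);
    key = key * 21;
    key = key ^ (key >> 11);
    key = key + (key << 6);
    key = key ^ (key >> 22);
    return key;
}

/* C(i) + i: the input is used a second time after the mixing,
 * so the steps can no longer be unwound without knowing the input. */
uint32_t hash6432shift_ff(uint64_t key)
{
    return (uint32_t)(wang_mix64(key) + key);
}

/* The output is the low half of mix(key) + key. */
void check_feed_forward(uint64_t key)
{
    assert(hash6432shift_ff(key) == ((wang_mix64(key) + key) & 0xffffffffu));
}
```

Running the inverse mixing steps on an output of this variant would first require subtracting `key`, which is the very value being searched for.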

With just this modification, I can no longer see an easy way (faster than $2^{32}$ work) to generate preimages, and so this is “left as an exercise to the reader”.