The Fast Inverse Square Root algorithm is one of the most famous numerical methods in computer science, gaining legendary status for its crucial role in real-time 3D graphics. It was popularized by its use in the 1999 video game *Quake III Arena*, where it was employed to perform rapid vector normalization.

## What is the Fast Inverse Square Root Algorithm?

The Fast Inverse Square Root algorithm provides an efficient method to compute \( \frac{1}{\sqrt{x}} \), a calculation that is typically resource-intensive. By combining bitwise manipulations with a refinement step using Newton's method, the algorithm achieves a fast and remarkably accurate approximation.

### The Importance of Inverse Square Root in 3D Graphics

In 3D graphics, vector normalization is essential. It involves scaling a vector so that its magnitude is 1, which is critical for lighting calculations, reflections, and geometric operations. Normalization is achieved by dividing each component of a vector by its length:

\[ \hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|} = \mathbf{v} \times \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}} \]

where the Fast Inverse Square Root algorithm is used to quickly compute \( \frac{1}{\sqrt{x}} \) with \(x=v_x^2 + v_y^2 + v_z^2\).

## Newton's Method and Bitwise Operations

The algorithm's efficiency stems from two key components: an initial approximation using bitwise operations and a refinement using Newton's method.

### Newton's Method

To find \( y=\frac{1}{\sqrt{x}} \), note that this is equivalent to \( \frac{1}{y^2} - x = 0 \), so we reformulate the problem as finding the root of the function:

\[ f(y) = \frac{1}{y^2} - x \]

Using Newton's method and the derivative \(f'(y) = -\frac{2}{y^3}\), the iterative formula for improving our guess \( y_n \) is:

\[ y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)} = y_n \left(\frac{3}{2} - \frac{x y_n^2}{2}\right) \]

This formula quickly converges to the true value of \( \frac{1}{\sqrt{x}} \), especially when the initial guess \( y_0 \) is close to the actual solution.
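The iteration can be sketched directly (the fixed starting guess here is arbitrary; the bit trick derived below provides a far better one):

```c
#include <math.h>

/* One Newton step for f(y) = 1/y^2 - x:  y <- y * (3/2 - x*y^2/2). */
static float newton_step(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

/* Iterate a fixed number of times from an initial guess y0. */
static float inv_sqrt_newton(float x, float y0, int steps) {
    float y = y0;
    for (int k = 0; k < steps; k++)
        y = newton_step(x, y);
    return y;
}
```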

### Initial Approximation and Floating-Point Representation

The algorithm starts with an initial guess \( y_0 \), derived from the floating-point representation of \( x \) based on the IEEE 754 standard.

#### Floating-Point Representation

To encode a non-zero real number \(x\) as a single-precision float, the first step is to write \(x\) as a normalized binary number:

\[x = \pm 1.b_1 b_2 b_3... \cdot 2^{e}\]

with \(b_1 b_2 b_3...\) being the binary digits of the significand and \(e\) being the integer exponent. Since the leading bit is always 1, it is not stored, and therefore a single-precision floating-point number \( x \) can be expressed as:

\[ x = \pm 2^e \cdot (1 + m) \]

Here, \( m \) is the mantissa (with \( m \in [0, 1) \)).

Now the floating-point number \(x\) is stored as a 32-bit integer, comprising:

- A sign bit \( S \) (0 for positive, 1 for negative).
- An 8-bit biased exponent \( E = e + B \) (with bias \( B = 127 \), acting as an offset so that negative exponents can be stored).
- A 23-bit mantissa field \( M = 2^{23} \cdot m \), the fractional part scaled to fit within the remaining bits.

| Bits  | 31           | 30–23              | 22–0               |
|-------|--------------|--------------------|--------------------|
| Field | \(S\) (sign) | \(E\) (exponent)   | \(M\) (mantissa)   |
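This layout is easy to inspect in code (a sketch using a union for the type pun and the fixed-width `uint32_t`, assuming 32-bit IEEE 754 floats):

```c
#include <stdint.h>

/* Split an IEEE 754 single into sign, biased exponent and mantissa field. */
static void float_fields(float x, uint32_t *S, uint32_t *E, uint32_t *M) {
    union { float f; uint32_t i; } u = { x };
    *S = u.i >> 31;           /* bit 31 */
    *E = (u.i >> 23) & 0xff;  /* bits 30..23 */
    *M = u.i & 0x7fffff;      /* bits 22..0 */
}
```

For example, \(1.0 = +2^0 \cdot (1+0)\) stores \(S=0\), \(E=0+127=127\), \(M=0\).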

### The Magic Number and Algorithm Derivation

The goal is still to calculate \(y=\frac{1}{\sqrt x}\). Taking the logarithm of both sides gives

\[\log_b(y)=\log_b\left(\frac{1}{\sqrt x}\right)=\log_b\left(x^{-\frac{1}{2}}\right)=-\frac{1}{2}\log_b(x)\]

Applying this with \(b=2\) to the floating-point representation \(x = 2^e \cdot (1 + m)\), restricted to positive numbers (the square root of a negative number is not real), we have:

\[\log_2(x)=e+\log_2(1+m)\]

Since \( m \in [0,1) \), we can approximate \( \log_2(1+m) \) as \( m + \mu \), where \( \mu \) is a free parameter. For instance, setting \( \mu = 0 \) gives exact results at the endpoints of the interval, while \( \mu = \frac{1}{2} - \frac{1 + \ln(\ln(2))}{2 \ln(2)} \approx 0.0430357 \) offers the optimal approximation, minimizing the error in the uniform norm sense. Therefore

\[\log_2(x) \approx e + m + \mu\]

This is what makes the trick work: as shown next, aliasing a float32 to an integer provides a rough, shifted and scaled approximation of its base-two logarithm.

Letting \(I\) be the integer whose bits are those of the floating-point number, we get

\[\begin{array}{rl} I &= 2^{23}\cdot E + M\\ &= 2^{23}\cdot(e+B+m)\\ &= 2^{23}\cdot(e+m+\mu+B-\mu)\\ &\approx 2^{23}\cdot\log_2(x) + 2^{23}\cdot(B-\mu) \end{array}\]

It follows that we can approximate \(\log_2(x)\) from the bit representation of the number, shifted and scaled by constants:

\[ \log_2(x) \approx \frac{I}{2^{23}} - (B - \mu) \]
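This approximation is easy to check numerically (a sketch with \(\mu = 0\) for simplicity; the union pun assumes 32-bit floats and integers):

```c
#include <stdint.h>

/* Approximate log2(x) from the raw bits: I / 2^23 - B, with mu = 0. */
static float log2_approx(float x) {
    union { float f; uint32_t i; } u = { x };
    return u.i / 8388608.0f - 127.0f; /* 2^23 = 8388608, B = 127 */
}
```

The result is exact at powers of two (where \(m=0\)) and off by at most about 0.086 in between.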

Using the approximation of \(\log_2(x)\) for the inverse square root yields

\[\begin{array}{rrl} &y &= \frac{1}{\sqrt x}\\ \Leftrightarrow& \log_2(y) &= -\frac{1}{2}\log_2(x)\\ \Rightarrow& \frac{I_y}{2^{23}} - (B-\mu) &\approx -\frac{1}{2}\left(\frac{I_x}{2^{23}} - (B-\mu)\right)\\ \Leftrightarrow& I_y &\approx \underbrace{\frac{3}{2}\cdot 2^{23}\cdot(B-\mu)}_{=\text{0x5f3759df}} - \underbrace{\frac{1}{2}}_{i\text{>>}1}\cdot I_x \end{array}\]

So the **magic number** 0x5f3759df found in the original source code, used to compute the initial guess, is essentially the first term of this equation evaluated with \(B=127\) and \(\mu\approx 0.0450466\); the division by two is implemented as a right shift by one, leading to

`i = 0x5f3759df - (i >> 1)`
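The constant can be verified by evaluating \( \frac{3}{2}\cdot 2^{23}\cdot(B-\mu) \) with these values (a verification sketch, not part of the original code; the `+ 0.5` rounds to the nearest integer):

```c
#include <stdint.h>

/* Reconstruct the magic constant: round(3/2 * 2^23 * (B - mu)). */
static uint32_t magic(double mu) {
    return (uint32_t)(1.5 * 8388608.0 * (127.0 - mu) + 0.5); /* 2^23 = 8388608 */
}
```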

### Implementation in Code

The routine that was found in the source code and posted to **comp.graphics.algorithms** on January 9, 2002 is about three times as fast as computing \( \frac{1}{\sqrt{x}} \) with the native square root function, with an error of about 1%. It is implemented in C as the following snippet: first the address of the variable `y` is reinterpreted as a pointer to a 32-bit `long`, then the initial guess \(y_0\) is computed via `i = 0x5f3759df - (i >> 1)`, and finally one step of Newton's method refines the result.

```c
float Q_rsqrt(float number) {
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * (long *) &y;                   // evil floating point bit level hacking
    i  = 0x5f3759df - (i >> 1);           // what the fuck?
    y  = *(float *) &i;
    y  = y * (threehalfs - (x2 * y * y)); // 1st iteration
//  y  = y * (threehalfs - (x2 * y * y)); // 2nd iteration, this can be removed
    return y;
}
```

## Using This Idea for Square Roots

If we try to use the same idea for ordinary square roots, we find that

\[\begin{array}{rrl} &y &= \sqrt x\\ \Leftrightarrow& \log_2(y) &= \frac{1}{2}\log_2(x)\\ \Rightarrow& \frac{I_y}{2^{23}} - (B-\mu) &\approx \frac{1}{2}\left(\frac{I_x}{2^{23}} - (B-\mu)\right)\\ \Leftrightarrow& I_y &\approx 532676608 - 4194304\,\mu + \frac{I_x}{2} \end{array}\]

Choosing \(\mu = 0.0759\) gives the constant \(532676608 - 4194304 \cdot 0.0759 \approx 532358260\), leading to a quite good square-root approximation:

```c
union FloatLong {
    float y;
    long  i;
};

float Q_sqrt(float x) {
    union FloatLong u;
    u.y = x;
    u.i = (532358260 + (u.i >> 1)) & 0x7fffffff; // get initial guess y_0
    return (x / u.y + u.y) * 0.5;                // return after one Newton step
}
```
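A quick sanity check of the approximation (the copy below repeats the routine with a fixed-width `uint32_t` so the snippet compiles portably on its own):

```c
#include <math.h>
#include <stdint.h>

/* Self-contained copy of Q_sqrt for testing. */
static float Q_sqrt_check(float x) {
    union { float y; uint32_t i; } u = { x };
    u.i = (532358260 + (u.i >> 1)) & 0x7fffffff; /* initial guess y_0 */
    return (x / u.y + u.y) * 0.5f;               /* one Newton step (Babylonian) */
}
```

One Newton step here is the classic Babylonian average of \(x/y\) and \(y\), which roughly squares the relative error of the initial guess.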