The Fast Inverse Square Root algorithm is one of the most famous numerical methods in computer science, gaining legendary status for its crucial role in real-time 3D graphics. It was popularized by its use in the 1999 video game Quake III Arena, where it was employed to perform rapid vector normalization.
What is the Fast Inverse Square Root Algorithm?
The Fast Inverse Square Root algorithm provides an efficient method to compute \( \frac{1}{\sqrt{x}} \), a calculation that is typically resource-intensive. By combining bitwise manipulations with a refinement step using Newton's method, the algorithm achieves a fast and remarkably accurate approximation.
The Importance of Inverse Square Root in 3D Graphics
In 3D graphics, vector normalization is essential. It involves scaling a vector so that its magnitude is 1, which is critical for lighting calculations, reflections, and geometric operations. Normalization is achieved by dividing each component of a vector by its length:
\[ \hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|} = \mathbf{v} \times \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}} \]
where the Fast Inverse Square Root algorithm is used to quickly compute \( \frac{1}{\sqrt{x}} \) with \(x=v_x^2 + v_y^2 + v_z^2\).
Newton's Method and Bitwise Operations
The algorithm's efficiency stems from two key components: an initial approximation using bitwise operations and a refinement using Newton's method.
Newton's Method
To find \( y=\frac{1}{\sqrt{x}} \Leftrightarrow \frac{1}{y^2} - x = 0 \), we reformulate the problem as finding the root of the function:
\[ f(y) = \frac{1}{y^2} - x \]
Using Newton's method and the derivative \(f'(y) = -\frac{2}{y^3}\), the iterative formula for improving our guess \( y_n \) is:
\[ y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)} = y_n \left(\frac{3}{2} - \frac{x y_n^2}{2}\right) \]
This formula quickly converges to the true value of \( \frac{1}{\sqrt{x}} \), especially when the initial guess \( y_0 \) is close to the actual solution.
Initial Approximation and Floating-Point Representation
The algorithm starts with an initial guess \( y_0 \), derived from the floating-point representation of \( x \) based on the IEEE 754 standard.
Floating-Point Representation
Encoding a non-zero real number \(x\) as a single precision float, the first step is to write \(x\) as a normalized binary number:
\[x = \pm 1.b_1 b_2 b_3... \cdot 2^{e}\]
with \(b_1 b_2 b_3...\) being the binary digits of the significant and \(e\) being the integer exponent. Since the leading 1 bit is always 1, it is not stored and therefore the a single-precision floating-point number \( x \) can be expressed as:
\[ x = \pm 2^e \cdot (1 + m) \]
Here, \( m \) is the mantissa (with \( m \in [0, 1) \)).
Now the floating-point number \(x\) is stored as a 32-bit integer, comprising:
- A sign bit \( S \) (0 for positive, 1 for negative).
- An 8 bit exponent \( E = e + B \) (with \( B = 127 \) as the bias, acting as an offset for negative exponents).
- A mantissa \( M =2^{23}\cdot m\), scaled to fit within the remaining bits.
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
\(S\) | \(E\) | \(E\) | \(E\) | \(E\) | \(E\) | \(E\) | \(E\) | \(E\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) | \(M\) |
The Magic Number and Algorithm Derivation
The goal is still to calculate \(y=\frac{1}{x}\). When we take the logarithm on both sides we get
\[\log_b(y)=\log_b\left(\frac{1}{x}\right)=\log(x^{-\frac{1}{2}})=-\frac{1}{2}\log_b(x)\]
And applying it with \(b=2\) to the floating point representation of \(x = \pm 2^e \cdot (1 + m)\) by ignoring the negative numbers (since we can't calculate the negative square root) we have:
\[\log_2(x)=e+\log_2(1+m)\]
This demonstrates that aliasing a float32 to an integer provides a rough approximation of its logarithm in base two. Since \( m \in [0,1) \), we can approximate \( \log_2(x) \) as \( m + \mu \), where \( \mu \) is a free parameter. For instance, setting \( \mu = 0 \) gives exact results at the endpoints of the interval, while \( \mu = \frac{1}{2} - \frac{1 + \ln(\ln(2))}{2 \ln(2)} \approx 0.0430357 \) offers the optimal approximation, minimizing the error in the uniform norm sense. Therefore
\[\log_2(x) = e_x + m_x + \mu\]
Now saying that \(I\) is an integer, representing the bit pattern of the floating point number, we get
\[\begin{array}{rl} I &= 2^{23}\cdot E + M\\ &= 2^{23}\cdot(e+B+m)\\ &= 2^{23}\cdot(e+m+\mu+B-\mu)\\ &\approx 2^{23}\cdot\log_2(x) + 2^{23}\cdot(B-\mu) \end{array}\]
From which follows that we can approximate \(\log_2(x)\) with the bit representation of a number, shifted by some constants
\[ \log_2(x) \approx \frac{I}{2^{23}} - (B - \mu) \]
Using the approximation of \(\log_2(x)\) for the inverse square root yields
\[\begin{array}{rrl} &y &= \frac{1}{\sqrt x}\\ \Leftrightarrow& \log_2(y) &= -\frac{1}{2}\log_2(x)\\ \Rightarrow& \frac{I_y}{2^{23}} - (B-\mu) &\approx -\frac{1}{2}\left(\frac{I_x}{2^{23}} - (B-\mu)\right)\\ \Leftrightarrow& I_y &\approx \underbrace{\frac{3}{2}\cdot 2^{23}\cdot(B-\mu)}_{=\text{0x5f3759df}} - \underbrace{\frac{1}{2}}_{i\text{>>}1}\cdot I_x \end{array}\]
So the magic number 0x5f3759df that is found in the original source code used to compute the initial guess is basically the first part of the equation with \(B=127\) and \(\mu\approx 0.0450466\) and the division by two is implemented as a right shift by one, leading to
i = 0x5f3759df - (i >> 1)
Implementation in Code
The algorithm that was found in the source code and posted to comp.graphics.algorithms on January 9, 2002 is about three times as fast as the native square root function with an error of about 1% and is implemented in C with the following snippet. First the address of variable &y is converted to a 32 bit long pointer, then the initial value \(y_0\) is defined using i = 0x5f3759df - (i >> 1) and after that one step of Newton's Method is done to refine the result
float Q_rsqrt(float number) {
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * (long *) &y; // evil floating point bit level hacking
i = 0x5f3759df - (i >> 1); // what the fuck?
y = *(float *) &i;
y = y * (threehalfs - (x2 * y * y)); // 1st iteration
// y = y * (threehalfs - (x2 * y * y)); // 2nd iteration, this can be removed
return y;
}
Using this idea for Square Roots
If we try to use the same idea for ordinary square roots, we find that
\[\begin{array}{rrl} &y &= \sqrt x\\ \Leftrightarrow& \log_2(y) &= \frac{1}{2}\log_2(x)\\ \Rightarrow& \frac{I_y}{2^{23}} - (B-\mu) &\approx \frac{1}{2}\left(\frac{I_x}{2^{23}} - (B-\mu)\right)\\ \Leftrightarrow& I_y &\approx 532676608 - 4194304 \mu + \frac{x}{2} \end{array}\]
With \(\mu:= 0.0759\) a quite good sqrt approximation function is
union FloatLong {
float y;
long i;
};
float Q_sqrt(float x) {
union FloatLong u;
u.y = x;
u.i = (532358260 + (u.i >> 1)) & 0x7fffffff; // Get initial y_0
return (x / u.y + u.y) * 0.5; // Return after one Newton step
}