
Newton’s method, also known as the Newton–Raphson method, is a numerical technique for finding the roots of a given function \(f(x)\). In more elementary terms, it locates the points where a function intersects the x-axis. The objective of the method is therefore to determine the values of \(x\) for which \(f(x)=0\).

We now take an arbitrary non-zero starting value \(x_0\), which serves as an estimate for the location of the root.

If we draw the tangent line to the function at the point \(x_0\), which is simply a straight line of the form \(y=mx+b\), we see that this line crosses the x-axis at a point \(x_1\) that lies closer to the desired root.

The slope of the tangent line is the derivative \(\frac{\Delta y}{\Delta x} = f'(x)\) and the vertical offset is \(\Delta y = f(x)\). So the only things Newton's method needs are the function and its derivative. If we take the point-slope form of a line, \(y-y_0 = m(x-x_0)\), it follows that

\[y - f(x_0) = f'(x_0) (x - x_0)\]

From which we can fully describe the tangent line:

\[y = f'(x_0) (x - x_0) + f(x_0)\]

The intersection \(x_1\) with the x-axis is consequently found where the tangent line crosses it, i.e., where \(y=0\). Since we are looking for the \(x\) that becomes our new estimate \(x_1\), we also substitute \(x_1\) for \(x\) and get

\[0 = f'(x_0) (x_1 - x_0) + f(x_0)\;\;\;\;\; \left(= \frac{\Delta y}{\Delta x}\Delta x + \Delta y\right)\]

Now solving for \(x_1\) leads to

\[x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}\]

Since we can use the new approximation \(x_1\) to calculate an even better approximation \(x_2\) in the same way, we arrive at the recursive formulation, where \(x_n\) is the \(n\)th approximation of the solution of \(f(x)=0\):

\[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\]

This iterative refinement continues until a certain accuracy level is achieved, typically when the change in the estimated root falls below a predefined tolerance threshold \(\epsilon\).
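The general iteration can be sketched in a few lines of JavaScript. The following is a minimal sketch, not taken from the text above; the names newton, f and df are chosen for this example:

function newton(f, df, x0) {
  let eps = 1e-15;                 // tolerance for the change between iterations
  let x = x0;                      // current approximation x_n
  for (let i = 0; i < 100; i++) {  // cap the number of iterations
    let dx = f(x) / df(x);         // Newton step f(x_n) / f'(x_n)
    x -= dx;                       // x_{n+1} = x_n - f(x_n) / f'(x_n)
    if (Math.abs(dx) < eps) break; // stop when the change falls below eps
  }
  return x;
}

// Example: the positive root of f(x) = x^2 - 2
// newton(x => x * x - 2, x => 2 * x, 1)  // ≈ 1.4142135623730951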

Derivation using Taylor Approximation

Another way of deriving Newton’s method is via a Taylor approximation. We still want to find the root of \(f(x)=0\) and are given an approximate value \(x_0\) that we use as the initial guess. In general, the Taylor series of \(f\) around the center \(x_0\) is given by

\[f(x) = f(x_0)+f'(x_0)(x-x_0)+\mathcal{O}\left((x-x_0)^2\right)\]

from which it is also clear that the error is of order \(\mathcal{O}\left((x-x_0)^2\right)\) when we stop after the linear term. Our aim is a better approximation \(x_1\) such that \(f(x_1)=0\). Therefore, our linear approximation is

\[0\approx f(x_0)+f'(x_0)(x_1-x_0)\]

When we now solve for the better approximation \(x_1\) we get

\[x_1=x_0-\frac{f(x_0)}{f'(x_0)}\]

from which we can derive the general case

\[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\]

Examples

Calculating the Square Root using Newton’s method

A typical example is calculating the square root \(y=\sqrt{x}\) of a number. By re-arranging the equation, we get

\[y=\sqrt{x} \Leftrightarrow y^2 - x = 0\]

Therefore \(0 = f(y) = y^2 - x\) and its derivative is \(f'(y) = 2y\). Now \(\frac{f(y)}{f'(y)} = \frac{y^2 - x}{2y}\) from which follows the recursive update rule

\[y_{n+1}= y_n - \frac{y_n^2 - x}{2y_n} = \frac{x + y_n^2}{2 y_n} = \frac{1}{2}\left(\frac{x}{y_n} + y_n\right)\]

function sqrt(x) {
  if (x < 0) return NaN;

  let eps = 1e-15;    // tolerance for the change between iterations
  let y = x * 0.5;    // initial guess y_0
  let p = 0;          // previous iterate
  while (Math.abs(y - p) > eps) {
    p = y;
    y = (x / y + y) * 0.5;  // y_{n+1} = (x / y_n + y_n) / 2
  }
  return y;
}
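A quick check of the function above (values are approximate):

console.log(sqrt(2));   // ≈ 1.4142135623730951
console.log(sqrt(81));  // ≈ 9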

Calculating the Inverse Square Root using Newton’s method

Similarly, the inverse square root \(y=\frac{1}{\sqrt{x}}\) can be determined by

\[y=\frac{1}{\sqrt{x}} \Leftrightarrow \frac{1}{y^2} - x = 0\]

With \(0=f(y) = \frac{1}{y^2} - x\) and its derivative \(f'(y) = -\frac{2}{y^3}\). Now \(\frac{f(y)}{f'(y)} = \frac{1}{2} y (x y^2 - 1)\) from which follows the recursive update rule

\[y_{n+1} = y_n - \frac{1}{2} y_n (x y_n^2 - 1) = y_n \left(\frac{3}{2} - \frac{1}{2} x y_n^2\right)\]

If the initial guess is already pretty good, only a few iterations are needed, which is exploited in the fast inverse square root trick found in the Quake III engine.

function isqrt(x) {
  if (x < 0) return NaN;
  if (x === 0) return Infinity;  // 1 / sqrt(0)

  let eps = 1e-15;      // tolerance for the change between iterations
  let y = 1 / (x + 1);  // initial guess y_0
  let p = 0;            // previous iterate
  while (Math.abs(y - p) > eps) {
    p = y;
    y = 0.5 * y * (3 - x * y * y);  // y_{n+1} = y_n (3/2 - x y_n^2 / 2)
  }
  return y;
}
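The bit-level trick mentioned above can be sketched in JavaScript as well: the single-precision float bits are reinterpreted as an integer to produce a good initial guess, which a single Newton iteration of the update rule above then refines. This is only a sketch of the idea, not the original C code:

function fastISqrt(x) {
  const buf = new ArrayBuffer(4);
  const f32 = new Float32Array(buf);
  const u32 = new Uint32Array(buf);

  f32[0] = x;
  u32[0] = 0x5f3759df - (u32[0] >>> 1);  // bit hack yields the initial guess
  const y = f32[0];

  return y * (1.5 - 0.5 * x * y * y);    // one Newton iteration as derived above
}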

Calculating the Nth Root using Newton’s method

To generalize the root finding to the Nth root, we state the problem \(y=x^\frac{1}{n}\), which can be written as

\[x - y^n = 0\]

With \(0=f(y) = x - y^{n}\) and its derivative \(f'(y) = -n y^{n - 1}\), we get

\[\begin{array}{rl} y_{k+1} &= y_k-\frac{y_k^n-x}{n y_k^{n - 1}}\\ &= y_k + \frac{x y_k^{1 - n} - y_k}{n}\\ &= y_k + \frac{\frac{x}{y_k^{n-1}} - y_k}{n} \end{array}\]

function nrt(x, n) {
  if (x < 0) return NaN;

  let eps = 1e-15;   // tolerance for the change between iterations
  let y = 0.5 * x;   // initial guess y_0
  let p = 0;         // previous iterate
  while (Math.abs(y - p) > eps) {
    p = y;
    y += (x / Math.pow(y, n - 1) - y) / n;  // y_{k+1} = y_k + (x / y_k^{n-1} - y_k) / n
  }
  return y;
}
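For example (values are approximate):

console.log(nrt(27, 3));  // ≈ 3
console.log(nrt(32, 5));  // ≈ 2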

Optimization using Newton’s method

Given a twice differentiable function \(f:\mathbb{R}\to\mathbb{R}\), we seek to minimize the function \(f(x)\):

\[\min\limits_{x\in\mathbb{R}} f(x)\]

With Newton’s method, we have a tool for root finding, and to optimize a function we look for a stationary point with \(f'(x)=0\). Applying the method to \(f'\) therefore gives

\[x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}\]
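A minimal sketch of this in JavaScript, minimizing the example function \(f(x)=x^2-\ln x\) for \(x>0\) (both the function and the name newtonMinimize are chosen for illustration):

function newtonMinimize(df, ddf, x0) {
  let eps = 1e-15;                   // tolerance for the change between iterations
  let x = x0;
  for (let i = 0; i < 100; i++) {
    let step = df(x) / ddf(x);       // f'(x_k) / f''(x_k)
    x -= step;
    if (Math.abs(step) < eps) break;
  }
  return x;
}

// f(x) = x^2 - ln(x), f'(x) = 2x - 1/x, f''(x) = 2 + 1/x^2
// newtonMinimize(x => 2 * x - 1 / x, x => 2 + 1 / (x * x), 1)  // ≈ 1 / sqrt(2)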

For a function \(f:\mathbb{R}^n\to\mathbb{R}\) with multiple input variables \(\mathbf{x}\in\mathbb{R}^n\), one uses, analogous to the first derivative, the gradient as column vector \(J(\mathbf{x})=\nabla f(\mathbf{x})\in\mathbb{R}^{n}\) and, analogous to the second derivative, the Hessian matrix \(H(\mathbf{x})\in\mathbb{R}^{n\times n}\), whose inverse enters the update:

\[\mathbf{x}_{k+1} = \mathbf{x}_k - H^{-1}(\mathbf{x}_k)J(\mathbf{x}_k)\]
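For \(n=2\), a single update step can be written out explicitly by inverting the \(2\times 2\) Hessian by hand. The following sketch assumes grad returns \([g_x, g_y]\) and hess returns the Hessian as nested arrays; names and conventions are chosen for this example:

function newtonStep2D(grad, hess, [x, y]) {
  const [gx, gy] = grad(x, y);          // gradient at (x, y)
  const [[a, b], [c, d]] = hess(x, y);  // Hessian at (x, y)
  const det = a * d - b * c;            // determinant for the 2x2 inverse
  const dx = ( d * gx - b * gy) / det;  // first component of H^{-1} grad
  const dy = (-c * gx + a * gy) / det;  // second component of H^{-1} grad
  return [x - dx, y - dy];              // x_{k+1} = x_k - H^{-1} grad
}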

Solving equations with Newton’s method

Newton’s method can also be used to solve equations. For example, to determine the intersection of two differentiable functions \(f\) and \(g\), we set them equal, \(f(x)=g(x)\), and turn this into a root-finding problem, \(f(x)-g(x) = 0\). With, for example, \(f(x)=\ln x\) and \(g(x)=-x\) we get

\[x_{n+1} = x_n - \frac{\ln x_n + x_n}{\frac{1}{x_n}+1} = \frac{x_n }{x_n + 1}\left(1 - \ln x_n\right)\]

After a few iterations we see that the sequence converges to \(x_n \approx 0.567143\), which matches the analytical solution using the Lambert W function: \(x + \ln x = 0 \Leftrightarrow e^{x + \ln x} = e^0 \Leftrightarrow xe^{x} = 1 \Rightarrow x = W(1)\approx 0.567143\).
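Carrying out the iteration in JavaScript, as a minimal sketch using the update rule above:

let x = 1;                              // initial guess x_0
for (let i = 0; i < 8; i++) {
  x = x * (1 - Math.log(x)) / (x + 1);  // x_{n+1} = x_n (1 - ln x_n) / (x_n + 1)
}
console.log(x);                         // ≈ 0.567143, i.e., W(1)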

Note about \(x_0\)

Choosing the initial guess \(x_0\) is crucial for more complex functions, especially when a function has multiple roots. Functions like the square root, the logarithm or the exponential function are easier in this regard and converge even when the starting value deviates greatly from the true root. It can also happen that Newton’s method gets stuck near a local optimum, where the slope of the tangent line is zero and the iteration breaks down. Choosing random starting points for multiple trials can mitigate this issue.
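As a small illustration (the polynomial is chosen for this example and does not appear above), \(f(x)=x^3-2x+2\) with the starting value \(x_0=0\) makes the iteration oscillate between 0 and 1 instead of approaching the real root near \(-1.77\):

const f  = x => x * x * x - 2 * x + 2;  // f(x) = x^3 - 2x + 2
const df = x => 3 * x * x - 2;          // f'(x) = 3x^2 - 2

let x0 = 0;                             // bad starting value
for (let i = 0; i < 6; i++) {
  x0 = x0 - f(x0) / df(x0);
  console.log(x0);                      // prints 1, 0, 1, 0, ... (a cycle, no convergence)
}

Starting from \(x_0=-2\) instead, the same iteration converges to the root in a handful of steps.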