Newton’s method, also known as the Newton–Raphson method, is a numerical technique for finding the roots of a given function \(f(x)\). In more elementary terms, it pinpoints the locations where the function crosses the x-axis. The objective is therefore to determine the values of \(x\) for which \(f(x)=0\).

We now take an arbitrary starting value \(x_0\) with \(f'(x_0)\neq 0\), which serves as an estimate for the location of the root.

If we draw the tangent line to the function at the point \((x_0, f(x_0))\), which is simply a straight line of the form \(y=mx+b\), we see that this line crosses the x-axis somewhere closer to the desired root, at a point we call \(x_1\).

The slope of the tangent line is the derivative \(\frac{\Delta y}{\Delta x} = f'(x_0)\), so for an arbitrary point \((x, y)\) on the tangent line we have \(f'(x_0) = \frac{y - f(x_0)}{x-x_0}\), from which we can fully describe the tangent line equation

\[y = f(x_0)+f'(x_0)(x-x_0)\]

The intersection \(x_1\) with the x-axis is consequently found by setting \(y=0\). Since we are looking for the \(x\) that becomes our new estimate \(x_1\), we substitute \(x_1\) for \(x\) and get

\[0 = f'(x_0) (x_1 - x_0) + f(x_0)\;\;\;\;\; \left(= \frac{\Delta y}{\Delta x}\Delta x + \Delta y\right)\]

Now solving for \(x_1\) leads to

\[x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}\]

Since we can use the new approximation \(x_1\) to calculate an even better approximation \(x_2\) in the same way, we arrive at the recursive formulation, where \(x_n\) is the \(n\)-th approximation of the root of \(f(x)=0\):

\[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\]

This iterative refinement continues until a certain accuracy level is achieved, typically when the change in the estimated root falls below a predefined tolerance threshold \(\epsilon\).

How to Use Newton's Method: Step-by-Step

Here is the practical algorithm; a small implementation sketch follows the list:

  1. Define the Function: Determine the function \(f(x)\) for which you want to find the root (the solution to \(f(x)=0\)).
  2. Calculate the Derivative: Find the derivative function \(f'(x)\).
  3. Choose a Starting Value: Select an initial guess \(x_0\) that is (ideally) relatively close to the actual root.
  4. Apply the Iteration Formula: Calculate the next, better approximation \(x_1\) using the formula: \[x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}\]
  5. Repeat: Use the new value to find \(x_2\), \(x_3\), ...: \[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\]
  6. Stopping Condition: Stop the process when the difference between successive estimates is very small (e.g., \(|x_{n+1} - x_n| < \epsilon\)) or when \(f(x_n)\) is very close to zero.
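
Putting these steps together, a minimal generic sketch could look as follows (the names newtonRoot, f, df and the default tolerances are our own choices, not fixed by the method):

function newtonRoot(f, df, x0, eps = 1e-12, maxIter = 100) {
  // f: the function, df: its derivative, x0: initial guess
  let x = x0;
  for (let i = 0; i < maxIter; i++) {
    let xNext = x - f(x) / df(x);                 // Newton update
    if (Math.abs(xNext - x) < eps) return xNext;  // stopping condition
    x = xNext;
  }
  return x;
}

For example, newtonRoot(x => x * x - 2, x => 2 * x, 1) converges to \(\sqrt{2}\approx 1.41421356\) after a handful of iterations.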

Derivation using Taylor Approximation

Another way of seeing Newton’s method is via Taylor approximation. We still want to find the root of \(f(x)=0\) and are given an approximate value \(x_0\) that we use as initial guess. In general, a Taylor series centered at \(x_0\) is given by

\[f(x) = f(x_0)+f'(x_0)(x-x_0)+\mathcal{O}\!\left((x-x_0)^2\right)\]

from which it is also clear that the error is of order \((x-x_0)^2\) when we stop after the linear term. Our aim is a better approximation \(x_1\) such that \(f(x_1)=0\). Therefore, our linear approximation is

\[0\approx f(x_0)+f'(x_0)(x_1-x_0)\]

When we now solve for the better approximation \(x_1\) we get

\[x_1=x_0-\frac{f(x_0)}{f'(x_0)}\]

from which we can derive the general case

\[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\]

Examples

Calculating the Square root using Newton’s method

A typical example is calculating the square root \(y=\sqrt{x}\) of a positive real number. By re-arranging the equation, we get

\[y=\sqrt{x} \Leftrightarrow y^2 - x = 0\]

Therefore \(0 = f(y) = y^2 - x\) and its derivative is \(f'(y) = 2y\). Now \(\frac{f(y)}{f'(y)} = \frac{y^2 - x}{2y}\), from which follows the recursive update rule

\[y_{n+1}= y_n - \frac{y_n^2 - x}{2y_n} = \frac{x + y_n^2}{2 y_n} = \frac{1}{2}\left(\frac{x}{y_n} + y_n\right)\]

function sqrt(x) {
  if (x < 0) return NaN;

  let eps = 1e-15;        // absolute tolerance
  let y = (x + 1) * 0.5;  // initial guess
  let p = 0;              // previous iterate
  while (Math.abs(y - p) > eps) {
    p = y;
    y = (x / y + y) * 0.5;  // Newton update y = (x/y + y) / 2
  }
  return y;
}

Calculating the Inverse Square Root using Newton’s method

Similarly, the inverse square root \(y=\frac{1}{\sqrt{x}}\) can be determined by

\[y=\frac{1}{\sqrt{x}} \Leftrightarrow \frac{1}{y^2} - x = 0\]

With \(0=f(y) = \frac{1}{y^2} - x\), the derivative is \(f'(y) = -\frac{2}{y^3}\). Now \(\frac{f(y)}{f'(y)} = \frac{1}{2} y (x y^2 - 1)\), from which follows the recursive update rule

\[y_{n+1} = y_n - \frac{1}{2} y_n (x y_n^2 - 1) = y_n \left(\frac{3}{2} - \frac{1}{2} x y_n^2\right)\]

If the initial guess is already pretty good, only a few iterations are needed, which is exploited by the fast inverse square root trick found in the Quake III engine.
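
To illustrate the idea, here is a JavaScript transcription of that trick (the original is C code operating on 32-bit floats; this typed-array sketch with a single Newton iteration is our own illustration):

function fastInvSqrt(x) {
  // reinterpret the 32-bit float bits as an unsigned integer
  let buf = new ArrayBuffer(4);
  let f32 = new Float32Array(buf);
  let u32 = new Uint32Array(buf);
  f32[0] = x;
  u32[0] = 0x5f3759df - (u32[0] >>> 1);  // bit-level initial guess with the well-known magic constant
  let y = f32[0];
  y = y * (1.5 - 0.5 * x * y * y);       // one Newton iteration y(3/2 - x y^2 / 2)
  return y;
}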

Instead of using the bit-hack from Quake III, we can also choose a purely algebraic initial guess. A convenient family is

\[ y_0(x) = \frac{c}{x + d} \]

to approximate \(\frac{1}{\sqrt{x}}\). We require \(y_0(x) \le \frac{1}{\sqrt{x}}\) for all \(x>0\). Since \(x+d>0\) and \(\sqrt{x}>0\), this is equivalent to

\[ \frac{c}{x+d} \le \frac{1}{\sqrt{x}} \;\Longleftrightarrow\; c \le \frac{x+d}{\sqrt{x}} \;=\; \sqrt{x} + \frac{d}{\sqrt{x}}. \]

Denote \(g_d(x) = \sqrt{x} + \frac{d}{\sqrt{x}}\). By the AM-GM inequality,

\[ g_d(x) = \sqrt{x} + \frac{d}{\sqrt{x}} \;\ge\; 2\sqrt{d}, \]

with equality at \(x=d\). Hence we must have \(c \le 2\sqrt{d}\). If we additionally want \(y_0(1) = 1\), then

\[ \frac{c}{1+d} = 1 \quad\Rightarrow\quad c = 1 + d. \]

Combining \(1 + d \le 2\sqrt{d}\) yields \((\sqrt{d} - 1)^2 \le 0\), so \(d = 1\) and \(c = 2\). This gives the simple optimal start

\[ y_0(x) = \frac{2}{1 + x}. \]

Moreover, for all \(x>0\) this lies below the exact value:

\[ \frac{2}{1+x} \le \frac{1}{\sqrt{x}} \;\Longleftrightarrow\; 2\sqrt{x} \le x+1 \;\Longleftrightarrow\; 0 \le x+1-2\sqrt{x} \;\Longleftrightarrow\; 0 \le (\sqrt{x}-1)^2, \]

which is always true. Thus \(\frac{2}{1+x}\) is a globally valid lower bound for \(\frac{1}{\sqrt{x}}\) and already quite close, so only a few Newton iterations are needed:

function isqrt(x) {
  if (x < 0) return NaN;

  let eps = 1e-15;      // absolute tolerance
  let y = 2 / (x + 1);  // algebraic initial guess derived above
  let p = 0;            // previous iterate
  while (Math.abs(y - p) > eps) {
    p = y;
    y = 0.5 * y * (3 - x * y * y);  // Newton update y(3/2 - x y^2 / 2)
  }
  return y;
}

Calculating the Nth Root using Newton’s method

To generalize the root finding to the Nth root, we state the problem \(y=x^\frac{1}{n}\), which can be written as

\[x - y^n = 0\]

With \(0=f(y) = x - y^{n}\) and its derivative \(f'(y) = -n y^{n - 1}\), we get

\[\begin{array}{rl} y_{k+1} &= y_k-\frac{y_k^n-x}{n y_k^{n - 1}}\\ &= y_k + \frac{x y_k^{1 - n} - y_k}{n}\\ &= y_k + \frac{\frac{x}{y_k^{n-1}} - y_k}{n} \end{array}\]

function nrt(x, n) {
  if (x < 0) return NaN;

  let eps = 1e-15;  // absolute tolerance
  let y = 0.5 * x;  // initial guess
  let p = 0;        // previous iterate
  while (Math.abs(y - p) > eps) {
    p = y;
    y += (x / Math.pow(y, n - 1) - y) / n;  // Newton update
  }
  return y;
}

Optimization using Newton’s method

Given a twice differentiable function \(f:\mathbb{R}\to\mathbb{R}\), we seek to minimize the function \(f(x)\):

\[\min\limits_{x\in\mathbb{R}} f(x)\]

With Newton’s method, we have a powerful technique for both root-finding and optimization. To optimize a function, we look for stationary points where \(f'(x)=0\), i.e., we apply the root-finding iteration to \(f'\) instead of \(f\). Therefore

\[x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}\]

This formulation is notable because it admits a second, purely optimization-based interpretation: the same update is obtained by minimizing the local quadratic model given by the second-order Taylor approximation of \(f\) around the current iterate \(x_k\).

To see this explicitly, define the quadratic model

\[ q_k(x) := f(x_k) + f'(x_k)(x-x_k) + \frac{1}{2}f''(x_k)(x-x_k)^2. \]

Minimizing \(q_k\) means solving \(q_k'(x)=0\):

\[ q_k'(x)= f'(x_k) + f''(x_k)(x-x_k) \stackrel{!}{=} 0 \quad\Longrightarrow\quad x = x_k - \frac{f'(x_k)}{f''(x_k)}. \]

The same derivation can be written in terms of a step. Define the step \(\Delta_k := x_{k+1}-x_k\), i.e. \(x_{k+1}=x_k+\Delta_k\). Writing the quadratic model in terms of \(\Delta\) gives

\[ q_k(\Delta) := f(x_k) + f'(x_k)\Delta + \frac{1}{2}f''(x_k)\Delta^2, \qquad (x = x_k+\Delta). \]

Minimizing \(q_k(\Delta)\) means solving \(\frac{d}{d\Delta}q_k(\Delta)=0\):

\[ \frac{d}{d\Delta}q_k(\Delta)= f'(x_k) + f''(x_k)\Delta \stackrel{!}{=} 0 \quad\Longrightarrow\quad \Delta_k = -\frac{f'(x_k)}{f''(x_k)}. \]

Finally, we update \(x_{k+1} := x_k + \Delta_k\), which yields

\[ x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}. \]
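
As a small illustration, here is a sketch of this one-dimensional scheme (the helper name newtonMinimize, the example function, and the defaults are our own, purely for demonstration):

function newtonMinimize(fp, fpp, x0, eps = 1e-12, maxIter = 100) {
  // fp: first derivative f', fpp: second derivative f''
  let x = x0;
  for (let i = 0; i < maxIter; i++) {
    let step = fp(x) / fpp(x);  // Newton step applied to f'
    x -= step;
    if (Math.abs(step) < eps) break;
  }
  return x;
}

For instance, \(f(x)=x^2-\ln x\) has its minimum at \(x=1/\sqrt{2}\approx 0.7071\), and newtonMinimize(x => 2 * x - 1 / x, x => 2 + 1 / (x * x), 1) converges to this value within a few iterations.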

The same idea generalizes directly to multivariate optimization. Let \(f:\mathbb{R}^n\to\mathbb{R}\) be a scalar objective with parameter vector \(\mathbf{x}\in\mathbb{R}^n\). In this case, the analogue of the first derivative is the gradient \(\nabla f(\mathbf{x})\in\mathbb{R}^{n\times 1}\), and the analogue of the second derivative is the Hessian \(H_f(\mathbf{x}) := \nabla^2 f(\mathbf{x})\in\mathbb{R}^{n\times n}\).

At iteration \(k\), define the step \(\Delta_k\in\mathbb{R}^n\) by \(\mathbf{x}_{k+1} := \mathbf{x}_k + \Delta_k\). The second-order Taylor approximation of \(f\) around \(\mathbf{x}_k\) is

\[ f(\mathbf{x}_k+\Delta) \approx f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k)^\top \Delta + \frac{1}{2}\Delta^\top H_f(\mathbf{x}_k)\Delta. \]

As in the one-dimensional case, we obtain the Newton step by minimizing the local quadratic model, now viewed as a function of the vector step \(\Delta\):

\[ q_k(\Delta) := f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k)^\top \Delta + \frac{1}{2}\Delta^\top H_f(\mathbf{x}_k)\Delta. \]

Since \(q_k\) is a quadratic function in \(\Delta\), its minimizer is characterized by the first-order condition \(\nabla_{\Delta} q_k(\Delta)=\mathbf{0}\). Differentiating term by term gives \(\nabla_{\Delta} q_k(\Delta)=\nabla f(\mathbf{x}_k)+H_f(\mathbf{x}_k)\Delta\). Setting this to zero at \(\Delta=\Delta_k\) yields the Newton system

\[ H_f(\mathbf{x}_k)\,\Delta_k = -\nabla f(\mathbf{x}_k). \]

Solving for \(\Delta_k\) and applying the update \(\mathbf{x}_{k+1}=\mathbf{x}_k+\Delta_k\) gives the closed form

\[ \mathbf{x}_{k+1} = \mathbf{x}_k - H_f(\mathbf{x}_k)^{-1}\nabla f(\mathbf{x}_k). \]
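
To make the multivariate update concrete, here is a small sketch for \(n=2\) with an explicit \(2\times 2\) solve (the function newton2D and the example objective are our own illustration, not part of the derivation above):

function newton2D(grad, hess, x0, eps = 1e-12, maxIter = 50) {
  let [x, y] = x0;
  for (let i = 0; i < maxIter; i++) {
    let [gx, gy] = grad(x, y);
    let [[a, b], [c, d]] = hess(x, y);
    let det = a * d - b * c;
    // delta = -H^{-1} * grad, using the closed-form 2x2 inverse
    let dx = -( d * gx - b * gy) / det;
    let dy = -(-c * gx + a * gy) / det;
    x += dx; y += dy;
    if (Math.hypot(dx, dy) < eps) break;
  }
  return [x, y];
}

For example, \(f(x,y)=x^4+y^4+xy\) has a local minimum at \((\tfrac{1}{2},-\tfrac{1}{2})\), and newton2D((x, y) => [4 * x ** 3 + y, 4 * y ** 3 + x], (x, y) => [[12 * x * x, 1], [1, 12 * y * y]], [1, -1]) converges to it within a few steps.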

Solving equations with Newton’s method

Newton’s method can also be used to solve equations. For example, to determine the intersection of two differentiable functions \(f\) and \(g\), we set them equal, \(f(x)=g(x)\), and transform this into a root finding problem \(f(x)-g(x) = 0\). With, e.g., \(f(x)=\ln x\) and \(g(x)=-x\) we get

\[x_{n+1} = x_n - \frac{\ln x_n + x_n}{\frac{1}{x_n}+1} = \frac{x_n }{x_n + 1}\left(1 - \ln(x_n)\right)\]

After a few iterations we see the sequence converges to \(x_n \approx 0.567143\), which matches the analytical solution via the Lambert W function: \(x + \ln(x) = 0 \Leftrightarrow \ln(x) = -x \Leftrightarrow x = e^{-x} \Leftrightarrow x e^{x} = 1 \Leftrightarrow x = W(1)\approx 0.567143\).
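
A quick numerical check of this iteration (a throwaway snippet, starting from the arbitrary guess \(x_0=1\)):

let x = 1;
for (let i = 0; i < 6; i++) {
  x = (x / (x + 1)) * (1 - Math.log(x));  // the update rule from above
}
// x ≈ 0.567143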

When Newton's Method Fails or Converges Poorly

Choosing the initial guess \(x_0\) is crucial. While the method is very fast when it works (it converges quadratically), it is not guaranteed to converge for every function or every starting point. Common problems include: