A matrix \(\mathbf{A}\) transforms a vector \(\mathbf{x}\) into another vector \(\mathbf{Ax}\). In general, \(\mathbf{Ax}\) points in a different direction than \(\mathbf{x}\), since the transformation may include a rotation component.
However, special vectors called eigenvectors keep their direction under a transformation with \(\mathbf{A}\) (i.e. they remain invariant in direction). In this case, \(\mathbf{Ax}\) and \(\mathbf{x}\) are parallel, differing only in a constant scale factor (only stretched, compressed, or flipped), which we call the associated eigenvalue.
Definition
For a square matrix \(\mathbf{A}\), a nonzero vector \(\mathbf{x}\) and a (real or complex) scalar \(\lambda\) are an eigenvector and its associated eigenvalue if and only if they satisfy
\[\mathbf{Ax}=\lambda\mathbf{x}\]
If \(\mathbf{x}\) is a solution, there are infinitely many others, since \(c\mathbf{x}\) for any \(c\neq 0\) also satisfies the equation with the same eigenvalue \(\lambda\). Consequently, eigenvectors are usually normalized, i.e. chosen to satisfy the constraint \(\mathbf{x}^T\mathbf{x}=1\).
The original equation can be re-arranged a bit:
\[\begin{array}{rrl} &\mathbf{Ax}&=\lambda\mathbf{x}\\ \Leftrightarrow&\mathbf{Ax} - \lambda\mathbf{x}&=\mathbf{0}\\ \Leftrightarrow&(\mathbf{A} - \lambda\mathbf{I})\mathbf{x}&=\mathbf{0} \end{array}\]
This is a homogeneous linear system in \(\mathbf{x}\). It always has the trivial solution \(\mathbf{x}=\mathbf{0}\), but we are explicitly interested in nonzero solutions. A nonzero solution exists if and only if the matrix \(\mathbf{A}-\lambda\mathbf{I}\) is singular, i.e. not invertible. For square matrices, singularity is equivalent to a zero determinant. Therefore, eigenvalues \(\lambda\) are precisely those scalars for which
\[\det(\mathbf{A}-\lambda\mathbf{I})=0.\]
Once an eigenvalue \(\lambda\) is known, eigenvectors are obtained by solving the corresponding linear system
\[(\mathbf{A}-\lambda\mathbf{I})\mathbf{x}=\mathbf{0}\]
and then choosing any nonzero solution and (optionally) normalizing it.
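As a minimal numerical sketch (assuming NumPy; the matrix is chosen arbitrarily for illustration), eigenvalue/eigenvector pairs can be computed with `np.linalg.eig` and checked against the defining equation and the normalization constraint:

```python
import numpy as np

# Example matrix, chosen only for illustration
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns
# are the corresponding (already normalized) eigenvectors.
eigvals, eigvecs = np.linalg.eig(A)

for i, lam in enumerate(eigvals):
    x = eigvecs[:, i]
    # Defining equation A x = lambda x (up to floating-point error)
    assert np.allclose(A @ x, lam * x)
    # Normalization constraint x^T x = 1
    assert np.isclose(x @ x, 1.0)

print(eigvals)  # eigenvalues 3 and 1 (in no particular order)
```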
The Characteristic Polynomial
For \(\mathbf{A}\in\mathbb{C}^{n\times n}\), the expression \(\det(\mathbf{A}-\lambda\mathbf{I})\) is a polynomial in \(\lambda\) of degree \(n\). This polynomial is called the characteristic polynomial of \(\mathbf{A}\):
\[p_{\mathbf{A}}(\lambda):=\det(\mathbf{A}-\lambda\mathbf{I}).\]
The equation \(p_{\mathbf{A}}(\lambda)=0\) is called the characteristic equation. Its roots \(\lambda_1,\ldots,\lambda_n\) (counted with algebraic multiplicity) are the eigenvalues of \(\mathbf{A}\).
To see directly why a polynomial appears, it helps to look at an explicit example and observe what changes when we subtract \(\lambda\mathbf{I}\): only the diagonal entries are shifted by \(-\lambda\). When you compute the determinant, you multiply and add these entries in a structured way; as soon as \(\lambda\) appears on the diagonal, the determinant becomes an expression containing powers of \(\lambda\), i.e. a polynomial.
A concrete \(2\times 2\) example
Let
\[\mathbf{A}=\begin{pmatrix}a&b\\c&d\end{pmatrix}.\]
Then
\[\mathbf{A}-\lambda\mathbf{I}= \begin{pmatrix}a-\lambda&b\\c&d-\lambda\end{pmatrix}\]
and the determinant expands to
\[\det(\mathbf{A}-\lambda\mathbf{I}) =\det\begin{pmatrix}a-\lambda&b\\c&d-\lambda\end{pmatrix} =(a-\lambda)(d-\lambda)-bc.\]
Multiplying out gives
\[(a-\lambda)(d-\lambda)-bc =\lambda^2-(a+d)\lambda+(ad-bc).\]
So the characteristic polynomial is
\[p_{\mathbf{A}}(\lambda)=\lambda^2-\mathrm{Tr}(\mathbf{A})\,\lambda+\det(\mathbf{A}).\]
Its roots are the eigenvalues. For higher-dimensional matrices the determinant expansion is more involved, but the same mechanism applies: subtracting \(\lambda\) on the diagonal forces the determinant to become a polynomial in \(\lambda\) whose roots are the eigenvalues.
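For instance, with \(a=d=2\) and \(b=c=1\):

\[\mathbf{A}=\begin{pmatrix}2&1\\1&2\end{pmatrix},\qquad p_{\mathbf{A}}(\lambda)=\lambda^2-4\lambda+3=(\lambda-1)(\lambda-3),\]

so the eigenvalues are \(\lambda_1=3\) and \(\lambda_2=1\).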
How to compute eigenvectors once \(\lambda\) is known
For each eigenvalue \(\lambda\), form \(\mathbf{A}-\lambda\mathbf{I}\) and solve
\[(\mathbf{A}-\lambda\mathbf{I})\mathbf{x}=\mathbf{0}.\]
The solution set is the null space (kernel) \(\mathcal{N}(\mathbf{A}-\lambda\mathbf{I})\), called the eigenspace associated with \(\lambda\). Any nonzero vector in this eigenspace is an eigenvector. If the eigenspace has dimension greater than \(1\), then \(\lambda\) has multiple linearly independent eigenvectors; choosing an orthonormal basis of the eigenspace is often convenient.
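A small NumPy-only sketch of this step: for a given eigenvalue \(\lambda\), an orthonormal basis of the eigenspace can be read off from the SVD of \(\mathbf{A}-\lambda\mathbf{I}\), since the right-singular vectors belonging to (numerically) zero singular values span the null space. The helper `eigenspace` below is a hypothetical illustration, not a library routine:

```python
import numpy as np

def eigenspace(A, lam, tol=1e-10):
    """Orthonormal basis (as columns) of the eigenspace N(A - lam*I).

    Hypothetical helper for illustration; not a library routine.
    """
    M = A - lam * np.eye(A.shape[0])
    # Right-singular vectors with (numerically) zero singular values span N(M).
    _, s, Vt = np.linalg.svd(M)
    return Vt[s <= tol].T

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
V = eigenspace(A, 3.0)   # 3 is an eigenvalue of this matrix
print(V)                 # one column, proportional to (1, 1)/sqrt(2)
```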
Properties
- Eigenvalues are roots of the characteristic polynomial: \(\lambda\) is an eigenvalue of \(\mathbf{A}\) iff \(\det(\mathbf{A}-\lambda\mathbf{I})=0\), i.e. iff \(p_{\mathbf{A}}(\lambda)=0\).
- Symmetric matrices: If \(\mathbf{A}\) is real symmetric, all eigenvalues are real.
- Trace and determinant (with multiplicities): If \(\lambda_1,\ldots,\lambda_n\) are the eigenvalues counted with algebraic multiplicity, then \[\mathrm{Tr}(\mathbf{A})=\sum_{i=1}^n\lambda_i,\qquad \det(\mathbf{A})=\prod_{i=1}^n\lambda_i.\]
- If \(\mathbf{A}\) has \(n\) distinct eigenvalues, the corresponding eigenvectors are linearly independent; they form a basis of \(\mathbb{R}^n\) (or \(\mathbb{C}^n\) if the eigenvalues are complex) and \(\mathbf{A}\) is diagonalisable.
- Diagonal and triangular matrices: If \(\mathbf{A}\) is diagonal (or, more generally, triangular), its eigenvalues are exactly its diagonal entries.
- Scaling: The scaled matrix \(c\mathbf{A}\) (scalar \(c\)) has eigenvalues \(c\lambda_i\) with the same eigenvectors \(\mathbf{x}_i\).
Proof: \(\mathbf{Ax}_i=\lambda_i\mathbf{x}_i \Rightarrow (c\mathbf{A})\mathbf{x}_i=(c\lambda_i)\mathbf{x}_i.\)
- Shifting by the identity: \(\mathbf{A}+c\mathbf{I}\) has eigenvalues \(\lambda_i+c\) with the same eigenvectors.
Proof: \((\mathbf{A}+c\mathbf{I})\mathbf{x}_i=\mathbf{A}\mathbf{x}_i+c\mathbf{x}_i=(\lambda_i+c)\mathbf{x}_i.\)
- Powers: \(\mathbf{A}^t\) has eigenvalues \(\lambda_i^t\) and eigenvectors \(\mathbf{x}_i\) (integer \(t\ge 0\)). More generally, whenever \(f(\mathbf{A})\) is defined via a polynomial \(f\), the eigenvalues transform as \(f(\lambda_i)\) on the same eigenvectors.
- Inverse: If \(\mathbf{A}^{-1}\) exists, it has eigenvalues \(\frac{1}{\lambda_i}\) with the same eigenvectors (all \(\lambda_i\neq 0\) in this case, so the reciprocals are well defined).
Proof: \[\begin{array}{rrl} &\mathbf{A}\mathbf{x}_i&=\lambda_i\mathbf{x}_i\\ \Leftrightarrow&\mathbf{A}^{-1}\mathbf{A}\mathbf{x}_i&=\mathbf{A}^{-1}\lambda_i\mathbf{x}_i\\ \Leftrightarrow&\mathbf{x}_i&=\lambda_i\mathbf{A}^{-1}\mathbf{x}_i\\ \Leftrightarrow&\mathbf{A}^{-1}\mathbf{x}_i&=\frac{1}{\lambda_i}\mathbf{x}_i\quad\square \end{array}\]
- Transpose: \(\mathbf{A}\) and \(\mathbf{A}^T\) have the same eigenvalues (over \(\mathbb{C}\)).
- Similarity invariance: If \(\mathbf{B}=\mathbf{S}^{-1}\mathbf{A}\mathbf{S}\) for an invertible \(\mathbf{S}\), then \(\mathbf{A}\) and \(\mathbf{B}\) have the same characteristic polynomial and the same eigenvalues.
- Products \(\mathbf{AB}\) and \(\mathbf{BA}\): The nonzero eigenvalues of \(\mathbf{AB}\) equal the nonzero eigenvalues of \(\mathbf{BA}\) (with multiplicities). In particular, \(\mathrm{Tr}(\mathbf{AB})=\mathrm{Tr}(\mathbf{BA})\).
- Real matrices and complex conjugate pairs: If \(\mathbf{A}\) is real and has a non-real eigenvalue \(\lambda\in\mathbb{C}\setminus\mathbb{R}\), then \(\overline{\lambda}\) is also an eigenvalue (with the same multiplicity).
- Orthogonality of eigenvectors (symmetric case): If \(\mathbf{A}\) is real symmetric, then eigenvectors corresponding to distinct eigenvalues are orthogonal and can be chosen to form an orthonormal basis. In particular, they can be used as axes, e.g. as the principal axes when plotting the ellipse of a covariance matrix.
- Positive (semi-)definite matrices:
- If \(\mathbf{A}\) is positive definite (\(\mathbf{x}^T\mathbf{A}\mathbf{x}>0\) for all \(\mathbf{x}\neq\mathbf{0}\)), then all eigenvalues satisfy \(\lambda>0\).
- If \(\mathbf{A}\) is positive semidefinite (\(\mathbf{x}^T\mathbf{A}\mathbf{x}\ge 0\) for all \(\mathbf{x}\)), then all eigenvalues satisfy \(\lambda\ge 0\).
- Multiplicity (conceptual clarity): An eigenvalue may occur multiple times as a root of \(p_{\mathbf{A}}(\lambda)\); this is its algebraic multiplicity. The dimension of its eigenspace \(\mathcal{N}(\mathbf{A}-\lambda\mathbf{I})\) is its geometric multiplicity. Always \(1\le \text{geometric}\le \text{algebraic}\).
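Several of these properties can be verified numerically. The following is a small sanity-check sketch (assuming NumPy, with a symmetric positive definite matrix built for convenience so that all eigenvalues are real and positive):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M.T @ M + np.eye(3)            # symmetric positive definite by construction
eigvals = np.linalg.eigvalsh(A)    # real eigenvalues, in ascending order

# Trace and determinant equal the sum and product of the eigenvalues.
assert np.isclose(np.trace(A), eigvals.sum())
assert np.isclose(np.linalg.det(A), eigvals.prod())

# Positive definite => all eigenvalues strictly positive.
assert np.all(eigvals > 0)

# Scaling by c and shifting by c*I act directly on the eigenvalues.
c = 2.5
assert np.allclose(np.linalg.eigvalsh(c * A), c * eigvals)
assert np.allclose(np.linalg.eigvalsh(A + c * np.eye(3)), eigvals + c)

# The inverse has eigenvalues 1/lambda_i (sorted for comparison).
assert np.allclose(np.linalg.eigvalsh(np.linalg.inv(A)), np.sort(1.0 / eigvals))
```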
Orthogonal Matrices
An orthogonal matrix is a real square matrix \(\mathbf{A}\) such that
\[\mathbf{A}^T=\mathbf{A}^{-1}.\]
Equivalently,
\[\mathbf{A}\mathbf{A}^T=\mathbf{I}\quad\text{and}\quad \mathbf{A}^T\mathbf{A}=\mathbf{I}.\]
This means the rows (and columns) of \(\mathbf{A}\) form an orthonormal set:
- the inner product of any two different rows is \(0\),
- the inner product of any row with itself is \(1\).
Geometrically, orthogonal matrices represent length- and angle-preserving transformations (rotations and reflections). In particular, \(\|\mathbf{Ax}\|_2=\|\mathbf{x}\|_2\) for all \(\mathbf{x}\).
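A minimal sketch of these facts, using a \(2\times 2\) rotation matrix as the standard example of an orthogonal matrix:

```python
import numpy as np

# A 2x2 rotation matrix is orthogonal.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Q^T Q = Q Q^T = I, i.e. Q^T is the inverse of Q.
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(Q @ Q.T, np.eye(2))

# Lengths are preserved: ||Qx|| = ||x|| for any x.
x = np.array([3.0, -1.0])
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))
```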
Eigenvalue Decomposition (EVD)
If \(\mathbf{A}\in\mathbb{R}^{n\times n}\) has \(n\) linearly independent eigenvectors \(\mathbf{q}_i\) (for \(i=1,\ldots,n\)) and is therefore diagonalisable, then \(\mathbf{A}\) can be factorized as
\[\mathbf{A} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^{-1}\]
where \(\mathbf{Q}\) is an \(n\times n\) matrix whose \(i\)th column is the eigenvector \(\mathbf{q}_i\) of \(\mathbf{A}\), and \(\mathbf{\Lambda}\) is the diagonal matrix whose diagonal elements are the corresponding eigenvalues, \(\Lambda_{ii}=\lambda_i\). The decomposition can be derived directly from the defining equation for eigenvalues and eigenvectors:
\[\begin{array}{rrl} &\mathbf{A} \mathbf{q}_i &= \lambda_i \mathbf{q}_i \quad (i=1,\ldots,n)\\ \Leftrightarrow&\mathbf{A} \mathbf{Q} &= \mathbf{Q} \mathbf{\Lambda} \\ \Leftrightarrow&\mathbf{A} &= \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^{-1} . \end{array}\]
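A short numerical illustration of the factorization (assuming NumPy; the matrix is an arbitrary diagonalisable example with distinct eigenvalues):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Columns of Q are eigenvectors; Lambda holds the eigenvalues on its diagonal.
eigvals, Q = np.linalg.eig(A)
Lam = np.diag(eigvals)

# A = Q Lambda Q^{-1}
assert np.allclose(A, Q @ Lam @ np.linalg.inv(Q))
```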
Spectral theorem (important special case): If \(\mathbf{A}\) is real symmetric, it is always diagonalisable with an orthogonal eigenvector matrix. Hence
\[\mathbf{A}=\mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T,\qquad \mathbf{Q}^T\mathbf{Q}=\mathbf{I}.\]
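For the symmetric case, `np.linalg.eigh` exploits the symmetry and returns an orthogonal eigenvector matrix directly; a brief sketch:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # real symmetric

# eigh returns real eigenvalues and an orthogonal eigenvector matrix Q.
eigvals, Q = np.linalg.eigh(A)

assert np.allclose(Q.T @ Q, np.eye(2))               # Q is orthogonal
assert np.allclose(A, Q @ np.diag(eigvals) @ Q.T)    # A = Q Lambda Q^T
```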
Applications of Eigenvalues and Eigenvectors
- Principal component analysis (PCA)
- Powers of a diagonalizable matrix (fast computation of \(\mathbf{A}^t\); see the sketch after this list)
- Quadratic forms and ellipses/ellipsoids (axes and scaling)
- Stability analysis of linear systems and differential equations
- Graph algorithms (e.g. Laplacians, spectral clustering)
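As a minimal sketch of the matrix-power application, assuming a diagonalisable \(\mathbf{A}\): from the EVD, \(\mathbf{A}^t=\mathbf{Q}\mathbf{\Lambda}^t\mathbf{Q}^{-1}\), so a matrix power reduces to powering the scalar eigenvalues:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # diagonalisable (distinct eigenvalues)
t = 10

eigvals, Q = np.linalg.eig(A)
# A^t = Q Lambda^t Q^{-1}; only the scalar eigenvalues are raised to the power t.
A_pow = Q @ np.diag(eigvals ** t) @ np.linalg.inv(Q)

# Agrees with repeated matrix multiplication.
assert np.allclose(A_pow, np.linalg.matrix_power(A, t))
```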