Linear algebra is the branch of mathematics concerned with vector spaces, linear mappings, and systems of linear equations.
It is foundational to virtually every area of applied mathematics, data science, engineering, and physics.
1. Vectors ¶ A vector v ∈ R n \mathbf{v} \in \mathbb{R}^n v ∈ R n is an ordered n n n -tuple of real numbers.
1.1 Operations ¶ For u , v ∈ R n \mathbf{u}, \mathbf{v} \in \mathbb{R}^n u , v ∈ R n and scalar c c c :
u + v = ( u 1 + v 1 , … , u n + v n ) c v = ( c v 1 , … , c v n ) \begin{align}
\mathbf{u} + \mathbf{v} &= (u_1+v_1,\; \ldots,\; u_n+v_n) \\
c\,\mathbf{v} &= (cv_1,\; \ldots,\; cv_n)
\end{align} u + v c v = ( u 1 + v 1 , … , u n + v n ) = ( c v 1 , … , c v n ) Dot product :
u ⋅ v = ∑ i = 1 n u i v i = ∥ u ∥ ∥ v ∥ cos θ \mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^n u_i v_i = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta u ⋅ v = i = 1 ∑ n u i v i = ∥ u ∥ ∥ v ∥ cos θ where θ \theta θ is the angle between the vectors.
Euclidean norm :
∥ v ∥ = v ⋅ v = v 1 2 + ⋯ + v n 2 \|\mathbf{v}\| = \sqrt{\mathbf{v}\cdot\mathbf{v}} = \sqrt{v_1^2 + \cdots + v_n^2} ∥ v ∥ = v ⋅ v = v 1 2 + ⋯ + v n 2 Cross product (only in R 3 \mathbb{R}^3 R 3 ):
u × v = ∣ i j k u 1 u 2 u 3 v 1 v 2 v 3 ∣ \mathbf{u} \times \mathbf{v} =
\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} u × v = ∣ ∣ i u 1 v 1 j u 2 v 2 k u 3 v 3 ∣ ∣ The magnitude ∥ u × v ∥ = ∥ u ∥ ∥ v ∥ sin θ \|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\|\,\|\mathbf{v}\|\sin\theta ∥ u × v ∥ = ∥ u ∥ ∥ v ∥ sin θ , equal to the area of the parallelogram spanned by u \mathbf{u} u and v \mathbf{v} v .
1.2 Linear Independence ¶ Vectors { v 1 , … , v k } \{\mathbf{v}_1, \ldots, \mathbf{v}_k\} { v 1 , … , v k } are linearly independent if the only solution to
c 1 v 1 + c 2 v 2 + ⋯ + c k v k = 0 c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k = \mathbf{0} c 1 v 1 + c 2 v 2 + ⋯ + c k v k = 0 is c 1 = c 2 = ⋯ = c k = 0 c_1 = c_2 = \cdots = c_k = 0 c 1 = c 2 = ⋯ = c k = 0 . Otherwise they are linearly dependent .
The span of a set of vectors is the set of all linear combinations. A basis of a vector space is a linearly independent spanning set; its cardinality is the dimension of the space.
2. Matrices ¶ An m × n m \times n m × n matrix A A A has m m m rows and n n n columns:
A = ( a 11 a 12 ⋯ a 1 n a 21 a 22 ⋯ a 2 n ⋮ ⋱ ⋮ a m 1 a m 2 ⋯ a m n ) A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} A = ⎝ ⎛ a 11 a 21 ⋮ a m 1 a 12 a 22 a m 2 ⋯ ⋯ ⋱ ⋯ a 1 n a 2 n ⋮ a mn ⎠ ⎞ 2.1 Basic Operations ¶ Addition : ( A + B ) i j = a i j + b i j (A + B)_{ij} = a_{ij} + b_{ij} ( A + B ) ij = a ij + b ij (same dimensions required)
Scalar multiplication : ( c A ) i j = c a i j (cA)_{ij} = c\,a_{ij} ( c A ) ij = c a ij
Transpose : ( A T ) i j = a j i (A^T)_{ij} = a_{ji} ( A T ) ij = a ji
Matrix multiplication : ( A B ) i j = ∑ k = 1 n a i k b k j (AB)_{ij} = \displaystyle\sum_{k=1}^n a_{ik}b_{kj} ( A B ) ij = k = 1 ∑ n a ik b kj (requires A A A is m × n m\times n m × n , B B B is n × p n\times p n × p )
2.2 Special Matrices ¶ Name Property Identity I I I A I = I A = A AI = IA = A A I = I A = A Symmetric A T = A A^T = A A T = A Skew-symmetric A T = − A A^T = -A A T = − A Orthogonal A T A = I A^T A = I A T A = I (preserves lengths and angles)Diagonal a i j = 0 a_{ij} = 0 a ij = 0 for i ≠ j i \neq j i = j Triangular All entries above (lower) or below (upper) diagonal are zero
3. Determinants ¶ The determinant det ( A ) \det(A) det ( A ) (or ∣ A ∣ |A| ∣ A ∣ ) is a scalar associated with a square matrix.
2 × 2 2 \times 2 2 × 2 case :
det ( a b c d ) = a d − b c \det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc det ( a c b d ) = a d − b c 3 × 3 3 \times 3 3 × 3 case (cofactor expansion along the first row):
det ( A ) = a 11 ( a 22 a 33 − a 23 a 32 ) − a 12 ( a 21 a 33 − a 23 a 31 ) + a 13 ( a 21 a 32 − a 22 a 31 ) \det(A) = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) det ( A ) = a 11 ( a 22 a 33 − a 23 a 32 ) − a 12 ( a 21 a 33 − a 23 a 31 ) + a 13 ( a 21 a 32 − a 22 a 31 ) Key properties ¶ det ( A B ) = det ( A ) det ( B ) \det(AB) = \det(A)\det(B) det ( A B ) = det ( A ) det ( B )
det ( A T ) = det ( A ) \det(A^T) = \det(A) det ( A T ) = det ( A )
det ( A − 1 ) = 1 / det ( A ) \det(A^{-1}) = 1/\det(A) det ( A − 1 ) = 1/ det ( A ) (when A A A is invertible)
Swapping two rows negates the determinant
If any row is a linear combination of the others, det ( A ) = 0 \det(A) = 0 det ( A ) = 0
A matrix A A A is invertible (nonsingular) if and only if det ( A ) ≠ 0 \det(A) \neq 0 det ( A ) = 0 .
The inverse is then:
A − 1 = 1 det ( A ) adj ( A ) A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A) A − 1 = det ( A ) 1 adj ( A ) where adj ( A ) \operatorname{adj}(A) adj ( A ) is the transpose of the cofactor matrix.
4. Systems of Linear Equations ¶ A system of m m m equations in n n n unknowns can be written as A x = b A\mathbf{x} = \mathbf{b} A x = b .
4.1 Gaussian Elimination ¶ Form the augmented matrix [ A ∣ b ] [A \mid \mathbf{b}] [ A ∣ b ] and reduce to row-echelon form using:
Swap two rows
Multiply a row by a nonzero scalar
Add a multiple of one row to another
Continue to reduced row-echelon form (RREF) for a unique representation.
4.2 Solution Types ¶ Unique solution : rank ( A ) = rank ( [ A ∣ b ] ) = n \operatorname{rank}(A) = \operatorname{rank}([A|\mathbf{b}]) = n rank ( A ) = rank ([ A ∣ b ]) = n
Infinitely many solutions : rank ( A ) = rank ( [ A ∣ b ] ) < n \operatorname{rank}(A) = \operatorname{rank}([A|\mathbf{b}]) < n rank ( A ) = rank ([ A ∣ b ]) < n
No solution : rank ( A ) < rank ( [ A ∣ b ] ) \operatorname{rank}(A) < \operatorname{rank}([A|\mathbf{b}]) rank ( A ) < rank ([ A ∣ b ])
4.3 Cramer’s Rule (n × n n \times n n × n , det ( A ) ≠ 0 \det(A) \neq 0 det ( A ) = 0 ) ¶ x i = det ( A i ) det ( A ) x_i = \frac{\det(A_i)}{\det(A)} x i = det ( A ) det ( A i ) where A i A_i A i is A A A with its i i i -th column replaced by b \mathbf{b} b .
4.4 LU Decomposition ¶ For an n × n n \times n n × n matrix: A = L U A = LU A = LU where L L L is lower triangular with unit diagonal and U U U is upper triangular.
Solve by forward substitution (L y = b L\mathbf{y} = \mathbf{b} L y = b ) then back substitution (U x = y U\mathbf{x} = \mathbf{y} U x = y ).
This is the basis for most numerical linear solvers.
5. Eigenvalues and Eigenvectors ¶ A scalar λ \lambda λ and nonzero vector v \mathbf{v} v satisfy the eigenvector equation :
A v = λ v A\mathbf{v} = \lambda\mathbf{v} A v = λ v λ \lambda λ is an eigenvalue of A A A and v \mathbf{v} v is the corresponding eigenvector .
5.1 Characteristic Equation ¶ Eigenvalues are found by solving:
det ( A − λ I ) = 0 \det(A - \lambda I) = 0 det ( A − λ I ) = 0 The left side is the characteristic polynomial of degree n n n , so an n × n n \times n n × n matrix has exactly n n n eigenvalues (counted with multiplicity, possibly complex).
5.2 Properties ¶ tr ( A ) = ∑ i = 1 n λ i \operatorname{tr}(A) = \displaystyle\sum_{i=1}^n \lambda_i tr ( A ) = i = 1 ∑ n λ i
det ( A ) = ∏ i = 1 n λ i \det(A) = \displaystyle\prod_{i=1}^n \lambda_i det ( A ) = i = 1 ∏ n λ i
If A A A is real symmetric, all eigenvalues are real and eigenvectors of distinct eigenvalues are orthogonal
5.3 Diagonalization ¶ If A A A has n n n linearly independent eigenvectors forming the columns of P P P :
A = P Λ P − 1 A = P \Lambda P^{-1} A = P Λ P − 1 where Λ = diag ( λ 1 , … , λ n ) \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) Λ = diag ( λ 1 , … , λ n ) .
This gives A k = P Λ k P − 1 A^k = P \Lambda^k P^{-1} A k = P Λ k P − 1 , which is efficient for large powers.
6. Vector Spaces and Subspaces ¶ A vector space over R \mathbb{R} R is a set V V V closed under addition and scalar multiplication and satisfying the eight axioms (associativity, commutativity, identity, inverses, distributivity).
Important subspaces associated with an m × n m \times n m × n matrix A A A :
Subspace Definition Dimension Column space col ( A ) \operatorname{col}(A) col ( A ) Span of columns of A A A rank ( A ) \operatorname{rank}(A) rank ( A ) Row space row ( A ) \operatorname{row}(A) row ( A ) Span of rows of A A A rank ( A ) \operatorname{rank}(A) rank ( A ) Null space null ( A ) \operatorname{null}(A) null ( A ) { x : A x = 0 } \{\mathbf{x} : A\mathbf{x} = \mathbf{0}\} { x : A x = 0 } n − rank ( A ) n - \operatorname{rank}(A) n − rank ( A )
Rank-Nullity Theorem : rank ( A ) + nullity ( A ) = n \operatorname{rank}(A) + \operatorname{nullity}(A) = n rank ( A ) + nullity ( A ) = n .
7. Inner Product Spaces and Orthogonality ¶ An inner product on V V V generalises the dot product. The standard inner product on R n \mathbb{R}^n R n is:
⟨ u , v ⟩ = u T v \langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T \mathbf{v} ⟨ u , v ⟩ = u T v Gram–Schmidt Orthogonalisation ¶ Given a basis { v 1 , … , v k } \{\mathbf{v}_1, \ldots, \mathbf{v}_k\} { v 1 , … , v k } , produce an orthonormal basis { e 1 , … , e k } \{\mathbf{e}_1, \ldots, \mathbf{e}_k\} { e 1 , … , e k } :
u 1 = v 1 u j = v j − ∑ i = 1 j − 1 ⟨ v j , u i ⟩ ⟨ u i , u i ⟩ u i \begin{align}
\mathbf{u}_1 &= \mathbf{v}_1 \\
\mathbf{u}_j &= \mathbf{v}_j - \sum_{i=1}^{j-1} \frac{\langle \mathbf{v}_j, \mathbf{u}_i \rangle}{\langle \mathbf{u}_i, \mathbf{u}_i \rangle}\,\mathbf{u}_i
\end{align} u 1 u j = v 1 = v j − i = 1 ∑ j − 1 ⟨ u i , u i ⟩ ⟨ v j , u i ⟩ u i then e j = u j / ∥ u j ∥ \mathbf{e}_j = \mathbf{u}_j / \|\mathbf{u}_j\| e j = u j /∥ u j ∥ .
QR Decomposition ¶ Any m × n m \times n m × n matrix A A A (with m ≥ n m \geq n m ≥ n and full column rank) can be written as A = Q R A = QR A = QR where Q Q Q has orthonormal columns and R R R is upper triangular. This is useful for solving least-squares problems.
8. Singular Value Decomposition (SVD) ¶ Every m × n m \times n m × n real matrix A A A can be decomposed as:
A = U Σ V T A = U \Sigma V^T A = U Σ V T where:
U U U is m × m m \times m m × m orthogonal (left singular vectors)
Σ \Sigma Σ is m × n m \times n m × n diagonal with non-negative entries σ 1 ≥ σ 2 ≥ ⋯ ≥ 0 \sigma_1 \geq \sigma_2 \geq \cdots \geq 0 σ 1 ≥ σ 2 ≥ ⋯ ≥ 0 (singular values)
V V V is n × n n \times n n × n orthogonal (right singular vectors)
The SVD generalises the eigendecomposition to non-square matrices and is widely used for dimensionality reduction, pseudoinverse computation, and data compression.
The rank of A A A equals the number of nonzero singular values.
The pseudoinverse (Moore–Penrose) is A + = V Σ + U T A^+ = V \Sigma^+ U^T A + = V Σ + U T .
A function T : V → W T: V \to W T : V → W between vector spaces is a linear transformation if for all u , v ∈ V \mathbf{u}, \mathbf{v} \in V u , v ∈ V and scalar c c c :
T ( u + v ) = T ( u ) + T ( v ) T ( c u ) = c T ( u ) \begin{align}
T(\mathbf{u} + \mathbf{v}) &= T(\mathbf{u}) + T(\mathbf{v}) \\
T(c\,\mathbf{u}) &= c\,T(\mathbf{u})
\end{align} T ( u + v ) T ( c u ) = T ( u ) + T ( v ) = c T ( u ) Every linear transformation from R n \mathbb{R}^n R n to R m \mathbb{R}^m R m can be represented by an m × n m \times n m × n matrix A A A such that T ( x ) = A x T(\mathbf{x}) = A\mathbf{x} T ( x ) = A x .
9.1 Kernel and Image ¶ Kernel (null space): ker ( T ) = { v ∈ V : T ( v ) = 0 } \ker(T) = \{\mathbf{v} \in V : T(\mathbf{v}) = \mathbf{0}\} ker ( T ) = { v ∈ V : T ( v ) = 0 }
Image (range): im ( T ) = { T ( v ) : v ∈ V } \operatorname{im}(T) = \{T(\mathbf{v}) : \mathbf{v} \in V\} im ( T ) = { T ( v ) : v ∈ V }
Rank-Nullity Theorem for Transformations : dim ( ker T ) + dim ( im T ) = dim V \dim(\ker T) + \dim(\operatorname{im} T) = \dim V dim ( ker T ) + dim ( im T ) = dim V .
Transformation Matrix Rotation by θ \theta θ ( cos θ − sin θ sin θ cos θ ) \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} ( cos θ sin θ − sin θ cos θ ) Reflection across x x x -axis ( 1 0 0 − 1 ) \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} ( 1 0 0 − 1 ) Scaling by ( s x , s y ) (s_x, s_y) ( s x , s y ) ( s x 0 0 s y ) \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix} ( s x 0 0 s y ) Shear (horizontal) ( 1 k 0 1 ) \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix} ( 1 0 k 1 ) Projection onto x x x -axis ( 1 0 0 0 ) \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} ( 1 0 0 0 )
9.3 Change of Basis ¶ If P P P is the change-of-basis matrix from basis B ′ \mathcal{B}' B ′ to B \mathcal{B} B , then the matrix of T T T in the new basis is:
[ T ] B ′ = P − 1 [ T ] B P [T]_{\mathcal{B}'} = P^{-1} [T]_{\mathcal{B}}\, P [ T ] B ′ = P − 1 [ T ] B P Two matrices are similar (A ∼ B A \sim B A ∼ B ) if B = P − 1 A P B = P^{-1}AP B = P − 1 A P for some invertible P P P . Similar matrices represent the same linear transformation in different bases and share the same eigenvalues, determinant, and trace.
10. Positive Definite Matrices ¶ A real symmetric n × n n \times n n × n matrix A A A is positive definite if:
x T A x > 0 for all x ≠ 0 \mathbf{x}^T A \mathbf{x} > 0 \quad \text{for all } \mathbf{x} \neq \mathbf{0} x T A x > 0 for all x = 0 Equivalent Conditions ¶ Any one of these implies the rest:
All eigenvalues of A A A are strictly positive
All leading principal minors are positive (Sylvester’s criterion)
There exists a matrix R R R with A = R T R A = R^T R A = R T R (Cholesky decomposition)
All pivots (in Gaussian elimination without row swaps) are positive
Positive semi-definite allows x T A x ≥ 0 \mathbf{x}^T A \mathbf{x} \geq 0 x T A x ≥ 0 and eigenvalues ≥ 0 \geq 0 ≥ 0 .
Cholesky Decomposition ¶ For positive definite A A A , there exists a unique lower-triangular matrix L L L with positive diagonal entries such that:
This is roughly twice as efficient as LU decomposition and is used extensively in numerical optimization, statistics, and simulation.
11. Matrix Norms and Condition Numbers ¶ 11.1 Vector Norms ¶ The general p p p -norm:
∥ x ∥ p = ( ∑ i = 1 n ∣ x i ∣ p ) 1 / p \|\mathbf{x}\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p} ∥ x ∥ p = ( i = 1 ∑ n ∣ x i ∣ p ) 1/ p Special cases: ∥ x ∥ 1 \|\mathbf{x}\|_1 ∥ x ∥ 1 (Manhattan), ∥ x ∥ 2 \|\mathbf{x}\|_2 ∥ x ∥ 2 (Euclidean), ∥ x ∥ ∞ = max i ∣ x i ∣ \|\mathbf{x}\|_\infty = \max_i |x_i| ∥ x ∥ ∞ = max i ∣ x i ∣ (max norm).
11.2 Induced Matrix Norms ¶ The operator norm induced by a vector norm:
∥ A ∥ = max x ≠ 0 ∥ A x ∥ ∥ x ∥ \|A\| = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|} ∥ A ∥ = x = 0 max ∥ x ∥ ∥ A x ∥ ∥ A ∥ 2 = σ 1 \|A\|_2 = \sigma_1 ∥ A ∥ 2 = σ 1 (largest singular value)
∥ A ∥ F = ∑ i , j a i j 2 = tr ( A T A ) \|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2} = \sqrt{\operatorname{tr}(A^T A)} ∥ A ∥ F = ∑ i , j a ij 2 = tr ( A T A ) (Frobenius norm)
11.3 Condition Number ¶ κ ( A ) = ∥ A ∥ ⋅ ∥ A − 1 ∥ = σ max σ min \kappa(A) = \|A\| \cdot \|A^{-1}\| = \frac{\sigma_{\max}}{\sigma_{\min}} κ ( A ) = ∥ A ∥ ⋅ ∥ A − 1 ∥ = σ m i n σ m a x A large condition number means the system A x = b A\mathbf{x} = \mathbf{b} A x = b is ill-conditioned : small perturbations in b \mathbf{b} b cause large changes in x \mathbf{x} x .
12. Applications ¶ 12.1 Least Squares ¶ Given an overdetermined system A x ≈ b A\mathbf{x} \approx \mathbf{b} A x ≈ b (m > n m > n m > n ), the least-squares solution minimises ∥ A x − b ∥ 2 2 \|A\mathbf{x} - \mathbf{b}\|_2^2 ∥ A x − b ∥ 2 2 :
x ^ = ( A T A ) − 1 A T b \hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b} x ^ = ( A T A ) − 1 A T b This is the foundation of linear regression . Numerically, it is better solved via QR or SVD.
12.2 Principal Component Analysis (PCA) ¶ Given data matrix X X X (n n n samples, p p p features), centred so each column has mean zero:
Compute the covariance matrix C = 1 n − 1 X T X C = \frac{1}{n-1} X^T X C = n − 1 1 X T X
Find eigenvalues λ 1 ≥ ⋯ ≥ λ p \lambda_1 \geq \cdots \geq \lambda_p λ 1 ≥ ⋯ ≥ λ p and eigenvectors of C C C
Project data onto the top k k k eigenvectors for dimensionality reduction
The fraction of variance captured by the first k k k components is ∑ i = 1 k λ i / ∑ i = 1 p λ i \sum_{i=1}^k \lambda_i \big/ \sum_{i=1}^p \lambda_i ∑ i = 1 k λ i / ∑ i = 1 p λ i .
12.3 Markov Chains ¶ A Markov chain with n n n states is described by a transition matrix P P P where P i j P_{ij} P ij is the probability of moving from state i i i to state j j j . The columns (or rows, by convention) sum to 1.
The steady-state distribution π \boldsymbol{\pi} π satisfies P T π = π P^T \boldsymbol{\pi} = \boldsymbol{\pi} P T π = π , i.e., it is an eigenvector of P T P^T P T for eigenvalue 1.
12.4 Graph Theory ¶ The adjacency matrix A A A of a graph with n n n vertices has A i j = 1 A_{ij} = 1 A ij = 1 if vertices i i i and j j j are connected. The Laplacian L = D − A L = D - A L = D − A (where D D D is the diagonal degree matrix) encodes spectral properties: