Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Linear algebra is the branch of mathematics concerned with vector spaces, linear mappings, and systems of linear equations. It is foundational to virtually every area of applied mathematics, data science, engineering, and physics.


1. Vectors

A vector vRn\mathbf{v} \in \mathbb{R}^n is an ordered nn-tuple of real numbers.

1.1 Operations

For u,vRn\mathbf{u}, \mathbf{v} \in \mathbb{R}^n and scalar cc:

u+v=(u1+v1,  ,  un+vn)cv=(cv1,  ,  cvn)\begin{align} \mathbf{u} + \mathbf{v} &= (u_1+v_1,\; \ldots,\; u_n+v_n) \\ c\,\mathbf{v} &= (cv_1,\; \ldots,\; cv_n) \end{align}

Dot product:

uv=i=1nuivi=uvcosθ\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^n u_i v_i = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta

where θ\theta is the angle between the vectors.

Euclidean norm:

v=vv=v12++vn2\|\mathbf{v}\| = \sqrt{\mathbf{v}\cdot\mathbf{v}} = \sqrt{v_1^2 + \cdots + v_n^2}

Cross product (only in R3\mathbb{R}^3):

u×v=ijku1u2u3v1v2v3\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}

The magnitude u×v=uvsinθ\|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\|\,\|\mathbf{v}\|\sin\theta, equal to the area of the parallelogram spanned by u\mathbf{u} and v\mathbf{v}.

1.2 Linear Independence

Vectors {v1,,vk}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} are linearly independent if the only solution to

c1v1+c2v2++ckvk=0c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k = \mathbf{0}

is c1=c2==ck=0c_1 = c_2 = \cdots = c_k = 0. Otherwise they are linearly dependent.

The span of a set of vectors is the set of all linear combinations. A basis of a vector space is a linearly independent spanning set; its cardinality is the dimension of the space.


2. Matrices

An m×nm \times n matrix AA has mm rows and nn columns:

A=(a11a12a1na21a22a2nam1am2amn)A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}

2.1 Basic Operations

  • Addition: (A+B)ij=aij+bij(A + B)_{ij} = a_{ij} + b_{ij} (same dimensions required)

  • Scalar multiplication: (cA)ij=caij(cA)_{ij} = c\,a_{ij}

  • Transpose: (AT)ij=aji(A^T)_{ij} = a_{ji}

  • Matrix multiplication: (AB)ij=k=1naikbkj(AB)_{ij} = \displaystyle\sum_{k=1}^n a_{ik}b_{kj} (requires AA is m×nm\times n, BB is n×pn\times p)

2.2 Special Matrices

NameProperty
Identity IIAI=IA=AAI = IA = A
SymmetricAT=AA^T = A
Skew-symmetricAT=AA^T = -A
OrthogonalATA=IA^T A = I (preserves lengths and angles)
Diagonalaij=0a_{ij} = 0 for iji \neq j
TriangularAll entries above (lower) or below (upper) diagonal are zero

3. Determinants

The determinant det(A)\det(A) (or A|A|) is a scalar associated with a square matrix.

2×22 \times 2 case:

det(abcd)=adbc\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc

3×33 \times 3 case (cofactor expansion along the first row):

det(A)=a11(a22a33a23a32)a12(a21a33a23a31)+a13(a21a32a22a31)\det(A) = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})

Key properties

  • det(AB)=det(A)det(B)\det(AB) = \det(A)\det(B)

  • det(AT)=det(A)\det(A^T) = \det(A)

  • det(A1)=1/det(A)\det(A^{-1}) = 1/\det(A) (when AA is invertible)

  • Swapping two rows negates the determinant

  • If any row is a linear combination of the others, det(A)=0\det(A) = 0

A matrix AA is invertible (nonsingular) if and only if det(A)0\det(A) \neq 0. The inverse is then:

A1=1det(A)adj(A)A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)

where adj(A)\operatorname{adj}(A) is the transpose of the cofactor matrix.


4. Systems of Linear Equations

A system of mm equations in nn unknowns can be written as Ax=bA\mathbf{x} = \mathbf{b}.

4.1 Gaussian Elimination

Form the augmented matrix [Ab][A \mid \mathbf{b}] and reduce to row-echelon form using:

  1. Swap two rows

  2. Multiply a row by a nonzero scalar

  3. Add a multiple of one row to another

Continue to reduced row-echelon form (RREF) for a unique representation.

4.2 Solution Types

  • Unique solution: rank(A)=rank([Ab])=n\operatorname{rank}(A) = \operatorname{rank}([A|\mathbf{b}]) = n

  • Infinitely many solutions: rank(A)=rank([Ab])<n\operatorname{rank}(A) = \operatorname{rank}([A|\mathbf{b}]) < n

  • No solution: rank(A)<rank([Ab])\operatorname{rank}(A) < \operatorname{rank}([A|\mathbf{b}])

4.3 Cramer’s Rule (n×nn \times n, det(A)0\det(A) \neq 0)

xi=det(Ai)det(A)x_i = \frac{\det(A_i)}{\det(A)}

where AiA_i is AA with its ii-th column replaced by b\mathbf{b}.

4.4 LU Decomposition

For an n×nn \times n matrix: A=LUA = LU where LL is lower triangular with unit diagonal and UU is upper triangular. Solve by forward substitution (Ly=bL\mathbf{y} = \mathbf{b}) then back substitution (Ux=yU\mathbf{x} = \mathbf{y}). This is the basis for most numerical linear solvers.


5. Eigenvalues and Eigenvectors

A scalar λ\lambda and nonzero vector v\mathbf{v} satisfy the eigenvector equation:

Av=λvA\mathbf{v} = \lambda\mathbf{v}

λ\lambda is an eigenvalue of AA and v\mathbf{v} is the corresponding eigenvector.

5.1 Characteristic Equation

Eigenvalues are found by solving:

det(AλI)=0\det(A - \lambda I) = 0

The left side is the characteristic polynomial of degree nn, so an n×nn \times n matrix has exactly nn eigenvalues (counted with multiplicity, possibly complex).

5.2 Properties

  • tr(A)=i=1nλi\operatorname{tr}(A) = \displaystyle\sum_{i=1}^n \lambda_i

  • det(A)=i=1nλi\det(A) = \displaystyle\prod_{i=1}^n \lambda_i

  • If AA is real symmetric, all eigenvalues are real and eigenvectors of distinct eigenvalues are orthogonal

5.3 Diagonalization

If AA has nn linearly independent eigenvectors forming the columns of PP:

A=PΛP1A = P \Lambda P^{-1}

where Λ=diag(λ1,,λn)\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n). This gives Ak=PΛkP1A^k = P \Lambda^k P^{-1}, which is efficient for large powers.


6. Vector Spaces and Subspaces

A vector space over R\mathbb{R} is a set VV closed under addition and scalar multiplication and satisfying the eight axioms (associativity, commutativity, identity, inverses, distributivity).

Important subspaces associated with an m×nm \times n matrix AA:

SubspaceDefinitionDimension
Column space col(A)\operatorname{col}(A)Span of columns of AArank(A)\operatorname{rank}(A)
Row space row(A)\operatorname{row}(A)Span of rows of AArank(A)\operatorname{rank}(A)
Null space null(A)\operatorname{null}(A){x:Ax=0}\{\mathbf{x} : A\mathbf{x} = \mathbf{0}\}nrank(A)n - \operatorname{rank}(A)

Rank-Nullity Theorem: rank(A)+nullity(A)=n\operatorname{rank}(A) + \operatorname{nullity}(A) = n.


7. Inner Product Spaces and Orthogonality

An inner product on VV generalises the dot product. The standard inner product on Rn\mathbb{R}^n is:

u,v=uTv\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T \mathbf{v}

Gram–Schmidt Orthogonalisation

Given a basis {v1,,vk}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}, produce an orthonormal basis {e1,,ek}\{\mathbf{e}_1, \ldots, \mathbf{e}_k\}:

u1=v1uj=vji=1j1vj,uiui,uiui\begin{align} \mathbf{u}_1 &= \mathbf{v}_1 \\ \mathbf{u}_j &= \mathbf{v}_j - \sum_{i=1}^{j-1} \frac{\langle \mathbf{v}_j, \mathbf{u}_i \rangle}{\langle \mathbf{u}_i, \mathbf{u}_i \rangle}\,\mathbf{u}_i \end{align}

then ej=uj/uj\mathbf{e}_j = \mathbf{u}_j / \|\mathbf{u}_j\|.

QR Decomposition

Any m×nm \times n matrix AA (with mnm \geq n and full column rank) can be written as A=QRA = QR where QQ has orthonormal columns and RR is upper triangular. This is useful for solving least-squares problems.


8. Singular Value Decomposition (SVD)

Every m×nm \times n real matrix AA can be decomposed as:

A=UΣVTA = U \Sigma V^T

where:

  • UU is m×mm \times m orthogonal (left singular vectors)

  • Σ\Sigma is m×nm \times n diagonal with non-negative entries σ1σ20\sigma_1 \geq \sigma_2 \geq \cdots \geq 0 (singular values)

  • VV is n×nn \times n orthogonal (right singular vectors)

The SVD generalises the eigendecomposition to non-square matrices and is widely used for dimensionality reduction, pseudoinverse computation, and data compression.

The rank of AA equals the number of nonzero singular values. The pseudoinverse (Moore–Penrose) is A+=VΣ+UTA^+ = V \Sigma^+ U^T.


9. Linear Transformations

A function T:VWT: V \to W between vector spaces is a linear transformation if for all u,vV\mathbf{u}, \mathbf{v} \in V and scalar cc:

T(u+v)=T(u)+T(v)T(cu)=cT(u)\begin{align} T(\mathbf{u} + \mathbf{v}) &= T(\mathbf{u}) + T(\mathbf{v}) \\ T(c\,\mathbf{u}) &= c\,T(\mathbf{u}) \end{align}

Every linear transformation from Rn\mathbb{R}^n to Rm\mathbb{R}^m can be represented by an m×nm \times n matrix AA such that T(x)=AxT(\mathbf{x}) = A\mathbf{x}.

9.1 Kernel and Image

  • Kernel (null space): ker(T)={vV:T(v)=0}\ker(T) = \{\mathbf{v} \in V : T(\mathbf{v}) = \mathbf{0}\}

  • Image (range): im(T)={T(v):vV}\operatorname{im}(T) = \{T(\mathbf{v}) : \mathbf{v} \in V\}

Rank-Nullity Theorem for Transformations: dim(kerT)+dim(imT)=dimV\dim(\ker T) + \dim(\operatorname{im} T) = \dim V.

9.2 Common Transformations in R2\mathbb{R}^2

TransformationMatrix
Rotation by θ\theta(cosθsinθsinθcosθ)\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
Reflection across xx-axis(1001)\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
Scaling by (sx,sy)(s_x, s_y)(sx00sy)\begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix}
Shear (horizontal)(1k01)\begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}
Projection onto xx-axis(1000)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}

9.3 Change of Basis

If PP is the change-of-basis matrix from basis B\mathcal{B}' to B\mathcal{B}, then the matrix of TT in the new basis is:

[T]B=P1[T]BP[T]_{\mathcal{B}'} = P^{-1} [T]_{\mathcal{B}}\, P

Two matrices are similar (ABA \sim B) if B=P1APB = P^{-1}AP for some invertible PP. Similar matrices represent the same linear transformation in different bases and share the same eigenvalues, determinant, and trace.


10. Positive Definite Matrices

A real symmetric n×nn \times n matrix AA is positive definite if:

xTAx>0for all x0\mathbf{x}^T A \mathbf{x} > 0 \quad \text{for all } \mathbf{x} \neq \mathbf{0}

Equivalent Conditions

Any one of these implies the rest:

  • All eigenvalues of AA are strictly positive

  • All leading principal minors are positive (Sylvester’s criterion)

  • There exists a matrix RR with A=RTRA = R^T R (Cholesky decomposition)

  • All pivots (in Gaussian elimination without row swaps) are positive

Positive semi-definite allows xTAx0\mathbf{x}^T A \mathbf{x} \geq 0 and eigenvalues 0\geq 0.

Cholesky Decomposition

For positive definite AA, there exists a unique lower-triangular matrix LL with positive diagonal entries such that:

A=LLTA = LL^T

This is roughly twice as efficient as LU decomposition and is used extensively in numerical optimization, statistics, and simulation.


11. Matrix Norms and Condition Numbers

11.1 Vector Norms

The general pp-norm:

xp=(i=1nxip)1/p\|\mathbf{x}\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}

Special cases: x1\|\mathbf{x}\|_1 (Manhattan), x2\|\mathbf{x}\|_2 (Euclidean), x=maxixi\|\mathbf{x}\|_\infty = \max_i |x_i| (max norm).

11.2 Induced Matrix Norms

The operator norm induced by a vector norm:

A=maxx0Axx\|A\| = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|}
  • A2=σ1\|A\|_2 = \sigma_1 (largest singular value)

  • AF=i,jaij2=tr(ATA)\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2} = \sqrt{\operatorname{tr}(A^T A)} (Frobenius norm)

11.3 Condition Number

κ(A)=AA1=σmaxσmin\kappa(A) = \|A\| \cdot \|A^{-1}\| = \frac{\sigma_{\max}}{\sigma_{\min}}

A large condition number means the system Ax=bA\mathbf{x} = \mathbf{b} is ill-conditioned: small perturbations in b\mathbf{b} cause large changes in x\mathbf{x}.


12. Applications

12.1 Least Squares

Given an overdetermined system AxbA\mathbf{x} \approx \mathbf{b} (m>nm > n), the least-squares solution minimises Axb22\|A\mathbf{x} - \mathbf{b}\|_2^2:

x^=(ATA)1ATb\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}

This is the foundation of linear regression. Numerically, it is better solved via QR or SVD.

12.2 Principal Component Analysis (PCA)

Given data matrix XX (nn samples, pp features), centred so each column has mean zero:

  1. Compute the covariance matrix C=1n1XTXC = \frac{1}{n-1} X^T X

  2. Find eigenvalues λ1λp\lambda_1 \geq \cdots \geq \lambda_p and eigenvectors of CC

  3. Project data onto the top kk eigenvectors for dimensionality reduction

The fraction of variance captured by the first kk components is i=1kλi/i=1pλi\sum_{i=1}^k \lambda_i \big/ \sum_{i=1}^p \lambda_i.

12.3 Markov Chains

A Markov chain with nn states is described by a transition matrix PP where PijP_{ij} is the probability of moving from state ii to state jj. The columns (or rows, by convention) sum to 1.

The steady-state distribution π\boldsymbol{\pi} satisfies PTπ=πP^T \boldsymbol{\pi} = \boldsymbol{\pi}, i.e., it is an eigenvector of PTP^T for eigenvalue 1.

12.4 Graph Theory

The adjacency matrix AA of a graph with nn vertices has Aij=1A_{ij} = 1 if vertices ii and jj are connected. The Laplacian L=DAL = D - A (where DD is the diagonal degree matrix) encodes spectral properties:

  • The number of zero eigenvalues of LL equals the number of connected components

  • The second-smallest eigenvalue (Fiedler value) measures connectivity and is used in spectral clustering