## Jordan Santell

open web engineer

# Matrix Transformations


Vertex positions are three-dimensional vectors with $xyz$ values. Operations such as translation, rotation, or scaling can be performed on these positions, resulting in a new, transformed position.

A single 4x4 matrix can encode this sequence of operations, and can be multiplied by a vector to apply its transformation.

Applying transformation $M$ to vector $v$ results in a new vector, $v^{\prime}$:

$v^{\prime} = M \cdot v$

While the vectors conceptually represent 3D positions, they are represented with four dimensions when undergoing transformation: $xyz$ values representing a point in 3D space, and an extra $w$ value that is set to 1 (see Homogeneous Coordinates).

$\begin{bmatrix} x^{\prime} \\ y^{\prime} \\ z^{\prime} \\ w^{\prime} \end{bmatrix} = M \cdot \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$

Matrices only encode relative transformations, so the same matrix can be applied to multiple vertices. For example, a matrix $T$ that translates a vector by $(5, 0, 0)$ will add 5 to each vector's $x$ value.

$\begin{bmatrix} 6 \\ 1 \\ 1 \\ 1 \end{bmatrix} = T \cdot \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$

$\begin{bmatrix} 7 \\ 2 \\ 2 \\ 1 \end{bmatrix} = T \cdot \begin{bmatrix} 2 \\ 2 \\ 2 \\ 1 \end{bmatrix}$
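The two products above can be checked with a few lines of Python. This is a minimal sketch using plain nested lists; the helper names `translation` and `mat_vec` are hypothetical, and the matrix layout follows the translation matrix given in the Transformation Reference.

```python
def translation(x, y, z):
    """4x4 translation matrix as a list of rows (hypothetical helper)."""
    return [
        [1, 0, 0, x],
        [0, 1, 0, y],
        [0, 0, 1, z],
        [0, 0, 0, 1],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

T = translation(5, 0, 0)

# The same matrix is reused for every vertex.
vertices = [[1, 1, 1, 1], [2, 2, 2, 1]]
moved = [mat_vec(T, v) for v in vertices]

print(moved)  # [[6, 1, 1, 1], [7, 2, 2, 1]]
```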

Multiplying two transformation matrices together results in a new matrix that encodes both transformations in order. This allows a series of operations to be chained together, defining the sequence of transformations to be performed on a vector. For example, a vector $v$ can be transformed by a scale matrix $S$, followed by a rotation matrix $R$, and finally a translation matrix $T$. Note that the transformations are applied from right to left: the matrix closest to $v$ is applied first.

$v^{\prime} = T \cdot R \cdot S \cdot v$

Encoding multiple operations as a single matrix is a powerful consequence of representing transformations as matrices. The above $S$, $R$, and $T$ transformations can be multiplied into a single matrix $M$. In this example, if applying this transform to many vertices, $M$ could be cached, needing only one multiplication per vertex, rather than three.

$M = T \cdot R \cdot S$

$v^{\prime} = M \cdot v$
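As a rough illustration of caching the combined matrix, the sketch below builds $S$, $R$, and $T$ from the reference matrices later in this page and compares applying them one at a time against applying the precomputed $M$. The helper names and the specific scale, angle, and offset are assumptions chosen for the example.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices (lists of rows), returning a . b."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def scale(x, y, z):
    return [[x, 0, 0, 0], [0, y, 0, 0], [0, 0, z, 0], [0, 0, 0, 1]]

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def translation(x, y, z):
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

S = scale(2, 2, 2)
R = rotation_z(math.pi / 2)
T = translation(5, 0, 0)

# Computed once, then reused for every vertex.
M = mat_mul(T, mat_mul(R, S))

v = [1, 0, 0, 1]
step_by_step = mat_vec(T, mat_vec(R, mat_vec(S, v)))  # three multiplications
combined = mat_vec(M, v)                              # one multiplication

# Both are approximately (5, 2, 0, 1), up to floating-point rounding.
print(step_by_step, combined)

# Reversing the order gives a different transformation.
reversed_order = mat_vec(mat_mul(S, mat_mul(R, T)), v)
```

Here the vertex is scaled to $(2, 0, 0)$, rotated onto the $y$ axis, and then moved 5 units along $x$. Reversing the order translates first, so the result differs.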

Given a series of vertices representing a cube, and a matrix representing a scale, rotation and translation, each vertex can be multiplied by the same matrix.

Applying a scale, rotation and translation matrix to a collection of vertices is the foundation of the model matrix.

## Identity Matrix

There exists an identity matrix $I$.

$I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

Any transformation matrix $M$, when multiplied by the identity matrix $I$, results in $M$.

$M = I \cdot M$

## Inverse Matrix

Multiplying a transform matrix by its own inverse matrix results in the identity matrix. Inverse, or reciprocal, matrices are often denoted as $M^{-1}$.

$M \cdot M^{-1} = I$

Inverse matrices are used to change points to be relative to a new frame of reference. Given two transformation matrices defined in the same coordinate space, a transform can then be defined relative to another transform.

For example, given two transformations defined in some shared "world" space, $T$ can be redefined relative to $M$ via multiplying by the inverse:

$T_{M} = M_{world}^{-1} \cdot T_{world}$
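A small sketch of the relation above, restricted to pure translations so that the inverse can be written down directly: the inverse of a translation by $(x, y, z)$ is a translation by $(-x, -y, -z)$. The helper names and values are assumptions for illustration; a general matrix would need a proper inversion routine from a math library.

```python
def translation(x, y, z):
    """4x4 translation matrix as a list of rows (hypothetical helper)."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

def mat_mul(a, b):
    """Multiply two 4x4 matrices (lists of rows), returning a . b."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

IDENTITY = translation(0, 0, 0)

# Two transforms defined in the same "world" space.
M_world = translation(10, 0, 0)
T_world = translation(12, 3, 0)

# Assumption: M is a pure translation, so its inverse negates the offset.
M_inverse = translation(-10, 0, 0)

# T expressed relative to M.
T_M = mat_mul(M_inverse, T_world)

print(mat_mul(M_world, M_inverse) == IDENTITY)  # True
print(T_M == translation(2, 3, 0))              # True
print(mat_mul(M_world, T_M) == T_world)         # True
```

The last line shows the round trip: applying $M$ to the relative transform recovers the original world-space transform.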

## Column-major, row-major

Matrices are represented as arrays in OpenGL. There are two reasonable ways of interpreting these matrices in an array/1D data structure: column-major and row-major.

$\begin{bmatrix} m11 & m12 & m13 & m14 \\ m21 & m22 & m23 & m24 \\ m31 & m32 & m33 & m34 \\ m41 & m42 & m43 & m44 \\ \end{bmatrix}$

Interpreting the above matrix as column-major when storing as an array:

// column-major
[ m11, m21, m31, m41,
m12, m22, m32, m42,
m13, m23, m33, m43,
m14, m24, m34, m44 ]


And as row-major:

// row-major
[ m11, m12, m13, m14,
m21, m22, m23, m24,
m31, m32, m33, m34,
m41, m42, m43, m44 ]


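Note that OpenGL conventions use column-major order with column vectors, where the vector sits on the right of the matrix ($M \cdot v$). If row vectors on the left of the matrix ($v \cdot M$) are used instead, each matrix is transposed and row-major order could be used.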
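The two layouts above can be produced from a matrix stored as a list of rows. This is a minimal sketch; `to_column_major` and `to_row_major` are hypothetical helper names, and the translation matrix is only an example input.

```python
def to_column_major(m):
    """Flatten a 4x4 matrix (list of rows) one column at a time."""
    return [m[r][c] for c in range(4) for r in range(4)]

def to_row_major(m):
    """Flatten a 4x4 matrix (list of rows) one row at a time."""
    return [m[r][c] for r in range(4) for c in range(4)]

# Translation by (5, 6, 7), written as rows for readability.
T = [
    [1, 0, 0, 5],
    [0, 1, 0, 6],
    [0, 0, 1, 7],
    [0, 0, 0, 1],
]

column_major = to_column_major(T)
row_major = to_row_major(T)

print(column_major)  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 5, 6, 7, 1]
print(row_major)     # [1, 0, 0, 5, 0, 1, 0, 6, 0, 0, 1, 7, 0, 0, 0, 1]
```

In the column-major array the translation values land at indices 12, 13, and 14, which is a quick way to tell which layout a given array uses.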

## Homogeneous Coordinates

The initial 3D vector is represented as a 4D vector with $w = 1$ because affine transformations use homogeneous coordinates, which require one extra dimension. Transform matrices allow a series of transforms to be represented with a single matrix. This is not possible with a vertex's 3D Cartesian coordinates for some transformations, namely translation and perspective.
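Expanding a translation by hand shows why $w = 1$ matters. The symbols $t_x$, $t_y$, and $t_z$ are introduced here only to keep the translation amounts distinct from the vector's own $xyz$ values:

$\begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{bmatrix}$

Each row multiplies its fourth entry by $w$, so with $w = 1$ the translation column is added unchanged. A 3x3 matrix acting on $xyz$ alone has no such column and can only produce combinations of $x$, $y$, and $z$.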

## Transformation Reference

Language constructs and math libraries should be preferred. The matrices are listed here for reference.

Note that the translation, rotation and scale matrices below become the identity matrix when translating by $(0, 0, 0)$, rotating by $0$, or scaling by $(1, 1, 1)$ respectively, essentially becoming no-ops.

### Translation

The following matrix $T$ translates a vector by $(x, y, z)$.

$T = \begin{bmatrix} 1 & 0 & 0 & x \\ 0 & 1 & 0 & y \\ 0 & 0 & 1 & z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

### Rotation

The matrices $R_{x}$, $R_{y}$, and $R_{z}$ rotate around the $x$, $y$, and $z$ axes respectively by $\theta$ radians. There exist other ways of generating rotation matrices, such as from a quaternion or around an arbitrary axis, which may be more appropriate.

$R_{x} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \theta & -(\sin \theta) & 0 \\ 0 & \sin \theta & \cos \theta & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

$R_{y} = \begin{bmatrix} \cos \theta & 0 & \sin \theta & 0 \\ 0 & 1 & 0 & 0 \\ -(\sin \theta) & 0 & \cos \theta & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

$R_{z} = \begin{bmatrix} \cos \theta & -(\sin \theta) & 0 & 0 \\ \sin \theta & \cos \theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$
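The three axis rotations translate directly into code. The following is a sketch with hypothetical function names, not a replacement for a math library; it rotates one unit axis onto another by a quarter turn to show the direction each matrix rotates in.

```python
import math

def rotation_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def rotation_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

quarter_turn = math.pi / 2

# Each result is approximate because of floating-point rounding.
y_to_z = mat_vec(rotation_x(quarter_turn), [0, 1, 0, 1])  # about (0, 0, 1, 1)
z_to_x = mat_vec(rotation_y(quarter_turn), [0, 0, 1, 1])  # about (1, 0, 0, 1)
x_to_y = mat_vec(rotation_z(quarter_turn), [1, 0, 0, 1])  # about (0, 1, 0, 1)
```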

### Scale

The following matrix $S$ scales a vector by $(x, y, z)$.

$S = \begin{bmatrix} x & 0 & 0 & 0 \\ 0 & y & 0 & 0 \\ 0 & 0 & z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

### Perspective Projection

Projection matrices are defined by their left ($l$), right ($r$), top ($t$), bottom ($b$) extents and their near ($n$) and far ($f$) plane values. 3D Projection covers these matrices in more detail.

$P_{p}$ represents the general form of perspective projection.

$P_{p} = \begin{bmatrix} \dfrac{2n}{r - l} & 0 & \dfrac{r + l}{r - l} & 0 \\ 0 & \dfrac{2n}{t - b} & \dfrac{t + b}{t - b} & 0 \\ 0 & 0 & \dfrac{f + n}{n - f} & \dfrac{2fn}{n - f} \\ 0 & 0 & -1 & 0 \\ \end{bmatrix}$

A simplified form, $P_{ps}$, can be used for symmetric projections, where $r = -l$ and $t = -b$.

$P_{ps} = \begin{bmatrix} \dfrac{n}{r} & 0 & 0 & 0 \\ 0 & \dfrac{n}{t} & 0 & 0 \\ 0 & 0 & \dfrac{f + n}{n - f} & \dfrac{2fn}{n - f} \\ 0 & 0 & -1 & 0 \\ \end{bmatrix}$
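Both perspective forms can be written as small functions. This sketch uses hypothetical names and example extents, and it assumes the usual OpenGL-style setup in which the camera looks down the negative $z$ axis and the transformed $xyz$ values are divided by $w^{\prime}$ afterwards. Under those assumptions, points on the near and far planes end up at depths of about $-1$ and $1$.

```python
import math

def perspective(l, r, b, t, n, f):
    """General perspective projection matrix as a list of rows."""
    return [
        [2 * n / (r - l), 0, (r + l) / (r - l), 0],
        [0, 2 * n / (t - b), (t + b) / (t - b), 0],
        [0, 0, (f + n) / (n - f), 2 * f * n / (n - f)],
        [0, 0, -1, 0],
    ]

def perspective_symmetric(r, t, n, f):
    """Simplified form for r = -l and t = -b."""
    return [
        [n / r, 0, 0, 0],
        [0, n / t, 0, 0],
        [0, 0, (f + n) / (n - f), 2 * f * n / (n - f)],
        [0, 0, -1, 0],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# Example extents: right = 2, top = 1, near = 1, far = 100.
P = perspective(-2, 2, -1, 1, 1, 100)
P_s = perspective_symmetric(2, 1, 1, 100)

near_point = mat_vec(P, [0, 0, -1, 1])    # a point on the near plane
far_point = mat_vec(P, [0, 0, -100, 1])   # a point on the far plane

# Dividing by the resulting w gives the normalized depth.
near_depth = near_point[2] / near_point[3]  # approximately -1
far_depth = far_point[2] / far_point[3]     # approximately 1
```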

### Orthographic Projection

$P_{o}$ represents the general form of orthographic projection.

$P_{o} = \begin{bmatrix} \dfrac{2}{r - l} & 0 & 0 & -\dfrac{r + l}{r - l} \\ 0 & \dfrac{2}{t - b} & 0 & -\dfrac{t + b}{t - b} \\ 0 & 0 & \dfrac{-2}{f - n} & -\dfrac{f + n}{f - n} \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

A simplified form, $P_{os}$, can be used for symmetric projections, where $r = -l$ and $t = -b$.

$P_{os} = \begin{bmatrix} \dfrac{1}{r} & 0 & 0 & 0 \\ 0 & \dfrac{1}{t} & 0 & 0 \\ 0 & 0 & \dfrac{-2}{f - n} & -\dfrac{f + n}{f - n} \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$
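The orthographic forms can be sketched the same way. The function names and extents below are assumptions for illustration, and the example again assumes a camera looking down the negative $z$ axis, so the near and far planes sit at $z = -n$ and $z = -f$. Opposite corners of the view volume should map to roughly $(1, 1, -1)$ and $(-1, -1, 1)$.

```python
import math

def orthographic(l, r, b, t, n, f):
    """General orthographic projection matrix as a list of rows."""
    return [
        [2 / (r - l), 0, 0, -(r + l) / (r - l)],
        [0, 2 / (t - b), 0, -(t + b) / (t - b)],
        [0, 0, -2 / (f - n), -(f + n) / (f - n)],
        [0, 0, 0, 1],
    ]

def orthographic_symmetric(r, t, n, f):
    """Simplified form for r = -l and t = -b."""
    return [
        [1 / r, 0, 0, 0],
        [0, 1 / t, 0, 0],
        [0, 0, -2 / (f - n), -(f + n) / (f - n)],
        [0, 0, 0, 1],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# Example extents: right = 2, top = 1, near = 1, far = 100.
O = orthographic(-2, 2, -1, 1, 1, 100)
O_s = orthographic_symmetric(2, 1, 1, 100)

# Opposite corners of the view volume.
near_top_right = mat_vec(O, [2, 1, -1, 1])        # about (1, 1, -1, 1)
far_bottom_left = mat_vec(O, [-2, -1, -100, 1])   # about (-1, -1, 1, 1)
```

Unlike the perspective case, $w^{\prime}$ stays at 1, so no divide is needed to read off the result.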