Jordan Santell immersive engineer

Matrix Transformations

Vertex positions are three-dimensional vectors with $xyz$ values. Operations such as translation, rotation, or scaling can be performed on these positions, each resulting in a new, transformed position.

A single 4x4 matrix can encode this sequence of operations, and can be multiplied by a vector to apply its transformation.

Applying transformation $M$ to vector $v$ results in a new vector, $v^{\prime}$:

$$
v^{\prime} = M \cdot v
$$

While the vectors represent 3D positions conceptually, when undergoing transformation, they're represented as having four dimensions: $xyz$ values representing a point in 3D space, and an extra $w$ value that is set to 1 (see Homogeneous Coordinates).

$$
\begin{bmatrix} x^{\prime} \\ y^{\prime} \\ z^{\prime} \\ w^{\prime} \end{bmatrix} = M \cdot \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$

Matrices only encode relative transformations, so the same matrix can be applied to multiple vertices. For example, a matrix $T$ that translates a vector by $(5, 0, 0)$ will add 5 to each vector's $x$ value.

$$
\begin{bmatrix} 6 \\ 1 \\ 1 \\ 1 \end{bmatrix} = T \cdot \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
$$

$$
\begin{bmatrix} 7 \\ 2 \\ 2 \\ 1 \end{bmatrix} = T \cdot \begin{bmatrix} 2 \\ 2 \\ 2 \\ 1 \end{bmatrix}
$$
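As a minimal sketch of the example above, the following Python snippet stores $T$ as a list of rows and multiplies it by both vectors. The helper name `mat_vec` and the list-of-rows layout are assumptions made for illustration; the form of $T$ follows the translation matrix listed in the Transformation Reference.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]

# Translation by (5, 0, 0): the offsets sit in the fourth column.
T = [
    [1, 0, 0, 5],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

# The same matrix is reused for every vertex.
vertices = [[1, 1, 1, 1], [2, 2, 2, 1]]
translated = [mat_vec(T, v) for v in vertices]
print(translated)  # [[6, 1, 1, 1], [7, 2, 2, 1]]
```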

Multiplying two transformation matrices together results in a new matrix that encodes both transformations in order. This allows a series of operations to be chained together, defining the sequence of transformations to be performed on a vector. For example, a vector $v$ can be transformed by a scale matrix $S$, followed by a rotation matrix $R$, and finally a translation matrix $T$. Note that the transformations are applied from right to left: the matrix nearest the vector, $S$, is applied first.

$$
v^{\prime} = T \cdot R \cdot S \cdot v
$$

Encoding multiple operations as a single matrix is a powerful consequence of representing transformations as matrices. The above $S$, $R$, and $T$ transformations can be multiplied into a single matrix $M$. In this example, when applying the transform to many vertices, $M$ can be computed once and cached, so each vertex needs only one matrix multiplication rather than three.

$$
\begin{aligned} M &= T \cdot R \cdot S \\ v^{\prime} &= M \cdot v \\ \end{aligned}
$$
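The caching idea can be sketched in plain Python. The helper names `mat_mul` and `mat_vec` and the specific scale, rotation, and translation values are illustrative assumptions; the rotation is a quarter turn around the $z$ axis with its cosine and sine written out as exact integers to keep the arithmetic readable.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]

S = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]   # scale by 2
R = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # 90 degrees about z
T = [[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # translate by (5, 0, 0)

# Compose once: the rightmost matrix (S) is the first to act on the vector.
M = mat_mul(T, mat_mul(R, S))

v = [1, 0, 0, 1]
step_by_step = mat_vec(T, mat_vec(R, mat_vec(S, v)))  # three multiplications
combined = mat_vec(M, v)                              # one multiplication
print(step_by_step, combined)  # [5, 2, 0, 1] [5, 2, 0, 1]

# Reversing the order gives a different transform: order matters.
reordered = mat_vec(mat_mul(S, mat_mul(R, T)), v)
print(reordered)  # [0, 12, 0, 1]
```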

Given a series of vertices representing a cube, and a matrix representing a scale, rotation and translation, each vertex can be multiplied by the same matrix.

Applying a scale, rotation and translation matrix to a collection of vertices is the foundation of the model matrix.

Identity Matrix

There exists an identity matrix $I$.

$$
I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}
$$

Any transformation matrix $M$, when multiplied by the identity matrix $I$, results in $M$.

$$
M = I \cdot M
$$

Inverse Matrix

Multiplying a transform matrix by its own inverse matrix results in the identity matrix. Inverse, or reciprocal, matrices are often denoted as $M^{-1}$.

$$
M \cdot M^{-1} = I
$$

Inverse matrices are used to change points to be relative to a new frame of reference. Given two transformation matrices defined in the same coordinate space, a transform can then be defined relative to another transform.

For example, given two transformations defined in some shared "world" space, $T$ can be redefined relative to $M$ via multiplying by the inverse:

$$
T_{M} = M_{world}^{-1} \cdot T_{world}
$$
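A small sketch of this relationship, restricted to pure translations so the inverse can be written by hand (the inverse of a translation simply negates its offsets). The helpers `translation` and `mat_mul` and the chosen offsets are hypothetical; a general 4x4 inverse would normally come from a math library.

```python
def translation(x, y, z):
    """Hypothetical helper: 4x4 translation matrix as a list of rows."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

IDENTITY = translation(0, 0, 0)

M_world = translation(2, 0, 0)
T_world = translation(5, 3, 0)
M_world_inv = translation(-2, 0, 0)  # valid only because M_world is a pure translation

# A matrix times its inverse is the identity, and I * M leaves M unchanged.
print(mat_mul(M_world, M_world_inv) == IDENTITY)  # True
print(mat_mul(IDENTITY, M_world) == M_world)      # True

# T expressed relative to M: an offset of (3, 3, 0) from M's origin.
T_M = mat_mul(M_world_inv, T_world)
print(T_M == translation(3, 3, 0))                # True

# Re-applying M_world recovers the world-space transform.
print(mat_mul(M_world, T_M) == T_world)           # True
```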

Column-major, row-major

Matrices are represented as arrays in OpenGL. There are two reasonable ways of interpreting these matrices in an array/1D data structure: column-major and row-major.

$$
\begin{bmatrix} m11 & m12 & m13 & m14 \\ m21 & m22 & m23 & m24 \\ m31 & m32 & m33 & m34 \\ m41 & m42 & m43 & m44 \\ \end{bmatrix}
$$

Interpreting the above matrix as column-major when storing as an array:

// column-major
[ m11, m21, m31, m41,
  m12, m22, m32, m42,
  m13, m23, m33, m43,
  m14, m24, m34, m44 ]

And as row-major:

// row-major
[ m11, m12, m13, m14,
  m21, m22, m23, m24,
  m31, m32, m33, m34,
  m41, m42, m43, m44 ]

Note that OpenGL conventions use column-major order and post-multiplication. If premultiplying instead, row-major order could be used.
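The two layouts can be compared with a short Python sketch. The function names `to_row_major`, `to_column_major`, and `transpose` are assumptions for illustration, and the matrix is kept as a list of rows before flattening.

```python
def to_row_major(m):
    """Flatten a 4x4 matrix (list of rows) one row at a time."""
    return [m[r][c] for r in range(4) for c in range(4)]

def to_column_major(m):
    """Flatten a 4x4 matrix (list of rows) one column at a time."""
    return [m[r][c] for c in range(4) for r in range(4)]

def transpose(m):
    """Swap rows and columns."""
    return [[m[c][r] for c in range(4)] for r in range(4)]

# A translation by (5, 6, 7), with the offsets in the fourth column.
T = [
    [1, 0, 0, 5],
    [0, 1, 0, 6],
    [0, 0, 1, 7],
    [0, 0, 0, 1],
]

print(to_row_major(T))     # [1, 0, 0, 5, 0, 1, 0, 6, 0, 0, 1, 7, 0, 0, 0, 1]
print(to_column_major(T))  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 5, 6, 7, 1]

# Column-major storage of a matrix equals row-major storage of its transpose.
print(to_column_major(T) == to_row_major(transpose(T)))  # True
```

In the column-major array the translation offsets land at indices 12, 13, and 14, which is a quick way to tell which layout a given array uses.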

Homogeneous Coordinates

The initial 3D vector is represented as a 4D vector with $w = 1$ because affine transformations use homogeneous coordinates, requiring one extra dimension. Transform matrices allow a series of transforms to be represented with a single matrix, which is not possible for some transformations, namely perspective, when using only a vertex's 3D Cartesian coordinates.

Transformation Reference

Language constructs and math libraries should be preferred. The matrices are listed here as a reference.

Note that the translation, rotation and scale matrices below become the identity matrix when translating by $(0, 0, 0)$, rotating by $0$, or scaling by $(1, 1, 1)$ respectively, essentially becoming no-ops.

Translation

The following matrix $T$ translates a vector by $(x, y, z)$.

$$
T = \begin{bmatrix} 1 & 0 & 0 & x \\ 0 & 1 & 0 & y \\ 0 & 0 & 1 & z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}
$$
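A possible Python builder for this matrix, as a sketch only; `translation` and `mat_vec` are hypothetical helper names and the matrix is stored as a list of rows.

```python
def translation(x, y, z):
    """Build the 4x4 translation matrix T as a list of rows."""
    return [
        [1, 0, 0, x],
        [0, 1, 0, y],
        [0, 0, 1, z],
        [0, 0, 0, 1],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]

moved = mat_vec(translation(1, 2, 3), [10, 20, 30, 1])
print(moved)  # [11, 22, 33, 1]

# Translating by (0, 0, 0) leaves the vector unchanged.
unmoved = mat_vec(translation(0, 0, 0), [10, 20, 30, 1])
print(unmoved)  # [10, 20, 30, 1]
```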

Rotation

The matrices $R_{x}$, $R_{y}$, and $R_{z}$ rotate around the $x$, $y$, and $z$ axes respectively by $\theta$ radians. There exist other ways of generating rotation matrices, like from a quaternion, or by any axis, which may be more appropriate.

$$
R_{x} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \theta & -(\sin \theta) & 0 \\ 0 & \sin \theta & \cos \theta & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}
$$

$$
R_{y} = \begin{bmatrix} \cos \theta & 0 & \sin \theta & 0 \\ 0 & 1 & 0 & 0 \\ -(\sin \theta) & 0 & \cos \theta & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}
$$

$$
R_{z} = \begin{bmatrix} \cos \theta & -(\sin \theta) & 0 & 0 \\ \sin \theta & \cos \theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}
$$
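The three rotation matrices might be written in Python as follows. This is a sketch with hypothetical function names (`rotation_x`, `rotation_y`, `rotation_z`, `mat_vec`); results are floating-point values, so they are only approximately equal to the exact answers.

```python
import math

def rotation_x(theta):
    """Rotation by theta radians around the x axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def rotation_y(theta):
    """Rotation by theta radians around the y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

def rotation_z(theta):
    """Rotation by theta radians around the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]

quarter_turn = math.pi / 2

# A quarter turn about z carries the x axis onto the y axis (approximately).
about_z = mat_vec(rotation_z(quarter_turn), [1, 0, 0, 1])
# A quarter turn about x carries the y axis onto the z axis (approximately).
about_x = mat_vec(rotation_x(quarter_turn), [0, 1, 0, 1])
# A quarter turn about y carries the z axis onto the x axis (approximately).
about_y = mat_vec(rotation_y(quarter_turn), [0, 0, 1, 1])

print(about_z, about_x, about_y)
```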

Scale

The following matrix $S$ scales a vector by $(x, y, z)$.

$$
S = \begin{bmatrix} x & 0 & 0 & 0 \\ 0 & y & 0 & 0 \\ 0 & 0 & z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}
$$
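A corresponding sketch for scaling, again with hypothetical helper names (`scale`, `mat_vec`) and a list-of-rows layout:

```python
def scale(x, y, z):
    """Build the 4x4 scale matrix S as a list of rows."""
    return [
        [x, 0, 0, 0],
        [0, y, 0, 0],
        [0, 0, z, 0],
        [0, 0, 0, 1],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]

# Non-uniform scale: each component is multiplied by its own factor.
scaled = mat_vec(scale(2, 3, 4), [1, 1, 1, 1])
print(scaled)  # [2, 3, 4, 1]

# Scaling by (1, 1, 1) leaves the vector unchanged.
unscaled = mat_vec(scale(1, 1, 1), [5, 6, 7, 1])
print(unscaled)  # [5, 6, 7, 1]
```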

Perspective Projection

Projection matrices are defined by their left ($l$), right ($r$), top ($t$), bottom ($b$) extents and their near ($n$) and far ($f$) plane values. 3D Projection covers these matrices in more detail.

$P_{p}$ represents the general form of perspective projection.

$$
P_{p} = \begin{bmatrix} \dfrac{2n}{r - l} & 0 & \dfrac{r + l}{r - l} & 0 \\ 0 & \dfrac{2n}{t - b} & \dfrac{t + b}{t - b} & 0 \\ 0 & 0 & \dfrac{f + n}{n - f} & \dfrac{2fn}{n - f} \\ 0 & 0 & -1 & 0 \\ \end{bmatrix}
$$

A simplified form, $P_{ps}$, can be used for symmetric projections, where $r = -l$ and $t = -b$.

$$
P_{ps} = \begin{bmatrix} \dfrac{n}{r} & 0 & 0 & 0 \\ 0 & \dfrac{n}{t} & 0 & 0 \\ 0 & 0 & \dfrac{f + n}{n - f} & \dfrac{2fn}{n - f} \\ 0 & 0 & -1 & 0 \\ \end{bmatrix}
$$
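The two perspective forms can be checked against each other with a short Python sketch. The function names are hypothetical, and the division by $w$ after the multiplication is the usual OpenGL perspective divide, which is assumed here rather than taken from the text above. Under OpenGL conventions the camera looks down the negative $z$ axis, so points on the near and far planes have $z = -n$ and $z = -f$.

```python
def perspective(l, r, b, t, n, f):
    """General perspective projection matrix P_p as a list of rows."""
    return [
        [2 * n / (r - l), 0, (r + l) / (r - l), 0],
        [0, 2 * n / (t - b), (t + b) / (t - b), 0],
        [0, 0, (f + n) / (n - f), 2 * f * n / (n - f)],
        [0, 0, -1, 0],
    ]

def perspective_symmetric(r, t, n, f):
    """Simplified form P_ps, valid when l = -r and b = -t."""
    return [
        [n / r, 0, 0, 0],
        [0, n / t, 0, 0],
        [0, 0, (f + n) / (n - f), 2 * f * n / (n - f)],
        [0, 0, -1, 0],
    ]

def project(m, v):
    """Multiply by the projection matrix, then divide x, y, z by w."""
    x, y, z, w = [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]
    return [x / w, y / w, z / w]

P = perspective(-2, 2, -1, 1, 1, 3)
same = P == perspective_symmetric(2, 1, 1, 3)
print(same)  # True

# Top-right corner of the near plane maps to the corner of the unit cube.
near_corner = project(P, [2, 1, -1, 1])
# Centre of the far plane maps to z = 1.
far_centre = project(P, [0, 0, -3, 1])
print(near_corner, far_centre)
```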

Orthographic Projection

$P_{o}$ represents the general form of orthographic projection.

$$
P_{o} = \begin{bmatrix} \dfrac{2}{r - l} & 0 & 0 & -\dfrac{r + l}{r - l} \\ 0 & \dfrac{2}{t - b} & 0 & -\dfrac{t + b}{t - b} \\ 0 & 0 & \dfrac{-2}{f - n} & -\dfrac{f + n}{f - n} \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}
$$

A simplified form, $P_{os}$, can be used for symmetric projections, where $r = -l$ and $t = -b$.

$$
P_{os} = \begin{bmatrix} \dfrac{1}{r} & 0 & 0 & 0 \\ 0 & \dfrac{1}{t} & 0 & 0 \\ 0 & 0 & \dfrac{-2}{f - n} & -\dfrac{f + n}{f - n} \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}
$$
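An equivalent sketch for the orthographic matrices, with hypothetical function names. Because the bottom row is $(0, 0, 0, 1)$, $w$ stays at 1 and no divide is needed; the extents and test points below are arbitrary values chosen so the arithmetic is exact.

```python
def orthographic(l, r, b, t, n, f):
    """General orthographic projection matrix P_o as a list of rows."""
    return [
        [2 / (r - l), 0, 0, -(r + l) / (r - l)],
        [0, 2 / (t - b), 0, -(t + b) / (t - b)],
        [0, 0, -2 / (f - n), -(f + n) / (f - n)],
        [0, 0, 0, 1],
    ]

def orthographic_symmetric(r, t, n, f):
    """Simplified form P_os, valid when l = -r and b = -t."""
    return [
        [1 / r, 0, 0, 0],
        [0, 1 / t, 0, 0],
        [0, 0, -2 / (f - n), -(f + n) / (f - n)],
        [0, 0, 0, 1],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]

# An off-centre view volume: x in [0, 4], y in [0, 2], depth from 1 to 3.
P = orthographic(0, 4, 0, 2, 1, 3)
far_top_right = mat_vec(P, [4, 2, -3, 1])
near_bottom_left = mat_vec(P, [0, 0, -1, 1])
print(far_top_right, near_bottom_left)

# With symmetric extents the general form reduces to the simplified one.
same = orthographic(-2, 2, -1, 1, 1, 3) == orthographic_symmetric(2, 1, 1, 3)
print(same)  # True
```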

Resources & References