3D Projection
In 3D graphics, objects are rendered from some viewer's position and displayed on a flat screen, like a phone or laptop. Projection describes the transformation of a three-dimensional point into a two-dimensional point. This transformation can be represented by a projection matrix, which may encode both perspective, such as a camera's focal length, and the transformation to normalized device coordinates (NDC).
Projection matrices are one of the more confusing parts of the GL pipeline, are notoriously difficult to debug, and can be parameterized in several different ways. The following fundamentals and equations attempt to clarify the process and provide reference for common projection tasks and conversions.
Projection Transformation
The two most common types of projection are orthographic and perspective projection. Axonometric (isometric) projections are common in games as well.
Orthographic projections do not visualize depth, and are often used for schematics, architectural drawings, and 3D software when lining up vertices. As there is no applied perspective, lines can be absolutely measured and compared.
Perspective projection, however, accounts for depth in a way that simulates how humans perceive the world. Objects that are further away appear smaller, resulting in roughly a single vanishing point in the center of our vision.
Whatever type of projection is used, the end result is a 4D homogeneous coordinate in clip space; in the OpenGL pipeline, this value is then divided by its w component, becoming a 3D vector in normalized device coordinates, and any vertex outside of the -1 to 1 range gets clipped.
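The divide and the range check can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not how a GPU implements clipping; the helper names clipToNdc and isInsideNdc are hypothetical, and the sketch assumes w is non-zero.

```javascript
// Hypothetical helper: convert a clip-space position [x, y, z, w] to NDC
// by dividing the first three components by w (assumes w !== 0).
function clipToNdc([x, y, z, w]) {
  return [x / w, y / w, z / w];
}

// True when every NDC component lies inside the [-1, 1] viewing volume.
function isInsideNdc(ndc) {
  return ndc.every((component) => component >= -1 && component <= 1);
}

const visible = clipToNdc([1, -2, 3, 4]); // [0.25, -0.5, 0.75]
const clipped = clipToNdc([6, 0, 3, 4]); // x becomes 1.5, outside the range

console.log(isInsideNdc(visible), isInsideNdc(clipped)); // true false
```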
Viewing Frustum
A camera abstraction in a 3D engine has an area of space that is visible, described as a viewing volume in a cuboid shape for orthographic projections, or a frustum for perspective projections. The human visual system, although a series of lies and magic, has a viewing volume that includes 180° horizontally and 90° vertically, and extends essentially an infinite amount. After all, we can see V762 Cas in Cassiopeia, 16,308 light-years away! Cameras in 3D engines are much more constrained.
A camera's frustum can be thought of as 6 planes, and any objects between those planes are visible and within the camera's field of view. Frustums are generally defined in terms of the near and far planes' distance from the camera on the Z axis, and how far the frustum extends on the near plane to the left, right, top and bottom from the Z axis. The near plane is the 2D plane that the rendered image will be projected upon.
Perspective projection
With the six extent values (near, far, left, right, top, bottom), a perspective projection matrix can be created:
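As a minimal sketch, assuming the OpenGL glFrustum convention (a right-handed camera looking down -Z, with NDC ranging from -1 to 1), the matrix can be built from the six extents as follows. The names makeFrustum and transform are hypothetical, and the matrix is stored as an array of rows for readability; WebGL and three.js store matrix elements in column-major order instead.

```javascript
// glFrustum-style perspective matrix as an array of rows: m[row][col].
function makeFrustum(left, right, bottom, top, near, far) {
  const rl = right - left;
  const tb = top - bottom;
  const fn = far - near;
  return [
    [(2 * near) / rl, 0, (right + left) / rl, 0],
    [0, (2 * near) / tb, (top + bottom) / tb, 0],
    [0, 0, -(far + near) / fn, (-2 * far * near) / fn],
    [0, 0, -1, 0],
  ];
}

// Multiply a 4x4 matrix (array of rows) by a column vector [x, y, z, w].
function transform(m, v) {
  return m.map((row) => row.reduce((sum, value, i) => sum + value * v[i], 0));
}

const frustumMatrix = makeFrustum(-1, 1, -1, 1, 1, 10);

// The top-right corner of the near plane, in camera space.
const nearCorner = transform(frustumMatrix, [1, 1, -1, 1]);
// A point in the middle of the far plane.
const farCenter = transform(frustumMatrix, [0, 0, -10, 1]);

// Perspective divide: the near corner lands at roughly (1, 1, -1) in NDC,
// and the far center at roughly (0, 0, 1).
const nearCornerNdc = nearCorner.slice(0, 3).map((c) => c / nearCorner[3]);
const farCenterNdc = farCenter.slice(0, 3).map((c) => c / farCenter[3]);
```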
Most 3D engines or libraries will have a function that creates a perspective matrix from these values, like glFrustum or three.js's Matrix4#makePerspective.
These values are in world units. The near and far values are absolute distances from the camera along its forward axis, and the left, right, top, and bottom extents are signed offsets on the near plane, measured from the point where the camera's forward axis crosses that plane.
The following figures illustrate the context of the extent values, and how they can be used with trigonometry to measure any length or angle.
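For instance, the angles subtended by a frustum can be recovered from its extents with an arctangent. The sketch below uses a hypothetical helper, fovFromExtents, and assumes the extents are signed offsets on the near plane as described above.

```javascript
// Field of view angles (in radians) and aspect ratio from frustum extents.
// Each extent forms a right triangle with the near distance, so the angle
// from the forward axis to an extent is atan(extent / near).
function fovFromExtents(left, right, bottom, top, near) {
  return {
    vertical: Math.atan(top / near) - Math.atan(bottom / near),
    horizontal: Math.atan(right / near) - Math.atan(left / near),
    aspect: (right - left) / (top - bottom),
  };
}

// A symmetric frustum whose near plane is 4 units wide and 2 units tall,
// one unit away from the camera.
const angles = fovFromExtents(-2, 2, -1, 1, 1);
// angles.vertical is 2 * atan(1), or 90 degrees; angles.aspect is 2.
```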
Projection Symmetry
Note that the simulation and images so far have been symmetric projections. A symmetric frustum's extents are mirrored both vertically and horizontally around the Z axis at the near plane, such that left = -right and bottom = -top. Symmetric projections are common in 3D renderings, although asymmetric projections can be used in stereoscopic VR rendering, augmented reality platforms, or immersive installations.
A simplified form of the perspective projection matrix can be used for symmetric projections, where left = -right and bottom = -top:
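With symmetric extents, right + left and top + bottom are both zero, and right - left and top - bottom become 2 * right and 2 * top, so the first two rows collapse to near / right and near / top. A sketch under the same OpenGL-style assumptions (array of rows, camera looking down -Z), with a hypothetical function name:

```javascript
// Symmetric perspective matrix: only the right and top extents are needed.
function makeSymmetricFrustum(right, top, near, far) {
  const fn = far - near;
  return [
    [near / right, 0, 0, 0],
    [0, near / top, 0, 0],
    [0, 0, -(far + near) / fn, (-2 * far * near) / fn],
    [0, 0, -1, 0],
  ];
}

// A near plane 4 units wide and 2 units tall, one unit from the camera.
const symmetricMatrix = makeSymmetricFrustum(2, 1, 1, 10);
// symmetricMatrix[0][0] is 0.5 and symmetricMatrix[1][1] is 1.
```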
Parameterization
Defining a perspective projection in terms of its frustum extents is just one option. Projections can be defined via aspect ratio, field of view, focal length, or other parameters, depending on background or purpose.
Field of view
Perhaps more commonly, perspective cameras are defined by a vertical field of view and the projection screen's aspect ratio, as well as the near and far plane values. This parameterization is (subjectively) more human-understandable: aspect ratio usually must be configurable to work across different screen resolutions, and the field of view is more intuitive than frustum extents.
Referencing Figure 1 above and using some trigonometry, the vertical field of view and aspect ratio can be converted to frustum extents, or used directly in the creation of the matrix. This assumes a symmetric projection.
// fov is the vertical field of view in radians; aspect is width / height.
let top = near * Math.tan(fov / 2);
let bottom = -top;
let right = aspect * top;
let left = -right;
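Substituting these extents into a symmetric perspective matrix gives near / top = 1 / tan(fov / 2) and near / right = 1 / (aspect * tan(fov / 2)), so the matrix can be created directly from the field of view. The following is a sketch in the style of gluPerspective, assuming an OpenGL-style clip space and a matrix stored as an array of rows; the function name is hypothetical.

```javascript
// Perspective matrix from a vertical field of view (radians), aspect ratio
// (width / height), and near/far plane distances.
function makePerspectiveFromFov(fov, aspect, near, far) {
  const cot = 1 / Math.tan(fov / 2); // equals near / top
  const fn = far - near;
  return [
    [cot / aspect, 0, 0, 0],
    [0, cot, 0, 0],
    [0, 0, -(far + near) / fn, (-2 * far * near) / fn],
    [0, 0, -1, 0],
  ];
}

// A 90 degree vertical field of view on a 16:9 viewport.
const fovMatrix = makePerspectiveFromFov(Math.PI / 2, 16 / 9, 0.1, 100);
// fovMatrix[1][1] is approximately 1, and fovMatrix[0][0] approximately 9 / 16.
```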
The term 1 / Math.tan(fov / 2), which follows from the extents above as near / top, can be thought of as the focal length. While rendering has no direct equivalent of a physical lens's focal length, Eric Lengyel shared some matrix tricks at GDC 2007 to simulate the parameterization. Paul Bourke's brief note, "Field of view and focal length", sketches out the relationship between the two as well.
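For a physical camera, the usual pinhole relationship is fov = 2 * atan(sensorSize / (2 * focalLength)). The sketch below assumes that model; the function names are hypothetical, and sensorSize is the sensor's extent along the axis being measured, in the same units as focalLength.

```javascript
// Field of view (radians) from a focal length and sensor size.
function focalLengthToFov(focalLength, sensorSize) {
  return 2 * Math.atan(sensorSize / (2 * focalLength));
}

// Focal length from a field of view (radians) and sensor size.
function fovToFocalLength(fov, sensorSize) {
  return sensorSize / (2 * Math.tan(fov / 2));
}

// A 24 unit tall sensor behind a 12 unit lens sees 90 degrees vertically.
const lensFov = focalLengthToFov(12, 24);
const lensFocalLength = fovToFocalLength(lensFov, 24); // approximately 12
```

With a sensor size of 2, matching the height of the -1 to 1 NDC range, fovToFocalLength reduces to 1 / tan(fov / 2).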
Camera intrinsics
If working with OpenCV or augmented reality platforms (ARCore, ARKit), controlling projections via camera intrinsics may be necessary.
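The intrinsic matrix K is commonly written with the rows [fx, s, cx], [0, fy, cy], and [0, 0, 1], where fx and fy are the horizontal and vertical focal lengths in pixels, s is an often unused skew term, and cx and cy represent the principal point, or the horizontal and vertical offset from the bottom-left in pixels, which for symmetric projections results in cx = width / 2 and cy = height / 2.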
Koshy George shared a specialized form of representing camera intrinsics in OpenGL, for symmetric projections that have adjustable near/far planes:
George's solution derives from Kyle Simek's excellent and detailed series on camera calibration and OpenGL, where more background and a generalized form is described.
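As a rough sketch of the symmetric case (not necessarily the exact form George or Simek present), the projection matrix can be derived from the frustum relationships: a pixel focal length fx over an image width means right / near = (width / 2) / fx, so near / right = 2 * fx / width. The function name and parameter names here are hypothetical, and the sketch assumes zero skew, a principal point at the image center, and an OpenGL-style clip space stored as an array of rows.

```javascript
// Symmetric perspective matrix from pixel focal lengths and image size.
function projectionFromIntrinsics(fx, fy, width, height, near, far) {
  const fn = far - near;
  return [
    [(2 * fx) / width, 0, 0, 0],
    [0, (2 * fy) / height, 0, 0],
    [0, 0, -(far + near) / fn, (-2 * far * near) / fn],
    [0, 0, -1, 0],
  ];
}

// A 1000x500 image with 500 pixel focal lengths.
const intrinsicsMatrix = projectionFromIntrinsics(500, 500, 1000, 500, 0.1, 100);

// The implied vertical field of view, recovered from the y scale term.
const impliedFov = 2 * Math.atan(1 / intrinsicsMatrix[1][1]);
```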
Framing
Sometimes it's desirable to change the position of the camera such that some object is framed relative to the viewport. Unlike the very specific dolly zoom example above, the field of view is most likely a fixed size.
For example, a lot of thought went into creating the framing rules used in model-viewer. We wanted an arbitrarily-sized model to look great inside of an arbitrarily sized viewport. To ensure good "framing", the model is placed inside of a "room" representing the camera frustum that maximizes the model's size given the current aspect ratio. The camera's near plane "frames" the room's forward plane.
Given a static vertical field of view, and the height of the frame in world units, the corresponding camera's position can be calculated via similar triangles, using values from Figure 1 above. Using half of the height and half of the field of view (in radians), the distance d can be derived the same way as the near plane distance.
const d = (height / 2) / Math.tan(fov / 2);
Similarly, this can be done with horizontal field of view and extents, or revised to find the size of a frustum at a given distance from the camera.
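These variations can be sketched as small helpers. The function names are hypothetical, all angles are in radians, and the horizontal conversion assumes a symmetric projection with aspect defined as width / height.

```javascript
// Distance at which a frame of the given height exactly fills the view.
function distanceToFrame(height, fov) {
  return height / 2 / Math.tan(fov / 2);
}

// The inverse: the height of the frustum at a given distance from the camera.
function frustumHeightAt(distance, fov) {
  return 2 * distance * Math.tan(fov / 2);
}

// Horizontal field of view from a vertical field of view and aspect ratio.
function horizontalFov(verticalFov, aspect) {
  return 2 * Math.atan(aspect * Math.tan(verticalFov / 2));
}

const verticalFov = Math.PI / 3; // 60 degrees
const frameDistance = distanceToFrame(2, verticalFov);
const frameHeight = frustumHeightAt(frameDistance, verticalFov); // approximately 2

// Framing by width instead: use the horizontal field of view and the width.
const widthDistance = distanceToFrame(2 * (16 / 9), horizontalFov(verticalFov, 16 / 9));
```

Because width = aspect * height, framing by width with the horizontal field of view yields the same distance as framing by height with the vertical one.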
Orthographic projection
Orthographic projections lack perspective and are a bit more straightforward than perspective projections.
The orthographic projection matrix can be constructed from its extent values like perspective projection:
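A minimal sketch, assuming the OpenGL glOrtho convention (camera looking down -Z, NDC from -1 to 1) and a matrix stored as an array of rows. The function names are hypothetical.

```javascript
// glOrtho-style orthographic matrix as an array of rows: m[row][col].
function makeOrthographic(left, right, bottom, top, near, far) {
  const rl = right - left;
  const tb = top - bottom;
  const fn = far - near;
  return [
    [2 / rl, 0, 0, -(right + left) / rl],
    [0, 2 / tb, 0, -(top + bottom) / tb],
    [0, 0, -2 / fn, -(far + near) / fn],
    [0, 0, 0, 1],
  ];
}

// Multiply a 4x4 matrix (array of rows) by a column vector [x, y, z, w].
function applyMatrix(m, v) {
  return m.map((row) => row.reduce((sum, value, i) => sum + value * v[i], 0));
}

const orthoMatrix = makeOrthographic(0, 4, 0, 2, 1, 11);

// The far top-right corner of the viewing volume maps to roughly (1, 1, 1),
// and w stays 1, so no perspective divide is needed.
const orthoCorner = applyMatrix(orthoMatrix, [4, 2, -11, 1]);
```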
A simplified form can be used for symmetric projections, where left = -right and bottom = -top.
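In the symmetric case the translation terms for X and Y vanish and the scale terms reduce to 1 / right and 1 / top. A sketch under the same OpenGL-style assumptions, with a hypothetical function name:

```javascript
// Symmetric orthographic matrix: only the right and top extents are needed.
function makeSymmetricOrthographic(right, top, near, far) {
  const fn = far - near;
  return [
    [1 / right, 0, 0, 0],
    [0, 1 / top, 0, 0],
    [0, 0, -2 / fn, -(far + near) / fn],
    [0, 0, 0, 1],
  ];
}

// A viewing volume 4 units wide and 2 units tall, from z = -1 to z = -11.
const symmetricOrtho = makeSymmetricOrthographic(2, 1, 1, 11);
// symmetricOrtho[0][0] is 0.5 and symmetricOrtho[1][1] is 1.
```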