Perspective Projection Matrix of OpenGL and Direct3D

1. Setting the Stage: What Does Perspective Projection Do?

The perspective projection matrix transforms 3D points from camera space (or view space) into clip space (before clipping and normalization). The key goals of a perspective projection matrix are:

  • Mapping 3D points to 2D while preserving the depth information.
  • Foreshortening: Objects further away from the camera appear smaller.
  • Clipping: Points outside the view frustum (outside the near and far planes, or outside the left, right, top, and bottom planes) are clipped.

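Both OpenGL and Direct3D use a 4x4 matrix to achieve this, but they follow different conventions, most notably for the depth range of normalized device coordinates ($[-1, 1]$ in OpenGL, $[0, 1]$ in Direct3D) and, traditionally, for the handedness of camera space. These differences give us two distinct forms of the projection matrix.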

2. General Perspective Projection Matrix Setup

A perspective projection matrix typically involves the following parameters:

  • Field of view (FOV) --- defines the viewing angle.
  • Aspect ratio (AR) --- the ratio of the width to the height of the image.
  • Near plane (n) and far plane (f) --- these define the depth range of the scene being rendered.
  • Frustum bounds --- the left, right, top, and bottom boundaries of the view frustum (a truncated pyramid that represents the visible part of the scene).

The view frustum is defined by:

  • $l$: Left boundary of the near plane.
  • $r$: Right boundary of the near plane.
  • $b$: Bottom boundary of the near plane.
  • $t$: Top boundary of the near plane.
  • $n$: Distance to the near clipping plane.
  • $f$: Distance to the far clipping plane.
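
The two parameter sets listed above are related: for a symmetric frustum, the near-plane bounds follow from the field of view, the aspect ratio, and $n$. Below is a minimal Python sketch of that relationship. It assumes the FOV is the full vertical angle, which the text does not state, and the function name `frustum_bounds` is hypothetical.

```python
import math

def frustum_bounds(fov_y_deg, aspect, near):
    """Return (l, r, b, t) of a symmetric frustum on the near plane.

    Assumption: fov_y_deg is the full vertical field of view in degrees,
    and aspect is width / height.
    """
    t = near * math.tan(math.radians(fov_y_deg) / 2.0)  # half-height at the near plane
    r = t * aspect                                      # half-width at the near plane
    return -r, r, -t, t

l, r, b, t = frustum_bounds(90.0, 16.0 / 9.0, 0.1)
print(l, r, b, t)
```

With a 90 degree vertical FOV the half-height is approximately equal to the near distance, since $\tan(45°) = 1$.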

3. Deriving the Perspective Projection Matrix

(a) Frustum Definition

The frustum is a truncated pyramid that defines the visible region of the 3D scene:

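  • Near clipping plane: The rectangle at distance $n$ in front of the camera, spanning $[l, r]$ in the horizontal direction and $[b, t]$ in the vertical direction. With the OpenGL-style matrix used below, the camera looks down the negative z-axis, so this plane sits at $z = -n$ in camera space; in a left-handed Direct3D setup it sits at $z = n$.
  • Far clipping plane: The rectangle at distance $f$, bounded by the same frustum sides. Its extents are those of the near rectangle scaled by $f/n$.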

The idea is to map 3D points inside this frustum to a cube. Strictly speaking, the matrix outputs clip-space coordinates, and the cube is reached after the divide by $w$, in normalized device coordinates (NDC), where:

  • $x$ is in the range $[-1, 1]$,
  • $y$ is in the range $[-1, 1]$,
  • For OpenGL, $z$ is in the range $[-1, 1]$,
  • For Direct3D, $z$ is in the range $[0, 1]$.

(b) Perspective Projection Matrix Form

To map the 3D frustum to this normalized cube, we use a perspective projection matrix. OpenGL and Direct3D share the same overall structure; the matrix below is the OpenGL form (the one produced by `glFrustum`), written for column vectors:

$$
P_{4 \times 4} = \begin{bmatrix}
\frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\
0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\
0 & 0 & \frac{f+n}{n-f} & \frac{2fn}{n-f} \\
0 & 0 & -1 & 0
\end{bmatrix}
$$

Where:

  • $r$, $l$: Right and left boundaries of the frustum at the near plane.
  • $t$, $b$: Top and bottom boundaries of the frustum at the near plane.
  • $n$, $f$: Near and far clipping distances.
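
To make the mapping concrete, here is a small pure-Python sketch that builds the matrix above and projects frustum corners to NDC. It assumes the OpenGL convention implied by the $-1$ in the fourth row, so visible points have negative camera-space $z$. The helper names `frustum_gl` and `project` are hypothetical.

```python
def frustum_gl(l, r, b, t, n, f):
    """OpenGL-style perspective matrix (column-vector convention) as nested lists."""
    return [
        [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, (f + n) / (n - f), 2 * f * n / (n - f)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def project(m, point):
    """Multiply m by (x, y, z, 1), then divide by w to get NDC (x, y, z)."""
    x, y, z = point
    v = (x, y, z, 1.0)
    xc, yc, zc, wc = (sum(m[i][j] * v[j] for j in range(4)) for i in range(4))
    return (xc / wc, yc / wc, zc / wc)

# Example frustum: near rectangle [-1, 1] x [-0.5, 0.5] at n = 1, far plane at f = 10.
P = frustum_gl(-1.0, 1.0, -0.5, 0.5, 1.0, 10.0)
near_corner = project(P, (1.0, 0.5, -1.0))     # top-right corner of the near plane
far_corner = project(P, (-10.0, -5.0, -10.0))  # bottom-left corner of the far plane
print(near_corner, far_corner)
```

Up to floating-point rounding, the near corner lands at $(1, 1, -1)$ and the far corner at $(-1, -1, 1)$, which are corners of the NDC cube.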

(c) Breaking Down the Matrix

Let's break down each part of this matrix:

  • First row (horizontal scaling):
    $\frac{2n}{r-l}$

    Together with the perspective divide, this scales the x-coordinate into the range $[-1, 1]$: a point on the near plane with $x$ in $[l, r]$ ends up in $[-1, 1]$. The $r - l$ term is the width of the near rectangle. The factor $n$ projects the point onto the near plane once it is divided by depth, and the factor 2 stretches the result to the width of $[-1, 1]$.

    The third element of the first row,
    $\frac{r+l}{r-l}$

    shifts the result when the frustum is off-center horizontally. It is zero when $l = -r$.

  • Second row (vertical scaling):
    $\frac{2n}{t-b}$

    Together with the perspective divide, this scales the y-coordinate into the range $[-1, 1]$. As with the horizontal scaling, $t - b$ is the height of the near rectangle, so the full height of the frustum maps onto $[-1, 1]$.

    The third element of the second row,
    $\frac{t+b}{t-b}$

    shifts the result when the frustum is off-center vertically. It is zero when $b = -t$.

  • Third row (depth scaling):
    $\frac{f+n}{n-f}$

    This term, together with the fourth element of the row, maps depth between the near and far planes into the NDC depth range. Here's where the difference between OpenGL and Direct3D comes in:

    • In OpenGL, we want $z$ to be mapped into the range $[-1, 1]$.
    • In Direct3D, we want $z$ to be mapped into the range $[0, 1]$.

    The values shown here are the OpenGL ones.

    The fourth element of the third row,
    $\frac{2fn}{n-f}$

    is a constant offset in clip space. After the divide by $w$ it becomes a term proportional to $1/z$, which is why NDC depth is a nonlinear function of distance.

  • Fourth row:
    $-1$

    This entry copies $-z$ into the clip-space $w$ component. When the coordinates are later divided by $w$ to obtain normalized device coordinates (NDC), $x$ and $y$ are divided by depth, which is what produces foreshortening.
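
The row-by-row description can be checked by multiplying the matrix out for a camera-space point $(x_e, y_e, z_e, 1)$. The subscripts $e$ (eye), $c$ (clip), and $ndc$ are notation introduced here for illustration, and the OpenGL convention ($z_e < 0$ in front of the camera) is assumed:

$$
\begin{aligned}
x_c &= \frac{2n}{r-l}\, x_e + \frac{r+l}{r-l}\, z_e, \\
y_c &= \frac{2n}{t-b}\, y_e + \frac{t+b}{t-b}\, z_e, \\
z_c &= \frac{f+n}{n-f}\, z_e + \frac{2fn}{n-f}, \\
w_c &= -z_e.
\end{aligned}
$$

Dividing by $w_c$ gives the NDC values:

$$
x_{ndc} = \frac{2n}{r-l} \cdot \frac{x_e}{-z_e} - \frac{r+l}{r-l},
\qquad
z_{ndc} = \frac{f+n}{f-n} + \frac{2fn}{(f-n)\, z_e}.
$$

At $z_e = -n$ the depth evaluates to $\frac{f+n}{f-n} - \frac{2f}{f-n} = -1$, and at $z_e = -f$ it evaluates to $\frac{f+n}{f-n} - \frac{2n}{f-n} = 1$. On the near plane, $x_e = l$ and $x_e = r$ give $x_{ndc} = -1$ and $x_{ndc} = 1$.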

4. Differences Between OpenGL and Direct3D Matrices

Now, let's focus on the key differences between the OpenGL and Direct3D perspective projection matrices.

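(a) Depth Range Differences (Third Row)

  • OpenGL: The depth range is mapped to $[-1, 1]$. This is why the term $\frac{f+n}{n-f}$ is used in the $M_{33}$ element of the OpenGL matrix, paired with $M_{34} = \frac{2fn}{n-f}$. Together they ensure that points on the near plane (distance $n$) are mapped to $-1$ and points on the far plane (distance $f$) are mapped to $1$.

    Note that $M_{33}$ is not a fixed constant such as $1$: it depends on $n$ and $f$, and only approaches $-1$ as $f \to \infty$. The $f + n$ numerator and the factor 2 come from OpenGL's depth range being symmetric around 0.

  • Direct3D: The depth range is mapped to $[0, 1]$, with the near plane at $z = 0$ and the far plane at $z = 1$. The third row changes accordingly. For a right-handed camera space it becomes $P_{33} = \frac{f}{n-f}$, $P_{34} = \frac{fn}{n-f}$, with the fourth row unchanged. For Direct3D's traditional left-handed camera space it becomes $P_{33} = \frac{f}{f-n}$, $P_{34} = \frac{-fn}{f-n}$, with the fourth row $(0, 0, 1, 0)$. These are written for column vectors, as in the rest of this article. The entry equal to $-1$ (or $+1$ in the left-handed case) sits in the fourth row, not in $P_{33}$. This convention lines up directly with depth-buffer values, where 0 corresponds to the near plane and 1 corresponds to the far plane.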
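
The two depth conventions can be compared numerically. The sketch below evaluates only the third and fourth rows. It assumes a right-handed camera space for both functions so they accept the same input, and it uses the right-handed $[0, 1]$ variant of the Direct3D third row; the function names are hypothetical.

```python
def ndc_depth_gl(z_eye, n, f):
    """OpenGL-style NDC depth in [-1, 1]; z_eye is negative in front of the camera."""
    z_clip = (f + n) / (n - f) * z_eye + 2 * f * n / (n - f)
    w_clip = -z_eye
    return z_clip / w_clip

def ndc_depth_d3d_rh(z_eye, n, f):
    """Direct3D-style NDC depth in [0, 1] for the same right-handed camera space."""
    z_clip = f / (n - f) * z_eye + f * n / (n - f)
    w_clip = -z_eye
    return z_clip / w_clip

n, f = 1.0, 100.0
for z in (-n, -10.0, -f):  # near plane, a point in between, far plane
    print(z, ndc_depth_gl(z, n, f), ndc_depth_d3d_rh(z, n, f))
```

For every depth the two results are related by $z_{d3d} = (z_{gl} + 1) / 2$, so the Direct3D value is the OpenGL value remapped from $[-1, 1]$ to $[0, 1]$.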

(b) $c_x$ and $c_y$ (Principal Point Offsets)

The terms $c_x = \frac{r+l}{r-l}$ and $c_y = \frac{t+b}{t-b}$ (the third elements in the first and second rows) represent the offsets of the principal point (where the optical axis meets the image plane). Their values depend on how the frustum is defined rather than on the API as such:

  • In OpenGL, the frustum is typically symmetric ($l = -r$, $b = -t$), so $c_x$ and $c_y$ are 0. This means that the projection is symmetric around the origin in NDC. The same holds for a symmetric frustum in Direct3D.

  • In Direct3D, as in OpenGL, an off-center frustum places the principal point away from the center and leads to non-zero values for $c_x$ and $c_y$. In a left-handed Direct3D matrix, where $w$ receives $+z$ instead of $-z$, the signs flip: the terms become $\frac{l+r}{l-r}$ and $\frac{t+b}{b-t}$.
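
A short Python sketch of the two offset terms, using the OpenGL-style signs from the matrix in section 3. The function name `principal_point_terms` is hypothetical.

```python
def principal_point_terms(l, r, b, t):
    """Return (c_x, c_y), the third-column terms of the first two matrix rows."""
    return (r + l) / (r - l), (t + b) / (t - b)

symmetric = principal_point_terms(-1.0, 1.0, -1.0, 1.0)   # centered frustum
off_center = principal_point_terms(-0.5, 1.5, -1.0, 1.0)  # shifted to the right
print(symmetric)   # (0.0, 0.0)
print(off_center)  # (0.5, 0.0)
```

Shifting the near rectangle to the right by a quarter of its width gives $c_x = 0.5$, while $c_y$ stays at zero because the vertical bounds are still centered.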

5. Alternative Form: Field of View (FOV) Perspective Matrix

In some cases, instead of defining the frustum with explicit bounds $(r, l, t, b)$, it's more intuitive to use the field of view (FOV) and aspect ratio. The projection matrix in this form is often used for cameras:

$$
P_{4 \times 4} = \begin{bmatrix}
\frac{1}{\mathrm{AR} \cdot \tan\left(\frac{\mathrm{FOV}}{2}\right)} & 0 & 0 & 0 \\
0 & \frac{1}{\tan\left(\frac{\mathrm{FOV}}{2}\right)} & 0 & 0 \\
0 & 0 & \frac{f+n}{n-f} & \frac{2fn}{n-f} \\
0 & 0 & -1 & 0
\end{bmatrix}
$$

Where:

  • $\mathrm{FOV}$ is the vertical field of view angle (the aspect ratio appears only in the first row).
  • $\mathrm{AR}$ is the aspect ratio (width/height).
  • $n$ and $f$ are the distances to the near and far clipping planes.
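
The FOV form is the frustum form specialized to a symmetric frustum with $t = n \tan(\mathrm{FOV}/2)$ and $r = t \cdot \mathrm{AR}$. The pure-Python sketch below checks this numerically. It assumes a vertical FOV in radians, and the helper names `perspective_fov` and `frustum_gl` are hypothetical.

```python
import math

def perspective_fov(fov_y_rad, aspect, n, f):
    """FOV-form OpenGL-style matrix; fov_y_rad is assumed to be the vertical FOV."""
    s = 1.0 / math.tan(fov_y_rad / 2.0)
    return [
        [s / aspect, 0.0, 0.0, 0.0],
        [0.0, s, 0.0, 0.0],
        [0.0, 0.0, (f + n) / (n - f), 2 * f * n / (n - f)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def frustum_gl(l, r, b, t, n, f):
    """Bounds-form OpenGL-style matrix, as given in section 3."""
    return [
        [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, (f + n) / (n - f), 2 * f * n / (n - f)],
        [0.0, 0.0, -1.0, 0.0],
    ]

fov, aspect, n, f = math.radians(60.0), 4.0 / 3.0, 0.5, 50.0
t = n * math.tan(fov / 2.0)  # half-height of the near rectangle
r = t * aspect               # half-width of the near rectangle

A = perspective_fov(fov, aspect, n, f)
B = frustum_gl(-r, r, -t, t, n, f)
max_diff = max(abs(A[i][j] - B[i][j]) for i in range(4) for j in range(4))
print(max_diff)  # expected to be at floating-point rounding level
```

The off-center terms vanish because $l = -r$ and $b = -t$, which is why the third column of the FOV form has zeros in its first two rows.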