Android WebRTC VideoFrame

VideoFrame

VideoFrame, which we simply call a video frame, is a fairly important concept in WebRTC. From an Android developer's point of view it lives in the org.webrtc package; the web and iOS SDKs have a corresponding VideoFrame as well, and most of its fields are the same, because each platform's VideoFrame is an abstraction over the same native (C++) entity.

Since the core lives in C++, why should we care about this Java-side class at all? For one, we may need to hold on to a specific frame when doing frame compensation; we may also need to convert a frame into a Bitmap or a byte[] for storage. Most importantly, while manipulating it in some business logic I ran into the following error:

log
java.lang.IllegalStateException: retain() called on an object with refcount < 1
    at org.webrtc.RefCountDelegate.retain(RefCountDelegate.java:34)
    at org.webrtc.TextureBufferImpl.retain(TextureBufferImpl.java:119)
    at org.webrtc.VideoFrame.retain(VideoFrame.java:196)
    at org.webrtc.EglRenderer.onFrame(EglRenderer.java:521)
    at org.webrtc.SurfaceEglRenderer.onFrame(SurfaceEglRenderer.java:106)
    at org.webrtc.SurfaceViewRenderer.onFrame(SurfaceViewRenderer.java:196)
    at org.webrtc.SurfaceViewRenderer.triggerLastFrameRefresh(SurfaceViewRenderer.java:.28)

Why does this error occur? What exactly do retain and release mean?

java
public class VideoFrame implements RefCounted

To answer the questions above: VideoFrame implements RefCounted. The name literally reads as "reference counted", and that is indeed what it is about.

java
public interface RefCounted {
  /** Increases ref count by one. */
  @CalledByNative void retain();

  /**
   * Decreases ref count by one. When the ref count reaches zero, resources related to the object
   * will be freed.
   */
  @CalledByNative void release();
}

The retain() and release() methods manage the reference count of a VideoFrame object. Reference counting is a memory-management technique that tracks how many references to an object exist. When the reference count drops to 0, meaning nothing is using the object anymore, the object should be destroyed so its memory can be freed.

Concretely, the two methods mean the following:

  • retain(): calling this method increases the VideoFrame's reference count. It means another object now also holds a reference to the frame, which keeps it from being destroyed while it is still needed. In a multi-threaded environment, when a VideoFrame has to be handed from one thread to another, retain() is usually called before the hand-off so the frame cannot be destroyed while the other thread is still using it, as sketched below.
  • release(): calling this method decreases the VideoFrame's reference count. When the count reaches 0, nothing holds a reference to the frame anymore and it can be destroyed to free its memory. After you are done with a VideoFrame you should call release() so the system knows the object is no longer needed and can reclaim its resources promptly.
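
The hand-off pattern described above looks roughly like the sketch below. This is a minimal illustration, not an official WebRTC recipe: only VideoFrame, VideoSink and retain()/release() come from org.webrtc, while the executor and processFrame() are placeholders.

java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.webrtc.VideoFrame;
import org.webrtc.VideoSink;

// Minimal sketch: keep a VideoFrame alive while a worker thread processes it.
public class FrameForwarder implements VideoSink {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  @Override
  public void onFrame(VideoFrame frame) {
    // The callback does not own the frame beyond this call, so take our own
    // reference before hopping threads.
    frame.retain();
    executor.execute(() -> {
      try {
        processFrame(frame); // placeholder for real work (encoding, saving, ...)
      } finally {
        frame.release();     // always balance the retain(), even if processing throws
      }
    });
  }

  private void processFrame(VideoFrame frame) {
    // ...
  }
}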

With the meaning of retain and release clear, we should be able to avoid the error quoted at the beginning of the article:

  1. Make sure the object's reference count is at least 1 before calling retain().

  2. Make sure you never touch the object again after calling release(), so it cannot be released twice, as illustrated in the sketch below.
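
Put together, the failure behind the stack trace at the top of the article and the safe pattern look roughly like this (a contrived sketch; badUsage and goodUsage are illustrative names):

java
// Contrived sketch of the failure mode and of the safe pattern.
void badUsage(VideoFrame frame) {   // frame arrives with refcount >= 1
  frame.release();                  // refcount drops to 0, buffer returns to the VideoSource
  frame.retain();                   // throws IllegalStateException: refcount < 1
}

void goodUsage(VideoFrame frame) {
  frame.retain();                   // take our own reference first
  try {
    // ... use the frame ...
  } finally {
    frame.release();                // release exactly once, never touch the frame afterwards
  }
}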

Now let's look at what else in VideoFrame deserves our attention.

java
public class VideoFrame implements RefCounted {
  /**
   * Implements image storage medium. Might be for example an OpenGL texture or a memory region
   * containing I420-data.
   *
   * <p>Reference counting is needed since a video buffer can be shared between multiple VideoSinks,
   * and the buffer needs to be returned to the VideoSource as soon as all references are gone.
   */
  public interface Buffer extends RefCounted {
    /**
     * Representation of the underlying buffer. Currently, only NATIVE and I420 are supported.
     */
    @CalledByNative("Buffer")
    @VideoFrameBufferType
    default int getBufferType() {
      return VideoFrameBufferType.NATIVE;
    }

    /**
     * Resolution of the buffer in pixels.
     */
    @CalledByNative("Buffer") int getWidth();
    @CalledByNative("Buffer") int getHeight();

    /**
     * Returns a memory-backed frame in I420 format. If the pixel data is in another format, a
     * conversion will take place. All implementations must provide a fallback to I420 for
     * compatibility with e.g. the internal WebRTC software encoders.
     */
    @CalledByNative("Buffer") I420Buffer toI420();

    @Override @CalledByNative("Buffer") void retain();
    @Override @CalledByNative("Buffer") void release();

    /**
     * Crops a region defined by `cropX`, `cropY`, `cropWidth` and `cropHeight`. Scales it to size
     * `scaleWidth` x `scaleHeight`.
     */
    @CalledByNative("Buffer")
    Buffer cropAndScale(
        int cropX, int cropY, int cropWidth, int cropHeight, int scaleWidth, int scaleHeight);
  }

  /**
   * Interface for I420 buffers.
   */
  public interface I420Buffer extends Buffer {
    @Override
    default int getBufferType() {
      return VideoFrameBufferType.I420;
    }

    /**
     * Returns a direct ByteBuffer containing Y-plane data. The buffer capacity is at least
     * getStrideY() * getHeight() bytes. The position of the returned buffer is ignored and must
     * be 0. Callers may mutate the ByteBuffer (eg. through relative-read operations), so
     * implementations must return a new ByteBuffer or slice for each call.
     */
    @CalledByNative("I420Buffer") ByteBuffer getDataY();
    /**
     * Returns a direct ByteBuffer containing U-plane data. The buffer capacity is at least
     * getStrideU() * ((getHeight() + 1) / 2) bytes. The position of the returned buffer is ignored
     * and must be 0. Callers may mutate the ByteBuffer (eg. through relative-read operations), so
     * implementations must return a new ByteBuffer or slice for each call.
     */
    @CalledByNative("I420Buffer") ByteBuffer getDataU();
    /**
     * Returns a direct ByteBuffer containing V-plane data. The buffer capacity is at least
     * getStrideV() * ((getHeight() + 1) / 2) bytes. The position of the returned buffer is ignored
     * and must be 0. Callers may mutate the ByteBuffer (eg. through relative-read operations), so
     * implementations must return a new ByteBuffer or slice for each call.
     */
    @CalledByNative("I420Buffer") ByteBuffer getDataV();

    @CalledByNative("I420Buffer") int getStrideY();
    @CalledByNative("I420Buffer") int getStrideU();
    @CalledByNative("I420Buffer") int getStrideV();
  }

  /**
   * Interface for buffers that are stored as a single texture, either in OES or RGB format.
   */
  public interface TextureBuffer extends Buffer {
    enum Type {
      OES(GLES11Ext.GL_TEXTURE_EXTERNAL_OES),
      RGB(GLES20.GL_TEXTURE_2D);

      private final int glTarget;

      private Type(final int glTarget) {
        this.glTarget = glTarget;
      }

      public int getGlTarget() {
        return glTarget;
      }
    }

    Type getType();
    int getTextureId();

    /**
     * Retrieve the transform matrix associated with the frame. This transform matrix maps 2D
     * homogeneous coordinates of the form (s, t, 1) with s and t in the inclusive range [0, 1] to
     * the coordinate that should be used to sample that location from the buffer.
     */
    Matrix getTransformMatrix();
  }

  private final Buffer buffer;
  private final int rotation;
  private final long timestampNs;

  /**
   * Constructs a new VideoFrame backed by the given {@code buffer}.
   *
   * @note Ownership of the buffer object is tranferred to the new VideoFrame.
   */
  @CalledByNative
  public VideoFrame(Buffer buffer, int rotation, long timestampNs) {
    if (buffer == null) {
      throw new IllegalArgumentException("buffer not allowed to be null");
    }
    if (rotation % 90 != 0) {
      throw new IllegalArgumentException("rotation must be a multiple of 90");
    }
    this.buffer = buffer;
    this.rotation = rotation;
    this.timestampNs = timestampNs;
  }

  @CalledByNative
  public Buffer getBuffer() {
    return buffer;
  }

  /**
   * Rotation of the frame in degrees.
   */
  @CalledByNative
  public int getRotation() {
    return rotation;
  }

  /**
   * Timestamp of the frame in nano seconds.
   */
  @CalledByNative
  public long getTimestampNs() {
    return timestampNs;
  }

  public int getRotatedWidth() {
    if (rotation % 180 == 0) {
      return buffer.getWidth();
    }
    return buffer.getHeight();
  }

  public int getRotatedHeight() {
    if (rotation % 180 == 0) {
      return buffer.getHeight();
    }
    return buffer.getWidth();
  }

  @Override
  public void retain() {
    buffer.retain();
  }

  @Override
  @CalledByNative
  public void release() {
    buffer.release();
  }
}

A few keywords stand out from this code:

  • Buffer
  • Y/U/V
  • I420

This Buffer is not the buffer from java.io. It is an interface that standardizes how WebRTC video frames are backed, and implementations generally come in two types: one based on YUV data in memory, the other based on RGB and OpenGL textures.
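
In practice an onFrame() handler often needs to know which kind of buffer it was handed. A small sketch of the dispatch follows; handleTexture() and handleI420() are placeholders, not WebRTC APIs.

java
import java.nio.ByteBuffer;

import android.graphics.Matrix;

import org.webrtc.VideoFrame;

public class FrameInspector {
  // Sketch: branch on the concrete Buffer implementation of a frame.
  void inspectFrame(VideoFrame frame) {
    VideoFrame.Buffer buffer = frame.getBuffer();
    if (buffer instanceof VideoFrame.TextureBuffer) {
      // GPU path: the pixels live in an OES or RGB OpenGL texture.
      VideoFrame.TextureBuffer tex = (VideoFrame.TextureBuffer) buffer;
      handleTexture(tex.getType(), tex.getTextureId(), tex.getTransformMatrix());
    } else {
      // CPU path: get (or convert to) a memory-backed I420 buffer.
      VideoFrame.I420Buffer i420 = buffer.toI420();
      handleI420(i420.getDataY(), i420.getDataU(), i420.getDataV());
      i420.release(); // toI420() gives us a reference that we own
    }
  }

  void handleTexture(VideoFrame.TextureBuffer.Type type, int textureId, Matrix transform) {
    // placeholder
  }

  void handleI420(ByteBuffer y, ByteBuffer u, ByteBuffer v) {
    // placeholder
  }
}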

So what is YUV? RGB we already know well: think of a Bitmap on Android, where you run into configurations such as ARGB_8888 or RGB_565 when specifying parameters.

My first contact with YUV was back in 2015, when I was working with the ZXing library and tried to use the YUV data to improve the success rate of scanning and recognizing QR codes. The differences between YUV and RGB are:

Representation

  1. RGB: represents a color as a combination of the red (R), green (G) and blue (B) channels. Each pixel is described by the intensity of these three channels, and each intensity usually ranges from 0 to 255, i.e. different levels of brightness for that channel.
  2. YUV: a color encoding that separates luminance from chrominance. Y is the luminance (brightness) channel and describes how bright the image is, while U and V are the chrominance channels that carry the color information; the sketch below shows how the two models relate.
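
The two representations are related by a linear transform. As an illustration, a common BT.601 full-range RGB-to-YUV conversion looks roughly like this (the exact coefficients differ slightly between standards such as BT.601 and BT.709):

java
// One common RGB -> YUV mapping (BT.601, full range). Y carries brightness,
// U and V carry the color difference relative to gray (offset by 128).
static int[] rgbToYuv(int r, int g, int b) {
  int y = (int) ( 0.299 * r + 0.587 * g + 0.114 * b);
  int u = (int) (-0.169 * r - 0.331 * g + 0.500 * b) + 128;
  int v = (int) ( 0.500 * r - 0.419 * g - 0.081 * b) + 128;
  return new int[] {clamp(y), clamp(u), clamp(v)};
}

static int clamp(int value) {
  return Math.max(0, Math.min(255, value));
}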

Storage

  1. RGB: each pixel takes 3 bytes (or 4 bytes if an alpha channel is included, i.e. ARGB), one each for the red, green and blue channels. This is intuitive for computer graphics work but needs comparatively more storage.

  2. YUV: uses a more elaborate encoding of the color information and usually needs less storage. In video compression and transmission YUV is far more common than RGB, because it lends itself to compressing the chrominance information that human vision is less sensitive to; the quick calculation below makes the difference concrete.
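
A back-of-the-envelope calculation, assuming one uncompressed 1920x1080 frame:

java
// Bytes needed for one uncompressed 1920x1080 frame.
// RGB: 3 bytes per pixel. I420: a full-resolution Y plane plus
// quarter-resolution U and V planes, i.e. 1.5 bytes per pixel on average.
int width = 1920;
int height = 1080;
int rgbBytes  = width * height * 3;                              // 6,220,800 bytes (~5.9 MiB)
int i420Bytes = width * height + 2 * (width / 2) * (height / 2); // 3,110,400 bytes, half of RGB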

Typical uses

  1. RGB: typically used for computer displays and graphics processing, because it maps directly onto how a monitor outputs color.

  2. YUV: more common in video processing and transmission, because it is better suited to compressing and transporting video signals. Many video coding standards (MPEG, H.264 and so on) operate on YUV.

In short, RGB is suited to computer graphics and display, while YUV is suited to video encoding, decoding and transmission, especially when bandwidth and storage are limited.

The most important part of YUV is the Y luminance information. If you have ever used a tool that lets you tweak the three YUV components, you will have noticed that no matter how you adjust U and V you can still tell what the object is; at worst it looks like a black-and-white photo.
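
You can reproduce that effect directly from an I420Buffer: building an image from the Y plane alone already gives a recognizable gray-scale picture. A minimal sketch (yPlaneToGrayscale is an illustrative helper, not a WebRTC API):

java
import java.nio.ByteBuffer;

import android.graphics.Bitmap;
import android.graphics.Color;

import org.webrtc.VideoFrame;

public class LumaPreview {
  // Sketch: turn only the Y plane of an I420 buffer into a gray-scale Bitmap,
  // ignoring U and V entirely.
  public static Bitmap yPlaneToGrayscale(VideoFrame.I420Buffer i420) {
    int width = i420.getWidth();
    int height = i420.getHeight();
    int strideY = i420.getStrideY();
    ByteBuffer yPlane = i420.getDataY();

    int[] pixels = new int[width * height];
    for (int row = 0; row < height; row++) {
      for (int col = 0; col < width; col++) {
        int y = yPlane.get(row * strideY + col) & 0xFF; // luma value, 0..255
        pixels[row * width + col] = Color.rgb(y, y, y); // gray: R = G = B = Y
      }
    }
    return Bitmap.createBitmap(pixels, width, height, Bitmap.Config.ARGB_8888);
  }
}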

And what is I420? See the comment in the source above:

I420 is a storage format for images and video frames; similar formats include NV12, YUY2 and others. WebRTC simply requires every platform to be able to return data in this common format, to stay compatible with e.g. the internal software encoders.
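
Concretely, I420 stores three planes one after another: a full-resolution Y plane followed by U and V planes at half resolution in both dimensions. For a tightly packed buffer (no row padding) the offsets work out as follows, shown here for a 640x480 frame:

java
// I420 layout (tightly packed): [ Y: w*h ][ U: (w/2)*(h/2) ][ V: (w/2)*(h/2) ]
int width = 640;
int height = 480;
int ySize = width * height;             // 307,200 bytes
int uSize = (width / 2) * (height / 2); // 76,800 bytes (V has the same size)
int yOffset = 0;
int uOffset = ySize;                    // U starts right after Y
int vOffset = ySize + uSize;            // V starts right after U
int totalBytes = ySize + 2 * uSize;     // == width * height * 3 / 2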

Below are a few VideoFrame usage examples taken from around the internet:

VideoFrame to byte[]

java
public void onFrame(VideoFrame frame) {
    // Convert whatever buffer backs the frame (texture or I420) into a memory-backed I420 buffer.
    VideoFrame.I420Buffer buffer = frame.getBuffer().toI420();
    int height = buffer.getHeight();
    int width = buffer.getWidth();

    ByteBuffer yBuffer = buffer.getDataY();
    ByteBuffer uBuffer = buffer.getDataU();
    ByteBuffer vBuffer = buffer.getDataV();

    int yStride = buffer.getStrideY();
    int uStride = buffer.getStrideU();
    int vStride = buffer.getStrideV();

    // Tightly packed I420 output: Y plane followed by U and V planes.
    byte[] data = new byte[height * width * 3 / 2];

    // Copy the Y plane row by row; the stride may be larger than the width.
    for (int i = 0; i < height; i++) {
        yBuffer.position(i * yStride);
        yBuffer.get(data, i * width, width);
    }

    // Copy the quarter-resolution U and V planes, also respecting their strides.
    int uOffset = width * height;
    int vOffset = width * height * 5 / 4;
    for (int i = 0; i < height / 2; i++) {
        uBuffer.position(i * uStride);
        uBuffer.get(data, uOffset, width / 2);
        uOffset += width / 2;
        vBuffer.position(i * vStride);
        vBuffer.get(data, vOffset, width / 2);
        vOffset += width / 2;
    }

    // toI420() handed us our own reference, so we must release it.
    buffer.release();
}
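
To feed frames into a handler like this, the object implementing onFrame() is registered as a VideoSink on a VideoTrack. A brief sketch, where remoteVideoTrack is assumed to come from your existing peer-connection setup:

java
// Assumed: remoteVideoTrack was obtained elsewhere (e.g. in PeerConnection.Observer#onAddTrack).
VideoSink dumpSink = frame -> {
  // reuse the onFrame() conversion above, or do the work inline here
};
remoteVideoTrack.addSink(dumpSink);    // every decoded frame now reaches the sink
// ...
remoteVideoTrack.removeSink(dumpSink); // detach when you no longer need the frames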

VideoFrame to Bitmap

java
    // Note: the GLES calls below require a current EGL context, so this must run
    // on a GL thread (for example the renderer's thread).
    public Bitmap saveImgBitmap(VideoFrame frame) {
        final Matrix drawMatrix = new Matrix();
        // Used for bitmap capturing.
        final GlTextureFrameBuffer bitmapTextureFramebuffer =
                new GlTextureFrameBuffer(GLES20.GL_RGBA);
        drawMatrix.reset();
        drawMatrix.preTranslate(0.5f, 0.5f);
        // Controls the orientation of the output image.
        drawMatrix.preScale(-1f, -1f);
        drawMatrix.preScale(-1f, 1f); // We want the output to be upside down for Bitmap.
        drawMatrix.preTranslate(-0.5f, -0.5f);

        final int scaledWidth = (int) (1 * frame.getRotatedWidth());
        final int scaledHeight = (int) (1 * frame.getRotatedHeight());
        bitmapTextureFramebuffer.setSize(scaledWidth, scaledHeight);

        GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, bitmapTextureFramebuffer.getFrameBufferId());
        GLES20.glFramebufferTexture2D(GLES20.GL_FRAMEBUFFER, GLES20.GL_COLOR_ATTACHMENT0,
                GLES20.GL_TEXTURE_2D, bitmapTextureFramebuffer.getTextureId(), 0);

        GLES20.glClearColor(0 /* red */, 0 /* green */, 0 /* blue */, 0 /* alpha */);
        GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT);

        // Draw the frame into the off-screen framebuffer.
        VideoFrameDrawer frameDrawer = new VideoFrameDrawer();
        RendererCommon.GlDrawer drawer = new GlRectDrawer();
        frameDrawer.drawFrame(frame, drawer, drawMatrix, 0 /* viewportX */,
                0 /* viewportY */, scaledWidth, scaledHeight);

        // Read the rendered pixels back into CPU memory.
        final ByteBuffer bitmapBuffer = ByteBuffer.allocateDirect(scaledWidth * scaledHeight * 4);
        GLES20.glViewport(0, 0, scaledWidth, scaledHeight);
        GLES20.glReadPixels(
                0, 0, scaledWidth, scaledHeight, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, bitmapBuffer);

        GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0);
        GlUtil.checkNoGLES2Error("EglRenderer.notifyCallbacks");

        // Free the GL resources created for this capture.
        frameDrawer.release();
        drawer.release();
        bitmapTextureFramebuffer.release();

        final Bitmap bitmap = Bitmap.createBitmap(scaledWidth, scaledHeight, Bitmap.Config.ARGB_8888);
        bitmap.copyPixelsFromBuffer(bitmapBuffer);

        try {
            File file = new File("/data/data/com.xxx.diagnose/files" + "/test.jpg");
            if (!file.exists()) {
                file.createNewFile();
            }
            try (OutputStream outputStream = new FileOutputStream(file)) {
                bitmap.compress(Bitmap.CompressFormat.JPEG, 100, outputStream);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return bitmap;
    }