Decord - 深度学习视频加载器

文章目录


一、关于 Decord

一款高效的深度学习视频加载器,具有超级容易消化的智能洗牌功能



DecordRecord的一个反向过程,它提供了方便的视频切片方法,该方法基于硬件加速视频解码器上的一个薄封装器。

  • FFMPEG/LibAV(完成)
  • Nvidia编解码器(完成)
  • 英特尔编解码器

Decord旨在处理尴尬的视频洗牌体验,以提供类似于深度学习的随机图像加载器的流畅体验。

Decord还可以从视频和音频文件中解码音频。人们可以将视频和音频切片在一起以获得同步结果;因此为视频和音频解码提供一站式解决方案。


初步基准

Decord擅长处理随机访问模式,这在神经网络训练中很常见。


二、安装


1、通过pip安装

简单的使用

shell 复制代码
pip install decord

支持的平台:

  • Linux
  • Mac OS>=10.12, python>=3.5
  • Windows

请注意,现在PYPI仅提供CPU版本。请从源代码构建以启用GPU加速器。


2、从源代码安装


2.1 Linux

安装用于构建共享库的系统包,对于Debian/Ubuntu用户,运行:

复制代码
# official PPA comes with ffmpeg 2.8, which lacks tons of features, we use ffmpeg 4.0 here
sudo add-apt-repository ppa:jonathonf/ffmpeg-4 # for ubuntu20.04 official PPA is already version 4.2, you may skip this step
sudo apt-get update
sudo apt-get install -y build-essential python3-dev python3-setuptools make cmake
sudo apt-get install -y ffmpeg libavcodec-dev libavfilter-dev libavformat-dev libavutil-dev
# note: make sure you have cmake 3.8 or later, you can install from cmake official website if it's too old

递归克隆repo(重要)

shell 复制代码
git clone --recursive https://github.com/dmlc/decord

在源根目录中构建共享库:

shell 复制代码
cd decord
mkdir build && cd build
cmake .. -DUSE_CUDA=0 -DCMAKE_BUILD_TYPE=Release
make

您可以指定-DUSE_CUDA=ON-DUSE_CUDA=/path/to/cuda-DUSE_CUDA=ON -DCMAKE_CUDA_COMPILER=/path/to/cuda/nvcc以启用NVDEC硬件加速解码:

复制代码
cmake .. -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release

请注意,如果您遇到libnvcuvid.so的问题(例如,参见#102),可能是由于libnvcuvid.so的链接丢失,您可以手动找到它(ldconfig -p | grep libnvcuvid)并将库链接到CUDA_TOOLKIT_ROOT_DIR\lib64以允许decord顺利检测并链接正确的库。

要指定自定义FFMPEG库路径,请使用'-DFFMPEG_DIR=/path/to/ffmpeg"。

安装python绑定:

shell 复制代码
cd ../python
# option 1: add python path to $PYTHONPATH, you will need to install numpy separately
pwd=$PWD
echo "PYTHONPATH=$PYTHONPATH:$pwd" >> ~/.bashrc
source ~/.bashrc
# option 2: install with setuptools
python3 setup.py install --user

2.2 macOS

macOS上的安装类似于Linux。但是macOS用户需要先安装clang、GNU Make、cmake等构建工具。

clang和GNU Make等工具打包在macOS的命令行工具中。要安装:

复制代码
xcode-select --install

要安装其他需要的包,如cmake,我们建议首先安装Homebrew,它是macOS的流行包管理器。详细说明可以在其主页上找到。

安装Homebrew后,通过以下方式安装cmake和ffmpeg:

shell 复制代码
brew install cmake ffmpeg
# note: make sure you have cmake 3.8 or later, you can install from cmake official website if it's too old

递归克隆repo(重要)

shell 复制代码
git clone --recursive https://github.com/dmlc/decord

然后转到根目录构建共享库:

shell 复制代码
cd decord
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make

安装python绑定:

shell 复制代码
cd ../python
# option 1: add python path to $PYTHONPATH, you will need to install numpy separately
pwd=$PWD
echo "PYTHONPATH=$PYTHONPATH:$pwd" >> ~/.bash_profile
source ~/.bash_profile 

# option 2: install with setuptools
python3 setup.py install --user

2.3 Windows

对于windows,您将需要CMake和Visual Studio进行C++编译。


依赖项准备就绪后,打开命令行提示符:

shell 复制代码
cd your-workspace
git clone --recursive https://github.com/dmlc/decord
cd decord
mkdir build
cd build
cmake -DCMAKE_CXX_FLAGS="/DDECORD_EXPORTS" -DCMAKE_CONFIGURATION_TYPES="Release" -G "Visual Studio 15 2017 Win64" ..
# open `decord.sln` and build project

三、用法

Decord为引导提供了最小的API集。您还可以查看jupyter笔记本示例


1、VideoReader

VideoReader用于直接从视频文件中访问帧。

python 复制代码
from decord import VideoReader
from decord import cpu, gpu

vr = VideoReader('examples/flipping_a_pancake.mkv', ctx=cpu(0))

# a file like object works as well, for in-memory decoding
with open('examples/flipping_a_pancake.mkv', 'rb') as f:
  vr = VideoReader(f, ctx=cpu(0))
print('video frames:', len(vr))

# 1. the simplest way is to directly access frames
for i in range(len(vr)):
    # the video reader will handle seeking and skipping in the most efficient manner
    frame = vr[i]
    print(frame.shape)

# To get multiple frames at once, use get_batch
# this is the efficient way to obtain a long list of frames
frames = vr.get_batch([1, 3, 5, 7, 9])
print(frames.shape)
# (5, 240, 320, 3)

# duplicate frame indices will be accepted and handled internally to avoid duplicate decoding
frames2 = vr.get_batch([1, 2, 3, 2, 3, 4, 3, 4, 5]).asnumpy()
print(frames2.shape)
# (9, 240, 320, 3)

# 2. you can do cv2 style reading as well
# skip 100 frames
vr.skip_frames(100)

# seek to start
vr.seek(0)
batch = vr.next()
print('frame shape:', batch.shape)
print('numpy frames:', batch.asnumpy())

2、VideoLoader

VideoLoader专为训练具有大量视频文件的深度学习模型而设计。它提供智能视频洗牌技术,以提供高随机访问性能(我们知道在视频中寻找是超级慢和冗余的)。优化隐藏在用户不可见的C++代码中。

python 复制代码
from decord import VideoLoader
from decord import cpu, gpu

vl = VideoLoader(['1.mp4', '2.avi', '3.mpeg'], ctx=[cpu(0)], shape=(2, 320, 240, 3), interval=1, skip=5, shuffle=1)
print('Total batches:', len(vl))

for batch in vl:
    print(batch[0].shape)

Shuffling 视频可能很棘手,因此我们提供各种模式:

shell 复制代码
shuffle = -1  # smart shuffle mode, based on video properties, (not implemented yet)
shuffle = 0  # all sequential, no seeking, following initial filename order
shuffle = 1  # random filename order, no random access for each video, very efficient
shuffle = 2  # random order
shuffle = 3  # random frame access in each video only

3、AudioReader

AudioReader用于直接从视频(如果有音轨)和音频文件中访问样本。

python 复制代码
from decord import AudioReader
from decord import cpu, gpu

# You can specify the desired sample rate and channel layout
# For channels there are two options: default to the original layout or mono
ar = AudioReader('example.mp3', ctx=cpu(0), sample_rate=44100, mono=False)
print('Shape of audio samples: ', ar.shape())
# To access the audio samples
print('The first sample: ', ar[0])
print('The first five samples: ', ar[0:5])
print('Get a batch of samples: ', ar.get_batch([1,3,5]))

4、AVReader

AVReader是AudioReader和VideoReader的包装器。它使您能够同时切片视频和音频。

python 复制代码
from decord import AVReader
from decord import cpu, gpu

av = AVReader('example.mov', ctx=cpu(0))
# To access both the video frames and corresponding audio samples
audio, video = av[0:20]
# Each element in audio will be a batch of samples corresponding to a frame of video
print('Frame #: ', len(audio))
print('Shape of the audio samples of the first frame: ', audio[0].shape)
print('Shape of the first frame: ', video.asnumpy()[0].shape)
# Similarly, to get a batch
audio2, video2 = av.get_batch([1,3,5])

四、深度学习框架的桥梁:

有一个从decord到流行的深度学习框架的桥梁对于训练/推理很重要

  • Apache MXNet(完成)
  • Pytorch(完成)
  • TensorFlow(完成)

将桥接器用于深度学习框架很简单,例如,可以将默认张量输出设置为mxnet.ndarray

python 复制代码
import decord
vr = decord.VideoReader('examples/flipping_a_pancake.mkv')
print('native output:', type(vr[0]), vr[0].shape)
# native output: <class 'decord.ndarray.NDArray'>, (240, 426, 3)
# you only need to set the output type once
decord.bridge.set_bridge('mxnet')
print(type(vr[0], vr[0].shape))
# <class 'mxnet.ndarray.ndarray.NDArray'> (240, 426, 3)
# or pytorch and tensorflow(>=2.2.0)
decord.bridge.set_bridge('torch')
decord.bridge.set_bridge('tensorflow')
# or back to decord native format
decord.bridge.set_bridge('native')

2025-01-07(二)

相关推荐
Mintopia7 小时前
OpenClaw 对软件行业产生的影响
人工智能
陈广亮7 小时前
构建具有长期记忆的 AI Agent:从设计模式到生产实践
人工智能
会写代码的柯基犬7 小时前
DeepSeek vs Kimi vs Qwen —— AI 生成俄罗斯方块代码效果横评
人工智能·llm
Mintopia8 小时前
OpenClaw 是什么?为什么节后热度如此之高?
人工智能
爱可生开源社区8 小时前
DBA 的未来?八位行业先锋的年度圆桌讨论
人工智能·dba
叁两11 小时前
用opencode打造全自动公众号写作流水线,AI 代笔太香了!
前端·人工智能·agent
前端付豪11 小时前
LangChain记忆:通过Memory记住上次的对话细节
人工智能·python·langchain
strayCat2325511 小时前
Clawdbot 源码解读 7: 扩展机制
人工智能·开源