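Deploying the Multimodal Video Model Aria with Docker
Motivation
While browsing Hugging Face I came across a 25.3B-parameter multimodal model that supports both images and video. I happened to have an H20 GPU, so I deployed it to see how it performs. My host runs CUDA 12.1, so I used Docker to avoid polluting the host environment. After working through several problems, including a Segmentation fault (core dumped) error, I got it running on a single-GPU H20 server with Python 3.10, CUDA 12.4, and Ubuntu 20.04. Image inference uses about 50 GB of GPU memory; inference on a 5-second video sampled at 20 frames uses about 60 GB.
Project overview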
https://github.com/rhymes-ai/Aria
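Trying the online demo
The online demo responds quickly and gives detailed descriptions, including what happens at which point in time. The project introduction lists this capability: "Cutting a long video by scene transitions with timestamps." Isn't that automatic shot splitting? That gives me an idea, but I'll finish this post first.
Environment
Docker environment
You can skip this section if your host already has CUDA 12.4 or later, or if you are free to upgrade or downgrade CUDA on the host. Otherwise you will hit the following error: ImportError: /usr/local/lib/python3.10/dist-packages/torch/lib/.../.../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12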
bash
# Docker prerequisites: add the nvidia-docker apt repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -fsSL https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Install docker and nvidia-docker
sudo apt-get update
sudo apt-get install -y docker.io
sudo apt-get install -y nvidia-docker2
sudo systemctl start docker
docker --version
# Configure registry mirrors
# data-root is Docker's data directory. I set it only because my root disk was full; omit it if you have enough space
vim /etc/docker/daemon.json
{
"log-driver": "json-file",
"log-opts": {
"max-file": "3",
"max-size": "10m"
},
"registry-mirrors" :[
"https://hub.rat.dev",
"https://docker.1panel.live",
"https://docker.rainbond.cc",
"https://mirror.ccs.tencentyun.com",
"http://registry.docker-cn.com",
"http://docker.mirrors.ustc.edu.cn",
"http://hub-mirror.c.163.com"
],
"data-root": "/home/docker"
}
# Restart Docker
sudo systemctl daemon-reload
sudo systemctl restart docker
# Run a cuda:12.4.1 container, selecting which GPU to use and which path to mount
# The cuda:12.4.1-devel-ubuntu20.04 image includes nvcc and other development tools.
docker run -d \
--name aria \
--gpus '"device=3"' \
-v /home:/home \
nvidia/cuda:12.4.1-devel-ubuntu20.04 \
tail -f /dev/null
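# Enter the container
docker exec -it aria bash
# Install common tools (a fresh image needs apt update first)
apt update
apt install -y vim
apt install -y wget
apt install -y git
bash
# Migrate Docker's data directory
# Only needed because my disk was full and I had to move it to another disk. I'm recording it for my own reference; you don't need to run it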
sudo rsync -aP /var/lib/docker/ /home/docker
docker info | grep "Docker Root Dir"
Conda environment
bash
# Download Miniconda. Some cloud providers can't reach the Tsinghua mirror, so use whichever of the two URLs works
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
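# Install conda and set up the environment variables. If you let the installer initialize conda, you don't need to edit .bashrc
sh Miniconda3-latest-Linux-x86_64.sh
# Add conda to .bashrc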
vim ~/.bashrc
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/xxx/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/xxx/miniconda3/etc/profile.d/conda.sh" ]; then
. "/xxx/miniconda3/etc/profile.d/conda.sh"
else
export PATH="/xxx/miniconda3/bin:$PATH"
fi
fi
unset __conda_setup
# <<< conda initialize <<<
# Activate
source ~/.bashrc
Code environment
bash
# Create the conda environment. Python 3.10 is required; an older version fails with:
#ERROR: Package 'aria' requires a different Python: 3.9.20 not in '>=3.10'
conda create --name aria python=3.10
# Clone the code
git clone https://github.com/rhymes-ai/Aria.git
# Enter the Aria project directory
cd Aria
conda activate aria
pip install -e . -i https://mirrors.aliyun.com/pypi/simple
pip install grouped_gemm -i https://mirrors.aliyun.com/pypi/simple
pip install flash-attn --no-build-isolation -i https://mirrors.aliyun.com/pypi/simple
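Before moving on, it can help to confirm that the environment actually contains what the later scripts import. The following is a minimal sketch, not part of the Aria project: the function name `check_environment` and the package list are my own assumptions (the list is simply the import names used in this post), and it only checks that each package can be found, not that it works on the GPU.

```python
import importlib.util
import sys

# Import names used by the scripts in this post (an assumed list, adjust as needed).
REQUIRED = ("torch", "transformers", "grouped_gemm", "flash_attn", "decord")


def check_environment(packages=REQUIRED, min_python=(3, 10)):
    """Return (python_ok, missing_packages) without importing the packages."""
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when a top-level package is not installed.
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print("Python >= 3.10:", python_ok)
    print("Missing packages:", missing or "none")
```

Anything reported as missing can be installed with pip inside the `aria` environment.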
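Download the model
The test code can download the model automatically, but I prefer to keep it in a specific directory, so I wrote a download script (saved as download_model.py).
python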
import argparse
import time
import logging
from huggingface_hub import snapshot_download

# Configure logging
logging.basicConfig(level=logging.INFO)


def download_model(model_name, local_name, max_retries=15, retry_interval=2):
    for attempt in range(1, max_retries + 1):
        try:
            snapshot_download(
                repo_id=model_name,
                ignore_patterns=["*.bin"],
                local_dir=local_name,
                force_download=False
            )
            logging.info("Download successful")
            return
        except Exception as e:
            logging.error(f"Attempt {attempt} failed: {e}")
            if attempt < max_retries:
                time.sleep(retry_interval)
            else:
                logging.critical("Download failed, exceeded maximum retry attempts")


def main():
    parser = argparse.ArgumentParser(description="Download a model from Hugging Face Hub")
    parser.add_argument("--model_name", required=True, help="Name of the model to download")
    parser.add_argument("--local_name", required=True, help="Local directory to save the model")
    args = parser.parse_args()
    download_model(args.model_name, args.local_name)


if __name__ == "__main__":
    main()
bash
# Use the mirror to speed up downloads from inside China
export HF_ENDPOINT=https://hf-mirror.com
# Run directly from the command line; install any missing dependencies manually
python download_model.py \
--model_name rhymes-ai/Aria \
--local_name /home/models/huggingface/rhymes-ai/Aria
# Running it under nohup is recommended
export HF_ENDPOINT=https://hf-mirror.com && nohup xxxxx >> download.log 2>&1 &
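Because the download runs in the background and retries on failure, it is worth checking that every weight shard arrived before loading the model. Here is a minimal sketch, assuming the checkpoint uses the usual sharded safetensors layout with a `model.safetensors.index.json` file whose `weight_map` maps tensor names to shard file names. The helper name `missing_shards` is hypothetical, and I have not verified the exact file layout of the Aria repository.

```python
import json
import os


def missing_shards(model_dir, index_name="model.safetensors.index.json"):
    """List shard files named in the index that are absent from model_dir."""
    index_path = os.path.join(model_dir, index_name)
    with open(index_path, encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]
    # Many tensors share one shard, so deduplicate the file names.
    shard_names = sorted(set(weight_map.values()))
    return [name for name in shard_names
            if not os.path.isfile(os.path.join(model_dir, name))]
```

Calling `missing_shards("/home/models/huggingface/rhymes-ai/Aria")` should return an empty list once the download is complete; any names it returns are shards to fetch again.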
Image test
Code
python
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor
# Local path of the downloaded model
model_id_or_path = "/home/models/huggingface/rhymes-ai/Aria"
model = AutoModelForCausalLM.from_pretrained(model_id_or_path, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)
# Replace this with the URL of your own image
image_path = "https://m207605830-1.jpg"
image = Image.open(requests.get(image_path, stream=True).raw)
messages = [
    {
        "role": "user",
        "content": [
            {"text": None, "type": "image"},
            {"text": "what is the image?", "type": "text"},
        ],
    }
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt")
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):
    output = model.generate(
        **inputs,
        max_new_tokens=500,
        stop_strings=["<|im_end|>"],
        tokenizer=processor.tokenizer,
        do_sample=True,
        temperature=0.9,
    )
    output_ids = output[0][inputs["input_ids"].shape[1]:]
    result = processor.decode(output_ids, skip_special_tokens=True)
print(result)
Result
Video test
Code
python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import time
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor
model_id_or_path = "/home/models/huggingface/rhymes-ai/Aria"
model = AutoModelForCausalLM.from_pretrained(model_id_or_path, device_map="auto", torch_dtype=torch.bfloat16,
trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)
# This import must come after the model is loaded; otherwise it fails with Segmentation fault (core dumped)
from decord import VideoReader
from tqdm import tqdm
from typing import List
def load_video(video_file, num_frames=128, cache_dir="/home/lzy/cached_video_frames", verbosity="DEBUG"):
    # Create cache directory if it doesn't exist
    os.makedirs(cache_dir, exist_ok=True)
    video_basename = os.path.basename(video_file)
    cache_subdir = os.path.join(cache_dir, f"{video_basename}_{num_frames}")
    os.makedirs(cache_subdir, exist_ok=True)
    cached_frames = []
    missing_frames = []
    frame_indices = []
    for i in range(num_frames):
        frame_path = os.path.join(cache_subdir, f"frame_{i}.jpg")
        if os.path.exists(frame_path):
            cached_frames.append(frame_path)
        else:
            missing_frames.append(i)
            frame_indices.append(i)
    vr = VideoReader(video_file)
    duration = len(vr)
    fps = vr.get_avg_fps()
    frame_timestamps = [int(duration / num_frames * (i + 0.5)) / fps for i in range(num_frames)]
    if verbosity == "DEBUG":
        print(
            "Already cached {}/{} frames for video {}, enjoy speed!".format(len(cached_frames), num_frames, video_file))
    # If all frames are cached, load them directly
    if not missing_frames:
        return [Image.open(frame_path).convert("RGB") for frame_path in cached_frames], frame_timestamps
    actual_frame_indices = [int(duration / num_frames * (i + 0.5)) for i in missing_frames]
    missing_frames_data = vr.get_batch(actual_frame_indices).asnumpy()
    for idx, frame_index in enumerate(tqdm(missing_frames, desc="Caching rest frames")):
        img = Image.fromarray(missing_frames_data[idx]).convert("RGB")
        frame_path = os.path.join(cache_subdir, f"frame_{frame_index}.jpg")
        img.save(frame_path)
        cached_frames.append(frame_path)
    cached_frames.sort(key=lambda x: int(os.path.basename(x).split('_')[1].split('.')[0]))
    return [Image.open(frame_path).convert("RGB") for frame_path in cached_frames], frame_timestamps
def get_placeholders_for_videos(frames: List, timestamps=None):
    # timestamps defaults to None rather than a shared mutable list
    contents = []
    if not timestamps:
        for i, _ in enumerate(frames):
            contents.append({"text": None, "type": "image"})
            contents.append({"text": "\n", "type": "text"})
    else:
        for i, (_, ts) in enumerate(zip(frames, timestamps)):
            contents.extend(
                [
                    {"text": f"[{int(ts) // 60:02d}:{int(ts) % 60:02d}]", "type": "text"},
                    {"text": None, "type": "image"},
                    {"text": "\n", "type": "text"}
                ]
            )
    return contents
video_extensions = ('.mp4', '.avi', '.mov')
for root, _, files in os.walk("/home/"):
    for file in files:
        if file.endswith(video_extensions):
            video_path = os.path.join(root, file)
            frames, frame_timestamps = load_video(video_path, num_frames=20)
            ### If you want to insert timestamps for Aria Inputs
            contents = get_placeholders_for_videos(frames, frame_timestamps)
            ### If you DO NOT want to insert frame timestamps for Aria Inputs
            # contents = get_placeholders_for_videos(frames)
            start = time.time()
            messages = [
                {
                    "role": "user",
                    "content": [
                        *contents,
                        {
                            "text": "Describe the video",
                            "type": "text"},
                    ],
                }
            ]
            text = processor.apply_chat_template(messages, add_generation_prompt=True)
            inputs = processor(text=text, images=frames, return_tensors="pt", max_image_size=980)
            inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
            inputs = {k: v.to(model.device) for k, v in inputs.items()}
            with torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):
                output = model.generate(
                    **inputs,
                    max_new_tokens=2048,
                    stop_strings=["<|im_end|>"],
                    tokenizer=processor.tokenizer,
                    do_sample=False,
                )
                output_ids = output[0][inputs["input_ids"].shape[1]:]
                result = processor.decode(output_ids, skip_special_tokens=True)
            print(result)
            print(time.time() - start)
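- This script analyzes every video under /home/; to analyze a single video, adjust the loop accordingly
- max_image_size can be changed to 490
- Choose num_frames to suit your video: mine is 5 seconds long and I sample 20 frames, which works out to 4 frames per second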
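To make the effect of num_frames concrete, here is a small worked example of the sampling arithmetic that load_video uses: each of the num_frames slots picks the frame at the middle of its slice of the video, and the timestamp is that frame index divided by the frame rate. The helper names `sample_frames` and `format_timestamp` are mine, and the 30 fps frame rate is an assumed value for illustration; the real script reads it from the video.

```python
def sample_frames(total_frames, fps, num_frames):
    """Mirror the index and timestamp arithmetic used in load_video."""
    step = total_frames / num_frames
    indices = [int(step * (i + 0.5)) for i in range(num_frames)]
    timestamps = [idx / fps for idx in indices]
    return indices, timestamps


def format_timestamp(seconds):
    """Render seconds as the [MM:SS] label placed before each frame."""
    return f"[{int(seconds) // 60:02d}:{int(seconds) % 60:02d}]"


# Hypothetical clip: 5 seconds at 30 fps, i.e. 150 frames in total.
indices, timestamps = sample_frames(total_frames=150, fps=30.0, num_frames=20)
print(indices[:4])                                      # [3, 11, 18, 26]
print([format_timestamp(t) for t in timestamps[::5]])   # ['[00:00]', '[00:01]', '[00:02]', '[00:03]']
```

With these numbers one frame is taken roughly every 7.5 source frames. Since the label only has one-second resolution, the four frames sampled within the same second all receive the same [MM:SS] label.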
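Result
Summary
- Aria's GPU memory usage is reasonable, at about 60 GB; it appears to use attn_implementation="flash_attention_2" by default
- Compared with Qwen and CPM, it can cut a long video by scene transitions with timestamps
- The core dumped error is fixed by reordering the imports (import decord after the model is loaded)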
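The memory figures above were read off while the scripts were running. One way to record them from a second terminal is to poll nvidia-smi; the sketch below is an illustration under my own assumptions, not how the numbers in this post were produced, and the helper names are hypothetical. It relies on nvidia-smi's `--query-gpu=memory.used --format=csv,noheader,nounits` options, which print one value in MiB per GPU.

```python
import subprocess


def parse_memory_used(csv_text):
    """Parse memory.used values (MiB) from nvidia-smi CSV output, one GPU per line."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_memory_used():
    """Ask nvidia-smi for per-GPU used memory in MiB; requires the NVIDIA driver."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_used(out)
```

Calling `query_memory_used()` during inference returns a list with one entry per visible GPU.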