[Paper Reading] DINOv3: Paper Notes and Hands-On Code

Contents

Practice
- Requesting and downloading the DINOv3 weights
- Testing model loading and inference

Paper: https://arxiv.org/abs/2508.10104
GitHub: https://github.com/facebookresearch/dinov3

Practice

Requesting and downloading the DINOv3 weights

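The weights are gated, so you have to request access on Hugging Face. In my experience it is best to fill in made-up details for a US-based person and to use a US IP address: the first time I applied, I honestly entered my real information with an IP address located in Singapore, and the request was rejected shortly afterwards.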

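File/Directory	Purpose
`config.json`	Model architecture configuration
`model-00001-of-00006.safetensors` ... `model-00006-of-00006.safetensors`	Weight shards (6 in total)
`model.safetensors.index.json`	Shard index (maps parameters to shard files)
`preprocessor_config.json`	Image preprocessing parameters
`README.md` / `LICENSE.md` / `.gitattributes`	Documentation and license information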

After downloading the files above, I stored them in the directory /data/xujianxia/dinov3-main/dinov3-vit7b16-pretrain-lvd1689m
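Since the checkpoint is split across six shards, it can be worth confirming that every shard referenced by the index file actually made it onto the server before trying to load the model. The following is a minimal sketch, not part of the official repository: the helper name `missing_shards` is made up for illustration, and it assumes the index file follows the usual sharded-safetensors layout, with a `weight_map` field that maps parameter names to shard filenames.

```python
import json
from pathlib import Path


def missing_shards(model_dir):
    """Return shard filenames listed in the index file but absent on disk.

    Hypothetical helper: assumes the index JSON has a "weight_map" field.
    """
    model_dir = Path(model_dir)
    index_path = model_dir / "model.safetensors.index.json"
    with index_path.open(encoding="utf-8") as f:
        index = json.load(f)
    # "weight_map" maps each parameter name to the shard file that stores it.
    shard_names = sorted(set(index["weight_map"].values()))
    return [name for name in shard_names if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # Path taken from this post; adjust it to your own download location.
    model_dir = "/data/xujianxia/dinov3-main/dinov3-vit7b16-pretrain-lvd1689m"
    if Path(model_dir).is_dir():
        print("missing shards:", missing_shards(model_dir))
```

An empty list means every shard named in the index is present; it does not verify that the files are complete or uncorrupted.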

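Testing model loading and inference

This is a test snippet from the official README. Because of network issues, I downloaded the test image and the DINOv3 weights to the server in advance and load both offline: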


from transformers import pipeline
from transformers.image_utils import load_image

# url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
# image = load_image(url)
image = load_image("/data/xujianxia/dinov3-main/pipeline-cat-chonk.jpeg")


# feature_extractor = pipeline(
#     model="facebook/dinov3-convnext-tiny-pretrain-lvd1689m",
#     task="image-feature-extraction",
# )
feature_extractor = pipeline(
    model="/data/xujianxia/dinov3-main/dinov3-vit7b16-pretrain-lvd1689m",
    task="image-feature-extraction",
)
features = feature_extractor(image)

Running the code above raises the following error:


/home/xujianxia/anaconda3/bin/conda run -n monai --no-capture-output python /home/xujianxia/.pycharm_helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client localhost --port 34499 --file /data/xujianxia/dinov3-main/load_model.py 
Connected to pydev debugger (build 251.26927.74)
Traceback (most recent call last):
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1271, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 966, in __getitem__
    raise KeyError(key)
KeyError: 'dinov3_vit'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xujianxia/.pycharm_helpers/pydev/pydevd.py", line 1570, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/xujianxia/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/data/xujianxia/dinov3-main/load_model.py", line 13, in <module>
    feature_extractor = pipeline(
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 909, in pipeline
    config = AutoConfig.from_pretrained(
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1273, in from_pretrained
python-BaseException
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `dinov3_vit` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
^C
CondaError: KeyboardInterrupt


Process finished with exit code 1

I checked whether transformers was up to date and found that it was already the latest released version:

(monai) xujianxia@aa-SYS-4029GP-TRT:/data/xujianxia$ pip show transformers
Name: transformers
Version: 4.55.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by:

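So I went with the second option suggested in the error message: pip install git+https://github.com/huggingface/transformers.git

Someone on Hugging Face hit the same problem and solved it by installing transformers directly from the main branch on GitHub:
https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m/discussions/1

Cause of the error: at the time of writing, the DINOv3 model code had not yet been included in a transformers release on PyPI (the official Python package index), so version 4.55.2 does not recognize the `dinov3_vit` model type.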
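A quick way to tell whether an installed transformers build knows the architecture, without loading any weights, is to compare the `model_type` in `config.json` against the same `CONFIG_MAPPING` that raised the `KeyError` in the traceback. This is a rough sketch under that assumption; the helper names `read_model_type` and `is_supported` are my own, not part of transformers.

```python
import json
from pathlib import Path


def read_model_type(model_dir):
    """Read the `model_type` field from a checkpoint's config.json."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return json.load(f).get("model_type")


def is_supported(model_type):
    """Return True/False if transformers is importable, None otherwise."""
    try:
        # The mapping that raised KeyError: 'dinov3_vit' in the traceback.
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING
    except ImportError:
        return None
    return model_type in CONFIG_MAPPING


if __name__ == "__main__":
    # Path taken from this post; adjust it to your own download location.
    model_dir = Path("/data/xujianxia/dinov3-main/dinov3-vit7b16-pretrain-lvd1689m")
    if (model_dir / "config.json").is_file():
        model_type = read_model_type(model_dir)
        print(model_type, "supported:", is_supported(model_type))
```

If this prints `False` for `dinov3_vit`, the installed build predates DINOv3 support and upgrading (or installing from source) is needed.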

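After I ran pip install git+https://github.com/huggingface/transformers.git, the command hung indefinitely, most likely because of network problems.

So I switched to an offline install.

First, download the latest transformers repository from GitHub as transformers-main.zip, upload it to the server, and unzip it.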


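# Activate the conda environment
conda activate monai

# Enter the unzipped folder (adjust the path to match your setup)
cd /path/to/your/folder/transformers-main/

# Install from the local source tree (a PyPI mirror in China speeds up dependency downloads)
pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple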

The local install succeeded.

Running the script again produced another error:


/home/xujianxia/anaconda3/bin/conda run -n monai --no-capture-output python /data/xujianxia/dinov3-main/load_model.py 
Loading checkpoint shards: 100%|██████████████████| 6/6 [00:00<00:00, 61.22it/s]
Device set to use cuda:0
Traceback (most recent call last):
  File "/data/xujianxia/dinov3-main/load_model.py", line 13, in <module>
    feature_extractor = pipeline(
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 1210, in pipeline
    return pipeline_class(model=model, framework=framework, task=task, **kwargs)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1043, in __init__
    self.model.to(self.device)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4374, in to
    return super().to(*args, **kwargs)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB. GPU 
ERROR conda.cli.main_run:execute(127): `conda run python /data/xujianxia/dinov3-main/load_model.py` failed. (See above for error)

Process finished with exit code 1

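The out-of-memory error occurs because I am loading dinov3-vit7b16-pretrain-lvd1689m (6,716M parameters), whereas the official example uses dinov3-convnext-tiny-pretrain-lvd1689m (29M parameters). The full list of pretrained models provided by the authors is as follows: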

ViT models pretrained on web dataset (LVD-1689M):

Model	Parameters	Pretraining Dataset	Download
ViT-S/16 distilled	21M	LVD-1689M	$link$
ViT-S+/16 distilled	29M	LVD-1689M	$link$
ViT-B/16 distilled	86M	LVD-1689M	$link$
ViT-L/16 distilled	300M	LVD-1689M	$link$
ViT-H+/16 distilled	840M	LVD-1689M	$link$
ViT-7B/16	6,716M	LVD-1689M	$link$

ConvNeXt models pretrained on web dataset (LVD-1689M):

Model	Parameters	Pretraining Dataset	Download
ConvNeXt Tiny	29M	LVD-1689M	$link$
ConvNeXt Small	50M	LVD-1689M	$link$
ConvNeXt Base	89M	LVD-1689M	$link$
ConvNeXt Large	198M	LVD-1689M	$link$

ViT models pretrained on satellite dataset (SAT-493M):

Model	Parameters	Pretraining Dataset	Download
ViT-L/16 distilled	300M	SAT-493M	$link$
ViT-7B/16	6,716M	SAT-493M	$link$
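To get a feel for why the 7B model runs out of memory where the ConvNeXt Tiny example does not, here is a back-of-the-envelope estimate of the memory taken by the weights alone. It is only a sketch: it assumes 4 bytes per parameter for float32 and 2 bytes for float16, uses the parameter counts from the tables above, and ignores activations, CUDA context overhead, and anything else already on the GPU (the capacity of the GPU used here is not shown in the truncated error message).

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed for the weights alone, in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * bytes_per_param / 1024**3


VIT_7B_PARAMS = 6_716_000_000      # "6,716M" from the table above
CONVNEXT_TINY_PARAMS = 29_000_000  # "29M" from the table above

fp32 = weight_memory_gib(VIT_7B_PARAMS, 4)  # float32: 4 bytes per parameter
fp16 = weight_memory_gib(VIT_7B_PARAMS, 2)  # float16: 2 bytes per parameter
tiny_fp32 = weight_memory_gib(CONVNEXT_TINY_PARAMS, 4)

print(f"ViT-7B/16 float32: {fp32:.1f} GiB, float16: {fp16:.1f} GiB")
print(f"ConvNeXt Tiny float32: {tiny_fp32:.2f} GiB")
# prints:
# ViT-7B/16 float32: 25.0 GiB, float16: 12.5 GiB
# ConvNeXt Tiny float32: 0.11 GiB
```

So the float32 weights of ViT-7B/16 alone need about 25 GiB, and halving the precision halves that figure.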

To verify things quickly, I chose to lower the model precision by loading the weights in float16:


import torch  # needed for torch.float16

feature_extractor = pipeline(
    torch_dtype=torch.float16,  # add this line to load the weights in half precision
    model="/data/xujianxia/dinov3-main/dinov3-vit7b16-pretrain-lvd1689m",
    task="image-feature-extraction",
)

With this change, the image was successfully encoded into features.
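To sanity-check the result, it helps to look at the shape of `features`. The sketch below assumes the pipeline returns plain nested Python lists, which I believe is its default when tensors are not requested; `nested_shape` is a hypothetical helper written for this post, and the tiny input at the bottom is stand-in data rather than real model output.

```python
def nested_shape(x):
    """Return the shape of a regular nested list, following the first element at each level."""
    shape = []
    while isinstance(x, (list, tuple)):
        shape.append(len(x))
        if not x:
            break
        x = x[0]
    return tuple(shape)


# Stand-in data: 1 image, 3 tokens, 4 feature dimensions.
dummy_features = [[[0.0] * 4 for _ in range(3)]]
print(nested_shape(dummy_features))  # (1, 3, 4)
```

On the real output you would call `nested_shape(features)`; the actual token count and feature dimension depend on the model and the preprocessing configuration.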