【读论文】DINOv3论文阅读与代码实践

文章目录

论文地址: https://arxiv.org/abs/2508.10104
github地址: https://github.com/facebookresearch/dinov3

实践

DINOv3权重申请与下载

在hugging face上申请,申请的时候最好瞎编一个美国人的信息,并使用美国的IP(因为我第一次申请的时候老老实实填写的真实信息,IP地址定位在新加坡,然后一会儿就被reject了)

文件/目录 作用
config.json 模型结构配置
model-00001-of-00006.safetensors ... model-00006-of-00006.safetensors 权重分片(共 6 个)
model.safetensors.index.json 分片映射文件
preprocessor_config.json 图像预处理参数
README.md / LICENSE.md / .gitattributes 文档与版权信息

将以上文件下载之后存放在目录/data/xujianxia/dinov3-main/dinov3-vit7b16-pretrain-lvd1689m

测试模型加载和推理

官方readme中的一段测试代码(由于网络问题,已提前将测试图片和DINOv3模型权重下载到服务器,离线加载):

python 复制代码
from transformers import pipeline
from transformers.image_utils import load_image

# url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
# image = load_image(url)
image = load_image("/data/xujianxia/dinov3-main/pipeline-cat-chonk.jpeg")


# feature_extractor = pipeline(
#     model="facebook/dinov3-convnext-tiny-pretrain-lvd1689m",
#     task="image-feature-extraction",
# )
feature_extractor = pipeline(
    model="/data/xujianxia/dinov3-main/dinov3-vit7b16-pretrain-lvd1689m",
    task="image-feature-extraction",
)
features = feature_extractor(image)

运行以上代码,报错如下:

bash 复制代码
/home/xujianxia/anaconda3/bin/conda run -n monai --no-capture-output python /home/xujianxia/.pycharm_helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client localhost --port 34499 --file /data/xujianxia/dinov3-main/load_model.py 
Connected to pydev debugger (build 251.26927.74)
Traceback (most recent call last):
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1271, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 966, in __getitem__
    raise KeyError(key)
KeyError: 'dinov3_vit'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xujianxia/.pycharm_helpers/pydev/pydevd.py", line 1570, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/xujianxia/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/data/xujianxia/dinov3-main/load_model.py", line 13, in <module>
    feature_extractor = pipeline(
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 909, in pipeline
    config = AutoConfig.from_pretrained(
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1273, in from_pretrained
python-BaseException
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `dinov3_vit` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
^C
CondaError: KeyboardInterrupt


Process finished with exit code 1

检查transformers版本是否最新,发现已经是最新版:

(monai) xujianxia@aa-SYS-4029GP-TRT:/data/xujianxia$ pip show transformers

Name: transformers

Version: 4.55.2

Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow

Home-page: https://github.com/huggingface/transformers

Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)

Author-email: transformers@huggingface.co

License: Apache 2.0 License

Location: /home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages

Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm

Required-by:

所以采用报错信息提供的第二种方案:pip install git+https://github.com/huggingface/transformers.git

hugging face也有老哥遇到该问题并通过直接从 GitHub 的主分支安装 transformers 库解决:
https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m/discussions/1

报错原因:模型代码还没有正式发布到 PyPI(Python 的官方包仓库)

执行pip install git+https://github.com/huggingface/transformers.git后一直卡在这里,应该是网络原因:

所以采用离线安装的方案:

先去github上下载最新的transformers仓库得到transformers-main.zip传到服务器上,并解压。

bash 复制代码
# 激活 conda 环境
conda activate monai

# 进入你解压后的文件夹(请根据你的实际路径修改)
cd /path/to/your/folder/transformers-main/

# 执行本地安装(推荐同时使用国内 PyPI 镜像加速依赖包的下载)
pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple

本地安装成功:

再次运行,又报错:

bash 复制代码
/home/xujianxia/anaconda3/bin/conda run -n monai --no-capture-output python /data/xujianxia/dinov3-main/load_model.py 
Loading checkpoint shards: 100%|██████████████████| 6/6 [00:00<00:00, 61.22it/s]
Device set to use cuda:0
Traceback (most recent call last):
  File "/data/xujianxia/dinov3-main/load_model.py", line 13, in <module>
    feature_extractor = pipeline(
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 1210, in pipeline
    return pipeline_class(model=model, framework=framework, task=task, **kwargs)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1043, in __init__
    self.model.to(self.device)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4374, in to
    return super().to(*args, **kwargs)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/home/xujianxia/anaconda3/envs/monai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB. GPU 
ERROR conda.cli.main_run:execute(127): `conda run python /data/xujianxia/dinov3-main/load_model.py` failed. (See above for error)

Process finished with exit code 1

因为我这里用的是dinov3-vit7b16-pretrain-lvd1689m(6,716M),源码中用的是dinov3-convnext-tiny-pretrain-lvd1689m(29M )。以下是作者提供的所有预训练模型list:

ViT models pretrained on web dataset (LVD-1689M):

Model Parameters Pretraining Dataset Download
ViT-S/16 distilled 21M LVD-1689M [link]
ViT-S+/16 distilled 29M LVD-1689M [link]
ViT-B/16 distilled 86M LVD-1689M [link]
ViT-L/16 distilled 300M LVD-1689M [link]
ViT-H+/16 distilled 840M LVD-1689M [link]
ViT-7B/16 6,716M LVD-1689M [link]

ConvNeXt models pretrained on web dataset (LVD-1689M):

Model Parameters Pretraining Dataset Download
ConvNeXt Tiny 29M LVD-1689M [link]
ConvNeXt Small 50M LVD-1689M [link]
ConvNeXt Base 89M LVD-1689M [link]
ConvNeXt Large 198M LVD-1689M [link]

ViT models pretrained on satellite dataset (SAT-493M):

Model Parameters Pretraining Dataset Download
ViT-L/16 distilled 300M SAT-493M [link]
ViT-7B/16 6,716M SAT-493M [link]

为了快速验证,我选择降低模型精度:

python 复制代码
feature_extractor = pipeline(
    torch_dtype=torch.float16,  # 添加这一行
    model="/data/xujianxia/dinov3-main/dinov3-vit7b16-pretrain-lvd1689m",
    task="image-feature-extraction",
)

成功将图片编码成features。

相关推荐
刘若里1 天前
【论文阅读】自适应稀疏自注意力——可直接用!
论文阅读·人工智能·笔记·深度学习·计算机视觉
数说星榆1811 天前
油气勘探数据可视化流程图制作
论文阅读·信息可视化·毕业设计·流程图·论文笔记·毕设
Matrix_111 天前
论文阅读:中央凹堆叠成像技术
论文阅读·人工智能·计算摄影
智算菩萨2 天前
【论文复现】Applied Intelligence 2025:Auto-PU正例无标签学习的自动化实现与GPT-5.4辅助编程实战
论文阅读·python·gpt·学习·自动化·复现
智算菩萨2 天前
【论文精读】通过元学习与关联规则挖掘增强人工智能在网络安全领域特征选择中的可解释性
论文阅读·人工智能·学习·web安全·论文笔记
智算菩萨2 天前
【论文精读】AI-assisted rational decision-making:AI 如何辅助理性决策?
论文阅读·人工智能·论文笔记
檐下翻书1732 天前
小型网站开发简易流程步骤 在线画图工具快速生成开发流程图方法
论文阅读·信息可视化·流程图
乌黑浓密的技术员2 天前
zotero官方中文版下载地址
论文阅读
StfinnWu2 天前
论文阅读:SIMPLE ZERO-SHOT IMAGE DEHAZING
论文阅读