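Recently I needed Grounding DINO for annotation work. Grounding DINO detects objects from text prompts and is billed as able to "detect anything", somewhat like the Segment Anything Model (SAM), which aims to segment anything. During deployment, however, I found that Hugging Face is not reachable from mainland China, so the required resources have to be downloaded and deployed locally.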
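1. Grounding DINO links
2. Preparing resources
groundingdino weights
Just download the weights above; they go into the project's weights folder later.
Download the files listed above as well.
3. Configuring the environment
3.1 Setting the environment variable
Temporary setting: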
export CUDA_HOME=/path/to/cuda-11.3
Permanent setting:
echo 'export CUDA_HOME=/path/to/cuda' >> ~/.bashrc
source ~/.bashrc
To check whether the setting took effect:
echo $CUDA_HOME
If this prints a path, the setting worked; if it prints only an empty line, the setting failed.
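The echo check only shows that the variable is non-empty. The following is a minimal sketch of a slightly stricter check in Python, assuming a standard CUDA toolkit layout where nvcc sits under bin inside the CUDA directory. The function name check_cuda_home is made up for this example.

```python
import os


def check_cuda_home(environ=os.environ):
    """Return (ok, message) describing whether CUDA_HOME looks usable."""
    cuda_home = environ.get("CUDA_HOME", "")
    if not cuda_home:
        return False, "CUDA_HOME is not set"
    if not os.path.isdir(cuda_home):
        return False, f"CUDA_HOME points to a missing directory: {cuda_home}"
    # Assumption: nvcc lives in <CUDA_HOME>/bin, as in a standard toolkit install.
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        return False, f"nvcc not found at {nvcc}"
    return True, f"CUDA_HOME looks valid: {cuda_home}"


if __name__ == "__main__":
    ok, message = check_cuda_home()
    print(message)
```

Running this before the pip install step in 3.3 can catch a mistyped path early, since a wrong CUDA_HOME otherwise tends to surface only as a build or import error later.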
3.2 Downloading the code
git clone https://github.com/IDEA-Research/GroundingDINO.git
3.3 Setting up the virtual environment
conda create -n grounding_dino python=3.8.16
conda activate grounding_dino
cd GroundingDINO/
pip install -e .
3.4 Project layout
In the project root, create two folders, weights and bert-base-uncased, and put the files downloaded above into the corresponding folder.
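A quick way to confirm the layout before running anything is a small script like the sketch below. The file names are assumptions: the checkpoint name is the one used by the test script in section 4, and config.json and vocab.txt are typical files in a bert-base-uncased download. Your download will also contain a model weights file, whose name can vary. The helper find_missing_files is hypothetical.

```python
import os

# Assumed file names; adjust to match what you actually downloaded.
EXPECTED_FILES = {
    "weights": ["groundingdino_swint_ogc.pth"],
    "bert-base-uncased": ["config.json", "vocab.txt"],
}


def find_missing_files(project_root, expected=EXPECTED_FILES):
    """Return relative paths of expected files that are absent under project_root."""
    missing = []
    for folder, names in expected.items():
        for name in names:
            path = os.path.join(project_root, folder, name)
            if not os.path.isfile(path):
                missing.append(os.path.join(folder, name))
    return missing


if __name__ == "__main__":
    # Run from the project root; an empty list means every expected file is present.
    print(find_missing_files(os.getcwd()))
```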
3.5 Modifying the code
GroundingDINO-main/groundingdino/util/get_tokenlizer.py
python
from transformers import AutoTokenizer, BertModel, BertTokenizer, RobertaModel, RobertaTokenizerFast
import os


def get_tokenlizer(text_encoder_type):
    # import ipdb;ipdb.set_trace();
    if not isinstance(text_encoder_type, str):
        # print("text_encoder_type is not a str")
        if hasattr(text_encoder_type, "text_encoder_type"):
            text_encoder_type = text_encoder_type.text_encoder_type
        elif text_encoder_type.get("text_encoder_type", False):
            text_encoder_type = text_encoder_type.get("text_encoder_type")
        elif os.path.isdir(text_encoder_type) and os.path.exists(text_encoder_type):
            pass
        else:
            raise ValueError(
                "Unknown type of text_encoder_type: {}".format(type(text_encoder_type))
            )
    print("final text_encoder_type: {}".format(text_encoder_type))

    # Newly added snippet: load the tokenizer from the local folder.
    # An absolute path is needed here; a relative path may raise an error.
    tokenizer_path = "/home/zhangh/GroundingDINO-main/GroundingDINO-main/bert-base-uncased"
    tokenizer = BertTokenizer.from_pretrained(tokenizer_path, use_fast=False)
    return tokenizer
    '''
    tokenizer = AutoTokenizer.from_pretrained(text_encoder_type)
    return tokenizer
    '''


def get_pretrained_language_model(text_encoder_type):
    # import ipdb;ipdb.set_trace();
    if text_encoder_type == "bert-base-uncased" or (os.path.isdir(text_encoder_type) and os.path.exists(text_encoder_type)):
        # Newly added snippet: load BERT from the local folder.
        model_path = "/home/zhangh/GroundingDINO-main/GroundingDINO-main/bert-base-uncased"
        return BertModel.from_pretrained(model_path)
        # return BertModel.from_pretrained(text_encoder_type)
    if text_encoder_type == "roberta-base":
        return RobertaModel.from_pretrained(text_encoder_type)

    raise ValueError("Unknown text_encoder_type {}".format(text_encoder_type))
Just change the folder paths in the code to your own paths.
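If you would rather not hard-code a machine-specific absolute path, one option is to compute it at runtime. This is only a sketch of that idea, not what the original code does: resolve_local_bert_dir is a hypothetical helper, and LOCAL_BERT_DIR is a made-up override variable that neither GroundingDINO nor transformers reads on its own.

```python
import os


def resolve_local_bert_dir(project_root, folder_name="bert-base-uncased"):
    """Return the absolute path of the local BERT folder, or raise if it is absent."""
    # LOCAL_BERT_DIR is a hypothetical override; fall back to <project_root>/<folder_name>.
    candidate = os.environ.get("LOCAL_BERT_DIR") or os.path.join(project_root, folder_name)
    candidate = os.path.abspath(candidate)
    if not os.path.isdir(candidate):
        raise FileNotFoundError(f"Local BERT folder not found: {candidate}")
    return candidate
```

Inside get_tokenlizer.py, the project root could be derived by applying os.path.dirname three times to os.path.abspath(__file__), assuming the file sits at groundingdino/util/get_tokenlizer.py as shown above. The returned path would then replace the two hard-coded strings.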
4. Running a test
Test code
python
from groundingdino.util.inference import load_model, load_image, predict, annotate
import cv2
model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py", "weights/groundingdino_swint_ogc.pth")
IMAGE_PATH = "OIP.jpg"
TEXT_PROMPT = "chair . person . dog ."
BOX_TRESHOLD = 0.35
TEXT_TRESHOLD = 0.25
image_source, image = load_image(IMAGE_PATH)
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_TRESHOLD,
    text_threshold=TEXT_TRESHOLD
)
annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated_image.jpg", annotated_frame)
Change the image path and the text prompt in the code above, and it is ready to run.
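The prompt in the test code separates category names with " . " and ends with a period. When the label list changes often, a small helper can build that string. This is a minimal sketch: build_caption is a hypothetical name, and lowercasing the labels is my own choice rather than something the test code requires.

```python
def build_caption(labels):
    """Join category names into a 'a . b . c .' style prompt."""
    cleaned = [label.strip().lower() for label in labels if label.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty label is required")
    return " . ".join(cleaned) + " ."


TEXT_PROMPT = build_caption(["chair", "person", "dog"])
print(TEXT_PROMPT)  # chair . person . dog .
```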
The results look pretty good!
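Since the goal here is annotation, the boxes usually need to become pixel coordinates before they are written to a label file. As far as I can tell from the library's annotate helper, predict returns boxes as normalized (cx, cy, w, h) values. The sketch below converts one such box under that assumption. The function name is made up, and clamping to the image bounds is my own addition.

```python
def cxcywh_norm_to_xyxy_pixels(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    x_min = (cx - w / 2) * image_width
    y_min = (cy - h / 2) * image_height
    x_max = (cx + w / 2) * image_width
    y_max = (cy + h / 2) * image_height
    # Clamp so that boxes slightly outside the image stay within its bounds.
    x_min = min(max(x_min, 0.0), image_width)
    y_min = min(max(y_min, 0.0), image_height)
    x_max = min(max(x_max, 0.0), image_width)
    y_max = min(max(y_max, 0.0), image_height)
    return (x_min, y_min, x_max, y_max)
```

With the variables from the test code, the image size could be read as `h, w, _ = image_source.shape`, and each box passed in as `box.tolist()` while iterating over boxes.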
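Appendix
There is also the Grounding DINO API 1.5, which works well too. It is much simpler to deploy than the setup above: register an account, apply for a token (it's free), then set up a simple local environment.
The link is as follows:
This post draws on the following blog posts. Many thanks!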
[1] https://blog.csdn.net/m0_46295727/article/details/133221439?spm=1001.2014.3001.5506
[2] https://blog.csdn.net/weixin_44151034/article/details/139362032?spm=1001.2014.3001.5506