SigLIP 2 Inference Tutorial

1. Install transformers

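Install transformers from source exactly as shown below; otherwise you will get errors. I tried several other installation methods and none of them worked.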

git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .

The installation succeeded if you see the following message:

Successfully installed transformers-4.50.0.dev0
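To double-check the install from Python rather than from the pip log, a small sketch like the following can help. It reads the installed version through the standard library and tries to import `Siglip2Model`; the helper name `transformers_status` is made up for this example, and a successful import only indicates that the class is exposed, not that inference will work.

```python
from importlib import metadata


def transformers_status() -> dict:
    """Report the installed transformers version and whether SigLIP 2 classes import."""
    try:
        version = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        # transformers is not installed in this environment
        return {"installed": False, "version": None, "has_siglip2": False}

    try:
        from transformers import Siglip2Model  # noqa: F401
        has_siglip2 = True
    except (ImportError, RuntimeError):
        # the installed build does not expose the SigLIP 2 classes
        has_siglip2 = False

    return {"installed": True, "version": version, "has_siglip2": has_siglip2}


print(transformers_status())
```

If `has_siglip2` is `False`, the installed build most likely predates SigLIP 2 support, and reinstalling from source as described above is the first thing to try.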

2. Download the weights

Link:

https://huggingface.co/google/siglip2-base-patch16-224/tree/main
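Instead of downloading the files by hand from the page above, the repository can also be fetched with `huggingface_hub`. This is a minimal sketch, assuming `huggingface_hub` is installed (it is a dependency of transformers) and that the machine has network access; the function name `download_weights` and the target directory name are hypothetical.

```python
from pathlib import Path

REPO_ID = "google/siglip2-base-patch16-224"  # checkpoint used in this tutorial


def download_weights(repo_id: str = REPO_ID, local_dir: str = "siglip2-base-patch16-224") -> Path:
    """Download every file of the model repository into local_dir and return that path."""
    # imported lazily so this module loads even where huggingface_hub is missing
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))


# Usage (requires network access):
#   weights_dir = download_weights()
#   model = AutoModel.from_pretrained(weights_dir)
```

Passing the returned directory to `from_pretrained` in place of the repository id should load the local copy, which is convenient on machines that cannot reach the Hub at inference time.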

3. Inference code

Method 1:

import torch
from PIL import Image
from transformers import pipeline

dtype = torch.float32
device = "cuda" if torch.cuda.is_available() else "cpu"

checkpoint = "google/siglip2-base-patch16-224"

# load the zero-shot image classification pipeline on the selected device
image_classifier = pipeline(
    task="zero-shot-image-classification",
    model=checkpoint,
    device=device,
    torch_dtype=dtype,
)

# local copy of the test image
path = "000000039769.jpg"
image = Image.open(path)

candidate_labels = ["2 cats", "a plane", "a remote"]
outputs = image_classifier(image, candidate_labels=candidate_labels)

# round the scores for readability
outputs = [{"score": round(output["score"], 4), "label": output["label"]} for output in outputs]
print(outputs)

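Inference result: the script prints a list of dictionaries, one per candidate label, each with a rounded "score" and its "label".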
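Since the pipeline output is a plain list of dictionaries, picking the best label takes a few lines. The sketch below uses a hypothetical output list (the scores are made up, not real model output) and a hypothetical `threshold` parameter for rejecting an image when no label scores high enough.

```python
# hypothetical pipeline output; the scores are invented for illustration
outputs = [
    {"score": 0.12, "label": "2 cats"},
    {"score": 0.03, "label": "a remote"},
    {"score": 0.0, "label": "a plane"},
]


def top_label(results, threshold=0.0):
    """Return the highest-scoring label, or None if its score is below threshold."""
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= threshold else None


print(top_label(outputs))                 # prints: 2 cats
print(top_label(outputs, threshold=0.5))  # prints: None
```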

Method 2:

from PIL import Image
import requests
from transformers import AutoProcessor, AutoModel
import torch

model = AutoModel.from_pretrained("google/siglip2-base-patch16-224")
processor = AutoProcessor.from_pretrained("google/siglip2-base-patch16-224")

#url = "http://images.cocodataset.org/val2017/000000039769.jpg"
#image = Image.open(requests.get(url, stream=True).raw)

path ="000000039769.jpg"
image = Image.open(path)

candidate_labels = ["2 cats", "2 dogs"]
# follows the pipeline prompt template to get same results
texts = [f"This is a photo of {label}." for label in candidate_labels]

# IMPORTANT: we pass `padding=max_length` and `max_length=64` since the model was trained with this
inputs = processor(text=texts, images=image, padding="max_length", max_length=64, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

logits_per_image = outputs.logits_per_image
probs = torch.sigmoid(logits_per_image) # these are the probabilities
print(f"{probs[0][0]:.1%} that image 0 is '{candidate_labels[0]}'")

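Inference result: the script prints the sigmoid probability that the image matches the first label ('2 cats').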
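Method 2 applies `torch.sigmoid` to the logits rather than a softmax, so each label gets its own independent probability and the values need not sum to 1. The pure-Python sketch below contrasts the two on hypothetical logits (the numbers are made up and are not output from the model).

```python
import math


def sigmoid(x: float) -> float:
    """Independent probability for a single logit."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Probabilities normalized across all logits (they sum to 1)."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# hypothetical logits for one image against two labels
logits = [-8.0, -12.0]

sigmoid_probs = [sigmoid(x) for x in logits]
softmax_probs = softmax(logits)

print("sigmoid:", [f"{p:.4%}" for p in sigmoid_probs])
print("softmax:", [f"{p:.4%}" for p in softmax_probs])
```

With these logits both sigmoid probabilities are close to zero, meaning neither label fits, while softmax would still hand almost all of the probability mass to the first label. This is why small absolute percentages from Method 2 are not necessarily a sign of a bug.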

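4. Dynamic-resolution test code

Note that the `# fixres inference` part of the code below relies on a `processor` for the fixed-resolution checkpoint, which has to be defined separately from the NaFlex one: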

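import pandas as pd
import torch
from PIL import Image, ImageDraw
from transformers import AutoModel, AutoProcessor

# used below when loading the models and moving the inputs
dtype = torch.float32
device = "cuda" if torch.cuda.is_available() else "cpu"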


# first, create an image with a circle and define labels
def create_image(width, height):
    image = Image.new("RGB", (width, height), color="red")
    draw = ImageDraw.Draw(image)
    center_x = image.width // 2
    center_y = image.height // 2
    radius = min(center_x, center_y) // 8 * 7
    draw.ellipse(
        (center_x - radius, center_y - radius, center_x + radius, center_y + radius),
        fill="blue",
        outline="green",
        width=image.width // 20,
    )
    return image

labels = [
    "a circle",
    "an ellipse",
    "a square",
    "a rectangle",
    "a triangle",
]
text = [f"A photo of {label}." for label in labels]
print(text)

image_with_circle = create_image(512, 256)

# loading NaFlex model and processor
naflex_checkpoint = "google/siglip2-base-patch16-naflex"

naflex_model = AutoModel.from_pretrained(naflex_checkpoint, torch_dtype=dtype, device_map=device)
naflex_processor = AutoProcessor.from_pretrained(naflex_checkpoint)


# naflex inference
inputs = naflex_processor(text=text, images=image_with_circle, padding="max_length", max_length=64, return_tensors="pt")
inputs = inputs.to(device)

with torch.inference_mode():
    naflex_outputs = naflex_model(**inputs)

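# fixres inference
# the fixed-resolution model and processor are not defined elsewhere in this snippet;
# the checkpoint from Method 2 is assumed here
model = AutoModel.from_pretrained("google/siglip2-base-patch16-224", torch_dtype=dtype, device_map=device)
processor = AutoProcessor.from_pretrained("google/siglip2-base-patch16-224")

inputs = processor(text=text, images=image_with_circle, padding="max_length", max_length=64, return_tensors="pt")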
inputs = inputs.to(device)

with torch.inference_mode():
    outputs = model(**inputs)
    
#visualize results
logits_per_text = torch.cat([naflex_outputs.logits_per_text, outputs.logits_per_text], dim=1)
probs = (logits_per_text.float().sigmoid().detach().cpu().numpy() * 100)

pd.DataFrame(probs, index=labels, columns=["naflex", "fixres"]).style.format('{:.1f}%').background_gradient('Greens', vmin=0, vmax=100)
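
The 512x256 test image with a circle is a useful probe because NaFlex is meant to keep the native aspect ratio. The arithmetic below sketches what happens on the fixed-resolution path, assuming its processor resizes every image to a 224x224 square without preserving aspect ratio (an assumption about the checkpoint's preprocessing, not something verified here). The helper name `circle_extent_after_resize` is made up for this example.

```python
def circle_extent_after_resize(width: int, height: int, target: int = 224):
    """Return the (horizontal, vertical) extent in pixels of the test circle
    after a plain resize of the whole image to target x target."""
    # same radius formula as create_image() in the test code
    radius = min(width // 2, height // 2) // 8 * 7
    diameter = 2 * radius
    # each axis is scaled independently by a square resize
    return diameter * target / width, diameter * target / height


w, h = circle_extent_after_resize(512, 256)
print(f"circle extent after resize: {w:.0f} x {h:.0f} px")  # prints: 98 x 196 px
```

Under that assumption the circle reaches the fixed-resolution model as a 98x196 ellipse, so a higher "an ellipse" score in the fixres column would be expected rather than surprising.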