文章目录
- 一、如何直接将多模态数据传输给模型
- [二、如何使用 mutimodal prompts](#二、如何使用 mutimodal prompts)
一、如何直接将多模态数据传输给模型
在这里,我们演示了如何将多模式输入直接传递给模型。对于其他的支持多模态输入的模型提供者,langchain 在类中提供了内在逻辑来转化为期待的格式。
传入图像最常用的方法是将其作为字节字符串传入。这应该适用于大多数模型集成。
python
import base64
import httpx
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
],
)
response = model.invoke([message])
print(response.content)
我们可以直接在"image_URL"类型的内容块中提供图像URL。但是注意,只有一些模型提供程序支持此功能。
python
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{"type": "image_url", "image_url": {"url": image_url}},
],
)
response = model.invoke([message])
print(response.content)
我们也可以传多个图片。
python
message = HumanMessage(
content=[
{"type": "text", "text": "are these two images the same?"},
{"type": "image_url", "image_url": {"url": image_url}},
{"type": "image_url", "image_url": {"url": image_url}},
],
)
response = model.invoke([message])
print(response.content)
二、如何使用 mutimodal prompts
在这里,我们将描述一下怎么使用 prompt templates 来为模型格式化 multimodal imputs。
python
import base64
import httpx
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
prompt = ChatPromptTemplate.from_messages(
[
("system", "Describe the image provided"),
(
"user",
[
{
"type": "image_url",
"image_url": {"url": "data:image/jpeg;base64,{image_data}"},
}
],
),
]
)
我们也可以给模型传入多个图片。
python
prompt = ChatPromptTemplate.from_messages(
[
("system", "compare the two pictures provided"),
(
"user",
[
{
"type": "image_url",
"image_url": {"url": "data:image/jpeg;base64,{image_data1}"},
},
{
"type": "image_url",
"image_url": {"url": "data:image/jpeg;base64,{image_data2}"},
},
],
),
]
)
chain = prompt | model
response = chain.invoke({"image_data1": image_data, "image_data2": image_data})
print(response.content)