gemini-pro-vision 看图说话

一、安装

复制代码
   pip install -U langchain-google-vertexai

二、设置访问权限

申请服务账号json格式key

三、完整代码

复制代码
import gradio as gr
import json
import base64
from pathlib import Path
import os
import time
import requests
from fastapi import FastAPI, UploadFile, File
from fastapi.middleware.cors import CORSMiddleware
import uvicorn
from langchain_core.messages import HumanMessage
from langchain_google_vertexai import ChatVertexAI
from langchain_core.output_parsers import StrOutputParser

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "xxx.json"
app = FastAPI()
app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

def generate(model, prompt, images_base64):
    llm = ChatVertexAI(model_name=model)
    # example
    message = HumanMessage(
        content=[
            {
                "type": "text",
                "text": prompt,
            },
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{images_base64}"}},
        ]
    )
    parser = StrOutputParser()
    result = llm.invoke([message])
    parserResult = parser.invoke(result)
    return parserResult

def respond(model, image_path, prompt, chat_history):
    print(model, image_path, prompt)
    images_base64 = [encode_image(image_path)]
    bot_message = generate(model, prompt, images_base64)
    chat_history.append((prompt, bot_message))
    time.sleep(1)
    return "", chat_history

with gr.Blocks() as demo:
    gr.Image(value='xxx.png',height=30,width=70, interactive=False, show_download_button=False, show_label=False)
    gr.HTML("""<h1 align="center">图片问答</h1>""")
    
    model = gr.Textbox(value="gemini-pro-vision",label="gemini多模态模型:")
    with gr.Row():
        with gr.Column(scale=1):
            image_path = gr.Image(label="上传图片:",type="filepath", value='Picture1.png')
        with gr.Column(scale=3):
            chatbot = gr.Chatbot()
    prompt = gr.Textbox(label="用户:",value="大童在保险行业的地位如何?使用中文回答。")
    
    clear = gr.ClearButton([prompt, chatbot])
            
    prompt.submit(respond, [model, image_path, prompt, chatbot], [prompt, chatbot])

app = gr.mount_gradio_app(app, demo, path="/")

if __name__ == '__main__':
    uvicorn.run(app='web_gemini:app', host='0.0.0.0', port=8500, workers=1)

四、运行效果

相关推荐
X56616 分钟前
CSS如何处理SSR中CSS引入_在服务端渲染时提取关键CSS
jvm·数据库·python
duke86926721440 分钟前
PostgreSQL 中高效插入多对多关联数据的三种方案对比与最佳实践
jvm·数据库·python
狮子座明仔1 小时前
AgentSPEX:当 Agent 框架开始把“控制流“从 Python 里抠出来
开发语言·python
m0_463672201 小时前
mysql数据库如何进行逻辑备份与物理备份对比_优缺点分析
jvm·数据库·python
2401_867623981 小时前
SQL如何进行分组后字符串拼接_使用GROUP_CONCAT或STRING_AGG
jvm·数据库·python
kexnjdcncnxjs1 小时前
MySQL触发器无法触发的原因分析_MySQL触发器排查指南
jvm·数据库·python
Java后端的Ai之路1 小时前
CodeBuddy-Rules配置
人工智能·python·ai编程
拾-光1 小时前
【Git】命令大全:从入门到高手,100 个最常用命令速查(2026 版)
java·大数据·人工智能·git·python·elasticsearch·设计模式
AI玫瑰助手2 小时前
Python流程控制:break与continue语句的区别与应用
开发语言·python·信息可视化
棉猴2 小时前
python海龟绘图之画布与窗口
javascript·python·html·setup·turtle·海龟绘图·screensize