Ollama REST API Model Calls in Practice

A First Try at Calling Models via the Ollama REST API

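In the previous article, we installed and deployed Ollama and covered basic model usage. Next, we will call Ollama models through the REST API, which allows more flexible integration into applications.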

Starting the Ollama Service

Before calling the REST API, make sure the Ollama service is running. You can start it with the following command:

ollama serve

By default, the Ollama service listens on http://localhost:11434.
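Before sending real requests, it can help to confirm that something is answering on that address. The following is a minimal sketch, not part of the original article: the helper name `ollama_is_running` and the timeout value are illustrative assumptions, and it uses only the Python standard library so that it runs without extra packages.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP GET on the base URL answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False


print("Ollama reachable:", ollama_is_running())
```

If this prints False, start the service with `ollama serve` and try again.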

The examples use qwen2.5:0.5b, the most lightweight of the locally deployed models, running on the CPU in this setup.

ollama run qwen2.5:0.5b

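REST API Overview

Ollama provides the following main REST API endpoints:

Text generation: POST /api/generate
Chat: POST /api/chat
Model management: POST /api/pull (pull a model), DELETE /api/delete (delete a model)
Model list: GET /api/tags (list the locally available models)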
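As a quick illustration of the model list endpoint, the sketch below reads GET /api/tags and extracts the model names. The function names `fetch_tags` and `model_names` are hypothetical, and the sample dictionary is hand-written in the shape of the response (a `models` list whose entries carry a `name`), not real server output.

```python
import json
import urllib.request


def fetch_tags(base_url="http://localhost:11434"):
    """GET /api/tags and return the parsed JSON body (needs a running server)."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def model_names(tags_response):
    """Extract the model names from a parsed /api/tags response."""
    return [m["name"] for m in tags_response.get("models", [])]


# Offline illustration with a hand-written sample response.
sample = {"models": [{"name": "qwen2.5:0.5b"}, {"name": "llava:latest"}]}
print(model_names(sample))
```

Against a live server, `model_names(fetch_tags())` would list whatever models are installed locally.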

We will focus on the text generation and chat endpoints.

API Examples

Example 1: Request (Structured outputs)

The Ollama API supports structured output: the model generates a response that conforms to a specified format, such as a JSON schema.

import requests

url = "http://192.168.1.199:11434/api/generate"

prefix = "Extract the brand and vehicle model mentioned in the text, e.g. brand xx, model xx. "
data = {
  "model": "qwen2.5-custom:latest",
  "prompt": prefix + "BYD cars have been selling very well lately, especially the Song Pro DM-i, whose sales are very high!",
  "stream": False,
  "format": {  # JSON schema for the response; it defines two output fields, brand and vehicle_type
    "type": "object",
    "properties": {
      "brand": {
        "type": "string"
      },
      "vehicle_type": {
        "type": "string"
      }
    },
    "required": [
      "brand",
      "vehicle_type"
    ]
  }
}

response = requests.post(url, json=data)

if response.status_code == 200:
    result = response.json()
    print("Response:\n", result.get('response'))
else:
    print(f"Request failed, status code: {response.status_code}")
    print("Error message:", response.text)

The response field contains a JSON string with the two fields defined in the schema, brand and vehicle_type.
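Because the structured output arrives as a JSON string inside the `response` field, the caller still has to decode it. Here is a minimal sketch of that step; the helper `parse_structured_response` and the sample reply are illustrative assumptions, not output captured from the model.

```python
import json

REQUIRED_FIELDS = ("brand", "vehicle_type")


def parse_structured_response(result):
    """Decode the JSON string in result['response'] and check the required fields."""
    parsed = json.loads(result["response"])
    missing = [field for field in REQUIRED_FIELDS if field not in parsed]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return parsed


# Hand-written sample in the shape of an /api/generate reply (not real model output).
sample_result = {"response": '{"brand": "BYD", "vehicle_type": "Song Pro DM-i"}'}
info = parse_structured_response(sample_result)
print(info["brand"], "/", info["vehicle_type"])
```

Checking the fields explicitly is a defensive choice: it catches a truncated or malformed reply early instead of failing later with a KeyError.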

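Example 2: Chat request (With History)

Send a chat message together with the conversation history. This enables multi-turn dialogue, in which the model uses the earlier context to produce a coherent reply.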

import requests

url = "http://192.168.1.199:11434/api/chat"

data = {
  "model": "qwen2.5:0.5b",
  "messages": [
    {"role": "system", "content": "You are a competent ticket agent. You may make up flight routes to offer the user. Please stay in character."},
    {"role": "user", "content": "Hello, I would like to book a plane ticket."},
    {"role": "assistant", "content": "Sure. Where would you like to go?"},
    {"role": "user", "content": "I want to go to Beijing."},
    {"role": "user", "content": "Which flights are available?"}
  ],
  "stream": False
}

response = requests.post(url, json=data)

if response.status_code == 200:
    result = response.json()
    print("Response:\n", result.get('message').get('content'))
else:
    print(f"Request failed, status code: {response.status_code}")
    print("Error message:", response.text)

The script prints the content of the assistant message returned by the model.
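The API itself is stateless, so a multi-turn conversation only works if the client resends the whole history with each request. The sketch below shows one way to grow that history between calls; `extend_history` is a hypothetical helper and the message texts are made up for illustration.

```python
def extend_history(messages, assistant_reply, next_user_message):
    """Return a new message list with the model's reply and the user's next turn appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


history = [{"role": "user", "content": "I want to go to Beijing."}]
# In a real loop, assistant_reply would come from result['message']['content'].
history = extend_history(history, "Here are some flights...", "The earliest one, please.")
print([m["role"] for m in history])
```

The updated list can then be sent as the `messages` field of the next /api/chat request.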

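Example 3: Chat request (with tools)

A chat request with tools is an enhanced form of conversation request. It lets the model ask for an external tool or API to be called while producing its reply, so that it can complete more complex tasks or return richer information. The model does not run the tool itself: it returns a tool_calls entry naming the function and its arguments, and the calling code executes it. This is typically used where real-time data, web search, or other external resources are needed.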

import requests
from requests import Response

# City name -> city code expected by the weather API
citykey_map = {
    'Beijing': '101010100',
    'Shanghai': '101020100',
    'Guangzhou': '101280101',
    'Shenzhen': '101280601',
    'Tianjin': '101030100'
}
# Field name in the weather API response -> human-readable label
weather_type_map = {
    'wendu': 'temperature',
    'shidu': 'humidity',
    'pm25': 'PM2.5',
    'quality': 'air quality'
}

def get_current_city_weather(city: str, weather_type: str):
    city_key = citykey_map.get(city)
    label = weather_type_map.get(weather_type)
    if city_key is None or label is None:
        # The model may return a city or type that is not in the maps above
        return f"Unsupported city or weather type: {city}, {weather_type}"
    print(f"Calling the custom weather function -> fetching the {label} for {city}")
    url = "http://t.weather.sojson.com/api/weather/city/" + city_key
    response_json = requests.get(url).json()
    return f"Today's {label} in {city}: {response_json['data'][weather_type]}"

prompt = """
    What is the temperature in Shanghai today?
    """
url = "http://192.168.1.199:11434/api/chat"
data = {
  "model": "qwen2.5:0.5b",
  "messages": [
    {
      "role": "user",
      "content": prompt
    }
  ],
  "stream": False,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_city_weather",
        "description": "Get the weather temperature of the city area on the day",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "The city to get the weather for, e.g. Tianjin"
            },
            "weather_type": {
              "type": "string",
              "description": "The weather_type to return the type in the weather information, e.g. 'wendu' or 'shidu'",
              "enum": ["wendu", "shidu", "pm25", "quality"]
            }
          },
          "required": ["city", "weather_type"]
        }
      }
    }
  ]
}

response = requests.post(url, json=data)
# Tool name returned by the model -> local Python function to run
function_map = {
    'get_current_city_weather': get_current_city_weather,
}

def chat_tool_calls(response: Response, function_map: dict):
    result = response.json()
    function_name = None
    arguments = {}
    try:
        tool_call = result.get('message').get('tool_calls')[0]
        function_name = tool_call.get('function').get('name')
        arguments = tool_call.get('function').get('arguments')
    except (TypeError, AttributeError, IndexError):
        # No tool call in the response; fall back to the plain text reply below
        pass
    function_tool_call = function_map.get(function_name)
    if function_tool_call:
        return function_tool_call(**arguments)
    else:
        return result.get('message').get('content')

chat_response = chat_tool_calls(response, function_map)
print("Response:\n", chat_response)

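When the model returns a tool call, the script prints the string produced by get_current_city_weather.

If the question has nothing to do with the weather, the model normally does not return the tool_calls field.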
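The example above prints the tool's return value directly. A common next step, sketched here under assumptions, is to send that value back to the model in a message with the `tool` role so that it can phrase the final answer. The helper `with_tool_result` is hypothetical, and the sample assistant message and tool output are hand-written rather than real model output.

```python
def with_tool_result(messages, assistant_message, tool_output):
    """Build the message list for a follow-up /api/chat call after running a tool."""
    return messages + [
        assistant_message,                         # the reply that contained tool_calls
        {"role": "tool", "content": tool_output},  # what the local function returned
    ]


messages = [{"role": "user", "content": "What is the temperature in Shanghai today?"}]
# Hand-written stand-in for result['message'] from the first request.
assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{"function": {"name": "get_current_city_weather",
                                 "arguments": {"city": "Shanghai", "weather_type": "wendu"}}}],
}
follow_up = with_tool_result(messages, assistant_message, "Today's temperature in Shanghai: 21")
print([m["role"] for m in follow_up])
```

Posting `follow_up` as the `messages` field of a second /api/chat request lets the model turn the raw tool output into a natural-language reply.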

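Example 4: Request (with suffix)

A request with the suffix parameter specifies the text that should follow the generated text. In other words, suffix is what comes after the model's output, and the model tries to generate content that fits between prompt and suffix.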

curl http://localhost:11434/api/generate -d '{
  "model": "codellama:code",
  "prompt": "def compute_gcd(a, b):",
  "suffix": "    return result",
  "options": { 
    "temperature": 0
  },
  "stream": false
}'

Note that some models do not support the suffix parameter.
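To make the relationship between prompt, generated text, and suffix concrete, the sketch below stitches the three pieces together and runs the result. The `generated` string is a hand-written stand-in for what a code model might return in the `response` field, not actual model output, and `assemble_completion` is a hypothetical helper.

```python
def assemble_completion(prompt, generated, suffix):
    """Join the prompt, the model's fill-in text, and the suffix into one source string."""
    return prompt + generated + suffix


prompt = "def compute_gcd(a, b):"
suffix = "    return result"
# Hand-written stand-in for the text the model would generate between prompt and suffix.
generated = "\n    while b:\n        a, b = b, a % b\n    result = a\n"

source = assemble_completion(prompt, generated, suffix)
namespace = {}
exec(source, namespace)  # define compute_gcd from the assembled source
print(namespace["compute_gcd"](12, 18))  # prints 6
```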

Example 5: Request (with images)

To submit images to a multimodal model such as llava or bakllava, provide a list of base64-encoded images in the images field:

curl http://localhost:11434/api/generate -d '{
  "model": "llava",
  "prompt":"What is in this picture?",
  "stream": false,
  "images": ["paste the base64-encoded image data here"]
}'
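Producing the base64 string by hand is awkward, so here is a minimal Python sketch of the encoding step. The helper names `encode_image_bytes` and `build_image_payload` are assumptions made for illustration, and the demo uses a few placeholder bytes instead of a real image file.

```python
import base64


def encode_image_bytes(raw):
    """Base64-encode raw image bytes and return the result as a plain string."""
    return base64.b64encode(raw).decode("ascii")


def build_image_payload(image_bytes, prompt, model="llava"):
    """Build the JSON body for POST /api/generate with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "images": [encode_image_bytes(image_bytes)],
    }


# Placeholder bytes; a real call would read them with open(path, "rb").read().
payload = build_image_payload(b"abc", "What is in this picture?")
print(payload["images"][0])  # prints YWJj
```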

Example 6: Chat request (Reproducible outputs)

To get the same text output for the same question on repeated requests, set temperature to 0 and seed to a fixed value.

curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:0.5b",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "options": {
    "seed": 101,
    "temperature": 0
  }
}'
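One way to check reproducibility is to send the same payload several times and compare the replies. The sketch below does that with an injected `send` function so that it can be tried without a server; `is_reproducible` and `fake_send` are hypothetical names, and in real use `send` would post the payload to /api/chat and return the reply text.

```python
def is_reproducible(send, payload, runs=3):
    """Call send(payload) several times and report whether every reply is identical."""
    replies = [send(payload) for _ in range(runs)]
    return all(reply == replies[0] for reply in replies)


payload = {
    "model": "qwen2.5:0.5b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "options": {"seed": 101, "temperature": 0},
    "stream": False,
}


def fake_send(request_body):
    """Stand-in for a real HTTP call; always returns the same text."""
    return "Hello! How can I help you today?"


print(is_reproducible(fake_send, payload))  # prints True
```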

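Example 7: Generate request (With options)

If you want to set custom options for the model at runtime rather than in the Modelfile, use the options parameter. This example sets all available options, but you can set any of them individually and omit the ones you do not want to override.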

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "num_keep": 5,
    "seed": 42,
    "num_predict": 100,
    "top_k": 20,
    "top_p": 0.9,
    "min_p": 0.0,
    "typical_p": 0.7,
    "repeat_last_n": 33,
    "temperature": 0.8,
    "repeat_penalty": 1.2,
    "presence_penalty": 1.5,
    "frequency_penalty": 1.0,
    "mirostat": 1,
    "mirostat_tau": 0.8,
    "mirostat_eta": 0.6,
    "penalize_newline": true,
    "stop": ["\n", "user:"],
    "numa": false,
    "num_ctx": 1024,
    "num_batch": 2,
    "num_gpu": 1,
    "main_gpu": 0,
    "low_vram": false,
    "vocab_only": false,
    "use_mmap": true,
    "use_mlock": false,
    "num_thread": 8
  }
}'
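In practice most requests override only one or two options. The following sketch builds a request body that includes just the options the caller passes; `build_generate_payload` is a hypothetical helper, and the option names are taken from the example above.

```python
def build_generate_payload(model, prompt, **options):
    """Build a /api/generate body, including only the options that were actually set."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    overrides = {name: value for name, value in options.items() if value is not None}
    if overrides:
        payload["options"] = overrides
    return payload


body = build_generate_payload("llama3.2", "Why is the sky blue?", temperature=0.8, seed=42, top_k=None)
print(body["options"])
```

Options left out of the request keep whatever value the Modelfile or the server defaults provide.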

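Notes

Model selection:
- Make sure the required model is installed locally; otherwise, pull it first with POST /api/pull.
Performance:
- For long generations, enable stream mode so that output arrives incrementally, which shortens the wait before the first text appears.
Security:
- In production, put authentication and access control in front of the Ollama service.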
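With streaming enabled, the generate endpoint returns a sequence of JSON objects, one per line, each carrying a fragment of text in `response` and a `done` flag on the last one. The sketch below shows how a client might reassemble them; `join_stream` is a hypothetical helper and the sample lines are hand-written, not captured from a server.

```python
import json


def join_stream(lines):
    """Concatenate the 'response' fragments from newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


# Hand-written sample chunks in the streaming format.
sample_lines = [
    '{"response": "The sky ", "done": false}',
    '{"response": "is blue.", "done": false}',
    '{"response": "", "done": true}',
]
print(join_stream(sample_lines))  # prints The sky is blue.
```

With the requests library used earlier, the lines could come from `requests.post(url, json=data, stream=True).iter_lines()`; a real client would usually print each fragment as it arrives instead of joining them at the end.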

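Summary

With the Ollama REST API, we can easily call locally deployed large language models for text generation, chat, and other tasks. Combined with Python or another programming language, it can be integrated quickly into a wide range of applications.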