Docker Model Runner Chat

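Docker Model Runner is an AI inference engine that offers a wide range of models from various providers.

Spring AI integrates with Docker Model Runner by reusing the existing OpenAI-backed ChatClient. To do this, set the base URL to localhost:12434/engines and select one of the provided LLM models.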
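
To make the role of the base URL concrete, the following JDK-only sketch builds (but does not send) the kind of OpenAI-style chat request that ends up under localhost:12434/engines. The /v1/chat/completions suffix, the http scheme, and the class and method names are assumptions for illustration and are not part of Spring AI.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ModelRunnerRequestSketch {

    // Base URL from the text above; the "http" scheme is assumed.
    static final String BASE_URL = "http://localhost:12434/engines";

    // Assumed OpenAI-compatible chat completions path.
    static final String CHAT_PATH = "/v1/chat/completions";

    // Minimal escaping: only backslashes and double quotes are handled.
    static String escapeJson(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a JSON body with a single user message.
    static String chatBody(String model, String userMessage) {
        return "{\"model\":\"" + escapeJson(model) + "\",\"messages\":[{\"role\":\"user\",\"content\":\""
                + escapeJson(userMessage) + "\"}]}";
    }

    // Builds the POST request without sending it.
    static HttpRequest chatRequest(String model, String userMessage) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + CHAT_PATH))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, userMessage)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = chatRequest("ai/gemma3:4B-F16", "Tell me a joke");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

In a Spring AI application you would not write this by hand; the OpenAI client performs the equivalent call once the base URL is configured.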

Check the DockerModelRunnerWithOpenAiChatModelIT.java tests for examples of using Docker Model Runner with Spring AI.

Prerequisite

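Download Docker Desktop for Mac 4.40.0.

Choose one of the following options to enable the Model Runner:

Option 1:

Enable Model Runner: docker desktop enable model-runner --tcp 12434.

Set the base URL to localhost:12434/engines

Option 2:

Enable Model Runner: docker desktop enable model-runner.

Use Testcontainers and set the base URL as follows: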

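// Testcontainers socat proxy that forwards a mapped port to the Model Runner host.
@Container
private static final SocatContainer socat = new SocatContainer().withTarget(80, "model-runner.docker.internal");

@Bean
public OpenAiApi chatCompletionApi() {
    // Build the base URL from the proxy's host and mapped port.
    var baseUrl = "http://%s:%d/engines".formatted(socat.getHost(), socat.getMappedPort(80));
    // Any string can be used as the API key.
    return OpenAiApi.builder().baseUrl(baseUrl).apiKey("test").build();
}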

You can learn more about Docker Model Runner by reading the blog post "Run LLMs Locally with Docker".

Auto-configuration

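The artifact IDs of the Spring AI starter modules have been renamed since version 1.0.0.M7. Dependency names should now follow the updated naming patterns for models, vector stores, and MCP starters. Refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the OpenAI Chat Client. To enable it, add the following dependency to your project's Maven pom.xml file: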

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

Or add the following to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-openai'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Chat Properties

Retry Properties

The prefix spring.ai.retry is used as the property prefix that lets you configure the retry mechanism for the OpenAI chat model.

| Property | Description | Default |
|------------------------------------------|----------------------------------------------------------------------------------------------------|--------|
| spring.ai.retry.max-attempts | Maximum number of retry attempts. | 10 |
| spring.ai.retry.backoff.initial-interval | Initial sleep duration for the exponential backoff policy. | 2 sec. |
| spring.ai.retry.backoff.multiplier | Backoff interval multiplier. | 5 |
| spring.ai.retry.backoff.max-interval | Maximum backoff duration. | 3 min. |
| spring.ai.retry.on-client-errors | If false, throw a NonTransientAiException and do not attempt a retry for 4xx client error codes. | false |
| spring.ai.retry.exclude-on-http-codes | List of HTTP status codes that should not trigger a retry (e.g. to throw NonTransientAiException). | empty |
| spring.ai.retry.on-http-codes | List of HTTP status codes that should trigger a retry (e.g. to throw TransientAiException). | empty |
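
To illustrate how the backoff settings in the table interact, here is a small JDK-only sketch of the arithmetic behind an exponential backoff with a cap. It is not the retry implementation Spring AI uses; the class and method names are hypothetical, and the formula (initial interval times multiplier to the power of the retry index, capped at the maximum) is the usual textbook form, assumed here.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class BackoffSketch {

    // Delay before retry number `retry` (1-based):
    // initial * multiplier^(retry - 1), capped at max.
    static Duration delayBeforeRetry(int retry, Duration initial, double multiplier, Duration max) {
        double millis = initial.toMillis() * Math.pow(multiplier, retry - 1);
        return millis >= max.toMillis() ? max : Duration.ofMillis((long) millis);
    }

    // Collects the delays for the first `retries` retries.
    static List<Duration> schedule(int retries, Duration initial, double multiplier, Duration max) {
        List<Duration> delays = new ArrayList<>();
        for (int retry = 1; retry <= retries; retry++) {
            delays.add(delayBeforeRetry(retry, initial, multiplier, max));
        }
        return delays;
    }

    public static void main(String[] args) {
        // Defaults from the table: 2 sec. initial interval, multiplier 5, 3 min. cap.
        System.out.println(schedule(5, Duration.ofSeconds(2), 5, Duration.ofMinutes(3)));
    }
}
```

With the default values, the delays grow as 2 s, 10 s and 50 s; the next step would be 250 s, so it and all later delays are capped at 3 min.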

Connection Properties

The prefix spring.ai.openai is used as the property prefix that lets you configure the connection to the OpenAI-compatible endpoint.

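| Property | Description | Default |
|---------------------------|----------------------------------------------------------------------------------------------------------------|---|
| spring.ai.openai.base-url | The URL to connect to. For Docker Model Runner, set it to the Model Runner endpoint, e.g. http://localhost:12434/engines. Available models are listed at hub.docker.com/u/ai. | - |
| spring.ai.openai.api-key | Any string. | - |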

Configuration Properties

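Enabling and disabling of the chat auto-configuration is now done via top-level properties with the prefix spring.ai.model.chat.

To enable, spring.ai.model.chat=openai (it is enabled by default)

To disable, spring.ai.model.chat=none (or any value that does not match openai)

This change allows multiple models to be configured in an application.

The prefix spring.ai.openai.chat is the property prefix that lets you configure the chat model implementation for OpenAI.

All properties prefixed with spring.ai.openai.chat.options can be overridden at runtime by adding request-specific Runtime Options to the Prompt call.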

Runtime Options

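OpenAiChatOptions.java provides model configuration, such as the model to use, the temperature, the frequency penalty, and so on.

On start-up, the default options can be configured with the OpenAiChatModel(api, options) constructor or the spring.ai.openai.chat.options.* properties.

At run-time, you can override the default options by adding new, request-specific options to the Prompt call. For example, to override the default model for a specific request: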

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OpenAiChatOptions.builder()
            .model("ai/gemma3:4B-F16")
            .build()
    ));

In addition to the model-specific OpenAiChatOptions, you can use a portable ChatOptions instance, created with ChatOptions#builder().
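
The override behavior described in this section can be pictured as a simple merge in which request-level values win and unset values fall back to the start-up defaults. The sketch below uses a hypothetical Options record rather than the real OpenAiChatOptions class, and the temperature values are made up for illustration; it only demonstrates the idea, not Spring AI's actual merge code.

```java
public class OptionsOverrideSketch {

    // Hypothetical stand-in for chat options; null means "not set".
    record Options(String model, Double temperature) {

        // Request-level values win; unset (null) fields fall back to the defaults.
        Options mergedOver(Options defaults) {
            return new Options(
                    model != null ? model : defaults.model(),
                    temperature != null ? temperature : defaults.temperature());
        }
    }

    public static void main(String[] args) {
        // Start-up defaults (the temperature is an assumed value).
        Options defaults = new Options("ai/gemma3:4B-F16", 0.7);
        // The request sets only the temperature, so the model comes from the defaults.
        Options perRequest = new Options(null, 0.2);
        System.out.println(perRequest.mergedOver(defaults));
    }
}
```

Here the merged options keep the default model and take the request's temperature.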

Function Calling

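Docker Model Runner supports tool/function calling when you select a model that supports it.

You can register custom Java functions with the ChatModel and have the provided model intelligently choose to output a JSON object containing the arguments for calling one or more of the registered functions. This is a powerful technique for connecting LLM capabilities with external tools and APIs.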
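
Conceptually, the application keeps a registry of named functions, and when the model emits a function name plus arguments, the matching function is looked up and invoked. The following JDK-only sketch shows that dispatch step in isolation. The class, its methods, and the map-based argument format are hypothetical simplifications and do not reflect Spring AI's internal classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ToolDispatchSketch {

    // Registered functions, keyed by the name the model is told about.
    private final Map<String, Function<Map<String, String>, String>> registry = new LinkedHashMap<>();

    void register(String name, Function<Map<String, String>, String> function) {
        registry.put(name, function);
    }

    // Runs the function the model asked for, using the arguments it supplied.
    String dispatch(String name, Map<String, String> arguments) {
        Function<Map<String, String>, String> function = registry.get(name);
        if (function == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        return function.apply(arguments);
    }

    public static void main(String[] args) {
        ToolDispatchSketch tools = new ToolDispatchSketch();
        tools.register("weatherFunction", a -> "20 C in " + a.get("location"));
        // Pretend the model returned: {"name":"weatherFunction","arguments":{"location":"Amsterdam"}}
        System.out.println(tools.dispatch("weatherFunction", Map.of("location", "Amsterdam")));
    }
}
```

In a real application the framework parses the model's JSON output and sends the function result back to the model, which then produces the final answer.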

Tool Example

Here is a simple example of how to use Docker Model Runner function calling with Spring AI:

spring.ai.openai.api-key=test
spring.ai.openai.base-url=http://localhost:12434/engines
spring.ai.openai.chat.options.model=ai/gemma3:4B-F16

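import java.util.function.Function;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Description;

@SpringBootApplication
public class DockerModelRunnerLlmApplication {

    public static void main(String[] args) {
        SpringApplication.run(DockerModelRunnerLlmApplication.class, args);
    }

    @Bean
    CommandLineRunner runner(ChatClient.Builder chatClientBuilder) {
        return args -> {
            var chatClient = chatClientBuilder.build();

            var response = chatClient.prompt()
                .user("What is the weather in Amsterdam and Paris?")
                // Reference by bean name. Depending on the Spring AI version,
                // this method may be named toolNames(...) instead.
                .functions("weatherFunction")
                .call()
                .content();

            System.out.println(response);
        };
    }

    // The description tells the model what the function is for.
    @Bean
    @Description("Get the weather in location")
    public Function<MockWeatherService.WeatherRequest, MockWeatherService.WeatherResponse> weatherFunction() {
        return new MockWeatherService();
    }

    public static class MockWeatherService implements Function<MockWeatherService.WeatherRequest, MockWeatherService.WeatherResponse> {

        public record WeatherRequest(String location, String unit) {}
        public record WeatherResponse(double temp, String unit) {}

        // Returns a fixed temperature per city instead of calling a real weather service.
        @Override
        public WeatherResponse apply(WeatherRequest request) {
            double temperature = request.location().contains("Amsterdam") ? 20 : 25;
            return new WeatherResponse(temperature, request.unit());
        }
    }
}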

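In this example, when the model needs weather information, it automatically calls the weatherFunction bean, which can then fetch real-time weather data. The expected response is: "The weather in Amsterdam is currently 20 degrees Celsius, and the weather in Paris is currently 25 degrees Celsius."

Read more about OpenAI Function Calling.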

Sample Controller

Create a new Spring Boot project and add spring-ai-starter-model-openai to your pom (or gradle) dependencies.

Add an application.properties file under the src/main/resources directory to enable and configure the OpenAI chat model:

spring.ai.openai.api-key=test
spring.ai.openai.base-url=http://localhost:12434/engines
spring.ai.openai.chat.options.model=ai/gemma3:4B-F16

# Docker Model Runner doesn't support embeddings, so we need to disable them.
spring.ai.openai.embedding.enabled=false

Here is an example of a simple @RestController class that uses the chat model for text generation.

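import java.util.Map;

import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class ChatController {

    private final OpenAiChatModel chatModel;

    @Autowired
    public ChatController(OpenAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    // Returns the complete generation as a single JSON object.
    @GetMapping("/ai/generate")
    public Map<String, String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }

    // Streams the response as it is generated.
    @GetMapping("/ai/generateStream")
    public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return this.chatModel.stream(prompt);
    }
}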
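
To try the /ai/generate endpoint from a client, the message must be URL-encoded into the query string. The JDK-only sketch below builds such a URL; the class and method names are hypothetical, and http://localhost:8080 is an assumption based on Spring Boot's default port.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GenerateUrlSketch {

    // Builds the GET URL for the /ai/generate endpoint of the controller above.
    static URI generateUri(String baseUrl, String message) {
        // URLEncoder applies form encoding, so spaces become '+'.
        String encoded = URLEncoder.encode(message, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/ai/generate?message=" + encoded);
    }

    public static void main(String[] args) {
        // http://localhost:8080 is an assumption (Spring Boot's default port).
        System.out.println(generateUri("http://localhost:8080", "Tell me a joke"));
    }
}
```

The resulting URI can be passed to any HTTP client, for example java.net.http.HttpClient, once the application is running.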