Jlama is a modern large language model (LLM) inference engine designed for the Java ecosystem. It lets developers run model inference locally, directly inside a Java application, without depending on external services.
This article combines Jlama with langchain4j to build a simple question-answering demo that runs entirely on your local machine.
Prerequisites
- JDK 21
- IntelliJ IDEA 2023.3.6 Community Edition
- Maven
- Basic Spring Boot knowledge
- Basic front-end development experience
Downloading the Model Offline
If your network connection is reliable, you can skip this step: the model can also be downloaded automatically at runtime (see the startup class below).
Download the model from Hugging Face to your local machine.
Create a local directory named tjake_Llama-3.2-1B-Instruct-JQ4, place all of the downloaded files in it, and manually create an empty .finished file, as shown in the figure below.
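If you prefer to script this step, the same layout can be created with a few lines of java.nio code. This is only a sketch; the paths mirror the modelCachePath configured in the startup class later in this article:
java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: create the model cache directory and the .finished marker file from code,
// mirroring the manual steps above. Copy the downloaded model files into this
// directory before (or after) running it.
public class PrepareModelDir {
    public static void main(String[] args) throws Exception {
        Path modelDir = Path.of("D:\\dev_llm\\jlama\\models", "tjake_Llama-3.2-1B-Instruct-JQ4");
        Files.createDirectories(modelDir);
        // The tutorial uses .finished as a marker that the model files are complete
        Path marker = modelDir.resolve(".finished");
        if (Files.notExists(marker)) {
            Files.createFile(marker);
        }
    }
}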
Sample Code
pom.xml
xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>java-ai</groupId>
    <artifactId>io.ai</artifactId>
    <version>1.0-SNAPSHOT</version>
    <name>io.ai</name>
    <url>http://www.example.com</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.release>21</maven.compiler.release>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <version>3.4.3</version>
        </dependency>
        <dependency>
            <groupId>com.github.tjake</groupId>
            <artifactId>jlama-core</artifactId>
            <version>0.8.4</version>
        </dependency>
        <dependency>
            <groupId>com.github.tjake</groupId>
            <artifactId>jlama-native</artifactId>
            <!-- supports linux-x86_64, macos-x86_64/aarch_64, windows-x86_64.
                 Use https://github.com/trustin/os-maven-plugin to detect os and arch -->
            <classifier>windows-x86_64</classifier>
            <version>0.8.4</version>
        </dependency>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j</artifactId>
            <version>1.0.0-beta2</version>
        </dependency>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-jlama</artifactId>
            <version>1.0.0-beta2</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.11.0</version>
                <configuration>
                    <source>21</source>
                    <target>21</target>
                    <encoding>UTF-8</encoding>
                    <!-- Jlama relies on preview features on JDK 21 -->
                    <compilerArgs>
                        <arg>--enable-preview</arg>
                    </compilerArgs>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <!-- No Spring Boot parent POM is used, so pin the plugin version explicitly -->
                <version>3.4.3</version>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Service Endpoint
The endpoint streams the answer over SSE (Server-Sent Events); you can also return the whole result synchronously through a ChatLanguageModel instead, as shown in the sketch after the controller code below.
java
package io.ai;

import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

import java.util.UUID;

@RestController
public class ChatController {

    @Autowired
    private StreamingChatLanguageModel model;

    @GetMapping(value = "/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public SseEmitter chat(@RequestParam("prompt") String prompt) {
        // A timeout of 0 disables the async request timeout; local inference can be slow
        SseEmitter emitter = new SseEmitter(0L);
        model.chat(prompt, new StreamingChatResponseHandler() {

            @Override
            public void onPartialResponse(String partialResponse) {
                try {
                    // Push each generated token to the client as a "message" event
                    emitter.send(SseEmitter.event()
                            .id(UUID.randomUUID().toString())
                            .name("message")
                            .data(partialResponse));
                } catch (Exception e) {
                    emitter.completeWithError(e);
                }
            }

            @Override
            public void onCompleteResponse(ChatResponse completeResponse) {
                try {
                    // Tell the client generation is finished, then close the connection.
                    // The front end closes its EventSource on this event, so the browser
                    // does not auto-reconnect and re-send the prompt.
                    emitter.send(SseEmitter.event().name("done").data(""));
                    emitter.complete();
                } catch (Exception e) {
                    emitter.completeWithError(e);
                }
            }

            @Override
            public void onError(Throwable error) {
                emitter.completeWithError(error);
            }
        });
        return emitter;
    }
}
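As noted above, streaming is optional. Below is a minimal synchronous sketch, assuming a ChatLanguageModel bean created with JlamaChatModel.builder() in the same way as the streaming bean in the startup class; the /chat-sync path and class name are illustrative, not part of the original demo:
java
package io.ai;

import dev.langchain4j.model.chat.ChatLanguageModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Sketch of the synchronous alternative: blocks until generation finishes
// and returns the whole answer in a single response.
@RestController
public class SyncChatController {

    @Autowired
    private ChatLanguageModel model;

    @GetMapping("/chat-sync")
    public String chat(@RequestParam("prompt") String prompt) {
        return model.chat(prompt);
    }
}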
Startup Class
Note: before starting, open Edit Configurations in IDEA and add the VM option --add-modules jdk.incubator.vector. Because the project is compiled with --enable-preview, the run configuration also needs --enable-preview, or the JVM will refuse to load the preview-compiled classes.
java
package io.ai;

import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.jlama.JlamaStreamingChatModel;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

import java.nio.file.Path;

@SpringBootApplication
public class App {

    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }

    @Bean
    public StreamingChatLanguageModel chatLanguageModel() {
        return JlamaStreamingChatModel.builder()
                // Directory holding the tjake_Llama-3.2-1B-Instruct-JQ4 folder prepared above;
                // if the model is missing here, it is downloaded from Hugging Face on first use
                .modelCachePath(Path.of("D:\\dev_llm\\jlama\\models"))
                .modelName("tjake/Llama-3.2-1B-Instruct-JQ4")
                .temperature(0.7f)
                .build();
    }
}
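To confirm the model loads and generates before involving the browser, a small startup smoke test can help. This is a sketch, assuming the bean above; the prompt text is arbitrary. Add the method to the App class (it also needs the StreamingChatResponseHandler and ChatResponse imports used by ChatController, plus org.springframework.boot.CommandLineRunner):
java
// Optional smoke test (sketch): prints one streamed answer to the console at
// startup, verifying that the model files load before testing via the page.
@Bean
public CommandLineRunner smokeTest(StreamingChatLanguageModel model) {
    return args -> model.chat("Introduce yourself in one sentence.", new StreamingChatResponseHandler() {
        @Override
        public void onPartialResponse(String token) {
            System.out.print(token); // tokens arrive incrementally
        }

        @Override
        public void onCompleteResponse(ChatResponse response) {
            System.out.println();
        }

        @Override
        public void onError(Throwable error) {
            error.printStackTrace();
        }
    });
}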
Front-End Page
Save the page under src/main/resources/static (for example as index.html) so Spring Boot serves it automatically at http://localhost:8080/index.html.
html
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Online Q&A Demo</title>
    <style>
        #answer {
            height: 300px;
            overflow-y: auto;
            border: 1px solid #ccc;
            margin-top: 10px;
            padding: 10px;
            white-space: pre-wrap;
        }
    </style>
</head>
<body>
<h1>Online Q&A</h1>
<input type="text" id="question" style="width:700px;" placeholder="Type your question">
<button onclick="askQuestion()">Ask</button>
<div id="answer"></div>
<script>
    function askQuestion() {
        const question = document.getElementById('question').value;
        const answerDiv = document.getElementById('answer');
        // Clear the previous answer
        answerDiv.innerHTML = '';
        // Open an EventSource connection to the SSE endpoint;
        // encode the prompt so special characters survive the query string
        const eventSource = new EventSource(`/chat?prompt=${encodeURIComponent(question)}`);
        // Append each received token to the answer area
        eventSource.onmessage = (event) => {
            answerDiv.textContent += event.data + ' ';
        };
        // The server sends a "done" event when generation finishes;
        // close the connection so the browser does not auto-reconnect
        eventSource.addEventListener('done', () => {
            eventSource.close();
        });
        // Errors
        eventSource.onerror = (err) => {
            console.error("EventSource failed:", err);
            answerDiv.innerHTML += "Error: something went wrong while talking to the server.";
            eventSource.close();
        };
    }
</script>
</body>
</html>
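The endpoint can also be tested without the page by reading the raw SSE stream. Here is a sketch using the JDK's built-in HttpClient, assuming the application is running on localhost:8080 and the prompt text is arbitrary:
java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch: consume the /chat SSE stream from the command line and print
// the data payload of each event as it arrives.
public class SseSmokeTest {
    public static void main(String[] args) throws Exception {
        String prompt = URLEncoder.encode("What is Jlama?", StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/chat?prompt=" + prompt))
                .header("Accept", "text/event-stream")
                .build();
        HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofLines())
                .body()
                .filter(line -> line.startsWith("data:"))
                .forEach(line -> System.out.print(line.substring(5)));
    }
}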
Result
The page itself is a deliberately minimal, still rough implementation; the point of the demo is the complete end-to-end flow.