Ollama [Deployment 02]: Local Deployment on Linux and Integrating Ollama with Spring Boot 2.x (ollama-linux-amd64.tgz, latest version 0.6.2)

Installation resources:

Baidu Netdisk link: pan.baidu.com/s/17qK0Nx73... Extraction code: tc61

Included files:

  • Windows OllamaStep.exe (version 0.5.7)
  • Linux ollama-linux-amd64-0.3.9.tgz
  • Linux ollama-linux-amd64-0.5.11.tgz
  • Linux ollama-linux-amd64-0.6.2.tgz
  • Chatbox-1.9.8-Step.exe (GUI client)
  • AnythingLLMDesktop-v1.7.4.exe (GUI client)

1. Local Deployment

1.1 Installing the Software

1.1.1 Script Installation

shell
curl -fsSL https://ollama.com/install.sh | sh

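The install script uses /usr/local as its default installation path. If the filesystem mounted there has little free space, a manual installation is recommended.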

1.1.2 Manual Installation

Clicking Manual install instructions opens the manual installation guide, which covers:

  • General Linux install
  • AMD GPU install
  • ARM64 install
  • Adding Ollama as a startup service (recommended)
  • Install CUDA drivers (optional)
  • Install AMD ROCm drivers (optional)
  • Customizing
  • Updating
  • Installing specific versions
  • Viewing logs
  • Uninstall

This time I do the general Linux install, as follows:

bash
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz

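The download is extremely slow: the file is 1604 MB and the estimated time was 54 hours, which is hard to put up with. A copy (version 0.3.9) is also available on the netdisk share above.

Download options:

bash
# Option 1: download in the background with curl
nohup curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz > download.log 2>&1 &
# Option 2: download in the background with wget; -c resumes an interrupted download
nohup wget -c https://ollama.com/download/ollama-linux-amd64.tgz > download.log 2>&1 &
# Latest-release URL that wget ends up following after the redirect:
# https://github.com/ollama/ollama/releases/latest/download/ollama-linux-amd64.tgz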

If the download fails it can be resumed. It is still slow, but the data already downloaded is not wasted.
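Resuming works because `wget -c` checks how many bytes are already on disk and asks the server only for the rest with an HTTP `Range` request. A minimal Java sketch of the same idea, using only the JDK's `HttpURLConnection`; it assumes the server honors `Range` and replies `206 Partial Content`, falls back to a full download when it answers `200`, and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper illustrating how a resumable download (like wget -c) works. */
class ResumableDownload {

    /** Range header value requesting everything after the bytes already saved. */
    static String rangeHeader(long bytesOnDisk) {
        return "bytes=" + bytesOnDisk + "-";
    }

    /** Appends the missing part of url to target; starts from zero if the file does not exist. */
    static void resume(String url, File target) throws IOException {
        long existing = target.exists() ? target.length() : 0L;
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setInstanceFollowRedirects(true); // follow the redirect to the GitHub release URL
        conn.setRequestProperty("Range", rangeHeader(existing));
        int code = conn.getResponseCode();
        if (code != HttpURLConnection.HTTP_PARTIAL && code != HttpURLConnection.HTTP_OK) {
            conn.disconnect();
            throw new IOException("Unexpected HTTP status " + code);
        }
        // 206: the server honored the range, so append; 200: it sent the whole file, so overwrite
        // (a fuller version would also verify the Content-Range header)
        boolean append = existing > 0 && code == HttpURLConnection.HTTP_PARTIAL;
        try (InputStream in = conn.getInputStream();
             OutputStream out = new FileOutputStream(target, append)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Called as `ResumableDownload.resume("https://ollama.com/download/ollama-linux-amd64.tgz", new File("ollama-linux-amd64.tgz"))`, a second call after an interruption continues from the current file size instead of starting over.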

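Extract and start:

bash
# Extract (places the ollama binary in /usr/bin)
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
# Start (runs in the foreground)
ollama serve
# Verify, from another terminal
ollama -v
bash
# Or start in the background, appending output to serve.log
nohup ollama serve >> serve.log 2>&1 &

# Startup log line in serve.log on a CPU-only machine
msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="156.1 GiB" available="105.4 GiB"
# Startup log line on a GPU machine
msg="inference compute" id=GPU-5892a465-7090-90e9-d072-f04f3e56380a library=cuda variant=v11 compute=7.5 driver=0.0 name="" total="14.8 GiB" available="11.9 GiB"
# nvidia-smi output while the 1.5b model is being called after startup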
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:2F:00.0 Off |                    0 |
| N/A   77C    P0    34W /  70W |   2831MiB / 15109MiB |     11%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    281537      C   ...a_v11/ollama_llama_server     1809MiB |
+-----------------------------------------------------------------------------+
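Besides `ollama -v`, the running server reports its version at `GET /api/version` as JSON such as `{"version":"0.6.2"}`. When juggling the several releases in the netdisk share, note that comparing version strings as text gets the order wrong ("0.5.11" sorts before "0.5.7"). A minimal sketch that reads and compares versions numerically; it assumes purely numeric dotted versions (no suffixes such as release-candidate tags), uses a regex instead of a JSON library to stay self-contained, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads and compares Ollama server versions. */
class OllamaVersion {

    private static final Pattern VERSION = Pattern.compile("\"version\"\\s*:\\s*\"([^\"]+)\"");

    /** Extracts "version" from the /api/version JSON body, or null if absent. */
    static String parse(String json) {
        Matcher m = VERSION.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** Compares dotted versions part by part as numbers; sign follows Comparator rules. */
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Calls GET {baseUrl}/api/version, e.g. baseUrl = "http://127.0.0.1:11434". */
    static String fetch(String baseUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(baseUrl + "/api/version").openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return parse(new String(buf.toByteArray(), StandardCharsets.UTF_8));
        } finally {
            conn.disconnect();
        }
    }
}
```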

1.2 Installing a Model

The command is the same as on Windows:

bash
ollama run deepseek-r1:1.5b

I run everything as root, so the models are stored under /root/.ollama/models.
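When `ollama serve` is started by hand like this, the models directory is `.ollama/models` under the home directory of the user running it, unless the `OLLAMA_MODELS` environment variable (listed at the end of section 1.3) points elsewhere. A small sketch of that rule, assuming exactly these two cases; the class name is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper mirroring where a manually started `ollama serve` keeps its models. */
class OllamaModelsDir {

    /**
     * @param ollamaModelsEnv value of OLLAMA_MODELS, or null/empty if unset
     * @param userHome        home directory of the user running `ollama serve`
     */
    static Path resolve(String ollamaModelsEnv, String userHome) {
        if (ollamaModelsEnv != null && !ollamaModelsEnv.trim().isEmpty()) {
            return Paths.get(ollamaModelsEnv.trim());
        }
        return Paths.get(userHome, ".ollama", "models");
    }

    public static void main(String[] args) {
        // For root without OLLAMA_MODELS set, this is /root/.ollama/models
        System.out.println(resolve(System.getenv("OLLAMA_MODELS"), System.getProperty("user.home")));
    }
}
```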

1.3 Exposing the Port

Because Ollama binds to 127.0.0.1 by default, Nginx is used here to expose the Ollama service port:

nginx
server {
  listen 11435;
  server_name localhost;

  location / {
    proxy_pass http://127.0.0.1:11434;
    # Same Host value as the Nginx example in Ollama's FAQ, so Ollama treats the request as local
    proxy_set_header Host localhost:11434;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }
}
  • Mind the firewall: port 11435 must be open to the clients.
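A quick way to check the mapping from another machine is to request the root path through port 11435: Ollama answers `GET /` with the plain text `Ollama is running`. A minimal sketch using the JDK's `HttpURLConnection`; the class name is hypothetical, and the address in the usage line reuses `192.168.0.1` from the YAML in section 2.1 as a placeholder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical reachability check for Ollama behind the Nginx proxy. */
class OllamaPing {

    /** True if the body matches Ollama's root-path banner. */
    static boolean isOllamaBanner(String body) {
        return body != null && body.trim().equals("Ollama is running");
    }

    /** GET {baseUrl}/ and check the banner, e.g. baseUrl = "http://192.168.0.1:11435". */
    static boolean ping(String baseUrl) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(baseUrl + "/").openConnection();
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(3000);
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                return false; // e.g. 502 from Nginx when Ollama itself is not running
            }
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                return isOllamaBanner(r.readLine());
            }
        } catch (IOException e) {
            return false; // refused or timed out: check the firewall and the Nginx listener
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

`OllamaPing.ping("http://192.168.0.1:11435")` returns false on a refused connection, a timeout, or any non-200 status.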

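Alternatively, set Ollama's environment variable OLLAMA_HOST=0.0.0.0 so that it listens on all available network interfaces, which allows access from outside the machine:

bash
export OLLAMA_HOST=0.0.0.0:11434
nohup ollama serve >> serve.log 2>&1 &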
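`0.0.0.0` is only a listen address meaning "all interfaces"; clients still have to connect to a concrete address of the server, such as the `http://192.168.0.1:11434` used as `baseUrl` in section 2.1. The sketch below derives a client base URL from an `OLLAMA_HOST`-style value. It is a simplified, IPv4-only parser written for illustration (host or host:port, with the port defaulting to 11434 as in the help text), not a copy of Ollama's own parsing, and the names are hypothetical.

```java
/** Hypothetical helper: turns an OLLAMA_HOST-style listen address into a client base URL. */
class OllamaHostUrl {

    static final int DEFAULT_PORT = 11434;

    /**
     * @param ollamaHost e.g. "0.0.0.0:11434" or "0.0.0.0"; null/empty means 127.0.0.1:11434
     * @param serverIp   address clients use to reach the machine, used for a wildcard bind
     */
    static String clientBaseUrl(String ollamaHost, String serverIp) {
        String host = "127.0.0.1";
        int port = DEFAULT_PORT;
        if (ollamaHost != null && !ollamaHost.trim().isEmpty()) {
            String value = ollamaHost.trim();
            int colon = value.lastIndexOf(':');
            if (colon >= 0) {
                host = value.substring(0, colon);
                port = Integer.parseInt(value.substring(colon + 1));
            } else {
                host = value;
            }
        }
        // A wildcard bind address is not something a client can connect to
        if (host.isEmpty() || host.equals("0.0.0.0")) {
            host = serverIp;
        }
        return "http://" + host + ":" + port;
    }
}
```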

Ollama's environment variables (output of ollama serve --help):

bash
Usage:
  ollama serve [flags]

Aliases:
  serve, start

Flags:
  -h, --help   help for serve

Environment Variables:
      OLLAMA_DEBUG               Show additional debug information (e.g. OLLAMA_DEBUG=1)
      OLLAMA_HOST                IP Address for the ollama server (default 127.0.0.1:11434)
      OLLAMA_KEEP_ALIVE          The duration that models stay loaded in memory (default "5m")
      OLLAMA_MAX_LOADED_MODELS   Maximum number of loaded models per GPU
      OLLAMA_MAX_QUEUE           Maximum number of queued requests
      OLLAMA_MODELS              The path to the models directory
      OLLAMA_NUM_PARALLEL        Maximum number of parallel requests
      OLLAMA_NOPRUNE             Do not prune model blobs on startup
      OLLAMA_ORIGINS             A comma separated list of allowed origins
      OLLAMA_SCHED_SPREAD        Always schedule model across all GPUs
      OLLAMA_TMPDIR              Location for temporary files
      OLLAMA_FLASH_ATTENTION     Enabled flash attention
      OLLAMA_LLM_LIBRARY         Set LLM library to bypass autodetection
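These variables are read by `ollama serve` when it starts, so they must be in the server process's environment, as the `export` before `nohup` does. From Java the same launch can be expressed with `ProcessBuilder`, appending output to serve.log like `>> serve.log 2>&1`. A sketch assuming `ollama` is on the `PATH` (the tar extraction places it in /usr/bin); the class name and the values chosen, including `/data/ollama/models` and a 30m keep-alive, are hypothetical examples.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;

/** Hypothetical launcher: starts `ollama serve` with chosen environment variables. */
class OllamaLauncher {

    /** Builds (but does not start) the process; output is appended to logFile. */
    static ProcessBuilder build(File logFile) {
        ProcessBuilder pb = new ProcessBuilder("ollama", "serve");
        Map<String, String> env = pb.environment();
        env.put("OLLAMA_HOST", "0.0.0.0:11434");         // listen on all interfaces
        env.put("OLLAMA_MODELS", "/data/ollama/models"); // hypothetical models path
        env.put("OLLAMA_KEEP_ALIVE", "30m");             // keep models loaded longer than the 5m default
        pb.redirectErrorStream(true);                                  // 2>&1
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));  // >> serve.log
        return pb;
    }

    static Process start(File logFile) throws IOException {
        return build(logFile).start();
    }
}
```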

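2. Spring Boot Integration

References: the official Spring AI documentation and Ollama's API documentation.

Spring AI supports Spring Boot 3.2.x and 3.3.x.

Its minimum JDK requirement is 17.

Because this project still runs Spring Boot 2.x on JDK 8, I wrap the Ollama HTTP API myself here.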

2.1 Configuration and Configuration Class

yaml
spring:
  ai:
    ollama:
      baseUrl: http://192.168.0.1:11434
      temperature: 0.8
      maxTokens: 4096
      stream: false
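java
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Data
@Component
@ConfigurationProperties(prefix = "spring.ai.ollama")
public class OllamaConfig {
    /**
     * Base URL of the Ollama server
     */
    public String baseUrl;
    /**
     * Temperature, range [0-2]: lower is more rigorous, higher is more imaginative
     */
    public double temperature = 0.8;
    /**
     * Maximum number of tokens
     */
    public int maxTokens = 4096;
    /**
     * Whether to stream the output
     */
    public boolean stream = false;
}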
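As written, `temperature` and `maxTokens` are never sent by `OllamaUtil`, so they have no effect yet. In Ollama's API such sampling parameters go in an `options` object of the `/api/generate` or `/api/chat` request body, where the token limit is named `num_predict`. A minimal sketch of the mapping that builds the JSON by hand to stay dependency-free; with fastjson one would more likely add a hypothetical `Map<String, Object> options` field to the request classes, and the class name here is hypothetical as well.

```java
/** Hypothetical mapping from OllamaConfig-style settings to Ollama's "options" object. */
class OllamaOptions {

    /**
     * @param temperature sampling temperature, e.g. 0.8
     * @param maxTokens   upper bound on generated tokens; Ollama calls this num_predict
     */
    static String toJson(double temperature, int maxTokens) {
        // String concatenation of a double uses Double.toString, which is locale-independent
        return "{\"temperature\":" + temperature + ",\"num_predict\":" + maxTokens + "}";
    }
}
```

A generate request would then carry a body such as `{"model":"deepseek-r1:1.5b","prompt":"...","stream":false,"options":{"temperature":0.8,"num_predict":4096}}`.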

2.2 Request and Response Objects

Shared message class:

java
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Message {

    @ApiModelProperty("Role")
    private String role;

    @ApiModelProperty("Content")
    private String content;

}
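`/api/chat` keeps no conversation state on the server: multi-turn chat works by resending the whole history each time, with `role` set to `system`, `user` or `assistant` (the API also defines `tool`). A minimal sketch of that client-side bookkeeping; `Msg` is a stand-in for the Lombok `Message` class so the block compiles on its own, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical client-side chat memory for Ollama's stateless /api/chat endpoint. */
class ChatHistory {

    /** Stand-in for the Message class: role + content. */
    static final class Msg {
        final String role;
        final String content;

        Msg(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    private final List<Msg> messages = new ArrayList<>();

    ChatHistory(String systemPrompt) {
        if (systemPrompt != null) {
            messages.add(new Msg("system", systemPrompt));
        }
    }

    /** Adds the user's turn and returns the full history to send as ChatReq.messages. */
    List<Msg> ask(String userText) {
        messages.add(new Msg("user", userText));
        return Collections.unmodifiableList(messages);
    }

    /** Records the model's reply (ChatRes.message.content) so the next turn includes it. */
    void remember(String assistantText) {
        messages.add(new Msg("assistant", assistantText));
    }
}
```

Since the history grows with every turn, long conversations eventually need trimming to fit the model's context window.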

Request objects; the two are very similar:

java
@Data
@NoArgsConstructor
@AllArgsConstructor
public class GenerateReq {

    @ApiModelProperty("Model")
    private String model;

    @ApiModelProperty("Prompt")
    private String prompt;

    @ApiModelProperty("Whether to stream the output")
    private boolean stream = false;

}

@Data
@NoArgsConstructor
@AllArgsConstructor
public class ChatReq {

    @ApiModelProperty("Model")
    private String model;

    @ApiModelProperty("Chat messages")
    private List<Message> messages;

    @ApiModelProperty("Whether to stream the output")
    private boolean stream = false;

}
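Ollama streams responses unless `stream` is false, so sending `"stream": false` explicitly is what makes each call return a single JSON object that one `JSON.parseObject` call can handle. The JSON that `GenerateReq` and `ChatReq` serialize to has the shape produced by this dependency-free sketch; the escaping helper is simplified (quotes, backslashes, control characters) and the names are hypothetical.

```java
import java.util.List;

/** Hypothetical builders showing the JSON bodies for /api/generate and /api/chat. */
class OllamaBodies {

    static String generate(String model, String prompt, boolean stream) {
        return "{\"model\":" + quote(model) + ",\"prompt\":" + quote(prompt)
                + ",\"stream\":" + stream + "}";
    }

    /** Each element of roleContentPairs is {role, content}. */
    static String chat(String model, List<String[]> roleContentPairs, boolean stream) {
        StringBuilder sb = new StringBuilder("{\"model\":").append(quote(model)).append(",\"messages\":[");
        for (int i = 0; i < roleContentPairs.size(); i++) {
            String[] m = roleContentPairs.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"role\":").append(quote(m[0]))
              .append(",\"content\":").append(quote(m[1])).append('}');
        }
        return sb.append("],\"stream\":").append(stream).append('}').toString();
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```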

Response objects, which are also similar:

java
@Data
@NoArgsConstructor
@AllArgsConstructor
public class GenerateRes {

    private String model;
    private String created_at;
    private String response;
    private String done_reason;
    private boolean done;
    // Durations are in nanoseconds, so int would overflow after about 2.1 seconds
    private long total_duration;
    private long load_duration;
    private long prompt_eval_count;
    private long prompt_eval_duration;
    private long eval_count;
    private long eval_duration;

}

@Data
@NoArgsConstructor
@AllArgsConstructor
public class ChatRes {
    
    private String model;
    private String created_at;
    private Message message;
    private String done_reason;
    private boolean done;
    private long total_duration;
    private long load_duration;
    private long prompt_eval_count;
    private long prompt_eval_duration;
    private long eval_count;
    private long eval_duration;

}
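All `*_duration` fields are in nanoseconds, so they should be `long`: `Integer.MAX_VALUE` nanoseconds is only about 2.1 seconds. Ollama's API documentation computes generation speed as `eval_count / eval_duration * 10^9` tokens per second; a small helper along those lines, with a hypothetical class name:

```java
/** Hypothetical helper for the timing fields of GenerateRes/ChatRes. */
class OllamaStats {

    private static final double NANOS_PER_SECOND = 1_000_000_000.0;

    /** Generation speed in tokens per second, from eval_count and eval_duration (ns). */
    static double tokensPerSecond(long evalCount, long evalDurationNanos) {
        if (evalDurationNanos <= 0) {
            return 0.0;
        }
        return evalCount * NANOS_PER_SECOND / evalDurationNanos;
    }

    /** Converts a *_duration field (ns) to whole milliseconds for logging. */
    static long toMillis(long durationNanos) {
        return durationNanos / 1_000_000L;
    }
}
```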

2.3 OllamaController

Two simple endpoints for testing:

java
@Slf4j
@RestController
@RequestMapping("/api")
public class OllamaController {

    @Resource
    private OllamaComponent ollamaComponent;

    @PostMapping("/generate")
    @ApiOperation(value = "Generate a completion", tags = {"Chat"})
    public R<Object> generate(@RequestBody GenerateReq req) {
        return ollamaComponent.generate(req);
    }

    @PostMapping("/chat")
    @ApiOperation(value = "Generate a chat completion", tags = {"Chat"})
    public R<Object> chat(@RequestBody ChatReq req) {
        return ollamaComponent.chat(req);
    }

}

2.4 OllamaComponent

The service class that calls Ollama:

java
@Slf4j
@Component
public class OllamaComponent {

    @Resource
    private OllamaUtil ollamaUtil;

    public R<Object> generate(GenerateReq req) {
        GenerateRes generateRes = null;
        try {
            generateRes = ollamaUtil.generate(req);
        } catch (Exception e) {
            // Pass the exception to the logger instead of printing the stack trace to stdout
            log.error("generate failed", e);
        }
        return R.ok(generateRes);
    }

    public R<Object> chat(ChatReq req) {
        ChatRes chatRes = null;
        try {
            chatRes = ollamaUtil.chat(req);
        } catch (Exception e) {
            log.error("chat failed", e);
        }
        return R.ok(chatRes);
    }
}

2.5 OllamaUtil

The utility class that sends the HTTP requests:

java
import com.alibaba.fastjson.JSON;
import lombok.extern.slf4j.Slf4j;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;
import org.springframework.stereotype.Component;

import javax.annotation.Resource;
import java.io.IOException;

@Slf4j
@Component
public class OllamaUtil {

    @Resource
    private OllamaConfig ollamaConfig;

    private static final int CODE200 = 200;
    private static final MediaType JSON_TYPE = MediaType.parse("application/json");
    /** One shared client: every OkHttpClient owns its own connection pool and thread pool. */
    private static final OkHttpClient CLIENT = OkHttpClientConfig.getUnsafeOkHttpClient();

    public GenerateRes generate(GenerateReq req) {
        try {
            String bodyStr = ollamaRemoteApi("/api/generate", req);
            return JSON.parseObject(bodyStr, GenerateRes.class);
        } catch (IOException e) {
            log.error("Request to /api/generate failed", e);
        }
        return null;
    }

    public ChatRes chat(ChatReq req) {
        try {
            String bodyStr = ollamaRemoteApi("/api/chat", req);
            return JSON.parseObject(bodyStr, ChatRes.class);
        } catch (IOException e) {
            log.error("Request to /api/chat failed", e);
        }
        return null;
    }

    private String ollamaRemoteApi(String path, Object req) throws IOException {
        // The Content-Type header is taken from the body's media type
        RequestBody body = RequestBody.create(JSON_TYPE, JSON.toJSONString(req));
        Request request = new Request.Builder()
                .url(ollamaConfig.baseUrl + path)
                .post(body)
                .addHeader("Accept", "application/json")
                .build();
        // try-with-resources closes the response and returns the connection to the pool
        try (Response response = CLIENT.newCall(request).execute()) {
            if (response.code() == CODE200 && response.body() != null) {
                return response.body().string();
            }
            log.warn("Ollama returned HTTP {} for {}", response.code(), path);
            return "";
        }
    }
}
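With `stream` set to true, Ollama replies with newline-delimited JSON: one object per line, each `/api/generate` line carrying a text fragment in `response`, and the last one marked `"done": true` (for `/api/chat` the fragment is in `message.content` instead). `ollamaRemoteApi` reads the whole body as one string, which in streaming mode is not a single JSON document, so streaming needs line-by-line handling. A dependency-free sketch for `/api/generate`; the regex extraction and hand-written unescaping are simplifications (real code would parse each line with fastjson), and the names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reader for a streamed /api/generate response (one JSON object per line). */
class GenerateStreamReader {

    private static final Pattern RESPONSE =
            Pattern.compile("\"response\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"");
    private static final Pattern DONE = Pattern.compile("\"done\"\\s*:\\s*true");

    /** Concatenates every "response" fragment until a line reports "done": true. */
    static String readAll(Reader body) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(body)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                Matcher m = RESPONSE.matcher(line);
                if (m.find()) {
                    text.append(unescape(m.group(1)));
                }
                if (DONE.matcher(line).find()) {
                    break;
                }
            }
        }
        return text.toString();
    }

    /** Undoes JSON string escapes: quote, backslash, slash, n, r, t, b, f and 4-hex-digit u escapes. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c);
                continue;
            }
            char e = s.charAt(++i);
            switch (e) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default: out.append(e); // quote, backslash and slash map to themselves
            }
        }
        return out.toString();
    }
}
```

With OkHttp, `response.body().charStream()` can supply the `Reader`.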

2.6 OkHttpClientConfig

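This class is mainly meant to bypass HTTPS certificate checks, and it works for ordinary requests too. It was contributed by developer dongliang7, with thanks. Note that it disables certificate and hostname verification entirely; for the plain-HTTP Ollama URL used here those settings have no effect.

java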
import lombok.extern.slf4j.Slf4j;
import okhttp3.OkHttpClient;

import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import java.util.concurrent.TimeUnit;

/**
 * Creates an OkHttpClient that skips SSL (certificate) verification
 *
 * @author dongliang7
 * @date  2021-11-19 09:50:00
 */
@Slf4j
public class OkHttpClientConfig {

    public static OkHttpClient getUnsafeOkHttpClient() {
        try {
            // Trust manager that does not validate certificate chains
            final X509TrustManager trustAll = new X509TrustManager() {
                @Override
                public void checkClientTrusted(X509Certificate[] chain, String authType) {
                }

                @Override
                public void checkServerTrusted(X509Certificate[] chain, String authType) {
                }

                @Override
                public X509Certificate[] getAcceptedIssuers() {
                    return new X509Certificate[]{};
                }
            };
            // Install the trust-all manager
            final SSLContext sslContext = SSLContext.getInstance("TLS");
            sslContext.init(null, new TrustManager[]{trustAll}, new SecureRandom());
            // SSL socket factory backed by the trust-all manager
            final SSLSocketFactory sslSocketFactory = sslContext.getSocketFactory();
            OkHttpClient.Builder builder = new OkHttpClient.Builder()
                    .connectTimeout(60, TimeUnit.SECONDS)
                    // With stream=false Ollama sends nothing until generation finishes,
                    // so long answers can exceed this read timeout
                    .readTimeout(60, TimeUnit.SECONDS)
                    .writeTimeout(120, TimeUnit.SECONDS);
            // Pass the trust manager explicitly: the one-argument overload is deprecated
            // and OkHttp cannot support it on JDK 9+
            builder.sslSocketFactory(sslSocketFactory, trustAll);
            builder.hostnameVerifier((hostname, session) -> true);
            return builder.build();
        } catch (Exception e) {
            log.error("Failed to create an OkHttpClient without SSL (certificate) verification: {}", e.getMessage());
            throw new RuntimeException(e);
        }
    }
}

3. Changelog

  • 20250326: added ollama-linux-amd64-0.5.11.tgz (version 0.5.11) to the netdisk share

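4. A Brief Summary

This article covered deploying Ollama locally on Linux, either with the install script or by manual installation. The manual route walked through downloading (including resuming interrupted downloads), extracting and starting the server, along with environment-variable configuration. It also showed how to install a model and how to expose the service port through Nginx or OLLAMA_HOST, listed the environment variables of ollama serve, and wrapped Ollama's /api/generate and /api/chat endpoints with OkHttp for a Spring Boot 2.x project on JDK 8. Several versions of the installation files and client tools are shared on the netdisk.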
