Large Language Model Series (5): Qwen3-Reranker-0.6B User Guide

This project is based on ai-engine-direct-helper (QAI_AppBuilder)
https://github.com/quic/ai-engine-direct-helper.git

Model download (includes the corresponding context binary files)
https://www.aidevhome.com/?id=51

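Introduction

This guide walks developers through deploying and using the Qwen3-Reranker-0.6B model for text reranking and relevance judgment. It covers environment setup, model download, dependency installation, and inference testing on both Windows and Android.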

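1. Using the Model on Windows

1.1 Install Python 3.12 (AMD64)

Before running the model, install a compatible Python environment. Download and install the AMD64 (Windows x86-64) build of Python 3.12.

Tip: During installation, be sure to check the Add Python to PATH option so that python and pip can be invoked directly from the command line.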
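
The wheel installed in section 1.3.1 carries the tags cp312 and win_amd64, so pip will only accept it under a matching interpreter. The following is a minimal sketch of a check, assuming that `sysconfig.get_platform()` reports `win-amd64` for the AMD64 build; the helper name `matches_wheel` is hypothetical and not part of the project.

```python
import sys
import sysconfig


def matches_wheel(version_info, platform_tag):
    """Return True if the interpreter fits a cp312 / win_amd64 wheel."""
    return tuple(version_info[:2]) == (3, 12) and platform_tag == "win-amd64"


if __name__ == "__main__":
    ok = matches_wheel(sys.version_info, sysconfig.get_platform())
    status = "OK" if ok else "not a Python 3.12 AMD64 build"
    print(f"Python {sys.version.split()[0]} on {sysconfig.get_platform()}: {status}")
```

Running this with the same `python` command that will later launch the demo confirms that PATH points at the intended interpreter.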

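1.2 Download and Extract the Model

Download: Qwen3-Reranker-0.6B model for Snapdragon AI PC (8380)

After obtaining the Qwen3-Reranker-0.6B model files, extract the archive into a local working directory of your choice. Then open Command Prompt (CMD) or PowerShell and change to the model directory with cd. The /d switch used below is specific to CMD, where it also switches drives; in PowerShell, use cd followed by the path alone.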

cd /d <path-to-extracted-model-directory>

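1.3 Install the Environment and Verify

1.3.1 Install the Environment

From inside the model directory, run the following commands in order to install the runtime dependencies. Install the QAI AppBuilder wheel first, then the remaining dependencies.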

# 1. Install the QAI AppBuilder wheel first
pip install qai_appbuilder-2.38.0-cp312-cp312-win_amd64.whl
# 2. Then install the remaining dependencies
pip install -r requirements.txt
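
Before running the demo, it can help to confirm that the wheel landed in the interpreter you intend to use. The sketch below assumes the package is importable under the name `qai_appbuilder`; the helper `is_importable` is hypothetical. It looks the module up without importing it, so no backend libraries are loaded.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    name = "qai_appbuilder"
    state = "found" if is_importable(name) else "NOT found, re-run pip install"
    print(f"{name}: {state}")
```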

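1.3.2 Run the Test

Once the environment is set up, run the demo script to verify that the reranker works. The test covers both relevant and irrelevant documents and reports the score and inference time for each.

Run the test command: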

python demo.py

Sample output log:

Test 1
Input query: "What is the capital of China?"
Input document: "The capital of China is Beijing."
Model output: Yes.
Score: 1.0 ✓ correct
Inference time: 0.087s

Test 2
Input query: "Explain gravity"
Input document: "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
Model output: yes\nyes\nyes\nyes\nyes (repeated 5 times)
Score: 1.0 ✓ correct
Inference time: 0.185s

📊 Irrelevant document tests (expected answer: "no")

Test 3
Input query: "What is the capital of China?"
Input document: "Random nonsense string xyz 123 !!!"
Model output: No\n\nNo\n\nNo\n\nNo\n\nNo (repeated 5 times)
Score: 0.0 ✓ correct
Inference time: 0.185s

Test 4
Input query: "Explain gravity"
Input document: "The recipe for chocolate cake requires flour, sugar, and cocoa powder."
Model output: no
Score: 0.0 ✓ correct
Inference time: 0.080s
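
The demo scores one query-document pair at a time. In a retrieval pipeline, those per-pair scores would typically be used to filter and sort a list of candidates. The sketch below illustrates that step only. It is not taken from demo.py: `rerank` and `toy_score` are hypothetical names, `toy_score` is a word-overlap stand-in for the real model call, and the 0.5 threshold mirrors the one used by the Java pipeline in section 2.

```python
def rerank(query, docs, score_fn, threshold=0.5):
    """Score every document, drop those below threshold, sort best first."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def toy_score(query, doc):
    """Stand-in scorer (NOT the model): fraction of query words found in doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)


if __name__ == "__main__":
    candidates = [
        "chocolate cake recipe",
        "the capital of china is beijing",
        "china is large",
    ]
    for doc, score in rerank("capital of china", candidates, toy_score, threshold=0.0):
        print(f"{score:.2f}  {doc}")
```

To use the real model, replace `toy_score` with a function that runs one inference per pair and returns the relevance score.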

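2. Using the Model on Android

2.1 Download the Model

Each archive contains the model, the backend libraries, and the Java source files.

2.2 Building the Project

Note: For the complete procedure for building the backend libraries, see the project source directory ai-engine-direct-helper-main\samples\android.

2.2.1 Model Pre-processing and Post-processing

The Java pre-processing and post-processing logic is packaged in the Qwen3RerankerPipeline.java file inside the archive. A condensed version of the class is shown below: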

package com.example.qwen3reranker;

import java.util.Arrays;

/**
 * Qwen3 Reranker: condensed Android inference pipeline.
 * Input: a query and a document. Output: a yes/no relevance score.
 */
public final class Qwen3RerankerPipeline {

    // Model constants
    public static final int CONTEXT_LENGTH = 2048;
    public static final int VOCAB_SIZE = 151936;
    public static final int PAD_TOKEN_ID = 151643;
    public static final float RELEVANCE_THRESHOLD = 0.5f;

    private Qwen3RerankerPipeline() {} // static utility class, not instantiable

    // External dependencies supplied by the caller
    public interface Tokenizer {
        int[] encode(String text);
        int getTokenId(String token);
    }

    public interface InferenceRunner {
        float[] runInference(ModelInputs inputs);
    }

    // Input and output structures
    public static class ModelInputs {
        public final int[] inputIds;
        public final float[] attentionMask;
        public final float[] positionIdsCos;
        public final float[] positionIdsSin;
        public final int tokenCount; // number of real (non-padding) tokens

        public ModelInputs(int[] inputIds, float[] attentionMask, float[] cos, float[] sin, int tokenCount) {
            this.inputIds = inputIds;
            this.attentionMask = attentionMask;
            this.positionIdsCos = cos;
            this.positionIdsSin = sin;
            this.tokenCount = tokenCount;
        }
    }

    public static class Result {
        public final String answer;
        public final float score;
        public final boolean isRelevant;

        public Result(String answer, float score) {
            this.answer = answer;
            this.score = score;
            this.isRelevant = score >= RELEVANCE_THRESHOLD;
        }
    }

    // Main entry point
    public static Result run(String query, String doc, Tokenizer tokenizer, InferenceRunner runner) {
        String prompt = buildPrompt(query, doc);
        ModelInputs inputs = preprocess(prompt, tokenizer);
        float[] logits = runner.runInference(inputs);
        return parseResult(logits, inputs.tokenCount, tokenizer);
    }

    // Build the prompt (a simplified chat template)
    private static String buildPrompt(String query, String doc) {
        return "<|im_start|>system\nJudge: yes/no\n<|im_end|>\n"
             + "<|im_start|>user\nQuery: " + query + "\nDoc: " + doc + "<|im_end|>\n"
             + "<|im_start|>assistant\n";
    }

    // Pre-processing: tokenize, then truncate or right-pad to CONTEXT_LENGTH
    private static ModelInputs preprocess(String prompt, Tokenizer tokenizer) {
        int[] tokens = tokenizer.encode(prompt);
        int[] inputIds = new int[CONTEXT_LENGTH];
        // Note: an over-long prompt is cut at the end, which drops the assistant turn.
        int len = Math.min(tokens.length, CONTEXT_LENGTH);
        System.arraycopy(tokens, 0, inputIds, 0, len);
        Arrays.fill(inputIds, len, CONTEXT_LENGTH, PAD_TOKEN_ID);

        // Placeholders only: an all-zero mask and empty RoPE tables. Fill these
        // according to the input specification of the exported model before real inference.
        float[] mask = new float[CONTEXT_LENGTH * CONTEXT_LENGTH];
        float[] cos = new float[0];
        float[] sin = new float[0];
        return new ModelInputs(inputIds, mask, cos, sin, len);
    }

    // Post-processing: probability of "yes" versus "no"
    private static Result parseResult(float[] logits, int tokenCount, Tokenizer tokenizer) {
        int yes = tokenizer.getTokenId("yes");
        int no = tokenizer.getTokenId("no");
        // Assumption: the runner returns either a single row of VOCAB_SIZE logits or one
        // row per position (row-major); in the latter case read the last real token's row.
        int offset = logits.length > VOCAB_SIZE ? Math.max(tokenCount - 1, 0) * VOCAB_SIZE : 0;
        // Softmax over {yes, no}, written as a sigmoid of the difference so exp cannot overflow
        double diff = logits[offset + no] - logits[offset + yes];
        float score = (float) (1.0 / (1.0 + Math.exp(diff)));
        return new Result(score >= RELEVANCE_THRESHOLD ? "yes" : "no", score);
    }
}
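
The pipeline above names four pre- and post-processing steps: padding, attention mask, RoPE tables, and the yes/no probability. The sketch below shows one common way to compute each, written in Python for brevity. Everything about tensor layout here is an assumption rather than the exported model's confirmed specification: the additive mask convention and its `masked_value`, the `head_dim` parameter, and the default `theta` are illustrative, and all four function names are hypothetical.

```python
import math


def pad_tokens(tokens, context_length, pad_id):
    """Truncate or right-pad token ids; return (padded ids, real token count)."""
    n = min(len(tokens), context_length)
    return list(tokens[:n]) + [pad_id] * (context_length - n), n


def causal_mask(token_count, context_length, masked_value=-1e4):
    """Flat row-major additive mask of context_length * context_length entries.

    0.0 where query position i may see key position j (j <= i and j is a real
    token), masked_value elsewhere. Convention and value are assumptions.
    """
    mask = []
    for i in range(context_length):
        for j in range(context_length):
            visible = j <= i and j < token_count
            mask.append(0.0 if visible else masked_value)
    return mask


def rope_tables(context_length, head_dim, theta=1_000_000.0):
    """Per-position cos/sin tables, each [context_length][head_dim // 2].

    theta is an assumed default; take the real value from the model config.
    """
    inv_freq = [theta ** (-2.0 * k / head_dim) for k in range(head_dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(context_length)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(context_length)]
    return cos, sin


def yes_probability(logit_yes, logit_no):
    """Softmax over {yes, no}, computed as a sigmoid that cannot overflow."""
    d = logit_yes - logit_no
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


if __name__ == "__main__":
    ids, n = pad_tokens([11, 22, 33], 5, 0)
    print(ids, n)
    print(causal_mask(n, 5)[:5])
    print(round(yes_probability(2.0, 0.0), 4))
```

Writing the two-class softmax as a sigmoid of the logit difference gives the same value as exp(yes) / (exp(yes) + exp(no)) while staying finite for large logits.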

2.2.2 Model Invocation and Inference

In the JNI layer, the core C++ code for model initialization and inference looks like this:

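// The JNI symbol below is taken from the DDColor sample. JNI names follow
// Java_<package>_<class>_<method>, so rename it to match your own class.
Java_com_example_DDColor_MainActivity_DDColor(...) {
    float* inputBuffer = static_cast<float*>(env->GetDirectBufferAddress(j_inputBuffer));
    float* outputBuffer = static_cast<float*>(env->GetDirectBufferAddress(j_outputBuffer));

    // 1. Select the backend (libQnnHtp.so runs the model on the HTP, the Hexagon DSP/NPU)
    std::string backend_lib_path = libs_dir + "/libQnnHtp.so";
    std::string system_lib_path = libs_dir + "/libQnnSystem.so";

    // 2. Initialize the model
    libAppBuilder.ModelInitialize(MODEL_NAME, model_path, backend_lib_path, ...);

    // 3. Run inference
    libAppBuilder.ModelInference(MODEL_NAME, ...);

    // 4. Copy the result into the Java-side direct buffer
    memcpy(outputBuffer, outputBuffers.at(0), outputSize[0]);

    // ... release resources ...
    return 0;
}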