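Setting Up Local Agents with Khoj + DeepSeek V4 on WSL2

Deployment and Compatibility Troubleshooting Notes for Self-Hosted Khoj with a DeepSeek-compatible API

This post documents the full troubleshooting process of deploying Khoj on Windows + WSL2 + Docker Compose and connecting a DeepSeek-compatible API as its OpenAI-compatible provider. The focus is not on installing Khoj from scratch but on several real engineering problems: migrating WSL/Docker storage, a corrupted containerd snapshot, a response_format incompatibility between Khoj and DeepSeek, and the LiteLLM workaround that finally got the local AI workflow running.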

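Background and Goals

My goal was to deploy a local, self-hosted agent workflow that acts as a kind of personal AI assistant and workflow hub:

Run Khoj locally to get its Web UI, Agent, Automation, Search, and Computer/Sandbox features;
Use a lower-cost DeepSeek-compatible API, for example deepseek-v4-flash;
Connect through an OpenAI-compatible API rather than paying for the official OpenAI API directly;
Later use it for daily job searches, a paper radar, local engineering troubleshooting, and organizing a personal knowledge base.

The target architecture looks roughly like this: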

Windows Browser
    ↓
Khoj Web UI :42110
    ↓
Khoj server container
    ↓
LiteLLM proxy container :4000
    ↓
DeepSeek-compatible API

Environment Overview

The environment was roughly as follows:

Host OS: Windows
Runtime: WSL2 Ubuntu
Container runtime: native Docker inside WSL2
Khoj deployment: Docker Compose
Khoj version: v1.42.10
Khoj image: ghcr.io/khoj-ai/khoj:latest
LLM provider: DeepSeek-compatible OpenAI API
Model: deepseek-v4-flash / deepseek-v4-flash-khoj alias

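Note that this is not Docker Desktop: Docker runs natively inside WSL2 Ubuntu. That choice later determined where Docker stores its data, how the WSL virtual disk had to be migrated, and how /var/lib/docker had to be inspected.

Phase 1: WSL / Docker Storage Problems

Symptoms

While pulling the Khoj Docker images for the first time, the Ubuntu WSL distro on Windows failed to start with an error like: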

Catastrophic failure
Error code: Wsl/Service/E_UNEXPECTED

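At the same time, the C: drive was almost completely full.

Diagnosis

Because Docker runs natively inside WSL2 Ubuntu, its images, containers, volumes, and build cache are all written into the distro's virtual disk, ext4.vhdx. Docker operations happen in Linux, but the space is ultimately consumed by the WSL virtual disk file on the Windows side.

A typical path looks like: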

C:\Users\<user>\AppData\Local\Packages\CanonicalGroupLimited...\LocalState\ext4.vhdx

Khoj pulls several images, for example:

ghcr.io/khoj-ai/khoj:latest
ghcr.io/khoj-ai/terrarium:latest
pgvector/pgvector:pg15
searxng/searxng:latest

Pulling them made the virtual disk on the C: drive grow rapidly until the WSL service started to fail.
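A preflight check before pulling large images can catch this situation early. The sketch below is only an illustration: the helper name has_free_space and any threshold you pass are my own choices, not part of Khoj or Docker. Note that inside WSL2 the root filesystem reports the capacity of the virtual disk, so to see how much room the Windows drive really has, you would check the mounted drive instead (assumed here to be /mnt/c, the default automount location).

```python
import shutil


def has_free_space(path: str, min_free_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least `min_free_gib` GiB free.

    Hypothetical helper for illustration; the threshold is up to the caller.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / 2**30 >= min_free_gib
```

For example, has_free_space("/mnt/c", 30) could gate a docker compose pull, assuming the C: drive is mounted at /mnt/c and that 30 GiB is a comfortable margin for your images.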

Fix: Migrate WSL with a VHD Export

A plain tar export failed:

wsl --export Ubuntu G:\WSLBackup\ubuntu_backup.tar

It reported:

bsdtar: (null)
Error code: Wsl/Service/WSL_E_EXPORT_FAILED

Exporting in VHD format succeeded instead:

wsl --export Ubuntu G:\WSLBackup\ubuntu_backup.vhdx --format vhd

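I then unregistered the old distro and imported it onto the G: drive. Note that wsl --unregister deletes the old distro's virtual disk, so confirm that the exported file exists before running it: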

wsl --unregister Ubuntu
wsl --import Ubuntu G:\WSL\Ubuntu G:\WSLBackup\ubuntu_backup.vhdx --vhd

After the migration, checking the Ubuntu root filesystem with:

df -h

showed something like:

/dev/sdd       1007G   11G  946G   2% /

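After the import, the WSL root filesystem is served from the virtual disk under G:\WSL\Ubuntu and no longer takes up space in the default Store path on the C: drive.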

Phase 2: Corrupted Docker/containerd Snapshot

Symptoms

After the migration Docker itself ran, but docker compose up failed while creating containers:

Error response from daemon: failed to stat parent:
stat /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/41/fs:
no such file or directory

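Diagnosis

The containerd overlayfs snapshot metadata no longer matched the files on disk. The most likely cause was the earlier WSL crash, the full disk, or an interrupted Docker operation during the image pull.

After a first attempt at deleting Docker's data, docker system df still listed plenty of images, for example: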

Images          6         0         12.55GB

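So the data had not really been cleared. The reason was that docker.socket was still active: even with docker.service stopped, the socket can start the daemon again the next time a docker command runs.

Fix

A proper cleanup has to stop the socket, the service, and containerd first: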

sudo systemctl stop docker.socket
sudo systemctl stop docker.service
sudo systemctl stop containerd.service

Confirm that all three are stopped (each command should print inactive):

systemctl is-active docker.socket
systemctl is-active docker.service
systemctl is-active containerd.service

Then delete the Docker and containerd data. This is destructive: it removes every image, container, and volume on the machine:

sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd

Start everything again:

sudo systemctl start containerd.service
sudo systemctl start docker.socket
sudo systemctl start docker.service

Confirm that everything is empty:

docker system df
docker images
docker ps -a

Then pull and start again:

docker compose pull
docker compose up

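This time the Khoj services reached their normal initialization phase.

Phase 3: Khoj Initializes, but Chat Returns Nothing

Initial Signs of Success

The Khoj server log showed a successful initialization: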

Initializing Khoj v1.42.10
Initializing DB
Created admin user
OpenAI chat model configuration complete
Starting Khoj
Loaded embedding model thenlper/gte-small

The Web UI was reachable:

http://localhost:42110

So was the admin panel:

http://localhost:42110/server/admin

However, typing hi into the front end produced no reply at all, and an empty response was eventually saved:

Saved Conversation Turn
You (default): "hi"

Khoj: ""

The Key Error

The server log contained:

BadRequestError: Error code: 400 -
{'error': {'message': 'This response_format type is unavailable now',
'type': 'invalid_request_error',
'param': None,
'code': 'invalid_request_error'}}

The stack trace pointed to:

/app/src/khoj/routers/api_chat.py
/app/src/khoj/routers/helpers.py
/app/src/khoj/processor/conversation/openai/gpt.py
/app/src/khoj/processor/conversation/openai/utils.py
client.beta.chat.completions...

The failure occurred in one of Khoj's internal planning steps:

Chat actor: Infer information sources to refer

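In other words, even for a plain hi, Khoj first calls the model to decide which information sources (search, files, tools) and which output format to use.

Phase 4: Tracing It to a `response_format=json_schema` Incompatibility

Direct Cause

For OpenAI-compatible providers, Khoj by default sends something like: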

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "PickTools",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "source": {
            "type": "array",
            "items": {
              "type": "string"
            }
          },
          "output": {
            "type": "string"
          }
        },
        "required": ["source", "output"],
        "additionalProperties": false
      }
    }
  }
}

But the DeepSeek-compatible API responded:

This response_format type is unavailable now

This shows that the API does not support the json_schema mode of OpenAI's newer Structured Outputs. It may still support the simpler JSON mode:

{
  "response_format": {
    "type": "json_object"
  }
}

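Alternatively, it may be rejecting only the structured schema parameters in this particular request.

Why `deepseek-reasoner` May Not Hit the Error

A contributor on the issue later pointed out that Khoj's source already special-cases deepseek-reasoner: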

def get_openai_api_json_support(model_name: str, api_base_url: str = None) -> JsonSupport:
    if model_name.startswith("deepseek-reasoner"):
        return JsonSupport.NONE
    if api_base_url:
        host = urlparse(api_base_url).hostname
        if host and host.endswith(".ai.azure.com"):
            return JsonSupport.OBJECT
        if host == "api.deepinfra.com":
            return JsonSupport.OBJECT
    return JsonSupport.SCHEMA

This means:

deepseek-reasoner maps to JsonSupport.NONE, so Khoj sends no response_format;
api.deepinfra.com maps to JsonSupport.OBJECT, so Khoj sends response_format={"type":"json_object"};
the official DeepSeek host, api.deepseek.com, is not listed, so it falls through to JsonSupport.SCHEMA and Khoj sends json_schema;
deepseek-chat and any other DeepSeek model whose name does not start with deepseek-reasoner can therefore easily trigger the 400 invalid_request_error.

The minimal patch proposed by the contributor is:

 if host and host.endswith(".ai.azure.com"):
     return JsonSupport.OBJECT
 if host == "api.deepinfra.com":
     return JsonSupport.OBJECT
+if host == "api.deepseek.com":
+    return JsonSupport.OBJECT
 return JsonSupport.SCHEMA

The idea behind the patch: the official DeepSeek API supports JSON mode but not the json_schema form of Structured Outputs, so api.deepseek.com should map to JsonSupport.OBJECT rather than fall through to the default JsonSupport.SCHEMA.
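To see the effect of the patch without touching a Khoj checkout, the decision logic can be reproduced in a few standalone lines. This is a sketch, not Khoj's actual module: the JsonSupport enum values and the function name get_json_support are stand-ins I chose, and only the branching mirrors the snippet and patch quoted above.

```python
from enum import Enum
from typing import Optional
from urllib.parse import urlparse


class JsonSupport(Enum):
    # Stand-in enum; the member values are assumptions for this sketch.
    NONE = "none"
    OBJECT = "object"
    SCHEMA = "schema"


def get_json_support(model_name: str, api_base_url: Optional[str] = None) -> JsonSupport:
    """Mirror the quoted branching, with the proposed api.deepseek.com case added."""
    if model_name.startswith("deepseek-reasoner"):
        return JsonSupport.NONE
    if api_base_url:
        host = urlparse(api_base_url).hostname
        if host and host.endswith(".ai.azure.com"):
            return JsonSupport.OBJECT
        if host == "api.deepinfra.com":
            return JsonSupport.OBJECT
        if host == "api.deepseek.com":  # the proposed patch
            return JsonSupport.OBJECT
    # Every other host, including third-party DeepSeek-compatible ones, still gets SCHEMA.
    return JsonSupport.SCHEMA
```

With this logic, deepseek-chat on https://api.deepseek.com/v1 resolves to OBJECT, while the same model behind any other host still resolves to SCHEMA.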

Phase 5: Trying a LiteLLM Workaround

Goal

To avoid patching the source inside the Khoj container, I introduced LiteLLM as a local OpenAI-compatible proxy:

Khoj → LiteLLM → DeepSeek-compatible API

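LiteLLM is responsible for:

Receiving Khoj's OpenAI-compatible requests;
Mapping the model alias to the real upstream model;
Dropping or rewriting incompatible parameters;
Forwarding the request to the DeepSeek-compatible API.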
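The "drop or rewrite" step can be pictured as a small transformation on the request body. The sketch below is plain Python for illustration only; it is not LiteLLM's API, and the function name adapt_request and its mode argument are hypothetical. It shows the two options discussed in this post: removing response_format entirely, or downgrading json_schema to json_object.

```python
import copy
from typing import Any, Dict


def adapt_request(payload: Dict[str, Any], mode: str = "drop") -> Dict[str, Any]:
    """Return a copy of an OpenAI-style chat payload without a json_schema response_format.

    mode="drop" removes response_format; mode="downgrade" replaces it with JSON mode.
    """
    if mode not in ("drop", "downgrade"):
        raise ValueError(f"unknown mode: {mode!r}")

    adapted = copy.deepcopy(payload)  # never mutate the caller's request
    fmt = adapted.get("response_format")
    if not isinstance(fmt, dict) or fmt.get("type") != "json_schema":
        return adapted  # nothing incompatible to fix

    if mode == "drop":
        del adapted["response_format"]
    else:
        adapted["response_format"] = {"type": "json_object"}
    return adapted
```

Dropping is the simpler choice and is what the configuration in this post ends up doing; downgrading keeps some structure but only helps if the upstream provider really accepts json_object.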

LiteLLM Configuration

An example litellm_config.yaml:

model_list:
  - model_name: deepseek-v4-flash-khoj
    litellm_params:
      model: openai/deepseek-v4-flash
      api_base: https://<your-provider>/v1
      api_key: os.environ/DEEPSEEK_API_KEY
      drop_params: true
      additional_drop_params:
        - response_format

litellm_settings:
  drop_params: true
  additional_drop_params:
    - response_format
  set_verbose: true

The important part is:

additional_drop_params:
  - response_format

Setting only:

drop_params: true

does not necessarily remove response_format. Because response_format is a known parameter of the OpenAI protocol, LiteLLM does not automatically treat it as an unknown param to drop.

Adding LiteLLM to Docker Compose

LiteLLM can be added through a docker-compose.override.yml:

services:
  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    container_name: khoj-litellm
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    environment:
      - DEEPSEEK_API_KEY=<your-api-key>
    ports:
      - "4000:4000"

Start it:

docker compose up -d litellm

Verify that LiteLLM works on its own:

curl -sS http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer anything" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash-khoj",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ]
  }'

The response has this shape:

{
  "choices": [
    {
      "message": {
        "content": "Hello!"
      }
    }
  ]
}

This confirms that the following path works:

WSL → LiteLLM → DeepSeek-compatible API
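The same smoke test can be scripted with the Python standard library, which is handy if you want to rerun it after every config change. This is a minimal sketch: the helper names build_chat_request and send_chat_request are my own, and the base URL, model alias, and dummy key are simply the values used in this post.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str,
                       api_key: str = "anything") -> urllib.request.Request:
    """Build the same POST that the curl command sends, without sending it."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Calling send_chat_request(build_chat_request("http://localhost:4000/v1", "deepseek-v4-flash-khoj", "Say hello in one sentence.")) should return the model's reply when the proxy is up; an HTTP error from the proxy surfaces as urllib.error.HTTPError.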

Phase 6: Binding Khoj to LiteLLM Correctly

The In-Container Address Problem

My first attempt was to configure this in Khoj Admin:

http://litellm:4000/v1

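Khoj Admin rejected it as an invalid URL.

The name litellm is a Docker Compose service name and resolves fine inside the container network, but the Django admin's URL validation probably does not accept it as a standard hostname.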
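One plausible explanation is that the validator expects a hostname with at least one dot, localhost, or an IP address. The check below is a simplified approximation written for this post, not Django's actual validator, and the function name is hypothetical. It only illustrates why a bare service name could be rejected while a dotted name is accepted.

```python
import ipaddress
from urllib.parse import urlparse


def has_dotted_or_local_host(url: str) -> bool:
    """Approximate a strict URL check: accept localhost, IP addresses, and dotted hostnames."""
    host = urlparse(url).hostname
    if not host:
        return False
    if host == "localhost":
        return True
    try:
        ipaddress.ip_address(host)  # raises ValueError for non-IP strings
        return True
    except ValueError:
        pass
    labels = host.split(".")
    # A bare name such as "litellm" has a single label and is rejected here.
    return len(labels) >= 2 and all(labels)
```

Under this approximation http://litellm:4000/v1 fails while http://host.docker.internal:4000/v1 passes, which matches what I observed in the admin form.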

In the end I used:

http://host.docker.internal:4000/v1

This relies on the following entry in the compose file:

extra_hosts:
  - "host.docker.internal:host-gateway"

The path then becomes:

Khoj server container
    ↓
http://host.docker.internal:4000/v1
    ↓
Docker host port 4000
    ↓
LiteLLM container
    ↓
DeepSeek-compatible API

Key Settings in Admin

In Khoj Admin, add or edit an AiModelApi:

Name: liteLLM-DSv4
API Base URL: http://host.docker.internal:4000/v1
API Key: anything

Then set the ChatModel to:

Name: deepseek-v4-flash-khoj
Model Type: Openai
Ai Model Api: liteLLM-DSv4

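There is one important pitfall here: the ChatModel must not stay bound to the default OpenAI API configuration. Otherwise Khoj sends deepseek-v4-flash-khoj straight to the original upstream, which responds with: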

The supported API model names are deepseek-v4-pro or deepseek-v4-flash,
but you passed deepseek-v4-flash-khoj.

The correct behavior is:

Khoj sends model = deepseek-v4-flash-khoj
LiteLLM maps it to upstream model = deepseek-v4-flash

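Phase 7: Finally Working

Once the ChatModel was correctly bound to liteLLM-DSv4, the LiteLLM log confirmed that it was receiving Khoj's requests.

At first, while response_format was still not being dropped, the LiteLLM log showed: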

Final returned optional params: {
  'temperature': 0.8,
  'stream': True,
  'stream_options': {'include_usage': True},
  'response_format': {
    'type': 'json_schema',
    'json_schema': {
      ...
      'name': 'PickTools',
      'strict': True
    }
  }
}

and the upstream returned:

400 Bad Request
This response_format type is unavailable now

After adding:

additional_drop_params:
  - response_format

the Khoj front end replied normally:

Hi there! 👋 It's great to hear from you. How can I assist you today?

This confirms that the full path now works:

Khoj Web UI
→ Khoj server
→ LiteLLM proxy
→ DeepSeek-compatible API
→ response streamed back to Khoj

Workaround Summary

The Working Workaround

The setup that currently works is:

Khoj → LiteLLM → DeepSeek-compatible API

LiteLLM configuration:

litellm_settings:
  drop_params: true
  additional_drop_params:
    - response_format

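Also make sure that Khoj's ChatModel is bound to the LiteLLM AiModelApi.

Pros

No changes to the source inside the Khoj image;
Easy to roll back and maintain;
Model aliases are managed in one place, in LiteLLM;
Works for third-party DeepSeek-compatible endpoints as well;
Switching models later only requires a LiteLLM config change.

Cons

Dropping response_format entirely means Khoj cannot use even the provider's native JSON mode;
Features that depend on structured output may degrade;
It adds one more service compared with fixing Khoj directly;
If Khoj later relies on other OpenAI-specific parameters, more of them may need to be dropped or adapted in LiteLLM.

Discussion of the Upstream Patch

The patch proposed by the contributor: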

if host == "api.deepseek.com":
    return JsonSupport.OBJECT

This is closer to a root-cause fix than my LiteLLM workaround.

Why `JsonSupport.OBJECT` and Not `JsonSupport.NONE`

Because the official DeepSeek documentation lists support for:

{"type": "json_object"}

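That is, JSON mode. The problem is not that DeepSeek cannot produce JSON output at all, but that it does not support OpenAI's newer Structured Outputs:

{"type": "json_schema", ...}

So the more reasonable change is:

JsonSupport.SCHEMA → JsonSupport.OBJECT

rather than:

JsonSupport.SCHEMA → JsonSupport.NONE

This preserves Khoj's basic need for structured output instead of relying entirely on natural-language prompting.

Limitations of This Patch

The patch covers only the official host: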

api.deepseek.com

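But many users go through a third-party DeepSeek-compatible endpoint, an aggregation platform, or a custom proxy, where the host is not api.deepseek.com.

A more general long-term design might therefore be to add a setting to Khoj's AiModelApi or ChatModel: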

json_support = schema | object | none | auto

or:

structured_output_mode = json_schema | json_object | disabled

That way users can tell Khoj explicitly which level of JSON/structured output a given OpenAI-compatible provider supports.
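As a rough illustration of how such a setting could drive the request, the sketch below turns a structured_output_mode value into the response_format to send. Everything in it is hypothetical: neither the setting nor the function build_response_format exists in Khoj today, and the default schema name is a placeholder.

```python
from typing import Any, Dict, Optional


def build_response_format(mode: str,
                          schema: Optional[Dict[str, Any]] = None,
                          name: str = "response") -> Optional[Dict[str, Any]]:
    """Map a hypothetical per-provider setting to an OpenAI-style response_format.

    Returns None when no response_format should be sent at all.
    """
    if mode == "disabled":
        return None
    if mode == "json_object":
        return {"type": "json_object"}
    if mode == "json_schema":
        if schema is None:
            raise ValueError("json_schema mode requires a schema")
        return {
            "type": "json_schema",
            "json_schema": {"name": name, "strict": True, "schema": schema},
        }
    raise ValueError(f"unknown structured_output_mode: {mode!r}")
```

A caller would omit the response_format key whenever the function returns None, which corresponds to what the LiteLLM workaround achieves from outside Khoj.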

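Lessons Learned

The key takeaways from this troubleshooting session:

Logs are more reliable than the UI.
The UI simply showed no reply, while the server log pointed directly at the response_format error.

OpenAI-compatible does not mean fully OpenAI-compatible.
Many providers support only a basic subset of chat completions and not OpenAI's newer Structured Outputs.

A proxy is a very practical compatibility layer.
LiteLLM is more than a model router; it can also serve as a compatibility shim between OpenAI-compatible clients and providers.

A workaround and an upstream fix are two different things.
Dropping response_format in LiteLLM gets things running, but the better upstream fix is for Khoj to distinguish correctly between json_schema, json_object, and none.

An open-source issue is worth more than a bug report.
A high-quality issue should include the environment, reproduction steps, logs, a root-cause hypothesis, a workaround, and a discussion of possible fixes.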
