Table of Contents
[Enterprise Knowledge Base MCP Server Design Proposal](#enterprise-knowledge-base-mcp-server-design-proposal)
[1. Document Operation Tools](#1-document-operation-tools)
[2. Intelligent Processing Tools](#2-intelligent-processing-tools)
[3. Management and Integration Tools](#3-management-and-integration-tools)
[1. Starting the Server](#1-starting-the-server)
[2. Claude Desktop Configuration](#2-claude-desktop-configuration)
[3. Usage Example](#3-usage-example)
[1. Real-Time Collaboration](#1-real-time-collaboration)
[2. Advanced Search](#2-advanced-search)
[3. Workflow Integration](#3-workflow-integration)
[4. Analytics and Insights](#4-analytics-and-insights)
[5. Mobile Support](#5-mobile-support)
Enterprise Knowledge Base MCP Server Design Proposal
I. Requirements Analysis and Architecture Design
Core Requirements
- Document management: upload, search, retrieve, and update enterprise documents
- Access control: role-based access control (RBAC)
- Multi-format support: Markdown, PDF, Word, HTML, plain text
- Smart features: document summarization, translation, similar-document recommendations
- Auditing and monitoring: operation logs and usage statistics
- Integration: connect to existing systems (Confluence, Git, SharePoint)
System Architecture
┌─────────────────────────────────────┐
│ MCP Client (Claude) │
└───────────────┬─────────────────────┘
│ SSE/HTTP
┌───────────────▼─────────────────────┐
│ 企业知识库 MCP Server │
├───────────────┬─────────────────────┤
│ API Gateway │ Auth Middleware │
│ Tool Router │ Rate Limiter │
│ Cache Layer │ Audit Logger │
└───────────────┴─────────────────────┘
│
┌───────────────┬─────────────────────┐
│ Search Engine │ Vector Database │
│ (Elasticsearch)│ (Pinecone/Qdrant) │
└───────────────┴─────────────────────┘
│
┌───────────────┬─────────────────────┐
│ Document Store│ External Systems │
│ (S3/MinIO) │ (Confluence/Git) │
└───────────────┴─────────────────────┘
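To make the gateway layer of the diagram concrete, the sketch below shows how a single tool call could flow through the rate limiter, cache, tool router, and audit logger. It is illustrative only; all names here (`RateLimiter`, `dispatch`, the handler dict) are assumptions and are not part of the implementation later in this document.

```python
import time
from typing import Any, Awaitable, Callable, Dict, List

# Illustrative gateway pipeline: rate limit -> cache -> tool router -> audit logger.
class RateLimiter:
    def __init__(self, max_per_minute: int = 60):
        self.max_per_minute = max_per_minute
        self.calls: Dict[str, List[float]] = {}

    def allow(self, user: str) -> bool:
        now = time.time()
        recent = [t for t in self.calls.get(user, []) if now - t < 60]
        recent.append(now)
        self.calls[user] = recent
        return len(recent) <= self.max_per_minute

Handler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]

async def dispatch(user: str, tool: str, args: Dict[str, Any],
                   handlers: Dict[str, Handler], limiter: RateLimiter,
                   cache: Dict[str, Any], audit_log: List[Dict[str, Any]]) -> Dict[str, Any]:
    if not limiter.allow(user):                                    # Rate Limiter
        return {"isError": True,
                "content": [{"type": "text", "text": "Rate limit exceeded"}]}
    cache_key = f"{user}:{tool}:{sorted(args.items())}"
    if tool == "knowledge_search" and cache_key in cache:          # Cache Layer (read-only tools)
        return cache[cache_key]
    result = await handlers[tool](args)                            # Tool Router
    audit_log.append({"user": user, "tool": tool, "ts": time.time()})  # Audit Logger
    if tool == "knowledge_search":
        cache[cache_key] = result
    return result
```

In a production deployment the cache would typically be Redis (as in the configuration later in this document) rather than an in-process dict.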
II. Tools Design
1. Document Operation Tools
// Full tool set definition
{
"tools": [
{
"name": "knowledge_search",
"description": "搜索企业知识库文档",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "搜索关键词"
},
"filters": {
"type": "object",
"description": "过滤条件",
"properties": {
"department": {
"type": "string",
"enum": ["engineering", "sales", "hr", "finance"]
},
"doc_type": {
"type": "string",
"enum": ["policy", "guide", "api", "meeting"]
},
"author": { "type": "string" },
"date_range": {
"type": "object",
"properties": {
"from": { "type": "string", "format": "date" },
"to": { "type": "string", "format": "date" }
}
},
"security_level": {
"type": "string",
"enum": ["public", "internal", "confidential"]
}
}
},
"limit": {
"type": "integer",
"minimum": 1,
"maximum": 50,
"default": 10
},
"page": {
"type": "integer",
"minimum": 1,
"default": 1
},
"sort_by": {
"type": "string",
"enum": ["relevance", "date_desc", "date_asc", "views"],
"default": "relevance"
}
},
"required": ["query"]
}
},
{
"name": "knowledge_upload",
"description": "上传文档到知识库",
"inputSchema": {
"type": "object",
"properties": {
"title": { "type": "string" },
"content": { "type": "string" },
"file_content": {
"type": "string",
"description": "Base64编码的文件内容"
},
"file_type": {
"type": "string",
"enum": ["text", "markdown", "pdf", "docx", "html"]
},
"metadata": {
"type": "object",
"properties": {
"department": { "type": "string" },
"tags": { "type": "array", "items": { "type": "string" } },
"security_level": { "type": "string" },
"expires_at": { "type": "string", "format": "date" }
}
}
},
"required": ["title"]
}
},
{
"name": "get_document",
"description": "获取特定文档内容",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"include_metadata": { "type": "boolean", "default": true },
"include_embeddings": { "type": "boolean", "default": false }
},
"required": ["doc_id"]
}
},
{
"name": "update_document",
"description": "更新文档",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"content": { "type": "string" },
"metadata": { "type": "object" },
"update_reason": { "type": "string" }
},
"required": ["doc_id"]
}
},
{
"name": "delete_document",
"description": "删除文档(需要确认)",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"reason": { "type": "string" }
},
"required": ["doc_id", "reason"]
}
}
]
}
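Because every tool above declares a JSON Schema, the server can validate arguments before dispatching them. A minimal sketch using the third-party `jsonschema` package (an assumption; neither the tool definitions nor the MCP protocol mandates a specific validator), shown against a trimmed copy of the `knowledge_search` schema:

```python
from jsonschema import Draft7Validator  # pip install jsonschema

# Trimmed subset of the knowledge_search inputSchema defined above
KNOWLEDGE_SEARCH_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer", "minimum": 1, "maximum": 50, "default": 10},
        "sort_by": {"type": "string",
                    "enum": ["relevance", "date_desc", "date_asc", "views"]},
    },
    "required": ["query"],
}

def validate_arguments(arguments: dict) -> list:
    """Return a list of human-readable validation errors (empty if the arguments are valid)."""
    validator = Draft7Validator(KNOWLEDGE_SEARCH_SCHEMA)
    return [f"{'/'.join(map(str, e.path))}: {e.message}"
            for e in validator.iter_errors(arguments)]

# Example: a request missing the required "query" field
print(validate_arguments({"limit": 5}))  # -> [": 'query' is a required property"]
```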
2. Intelligent Processing Tools
{
"tools": [
{
"name": "summarize_document",
"description": "生成文档摘要",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"length": {
"type": "string",
"enum": ["short", "medium", "detailed"],
"default": "medium"
},
"language": { "type": "string", "default": "zh" }
},
"required": ["doc_id"]
}
},
{
"name": "translate_document",
"description": "翻译文档",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"target_language": { "type": "string" },
"include_original": { "type": "boolean", "default": false }
},
"required": ["doc_id", "target_language"]
}
},
{
"name": "find_similar",
"description": "查找相似文档",
"inputSchema": {
"type": "object",
"properties": {
"doc_id": { "type": "string" },
"query": { "type": "string" },
"top_k": { "type": "integer", "default": 5 },
"similarity_threshold": {
"type": "number",
"minimum": 0,
"maximum": 1,
"default": 0.7
}
}
}
},
{
"name": "ask_question",
"description": "基于文档内容问答",
"inputSchema": {
"type": "object",
"properties": {
"question": { "type": "string" },
"doc_ids": {
"type": "array",
"items": { "type": "string" },
"description": "指定在哪些文档中搜索"
},
"scope": {
"type": "string",
"enum": ["all", "department", "personal"],
"default": "all"
},
"include_sources": { "type": "boolean", "default": true }
},
"required": ["question"]
}
}
]
}
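The `find_similar` tool maps naturally onto a nearest-neighbour query in the vector database. A minimal sketch against Qdrant, assuming a `doc_embeddings` collection already populated with document vectors (the collection name matches the configuration later in this document; the payload fields are assumptions):

```python
from qdrant_client import QdrantClient

def find_similar_docs(query_vector: list, top_k: int = 5,
                      similarity_threshold: float = 0.7) -> list:
    """Query Qdrant for the nearest document vectors and apply the score threshold."""
    client = QdrantClient(host="localhost", port=6333)
    hits = client.search(
        collection_name="doc_embeddings",
        query_vector=query_vector,
        limit=top_k,
        score_threshold=similarity_threshold,  # drop low-similarity hits server-side
    )
    return [{"doc_id": h.id, "score": h.score, "payload": h.payload} for h in hits]
```

The `score_threshold` parameter mirrors the tool's `similarity_threshold` argument, so filtering happens in the vector store rather than in the server process.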
3. Management and Integration Tools
{
"tools": [
{
"name": "sync_external",
"description": "同步外部系统文档",
"inputSchema": {
"type": "object",
"properties": {
"source": {
"type": "string",
"enum": ["confluence", "github", "sharepoint", "notion"]
},
"config": { "type": "object" },
"full_sync": { "type": "boolean", "default": false }
},
"required": ["source"]
}
},
{
"name": "generate_report",
"description": "生成知识库使用报告",
"inputSchema": {
"type": "object",
"properties": {
"report_type": {
"type": "string",
"enum": ["usage", "coverage", "freshness", "popularity"]
},
"time_range": {
"type": "object",
"properties": {
"start": { "type": "string", "format": "date" },
"end": { "type": "string", "format": "date" }
}
},
"format": {
"type": "string",
"enum": ["markdown", "json", "html"],
"default": "markdown"
}
},
"required": ["report_type"]
}
},
{
"name": "manage_permissions",
"description": "管理文档权限",
"inputSchema": {
"type": "object",
"properties": {
"action": {
"type": "string",
"enum": ["grant", "revoke", "list"]
},
"doc_id": { "type": "string" },
"user_or_group": { "type": "string" },
"permission": {
"type": "string",
"enum": ["read", "write", "admin"]
}
},
"required": ["action"]
}
}
]
}
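As an illustration of what `sync_external` might do for `source: confluence`, the sketch below pages through the Confluence Cloud REST API and collects page bodies for re-indexing. The endpoint, query parameters, and auth scheme are assumptions about a typical Confluence Cloud setup and should be verified against your instance; error handling is omitted.

```python
import asyncio
import aiohttp

async def fetch_confluence_pages(base_url: str, email: str, api_token: str,
                                 space_key: str, limit: int = 50) -> list:
    """Pull pages (title + storage-format body) from one Confluence space."""
    pages, start = [], 0
    auth = aiohttp.BasicAuth(email, api_token)
    async with aiohttp.ClientSession(auth=auth) as session:
        while True:
            params = {"spaceKey": space_key, "type": "page", "start": str(start),
                      "limit": str(limit), "expand": "body.storage,version"}
            async with session.get(f"{base_url}/rest/api/content", params=params) as resp:
                resp.raise_for_status()
                data = await resp.json()
            for page in data.get("results", []):
                pages.append({
                    "external_id": page["id"],
                    "title": page["title"],
                    "content": page["body"]["storage"]["value"],  # storage-format HTML
                    "version": page["version"]["number"],
                })
            if len(data.get("results", [])) < limit:
                break
            start += limit
    return pages

# asyncio.run(fetch_confluence_pages("https://your-company.atlassian.net/wiki",
#                                    "bot@example.com", "API_TOKEN", "ENG"))
```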
III. Resources Design
{
"resources": [
{
"uri": "knowledge://recent/{limit?}",
"name": "最近更新文档",
"description": "最近更新的知识库文档",
"mimeType": "application/json"
},
{
"uri": "knowledge://popular/{limit?}",
"name": "热门文档",
"description": "查看量最高的文档",
"mimeType": "application/json"
},
{
"uri": "knowledge://stats/overview",
"name": "知识库统计概览",
"description": "知识库使用统计信息",
"mimeType": "application/json"
},
{
"uri": "knowledge://category/{category}",
"name": "分类文档",
"description": "按分类浏览文档",
"mimeType": "application/json"
},
{
"uri": "knowledge://user/{user_id}/recent",
"name": "用户最近访问",
"description": "用户最近访问的文档",
"mimeType": "application/json"
}
]
}
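The `knowledge://` URIs above can be routed with a small pattern matcher before they reach the resource handlers in the implementation below. A minimal sketch (the regex patterns mirror the URI templates above; the handler names are placeholders):

```python
import re
from typing import Optional, Tuple

# Map each knowledge:// URI template to a handler name; {limit?} becomes an optional segment.
ROUTES = [
    (re.compile(r"^knowledge://recent(?:/(?P<limit>\d+))?$"), "handle_recent"),
    (re.compile(r"^knowledge://popular(?:/(?P<limit>\d+))?$"), "handle_popular"),
    (re.compile(r"^knowledge://stats/overview$"), "handle_stats_overview"),
    (re.compile(r"^knowledge://category/(?P<category>[^/]+)$"), "handle_category"),
    (re.compile(r"^knowledge://user/(?P<user_id>[^/]+)/recent$"), "handle_user_recent"),
]

def resolve_resource(uri: str) -> Optional[Tuple[str, dict]]:
    """Return (handler_name, extracted_parameters) for a resource URI, or None if unknown."""
    for pattern, handler in ROUTES:
        match = pattern.match(uri)
        if match:
            return handler, {k: v for k, v in match.groupdict().items() if v is not None}
    return None

print(resolve_resource("knowledge://recent/20"))        # ('handle_recent', {'limit': '20'})
print(resolve_resource("knowledge://category/policy"))  # ('handle_category', {'category': 'policy'})
```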
IV. Complete Implementation Example (Python)
import asyncio
import json
import base64
from typing import Dict, Any, List, Optional
from datetime import datetime
from enum import Enum
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from pydantic import BaseModel, Field
import aiohttp
from elasticsearch import AsyncElasticsearch
from qdrant_client import QdrantClient
import hashlib
import uuid
# ========== 数据模型 ==========
class DocumentMetadata(BaseModel):
department: str
doc_type: str = Field(default="document")
security_level: str = Field(default="internal")
tags: List[str] = Field(default_factory=list)
author: str
created_at: datetime = Field(default_factory=datetime.now)
updated_at: datetime = Field(default_factory=datetime.now)
expires_at: Optional[datetime] = None
views: int = 0
permissions: Dict[str, List[str]] = Field(default_factory=dict)
class Document(BaseModel):
id: str = Field(default_factory=lambda: str(uuid.uuid4()))
title: str
content: str
summary: Optional[str] = None
metadata: DocumentMetadata
embeddings: Optional[List[float]] = None
file_hash: str # 用于去重
class SearchFilters(BaseModel):
department: Optional[str] = None
doc_type: Optional[str] = None
security_level: Optional[str] = None
author: Optional[str] = None
tags: Optional[List[str]] = None
date_range: Optional[Dict[str, datetime]] = None
# ========== 权限控制 ==========
class PermissionManager:
ROLES = {
"admin": ["read", "write", "delete", "manage"],
"editor": ["read", "write"],
"viewer": ["read"],
"guest": ["read_public"]
}
def __init__(self):
self.user_roles: Dict[str, str] = {}
    def check_permission(self, user: str, doc: Optional[Document], action: str) -> bool:
        """Check a user's permission to perform an action on a document."""
        role = self.user_roles.get(user, "guest")
        # Role-level check
        if action not in self.ROLES.get(role, []):
            return False
        # No document yet (e.g. uploading a new one): the role check is sufficient
        if doc is None:
            return True
        # Document-level check
        if doc.metadata.security_level == "confidential":
            return role in ["admin", "editor"]
        # Explicit per-document grants
        if user in doc.metadata.permissions.get(action, []):
            return True
        return role in ["admin", "editor"]
# ========== 知识库 MCP Server ==========
class KnowledgeBaseMCPServer:
def __init__(self):
# 存储
self.documents: Dict[str, Document] = {}
self.permission_mgr = PermissionManager()
# 搜索和向量数据库
self.es = AsyncElasticsearch(["localhost:9200"])
self.qdrant = QdrantClient("localhost", port=6333)
# 缓存
self.cache = {}
# 审计日志
self.audit_log = []
# ========== 工具实现 ==========
async def handle_knowledge_search(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""搜索文档实现"""
try:
query = params.get("query", "")
filters = params.get("filters", {})
limit = params.get("limit", 10)
page = params.get("page", 1)
# 构建 ES 查询
es_query = {
"query": {
"bool": {
"must": [{
"multi_match": {
"query": query,
"fields": ["title^3", "content", "summary^2"],
"type": "best_fields"
}
}],
"filter": []
}
},
"from": (page - 1) * limit,
"size": limit
}
# 添加过滤器
if filters:
filter_clauses = []
if department := filters.get("department"):
filter_clauses.append({"term": {"department": department}})
if doc_type := filters.get("doc_type"):
filter_clauses.append({"term": {"doc_type": doc_type}})
if security_level := filters.get("security_level"):
filter_clauses.append({"term": {"security_level": security_level}})
if date_range := filters.get("date_range"):
filter_clauses.append({
"range": {
"created_at": {
"gte": date_range.get("from"),
"lte": date_range.get("to")
}
}
})
if filter_clauses:
es_query["query"]["bool"]["filter"] = filter_clauses
# 执行搜索
response = await self.es.search(
index="knowledge_docs",
body=es_query
)
# 处理结果
results = []
for hit in response["hits"]["hits"]:
doc = hit["_source"]
results.append({
"id": doc["id"],
"title": doc["title"],
"summary": doc.get("summary", ""),
"score": hit["_score"],
"metadata": {
"department": doc["department"],
"doc_type": doc["doc_type"],
"created_at": doc["created_at"],
"author": doc["author"]
}
})
return {
"content": [{
"type": "text",
"text": f"找到 {response['hits']['total']['value']} 个结果:"
}, {
"type": "text",
"text": self._format_search_results(results)
}],
"metadata": {
"total": response['hits']['total']['value'],
"page": page,
"page_size": limit
}
}
except Exception as e:
return {
"content": [{
"type": "text",
"text": f"搜索失败: {str(e)}"
}],
"isError": True
}
async def handle_knowledge_upload(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""上传文档实现"""
try:
title = params["title"]
content = params.get("content", "")
metadata = params.get("metadata", {})
# 检查权限
user = metadata.get("author", "unknown")
if not self.permission_mgr.check_permission(user, None, "write"):
return self._error_response("权限不足")
# 创建文档
doc = Document(
title=title,
content=content,
metadata=DocumentMetadata(
department=metadata.get("department", "general"),
security_level=metadata.get("security_level", "internal"),
tags=metadata.get("tags", []),
author=user
),
file_hash=self._calculate_hash(content)
)
# 生成摘要
doc.summary = await self._generate_summary(content)
# 生成嵌入向量
doc.embeddings = await self._generate_embeddings(content)
# 存储文档
self.documents[doc.id] = doc
# 索引到搜索引擎
await self._index_document(doc)
# 存储到向量数据库
await self._store_embeddings(doc)
# 审计日志
self._log_audit("upload", user, doc.id)
return {
"content": [{
"type": "text",
"text": f"文档上传成功!\nID: {doc.id}\n标题: {title}\n安全等级: {doc.metadata.security_level}"
}],
"suggestedToolCalls": [{
"toolName": "summarize_document",
"arguments": {"doc_id": doc.id}
}]
}
except Exception as e:
return self._error_response(f"上传失败: {str(e)}")
async def handle_ask_question(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""文档问答实现"""
try:
question = params["question"]
scope = params.get("scope", "all")
include_sources = params.get("include_sources", True)
# 向量搜索相关文档
query_embedding = await self._generate_embeddings(question)
similar_docs = await self._vector_search(query_embedding, top_k=5)
if not similar_docs:
return {
"content": [{
"type": "text",
"text": "没有找到相关文档来回答这个问题。"
}]
}
            # Build the context from retrieved documents; _vector_search returns (doc, score) tuples
            context = "\n\n".join([
                f"文档: {doc.title}\n内容: {doc.content[:1000]}"
                for doc, _score in similar_docs
            ])
# 调用 LLM 生成答案
answer = await self._call_llm_for_qa(question, context)
# 构建响应
response_content = [{
"type": "text",
"text": f"**问题**: {question}\n\n**答案**: {answer}"
}]
if include_sources:
sources_text = "\n".join([
f"- {doc.title} (相关度: {score:.2f})"
for doc, score in similar_docs
])
response_content.append({
"type": "text",
"text": f"\n**参考文档**:\n{sources_text}"
})
return {"content": response_content}
except Exception as e:
return self._error_response(f"问答失败: {str(e)}")
# ========== 资源处理 ==========
async def handle_resource_read(self, uri: str) -> Dict[str, Any]:
"""处理资源读取请求"""
if uri.startswith("knowledge://recent/"):
limit = int(uri.split("/")[-1]) if uri.split("/")[-1].isdigit() else 10
recent_docs = sorted(
self.documents.values(),
key=lambda x: x.metadata.updated_at,
reverse=True
)[:limit]
return {
"contents": [{
"uri": uri,
"mimeType": "application/json",
"text": json.dumps([
{
"id": doc.id,
"title": doc.title,
"updated_at": doc.metadata.updated_at.isoformat(),
"author": doc.metadata.author
}
for doc in recent_docs
], ensure_ascii=False, indent=2)
}]
}
elif uri.startswith("knowledge://stats/overview"):
stats = self._generate_statistics()
return {
"contents": [{
"uri": uri,
"mimeType": "application/json",
"text": json.dumps(stats, ensure_ascii=False, indent=2)
}]
}
return {"error": f"未知资源: {uri}"}
# ========== 辅助方法 ==========
def _format_search_results(self, results: List[Dict]) -> str:
"""格式化搜索结果"""
formatted = []
for i, result in enumerate(results, 1):
formatted.append(
f"{i}. **{result['title']}**\n"
f" 概要: {result['summary'][:100]}...\n"
f" 部门: {result['metadata']['department']} | "
f"类型: {result['metadata']['doc_type']} | "
f"作者: {result['metadata']['author']}\n"
)
return "\n".join(formatted)
def _generate_statistics(self) -> Dict:
"""生成统计信息"""
total_docs = len(self.documents)
departments = {}
doc_types = {}
for doc in self.documents.values():
dept = doc.metadata.department
doc_type = doc.metadata.doc_type
departments[dept] = departments.get(dept, 0) + 1
doc_types[doc_type] = doc_types.get(doc_type, 0) + 1
return {
"total_documents": total_docs,
"by_department": departments,
"by_type": doc_types,
"last_updated": max(
[doc.metadata.updated_at for doc in self.documents.values()],
default=datetime.now()
).isoformat()
}
async def _generate_summary(self, content: str, length: str = "medium") -> str:
"""生成文档摘要(简化版)"""
# 实际实现应该调用 LLM API
sentences = content.split('.')
if len(sentences) <= 3:
return content
if length == "short":
return '.'.join(sentences[:2]) + '.'
elif length == "detailed":
return '.'.join(sentences[:10]) + '.'
else: # medium
return '.'.join(sentences[:5]) + '.'
async def _generate_embeddings(self, text: str) -> List[float]:
"""生成文本向量(简化版)"""
# 实际实现应该调用嵌入模型 API
import numpy as np
# 使用简单的哈希作为模拟嵌入
hash_val = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
np.random.seed(hash_val)
return np.random.randn(384).tolist() # 384维向量
async def _vector_search(self, query_embedding: List[float], top_k: int = 5):
"""向量搜索"""
# 简化实现
results = []
for doc in self.documents.values():
if doc.embeddings:
# 计算余弦相似度
similarity = self._cosine_similarity(query_embedding, doc.embeddings)
if similarity > 0.7: # 阈值
results.append((doc, similarity))
# 按相似度排序
results.sort(key=lambda x: x[1], reverse=True)
return results[:top_k]
def _cosine_similarity(self, a: List[float], b: List[float]) -> float:
"""计算余弦相似度"""
import numpy as np
a_np = np.array(a)
b_np = np.array(b)
return np.dot(a_np, b_np) / (np.linalg.norm(a_np) * np.linalg.norm(b_np))
def _calculate_hash(self, content: str) -> str:
"""计算内容哈希值"""
return hashlib.md5(content.encode()).hexdigest()
def _log_audit(self, action: str, user: str, doc_id: str):
"""记录审计日志"""
self.audit_log.append({
"timestamp": datetime.now().isoformat(),
"action": action,
"user": user,
"doc_id": doc_id,
"ip": "127.0.0.1" # 实际应该从请求获取
})
def _error_response(self, message: str) -> Dict[str, Any]:
"""错误响应"""
return {
"content": [{"type": "text", "text": f"错误: {message}"}],
"isError": True
}
async def _call_llm_for_qa(self, question: str, context: str) -> str:
"""调用 LLM 进行问答(简化版)"""
# 实际实现应该调用 LLM API
prompt = f"""
基于以下文档内容回答问题:
文档内容:
{context}
问题:{question}
答案:
"""
# 这里应该调用实际的 LLM API
return "这是基于文档内容生成的答案。"
# ========== MCP Server 主程序 ==========
async def main():
server = KnowledgeBaseMCPServer()
# 这里应该实现 MCP 协议的具体通信
# 包括 initialize, tools/list, tools/call, resources/list, resources/read 等
print("知识库 MCP Server 已启动")
print("可用工具:")
print("- knowledge_search: 搜索文档")
print("- knowledge_upload: 上传文档")
print("- get_document: 获取文档")
print("- ask_question: 文档问答")
print("- summarize_document: 文档摘要")
print("- find_similar: 查找相似文档")
if __name__ == "__main__":
asyncio.run(main())
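`handle_knowledge_upload` above calls `self._index_document` and `self._store_embeddings`, which are referenced but not defined in the listing. One possible sketch of the two helpers, written as a mixin intended to be added to `KnowledgeBaseMCPServer`, assuming the `knowledge_docs` index and `doc_embeddings` collection named in the configuration below (the field mapping is an assumption consistent with the search code above):

```python
from qdrant_client.models import PointStruct

class DocumentIndexingMixin:
    """Sketch only: these methods are meant to be mixed into KnowledgeBaseMCPServer."""

    async def _index_document(self, doc: "Document") -> None:
        """Index the document into Elasticsearch so knowledge_search can find it."""
        await self.es.index(
            index="knowledge_docs",
            id=doc.id,
            document={
                "id": doc.id,
                "title": doc.title,
                "content": doc.content,
                "summary": doc.summary or "",
                "department": doc.metadata.department,
                "doc_type": doc.metadata.doc_type,
                "security_level": doc.metadata.security_level,
                "author": doc.metadata.author,
                "created_at": doc.metadata.created_at.isoformat(),
            },
        )

    async def _store_embeddings(self, doc: "Document") -> None:
        """Upsert the document vector into Qdrant for find_similar / ask_question."""
        if not doc.embeddings:
            return
        self.qdrant.upsert(
            collection_name="doc_embeddings",
            points=[PointStruct(id=doc.id, vector=doc.embeddings,
                                payload={"doc_id": doc.id, "title": doc.title})],
        )
```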
V. Configuration File Example
# config.yaml
server:
name: "enterprise-knowledge-base"
version: "1.0.0"
host: "0.0.0.0"
port: 8000
auth_required: true
database:
elasticsearch:
hosts: ["localhost:9200"]
index: "knowledge_docs"
qdrant:
host: "localhost"
port: 6333
collection: "doc_embeddings"
redis:
host: "localhost"
port: 6379
db: 0
embeddings:
model: "text-embedding-ada-002"
api_key: "${OPENAI_API_KEY}"
dimensions: 1536
llm:
model: "gpt-4-turbo"
api_key: "${OPENAI_API_KEY}"
temperature: 0.1
security:
jwt_secret: "${JWT_SECRET}"
token_expiry: 86400
rate_limit:
requests_per_minute: 60
allowed_origins:
- "https://claude.ai"
- "http://localhost:*"
storage:
document_store: "s3"
s3:
endpoint: "s3.amazonaws.com"
bucket: "knowledge-docs"
local_backup: "/var/backups/knowledge"
integrations:
confluence:
enabled: true
base_url: "https://your-company.atlassian.net/wiki"
github:
enabled: false
sharepoint:
enabled: false
logging:
level: "INFO"
file: "/var/log/knowledge-mcp.log"
audit_log: "/var/log/knowledge-audit.log"
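The `${OPENAI_API_KEY}`-style placeholders in config.yaml are not expanded by PyYAML itself, so the server has to substitute environment variables when loading the file. A minimal sketch (the `load_config` helper is an assumption, not part of the listing above):

```python
import os
import re
import yaml  # pip install pyyaml

_ENV_PATTERN = re.compile(r"\$\{([A-Z0-9_]+)\}")

def _expand_env(value):
    """Recursively replace ${VAR} placeholders with environment variables."""
    if isinstance(value, str):
        return _ENV_PATTERN.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: _expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_expand_env(v) for v in value]
    return value

def load_config(path: str = "config.yaml") -> dict:
    with open(path, encoding="utf-8") as fh:
        return _expand_env(yaml.safe_load(fh))

# config = load_config()
# api_key = config["embeddings"]["api_key"]   # resolved from the OPENAI_API_KEY env var
```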
VI. Deployment and Usage
1. Starting the Server
# Install dependencies (numpy is required by the embedding/similarity helpers above)
pip install mcp elasticsearch qdrant-client aiohttp pydantic numpy
# Start the server
python knowledge_mcp_server.py
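The `main()` in the listing above only prints a banner. To actually serve the MCP protocol over stdio, the `KnowledgeBaseMCPServer` class can be wrapped with the low-level `Server` API from the official `mcp` Python SDK. This is a minimal sketch assuming mcp >= 1.0; only two tools are wired here for brevity, and the decorator signatures should be checked against the SDK version you install.

```python
import asyncio
import mcp.types as types
from mcp.server import Server
from mcp.server.stdio import stdio_server
from knowledge_mcp_server import KnowledgeBaseMCPServer  # the class defined above

kb = KnowledgeBaseMCPServer()
app = Server("enterprise-knowledge-base")

@app.list_tools()
async def list_tools() -> list:
    return [
        types.Tool(name="knowledge_search",
                   description="Search the enterprise knowledge base",
                   inputSchema={"type": "object",
                                "properties": {"query": {"type": "string"}},
                                "required": ["query"]}),
        types.Tool(name="ask_question",
                   description="Answer a question from indexed documents",
                   inputSchema={"type": "object",
                                "properties": {"question": {"type": "string"}},
                                "required": ["question"]}),
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list:
    handlers = {
        "knowledge_search": kb.handle_knowledge_search,
        "ask_question": kb.handle_ask_question,
    }
    result = await handlers[name](arguments)
    # The handlers return {"content": [...]} dicts; convert them to MCP content objects.
    return [types.TextContent(type="text", text=block["text"]) for block in result["content"]]

async def run():
    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(run())
```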
2. Claude Desktop Configuration
// ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"knowledge-base": {
"command": "python",
"args": [
"/path/to/knowledge_mcp_server.py"
],
"env": {
"OPENAI_API_KEY": "sk-...",
"ELASTICSEARCH_HOSTS": "localhost:9200"
}
}
}
}
3. Usage Example
# Example client invocation
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def demo():
    server_params = StdioServerParameters(
        command="python",
        args=["knowledge_mcp_server.py"]
    )
    # stdio_client yields a (read, write) stream pair that ClientSession wraps
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the session
            await session.initialize()
            # List available tools
            tools = await session.list_tools()
            print("Available tools:", [t.name for t in tools.tools])
            # Search documents
            response = await session.call_tool(
                "knowledge_search",
                arguments={
                    "query": "Q4 销售报告",
                    "filters": {
                        "department": "sales",
                        "security_level": "internal"
                    }
                }
            )
            # Ask a question against the knowledge base
            response = await session.call_tool(
                "ask_question",
                arguments={
                    "question": "公司今年的销售目标是多少?",
                    "scope": "department"
                }
            )

asyncio.run(demo())
VII. Extended Feature Design
1. Real-Time Collaboration
- Collaborative document editing
- Comments and annotations
- Change-history tracking
2. Advanced Search
- Enhanced semantic search
- Hybrid search (keyword + vector), sketched in the example after this list
- Natural-language query translation
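As a sketch of the hybrid-search idea above: combine the normalized BM25 score from Elasticsearch with the cosine similarity from the vector store into one ranking. The weights and normalization used here are illustrative assumptions, not tuned values.

```python
def hybrid_rank(keyword_hits: list, vector_hits: list,
                keyword_weight: float = 0.4, vector_weight: float = 0.6) -> list:
    """Merge keyword (BM25) and vector results into a single ranked list.

    keyword_hits: [{"doc_id": ..., "score": <raw BM25>}]
    vector_hits:  [{"doc_id": ..., "score": <cosine similarity in 0..1>}]
    """
    max_bm25 = max((h["score"] for h in keyword_hits), default=1.0) or 1.0
    combined = {}
    for h in keyword_hits:
        combined[h["doc_id"]] = keyword_weight * (h["score"] / max_bm25)
    for h in vector_hits:
        combined[h["doc_id"]] = combined.get(h["doc_id"], 0.0) + vector_weight * h["score"]
    return sorted(({"doc_id": d, "score": s} for d, s in combined.items()),
                  key=lambda x: x["score"], reverse=True)

# Example
print(hybrid_rank([{"doc_id": "a", "score": 12.0}, {"doc_id": "b", "score": 6.0}],
                  [{"doc_id": "b", "score": 0.91}, {"doc_id": "c", "score": 0.72}]))
```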
3. Workflow Integration
- Document approval workflows
- Automated document classification
- Expired-document cleanup (see the sketch after this list)
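For the expired-document cleanup item above, a sketch of a periodic background job built on the `expires_at` field already defined in `DocumentMetadata`. Deleting in place is a placeholder assumption; a real deployment would more likely archive the document.

```python
import asyncio
from datetime import datetime

async def cleanup_expired_documents(server: "KnowledgeBaseMCPServer",
                                    interval_seconds: int = 3600) -> None:
    """Periodically remove documents whose expires_at date has passed."""
    while True:
        now = datetime.now()
        expired = [doc for doc in server.documents.values()
                   if doc.metadata.expires_at and doc.metadata.expires_at < now]
        for doc in expired:
            server._log_audit("expire", "system", doc.id)  # reuse the existing audit helper
            del server.documents[doc.id]                   # placeholder: archive instead of delete
        await asyncio.sleep(interval_seconds)

# Could be scheduled at startup, e.g. asyncio.create_task(cleanup_expired_documents(server))
```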
4. Analytics and Insights
- Knowledge-graph construction
- Document relationship analysis
- Knowledge-gap identification
5. Mobile Support
- Responsive design
- Offline access
- Mobile-optimized interface
VIII. Security and Compliance
- Data encryption
  - In transit: TLS 1.3
  - At rest: AES-256
  - Key management: HSM/KMS
- Access control
  - Role-based access control (RBAC)
  - Attribute-based access control (ABAC)
  - Multi-factor authentication
- Compliance
  - GDPR data-subject rights
  - SOX document retention policies
  - HIPAA protection for medical documents
  - Audit logs retained for 7 years
- Monitoring and alerting
  - Anomalous access detection
  - Data loss prevention
  - Real-time alerting
This design provides a complete, enterprise-grade knowledge base MCP Server solution that can be adjusted and extended to fit actual requirements.