Install LangChain and related dependencies
Make sure you have a Python environment (3.8+ recommended), then install the LangChain core library and commonly used extras:

```bash
pip install langchain openai faiss-cpu tiktoken
```
To use a specific model provider (such as OpenAI), configure the API key:

```python
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"
```
Build the document loading and processing pipeline
Use LangChain's document loaders to read files in a variety of formats (PDF, HTML, and more):

```python
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("example.pdf")
pages = loader.load_and_split()
```
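Other formats follow the same pattern. As a hedged sketch, an HTML page could be pulled in with WebBaseLoader; the URL below is a placeholder, and this loader assumes the beautifulsoup4 package is installed:

```python
from langchain.document_loaders import WebBaseLoader

# Placeholder URL for illustration only.
web_loader = WebBaseLoader("https://example.com/docs")
web_pages = web_loader.load()
```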
Split long documents into manageable chunks with a text splitter:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(pages)
```
Create the vector store and retrieval system
Convert the document chunks into embedding vectors and store them:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(docs, embeddings)
```
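Optionally, the index can be persisted so embeddings are not recomputed on every run. A minimal sketch, using an arbitrary directory name; note that depending on your LangChain version, `load_local` may additionally require `allow_dangerous_deserialization=True`:

```python
# "faiss_index" is an arbitrary directory name chosen for this example.
db.save_local("faiss_index")
db = FAISS.load_local("faiss_index", embeddings)
```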
Expose similarity search through a retriever:

```python
retriever = db.as_retriever(search_kwargs={"k": 3})
```
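As a quick sanity check, the retriever can be queried directly; the sample question below is purely illustrative:

```python
# Fetch the three chunks most similar to a sample question.
sample_docs = retriever.get_relevant_documents("What is this document about?")
for doc in sample_docs:
    print(doc.page_content[:100])
```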
Integrate the QA chain with the language model
Build a retrieval-based question-answering pipeline:

```python
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)
```
Implement an interactive question-answering interface
A simple loop provides command-line interaction:

```python
while True:
    query = input("Ask a question (type 'exit' to quit): ")
    if query.lower() == "exit":
        break
    result = qa_chain({"query": query})
    print(f"Answer: {result['result']}\nSources: {result['source_documents']}")
```
Extensions and optimization suggestions
Add conversation history management to improve continuity across turns:

```python
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
```
Combine it with conversational retrieval-augmented generation:

```python
from langchain.chains import ConversationalRetrievalChain

qa = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0.7),
    retriever,
    memory=memory,
)
```
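With memory attached, only the new question needs to be passed in; the sample question below is illustrative:

```python
# Chat history is tracked internally by the memory object.
response = qa({"question": "Can you summarize the key points again?"})
print(response["answer"])
```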
Directions for performance tuning:
- Adjust chunk_size and chunk_overlap to balance accuracy against speed
- Try different embedding models (e.g. HuggingFaceEmbeddings; see the sketch after this list)
- Add a caching layer to reduce API calls (see the sketch after this list)
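A minimal sketch of the last two points: the model name shown is just a common default, HuggingFaceEmbeddings assumes the sentence-transformers package is installed, and switching embedding models requires rebuilding the FAISS index.

```python
import langchain
from langchain.cache import InMemoryCache
from langchain.embeddings import HuggingFaceEmbeddings

# Cache identical LLM calls in memory to cut down on repeated API requests.
langchain.llm_cache = InMemoryCache()

# Alternative local embedding model; rebuild the vector store after switching.
hf_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
```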