[First Taste of AI] Llama 2 and Gradio UI

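We first need to install the required packages:

sentence-transformers: text embeddings
chromadb: the vector database
langchain: the RAG tooling used in this demo
replicate: the client for calling models hosted on Replicate
pypdf: PDF parsing used by PyPDFLoader
gradio: the chat UI at the end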

We will use Replicate to run these examples. First sign in to Replicate with a GitHub account and create an API token. The token comes with free usage for a limited period; after the free trial ends, usage is billed.
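LangChain's Replicate wrapper reads the token from the REPLICATE_API_TOKEN environment variable, so it has to be set before the model is called. As a minimal sketch, the hypothetical helper below (require_replicate_token is not part of any library) checks for that variable up front and fails with a readable message instead of an authentication error from a later request.

```python
import os


def require_replicate_token() -> str:
    """Return the Replicate API token from the environment (hypothetical helper).

    Raises RuntimeError if REPLICATE_API_TOKEN is missing or blank, so the
    problem surfaces before any request is sent to Replicate.
    """
    token = os.environ.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; create an API token on Replicate "
            "and export it before running the examples"
        )
    return token
```

Export the token in your shell (export REPLICATE_API_TOKEN=...) or set os.environ["REPLICATE_API_TOKEN"] as the Gradio section does, then call require_replicate_token() once at the top of the notebook.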

Install dependencies

```shell
pip install langchain replicate sentence-transformers chromadb pypdf gradio
```

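Next, we call the Llama 2 model on Replicate. In this example we use the Llama 2 13B chat model. You can find more Llama 2 models by searching Replicate's model explore page, and plug them in here using the format owner/model-name:version.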
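To make the identifier format concrete, here is a small hypothetical parser (parse_model_ref and ModelRef are illustrative names, not part of Replicate's or LangChain's API) that splits a reference such as the one used below into owner, model name, and version hash. You do not need it to call the model; it only shows which part is which.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]  # None when no version hash is pinned


def parse_model_ref(ref: str) -> ModelRef:
    """Split 'owner/name[:version]' into its parts (hypothetical helper)."""
    path, sep, version = ref.partition(":")
    owner, slash, name = path.partition("/")
    if not slash or not owner or not name:
        raise ValueError(f"expected 'owner/name[:version]', got {ref!r}")
    return ModelRef(owner, name, version if sep else None)


ref = parse_model_ref(
    "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"
)
print(ref.owner, ref.name, ref.version[:8])  # meta llama-2-13b-chat f4e2de70
```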

```python
from langchain.llms import Replicate

# requires the REPLICATE_API_TOKEN environment variable to be set
llama2_13b = "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"
llm = Replicate(
    model=llama2_13b,
    model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens": 500},
)
```

Once the model is set up, you can start asking questions. Here is a simple example of asking the model a question:

```python
question = "Who wrote the book Innovator's dilemma?"
answer = llm(question)
print(answer)
```

Change the question as you like, call llm(question) to get the answer, and print it with print(answer). This is the simplest way to query the model; adapt and extend it to suit your needs.

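Next, we ask a follow-up question to get more information about the book.

Because no chat history is passed to Llama, it has no context and does not know the question is about the book, so it treats it as a brand-new query.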

```python
# chat history is not passed, so Llama has no context and doesn't know this is about the book
followup = "tell me more"
followup_answer = llm(followup)
print(followup_answer)
```

To fix this, we need to give the model the chat history.

We can use ConversationBufferMemory to pass the chat history to the model so that it can handle follow-up questions.
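To see what "passing chat history" means in practice, here is a simplified stand-in for ConversationBufferMemory. It is an illustrative sketch, not LangChain's implementation: it stores each (human, AI) turn and replays the full transcript ahead of the new input, using the Human/AI prefixes that ConversationBufferMemory uses by default. SimpleBufferMemory and its methods are hypothetical names.

```python
from typing import List, Tuple


class SimpleBufferMemory:
    """Simplified stand-in for ConversationBufferMemory (illustrative, not LangChain's code)."""

    def __init__(self, human_prefix: str = "Human", ai_prefix: str = "AI") -> None:
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.turns: List[Tuple[str, str]] = []  # every (human, ai) exchange so far

    def save_context(self, user_input: str, output: str) -> None:
        # Record one completed exchange.
        self.turns.append((user_input, output))

    def buffer(self) -> str:
        # Render the stored turns as "Human: ..." / "AI: ..." lines.
        lines = []
        for human, ai in self.turns:
            lines.append(f"{self.human_prefix}: {human}")
            lines.append(f"{self.ai_prefix}: {ai}")
        return "\n".join(lines)

    def build_prompt(self, new_input: str) -> str:
        # Earlier turns first, then the new input, then an open slot for the reply.
        return (
            f"Current conversation:\n{self.buffer()}\n"
            f"{self.human_prefix}: {new_input}\n"
            f"{self.ai_prefix}:"
        )


toy_memory = SimpleBufferMemory()
toy_memory.save_context("Who wrote the book Innovator's dilemma?", "Clayton Christensen.")
print(toy_memory.build_prompt("tell me more"))
```

Because every saved turn is replayed, the prompt grows with the conversation; for long chats a windowed or summarizing memory is usually a better fit.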

```python
# using ConversationBufferMemory to pass memory (chat history) for follow-up questions
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False,
)
```

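Once that is set up, let's repeat the earlier step and ask the model the same simple question.

Then we ask the follow-up. Since the memory now holds the previous question and answer, they are sent to the model as context for the follow-up.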

```python
# restart from the original question
answer = conversation.predict(input=question)
print(answer)
```

```python
# ConversationChain.predict already stored the previous question and answer in memory,
# so the follow-up "tell me more" is sent to Llama together with that context.
# (Calling memory.save_context(...) here as well would record the same exchange twice.)
followup_answer = conversation.predict(input=followup)
print(followup_answer)
```

We will use PyPDFLoader to load a PDF document: the Llama 2 paper on arXiv.

```python
from langchain.document_loaders import PyPDFLoader

# PyPDFLoader downloads the URL and returns one Document per page
loader = PyPDFLoader("https://arxiv.org/pdf/2307.09288.pdf")
docs = loader.load()
```

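Next we need to store the documents. LangChain supports more than 30 vector stores (DBs). For this example we use Chroma, a lightweight in-memory store that is easy to get started with. For other vector stores, especially when you need to store large amounts of data, see python.langchain.com/docs/integr...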

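We also import HuggingFaceEmbeddings and RecursiveCharacterTextSplitter to help store the documents.

To store the documents, we use RecursiveCharacterTextSplitter to split them into chunks and HuggingFaceEmbeddings to create vector representations of those chunks, and then store the vectors in our vector database.

In general, use larger chunk sizes for highly structured text (such as code) and smaller chunk sizes for less structured text. You may need to experiment with different chunk sizes and overlap values to find the best settings.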
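The effect of chunk_size and chunk_overlap can be seen with a simplified splitter. This sketch uses a plainly different algorithm from LangChain's: it cuts fixed-size character windows, whereas RecursiveCharacterTextSplitter first tries to break on separators such as paragraph breaks, newlines, and spaces. split_fixed is a hypothetical name.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Simplified stand-in,
    not RecursiveCharacterTextSplitter's separator-aware algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks


sample = "The quick brown fox jumps over the lazy dog. " * 50
for size, overlap in [(1000, 20), (200, 20)]:
    pieces = split_fixed(sample, size, overlap)
    print(f"chunk_size={size}, chunk_overlap={overlap}: {len(pieces)} chunks")
```

A smaller chunk_size yields more, shorter chunks, so each retrieval hit is more focused but carries less surrounding context; the overlap repeats a few characters at each boundary so text near a cut appears in both neighbouring chunks.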

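```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# split the pages into chunks of up to 1000 characters, overlapping by 20 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(docs)

# create the vector db to store all the split chunks as embeddings
# (HuggingFaceEmbeddings uses a sentence-transformers model by default)
embeddings = HuggingFaceEmbeddings()
vectordb = Chroma.from_documents(
    documents=all_splits,
    embedding=embeddings,
)
```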

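Then we use RetrievalQA to retrieve documents from the vector database, giving Llama more context about Llama 2 and extending its knowledge.

For each question, LangChain runs a semantic similarity search in the vector database and passes the results to Llama as context for answering the question.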
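The retrieve-then-answer flow can be sketched without any external services. In this toy version, word-count vectors stand in for HuggingFaceEmbeddings, sorting by cosine similarity stands in for Chroma's similarity search, and the top chunks are pasted into a single prompt, which is the idea behind RetrievalQA's default "stuff" chain type. All function names are hypothetical and the prompt wording is only loosely modeled on LangChain's.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts (stand-in for HuggingFaceEmbeddings).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    # Rank every chunk by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def stuff_prompt(query: str, context_chunks: List[str]) -> str:
    # Paste the retrieved chunks into one prompt ahead of the question.
    context = "\n\n".join(context_chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\nQuestion: {query}\nHelpful Answer:"
    )


chunks = [
    "Llama 2 is a collection of pretrained and fine-tuned large language models.",
    "Chroma is a lightweight vector store.",
    "Gradio builds web UIs for machine learning demos.",
]
top = retrieve("What is Llama 2?", chunks, k=1)
print(stuff_prompt("What is Llama 2?", top))
```

A real embedding model captures meaning rather than shared words, but the pipeline shape is the same: embed the query, find the nearest chunks, and hand them to the LLM as context.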

```python
# use LangChain's RetrievalQA to associate Llama with the loaded documents stored in the vector db
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

question = "What is llama2?"
result = qa_chain({"query": question})
print(result['result'])
```

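Now let's put everything together and handle follow-up questions.

RetrievalQA only sees the current query. If we asked a follow-up through qa_chain without providing the previous conversation, the model would have no context, and the answer would have nothing to do with our original question. ConversationalRetrievalChain fixes this by accepting the chat history along with each question.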

```python
# use ConversationalRetrievalChain to pass chat history for follow-up questions
from langchain.chains import ConversationalRetrievalChain

chat_chain = ConversationalRetrievalChain.from_llm(
    llm, vectordb.as_retriever(), return_source_documents=True
)
```

```python
# let's ask the original question "What is llama2?" again
result = chat_chain({"question": question, "chat_history": []})
print(result['answer'])
```

```python
# this time we pass the chat history along with the follow-up,
# so the chain can work out what "its" refers to
chat_history = [(question, result["answer"])]
followup = "what are its use cases?"
followup_answer = chat_chain({"question": followup, "chat_history": chat_history})
print(followup_answer['answer'])
```
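Under the hood, ConversationalRetrievalChain does not send the raw follow-up to the vector store. It first asks the LLM to rewrite the follow-up plus the chat history into a standalone question, then retrieves with that. The sketch below only builds such a condense-question prompt; format_chat_history and condense_question_prompt are hypothetical names, and the wording approximates, rather than reproduces, LangChain's template.

```python
from typing import List, Tuple


def format_chat_history(chat_history: List[Tuple[str, str]]) -> str:
    # Render (question, answer) pairs as a plain-text transcript.
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)


def condense_question_prompt(chat_history: List[Tuple[str, str]], followup: str) -> str:
    # Ask the LLM to turn the follow-up into a question that stands on its own.
    return (
        "Given the following conversation and a follow up question, rephrase the "
        "follow up question to be a standalone question.\n\n"
        f"Chat History:\n{format_chat_history(chat_history)}\n"
        f"Follow Up Input: {followup}\n"
        "Standalone question:"
    )


history = [("What is llama2?", "Llama 2 is a family of open large language models from Meta.")]
print(condense_question_prompt(history, "what are its use cases?"))
```

The LLM's reply to this prompt (something like "What are the use cases of Llama 2?") is what the retriever searches with, which is why the pronoun "its" no longer derails retrieval.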

Gradio UI

Since this example uses Replicate, replace the placeholder below with your own API token.

```python
import os

import gradio as gr
from langchain.llms import Replicate

# replace the placeholder with your own Replicate API token
os.environ["REPLICATE_API_TOKEN"] = "<your replicate api token>"

llama2_13b_chat = "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"

llm = Replicate(
    model=llama2_13b_chat,
    model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens": 500},
)


def predict(message, history):
    # Replicate is a text-completion LLM wrapper that takes a single prompt string,
    # so flatten the earlier turns into a plain-text transcript before the new message.
    # history is assumed to be a list of [user, assistant] pairs (Gradio's "tuples" format).
    lines = []
    for human, ai in history:
        lines.append(f"User: {human}")
        lines.append(f"Assistant: {ai}")
    lines.append(f"User: {message}")
    lines.append("Assistant:")
    prompt = "\n".join(lines)
    return llm(prompt)


gr.ChatInterface(predict).launch()
```