Table of Contents
- I. LLMs vs. Chat Models
- II. Getting Started
  - 1. Setting up OpenAI
  - 2. `__call__`: string in -> string out
  - 3. `generate`: batch calls, richer outputs
- III. Async API
- IV. Custom LLM
- V. Fake LLM
- VI. Human Input LLM
- VII. Caching (llm_caching)
  - 1. In-Memory Cache
  - 2. SQLite Cache
  - 3. Optional Caching in Chains
- VIII. Serialization
- IX. Streaming
- X. Tracking Token Usage
This article is reposted and adapted from:
https://python.langchain.com.cn/docs/modules/model_io/models/
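I. LLMs vs. Chat Models
LangChain provides interfaces and integrations for two types of models:
- LLMs: models that take a text string as input and return a text string.
- Chat models: models that are backed by a language model but take a list of chat messages as input and return a chat message.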
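LLMs and chat models differ in subtle but important ways.
In LangChain, LLMs are pure text completion models. The APIs they wrap take a string prompt as input and output a string completion. OpenAI's GPT-3 is implemented as an LLM.
Chat models are often backed by LLMs but are tuned specifically for having conversations. Crucially, their provider APIs use a different interface than pure text completion models. Instead of a single string, they take a list of chat messages as input. These messages are usually labeled with the speaker, typically one of "System", "AI", and "Human". Chat models return an AI chat message as output. GPT-4 and Anthropic's Claude are both implemented as chat models.
To make LLMs and chat models interchangeable, both implement the base language model interface. This interface includes two common methods: `predict`, which takes a string and returns a string, and `predict_messages`, which takes messages and returns a message.
If you are working with a specific model, use the methods specific to that model class (`predict` for LLMs, `predict_messages` for chat models). If your application needs to work with different types of models, the shared interface can help.
Large language models (LLMs) are a core component of LangChain.
LangChain does not serve its own LLMs. Instead, it provides a standard interface for interacting with many different LLMs.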
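The shared interface can be illustrated with a small, self-contained toy. The classes below (`Message`, `ToyLLM`, `ToyChatModel`) and the function `ask` are hypothetical stand-ins, not LangChain's classes. Each toy model natively speaks one format and adapts to the other, so the calling code only needs `predict`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    # Hypothetical message type: a speaker label plus text (not LangChain's class).
    role: str  # e.g. "system", "human", "ai"
    content: str


class ToyLLM:
    """Hypothetical text-completion model: string in, string out."""

    def predict(self, text: str) -> str:
        return f"completion of: {text}"

    def predict_messages(self, messages: List[Message]) -> Message:
        # Flatten the conversation into one prompt string, then wrap the reply.
        prompt = "\n".join(f"{m.role}: {m.content}" for m in messages)
        return Message("ai", self.predict(prompt))


class ToyChatModel:
    """Hypothetical chat model: list of messages in, one AI message out."""

    def predict_messages(self, messages: List[Message]) -> Message:
        return Message("ai", f"reply to: {messages[-1].content}")

    def predict(self, text: str) -> str:
        # Wrap the string as a single human message, then unwrap the reply.
        return self.predict_messages([Message("human", text)]).content


def ask(model, question: str) -> str:
    # Works with either model type because both expose `predict`.
    return model.predict(question)
```

The conversion strategies here (joining messages into a prompt, wrapping a string as one human message) are assumptions chosen for brevity. Real integrations may convert between the two formats differently.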
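II. Getting Started
There are many LLM providers (OpenAI, Cohere, Hugging Face, etc.). The `LLM` class is designed to provide a standard interface for all of them.
In this tutorial we use the OpenAI LLM wrapper, although the functionality highlighted is generic for all LLM types.
1. Setting up OpenAI
First, we need to install the OpenAI Python package: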
```bash
pip install openai
```
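Accessing the API requires an API key, which you can get by creating an OpenAI account and heading to the API keys page. Once we have a key, we'll want to set it as an environment variable by running: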
```bash
export OPENAI_API_KEY="..."
```
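If you'd prefer not to set an environment variable, you can pass the key in directly via the `openai_api_key` named parameter when initializing the OpenAI LLM class: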
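```python
from langchain.llms import OpenAI

llm = OpenAI(openai_api_key="...")
```
Otherwise, you can initialize it without any parameters:
```python
from langchain.llms import OpenAI

# The key is read from the OPENAI_API_KEY environment variable.
llm = OpenAI()
```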
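The two initialization styles above imply a lookup order for the key. The sketch below models it with a hypothetical `resolve_api_key` helper (not part of LangChain). It assumes that an explicitly passed key takes priority over the `OPENAI_API_KEY` environment variable and that a missing key is an error.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None,
                    env_var: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: prefer an explicitly passed key, else read the env var."""
    if explicit_key:
        return explicit_key
    key = os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key found: pass one explicitly or set {env_var}.")
    return key
```

Failing loudly when neither source is set surfaces configuration mistakes at startup rather than on the first request.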
2. `__call__`: string in -> string out
The simplest way to use an LLM is as a callable: pass in a string, get back a string completion.
```python
llm("Tell me a joke")
# -> 'Why did the chicken cross the road?\n\nTo get to the other side.'
```
3. `generate`: batch calls, richer outputs
`generate` lets you call the model with a list of strings and get back a more complete response than just the text. This complete response can include things like multiple top responses and other LLM provider-specific information:
```python
llm_result = llm.generate(["Tell me a joke", "Tell me a poem"] * 15)
```
```python
len(llm_result.generations)
# -> 30
```
```python
llm_result.generations[0]
```
```text
[Generation(text='\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'),
 Generation(text='\n\nWhy did the chicken cross the road?\n\nTo get to the other side.')]
```
```python
llm_result.generations[-1]
```
```text
[Generation(text="\n\nWhat if love neverspeech\n\nWhat if love never ended\n\nWhat if love was only a feeling\n\nI'll never know this love\n\nIt's not a feeling\n\nBut it's what we have for each other\n\nWe just know that love is something strong\n\nAnd we can't help but be happy\n\nWe just feel what love is for us\n\nAnd we love each other with all our heart\n\nWe just don't know how\n\nHow it will go\n\nBut we know that love is something strong\n\nAnd we'll always have each other\n\nIn our lives."),
 Generation(text='\n\nOnce upon a time\n\nThere was a love so pure and true\n\nIt lasted for centuries\n\nAnd never became stale or dry\n\nIt was moving and alive\n\nAnd the heart of the love-ick\n\nIs still beating strong and true.')]
```
You can also access provider-specific information that is returned. This information is NOT standardized across providers.
```python
llm_result.llm_output
```
```text
{'token_usage': {'completion_tokens': 3903,
  'total_tokens': 4023,
  'prompt_tokens': 120}}
```
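The shape of `llm_result` above can be mimicked with a self-contained toy. `ToyBatchLLM`, `ToyResult`, and the word-count "tokenizer" are hypothetical stand-ins, not LangChain classes. The sketch shows three things:
- 15 copies of two prompts yield 30 entries in `generations`, one inner list per prompt.
- Each entry can hold several candidate generations.
- `total_tokens` is the sum of prompt and completion tokens (120 + 3903 = 4023 above).

For illustration, it also assumes that the `__call__` shorthand returns the first candidate of a one-prompt batch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Generation:
    text: str


@dataclass
class ToyResult:
    # One inner list per prompt; each inner list holds n candidate generations.
    generations: List[List[Generation]]
    llm_output: Dict[str, Dict[str, int]] = field(default_factory=dict)


class ToyBatchLLM:
    """Hypothetical batch model, not LangChain's LLM class."""

    def __init__(self, n: int = 1):
        self.n = n  # candidates returned per prompt

    def generate(self, prompts: List[str]) -> ToyResult:
        generations = [
            [Generation(f"{p} (candidate {i})") for i in range(self.n)]
            for p in prompts
        ]
        # Crude stand-in for token counting: whitespace-separated words.
        prompt_tokens = sum(len(p.split()) for p in prompts)
        completion_tokens = sum(len(g.text.split()) for gens in generations for g in gens)
        usage = {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        }
        return ToyResult(generations, {"token_usage": usage})

    def __call__(self, prompt: str) -> str:
        # String-in/string-out shorthand: first candidate of a one-prompt batch.
        return self.generate([prompt]).generations[0][0].text
```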
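III. Async API
LangChain provides async support for LLMs by leveraging the asyncio library.
Async support is particularly useful for calling multiple LLMs concurrently, because these calls are network-bound. Currently, `OpenAI`, `PromptLayerOpenAI`, `ChatOpenAI`, and `Anthropic` are supported. Async support for other LLMs is on the roadmap.
You can use the `agenerate` method to call an OpenAI LLM asynchronously.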
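Why concurrency helps for network-bound calls can be seen without an API key. The sketch below replaces the model with a hypothetical `fake_agenerate` coroutine that just sleeps to simulate network latency; it is not LangChain code. Ten sequential awaits take roughly ten times the latency, while `asyncio.gather` overlaps the waits.

```python
import asyncio
import time


async def fake_agenerate(prompt: str, latency: float = 0.05) -> str:
    """Hypothetical stand-in for a network-bound LLM call."""
    await asyncio.sleep(latency)  # simulated network wait
    return f"reply to {prompt!r}"


async def run_serially(n: int) -> float:
    start = time.perf_counter()
    for _ in range(n):
        await fake_agenerate("Hello, how are you?")  # one call at a time
    return time.perf_counter() - start


async def run_concurrently(n: int) -> float:
    start = time.perf_counter()
    # gather schedules all calls at once, so their waits overlap.
    await asyncio.gather(*(fake_agenerate("Hello, how are you?") for _ in range(n)))
    return time.perf_counter() - start


def compare(n: int = 10):
    serial = asyncio.run(run_serially(n))
    concurrent = asyncio.run(run_concurrently(n))
    return serial, concurrent
```

Outside Jupyter, `asyncio.run` is the entry point, as in `compare()`. Inside a notebook that already runs an event loop, you would `await` the coroutines directly instead.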
```python
import time
import asyncio

from langchain.llms import OpenAI


def generate_serially():
    llm = OpenAI(temperature=0.9)
    for _ in range(10):
        resp = llm.generate(["Hello, how are you?"])
        print(resp.generations[0][0].text)


async def async_generate(llm):
    resp = await llm.agenerate(["Hello, how are you?"])
    print(resp.generations[0][0].text)


async def generate_concurrently():
    llm = OpenAI(temperature=0.9)
    tasks = [async_generate(llm) for _ in range(10)]
    await asyncio.gather(*tasks)


s = time.perf_counter()
# If running this outside of Jupyter, use asyncio.run(generate_concurrently())
await generate_concurrently()
elapsed = time.perf_counter() - s
print("\033[1m" + f"Concurrent executed in {elapsed:0.2f} seconds." + "\033[0m")

s = time.perf_counter()
generate_serially()
elapsed = time.perf_counter() - s
print("\033[1m" + f"Serial executed in {elapsed:0.2f} seconds." + "\033[0m")
```
```text
I'm doing well, thank you. How about you?
I'm doing well, thank you. How about you?
...
I'm doing well, thanks for asking. How about you?
Serial executed in 5.77 seconds.
```
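IV. Custom LLM
This notebook goes over how to create a custom LLM wrapper, in case you want to use your own LLM or a different wrapper than one that is supported in LangChain.
A custom LLM needs to implement only one required method:
- A `_call` method that takes in a string and some optional stop words, and returns a string.

It can also implement one optional method:
- An `_identifying_params` property that is used to help with printing of this class. It should return a dictionary.

The example below also defines an `_llm_type` property, which returns a string naming the model type.
Let's implement a very simple custom LLM that just returns the first N characters of the input.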
```python
from typing import Any, List, Mapping, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
```
```python
class CustomLLM(LLM):
    n: int

    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        return prompt[: self.n]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"n": self.n}
```
We can now use this as we would any other LLM.
```python
llm = CustomLLM(n=10)
llm("This is a foobar thing")
# -> 'This is a '
```
```python
print(llm)
```
```text
CustomLLM
Params: {'n': 10}
```
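The `CustomLLM` above rejects `stop` outright. A custom model could instead honor stop sequences by truncating its output. The sketch below is a plain-Python toy. `ToyTruncatingLLM` is hypothetical and does not subclass LangChain's `LLM`, so it runs without LangChain installed. It keeps the same `_call`, `_llm_type`, and `_identifying_params` hooks, and it assumes a stop sequence means "end the output just before the earliest match".

```python
from typing import Any, List, Mapping, Optional


class ToyTruncatingLLM:
    """Hypothetical stand-in (not a LangChain subclass) with CustomLLM's hooks."""

    def __init__(self, n: int):
        self.n = n

    @property
    def _llm_type(self) -> str:
        return "toy-truncating"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        text = prompt[: self.n]
        # Instead of rejecting `stop`, cut at each stop sequence in turn;
        # successive cuts leave the text ending before the earliest match.
        for s in stop or []:
            idx = text.find(s)
            if idx != -1:
                text = text[:idx]
        return text

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"n": self.n}
```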
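V. Fake LLM
We provide a fake LLM class that can be used for testing. It lets you mock out calls to the LLM and simulate what would happen if the LLM responded in a certain way.
In this notebook, we go over how to use this fake LLM.
We start by using `FakeListLLM` in an agent.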
```python
from langchain.llms.fake import FakeListLLM
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
```
```python
tools = load_tools(["python_repl"])
responses = ["Action: Python REPL\nAction Input: print(2 + 2)", "Final Answer: 4"]
llm = FakeListLLM(responses=responses)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("whats 2 + 2")
```
```text
> Entering new AgentExecutor chain...
Action: Python REPL
Action Input: print(2 + 2)
Observation: Python REPL is not a valid tool, try another one.
Thought:Final Answer: 4
> Finished chain.
'4'
```
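The core idea of a fake LLM fits in a few lines: ignore the prompt and replay a scripted list of responses, so an agent's control flow can be tested deterministically and offline. `ToyFakeListLLM` below is a hypothetical stand-in, not LangChain's `FakeListLLM`. Wrapping around to the first response once the list is exhausted is an assumption made here for convenience.

```python
from typing import List, Optional


class ToyFakeListLLM:
    """Hypothetical scripted test LLM: returns canned responses in order."""

    def __init__(self, responses: List[str]):
        self.responses = list(responses)
        self.i = 0  # index of the next response to return

    def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # The prompt is ignored; the script alone decides what comes back.
        response = self.responses[self.i % len(self.responses)]
        self.i += 1
        return response
```

In a test, the first scripted response drives the agent toward a tool call and the second ends the run. This mirrors the two responses passed to `FakeListLLM` above.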
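VI. Human Input LLM
Similar to the fake LLM, LangChain provides a pseudo LLM class that can be used for testing, debugging, or educational purposes. It lets you mock out calls to the LLM and simulate how a human would respond if they received the prompts.
In this notebook, we go over how to use this pseudo LLM.
We start by using `HumanInputLLM` in an agent.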
```python
from langchain.llms.human import HumanInputLLM
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
```
Since we will use the `WikipediaQueryRun` tool in this notebook, you might need to install the `wikipedia` package if you haven't done so already.
```python
!pip install wikipedia
```
```python
tools = load_tools(["wikipedia"])
llm = HumanInputLLM(
    prompt_func=lambda prompt: print(
        f"\n===PROMPT====\n{prompt}\n=====END OF PROMPT======"
    )
)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("What is 'Bocchi the Rock!'?")
```
```text
> Entering new AgentExecutor chain...
===PROMPT====
Answer the following questions as best you can. You have access to the following tools:
Wikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Wikipedia]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: What is 'Bocchi the Rock!'?
Thought:
=====END OF PROMPT======
```
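Mechanically, a human-input LLM is a model whose completion comes from a person instead of an API. The sketch below uses a hypothetical `ToyHumanInputLLM` (not LangChain's class). It makes both the display step and the input step injectable, so the same object can be driven interactively or from a test. The `prompt_func` name echoes the argument used above; `input_func` is an assumption made for this toy.

```python
from typing import Callable, List, Optional


class ToyHumanInputLLM:
    """Hypothetical human-in-the-loop LLM: shows the prompt, returns what a person types."""

    def __init__(self,
                 prompt_func: Callable[[str], None] = print,
                 input_func: Callable[[], str] = input):
        self.prompt_func = prompt_func  # how the prompt is displayed
        self.input_func = input_func    # where the "completion" comes from

    def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        self.prompt_func(prompt)
        return self.input_func()
```

With the defaults, the prompt is printed and the reply is read from the keyboard. In a test, both can be swapped for plain functions.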
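VII. Caching (llm_caching)
LangChain provides an optional caching layer for LLMs. This is useful for two reasons:
- It can save you money by reducing the number of API calls you make to the LLM provider, if you often request the same completion multiple times.
- It can speed up your application by reducing the number of API calls you make to the LLM provider.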
```python
import langchain
from langchain.llms import OpenAI

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)
```
1. In-Memory Cache
```python
from langchain.cache import InMemoryCache

langchain.llm_cache = InMemoryCache()

# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
```
```text
CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
Wall time: 4.83 s
"\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"
```
```python
# The second time it is, so it goes faster
llm("Tell me a joke")
```
```text
CPU times: user 238 µs, sys: 143 µs, total: 381 µs
Wall time: 1.76 ms
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
```
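The speedup above comes from a lookup keyed on the request. The sketch below is a hypothetical toy: `ToyInMemoryCache` and `cached_call` are not LangChain APIs. It assumes the key combines the prompt with a string describing the model settings, so the same prompt sent to a differently configured model is not served another model's answer.

```python
from typing import Callable, Dict, Optional, Tuple


class ToyInMemoryCache:
    """Hypothetical cache keyed by (prompt, model-settings string)."""

    def __init__(self):
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, value: str) -> None:
        self._store[(prompt, llm_string)] = value


def cached_call(cache: ToyInMemoryCache, llm_string: str,
                backend: Callable[[str], str], prompt: str) -> str:
    hit = cache.lookup(prompt, llm_string)
    if hit is not None:
        return hit  # cache hit: no API call
    result = backend(prompt)  # cache miss: call the (slow, paid) provider
    cache.update(prompt, llm_string, result)
    return result
```

Because the store is a plain dictionary, it disappears when the process exits. That is the main difference from the SQLite cache in the next subsection.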
2. SQLite Cache
```bash
rm .langchain.db
```
```python
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache

langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
```
```python
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
```
```text
CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
Wall time: 825 ms
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
```
```python
# The second time it is, so it goes faster
llm("Tell me a joke")
```
```text
CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
Wall time: 2.67 ms
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
```
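Unlike an in-memory cache, a SQLite cache can survive process restarts because its entries live in a file. This is also why the example deletes `.langchain.db` first, to start from a cold cache. The toy below uses only Python's built-in `sqlite3` module. The table name, the schema, and the `ToySQLiteCache` class are hypothetical and do not reflect LangChain's actual storage layout. The toy defaults to an in-memory database so it runs anywhere; passing a file path such as `.langchain.db` would make the entries persist.

```python
import sqlite3
from typing import Callable, Optional


class ToySQLiteCache:
    """Hypothetical prompt cache stored in a SQLite table."""

    def __init__(self, database_path: str = ":memory:"):
        self.conn = sqlite3.connect(database_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache ("
            " prompt TEXT, llm TEXT, response TEXT,"
            " PRIMARY KEY (prompt, llm))"
        )

    def lookup(self, prompt: str, llm: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT response FROM llm_cache WHERE prompt = ? AND llm = ?",
            (prompt, llm),
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, llm: str, response: str) -> None:
        # INSERT OR REPLACE keeps a single row per (prompt, llm) pair.
        self.conn.execute(
            "INSERT OR REPLACE INTO llm_cache VALUES (?, ?, ?)",
            (prompt, llm, response),
        )
        self.conn.commit()


def cached_call(cache: ToySQLiteCache, llm: str,
                backend: Callable[[str], str], prompt: str) -> str:
    hit = cache.lookup(prompt, llm)
    if hit is not None:
        return hit  # served from the table, no provider call
    response = backend(prompt)
    cache.update(prompt, llm, response)
    return response
```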
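3. Optional Caching in Chains
You can also turn off caching for particular nodes in chains. Because of certain interfaces, it is often easier to construct the chain first and then edit the LLM afterwards.
As an example, we will load a summarizer map-reduce chain. We will cache results for the map step but not for the combine (reduce) step.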
```python
llm = OpenAI(model_name="text-davinci-002")
no_cache_llm = OpenAI(model_name="text-davinci-002", cache=False)
```
```python
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.docstore.document import Document
from langchain.chains.summarize import load_summarize_chain
```
```python
text_splitter = CharacterTextSplitter()
with open('../../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)
docs = [Document(page_content=t) for t in texts[:3]]
chain = load_summarize_chain(llm, chain_type="map_reduce", reduce_llm=no_cache_llm)
chain.run(docs)
```
```text
CPU times: user 452 ms, sys: 60.3 ms, total: 512 ms
Wall time: 5.09 s
'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure. In response to Russian aggression in Ukraine, the United States is joining with European allies to impose sanctions and isolate Russia. American forces are being mobilized to protect NATO countries in the event that Putin decides to keep moving west. The Ukrainians are bravely fighting back, but the next few weeks will be hard for them. Putin will pay a high price for his actions in the long run. Americans should not be alarmed, as the United States is taking action to protect its interests and allies.'
```
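When we run it again, it runs faster (about 1 s instead of 5 s wall time), but the final answer is different.
This is because the map steps are cached, while the reduce step is not.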
```python
chain.run(docs)
```
```text
CPU times: user 11.5 ms, sys: 4.33 ms, total: 15.8 ms
Wall time: 1.04 s
'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure.'
```
```bash
rm .langchain.db sqlite.db
```
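Why the second run is faster yet produces a different summary can be reproduced with a toy. In the sketch below, `ToyLLM` and `map_reduce_summary` are hypothetical. The backend returns a fresh answer on every real call, standing in for a sampling model. The map-step LLM reads and writes a shared cache, while the reduce-step LLM is built with `cache=False` (mirroring `no_cache_llm` above), so only the reduce call reaches the backend again.

```python
from typing import Callable, Dict, List


class ToyLLM:
    """Hypothetical LLM wrapper: a per-instance cache switch over a shared cache."""

    def __init__(self, backend: Callable[[str], str],
                 shared_cache: Dict[str, str], cache: bool = True):
        self.backend = backend
        self.shared_cache = shared_cache  # plays the role of a global llm cache
        self.cache = cache                # False mirrors `cache=False` above

    def __call__(self, prompt: str) -> str:
        if self.cache and prompt in self.shared_cache:
            return self.shared_cache[prompt]  # hit: skip the backend
        out = self.backend(prompt)
        if self.cache:
            self.shared_cache[prompt] = out
        return out


def map_reduce_summary(docs: List[str], map_llm: ToyLLM, reduce_llm: ToyLLM) -> str:
    partials = [map_llm(f"Summarize: {d}") for d in docs]   # map step
    return reduce_llm("Combine: " + " | ".join(partials))  # reduce step
```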
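VIII. Serialization
This notebook walks through how to write an LLM configuration to disk and read it back. This is useful if you want to save the configuration for a given LLM (for example, the provider and the temperature).
```python
from langchain.llms import OpenAI
from langchain.llms.loading import load_llm
```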
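1. Loading
First, let's go over loading an LLM from disk. LLMs can be saved on disk in two formats: json or yaml. No matter the extension, they are loaded in the same way.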
```python
!cat llm.json
```
```text
{
    "model_name": "text-davinci-003",
    "temperature": 0.7,
    "max_tokens": 256,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "n": 1,
    "best_of": 1,
    "request_timeout": null,
    "_type": "openai"
}
```
```python
llm = load_llm("llm.json")
```
```python
!cat llm.yaml
```
```text
_type: openai
best_of: 1
frequency_penalty: 0.0
max_tokens: 256
model_name: text-davinci-003
n: 1
presence_penalty: 0.0
request_timeout: null
temperature: 0.7
top_p: 1.0
```
```python
llm = load_llm("llm.yaml")
```
2. Saving
If you want to go from an LLM in memory to a serialized version of it, you can do so easily by calling the `.save` method. Again, this supports both json and yaml.
```python
llm.save("llm.json")
```
```python
llm.save("llm.yaml")
```
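The `_type` field in the files above is what lets a single loader rebuild the right class. The sketch below imitates that round trip with only the standard library. `ToyOpenAIConfig`, `REGISTRY`, `save_config`, and `load_config` are hypothetical. The sketch covers JSON only (YAML would need a third-party parser such as PyYAML), and it is not how `load_llm` is implemented internally.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class ToyOpenAIConfig:
    """Hypothetical config object; a few fields copied from llm.json above."""
    model_name: str = "text-davinci-003"
    temperature: float = 0.7
    max_tokens: int = 256
    top_p: float = 1.0


# Maps the `_type` tag stored on disk to the class that should be rebuilt.
REGISTRY = {"openai": ToyOpenAIConfig}


def save_config(cfg: ToyOpenAIConfig, path: str, type_tag: str = "openai") -> None:
    data = {**asdict(cfg), "_type": type_tag}
    with open(path, "w") as f:
        json.dump(data, f, indent=2)


def load_config(path: str):
    with open(path) as f:
        data = json.load(f)
    cls = REGISTRY[data.pop("_type")]  # pick the class from the tag
    return cls(**data)
```

Keeping the type tag inside the file, rather than inferring it from the file name, is what allows one loader to handle any saved configuration.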
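IX. Streaming
Some LLMs provide a streaming response. Instead of waiting for the entire response to be returned, you can start processing it as soon as it's available. This is useful if you want to display the response to the user as it's being generated, or to process the response as it's being generated.
Currently, we support streaming for the `OpenAI`, `ChatOpenAI`, and `ChatAnthropic` implementations. To use streaming, pass a `CallbackHandler` that implements `on_llm_new_token`. In this example, we use `StreamingStdOutCallbackHandler`.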
```python
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = llm("Write me a song about sparkling water.")
```
```text
Verse 1
I'm sippin' on sparkling water,
It's so refreshing and light,
It's the perfect way to quench my thirst
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

Verse 2
I'm sippin' on sparkling water,
It's so bubbly and bright,
It's the perfect way to cool me down
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

Verse 3
I'm sippin' on sparkling water,
It's so light and so clear,
It's the perfect way to keep me cool
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
```
If we use `generate`, we still have access to the final `LLMResult`. However, `token_usage` is not currently supported for streaming.
```python
llm.generate(["Tell me a joke."])
```
```text
Q: What did the fish say when it hit the wall?
A: Dam!
LLMResult(generations=[[Generation(text='\n\nQ: What did the fish say when it hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}, 'model_name': 'text-davinci-003'})
```
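The callback contract described above can be sketched without a provider. Below, `CollectingHandler`, `fake_token_stream`, and `stream_completion` are hypothetical. The "stream" just splits a string on spaces, and the handler method is named `on_llm_new_token` only to mirror the hook mentioned above. The point is the ordering: each handler sees every token as soon as it is produced, and the caller still gets the fully assembled text at the end, analogous to the final `LLMResult`.

```python
from typing import Iterable, List


class CollectingHandler:
    """Hypothetical callback handler: records each token as it arrives."""

    def __init__(self):
        self.tokens: List[str] = []

    def on_llm_new_token(self, token: str) -> None:
        self.tokens.append(token)


def fake_token_stream(text: str) -> Iterable[str]:
    # Hypothetical stand-in for a provider stream: yield one word at a time,
    # keeping the leading space so the pieces join back into the original text.
    for i, word in enumerate(text.split(" ")):
        yield word if i == 0 else " " + word


def stream_completion(text: str, handlers) -> str:
    pieces = []
    for token in fake_token_stream(text):
        for h in handlers:
            h.on_llm_new_token(token)  # fires before the response is complete
        pieces.append(token)
    return "".join(pieces)  # the final, fully assembled response
```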
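X. Tracking Token Usage
This example goes over how to track token usage for specific calls. Currently, this is only implemented for the OpenAI API.
Let's first look at an extremely simple example of tracking token usage for a single LLM call.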
```python
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback
```
```python
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    print(cb)
```
```text
Tokens Used: 42
    Prompt Tokens: 4
    Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $0.00084
```
Everything inside the context manager is tracked. Here is an example of using it to track multiple calls in sequence.
```python
with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    result2 = llm("Tell me a joke")
    print(cb.total_tokens)
# -> 91
```
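The numbers printed above follow simple arithmetic. 4 prompt tokens plus 38 completion tokens gives 42 in total, and $0.00084 for 42 tokens corresponds to a rate of $0.02 per 1,000 tokens. The toy below reproduces that accounting and shows how a context manager can accumulate usage across several calls. The `UsageTracker` class and `track_usage` context manager are hypothetical. The flat $0.02 rate is inferred from the output above, not taken from a price list. Unlike `get_openai_callback`, nothing here hooks into LLM calls automatically; each call's counts are recorded explicitly.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Hypothetical accumulator for per-call token counts."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    successful_requests: int = 0
    price_per_1k_tokens: float = 0.02  # assumed flat rate, USD per 1,000 tokens

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @property
    def total_cost(self) -> float:
        return self.total_tokens / 1000 * self.price_per_1k_tokens

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.successful_requests += 1


@contextmanager
def track_usage(price_per_1k_tokens: float = 0.02):
    # Every call recorded inside the `with` block accumulates into one tracker.
    yield UsageTracker(price_per_1k_tokens=price_per_1k_tokens)
```

For two calls, the tracker simply sums the per-call counts. That is how a total like 91 can arise from two jokes of different lengths.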
If a chain or agent with multiple steps in it is used, it will track all those steps.
```python
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
```
```python
with get_openai_callback() as cb:
    response = agent.run(
        "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"
    )
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")
```
2024-04-08