Original article: [LangChain Series 16] Language Models: LLMs (Part 2)
Quick overview of this article:
- Caching
- Serialization
- Streaming
- Token usage
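The previous article, [LangChain Series 15] Language Models: LLMs (Part 1), introduced the async API, custom LLMs, the Fake LLM, and the HumanInput LLM. This article covers the second part of LLMs: caching, serialization, streaming, and token usage.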
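01 Caching
LangChain provides an optional caching layer for LLMs. Caching has two benefits: it reduces the number of LLM API calls for repeated requests, which saves cost, and it speeds up responses.
In-memory cache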
python
from langchain.globals import set_llm_cache
from langchain.llms import OpenAI
from langchain.cache import InMemoryCache
set_llm_cache(InMemoryCache())
# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
Running the code produces the following output:
text
CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
Wall time: 4.83 s
"\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
CPU times: user 238 µs, sys: 143 µs, total: 381 µs
Wall time: 1.76 ms
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
The first call finds nothing in the cache; the second call hits the cache, so it takes far less time than the first.
SQLite cache
python
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
Wall time: 825 ms
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
Wall time: 2.67 ms
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
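As before, the first call misses the cache and the second call hits it, so it takes far less time than the first. In addition, a .langchain.db cache file is created locally.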
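To show why a file-backed cache survives a restart, here is a rough sketch built on the standard `sqlite3` module. The class name `TinySQLiteCache` and the table layout are assumptions made for illustration; they are not the schema LangChain's `SQLiteCache` uses.

```python
import os
import sqlite3
import tempfile


class TinySQLiteCache:
    """Hypothetical persistent cache; the table layout is an assumption."""

    def __init__(self, database_path):
        self.conn = sqlite3.connect(database_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache ("
            "prompt TEXT, llm TEXT, response TEXT, PRIMARY KEY (prompt, llm))"
        )
        self.conn.commit()

    def lookup(self, prompt, llm):
        """Return the stored response, or None on a cache miss."""
        row = self.conn.execute(
            "SELECT response FROM llm_cache WHERE prompt = ? AND llm = ?",
            (prompt, llm),
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt, llm, response):
        """Store or overwrite the response for this prompt and model."""
        self.conn.execute(
            "INSERT OR REPLACE INTO llm_cache VALUES (?, ?, ?)",
            (prompt, llm, response),
        )
        self.conn.commit()

    def close(self):
        self.conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "demo_cache.db")

cache = TinySQLiteCache(db_path)
cache.update("Tell me a joke", "fake-model", "A cached joke")
cache.close()

# A new connection (for example after a restart) still sees the entry.
reopened = TinySQLiteCache(db_path)
persisted = reopened.lookup("Tell me a joke", "fake-model")
missing = reopened.lookup("Unknown prompt", "fake-model")
reopened.close()
```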
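Optional caching in chains
In a chain, you can selectively turn caching off or on for individual nodes. In the map-reduce chain below, for example, the results of the map step are cached while the result of the reduce step is not.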
python
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.docstore.document import Document
from langchain.chains.summarize import load_summarize_chain
text_splitter = CharacterTextSplitter()
# The default LLM uses the global cache; the second one opts out with cache=False.
llm = OpenAI(model_name="text-davinci-002")
no_cache_llm = OpenAI(model_name="text-davinci-002", cache=False)
with open('../../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)
docs = [Document(page_content=t) for t in texts[:3]]
chain = load_summarize_chain(llm, chain_type="map_reduce", reduce_llm=no_cache_llm)
chain.run(docs)
CPU times: user 452 ms, sys: 60.3 ms, total: 512 ms
Wall time: 5.09 s
'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure. In response to Russian aggression in Ukraine, the United States is joining with European allies to impose sanctions and isolate Russia. American forces are being mobilized to protect NATO countries in the event that Putin decides to keep moving west. The Ukrainians are bravely fighting back, but the next few weeks will be hard for them. Putin will pay a high price for his actions in the long run. Americans should not be alarmed, as the United States is taking action to protect its interests and allies.'
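The effect of mixing a cached and an uncached model in one chain can be illustrated with a toy map-reduce. Everything here is hypothetical (`FakeLLM`, `map_reduce`, the shared dictionary); it only mirrors the idea behind passing `cache=False` to one model, not how LangChain wires it internally.

```python
class FakeLLM:
    """Hypothetical model wrapper with a per-instance cache switch."""

    shared_cache = {}  # shared by all instances, like a global cache

    def __init__(self, cache=True):
        self.cache = cache
        self.calls = 0  # number of real (uncached) invocations

    def __call__(self, prompt):
        if self.cache and prompt in FakeLLM.shared_cache:
            return FakeLLM.shared_cache[prompt]
        self.calls += 1
        result = f"summary({prompt})"
        if self.cache:
            FakeLLM.shared_cache[prompt] = result
        return result


def map_reduce(docs, map_llm, reduce_llm):
    partial = [map_llm(d) for d in docs]    # map step: one call per document
    return reduce_llm(" | ".join(partial))  # reduce step: combine partial results


map_llm = FakeLLM(cache=True)
reduce_llm = FakeLLM(cache=False)
docs = ["doc one", "doc two", "doc three"]

map_reduce(docs, map_llm, reduce_llm)
map_reduce(docs, map_llm, reduce_llm)  # second run: map results come from the cache
```

After two runs over the same three documents, the map model has made three real calls while the reduce model has made two, one per run.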
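02 Serialization
LangChain can serialize an LLM's configuration, which makes it easy to store and share. Two file formats are supported: JSON and YAML.
Loading an LLM
JSON file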
json
cat llm.json
{
    "model_name": "text-davinci-003",
    "temperature": 0.7,
    "max_tokens": 256,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "n": 1,
    "best_of": 1,
    "request_timeout": null,
    "_type": "openai"
}
python
from langchain.llms.loading import load_llm
llm = load_llm("llm.json")
YAML file
yaml
cat llm.yaml
_type: openai
best_of: 1
frequency_penalty: 0.0
max_tokens: 256
model_name: text-davinci-003
n: 1
presence_penalty: 0.0
request_timeout: null
temperature: 0.7
top_p: 1.0
python
llm = load_llm("llm.yaml")
Saving an LLM
python
llm.save("llm.json")
llm.save("llm.yaml")
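Conceptually, saving and loading amounts to writing the model's settings to a file and reading them back. The sketch below does this round trip with the standard `json` module; `save_config` and `load_config` are hypothetical helpers, and the dictionary holds a few of the fields from the llm.json example above.

```python
import json
import os
import tempfile

# A subset of the settings shown in the llm.json example.
config = {
    "model_name": "text-davinci-003",
    "temperature": 0.7,
    "max_tokens": 256,
    "_type": "openai",
}


def save_config(cfg, path):
    """Write the settings to a JSON file."""
    with open(path, "w") as f:
        json.dump(cfg, f, indent=4)


def load_config(path):
    """Read the settings back as a dictionary."""
    with open(path) as f:
        return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "llm.json")
save_config(config, path)
restored = load_config(path)
```

The `_type` field is what would let a loader decide which class to rebuild from the remaining settings.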
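03 Streaming
Some LLMs provide streaming responses, so you do not have to wait for the entire result before processing it; you can handle each piece of data as soon as it arrives.
LangChain supports streaming from LLMs mainly through a CallbackHandler that implements the on_llm_new_token method.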
python
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
# streaming=True makes the handler receive each token as soon as it is generated.
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = llm("Write me a song about sparkling water.")
Running the code produces the following output:
text
Verse 1
I'm sippin' on sparkling water,
It's so refreshing and light,
It's the perfect way to quench my thirst
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
Verse 2
I'm sippin' on sparkling water,
It's so bubbly and bright,
It's the perfect way to cool me down
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
Verse 3
I'm sippin' on sparkling water,
It's so light and so clear,
It's the perfect way to keep me cool
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
Note: streaming LLMs do not support retrieving token_usage.
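The callback mechanism can be sketched with a fake model that emits a fixed reply token by token. `CollectingHandler` and `fake_streaming_llm` are hypothetical; only the method name `on_llm_new_token` is taken from the text above, and a real handler would subclass LangChain's callback base class rather than a plain object.

```python
class CollectingHandler:
    """Hypothetical handler; only the method name mirrors on_llm_new_token."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token):
        # Called once per token, as soon as it arrives.
        self.tokens.append(token)


def fake_streaming_llm(prompt, callbacks):
    """Stand-in for a streaming model: emits a fixed reply one token at a time."""
    pieces = []
    for token in ["Sparkling", " water", " is", " refreshing", "."]:
        for handler in callbacks:
            handler.on_llm_new_token(token)
        pieces.append(token)
    return "".join(pieces)


handler = CollectingHandler()
resp = fake_streaming_llm("Write me a song about sparkling water.", [handler])
```

A handler that prints each token instead of collecting it would reproduce the progressive output seen with the stdout handler.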
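04 Token usage
LLM calls are mostly billed by token, so it is useful to retrieve token-related information. At the time of writing, only the OpenAI API supports retrieving token usage.
The simplest usage is to call the LLM directly.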
python
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback
llm = OpenAI(openai_api_key="...", model_name="text-davinci-002", n=2, best_of=2)
with get_openai_callback() as cb:
    # Every call made inside the with block is counted by cb.
    result = llm("Tell me a joke")
    result2 = llm("Tell me a joke")
    print(cb)
Running the code produces the following output:
text
Tokens Used: 84
Prompt Tokens: 8
Completion Tokens: 76
Successful Requests: 2
Total Cost (USD): $0.00168
If the LLM is called inside a chain or an agent, the callback records token usage across all steps.
python
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI
llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
with get_openai_callback() as cb:
    response = agent.run(
        "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"
    )
print(f"Total Tokens: {cb.total_tokens}")
print(f"Prompt Tokens: {cb.prompt_tokens}")
print(f"Completion Tokens: {cb.completion_tokens}")
print(f"Total Cost (USD): ${cb.total_cost}")
Running the code produces the following output:
text
> Entering new AgentExecutor chain...
I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: Sudeikis and Wilde's relationship ended in November 2020. Wilde was publicly served with court documents regarding child custody while she was presenting Don't Worry Darling at CinemaCon 2022. In January 2021, Wilde began dating singer Harry Styles after meeting during the filming of Don't Worry Darling.
Thought: I need to find out Harry Styles' age.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557
Thought: I now know the final answer.
Final Answer: Harry Styles, Olivia Wilde's boyfriend, is 29 years old and his age raised to the 0.23 power is 2.169459462491557.
> Finished chain.
Total Tokens: 1506
Prompt Tokens: 1350
Completion Tokens: 156
Total Cost (USD): $0.03012
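Both reported costs are consistent with a flat rate of $0.02 per 1,000 tokens. That rate is inferred from the figures in the two outputs, not stated in the article, so treat the sketch below as a sanity check under that assumption; `estimate_cost` is a hypothetical helper.

```python
import math

# Assumption: a flat price per 1,000 tokens, inferred from the outputs above.
PRICE_PER_1K_TOKENS = 0.02


def estimate_cost(total_tokens, price_per_1k=PRICE_PER_1K_TOKENS):
    """Return the estimated cost in USD for a given token count."""
    return total_tokens / 1000 * price_per_1k


simple_call_cost = estimate_cost(84)   # the two direct LLM calls
agent_run_cost = estimate_cost(1506)   # the agent run

# Compare with the reported totals, allowing for floating-point rounding.
matches_simple = math.isclose(simple_call_cost, 0.00168, rel_tol=1e-9)
matches_agent = math.isclose(agent_run_cost, 0.03012, rel_tol=1e-9)
```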
Summary
That concludes the second part on LLMs, which covered four topics: caching, serialization, streaming, and token usage.