Natural Language Processing from Basics to Applications, LangChain: Models - [Large Language Models (LLMs): Caching LLM Call Results]


from langchain.llms import OpenAI

In-Memory Cache

import langchain
from langchain.cache import InMemoryCache

langchain.llm_cache = InMemoryCache()

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

Time the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
Wall time: 4.83 s

Output:

"\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"

Time the second run:

%%time
# The second time it is in the cache, so it goes faster
llm("Tell me a joke")

Timing output:

CPU times: user 238 µs, sys: 143 µs, total: 381 µs
Wall time: 1.76 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
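The speedup on the second run comes from an exact-match lookup: the prompt and the model configuration together form the cache key. The following is a minimal sketch of that idea using only the standard library. `TinyInMemoryCache`, `cached_call`, and `slow_fake_model` are hypothetical names introduced for illustration; this is not LangChain's actual implementation.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TinyInMemoryCache:
    """Hypothetical exact-match cache keyed by (prompt, llm_string)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        # Returns None on a cache miss.
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, response: str) -> None:
        self._store[(prompt, llm_string)] = response


def cached_call(
    cache: TinyInMemoryCache,
    llm_string: str,
    model: Callable[[str], str],
    prompt: str,
) -> str:
    """Return the cached response if present, otherwise call the model and store it."""
    hit = cache.lookup(prompt, llm_string)
    if hit is not None:
        return hit
    response = model(prompt)
    cache.update(prompt, llm_string, response)
    return response


call_count = 0


def slow_fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: no network access here)."""
    global call_count
    call_count += 1
    time.sleep(0.05)  # simulate latency
    return f"joke #{call_count} for: {prompt}"


cache = TinyInMemoryCache()
first = cached_call(cache, "fake-model", slow_fake_model, "Tell me a joke")
second = cached_call(cache, "fake-model", slow_fake_model, "Tell me a joke")
```

In this sketch the second call never reaches the model, so both calls return the same string.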

SQLite Cache

!rm .langchain.db

# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")

Time the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
Wall time: 825 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Time the second run:

%%time
# The second time it is in the cache, so it goes faster
llm("Tell me a joke")

Timing output:

CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
Wall time: 2.67 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
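Unlike the in-memory cache, a SQLite-backed cache survives a process restart because the entries live in a file. The sketch below illustrates that property with the standard `sqlite3` module. The table name `llm_cache`, its columns, and the helper functions are assumptions made for this example and are not taken from LangChain's own schema.

```python
import os
import sqlite3
import tempfile
from typing import Optional


def open_cache(path: str) -> sqlite3.Connection:
    """Open (or create) a hypothetical cache table in a SQLite file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS llm_cache ("
        "prompt TEXT NOT NULL, llm TEXT NOT NULL, response TEXT NOT NULL, "
        "PRIMARY KEY (prompt, llm))"
    )
    return conn


def lookup(conn: sqlite3.Connection, prompt: str, llm: str) -> Optional[str]:
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE prompt = ? AND llm = ?",
        (prompt, llm),
    ).fetchone()
    return row[0] if row else None


def update(conn: sqlite3.Connection, prompt: str, llm: str, response: str) -> None:
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO llm_cache (prompt, llm, response) VALUES (?, ?, ?)",
            (prompt, llm, response),
        )


# Use a throwaway file so the example does not touch .langchain.db.
db_path = os.path.join(tempfile.mkdtemp(), "demo_cache.db")

conn = open_cache(db_path)
miss = lookup(conn, "Tell me a joke", "fake-model")
update(conn, "Tell me a joke", "fake-model", "a cached joke")
conn.close()

# Reopening the file simulates a new process: the entry is still there.
conn = open_cache(db_path)
hit = lookup(conn, "Tell me a joke", "fake-model")
conn.close()
```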

Redis Cache

We can also use Redis to cache prompts and responses in the same way:

# (Make sure your local Redis instance is running before you run this example)
from redis import Redis
from langchain.cache import RedisCache

langchain.llm_cache = RedisCache(redis_=Redis())

Time the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 6.88 ms, sys: 8.75 ms, total: 15.6 ms
Wall time: 1.04 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Time the second run:

%%time
# The second time it is in the cache, so it goes faster
llm("Tell me a joke")

Timing output:

CPU times: user 1.59 ms, sys: 610 µs, total: 2.2 ms
Wall time: 5.58 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Semantic Cache

We can also use Redis to cache prompts and responses, deciding cache hits by semantic similarity:

from langchain.embeddings import OpenAIEmbeddings
from langchain.cache import RedisSemanticCache


langchain.llm_cache = RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=OpenAIEmbeddings()
)

Time the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 351 ms, sys: 156 ms, total: 507 ms
Wall time: 3.37 s

Output:

"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."

Time the second run:

%%time
# The second time, while not a direct hit, the question is semantically similar to the original question,
# so it uses the cached result!
llm("Tell me one joke")

Timing output:

CPU times: user 6.25 ms, sys: 2.72 ms, total: 8.97 ms
Wall time: 262 ms

Output:

"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."
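A semantic cache returns a stored response when a new prompt is close enough to a cached one, rather than requiring an identical string. The sketch below shows the lookup logic with a toy bag-of-words embedding and cosine similarity. `toy_embed`, `ToySemanticCache`, and the threshold of 0.7 are assumptions for illustration only; a real setup would use a learned embedding model such as `OpenAIEmbeddings` and a vector index.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    """Hypothetical cache that hits when similarity reaches a threshold."""

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def update(self, prompt: str, response: str) -> None:
        self.entries.append((toy_embed(prompt), response))

    def lookup(self, prompt: str) -> Optional[str]:
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for embedding, response in self.entries:
            score = cosine(query, embedding)
            if score > best_score:
                best_score, best_response = score, response
        # Only return the best match if it is similar enough.
        return best_response if best_score >= self.threshold else None


cache = ToySemanticCache(threshold=0.7)
cache.update("Tell me a joke", "Why don't scientists trust atoms?")

near_hit = cache.lookup("Tell me one joke")      # 3 of 4 words shared: similarity 0.75
miss = cache.lookup("Summarize this document")   # no shared words: similarity 0.0
```

The threshold controls the trade-off: a lower value produces more hits but risks returning an answer to a different question.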

GPTCache

GPTCache can cache results by exact match or by semantic similarity. Let's start with an exact-match example:

from gptcache import Cache
from gptcache.manager.factory import manager_factory
from gptcache.processor.pre import get_prompt
from langchain.cache import GPTCache
import hashlib

def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()

def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    cache_obj.init(
        pre_embedding_func=get_prompt,
        data_manager=manager_factory(manager="map", data_dir=f"map_cache_{hashed_llm}"),
    )

langchain.llm_cache = GPTCache(init_gptcache)

Time the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 21.5 ms, sys: 21.3 ms, total: 42.8 ms
Wall time: 6.2 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Time the second run:

%%time
# The second time it is in the cache, so it goes faster
llm("Tell me a joke")

Timing output:

CPU times: user 571 µs, sys: 43 µs, total: 614 µs
Wall time: 635 µs

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
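The `get_hashed_name` helper in the exact-match setup turns the LLM string passed to `init_gptcache` into a fixed-length name for the data directory. A plausible reading is that this gives each distinct LLM configuration its own cache directory with a filesystem-safe name. The short sketch below illustrates that behavior; the two configuration strings are made up for the example and do not reflect the exact string LangChain passes.

```python
import hashlib


def get_hashed_name(name: str) -> str:
    """Same helper as in the example: SHA-256 hex digest of the input."""
    return hashlib.sha256(name.encode()).hexdigest()


# Hypothetical LLM configuration strings (assumptions for illustration).
config_a = "model_name=text-davinci-002;n=2;best_of=2"
config_b = "model_name=text-davinci-002;n=1;best_of=1"

dir_a = f"map_cache_{get_hashed_name(config_a)}"
dir_b = f"map_cache_{get_hashed_name(config_b)}"

# The digest is deterministic, so the same configuration always maps to the
# same directory, while a different configuration maps to a different one.
same_again = f"map_cache_{get_hashed_name(config_a)}"
```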

Now let's look at a similarity-caching example:

from gptcache import Cache
from gptcache.adapter.api import init_similar_cache
from langchain.cache import GPTCache
import hashlib

def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()

def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")

langchain.llm_cache = GPTCache(init_gptcache)

Time the first run:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Timing output:

CPU times: user 1.42 s, sys: 279 ms, total: 1.7 s
Wall time: 8.44 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Time the second run:

%%time
# This is an exact match, so it is found in the cache
llm("Tell me a joke")

Timing output:

CPU times: user 866 ms, sys: 20 ms, total: 886 ms
Wall time: 226 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Time the third run:

%%time
# This is not an exact match, but it is semantically within the distance threshold, so it hits!
llm("Tell me joke")

Timing output:

CPU times: user 853 ms, sys: 14.8 ms, total: 868 ms
Wall time: 224 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

SQLAlchemy Cache

We can use SQLAlchemyCache to cache with any SQL database supported by SQLAlchemy:

# from langchain.cache import SQLAlchemyCache
# from sqlalchemy import create_engine

# engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
# langchain.llm_cache = SQLAlchemyCache(engine)

Custom SQLAlchemy Schemas

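We can define our own declarative model and pass it to SQLAlchemyCache to customize the schema used for caching. For example, to support high-speed full-text prompt indexing in Postgres, we can use: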

from sqlalchemy import Column, Integer, String, Computed, Index, Sequence
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy_utils import TSVectorType
from langchain.cache import SQLAlchemyCache

Base = declarative_base()


class FulltextLLMCache(Base):  # type: ignore
    """Postgres table for fulltext-indexed LLM Cache"""

    __tablename__ = "llm_cache_fulltext"
    id = Column(Integer, Sequence('cache_id'), primary_key=True)
    prompt = Column(String, nullable=False)
    llm = Column(String, nullable=False)
    idx = Column(Integer)
    response = Column(String)
    prompt_tsv = Column(TSVectorType(), Computed("to_tsvector('english', llm || ' ' || prompt)", persisted=True))
    __table_args__ = (
        Index("idx_fulltext_prompt_tsv", prompt_tsv, postgresql_using="gin"),
    )

engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
langchain.llm_cache = SQLAlchemyCache(engine, FulltextLLMCache)

Optional Caching

We can also turn off caching for a specific LLM. In the example below, caching is disabled for one particular LLM even though the global cache is enabled:

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2, cache=False)

Time the first run:

%%time
llm("Tell me a joke")

Timing output:

CPU times: user 5.8 ms, sys: 2.71 ms, total: 8.51 ms
Wall time: 745 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Time the second run:

%%time
llm("Tell me a joke")

Timing output:

CPU times: user 4.91 ms, sys: 2.64 ms, total: 7.55 ms
Wall time: 623 ms

Output:

'\n\nTwo guys stole a calendar. They got six months each.'
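Both runs take a similar amount of time and return different jokes, which is what we expect when the per-LLM flag overrides the global cache. The sketch below models that interaction with standard-library code. `GLOBAL_CACHE`, `ToyLLM`, and `make_counter_model` are hypothetical names, and the precedence rule shown (an explicit `cache=False` wins over the global cache) is a simplification for illustration.

```python
from typing import Callable, Dict, Optional

# Stand-in for a globally configured cache (assumption: keyed by prompt only).
GLOBAL_CACHE: Optional[Dict[str, str]] = {}


class ToyLLM:
    """Hypothetical LLM wrapper with a per-instance cache switch."""

    def __init__(self, generate: Callable[[str], str], cache: Optional[bool] = None) -> None:
        self._generate = generate
        self.cache = cache  # None means "follow the global setting"

    def __call__(self, prompt: str) -> str:
        # An explicit cache=False disables caching even if a global cache exists.
        use_cache = GLOBAL_CACHE is not None and self.cache is not False
        if use_cache and prompt in GLOBAL_CACHE:
            return GLOBAL_CACHE[prompt]
        response = self._generate(prompt)
        if use_cache:
            GLOBAL_CACHE[prompt] = response
        return response


def make_counter_model() -> Callable[[str], str]:
    """Fake model whose answer changes on every real call."""
    count = 0

    def generate(prompt: str) -> str:
        nonlocal count
        count += 1
        return f"response {count}"

    return generate


cached_llm = ToyLLM(make_counter_model())
uncached_llm = ToyLLM(make_counter_model(), cache=False)

cached_results = [cached_llm("Tell me a joke") for _ in range(2)]
uncached_results = [uncached_llm("Tell me a joke") for _ in range(2)]
```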

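Optional Caching in Chains

We can also turn off caching for particular nodes in a chain. Because of how certain interfaces work, it is often easier to construct the chain first and then edit the LLM afterwards. As an example, we will load a map-reduce summarization chain. We will cache the results of the map step, but not the results of the combine (reduce) step: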

from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter

llm = OpenAI(model_name="text-davinci-002")
no_cache_llm = OpenAI(model_name="text-davinci-002", cache=False)

text_splitter = CharacterTextSplitter()
with open('../../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)
docs = [Document(page_content=t) for t in texts[:3]]

# The map steps use the cached LLM; the reduce step uses the uncached one.
chain = load_summarize_chain(llm, chain_type="map_reduce", reduce_llm=no_cache_llm)

Time the first run:

%%time
chain.run(docs)

Timing output:

CPU times: user 452 ms, sys: 60.3 ms, total: 512 ms
Wall time: 5.09 s

Output:

'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure. In response to Russian aggression in Ukraine, the United States is joining with European allies to impose sanctions and isolate Russia. American forces are being mobilized to protect NATO countries in the event that Putin decides to keep moving west. The Ukrainians are bravely fighting back, but the next few weeks will be hard for them. Putin will pay a high price for his actions in the long run. Americans should not be alarmed, as the United States is taking action to protect its interests and allies.'

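When we run it again, it runs substantially faster, but the final answer is different. This is because the map steps are cached while the reduce step is not.

Time the second run:

%%time
chain.run(docs)

Timing output:

CPU times: user 11.5 ms, sys: 4.33 ms, total: 15.8 ms
Wall time: 1.04 s

Output: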

'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure.'
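The pattern of a faster second run with a different final answer can be reproduced in a small standard-library sketch: the per-chunk map results are served from a cache, while the reduce step is recomputed on every run. The functions `map_step`, `reduce_step`, and `run_chain`, and the sample chunks, are hypothetical stand-ins for the LLM calls in the chain.

```python
from typing import Dict, List

map_cache: Dict[str, str] = {}
calls = {"map": 0, "reduce": 0}  # counts real (uncached) invocations


def map_step(chunk: str) -> str:
    """Cached step: summarize one chunk (toy summary = first sentence)."""
    if chunk in map_cache:
        return map_cache[chunk]
    calls["map"] += 1
    summary = chunk.split(".")[0]
    map_cache[chunk] = summary
    return summary


def reduce_step(summaries: List[str]) -> str:
    """Uncached step: always recomputed, so its output can change between runs."""
    calls["reduce"] += 1
    return f"[run {calls['reduce']}] " + " | ".join(summaries)


def run_chain(chunks: List[str]) -> str:
    return reduce_step([map_step(chunk) for chunk in chunks])


# Made-up chunks standing in for the split document.
chunks = [
    "Jobs plan. Details follow.",
    "Sanctions on Russia. More text.",
    "NATO support. Closing.",
]

first = run_chain(chunks)
second = run_chain(chunks)
```

After both runs, the map step has done real work only three times (once per chunk), while the reduce step has run twice and produced two different strings.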

Finally, remember to clean up the cache database files:

!rm .langchain.db sqlite.db
