Natural Language Processing from Introduction to Application: LangChain: Models - [Large Language Models (LLMs): Caching LLM Call Results]



from langchain.llms import OpenAI

In-Memory Cache

import langchain
from langchain.cache import InMemoryCache

langchain.llm_cache = InMemoryCache()

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

Timing the first call:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
Wall time: 4.83 s

Output:

"\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"

Timing the second call:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Log output:

CPU times: user 238 µs, sys: 143 µs, total: 381 µs
Wall time: 1.76 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
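
The drop in wall time comes from a lookup that happens before the model is called. As a rough illustration only, the idea can be sketched with the standard library: a dictionary keyed by the prompt together with a string describing the model configuration. This is not LangChain's implementation; `ToyInMemoryCache`, `fake_llm`, and `cached_call` are hypothetical names, and `fake_llm` stands in for a real (slow) model call.

```python
from typing import Callable, Dict, Optional, Tuple


class ToyInMemoryCache:
    """Hypothetical dict-backed cache keyed by (prompt, llm_string)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        # Returns None on a cache miss
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, response: str) -> None:
        self._store[(prompt, llm_string)] = response


calls = {"count": 0}  # how many times the fake model was really invoked


def fake_llm(prompt: str) -> str:
    """Stand-in for a slow model call (assumption for illustration)."""
    calls["count"] += 1
    return f"response #{calls['count']} to {prompt!r}"


def cached_call(
    cache: ToyInMemoryCache,
    prompt: str,
    llm_string: str,
    llm: Callable[[str], str],
) -> str:
    """Check the cache first; only call the model on a miss."""
    hit = cache.lookup(prompt, llm_string)
    if hit is not None:
        return hit
    response = llm(prompt)
    cache.update(prompt, llm_string, response)
    return response


cache = ToyInMemoryCache()
first = cached_call(cache, "Tell me a joke", "model=demo", fake_llm)
second = cached_call(cache, "Tell me a joke", "model=demo", fake_llm)  # cache hit
other = cached_call(cache, "Tell me a joke", "model=other", fake_llm)  # different config, miss
```

In this sketch the second call returns the stored response without invoking the model, while the same prompt under a different model configuration is treated as a separate entry.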

SQLite Cache

!rm .langchain.db

# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")

Timing the first call:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
Wall time: 825 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Timing the second call:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Log output:

CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
Wall time: 2.67 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
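
Unlike a purely in-memory cache, a database-backed cache can outlive the Python process when it is stored in a file. The following is a minimal sketch of that idea using only the standard-library `sqlite3` module. The table name `toy_llm_cache` and the helper functions are hypothetical, and the schema is an assumption made for illustration rather than the one LangChain creates. An in-memory database is used here so the example leaves no file behind; passing a file path instead would persist the entries.

```python
import sqlite3
from typing import Optional

# ":memory:" keeps the example self-contained; a file path would persist entries
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS toy_llm_cache ("
    "prompt TEXT NOT NULL, "
    "llm TEXT NOT NULL, "
    "response TEXT NOT NULL, "
    "PRIMARY KEY (prompt, llm))"
)


def lookup(prompt: str, llm_string: str) -> Optional[str]:
    """Return the stored response, or None on a cache miss."""
    row = conn.execute(
        "SELECT response FROM toy_llm_cache WHERE prompt = ? AND llm = ?",
        (prompt, llm_string),
    ).fetchone()
    return row[0] if row else None


def update(prompt: str, llm_string: str, response: str) -> None:
    """Insert or overwrite the cached response for this prompt and model."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO toy_llm_cache (prompt, llm, response) "
            "VALUES (?, ?, ?)",
            (prompt, llm_string, response),
        )


miss_before = lookup("Tell me a joke", "model=demo")  # nothing stored yet
update("Tell me a joke", "model=demo", "cached joke")
hit_after = lookup("Tell me a joke", "model=demo")
```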

Redis Cache

We can also use Redis to cache prompts and responses in the same way:

# (Make sure your local Redis instance is running before you run this example)
from redis import Redis
from langchain.cache import RedisCache

langchain.llm_cache = RedisCache(redis_=Redis())

Timing the first call:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 6.88 ms, sys: 8.75 ms, total: 15.6 ms
Wall time: 1.04 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Timing the second call:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Log output:

CPU times: user 1.59 ms, sys: 610 µs, total: 2.2 ms
Wall time: 5.58 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Semantic Cache

We can also use Redis to cache prompts and responses and decide whether a lookup is a hit based on semantic similarity:

from langchain.embeddings import OpenAIEmbeddings
from langchain.cache import RedisSemanticCache


langchain.llm_cache = RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=OpenAIEmbeddings()
)

Timing the first call:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 351 ms, sys: 156 ms, total: 507 ms
Wall time: 3.37 s

Output:

"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."

Timing the second call:

%%time
# The second time, while not a direct hit, the question is semantically similar to the original question,
# so it uses the cached result!
llm("Tell me one joke")

Log output:

CPU times: user 6.25 ms, sys: 2.72 ms, total: 8.97 ms
Wall time: 262 ms

Output:

"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."

GPTCache

We can use GPTCache for exact-match caching or to cache results based on semantic similarity. Let's start with an exact-match example:

from gptcache import Cache
from gptcache.manager.factory import manager_factory
from gptcache.processor.pre import get_prompt
from langchain.cache import GPTCache
import hashlib

def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()

def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    cache_obj.init(
        pre_embedding_func=get_prompt,
        data_manager=manager_factory(manager="map", data_dir=f"map_cache_{hashed_llm}"),
    )

langchain.llm_cache = GPTCache(init_gptcache)

Timing the first call:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 21.5 ms, sys: 21.3 ms, total: 42.8 ms
Wall time: 6.2 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Timing the second call:

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

Log output:

CPU times: user 571 µs, sys: 43 µs, total: 614 µs
Wall time: 635 µs

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Now let's look at an example of similarity caching:

from gptcache import Cache
from gptcache.adapter.api import init_similar_cache
from langchain.cache import GPTCache
import hashlib

def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()

def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")

langchain.llm_cache = GPTCache(init_gptcache)

Timing the first call:

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

Log output:

CPU times: user 1.42 s, sys: 279 ms, total: 1.7 s
Wall time: 8.44 s

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Timing the second call:

%%time
# This is an exact match, so it finds it in the cache
llm("Tell me a joke")

Log output:

CPU times: user 866 ms, sys: 20 ms, total: 886 ms
Wall time: 226 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Timing the third call:

%%time
# This is not an exact match, but it is semantically within distance, so it hits!
llm("Tell me joke")

Log output:

CPU times: user 853 ms, sys: 14.8 ms, total: 868 ms
Wall time: 224 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
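
To make the notion of a "semantically close" hit concrete, here is a toy sketch of similarity-based lookup. It is not how GPTCache or RedisSemanticCache work internally: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `ToySimilarityCache` with its `threshold` of 0.8 is an assumption chosen for illustration. A stored response is returned only when the cosine similarity between the new prompt and a cached prompt reaches the threshold.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Hypothetical embedding: lowercase word counts (real systems use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySimilarityCache:
    """Returns the closest cached response if it is similar enough."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[Counter, str]] = []

    def update(self, prompt: str, response: str) -> None:
        self._entries.append((toy_embed(prompt), response))

    def lookup(self, prompt: str) -> Optional[str]:
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = ToySimilarityCache(threshold=0.8)
cache.update("Tell me a joke", "Why did the chicken cross the road?")

exact = cache.lookup("Tell me a joke")                  # identical prompt
near = cache.lookup("Tell me joke")                     # similarity about 0.87
miss = cache.lookup("What is the capital of France?")   # no shared words
```

The threshold is the main design choice: set too low, unrelated prompts receive each other's answers; set too high, the cache behaves like an exact-match cache.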

SQLAlchemy Cache

We can use SQLAlchemyCache to cache with any SQL database supported by SQLAlchemy:

# from langchain.cache import SQLAlchemyCache
# from sqlalchemy import create_engine

# engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
# langchain.llm_cache = SQLAlchemyCache(engine)

Custom SQLAlchemy Schemas

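We can define our own declarative SQLAlchemyCache subclass to customize the schema used for caching. For example, to support high-speed full-text prompt indexing in Postgres, we can use: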

from sqlalchemy import Column, Integer, String, Computed, Index, Sequence
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy_utils import TSVectorType
from langchain.cache import SQLAlchemyCache

Base = declarative_base()


class FulltextLLMCache(Base):  # type: ignore
    """Postgres table for fulltext-indexed LLM Cache"""

    __tablename__ = "llm_cache_fulltext"
    id = Column(Integer, Sequence('cache_id'), primary_key=True)
    prompt = Column(String, nullable=False)
    llm = Column(String, nullable=False)
    idx = Column(Integer)
    response = Column(String)
    prompt_tsv = Column(TSVectorType(), Computed("to_tsvector('english', llm || ' ' || prompt)", persisted=True))
    __table_args__ = (
        Index("idx_fulltext_prompt_tsv", prompt_tsv, postgresql_using="gin"),
    )

engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
langchain.llm_cache = SQLAlchemyCache(engine, FulltextLLMCache)

Optional Caching

We can also turn off caching for a specific LLM. In the example below, even though global caching is enabled, we turn it off for one particular LLM:

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2, cache=False)

Timing the first call:

%%time
llm("Tell me a joke")

Log output:

CPU times: user 5.8 ms, sys: 2.71 ms, total: 8.51 ms
Wall time: 745 ms

Output:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Timing the second call:

%%time
llm("Tell me a joke")

Log output:

CPU times: user 4.91 ms, sys: 2.64 ms, total: 7.55 ms
Wall time: 623 ms

Output:

'\n\nTwo guys stole a calendar. They got six months each.'
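
Both calls above take a similar amount of time and return different jokes, which is what one would expect when the cache is bypassed. The interaction between a global cache and a per-instance switch can be sketched as follows. `GLOBAL_CACHE` and `ToyLLM` are hypothetical names, and the logic is an illustrative assumption rather than LangChain's actual code path.

```python
from typing import Dict, Optional

# Hypothetical stand-in for a globally configured cache (None would mean "disabled")
GLOBAL_CACHE: Optional[Dict[str, str]] = {}


class ToyLLM:
    """Fake model wrapper with a per-instance cache switch."""

    def __init__(self, cache: bool = True) -> None:
        self.cache = cache
        self.model_calls = 0

    def _generate(self, prompt: str) -> str:
        # Each real call yields a different answer, like a sampling model
        self.model_calls += 1
        return f"answer {self.model_calls} for {prompt!r}"

    def __call__(self, prompt: str) -> str:
        # The cache is used only if it exists globally AND this instance allows it
        use_cache = self.cache and GLOBAL_CACHE is not None
        if use_cache and prompt in GLOBAL_CACHE:
            return GLOBAL_CACHE[prompt]
        result = self._generate(prompt)
        if use_cache:
            GLOBAL_CACHE[prompt] = result
        return result


cached_llm = ToyLLM()
uncached_llm = ToyLLM(cache=False)

cached_first = cached_llm("Tell me a joke")
cached_second = cached_llm("Tell me a joke")      # served from the global cache

uncached_first = uncached_llm("Tell me a joke")   # ignores the global cache
uncached_second = uncached_llm("Tell me a joke")  # calls the model again
```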

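Optional Caching in Chains

We can also turn off caching for particular nodes in a chain. Note that, because of how certain interfaces work, it is often easier to construct the chain first and then edit the LLM afterwards. As an example, we will load a summarizer map-reduce chain. We will cache the results of the map step, but not freeze the result of the combine step: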

from langchain.chains.mapreduce import MapReduceChain
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter

# The map step uses the cached LLM; the reduce step uses one with caching disabled
llm = OpenAI(model_name="text-davinci-002")
no_cache_llm = OpenAI(model_name="text-davinci-002", cache=False)

# Split the source text and keep the first three chunks as documents
text_splitter = CharacterTextSplitter()
with open('../../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)
docs = [Document(page_content=t) for t in texts[:3]]

chain = load_summarize_chain(llm, chain_type="map_reduce", reduce_llm=no_cache_llm)

Timing the first call:

%%time
chain.run(docs)

Log output:

CPU times: user 452 ms, sys: 60.3 ms, total: 512 ms
Wall time: 5.09 s

Output:

'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure. In response to Russian aggression in Ukraine, the United States is joining with European allies to impose sanctions and isolate Russia. American forces are being mobilized to protect NATO countries in the event that Putin decides to keep moving west. The Ukrainians are bravely fighting back, but the next few weeks will be hard for them. Putin will pay a high price for his actions in the long run. Americans should not be alarmed, as the United States is taking action to protect its interests and allies.'

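When we run it again, we see that it runs substantially faster but the final answer is different. This is because the map steps are cached while the reduce step is not.

Timing the second call:

%%time
chain.run(docs)

Log output:

CPU times: user 11.5 ms, sys: 4.33 ms, total: 15.8 ms
Wall time: 1.04 s

Output: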

'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure.'
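
The behaviour seen in the two runs (faster overall, but a different final summary) can be reproduced in miniature. The sketch below is a toy model of a map-reduce pipeline, not the LangChain chain itself: `toy_map`, `toy_reduce`, and `run_chain` are hypothetical functions, the "summary" is simply the first sentence of each document, and a counter in the reduce step imitates a model that answers differently on each call.

```python
import itertools
from typing import Dict, List

map_cache: Dict[str, str] = {}       # cache used by the map step only
map_calls = {"count": 0}             # real (uncached) map invocations
reduce_counter = itertools.count(1)  # makes every reduce result distinct


def toy_map(doc: str) -> str:
    """Cached map step: 'summarize' a document by taking its first sentence."""
    if doc in map_cache:
        return map_cache[doc]
    map_calls["count"] += 1
    summary = doc.split(".")[0]
    map_cache[doc] = summary
    return summary


def toy_reduce(summaries: List[str]) -> str:
    """Uncached reduce step: recomputed, and different, on every run."""
    return f"[draft {next(reduce_counter)}] " + " | ".join(summaries)


def run_chain(docs: List[str]) -> str:
    return toy_reduce([toy_map(doc) for doc in docs])


docs = ["Alpha is first. More text.", "Beta is second. More text."]
first_run = run_chain(docs)   # map step runs for both documents
second_run = run_chain(docs)  # map step fully cached, reduce step runs again
```

After both runs the map step has executed only once per document, yet the two final results differ because the reduce step was never cached.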

Finally, remember to delete the cache database files:

!rm .langchain.db sqlite.db

