LangChain - LLMs


    • I. LLMs vs. Chat Models
    • II. Getting Started
      • 1. Setting Up OpenAI
      • 2. `__call__`: string in -> string out
      • 3. `generate`: batch calls, richer outputs
    • III. Async API
    • IV. Custom LLM
    • V. Fake LLM
    • VI. Human input LLM
    • VII. Caching (llm_caching)
      • 1. In-Memory Cache
      • 2. SQLite Cache
      • 3. Optional Caching in Chains
    • VIII. Serialization
    • IX. Streaming
    • X. Tracking token usage


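I. LLMs vs. Chat Models


  • LLMs: models that take a text string as input and return a text string.
  • Chat models: models backed by a language model that take a list of chat messages as input and return a chat message.

Rather than accepting a single string as input, chat models accept a list of chat messages.

Each message usually carries a speaker label (typically one of "System", "AI", or "Human").

LLMs and chat models share a common interface. It includes the methods "predict", which takes a string and returns a string, and "predict messages", which takes messages and returns a message.

If you are working with one specific model, it is recommended to use the methods specific to that model class (for example, "predict" for LLMs and "predict messages" for chat models). If your application must work with different types of models, the shared interface can help.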
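The difference between the two input and output shapes can be sketched in plain Python. The classes below (`Message`, `EchoLLM`, `EchoChatModel`) are hypothetical stand-ins written for illustration and are not LangChain classes; they only mirror the string-in/string-out and messages-in/message-out shapes described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """A chat message with a speaker label (hypothetical stand-in type)."""
    role: str      # one of "system", "ai", "human"
    content: str


class EchoLLM:
    """Toy text-completion model: string in, string out."""

    def predict(self, text: str) -> str:
        return f"echo: {text}"


class EchoChatModel:
    """Toy chat model: list of messages in, one message out."""

    def predict_messages(self, messages: List[Message]) -> Message:
        # Reply to the most recent human message.
        last_human = [m for m in messages if m.role == "human"][-1]
        return Message(role="ai", content=f"echo: {last_human.content}")

    def predict(self, text: str) -> str:
        # Shared string interface: wrap the text in a single human message.
        reply = self.predict_messages([Message("human", text)])
        return reply.content


llm_reply = EchoLLM().predict("hi")
chat_reply = EchoChatModel().predict_messages(
    [Message("system", "Be brief."), Message("human", "hi")]
)
print(llm_reply)    # echo: hi
print(chat_reply)   # Message(role='ai', content='echo: hi')
```

Because both toy classes expose `predict`, code that only needs a string answer can accept either one.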




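II. Getting Started

There are many LLM providers (OpenAI, Cohere, Hugging Face, and others). The LLM class is designed to provide a standard interface to all of them.

This tutorial uses the OpenAI LLM wrapper, but the functionality highlighted here is common to all LLM types.

1. Setting Up OpenAI

First, install the OpenAI Python package:

pip install openai

Using the API requires an API key, which you can get by creating an account with OpenAI and generating a key there. Once you have the key, set it as an environment variable by running:

export OPENAI_API_KEY="..."

If you prefer not to set an environment variable, pass the key directly through the openai_api_key named parameter when initializing the OpenAI LLM class: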

from langchain.llms import OpenAI

llm = OpenAI(openai_api_key="...")

Otherwise, with the environment variable set, you can initialize the class without arguments:

from langchain.llms import OpenAI

llm = OpenAI()
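
The two ways of supplying the key amount to a simple lookup order: an explicit argument wins, otherwise the environment variable is used. The helper below, `resolve_api_key`, is a hypothetical name used only for this sketch; it is not how the OpenAI wrapper is implemented.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else fall back to OPENAI_API_KEY."""
    key = explicit_key or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise ValueError("No API key: pass one explicitly or set OPENAI_API_KEY.")
    return key


# Simulate `export OPENAI_API_KEY="..."` for the purposes of this sketch.
os.environ["OPENAI_API_KEY"] = "key-from-environment"

from_env = resolve_api_key()
from_argument = resolve_api_key("key-from-argument")
print(from_env)
print(from_argument)
```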

2. __call__: string in -> string out

The simplest way to use an LLM is to call it directly: pass in a string and get a string completion back.

llm("Tell me a joke")
# -> 'Why did the chicken cross the road?\n\nTo get to the other side.'

3. generate: batch calls, richer outputs

generate lets you call the model with a list of strings and returns a more complete response than just the text.

This complete response can include things like multiple top responses and other LLM provider-specific information:

llm_result = llm.generate(["Tell me a joke", "Tell me a poem"]*15)

len(llm_result.generations)
# -> 30

llm_result.generations[0]

    [Generation(text='\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'),
     Generation(text='\n\nWhy did the chicken cross the road?\n\nTo get to the other side.')]

llm_result.generations[-1]

    [Generation(text="\n\nWhat if love neverspeech\n\nWhat if love never ended\n\nWhat if love was only a feeling\n\nI'll never know this love\n\nIt's not a feeling\n\nBut it's what we have for each other\n\nWe just know that love is something strong\n\nAnd we can't help but be happy\n\nWe just feel what love is for us\n\nAnd we love each other with all our heart\n\nWe just don't know how\n\nHow it will go\n\nBut we know that love is something strong\n\nAnd we'll always have each other\n\nIn our lives."),
     Generation(text='\n\nOnce upon a time\n\nThere was a love so pure and true\n\nIt lasted for centuries\n\nAnd never became stale or dry\n\nIt was moving and alive\n\nAnd the heart of the love-ick\n\nIs still beating strong and true.')]

llm_result.llm_output


    {'token_usage': {'completion_tokens': 3903,
      'total_tokens': 4023,
      'prompt_tokens': 120}}
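
The shape of the batch result can be checked with ordinary Python data. The snippet below uses plain lists and a dict as a stand-in for the result shown above (it is not a real LLMResult, and the completion strings are placeholders); the token counts are copied from the output above.

```python
# Plain-Python stand-in for the batch result shown above.
prompts = ["Tell me a joke", "Tell me a poem"] * 15

# One inner list per prompt, holding that prompt's candidate completions.
generations = [
    [f"completion A for {p!r}", f"completion B for {p!r}"] for p in prompts
]
token_usage = {"completion_tokens": 3903, "prompt_tokens": 120, "total_tokens": 4023}

print(len(generations))      # 30, one entry per prompt
print(len(generations[0]))   # 2 candidate completions for the first prompt

# The reported totals are consistent: prompt + completion == total.
usage_sum = token_usage["prompt_tokens"] + token_usage["completion_tokens"]
print(usage_sum)             # 4023
```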

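III. Async API

You can call the OpenAI LLM asynchronously with the agenerate method. This is useful for running several calls concurrently, since the calls are network-bound.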

import time
import asyncio

from langchain.llms import OpenAI

def generate_serially():
    llm = OpenAI(temperature=0.9)
    for _ in range(10):
        resp = llm.generate(["Hello, how are you?"])
        print(resp.generations[0][0].text)


async def async_generate(llm):
    resp = await llm.agenerate(["Hello, how are you?"])
    print(resp.generations[0][0].text)

async def generate_concurrently():
    llm = OpenAI(temperature=0.9)
    tasks = [async_generate(llm) for _ in range(10)]
    await asyncio.gather(*tasks)

s = time.perf_counter()
# If running this outside of Jupyter, use asyncio.run(generate_concurrently())
await generate_concurrently()
elapsed = time.perf_counter() - s
print("\033[1m" + f"Concurrent executed in {elapsed:0.2f} seconds." + "\033[0m")

s = time.perf_counter()
generate_serially()
elapsed = time.perf_counter() - s
print("\033[1m" + f"Serial executed in {elapsed:0.2f} seconds." + "\033[0m")

    I'm doing well, thank you. How about you?

    I'm doing well, thank you. How about you?

    I'm doing well, thanks for asking. How about you?
    Serial executed in 5.77 seconds.

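The effect of running calls concurrently can be reproduced without an API key. In this sketch, `fake_agenerate` is a hypothetical stand-in that sleeps instead of calling a provider; only the scheduling pattern (sequential `await` versus `asyncio.gather`) corresponds to the example above.

```python
import asyncio
import time


async def fake_agenerate(prompt: str, delay: float = 0.1) -> str:
    """Stand-in for an async LLM call; sleeps instead of hitting an API."""
    await asyncio.sleep(delay)
    return f"response to {prompt!r}"


async def run_serially(n: int) -> list:
    # Each call finishes before the next one starts.
    return [await fake_agenerate("Hello, how are you?") for _ in range(n)]


async def run_concurrently(n: int) -> list:
    # All calls are started together and awaited as a group.
    tasks = [fake_agenerate("Hello, how are you?") for _ in range(n)]
    return await asyncio.gather(*tasks)


def timed(coro_factory, n: int):
    start = time.perf_counter()
    results = asyncio.run(coro_factory(n))
    return results, time.perf_counter() - start


serial_results, serial_time = timed(run_serially, 5)
concurrent_results, concurrent_time = timed(run_concurrently, 5)
print(f"Serial: {serial_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

With five simulated calls of 0.1 seconds each, the serial run takes roughly the sum of the delays, while the concurrent run takes roughly one delay.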
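IV. Custom LLM

A custom LLM needs to implement only one thing:

  • A _call method that takes a string and some optional stop words, and returns a string.

It can optionally implement a second:

  • An _identifying_params property, used to help print information about the class. It should return a dictionary.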


from typing import Any, List, Mapping, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM


class CustomLLM(LLM):
    n: int  # number of leading characters of the prompt to return

    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        return prompt[: self.n]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"n": self.n}

We can now use this like any other LLM.

llm = CustomLLM(n=10)

llm("This is a foobar thing")
# -> 'This is a '

We can also print the LLM to see its identifying parameters:

print(llm)
    Params: {'n': 10}
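
The custom LLM above rejects stop words outright. One alternative, sketched here with plain string handling, is to cut the generated text at the earliest stop sequence. The helper name `truncate_at_stop` is hypothetical and is not part of the example class or of LangChain.

```python
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Cut `text` just before the earliest occurrence of any stop sequence."""
    if not stop:
        return text
    cut = len(text)
    for token in stop:
        if not token:
            continue  # ignore empty stop strings
        idx = text.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


truncated = truncate_at_stop("This is a foobar thing", stop=["foo"])
untouched = truncate_at_stop("This is a foobar thing", stop=["xyz"])
print(repr(truncated))  # 'This is a '
```

A `_call` implementation could apply such a helper to its output instead of raising an error when `stop` is provided.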

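V. Fake LLM

FakeListLLM is a fake LLM class for testing. It returns a fixed list of responses in order, so you can simulate how a chain or agent behaves when the LLM responds in a particular way.
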
from langchain.llms.fake import FakeListLLM
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
tools = load_tools(["python_repl"])

responses = ["Action: Python REPL\nAction Input: print(2 + 2)", "Final Answer: 4"]
llm = FakeListLLM(responses=responses)

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

agent.run("whats 2 + 2")
> Entering new AgentExecutor chain...
Action: Python REPL
Action Input: print(2 + 2)
Observation: Python REPL is not a valid tool, try another one.
Thought:Final Answer: 4

> Finished chain.
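
The idea behind a list-based fake model fits in a few lines. `ReplayLLM` below is a hypothetical class written for this sketch, not the FakeListLLM implementation: it returns canned responses in order and records the prompts it received, which is often what a test wants to assert on.

```python
from typing import List


class ReplayLLM:
    """Hypothetical stand-in that returns canned responses in order."""

    def __init__(self, responses: List[str]) -> None:
        self.responses = responses
        self.calls = 0
        self.prompts: List[str] = []

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)  # record what the caller asked
        response = self.responses[self.calls]
        self.calls += 1
        return response


replay_llm = ReplayLLM(
    ["Action: Python REPL\nAction Input: print(2 + 2)", "Final Answer: 4"]
)
first = replay_llm("whats 2 + 2")
second = replay_llm("Observation: 4")
print(second)  # Final Answer: 4
```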

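VI. Human input LLM

HumanInputLLM is a pseudo-LLM that shows each prompt to a human and uses the typed reply as the model's response, which is useful for testing and debugging.

We start by using HumanInputLLM in an agent.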

from langchain.llms.human import HumanInputLLM

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

Since we will use the WikipediaQueryRun tool in this notebook, you might need to install the wikipedia package if you haven't done so already.

!pip install wikipedia

tools = load_tools(["wikipedia"])
llm = HumanInputLLM(
    prompt_func=lambda prompt: print(
        f"\n===PROMPT====\n{prompt}\n=====END OF PROMPT======"
    )
)

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

agent.run("What is 'Bocchi the Rock!'?")
> Entering new AgentExecutor chain...

===PROMPT====
Answer the following questions as best you can. You have access to the following tools:

Wikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Wikipedia]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question


Question: What is 'Bocchi the Rock!'?
=====END OF PROMPT======

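VII. Caching (llm_caching)

LangChain provides an optional caching layer for LLMs. It is useful for two reasons:

  • It can save money by reducing the number of API calls you make to the LLM provider, if you often request the same completion multiple times.
  • It can speed up your application by reducing the number of API calls you make to the LLM provider.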

import langchain
from langchain.llms import OpenAI

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

1. In-Memory Cache

from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
    CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
    Wall time: 4.83 s

    "\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")
    CPU times: user 238 µs, sys: 143 µs, total: 381 µs
    Wall time: 1.76 ms

    '\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
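
Why the second call returns in milliseconds can be seen in a minimal dictionary-backed cache. This is a sketch under assumptions, not LangChain's InMemoryCache: `DictCache`, `slow_model`, and `cached_call` are hypothetical names, and the cache key is assumed to be the pair of prompt and model settings.

```python
from typing import Dict, Optional, Tuple


class DictCache:
    """Minimal in-memory cache keyed by (prompt, model settings)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup(self, prompt: str, settings: str) -> Optional[str]:
        return self._store.get((prompt, settings))

    def update(self, prompt: str, settings: str, response: str) -> None:
        self._store[(prompt, settings)] = response


cache = DictCache()
api_calls = {"count": 0}  # counts how often the "provider" is actually hit


def slow_model(prompt: str) -> str:
    """Stand-in for a paid, slow API call."""
    api_calls["count"] += 1
    return f"completion for {prompt!r}"


def cached_call(prompt: str, settings: str = "n=2,best_of=2") -> str:
    hit = cache.lookup(prompt, settings)
    if hit is not None:
        return hit                      # served from memory, no API call
    result = slow_model(prompt)
    cache.update(prompt, settings, result)
    return result


first = cached_call("Tell me a joke")
second = cached_call("Tell me a joke")
print(api_calls["count"])  # 1
```

Including the settings in the key means that the same prompt sent with different parameters is not served a stale completion.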

2. SQLite Cache

!rm .langchain.db

# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
    CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
    Wall time: 825 ms

    '\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")
    CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
    Wall time: 2.67 ms

    '\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
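
A cache backed by SQLite follows the same lookup-then-update pattern, with the advantage that entries survive a restart when a file path is used. The class below is a sketch built on Python's standard `sqlite3` module; the table layout and the name `SqliteCacheSketch` are assumptions made for illustration and do not reflect the schema SQLiteCache uses.

```python
import sqlite3
from typing import Optional


class SqliteCacheSketch:
    """Hypothetical persistent cache keyed by (prompt, settings)."""

    def __init__(self, path: str = ":memory:") -> None:
        # ":memory:" keeps this example self-contained; pass a file path to persist.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache ("
            "prompt TEXT, settings TEXT, response TEXT, "
            "PRIMARY KEY (prompt, settings))"
        )

    def lookup(self, prompt: str, settings: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT response FROM llm_cache WHERE prompt = ? AND settings = ?",
            (prompt, settings),
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, settings: str, response: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO llm_cache VALUES (?, ?, ?)",
            (prompt, settings, response),
        )
        self.conn.commit()


sql_cache = SqliteCacheSketch()
miss = sql_cache.lookup("Tell me a joke", "n=2")
sql_cache.update("Tell me a joke", "n=2", "Why did the chicken cross the road?")
hit = sql_cache.lookup("Tell me a joke", "n=2")
print(miss, hit)
```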

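3. Optional Caching in Chains

You can also turn off caching for particular nodes in a chain. Note that, because of certain interfaces, it is often easier to construct the chain first and then edit the LLM afterwards.

As an example, we load a map-reduce summarization chain that caches results for the map step but not for the reduce step.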

llm = OpenAI(model_name="text-davinci-002")
no_cache_llm = OpenAI(model_name="text-davinci-002", cache=False)

from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.docstore.document import Document
from langchain.chains.summarize import load_summarize_chain

text_splitter = CharacterTextSplitter()

with open('../../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)
docs = [Document(page_content=t) for t in texts[:3]]

chain = load_summarize_chain(llm, chain_type="map_reduce", reduce_llm=no_cache_llm)

%%time
chain.run(docs)

    CPU times: user 452 ms, sys: 60.3 ms, total: 512 ms
    Wall time: 5.09 s

    '\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure. In response to Russian aggression in Ukraine, the United States is joining with European allies to impose sanctions and isolate Russia. American forces are being mobilized to protect NATO countries in the event that Putin decides to keep moving west. The Ukrainians are bravely fighting back, but the next few weeks will be hard for them. Putin will pay a high price for his actions in the long run. Americans should not be alarmed, as the United States is taking action to protect its interests and allies.'

When we run it again, it runs substantially faster, but the final answer is different. This is because results are cached at the map steps but not at the reduce step.

%%time
chain.run(docs)



    CPU times: user 11.5 ms, sys: 4.33 ms, total: 15.8 ms
    Wall time: 1.04 s

    '\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure.'
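
!rm .langchain.db sqlite.db

VIII. Serialization

An LLM configuration can be written to and read from disk in two formats, JSON or YAML. Whatever the extension, a saved configuration is loaded in the same way.

from langchain.llms import OpenAI
from langchain.llms.loading import load_llm
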
!cat llm.json
    {
        "model_name": "text-davinci-003",
        "temperature": 0.7,
        "max_tokens": 256,
        "top_p": 1.0,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
        "n": 1,
        "best_of": 1,
        "request_timeout": null,
        "_type": "openai"
    }

llm = load_llm("llm.json")

!cat llm.yaml
_type: openai
best_of: 1
frequency_penalty: 0.0
max_tokens: 256
model_name: text-davinci-003
n: 1
presence_penalty: 0.0
request_timeout: null
temperature: 0.7
top_p: 1.0

llm = load_llm("llm.yaml")


If you want to go from an LLM in memory to a serialized version of it, you can do so easily by calling the .save method. Again, this supports both json and yaml.

llm.save("llm.json")
llm.save("llm.yaml")
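
The saved file is ordinary JSON, so the round trip can be illustrated with the standard library alone. This sketch writes a subset of the configuration shown above to a temporary file and reads it back; it does not call LangChain's save or load_llm. That the "_type" field selects which LLM class to rebuild is an assumption based on its value in the file above.

```python
import json
import os
import tempfile

# A subset of the fields shown in llm.json above.
config = {
    "model_name": "text-davinci-003",
    "temperature": 0.7,
    "max_tokens": 256,
    "request_timeout": None,   # written to JSON as null
    "_type": "openai",         # assumed to identify the LLM class on load
}

path = os.path.join(tempfile.mkdtemp(), "llm.json")
with open(path, "w") as f:
    json.dump(config, f, indent=4)

with open(path) as f:
    loaded = json.load(f)

print(loaded == config)  # True
```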


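IX. Streaming

Some LLMs provide a streaming response. This means that instead of waiting for the entire response to be returned, you can start processing it as soon as it is available.

Currently, streaming is supported for the OpenAI, ChatOpenAI, and ChatAnthropic implementations.

To use streaming, pass a CallbackHandler that implements on_llm_new_token.

This example uses StreamingStdOutCallbackHandler.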

from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = llm("Write me a song about sparkling water.")
    Verse 1
    I'm sippin' on sparkling water,
    It's so refreshing and light,
    It's the perfect way to quench my thirst
    On a hot summer night.
    Sparkling water, sparkling water,
    It's the best way to stay hydrated,
    It's so crisp and so clean,
    It's the perfect way to stay refreshed.
    Verse 2
    I'm sippin' on sparkling water,
    It's so bubbly and bright,
    It's the perfect way to cool me down
    On a hot summer night.
    Sparkling water, sparkling water,
    It's the best way to stay hydrated,
    It's so crisp and so clean,
    It's the perfect way to stay refreshed.
    Verse 3
    I'm sippin' on sparkling water,
    It's so light and so clear,
    It's the perfect way to keep me cool
    On a hot summer night.
    Sparkling water, sparkling water,
    It's the best way to stay hydrated,
    It's so crisp and so clean,
    It's the perfect way to stay refreshed.

If you use generate, you still have access to the final LLMResult.

However, token_usage is not currently supported for streaming.

llm.generate(["Tell me a joke."])
    Q: What did the fish say when it hit the wall?
    A: Dam!

    LLMResult(generations=[[Generation(text='\n\nQ: What did the fish say when it hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}, 'model_name': 'text-davinci-003'})
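
The callback mechanism used in this section can be pictured as a model that hands each token to a handler as soon as it is produced and still returns the full text at the end. Both `CollectingHandler` and `stream_completion` are hypothetical names for this sketch; only the method name on_llm_new_token comes from the text above, and the token list is made up.

```python
from typing import Iterable, List


class CollectingHandler:
    """Hypothetical handler: receives each token as soon as it is produced."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def on_llm_new_token(self, token: str) -> None:
        self.tokens.append(token)
        print(token, end="", flush=True)  # show partial output immediately


def stream_completion(tokens: Iterable[str], handler: CollectingHandler) -> str:
    """Stand-in for a streaming model: emit tokens one by one, then return all."""
    pieces = []
    for token in tokens:
        handler.on_llm_new_token(token)
        pieces.append(token)
    return "".join(pieces)


handler = CollectingHandler()
full_text = stream_completion(
    ["Q: ", "What ", "did ", "the ", "fish ", "say?"], handler
)
print()  # finish the streamed line
```

The caller gets incremental output through the handler and the complete string as the return value, which mirrors how the final result remains available after streaming.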

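X. Tracking token usage

This section shows how to track token usage for specific calls. It is currently implemented only for the OpenAI API.

Let's first look at a very simple example that tracks token usage for a single LLM call.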

from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    print(cb)

Tokens Used: 42
    Prompt Tokens: 4
    Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $0.00084


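Anything inside the context manager is tracked. Here is an example that tracks multiple calls in sequence:

with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    result2 = llm("Tell me a joke")
    print(cb.total_tokens)
# -> 91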

If a chain or agent with multiple steps in it is used, it will track all those steps.

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

with get_openai_callback() as cb:
    response = agent.run(
        "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"
    )
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")


