Large Models from Basics to Applications: LangChain Callbacks

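LangChain provides a callback system that lets us hook into the various stages of an LLM application. This is useful for logging, monitoring, streaming, and other tasks. We subscribe to these events through the callbacks argument available throughout the API. The argument is a list of handler objects, each of which is expected to implement one or more of the methods described in more detail below. There are two main callback mechanisms:

Constructor callbacks are used for every call made on that object and are scoped to that object only. For example, a handler passed to the LLMChain constructor is not used by the model attached to that chain.
Request callbacks are used for that specific request only, together with all the sub-requests it contains (for example, a call to an LLMChain triggers a call to the model, which uses the same handler that was passed in). These callbacks are passed explicitly.

When we create a custom chain, it is easy to set it up to use the same callback system as all the built-in chains. The _call, _generate, and _run methods, and their async equivalents, on chains, LLMs, chat models, agents, and tools now receive a second argument named run_manager. It is bound to that run and contains the logging methods that the object can use (for example, on_llm_new_token). This is useful when building a custom chain; see also the guide on creating custom chains and using callbacks inside them for more information.

CallbackHandlers are objects that implement the CallbackHandler interface, which has one method for each event that can be subscribed to. When an event is triggered, the CallbackManager calls the corresponding method on each handler.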

from typing import Any, Dict, List, Union

from langchain.schema import AgentAction, AgentFinish, BaseMessage, LLMResult


class BaseCallbackHandler:
    """Base callback handler that can be used to handle callbacks from LangChain."""

    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> Any:
        """Run when the LLM starts running."""

    def on_chat_model_start(
        self, serialized: Dict[str, Any], messages: List[List[BaseMessage]], **kwargs: Any
    ) -> Any:
        """Run when the chat model starts running."""

    def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:
        """Run on a new LLM token. Only available when streaming is enabled."""

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> Any:
        """Run when the LLM ends running."""

    def on_llm_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> Any:
        """Run when the LLM errors."""

    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
    ) -> Any:
        """Run when the chain starts running."""

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> Any:
        """Run when the chain ends running."""

    def on_chain_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> Any:
        """Run when the chain errors."""

    def on_tool_start(
        self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
    ) -> Any:
        """Run when the tool starts running."""

    def on_tool_end(self, output: str, **kwargs: Any) -> Any:
        """Run when the tool ends running."""

    def on_tool_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> Any:
        """Run when the tool errors."""

    def on_text(self, text: str, **kwargs: Any) -> Any:
        """Run on arbitrary text."""

    def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
        """Run on an agent action."""

    def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> Any:
        """Run when the agent finishes."""
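
The dispatch idea described above (a manager calling the matching method on each handler) can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: ToyHandler, ToyCallbackManager, and dispatch are hypothetical names, and the getattr lookup is an assumption made here so that handlers only need to define the events they care about.

```python
from typing import Any, List


class ToyHandler:
    """Hypothetical handler: implements only the events it cares about."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.seen: List[str] = []

    def on_llm_start(self, prompts: List[str], **kwargs: Any) -> None:
        self.seen.append(f"start:{prompts[0]}")

    def on_llm_end(self, response: str, **kwargs: Any) -> None:
        self.seen.append(f"end:{response}")


class ToyCallbackManager:
    """Hypothetical manager: forwards an event to every handler that defines it."""

    def __init__(self, handlers: List[Any]) -> None:
        self.handlers = handlers

    def dispatch(self, event: str, *args: Any, **kwargs: Any) -> None:
        for handler in self.handlers:
            method = getattr(handler, event, None)
            if callable(method):  # skip handlers that do not define this event
                method(*args, **kwargs)


first, second = ToyHandler("first"), ToyHandler("second")
manager = ToyCallbackManager([first, second])
manager.dispatch("on_llm_start", ["1 + 2 = "])
manager.dispatch("on_llm_end", "3")
manager.dispatch("on_tool_start", "ignored")  # no handler defines this event
print(first.seen)
```

Both handlers receive the same two events in order, and the on_tool_start event is silently skipped because neither handler defines it.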

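Using Callbacks

The callbacks argument is available on most objects throughout the API (chains, models, tools, agents, and so on), in two different places:

Constructor callbacks: defined in the constructor, for example LLMChain(callbacks=[handler], tags=['a-tag']). They are used for every call made on that object and are scoped to that object only. For example, a handler passed to the LLMChain constructor is not used by the model attached to that chain.
Request callbacks: defined in the call(), run(), or apply() method used to issue a request, for example chain.call(inputs, callbacks=[handler], tags=['a-tag']). They are used for that specific request only, together with all the sub-requests it contains (for example, a call to an LLMChain triggers a call to the model, which uses the same handler that was passed to call()).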
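
The difference in scope between the two mechanisms can be sketched with toy classes. Everything here is hypothetical (ToyModel, ToyChain, and the use of plain lists as "handlers" that collect event names); it only mirrors the propagation rule described above and is not how LangChain is implemented.

```python
from typing import List, Optional


class ToyModel:
    """Hypothetical model; each 'handler' is just a list that collects events."""

    def __init__(self, callbacks: Optional[List[list]] = None) -> None:
        self.callbacks = callbacks or []

    def call(self, prompt: str, callbacks: Optional[List[list]] = None) -> str:
        for log in self.callbacks + (callbacks or []):
            log.append(f"model:{prompt}")
        return prompt.upper()


class ToyChain:
    """Hypothetical chain that wraps a model."""

    def __init__(self, model: ToyModel, callbacks: Optional[List[list]] = None) -> None:
        self.model = model
        self.callbacks = callbacks or []

    def call(self, text: str, callbacks: Optional[List[list]] = None) -> str:
        request_callbacks = callbacks or []
        for log in self.callbacks + request_callbacks:
            log.append(f"chain:{text}")
        # Only request callbacks are forwarded to the nested model call;
        # constructor callbacks stay scoped to the chain itself.
        return self.model.call(text, callbacks=request_callbacks)


constructor_log: list = []
request_log: list = []
chain = ToyChain(ToyModel(), callbacks=[constructor_log])
chain.call("hi", callbacks=[request_log])
print(constructor_log)  # chain event only
print(request_log)      # chain event, then model event
```

The constructor handler sees only the chain's own event, while the request handler also sees the nested model call.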

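The verbose argument is available as a constructor argument on most objects throughout the API (chains, models, tools, agents, and so on), for example LLMChain(verbose=True). It is equivalent to passing a ConsoleCallbackHandler to the callbacks argument of that object and all of its child objects. This is useful for debugging, because it logs all events to the console.

Constructor callbacks are best suited to use cases that are not tied to a single request, such as logging and monitoring. For example, to log all requests sent to an LLMChain, pass a handler to the constructor.
Request callbacks are best suited to use cases such as streaming, where the output of a single request is streamed to a specific WebSocket connection, or similar cases. For example, to stream the output of a single request to a WebSocket, pass a handler to the call() method.

We can tag callbacks by passing a tags argument to the call(), run(), or apply() methods. This is useful for filtering logs: for example, to log all requests sent to a specific LLMChain, add a tag and then filter the logs by that tag. Tags can be passed to both constructor and request callbacks; see the examples above. The tags are then passed to the tags argument of the start callback methods (on_llm_start, on_chat_model_start, on_chain_start, on_tool_start).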
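
Filtering by tag inside a handler might look like the sketch below. TagFilterHandler is a hypothetical, duck-typed stand-in written without LangChain so that it runs on its own; in real use it would presumably subclass BaseCallbackHandler, and the only behaviour taken from the text above is that start methods receive a tags argument.

```python
from typing import Any, Dict, List, Optional


class TagFilterHandler:
    """Hypothetical handler that records chain inputs only for one tag."""

    def __init__(self, wanted_tag: str) -> None:
        self.wanted_tag = wanted_tag
        self.records: List[Dict[str, Any]] = []

    def on_chain_start(
        self,
        serialized: Dict[str, Any],
        inputs: Dict[str, Any],
        *,
        tags: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> None:
        # Ignore runs that were not started with the tag we care about.
        if self.wanted_tag in (tags or []):
            self.records.append(inputs)


tag_handler = TagFilterHandler("a-tag")
# Simulate three chain starts with different tags.
tag_handler.on_chain_start({"name": "LLMChain"}, {"number": 2}, tags=["a-tag"])
tag_handler.on_chain_start({"name": "LLMChain"}, {"number": 3}, tags=["other"])
tag_handler.on_chain_start({"name": "LLMChain"}, {"number": 4})
print(tag_handler.records)
```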

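Using an Existing Handler

LangChain provides a few built-in handlers to get started with. They are available in the langchain.callbacks module. The most basic one is StdOutCallbackHandler, which simply logs all events to stdout. More default handlers are expected to be added to the library in the future.

When the verbose flag on an object is set to True, the StdOutCallbackHandler is invoked even if it was not passed explicitly.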


from langchain.callbacks import StdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

handler = StdOutCallbackHandler()
llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

# First, let's explicitly set the StdOutCallbackHandler in `callbacks`
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])
chain.run(number=2)

# Then, let's use the `verbose` flag to achieve the same result
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
chain.run(number=2)

# Finally, let's use the request `callbacks` to achieve the same result
chain = LLMChain(llm=llm, prompt=prompt)
chain.run(number=2, callbacks=[handler])

Log output:

Entering new LLMChain chain...
Prompt after formatting:
1 + 2 = 

Finished chain.


Entering new LLMChain chain...
Prompt after formatting:
1 + 2 = 

Finished chain.


Entering new LLMChain chain...
Prompt after formatting:
1 + 2 = 

Finished chain.

Output:

'\n\n3'

Creating a Custom Handler

We can also create a custom handler and set it on an object. In the example below, we implement streaming with a custom handler.


from langchain.callbacks.base import BaseCallbackHandler
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

class MyCustomHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"My custom handler, token: {token}")

# To enable streaming, we pass in `streaming=True` to the ChatModel constructor
# Additionally, we pass in a list with our custom handler
chat = ChatOpenAI(max_tokens=25, streaming=True, callbacks=[MyCustomHandler()])

chat([HumanMessage(content="Tell me a joke")])

Log output:

My custom handler, token: 
My custom handler, token: Why
My custom handler, token:  did
My custom handler, token:  the
My custom handler, token:  tomato
My custom handler, token:  turn
My custom handler, token:  red
My custom handler, token: ?
My custom handler, token:  Because
My custom handler, token:  it
My custom handler, token:  saw
My custom handler, token:  the
My custom handler, token:  salad
My custom handler, token:  dressing
My custom handler, token: !
My custom handler, token:

Output:

AIMessage(content='Why did the tomato turn red? Because it saw the salad dressing!', additional_kwargs={})

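Async Callbacks

If we plan to use the async API, it is recommended to use AsyncCallbackHandler to avoid blocking the run loop. A synchronous CallbackHandler still works when an LLM, Chain, Tool, or Agent is run through its async methods. Internally, however, it is invoked with run_in_executor, which can cause problems if the CallbackHandler is not thread-safe.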


import asyncio
from typing import Any, Dict, List
from langchain.callbacks.base import AsyncCallbackHandler, BaseCallbackHandler
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, LLMResult

class MyCustomSyncHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"Sync handler being called in a `thread_pool_executor`: token: {token}")

class MyCustomAsyncHandler(AsyncCallbackHandler):
    """Async callback handler that can be used to handle callbacks from langchain."""

    async def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
        """Run when the LLM starts running."""
        print("zzzz....")
        await asyncio.sleep(0.3)
        class_name = serialized["name"]
        print("Hi! I just woke up. Your llm is starting")

    async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Run when the LLM ends running."""
        print("zzzz....")
        await asyncio.sleep(0.3)
        print("Hi! I just woke up. Your llm is ending")

# To enable streaming, we pass in `streaming=True` to the ChatModel constructor
# Additionally, we pass in a list with our custom handler
chat = ChatOpenAI(max_tokens=25, streaming=True, callbacks=[MyCustomSyncHandler(), MyCustomAsyncHandler()])

await chat.agenerate([[HumanMessage(content="Tell me a joke")]])

Log output:

zzzz....
Hi! I just woke up. Your llm is starting
Sync handler being called in a `thread_pool_executor`: token: 
Sync handler being called in a `thread_pool_executor`: token: Why
Sync handler being called in a `thread_pool_executor`: token:  don
Sync handler being called in a `thread_pool_executor`: token: 't
Sync handler being called in a `thread_pool_executor`: token:  scientists
Sync handler being called in a `thread_pool_executor`: token:  trust
Sync handler being called in a `thread_pool_executor`: token:  atoms
Sync handler being called in a `thread_pool_executor`: token: ?


Sync handler being called in a `thread_pool_executor`: token: Because
Sync handler being called in a `thread_pool_executor`: token:  they
Sync handler being called in a `thread_pool_executor`: token:  make
Sync handler being called in a `thread_pool_executor`: token:  up
Sync handler being called in a `thread_pool_executor`: token:  everything
Sync handler being called in a `thread_pool_executor`: token: !
Sync handler being called in a `thread_pool_executor`: token: 
zzzz....
Hi! I just woke up. Your llm is ending

Output:

LLMResult(generations=[[ChatGeneration(text="Why don't scientists trust atoms?\n\nBecause they make up everything!", generation_info=None, message=AIMessage(content="Why don't scientists trust atoms?\n\nBecause they make up everything!", additional_kwargs={}))]], llm_output={'token_usage': {}, 'model_name': 'gpt-3.5-turbo'})
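
The run_in_executor behaviour mentioned in this section can be illustrated with the standard library alone. This sketch assumes nothing about LangChain internals: SyncTokenHandler and emit_tokens are hypothetical names, and the point is only that a synchronous method handed to the default executor runs on a worker thread rather than on the event loop's thread, which is why thread safety matters.

```python
import asyncio
import threading
from typing import List


class SyncTokenHandler:
    """Hypothetical synchronous handler that records which thread called it."""

    def __init__(self) -> None:
        self.thread_ids: List[int] = []

    def on_llm_new_token(self, token: str) -> None:
        self.thread_ids.append(threading.get_ident())


async def emit_tokens(handler: SyncTokenHandler, tokens: List[str]) -> int:
    loop = asyncio.get_running_loop()
    for token in tokens:
        # Run the blocking handler on the default thread pool executor.
        await loop.run_in_executor(None, handler.on_llm_new_token, token)
    # Return the id of the thread that runs the event loop.
    return threading.get_ident()


sync_handler = SyncTokenHandler()
loop_thread_id = asyncio.run(emit_tokens(sync_handler, ["Why", " did", " the"]))
ran_off_loop = all(tid != loop_thread_id for tid in sync_handler.thread_ids)
print(ran_off_loop)
```

The script uses asyncio.run because it is meant to run as a standalone file; inside a notebook, where a loop is already running, the coroutine would be awaited directly instead.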

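Using Multiple Handlers and Passing Handlers In

In the previous examples, we passed callback handlers when creating an object, using callbacks=. In that case, the callbacks are scoped to that particular object.

In many cases, however, it is more advantageous to pass handlers in when running the object. When CallbackHandlers are passed through the callbacks keyword argument while executing a run, those callbacks are issued by all the nested objects involved in the execution. For example, a handler passed to an Agent is used for all callbacks related to that Agent and to every object involved in its execution, such as Tools, LLMChain, and LLM. This avoids having to attach the handlers manually to each individual nested object.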


from typing import Dict, Union, Any, List

from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import AgentAction
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

# First, define custom callback handler implementations
class MyCustomHandlerOne(BaseCallbackHandler):
    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> Any:
        print(f"on_llm_start {serialized['name']}")

    def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:
        print(f"on_new_token {token}")

    def on_llm_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> Any:
        """Run when LLM errors."""

    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
    ) -> Any:
        print(f"on_chain_start {serialized['name']}")

    def on_tool_start(
        self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
    ) -> Any:
        print(f"on_tool_start {serialized['name']}")

    def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
        print(f"on_agent_action {action}")

class MyCustomHandlerTwo(BaseCallbackHandler):
    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> Any:
        print(f"on_llm_start (I'm the second handler!!) {serialized['name']}")

# Instantiate the handlers
handler1 = MyCustomHandlerOne()
handler2 = MyCustomHandlerTwo()

# Setup the agent. Only the `llm` will issue callbacks for handler2
llm = OpenAI(temperature=0, streaming=True, callbacks=[handler2])
tools = load_tools(["llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION
)

# Callbacks for handler1 will be issued by every object involved in the 
# Agent execution (llm, llmchain, tool, agent executor)
agent.run("What is 2 raised to the 0.235 power?", callbacks=[handler1])

Log output:

on_chain_start AgentExecutor
on_chain_start LLMChain
on_llm_start OpenAI
on_llm_start (I'm the second handler!!) OpenAI
on_new_token  I
on_new_token  need
on_new_token  to
on_new_token  use
on_new_token  a
on_new_token  calculator
on_new_token  to
on_new_token  solve
on_new_token  this
on_new_token .
on_new_token 
Action
on_new_token :
on_new_token  Calculator
on_new_token 
Action
on_new_token  Input
on_new_token :
on_new_token  2
on_new_token ^
on_new_token 0
on_new_token .
on_new_token 235
on_new_token 
on_agent_action AgentAction(tool='Calculator', tool_input='2^0.235', log=' I need to use a calculator to solve this.\nAction: Calculator\nAction Input: 2^0.235')
on_tool_start Calculator
on_chain_start LLMMathChain
on_chain_start LLMChain
on_llm_start OpenAI
on_llm_start (I'm the second handler!!) OpenAI
on_new_token 

on_new_token ```text
on_new_token 

on_new_token 2
on_new_token **
on_new_token 0
on_new_token .
on_new_token 235
on_new_token 

on_new_token ```

on_new_token ...
on_new_token num
on_new_token expr
on_new_token .
on_new_token evaluate
on_new_token ("
on_new_token 2
on_new_token **
on_new_token 0
on_new_token .
on_new_token 235
on_new_token ")
on_new_token ...
on_new_token 

on_new_token 
on_chain_start LLMChain
on_llm_start OpenAI
on_llm_start (I'm the second handler!!) OpenAI
on_new_token  I
on_new_token  now
on_new_token  know
on_new_token  the
on_new_token  final
on_new_token  answer
on_new_token .
on_new_token 
Final
on_new_token  Answer
on_new_token :
on_new_token  1
on_new_token .
on_new_token 17
on_new_token 690
on_new_token 67
on_new_token 372
on_new_token 187
on_new_token 674
on_new_token

Output:

'1.1769067372187674'

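Tracing and Token Counting

Tracing and token counting are two features provided on top of the callback mechanism.

Tracing

There are two recommended ways to trace your LangChains:

Set the LANGCHAIN_TRACING environment variable to true.
Use the context manager with tracing_enabled() to trace a specific block of code.

Note: if the environment variable is set, all code is traced, whether or not it is inside the context manager.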


import os

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks import tracing_enabled
from langchain.llms import OpenAI

# To run the code, make sure to set OPENAI_API_KEY and SERPAPI_API_KEY
llm = OpenAI(temperature=0)
tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

questions = [
    "Who won the US Open men's final in 2019? What is his age raised to the 0.334 power?",
    "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?",
    "Who won the most recent formula 1 grand prix? What is their age raised to the 0.23 power?",
    "Who won the US Open women's final in 2019? What is her age raised to the 0.34 power?",
    "Who is Beyonce's husband? What is his age raised to the 0.19 power?",
]
os.environ["LANGCHAIN_TRACING"] = "true"

# Both of the agent runs will be traced because the environment variable is set
agent.run(questions[0])
with tracing_enabled() as session:
    assert session
    agent.run(questions[1])

Log output:

Entering new AgentExecutor chain...
 I need to find out who won the US Open men's final in 2019 and then calculate his age raised to the 0.334 power.
Action: Search
Action Input: "US Open men's final 2019 winner"
Observation: Rafael Nadal defeated Daniil Medvedev in the final, 7--5, 6--3, 5--7, 4--6, 6--4 to win the men's singles tennis title at the 2019 US Open. It was his fourth US ...
Thought: I need to find out the age of the winner
Action: Search
Action Input: "Rafael Nadal age"
Observation: 36 years
Thought: I need to calculate the age raised to the 0.334 power
Action: Calculator
Action Input: 36^0.334
Observation: Answer: 3.3098250249682484
Thought: I now know the final answer
Final Answer: Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.

Finished chain.


Entering new AgentExecutor chain...
 I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: Sudeikis and Wilde's relationship ended in November 2020. Wilde was publicly served with court documents regarding child custody while she was presenting Don't Worry Darling at CinemaCon 2022. In January 2021, Wilde began dating singer Harry Styles after meeting during the filming of Don't Worry Darling.
Thought: I need to find out Harry Styles' age.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557
Thought: I now know the final answer.
Final Answer: Harry Styles is Olivia Wilde's boyfriend and his current age raised to the 0.23 power is 2.169459462491557.

Finished chain.

Input:

# Now, we unset the environment variable and use a context manager.

if "LANGCHAIN_TRACING" in os.environ:
    del os.environ["LANGCHAIN_TRACING"]

# here, we are writing traces to "my_test_session"
with tracing_enabled("my_test_session") as session:
    assert session
    agent.run(questions[0])  # this should be traced

agent.run(questions[1])  # this should not be traced

Log output:

Entering new AgentExecutor chain...
 I need to find out who won the US Open men's final in 2019 and then calculate his age raised to the 0.334 power.
Action: Search
Action Input: "US Open men's final 2019 winner"
Observation: Rafael Nadal defeated Daniil Medvedev in the final, 7--5, 6--3, 5--7, 4--6, 6--4 to win the men's singles tennis title at the 2019 US Open. It was his fourth US ...
Thought: I need to find out the age of the winner
Action: Search
Action Input: "Rafael Nadal age"
Observation: 36 years
Thought: I need to calculate the age raised to the 0.334 power
Action: Calculator
Action Input: 36^0.334
Observation: Answer: 3.3098250249682484
Thought: I now know the final answer
Final Answer: Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.

Finished chain.


Entering new AgentExecutor chain...
 I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: Sudeikis and Wilde's relationship ended in November 2020. Wilde was publicly served with court documents regarding child custody while she was presenting Don't Worry Darling at CinemaCon 2022. In January 2021, Wilde began dating singer Harry Styles after meeting during the filming of Don't Worry Darling.
Thought: I need to find out Harry Styles' age.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557
Thought: I now know the final answer.
Final Answer: Harry Styles is Olivia Wilde's boyfriend and his current age raised to the 0.23 power is 2.169459462491557.

Finished chain.

Output:

"Harry Styles is Olivia Wilde's boyfriend and his current age raised to the 0.23 power is 2.169459462491557."

Input:

# The context manager is concurrency safe:
if "LANGCHAIN_TRACING" in os.environ:
    del os.environ["LANGCHAIN_TRACING"]

# start a background task
task = asyncio.create_task(agent.arun(questions[0]))  # this should not be traced
with tracing_enabled() as session:
    assert session
    tasks = [agent.arun(q) for q in questions[1:3]]  # these should be traced
    await asyncio.gather(*tasks)

await task

Log output:

Entering new AgentExecutor chain...

Entering new AgentExecutor chain...


Entering new AgentExecutor chain...

 I need to find out who won the grand prix and then calculate their age raised to the 0.23 power.
Action: Search
Action Input: "Formula 1 Grand Prix Winner" I need to find out who won the US Open men's final in 2019 and then calculate his age raised to the 0.334 power.
Action: Search
Action Input: "US Open men's final 2019 winner"Rafael Nadal defeated Daniil Medvedev in the final, 7--5, 6--3, 5--7, 4--6, 6--4 to win the men's singles tennis title at the 2019 US Open. It was his fourth US ... I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"Sudeikis and Wilde's relationship ended in November 2020. Wilde was publicly served with court documents regarding child custody while she was presenting Don't Worry Darling at CinemaCon 2022. In January 2021, Wilde began dating singer Harry Styles after meeting during the filming of Don't Worry Darling.Lewis Hamilton has won 103 Grands Prix during his career. He won 21 races with McLaren and has won 82 with Mercedes. Lewis Hamilton holds the record for the ... I need to find out the age of the winner
Action: Search
Action Input: "Rafael Nadal age"36 years I need to find out Harry Styles' age.
Action: Search
Action Input: "Harry Styles age" I need to find out Lewis Hamilton's age
Action: Search
Action Input: "Lewis Hamilton Age"29 years I need to calculate the age raised to the 0.334 power
Action: Calculator
Action Input: 36^0.334 I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23Answer: 3.3098250249682484Answer: 2.16945946249155738 years
Finished chain.

Finished chain.
 I now need to calculate 38 raised to the 0.23 power
Action: Calculator
Action Input: 38^0.23Answer: 2.3086081644669734
Finished chain.

Output:

"Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484."

Token Counting

LangChain provides a context manager that lets us count tokens.


import asyncio

from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
with get_openai_callback() as cb:
    llm("What is the square root of 4?")

total_tokens = cb.total_tokens
assert total_tokens > 0

with get_openai_callback() as cb:
    llm("What is the square root of 4?")
    llm("What is the square root of 4?")

assert cb.total_tokens == total_tokens * 2

# You can kick off concurrent runs from within the context manager
with get_openai_callback() as cb:
    await asyncio.gather(
        *[llm.agenerate(["What is the square root of 4?"]) for _ in range(3)]
    )

assert cb.total_tokens == total_tokens * 3

# The context manager is concurrency safe
task = asyncio.create_task(llm.agenerate(["What is the square root of 4?"]))
with get_openai_callback() as cb:
    await llm.agenerate(["What is the square root of 4?"])

await task
assert cb.total_tokens == total_tokens
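
One common way to make a counting context manager safe under concurrency is the standard contextvars module. The sketch below shows that idea only, and is not necessarily how get_openai_callback is implemented. TokenCounter, count_tokens, and fake_llm_call are hypothetical names, and "tokens" are approximated by whitespace-separated words.

```python
import asyncio
import contextvars
from contextlib import contextmanager
from typing import Iterator


class TokenCounter:
    """Hypothetical counter object, playing the role of `cb` above."""

    def __init__(self) -> None:
        self.total_tokens = 0


# Holds the counter that is active in the current context, if any.
_active_counter = contextvars.ContextVar("active_counter", default=None)


@contextmanager
def count_tokens() -> Iterator[TokenCounter]:
    counter = TokenCounter()
    token = _active_counter.set(counter)
    try:
        yield counter
    finally:
        _active_counter.reset(token)


async def fake_llm_call(prompt: str) -> str:
    await asyncio.sleep(0.01)
    counter = _active_counter.get()
    if counter is not None:
        counter.total_tokens += len(prompt.split())  # crude stand-in for tokens
    return prompt


async def main() -> int:
    # Started before the context manager: the task copies a context without a counter.
    background = asyncio.create_task(fake_llm_call("not counted at all"))
    with count_tokens() as cb:
        await asyncio.gather(fake_llm_call("one two"), fake_llm_call("three four five"))
    await background
    return cb.total_tokens


total = asyncio.run(main())
print(total)
```

Because each asyncio task captures a copy of the context at creation time, the background task started outside the with block never sees the counter, while the tasks created inside it do.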
