文本概括(Summarizing)

当今世界上有太多的文本信息,几乎没有人能够拥有足够的时间去阅读所有我们想了解的东西。但令人感到欣喜的是,目前LLM在**文本概括(Summarizing)**任务上展现了强大的水准(大白话理解即帮助简短总结重要的内容,以便我们快速了解和学习)。

文本概括实例

1. 基础概括与长度限制

prod_review = """

这个熊猫公仔是我给女儿的生日礼物,她很喜欢,去哪都带着。

公仔很软,超级可爱,面部表情也很和善。但是相比于价钱来说,

它有点小,我感觉在别的地方用同样的价钱能买到更大的。

快递比预期提前了一天到货,所以在送给女儿之前,我自己玩了会。

"""

这是最基础的应用场景,旨在将冗长的用户评论转化为简短的摘要,便于快速浏览。

  • 核心技巧 :使用明确的约束指令,如 in at most X words(最多 X 个词)。
  • 适用场景:电商平台的评论列表预览、新闻头条生成。

示例代码:

python 复制代码
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site. 

Summarize the review below, delimited by triple 
backticks, in at most 30 words. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

输出效果:

"可爱软熊猫公仔,女儿喜欢,面部表情和善,但价钱有点小贵,快递提前一天到货。"

(涵盖了情感、外观、价格吐槽和物流,信息全面但紧凑)

2. 关键角度侧重

这是文本概括的高级用法。同样的文本,针对不同的业务部门,生成的摘要内容应有所不同。

  • 核心技巧 :在 Prompt 中指定目标受众关注焦点
  • 适用场景:企业内部的情报分析、不同部门的数据看板。

场景 A:侧重物流运输

python 复制代码
prompt = f"""
你的任务是从电子商务网站上生成一个产品评论的简短摘要。

请对三个反引号之间的评论文本进行概括,最多30个词汇,并且聚焦在产品运输上。

评论: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

输出效果:

"快递提前到货 ,熊猫公仔软可爱,但有点小,价钱不太划算。"

(模型将物流信息提到了最前面,优先展示)

场景 B:侧重价格与质量

python 复制代码
prompt = f"""
你的任务是从电子商务网站上生成一个产品评论的简短摘要。

请对三个反引号之间的评论文本进行概括,最多30个词汇,并且聚焦在产品价格和质量上。

评论: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

输出效果:

"可爱软熊猫公仔,面部表情友好,但价钱有点高,尺寸较小 。快递提前一天到货。"

(模型优先强调了性价比问题,物流信息被放到了最后)

3. 概括与提取的区别

在处理信息时,我们需要区分"概括"和"提取":

  • 概括:允许模型保留一定的上下文。例如在侧重"价格"时,模型可能仍会保留"快递提前"这一信息作为补充背景。
  • 提取 :要求模型像过滤器一样,输出特定信息,过滤掉所有无关内容。

示例代码(提取模式):

python 复制代码
prompt = f"""
你的任务是从电子商务网站上的产品评论中提取相关信息。

请从以下三个反引号之间的评论文本中提取产品运输相关的信息,最多30个词汇。

评论: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

输出效果:

"快递比预期提前了一天到货。"

(完全去除了关于玩具外观和价格的描述,只保留纯粹的物流事实)

4. 多条文本概括Prompt实例

在实际生产环境中,我们通常面对的是海量数据。

python 复制代码
review_1 = prod_review 

# review for a standing lamp
review_2 = """
Needed a nice lamp for my bedroom, and this one \
had additional storage and not too high of a price \
point. Got it fast - arrived in 2 days. The string \
to the lamp broke during the transit and the company \
happily sent over a new one. Came within a few days \
as well. It was easy to put together. Then I had a \
missing part, so I contacted their support and they \
very quickly got me the missing piece! Seems to me \
to be a great company that cares about their customers \
and products. 
"""

# review for an electric toothbrush
review_3 = """
My dental hygienist recommended an electric toothbrush, \
which is why I got this. The battery life seems to be \
pretty impressive so far. After initial charging and \
leaving the charger plugged in for the first week to \
condition the battery, I've unplugged the charger and \
been using it for twice daily brushing for the last \
3 weeks all on the same charge. But the toothbrush head \
is too small. I've seen baby toothbrushes bigger than \
this one. I wish the head was bigger with different \
length bristles to get between teeth better because \
this one doesn't.  Overall if you can get this one \
around the $50 mark, it's a good deal. The manufactuer's \
replacements heads are pretty expensive, but you can \
get generic ones that're more reasonably priced. This \
toothbrush makes me feel like I've been to the dentist \
every day. My teeth feel sparkly clean! 
"""

# review for a blender
review_4 = """
So, they still had the 17 piece system on seasonal \
sale for around $49 in the month of November, about \
half off, but for some reason (call it price gouging) \
around the second week of December the prices all went \
up to about anywhere from between $70-$89 for the same \
system. And the 11 piece system went up around $10 or \
so in price also from the earlier sale price of $29. \
So it looks okay, but if you look at the base, the part \
where the blade locks into place doesn't look as good \
as in previous editions from a few years ago, but I \
plan to be very gentle with it (example, I crush \
very hard items like beans, ice, rice, etc. in the \ 
blender first then pulverize them in the serving size \
I want in the blender then switch to the whipping \
blade for a finer flour, and use the cross cutting blade \
first when making smoothies, then use the flat blade \
if I need them finer/less pulpy). Special tip when making \
smoothies, finely cut and freeze the fruits and \
vegetables (if using spinach-lightly stew soften the \ 
spinach then freeze until ready for use-and if making \
sorbet, use a small to medium sized food processor) \ 
that you plan to use that way you can avoid adding so \
much ice if at all-when making your smoothie. \
After about a year, the motor was making a funny noise. \
I called customer service but the warranty expired \
already, so I had to buy another one. FYI: The overall \
quality has gone done in these types of products, so \
they are kind of counting on brand recognition and \
consumer loyalty to maintain sales. Got it in about \
two days.
"""

reviews = [review_1, review_2, review_3, review_4]
  • 核心技巧 :使用循环结构(如 Python 的 for 循环)批量调用 Prompt。
  • 注意:对于超大规模数据(百万级),简单的循环效率较低,通常需要结合分布式处理或批量 API。
python 复制代码
for i in range(len(reviews)):
    prompt = f"""
    Your task is to generate a short summary of a product \ 
    review from an ecommerce site. 

    Summarize the review below, delimited by triple \
    backticks in at most 20 words. 

    Review: ```{reviews[i]}```
    """
    response = get_completion(prompt)
    print(i, response, "\n")

批量输出示例:

  1. 熊猫公仔:Soft and cute... Arrived early.
  2. 台灯:Affordable lamp... fast shipping... excellent customer service.
  3. 牙刷:Good battery life... small toothbrush head...
  4. 搅拌机:Price increased... motor made a funny noise...

总结

文本概括不仅仅是"变短",而是通过限制长度指定视角明确提取要求,将非结构化的长文本转化为对特定业务有价值的情报。