多模态模型在做选择题时，如何设置Prompt，如何精准定位我们需要的选项

UndefindX2025-03-07 23:02

我们这里以Qwen2-VL-7B-instruct为例：

假设我们需要分析一张图片的情绪（从现有的情绪中进行选择），并且我们需要它以思维链的形式展现出来，我们可以这样设置prompt：

python 复制代码

emotion6_CoT = """
Analyze the given image and determine the emotion it represents.
Emotional options :(A) anger (B) disgust (C) fear (D) joy (E) sadness (F) surprise (G) neutral
Your output should follow this format strictly:
# analyze 
Your analyze here
# answer
Choice index, one of A-G
"""

这样设计的好处是，最终的answer中一定会有 # analyze 和 # answer 我们就可以利用正则表达式去进行准确提取：假设我们要提取其中的选项，我们可以这样写：

python 复制代码

def remove_words(s):
    # 定义需要删除的词汇列表
    words_to_remove = ['Choice', 'index', 'one', 'of', 'A-G']

    # 使用正则表达式删除这些词
    for word in words_to_remove:
        s = re.sub(r'\b' + word + r'\b', '', s)

    # 去除多余的空格
    s = re.sub(r'\s+', ' ', s).strip()

    return s

#ouput为输出列表，我们需要将里面的字符串进行提取，所以为output_text[0]。

option = re.search(r'[A-H]', remove_words(output_text[0].split("# answer")[1]))

最终，我们可以借用字典去匹配对应情绪即可。

注意：在一些推理能力不强的模型中（例如 Qwen2-base-7B），可能会遇到输出依然不遵循prompt的回答，这是正常的。