MAI-UI的prompt

MAI-UI prompt.py

1、主要看第三种Prompt ------ MAI_MOBILE_SYS_PROMPT_ASK_USER_MCP,内容详细点

2、从Prompt看出,可用APPs主要是英文类

3、这里面的Mobile Use可以看做是 一个MCP Tool

4、和Open-AutoGLM相比,实现了ask_user(对应的是 interact动作),没有 take_over 动作

第一种 MAI_MOBILE_SYS_PROMPT

分成以下4个部分:

1、身份:

You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.

2、输出格式要求:

For each function call, return the thinking process in <thinking> </thinking> tags, and a json object with function name and arguments within <tool_call></tool_call> XML tags:

python 复制代码
<thinking>
...
</thinking>
<tool_call>
{"name": "mobile_use", "arguments": <args-json-object>}
</tool_call>

3、动作空间(10个):

这里的动作类型和其他prompt不同,尤其注意。

python 复制代码
{"action": "click", "coordinate": [x, y]}
{"action": "long_press", "coordinate": [x, y]}
{"action": "type", "text": ""}
{"action": "swipe", "direction": "up or down or left or right", "coordinate": [x, y]} # "coordinate" is optional. Use the "coordinate" if you want to swipe a specific UI element.
{"action": "open", "text": "app_name"}
{"action": "drag", "start_coordinate": [x1, y1], "end_coordinate": [x2, y2]}
{"action": "system_button", "button": "button_name"} # Options: back, home, menu, enter
{"action": "wait"}
{"action": "terminate", "status": "success or fail"}
{"action": "answer", "text": "xxx"} # Use escape characters \\', \\", and \\n in text part to ensure we can parse the text in normal python string format.

4、备注:

  • 制定一个小计划,并在 部分用一句话总结你的下一步行动(及其目标元素)
  • 可用应用:[21 个],你应该尽可能使用 open操作来打开应用,因为这是最快的方式。 (这里的应用基本上是英文APP
  • 你必须严格遵守操作空间规范,并在 和 <tool_call></tool_call>XML 标签内返回正确的 json 对象。
python 复制代码
- Write a small plan and finally summarize your next action (with its target element) in one sentence in <thinking></thinking> part.
- Available Apps: `["Camera","Chrome","Clock","Contacts","Dialer","Files","Settings","Markor","Tasks","Simple Draw Pro","Simple Gallery Pro","Simple SMS Messenger","Audio Recorder","Pro Expense","Broccoli APP","OSMand","VLC","Joplin","Retro Music","OpenTracks","Simple Calendar Pro"]`.
You should use the `open` action to open the app as possible as you can, because it is the fast way to open the app.
- You must follow the Action Space strictly, and return the correct json object within <thinking> </thinking> and <tool_call></tool_call> XML tags.

第二种 MAI_MOBILE_SYS_PROMPT_NO_THINKING

1、身份:和第一种相同

2、输出格式要求:与第一种相比,少了 <think> 内容

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:

python 复制代码
<tool_call>
{"name": "mobile_use", "arguments": <args-json-object>}
</tool_call>

3、动作空间:和第一种相同

4、备注:与第一种相比,少了plan那一句

python 复制代码
- Available Apps: `["Camera","Chrome","Clock","Contacts","Dialer","Files","Settings","Markor","Tasks","Simple Draw Pro","Simple Gallery Pro","Simple SMS Messenger","Audio Recorder","Pro Expense","Broccoli APP","OSMand","VLC","Joplin","Retro Music","OpenTracks","Simple Calendar Pro"]`.
You should use the `open` action to open the app as possible as you can, because it is the fast way to open the app.
- You must follow the Action Space strictly, and return the correct json object within <thinking> </thinking> and <tool_call></tool_call> XML tags.

第三种 MAI_MOBILE_SYS_PROMPT_ASK_USER_MCP

分成以下 5 个部分

1、身份:和第一种相同

2、输出格式要求:和第一种相同

3、动作空间(12个):

与第一种相比,多了 ask_userdouble_click 两个动作

ask_user 是 Agent面对不确定的情况向用户做出提问

python 复制代码
{"action": "click", "coordinate": [x, y]}
{"action": "long_press", "coordinate": [x, y]}
{"action": "type", "text": ""}
{"action": "swipe", "direction": "up or down or left or right", "coordinate": [x, y]} # "coordinate" is optional. Use the "coordinate" if you want to swipe a specific UI element.
{"action": "open", "text": "app_name"}
{"action": "drag", "start_coordinate": [x1, y1], "end_coordinate": [x2, y2]}
{"action": "system_button", "button": "button_name"} # Options: back, home, menu, enter 
{"action": "wait"}
{"action": "terminate", "status": "success or fail"} 
{"action": "answer", "text": "xxx"} # Use escape characters \\', \\", and \\n in text part to ensure we can parse the text in normal python string format.
{"action": "ask_user", "text": "xxx"} # you can ask user for more information to complete the task.
{"action": "double_click", "coordinate": [x, y]}

4、MCP工具:这一部分是本prompt特有的

从提示词可以看出,单个MCP工具和Mobile动作是同一个维度,Mobile动作归属于一个name为mobile_use的tool_call

python 复制代码
{% if tools -%}
## MCP Tools

You are also provided with MCP tools, you can use them to complete the task.
{{ tools }}

If you want to use MCP tools, you must output as the following format:
python 复制代码
<thinking>
...
</thinking>
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
python 复制代码
{% endif -%}

5、备注

  • 这里的可用apps有 14 个,比第一种prompt的少 7
python 复制代码
- Available Apps: `["Contacts", "Settings", "Clock", "Maps", "Chrome", "Calendar", "files", "Gallery", "Taodian", "Mattermost", "Mastodon", "Mail", "SMS", "Camera"]`.
- Write a small plan and finally summarize your next action (with its target element) in one sentence in <thinking></thinking> part.

第四种 MAI_MOBILE_SYS_PROMPT_GROUNDING

比较简单

复制代码
任务:
给定一张截图和用户的定位指令。你的任务是根据用户的指令准确定位一个UI元素。
首先,你需要仔细查看截图并分析用户的指令,将用户的指令转化为有效的推理过程,然后提供最终的坐标。
python 复制代码
You are a GUI grounding agent. 

## Task
Given a screenshot and the user's grounding instruction. Your task is to accurately locate a UI element based on the user's instructions.
First, you should carefully examine the screenshot and analyze the user's instructions,  translate the user's instruction into a effective reasoning process, and then provide the final coordinate.

## Output Format
Return a json object with a reasoning process in <grounding_think></grounding_think> tags, a [x,y] format coordinate within <answer></answer> XML tags:
<grounding_think>...</grounding_think>
<answer>
{"coordinate": [x,y]}
</answer>
相关推荐
Aision_1 小时前
Agent 为什么需要 Checkpoint?
人工智能·python·gpt·langchain·prompt·aigc·agi
IT空门:门主6 小时前
Pixso UI + Figma + ui-ux-pro-max +ai idae工作流教程
ui·ux·figma·ai idae
qq_452396238 小时前
第十八篇:《移动端UI自动化:Appium入门实战》
ui·appium·自动化
罗西的思考8 小时前
【GUI-Agent】阿里通义MAI-UI 代码阅读(1)— 总体
人工智能·机器学习·ui·transformer
码途漫谈9 小时前
UI-UX-Pro-Max开源项目介绍
人工智能·ui·ai·开源·ai编程·ux
比特 GOK9 小时前
Qt项目ui文件中新添加的控件在代码中不识别的问题解决
开发语言·qt·ui
程序员三明治13 小时前
【AI】Prompt 工程入门:从五要素框架到 RAG 生产级 Prompt 模板与 Java 实战
java·人工智能·后端·大模型·llm·prompt·agent
薛定猫AI14 小时前
【深度解析】Google AI Studio Vibe Coding 更新:从 Prompt 生成到可视化应用构建闭环
人工智能·prompt
HAPPY酷16 小时前
[UE5 避坑指南] 为什么打包后 UI 消失了?Launch Game 与强制加载
java·ui·ue5
周bro16 小时前
vue2+element ui 中的el-table表格 选中当前行当前行变色,单选/多选--------续集:表格样式修改整合
vue.js·ui·elementui