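This tutorial assumes you already have a way to reach GitHub and other overseas sites (a proxy or VPN), plus git, Python, pip, and VS Code (or another IDE or editor). There are plenty of guides for each of these; if anything is missing, search online or ask an AI assistant.
PS: I generally only write tutorials for things I find useful and have reproduced successfully on several machines. This is not an AI-generated summary; I typed every word myself. If it helps you, please leave a like. Thank you very much!
Step 1: Clone the browser-use repository
In VS Code, use Open Folder to pick a suitable folder, then run the following command in the VS Code terminal. (If the clone fails, you can also download the ZIP archive from the repository and extract it.)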
git clone https://github.com/browser-use/browser-use.git
Step 2: Install the browser-use, playwright, and gradio libraries
Still in the terminal, run the following commands one after another (requires Python ≥ 3.11):
bash
pip install browser-use
playwright install
pip install gradio
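Before moving on, it can help to confirm that the interpreter and the three packages are actually in place. The following is a minimal sketch, not part of browser-use: the helper name `check_environment` is hypothetical, and it only checks that the modules can be found, not that the Playwright browsers were downloaded.

```python
import sys
from importlib.util import find_spec

REQUIRED_PYTHON = (3, 11)  # minimum version stated in this step
REQUIRED_MODULES = ("browser_use", "playwright", "gradio")


def check_environment(version=None, modules=REQUIRED_MODULES):
    """Return a list of problems found; an empty list means everything looks fine."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < REQUIRED_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, 3.11+ required")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        if find_spec(name) is None:
            problems.append(f"module '{name}' is not installed")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks fine"]:
        print(line)
```

If the script reports a missing module, rerun the matching `pip install` command in the same terminal (and the same Python environment) that VS Code uses.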
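Step 3: Modify the code configuration for low-cost use
3.1 Find gradio_demo.py in the cloned repository (examples/ui/gradio_demo.py) and replace its entire contents with the edited code below. (I changed several places, so I am giving you the whole file for convenience.)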
py
import os
import asyncio
from dataclasses import dataclass
from typing import List, Optional

# Third-party imports
import gradio as gr
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from rich.console import Console
from rich.panel import Panel
from rich.text import Text
from pydantic import SecretStr

# Local module imports
from browser_use import Agent

load_dotenv()


@dataclass
class ActionResult:
    is_done: bool
    extracted_content: Optional[str]
    error: Optional[str]
    include_in_memory: bool


@dataclass
class AgentHistoryList:
    all_results: List[ActionResult]
    all_model_outputs: List[dict]


def parse_agent_history(history_str: str) -> None:
    console = Console()

    # Split the content into sections based on ActionResult entries
    sections = history_str.split('ActionResult(')

    for i, section in enumerate(sections[1:], 1):  # Skip first empty section
        # Extract relevant information
        content = ''
        if 'extracted_content=' in section:
            content = section.split('extracted_content=')[1].split(',')[0].strip("'")

        if content:
            header = Text(f'Step {i}', style='bold blue')
            panel = Panel(content, title=header, border_style='blue')
            console.print(panel)
            console.print()


async def run_browser_task(
    task: str,
    api_key: str,
    model: str = 'gpt-4o',
    headless: bool = True,
) -> str:
    # Note: `headless` is accepted from the UI but is not passed on to the browser in this demo.
    if not api_key.strip():
        return 'Please provide an API key'

    os.environ['OPENAI_API_KEY'] = api_key

    try:
        agent = Agent(
            task=task,
            # llm=ChatOpenAI(model='gpt-4o'),
            # Point the OpenAI-compatible client at the relay endpoint and use the model chosen in the UI
            llm=ChatOpenAI(base_url='https://api.cursorai.art/v1', model=model, api_key=SecretStr(api_key)),
        )
        result = await agent.run()
        # TODO: The result could be parsed better
        return str(result)
    except Exception as e:
        return f'Error: {str(e)}'


def create_ui():
    with gr.Blocks(title='Browser Use GUI') as interface:
        gr.Markdown('# Browser Use Task Automation')

        with gr.Row():
            with gr.Column():
                api_key = gr.Textbox(label='OpenAI API Key', placeholder='sk-...', type='password')
                task = gr.Textbox(
                    label='Task Description',
                    placeholder='E.g., Find flights from New York to London for next week',
                    lines=3,
                )
                model = gr.Dropdown(
                    # choices=['gpt-4', 'gpt-3.5-turbo'], label='Model', value='gpt-4'
                    choices=['gpt-4o'], label='Model', value='gpt-4o'
                )
                headless = gr.Checkbox(label='Run Headless', value=True)
                submit_btn = gr.Button('Run Task')

            with gr.Column():
                output = gr.Textbox(label='Output', lines=10, interactive=False)

        submit_btn.click(
            fn=lambda *args: asyncio.run(run_browser_task(*args)),
            inputs=[task, api_key, model, headless],
            outputs=output,
        )

    return interface


if __name__ == '__main__':
    demo = create_ui()
    demo.launch()
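The `parse_agent_history` helper in the script cuts each `extracted_content` value at the first comma, so a result such as "Found 3 links, all small" would be truncated. A possible alternative is sketched below. It assumes the history text has the same `extracted_content='...'` shape that the helper already expects; the function name `extract_contents` is hypothetical and not part of browser-use.

```python
import re

# Matches extracted_content='...' or extracted_content="...",
# allowing commas and backslash-escaped quotes inside the value.
_CONTENT_PATTERN = re.compile(
    r"""extracted_content=(['"])((?:\\.|(?!\1).)*)\1""",
    re.DOTALL,
)


def extract_contents(history_str: str) -> list[str]:
    """Return every quoted extracted_content value found in the history text."""
    return [match.group(2) for match in _CONTENT_PATTERN.finditer(history_str)]


if __name__ == "__main__":
    sample = (
        "ActionResult(is_done=False, extracted_content='Found 3 links, all small', error=None), "
        "ActionResult(is_done=True, extracted_content=None, error=None)"
    )
    print(extract_contents(sample))
```

Unquoted values such as `extracted_content=None` are skipped, which matches the original helper's behavior of only printing steps that have content.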
3.2 cd into the root of the browser-use folder and run the following commands in the terminal:
bash
cd browser-use
python examples/ui/gradio_demo.py
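3.3 Once the service has started, open http://127.0.0.1:7860/ in your browser and fill in the API Key and the task description.
(See the end of this post for how to get an API Key.)
3.4 Execution and result: the AI went off to find images for me and finally returned some links, but they pointed to small images. You can also ask it to open the full-size image page and return the full-size link.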
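Since the result comes back as one block of text in the Output box, a small helper can pull the links out for further use. This is only a sketch under the assumption that the links appear as plain `http://` or `https://` URLs in the text; `extract_links` is a hypothetical name, not something browser-use provides.

```python
import re

# A deliberately simple URL pattern: stop at whitespace, quotes, or closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>)\]]+")


def extract_links(text: str) -> list[str]:
    """Return the unique URLs found in the text, in order of first appearance."""
    links = []
    for url in URL_PATTERN.findall(text):
        url = url.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in links:
            links.append(url)
    return links


if __name__ == "__main__":
    output = (
        "See https://example.com/a.jpg, and https://example.com/b.png. "
        "Again https://example.com/a.jpg"
    )
    for link in extract_links(output):
        print(link)
```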
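PS: For websites that require a login, the experience is better if you log in manually first and then let the AI take over.
Appendix: How to get an API Key
Register and log in on the CURSOR API website, click API Tokens (API令牌), then copy your own API token from the right-hand side.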
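The demo script calls `load_dotenv()` and sets `OPENAI_API_KEY`, so one option is to keep the token in the environment (or in a `.env` file) instead of pasting it every time. The sketch below shows one way such a fallback could look; `resolve_api_key` is a hypothetical helper, and wiring it into the Gradio form is left as an assumption.

```python
import os


def resolve_api_key(typed_key: str = "", env_var: str = "OPENAI_API_KEY", environ=None) -> str:
    """Prefer the key typed into the form; otherwise fall back to the environment."""
    environ = os.environ if environ is None else environ
    key = typed_key.strip() or environ.get(env_var, "").strip()
    if not key:
        raise ValueError(f"No API key provided and {env_var} is not set")
    return key


if __name__ == "__main__":
    try:
        key = resolve_api_key()
        # Only report the length so the token never appears in the terminal
        print(f"API key loaded ({len(key)} characters)")
    except ValueError as exc:
        print(exc)
```

Keeping the token out of the source file also makes it less likely to be committed to a repository by accident.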
Thanks for reading, and may you prosper!