Playwright02-CDP

Notes on Playwright automation development. While learning BrowserUse I ran into topics that involve both Playwright and cdp-use, which are recorded here.


1-Key points

  • 1-Run a first Playwright demo

2-Reference links


3-Hands-on practice

1-Setting up the uv environment

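bash
# 1. Set up the uv environment
uv python pin 3.11.4
uv init python_playwright && cd python_playwright
uv venv && source .venv/bin/activate
# requests is added here because the /json/version script later in this post imports it
uv add python-dotenv pydantic playwright requests

# 2. Install playwright and refresh the environment
#    (playwright was already added above, so repeating these two commands is harmless)
uv add playwright
source .venv/bin/activate

# 3. Install Chromium through Playwright (only Chromium is installed at this point)
playwright install chromium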
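
Before running the demos it can help to confirm that the virtual environment really contains what the commands above were meant to install. The following is a minimal sketch, not part of the original setup: the helper names `missing_packages` and `python_version_ok` are made up for illustration, and the minimum version `(3, 11)` is an assumption derived from the pinned 3.11.4.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_version_ok(minimum=(3, 11)):
    """True when the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= minimum


# These are module names, not distribution names:
# the python-dotenv distribution is imported as `dotenv`.
required = ["dotenv", "pydantic", "playwright"]
print("Python OK:", python_version_ok())
print("Missing packages:", missing_packages(required))
```

An empty "Missing packages" list only shows that the modules are importable; it does not check that `playwright install chromium` has downloaded a browser.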

2-Working with the CDP interface

cdp-use is a type-safe Python client library generated for the Chrome DevTools Protocol (CDP).
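
To make it clearer what such a client wraps, here is a small sketch of the CDP wire format itself: commands are JSON objects with an `id`, a `method` and `params`; replies echo the `id`; events carry a `method` but no `id`. This is not the cdp-use API. The helper names `build_command` and `classify` are hypothetical and exist only for this illustration.

```python
import itertools
import json

# Each command needs an id that is unique within the connection.
_ids = itertools.count(1)


def build_command(method, params=None, session_id=None):
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    message = {"id": next(_ids), "method": method, "params": params or {}}
    if session_id is not None:
        # Routes the command to an attached target when flat sessions are used.
        message["sessionId"] = session_id
    return json.dumps(message)


def classify(raw):
    """Label an incoming CDP message as a response, an error, or an event."""
    message = json.loads(raw)
    if "id" in message:
        return "error" if "error" in message else "response"
    return "event"


print(build_command("Target.getTargets"))
print(classify('{"id": 1, "result": {"targetInfos": []}}'))
print(classify('{"method": "Page.loadEventFired", "params": {"timestamp": 1.0}}'))
```

A typed client saves you from spelling method names and parameter dictionaries by hand, but the traffic on the socket still has this shape.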


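Option A: leave everything to Playwright

Here we do not care about the real WebSocket address; we only want the page that already exists in the default context.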

python
import sys
import time

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # 1. Launch the browser with the remote debugging port forced on
    browser = p.chromium.launch(
        headless=False,
        args=["--remote-debugging-port=9222"]  # expose the CDP port
    )
    # 2. Open a new tab
    page = browser.new_page()
    # 3. Navigate to the target URL
    web_url = "https://www.baidu.com/"

    try:
        # Use a longer timeout (60 s) and handle load failures
        page.goto(web_url, timeout=60000)
        print("Browser opened:", web_url)
    except Exception as e:
        print(f"Page failed to load: {e}")
        browser.close()
        sys.exit(1)

    # 4. Pause briefly so the window can be inspected by eye
    time.sleep(3)

    # 5. Attach to the same browser through Playwright's own CDP connection
    try:
        browser2 = p.chromium.connect_over_cdp("http://localhost:9222")
        browser_contexts = browser2.contexts[0]
        print("=======browser_contexts response structure========")
        print(browser_contexts)
        print("=======browser_contexts response structure========\n")
        default_ctx_page = browser_contexts.pages[0]  # the page already open in the default context
        print("Default page title:", default_ctx_page.title())
        # 6. Disconnect the CDP client
        browser2.close()
    except Exception as e:
        print(f"Error while connecting over CDP: {e}")
    finally:
        browser.close()

Sample output:

Connected to pydev debugger (build 231.9225.15)
Browser opened: https://www.baidu.com/
=======browser_contexts response structure========
<BrowserContext browser=<Browser type=<BrowserType name=chromium executable_path=/Users/rong/Library/Caches/ms-playwright/chromium-1194/chrome-mac/Chromium.app/Contents/MacOS/Chromium> version=141.0.7390.37>>
=======browser_contexts response structure========
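
If the goal is only to send a few raw CDP commands to a page Playwright already controls, a second connection is not strictly required: Playwright can open a CDP session on an existing page through `BrowserContext.new_cdp_session(page)`. The sketch below is an illustration under that assumption, not the approach used in this post. `title_via_cdp` and `main` are hypothetical names, and `main` is deliberately not called so the helper can be read and reused on its own.

```python
def title_via_cdp(session):
    """Read document.title through a raw Runtime.evaluate CDP command.

    `session` is any object with a send(method, params) method, such as the
    CDPSession returned by BrowserContext.new_cdp_session(page).
    """
    reply = session.send(
        "Runtime.evaluate",
        {"expression": "document.title", "returnByValue": True},
    )
    # Runtime.evaluate wraps the evaluated value in a RemoteObject under "result".
    return reply["result"]["value"]


def main():
    # Imported lazily so the helper above does not require Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        try:
            page = browser.new_page()
            page.goto("https://www.baidu.com/", timeout=60000)
            session = page.context.new_cdp_session(page)
            print("Title via CDP:", title_via_cdp(session))
            session.detach()
        finally:
            browser.close()
```

Call `main()` from a script to try it against a real browser. Because the session is created from the page object, there is no need to index into `contexts[0].pages[0]` and hope the ordering matches.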

Option B: Playwright control plus the WebSocket address

Here we keep Playwright in control and also obtain the real WebSocket address.

python
import json
import sys
import time

# Requires the requests package (uv add requests) if it is not already installed
import requests
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # 1. Launch the browser with the remote debugging port forced on
    browser = p.chromium.launch(
        headless=False,
        args=["--remote-debugging-port=9222"]  # expose the CDP port
    )
    # 2. Open a new tab
    page = browser.new_page()
    # 3. Navigate to the target URL
    web_url = "https://www.baidu.com/"

    try:
        # Use a longer timeout (60 s) and handle load failures
        page.goto(web_url, timeout=60000)
        print("Browser opened:", web_url)
    except Exception as e:
        print(f"Page failed to load: {e}")
        browser.close()
        sys.exit(1)

    # 4. Pause briefly so the window can be inspected by eye
    time.sleep(3)

    # 5. Query /json/version ourselves to read webSocketDebuggerUrl
    try:
        resp = requests.get("http://localhost:9222/json/version", timeout=5)
        version_info = resp.json()  # parse the body once and reuse it
        print("=======json_version response structure========")
        print(json.dumps(version_info, indent=2, ensure_ascii=False))
        print("=======json_version response structure========\n")
        ws_url = version_info["webSocketDebuggerUrl"]
        print("Browser WebSocket address:", ws_url)

        # 6. Optionally keep driving the same browser through Playwright
        browser2 = p.chromium.connect_over_cdp("http://localhost:9222")
        default_page = browser2.contexts[0].pages[0]
        print("Default page title:", default_page.title())

        browser2.close()
    except requests.exceptions.RequestException as e:
        print(f"Could not reach the debugging endpoint: {e}")
    except Exception as e:
        print(f"Error while handling the debugging connection: {e}")
    finally:
        browser.close()

Sample output:

Connected to pydev debugger (build 231.9225.15)
Browser opened: https://www.baidu.com/
=======json_version response structure========
{
  "Browser": "Chrome/141.0.7390.37",
  "Protocol-Version": "1.3",
  "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36",
  "V8-Version": "14.1.146.11",
  "WebKit-Version": "537.36 (@9f043f63b0e5b728c8d09f3e3ddfc1681a4bd58e)",
  "webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/27e882e5-8999-4a81-8d1f-9092e6698d61"
}
=======json_version response structure========

Browser WebSocket address: ws://localhost:9222/devtools/browser/27e882e5-8999-4a81-8d1f-9092e6698d61
Default page title: 百度一下,你就知道

At this point you have both the real CDP WebSocket address and, through Playwright, the page that already exists in the default context.
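
The address returned by `/json/version` is the browser-level endpoint. The same debugging port also serves `/json/list`, which describes individual targets, each with its own `webSocketDebuggerUrl`. The sketch below shows one way to pick out the page-level addresses. `fetch_targets` and `page_ws_urls` are hypothetical helper names, and the standard-library `urllib` is used here only to keep the example free of extra dependencies.

```python
import json
import urllib.request


def fetch_targets(base_url="http://localhost:9222"):
    """GET /json/list and return the decoded list of target descriptions."""
    with urllib.request.urlopen(f"{base_url}/json/list", timeout=5) as resp:
        return json.load(resp)


def page_ws_urls(targets):
    """Map page URL -> per-page WebSocket address, skipping non-page targets."""
    return {
        t["url"]: t["webSocketDebuggerUrl"]
        for t in targets
        if t.get("type") == "page" and "webSocketDebuggerUrl" in t
    }
```

While a browser started with `--remote-debugging-port=9222` is still running, `page_ws_urls(fetch_targets())` returns one entry per open tab. Targets of other types, such as service workers, are filtered out, and two tabs showing the same URL would collapse into a single entry because the URL is used as the dictionary key.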

