Getting API Response Data with Selenium

知识的宝藏 2023-07-22 14:23

Contents

Introduction to seleniumwire

Creating the webdriver

Limiting request capture

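Preface

Sometimes, while driving the UI, you also need to check that the underlying API responses are correct, which means capturing the API response data. Selenium itself has no API for reading responses, but the third-party library seleniumwire (Selenium Wire) can capture them.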

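Introduction to seleniumwire

Selenium Wire extends Selenium's Python bindings to give you access to the underlying requests made by the browser. You write your code the same way as with Selenium, but you get extra APIs for inspecting requests and responses and for changing them on the fly.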

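Features

Pure Python, user-friendly API
Captures HTTP and HTTPS requests
Intercepts requests and responses
Modifies headers, parameters and body content on the fly
Captures websocket messages
HAR format supported
Proxy server support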

Compatibility

Python 3.7+
Selenium 4.0.0+
Chrome, Firefox, Edge and Remote Webdriver supported

Installation

pip install selenium-wire

Creating the webdriver

from seleniumwire import webdriver

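Note that the import comes from seleniumwire, not from the selenium package.

Then instantiate the webdriver just as you would with Selenium directly. You can pass in any desired capabilities or browser-specific options, such as the executable path or headless mode. seleniumwire also has its own options, which are passed in the seleniumwire_options argument.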

# Create the driver with no options (use defaults)
driver = webdriver.Chrome()

# Or create using browser specific options and/or seleniumwire_options options
driver = webdriver.Chrome(
    options=webdriver.ChromeOptions(...),
    seleniumwire_options={...}
)
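To make the two kinds of options concrete, here is a minimal sketch, not a definitive setup. The helper names build_seleniumwire_options and create_headless_chrome are hypothetical. 'disable_encoding' is an option listed in the Selenium Wire README (it asks servers not to compress responses) and is an addition to what this article covers; 'exclude_hosts' is described later in this article. The driver helper assumes selenium-wire and Chrome are installed and is only defined here, not called.

```python
def build_seleniumwire_options(exclude_hosts=None, disable_encoding=True):
    """Return a dict that can be passed as the seleniumwire_options argument."""
    options = {
        # Assumption: we want uncompressed bodies so response.body is easier to read
        'disable_encoding': disable_encoding,
    }
    if exclude_hosts:
        # Hosts listed here bypass Selenium Wire entirely
        options['exclude_hosts'] = list(exclude_hosts)
    return options


def create_headless_chrome(seleniumwire_options=None):
    """Start a headless Chrome driver (needs selenium-wire and Chrome installed)."""
    # Imported lazily so the helper above can be used without selenium-wire
    from seleniumwire import webdriver

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--headless')
    return webdriver.Chrome(
        options=chrome_options,
        seleniumwire_options=seleniumwire_options or {},
    )


wire_options = build_seleniumwire_options(exclude_hosts=['host1.com'])
```

With a browser available, the driver would then be created with create_headless_chrome(wire_options).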

Note that sub-packages of webdriver should still be imported directly from selenium. For example, to import WebDriverWait:

# Sub-packages of webdriver must still be imported from `selenium` itself
from selenium.webdriver.support.ui import WebDriverWait

Accessing requests

Selenium Wire captures all HTTP/HTTPS traffic made by the browser[1]. The following attributes provide access to requests and responses.

driver.requests

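The list of captured requests, in chronological order.

driver.last_request

A convenience attribute for retrieving the most recently captured request. This is more efficient than using driver.requests[-1].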
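In a UI test you usually care about one particular endpoint rather than every captured request. The sketch below shows one way to pick it out of driver.requests. The helper names and the example.test URLs are hypothetical, and the SimpleNamespace objects are stand-ins so the code can be tried without a browser; they only mimic the url and response attributes.

```python
from types import SimpleNamespace


def find_requests(requests, url_part):
    """Return captured requests whose URL contains url_part, oldest first."""
    return [r for r in requests if url_part in r.url]


def last_response_for(requests, url_part):
    """Return the response of the newest matching request that has one, else None."""
    for request in reversed(requests):
        if url_part in request.url and request.response is not None:
            return request.response
    return None


# Stand-ins for driver.requests (hypothetical URLs)
captured = [
    SimpleNamespace(url='https://example.test/api/login',
                    response=SimpleNamespace(status_code=200)),
    SimpleNamespace(url='https://example.test/static/app.js',
                    response=SimpleNamespace(status_code=200)),
    SimpleNamespace(url='https://example.test/api/login', response=None),
]

matches = find_requests(captured, '/api/login')
response = last_response_for(captured, '/api/login')
```

With a real driver the call would be last_response_for(driver.requests, '/api/login'). The Selenium Wire README also documents driver.wait_for_request(pat, timeout), which may be a better fit when the request has not been sent yet.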

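Request objects

body

The request body as bytes. If the request has no body, the value of body will be empty, i.e. b''.

headers

A dictionary-like object of request headers. Headers are case-insensitive and duplicates are permitted. request.headers['user-agent'] returns the value of the User-Agent header. To replace a header, first delete the existing one with del request.headers['header-name']; otherwise you will create a duplicate.

response

The response object associated with the request. This will be None if the request has no response.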
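Two points above are easy to get wrong in practice: replacing a header without creating a duplicate, and turning a bytes body into usable data. The following is a rough sketch under stated assumptions. replace_header and json_body are hypothetical helper names, email.message.Message is used as a stand-in with the same duplicate-allowing behaviour described for headers, and the response is a fake object. Only gzip is handled here; the Selenium Wire README also describes a decode helper in seleniumwire.utils for this purpose.

```python
import gzip
import json
from email.message import Message
from types import SimpleNamespace


def replace_header(headers, name, value):
    """Replace a header instead of appending a duplicate."""
    del headers[name]        # removes every existing occurrence; no error if absent
    headers[name] = value


def json_body(response):
    """Decode response.body (bytes) into Python data, undoing gzip if needed."""
    body = response.body
    encoding = (response.headers.get('Content-Encoding') or 'identity').lower()
    if encoding == 'gzip':
        body = gzip.decompress(body)
    elif encoding != 'identity':
        raise ValueError(f'unsupported Content-Encoding: {encoding}')
    return json.loads(body.decode('utf-8')) if body else None


# Stand-in for request.headers
headers = Message()
headers['User-Agent'] = 'original'
replace_header(headers, 'user-agent', 'replaced')   # names are case-insensitive

# Stand-in for request.response with a gzip-compressed JSON body
fake_response = SimpleNamespace(
    headers={'Content-Encoding': 'gzip'},
    body=gzip.compress(b'{"code": 0, "msg": "ok"}'),
)
data = json_body(fake_response)

# With a real driver, the header change would typically run in a request interceptor:
#   def interceptor(request):
#       replace_header(request.headers, 'User-Agent', 'my-test-agent')
#   driver.request_interceptor = interceptor
```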

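Limiting request capture

Selenium Wire works by redirecting browser traffic through an internal proxy server that it starts in the background. As requests flow through the proxy they are intercepted and captured. Capturing requests can slow things down slightly, but there are a few things you can do to restrict what gets captured.

driver.scopes

This accepts a list of regular expressions that are matched against the URLs to capture. It should be set on the driver before making any requests. When empty (the default), all URLs are captured.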

driver.scopes = [
    '.*stackoverflow.*',
    '.*github.*'
]

driver.get(...)  # Start making requests

# Only request URLs containing "stackoverflow" or "github" will now be captured
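Before relying on a scopes list, it can help to preview which URLs the patterns would match. This is only a sketch: in_scope is a hypothetical helper, and it assumes the patterns behave like re.search applied to the full URL, which may differ in detail from what Selenium Wire does internally. The sample URLs are illustrative.

```python
import re


def in_scope(url, scopes):
    """Return True if url would be captured under the given scopes list."""
    if not scopes:               # empty list (the default): everything is captured
        return True
    return any(re.search(pattern, url) for pattern in scopes)


scopes = ['.*stackoverflow.*', '.*github.*']
urls = [
    'https://stackoverflow.com/questions',
    'https://api.github.com/users',
    'https://www.google.com/',
]
captured = [u for u in urls if in_scope(u, scopes)]
```

Here captured keeps the stackoverflow and github URLs and drops the google one.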

seleniumwire_options.exclude_hosts

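Excludes hosts from capture: add any hosts you do not need to capture to this option.

Use this option to bypass Selenium Wire entirely. Any requests made to addresses listed here go directly from the browser to the server without involving Selenium Wire. Note that if you have configured an upstream proxy, these requests will also bypass that proxy.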

options = {
    'exclude_hosts': ['host1.com', 'host2.com']  # Bypass Selenium Wire for these hosts
}
driver = webdriver.Chrome(seleniumwire_options=options)

Test code:

from seleniumwire import webdriver  # Import from seleniumwire

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()

# Go to the Google home page
driver.get('https://www.google.com')

# Access requests via the `requests` attribute
for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type'],
            request.response.body
        )

Output (the response body printed as the last field is omitted here for brevity):

https://www.google.com/ 200 text/html; charset=UTF-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_120x44dp.png 200 image/png
https://consent.google.com/status?continue=https://www.google.com&pc=s&timestamp=1531511954&gl=GB 204 text/html; charset=utf-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png 200 image/png
https://ssl.gstatic.com/gb/images/i2_2ec824b0.png 200 image/png
https://www.google.com/gen_204?s=webaft&t=aft&atyp=csi&ei=kgRJW7DBONKTlwTK77wQ&rt=wsrt.366,aft.58,prt.58 204 text/html; charset=UTF-8
...
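Printing every request is fine for exploration, but a test usually wants something it can assert on. The sketch below reduces captured requests to (url, status code, content type) rows and picks out failures. summarize and failed are hypothetical helper names, the example.test URLs are made up, and the SimpleNamespace objects are stand-ins for driver.requests so the code runs without a browser.

```python
from types import SimpleNamespace


def summarize(requests):
    """Return (url, status_code, content_type) for every request that got a response."""
    rows = []
    for request in requests:
        response = request.response
        if response is None:     # no response was captured for this request
            continue
        rows.append((
            request.url,
            response.status_code,
            response.headers.get('Content-Type'),
        ))
    return rows


def failed(rows):
    """Return the rows whose status code signals a client or server error."""
    return [row for row in rows if row[1] >= 400]


# Stand-ins for driver.requests (hypothetical URLs)
captured = [
    SimpleNamespace(
        url='https://example.test/api/orders',
        response=SimpleNamespace(status_code=200,
                                 headers={'Content-Type': 'application/json'}),
    ),
    SimpleNamespace(
        url='https://example.test/api/pay',
        response=SimpleNamespace(status_code=500,
                                 headers={'Content-Type': 'text/html'}),
    ),
    SimpleNamespace(url='https://example.test/api/pending', response=None),
]

rows = summarize(captured)
errors = failed(rows)
```

In a real test, summarize(driver.requests) would be called after the UI action, followed by an assertion such as failed(rows) == [].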

For other features, see:

https://github.com/wkeeling/selenium-wire
