MechanicalSoup - Automating Interaction with Websites

Contents

- 1. About MechanicalSoup
  - Overview
- 2. Installation
- 3. Example

1. About MechanicalSoup

GitHub: https://github.com/MechanicalSoup/MechanicalSoup
Website: https://mechanicalsoup.readthedocs.io/
Official documentation: https://mechanicalsoup.readthedocs.io/en/stable/
API documentation: https://mechanicalsoup.readthedocs.io/en/stable/mechanicalsoup.html
Examples: https://github.com/MechanicalSoup/MechanicalSoup/blob/main/examples

Overview

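MechanicalSoup is a Python library for automating interaction with websites. It automatically stores and sends cookies, follows redirects, and can follow links and submit forms. It does not execute JavaScript.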

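MechanicalSoup was created by M Hickford, a long-time user of the Mechanize library. Unfortunately, Mechanize was incompatible with Python 3 until 2019, and its development stalled for several years. MechanicalSoup provides a similar API, built on two Python giants: Requests (for HTTP sessions) and BeautifulSoup (for document navigation). Since 2017 it has been actively maintained by a small team that includes @hemberger and @moy.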

2. Installation

Download and install the latest released version from PyPI:

pip install MechanicalSoup

Download and install the development version from GitHub:

pip install git+https://github.com/MechanicalSoup/MechanicalSoup

Install from source (this installs the version in the current working directory):

python setup.py install

(In all cases, add --user to the install command to install into the current user's home directory.)
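
After any of the commands above, one way to confirm that the package landed in the active environment is to ask Python's standard `importlib.metadata` for its version. This is a small sketch rather than part of MechanicalSoup itself, and the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("MechanicalSoup:", installed_version("MechanicalSoup") or "not installed")
```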

3. Example

From examples/expl_qwant.py, code that fetches the results of a Qwant search:

"""Example usage of MechanicalSoup to get the results from the Qwant
search engine.
"""

import re
import urllib.parse

import mechanicalsoup

# Connect to Qwant
browser = mechanicalsoup.StatefulBrowser(user_agent='MechanicalSoup')
browser.open("https://lite.qwant.com/")

# Fill-in the search form
browser.select_form('#search-form')
browser["q"] = "MechanicalSoup"
browser.submit_selected()

# Display the results
for link in browser.page.select('.result a'):
    # Qwant shows redirection links, not the actual URL, so extract
    # the actual URL from the redirect link:
    href = link.attrs['href']
    m = re.match(r"^/redirect/[^/]*/(.*)$", href)
    if m:
        href = urllib.parse.unquote(m.group(1))
    print(link.text, '->', href)
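
The redirect-unwrapping step in the loop above can be tried on its own, without contacting Qwant. The sketch below reuses the same regular expression. The function name `unwrap_redirect` and the sample links are made up for illustration and are not taken from real Qwant output.

```python
import re
import urllib.parse

# Same pattern as in the example: /redirect/<token>/<percent-encoded URL>
REDIRECT_RE = re.compile(r"^/redirect/[^/]*/(.*)$")


def unwrap_redirect(href: str) -> str:
    """Return the decoded target of a redirect link, or href unchanged."""
    m = REDIRECT_RE.match(href)
    if m:
        return urllib.parse.unquote(m.group(1))
    return href


# Hypothetical redirect link in the shape the example expects.
sample = "/redirect/abc123/https%3A%2F%2Fmechanicalsoup.readthedocs.io%2F"
print(unwrap_redirect(sample))  # → https://mechanicalsoup.readthedocs.io/

# Links that are not redirects pass through untouched.
print(unwrap_redirect("https://example.org/page"))  # → https://example.org/page
```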

More examples are available in examples/.

For examples with more complex forms (checkboxes, radio buttons and textareas), read tests/test_browser.py and tests/test_form.py.

2024-09-24 (Tue)