Scrapling AI Crawler: First Impressions

1. Introduction to Scrapling

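In the AI era, a traditional crawler still has to be maintained by hand: whenever a site changes, its XPath selectors need rewriting, and anti-bot (risk control) measures have to be handled separately. What makes Scrapling attractive is that it learns the structure of a page and relocates elements after the page is updated. It reportedly can also get past WAFs, JA3 fingerprinting, and similar checks.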

Project URL: https://github.com/D4Vinci/Scrapling
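
To make the "relocate an element after the page changes" idea concrete, here is a toy sketch in plain Python. It is not Scrapling's implementation: the element dictionaries, the `fingerprint` and `relocate` helpers, and the use of `difflib` string similarity are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def fingerprint(element):
    """Flatten a (hypothetical) element dict into one comparable string."""
    attrs = " ".join(f"{k}={v}" for k, v in sorted(element["attrs"].items()))
    return f'{element["tag"]} {attrs} {element["text"]}'


def relocate(saved, candidates):
    """Return the candidate whose fingerprint is most similar to the saved one."""
    target = fingerprint(saved)
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, target, fingerprint(c)).ratio(),
    )


# Features stored on the first crawl (illustrative data)
saved = {"tag": "div", "attrs": {"class": "product"}, "text": "Blue Widget"}

# Elements found after a redesign: the class name changed to "product-card"
new_page = [
    {"tag": "nav", "attrs": {"class": "menu"}, "text": "Home"},
    {"tag": "div", "attrs": {"class": "product-card"}, "text": "Blue Widget"},
    {"tag": "footer", "attrs": {"class": "site-footer"}, "text": "Contact"},
]

match = relocate(saved, new_page)
print(match["attrs"]["class"])
```

The old selector `.product` would no longer match on the redesigned page, but a similarity search over stored features can still pick the renamed element. A real library would presumably compare many more properties (position in the tree, siblings, attributes) than this sketch does.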

2. First Try

2.1 Installing dependencies

certifi
orjson
w3lib
typing_extensions
lxml
cssselect
curl_cffi
playwright
browserforge
patchright
msgspec
anyio
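
Before running the demo it can help to confirm that every package in the list above is actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package names are copied from the list above.

```python
from importlib import metadata

# Package names copied from the dependency list above
REQUIRED = [
    "certifi", "orjson", "w3lib", "typing_extensions", "lxml", "cssselect",
    "curl_cffi", "playwright", "browserforge", "patchright", "msgspec", "anyio",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(REQUIRED)
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any line that prints `NOT INSTALLED` points at a package that still needs `pip install`.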

2.2 Demo test

# from scrapling.fetchers import StealthyFetcher
#
# page = StealthyFetcher.fetch('https://example.com', headless=True)
# products = page.css('.product', auto_save=True)  # save the element's features on the first crawl
#
# # If the site is redesigned later, adaptive=True relocates the element automatically
# products = page.css('.product', adaptive=True)
# print(products)
import os
import ssl

import certifi
from scrapling.spiders import Spider, Response

# Try to make Python-side SSL clients use certifi's CA bundle
os.environ['SSL_CERT_FILE'] = certifi.where()
os.environ['REQUESTS_CA_BUNDLE'] = certifi.where()  # read by the requests library
# For http.client or urllib, the default HTTPS context is set as well
ssl._create_default_https_context = ssl.create_default_context


class MySpider(Spider):
    name = "demo"
    start_urls = ["https://example.com/"]

    async def parse(self, response: Response):
        for item in response.css('.product'):
            yield {"title": item.css('h2::text').get()}


print(f"certifi 路径: {certifi.where()}")  # label: "certifi path"
# Label: "does the certificate file exist". Note that this prints the path itself;
# os.path.exists(certifi.where()) would be the real existence check.
print(f"证书文件是否存在: {certifi.where()}")
print(f"SSL默认路径: {ssl.get_default_verify_paths()}")  # label: "SSL default paths"

MySpider().start()

The output is as follows:

D:\ProgramData\Anaconda3\envs\scrapling\python.exe P:\code\Scrapling-main\tests\demo\demo.py 
certifi 路径: D:\ProgramData\Anaconda3\envs\scrapling\Lib\site-packages\certifi\cacert.pem
证书文件是否存在: D:\ProgramData\Anaconda3\envs\scrapling\Lib\site-packages\certifi\cacert.pem
SSL默认路径: DefaultVerifyPaths(cafile='D:\\ProgramData\\Anaconda3\\envs\\scrapling\\Lib\\site-packages\\certifi\\cacert.pem', capath=None, openssl_cafile_env='SSL_CERT_FILE', openssl_cafile='C:\\Program Files\\Common Files\\ssl\\cert.pem', openssl_capath_env='SSL_CERT_DIR', openssl_capath='C:\\Program Files\\Common Files\\ssl\\certs')
[2026-03-09 23:32:09]:(demo) INFO: Spider initialized
[2026-03-09 23:32:09]:(demo) DEBUG: Starting spider
[2026-03-09 23:32:10]:(demo) WARNING: Attempt 1 failed: Failed to perform, curl: (60) SSL certificate problem: unable to get local issuer certificate. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.. Retrying in 1 seconds...
[2026-03-09 23:32:11]:(demo) WARNING: Attempt 2 failed: Failed to perform, curl: (60) SSL certificate problem: unable to get local issuer certificate. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.. Retrying in 1 seconds...
[2026-03-09 23:32:13]:(demo) ERROR: Failed after 3 attempts: Failed to perform, curl: (60) SSL certificate problem: unable to get local issuer certificate. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.
[2026-03-09 23:32:13]:(demo) DEBUG: Spider idle
[2026-03-09 23:32:13]:(demo) DEBUG: Spider closed
[2026-03-09 23:32:13]:(demo) INFO: {
    "items_scraped": 0,
    "items_dropped": 0,
    "elapsed_seconds": 3.64,
    "download_delay": 0.0,
    "concurrent_requests": 4,
    "concurrent_requests_per_domain": 0,
    "requests_count": 0,
    "requests_per_second": 0.0,
    "sessions_requests_count": {},
    "failed_requests_count": 1,
    "offsite_requests_count": 0,
    "blocked_requests_count": 0,
    "response_status_count": {},
    "response_bytes": 0,
    "domains_response_bytes": {},
    "proxies": [],
    "custom_stats": {},
    "log_count": {
        "debug": 3,
        "info": 1,
        "warning": 2,
        "error": 1,
        "critical": 0
    }
}

Process finished with exit code 0

I tried to fix the reported SSL problem by upgrading certifi:

pip install --upgrade certifi

The same error still appears. I will continue tomorrow and test more of the new features.

Attempt 1 failed: Failed to perform, curl: (60) SSL certificate problem: unable to get local issuer certificate
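
The `curl: (60)` prefix suggests the failure is raised inside libcurl (presumably reached through `curl_cffi` from the dependency list) rather than by Python's `ssl` module, so the Python-level settings in the demo may not reach it at all. One thing that can be checked independently is whether the CA bundle file itself is present and parseable. The sketch below uses only the standard library; the helper name `check_ca_bundle` is hypothetical, and it does not verify which bundle Scrapling's fetcher actually uses.

```python
import os
import ssl


def check_ca_bundle(path):
    """Report whether a CA bundle file exists and can be loaded by OpenSSL."""
    report = {
        "path": path,
        "exists": os.path.isfile(path),
        "loadable": False,
        "x509_ca": 0,
    }
    if not report["exists"]:
        return report
    try:
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        ctx.load_verify_locations(cafile=path)
    except (ssl.SSLError, OSError):
        # The file is there but is not a usable PEM bundle
        return report
    report["loadable"] = True
    report["x509_ca"] = ctx.cert_store_stats()["x509_ca"]
    return report


if __name__ == "__main__":
    try:
        import certifi  # optional: only needed to locate the bundle
        print(check_ca_bundle(certifi.where()))
    except ImportError:
        print("certifi is not installed")
```

If the bundle loads and reports a non-zero number of CA certificates, the file is probably fine, and the next suspects would be a TLS-intercepting proxy or antivirus on the machine, or the HTTP client not being pointed at that bundle.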
