03. Scraping the data (the attempt failed; kept only as a record)

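1. Finding the URL

Enter the live room. The user list inside is set so that it cannot be viewed publicly.

Find the request URL as shown in the figure.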
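
Once a request URL has been found, it helps to split it into its parts to see which parameters identify the room. The sketch below does this for a shortened copy of the URL that appears in the traceback later in this post; the signature parameters are left out, and the variable names are only illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the URL from the traceback (signature parameters omitted).
captured = ('https://live.douyin.com/webcast/ranklist/audience/'
            '?aid=6383&app_name=douyin_web&live_id=1'
            '&room_id=7291676063342643994&rank_type=30')

parts = urlsplit(captured)
# parse_qs returns a list per key; keep the first value of each.
params = {name: values[0] for name, values in parse_qs(parts.query).items()}

print(parts.netloc, parts.path)
for name, value in params.items():
    print(name, '=', value)
```

Here `room_id` is the value that changes from one live room to another.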

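2. Disguising the request

user-agent: the user agent (browser information)

cookie: the information kept after the user logs in

Login information: look for the cookie

Browser information: look for the user-agent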
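
Both values can be copied from the browser's developer tools as "name: value" lines. A minimal sketch of turning such copied text into a headers dict follows; `parse_raw_headers` is a hypothetical helper, and the header values are placeholders, not real credentials.

```python
def parse_raw_headers(raw: str) -> dict:
    """Turn 'name: value' lines copied from the browser into a headers dict."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # Skip blank lines, lines without a colon, and HTTP/2 pseudo-headers (":authority").
        if not line or line.startswith(':') or ':' not in line:
            continue
        name, value = line.split(':', 1)
        headers[name.strip().lower()] = value.strip()
    return headers


# Placeholder text standing in for what was copied from the browser.
raw = """
User-Agent: Mozilla/5.0 (example placeholder)
Cookie: sessionid=PLACEHOLDER; other=PLACEHOLDER
"""

headers = parse_raw_headers(raw)
print(headers)
```

The resulting dict has the same shape as the `headers` variable passed to `requests.get`.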

Code:

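# Import the regex and HTTP request modules
import re
import requests

# URL of the user list
url = ''
# Disguise the program, like showing a fake ID card when visiting.
# Browser information: user-agent. Login information: cookie.
headers = {'user-agent': '', 'cookie': ''}
# Request the site and get the response
res = requests.get(url, headers=headers)
# Parse the response body as JSON
js = res.json()
print(js)
# The user list is under js -> data -> ranks
userList = js['data']['ranks']
print(userList)
# Loop over the users
for user in userList:
    # Spending level: user -> pay_grade -> level
    userPay = user['user']['pay_grade']['level']
    userHomePage = '' + user['user']['sec_uid']
    res = requests.get(userHomePage, headers=headers)
    # Page text
    text = res.text
    nickName = re.findall('(.*?)的主页', text)
    # Assumption: the ID directly follows the phrase and uses letters, digits, '_' and '.'.
    # A bare lazy (.*?) with nothing after it would capture only empty strings.
    douyinNum = re.findall(r'抖音号是\s*([A-Za-z0-9_.]+)', text)
    print(nickName, douyinNum, userPay, userHomePage)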
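
The chained lookups in the code above raise a KeyError as soon as the response has a different shape, for example when the request is blocked. A minimal sketch of a more tolerant extraction, run on a made-up sample payload; `extract_levels` is a hypothetical helper, and only the key names (`data`, `ranks`, `user`, `pay_grade`, `level`, `sec_uid`) come from the code above.

```python
def extract_levels(payload: dict) -> list:
    """Collect (sec_uid, level) pairs, skipping entries without a sec_uid."""
    ranks = (payload.get('data') or {}).get('ranks') or []
    result = []
    for entry in ranks:
        user = entry.get('user') or {}
        sec_uid = user.get('sec_uid')
        if sec_uid is None:
            continue
        # level stays None when pay_grade is missing
        level = (user.get('pay_grade') or {}).get('level')
        result.append((sec_uid, level))
    return result


# Made-up sample data, not a real response.
sample = {'data': {'ranks': [
    {'user': {'sec_uid': 'uid-a', 'pay_grade': {'level': 12}}},
    {'user': {'sec_uid': 'uid-b'}},
    {'user': {}},
]}}

print(extract_levels(sample))              # [('uid-a', 12), ('uid-b', None)]
print(extract_levels({'status_code': 1}))  # []
```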

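Cause of the error:

The request failed with "Max retries exceeded", which I took to mean too many visits. Carrying on like this could easily get the IP banned, so I did not try again. Note that the traceback names a ProxyError ("Unable to connect to proxy") caused by an SSL EOF as the direct cause, so the local proxy settings may also be to blame.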

urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:1131)

The above exception was the direct cause of the following exception:

urllib3.exceptions.ProxyError: ('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\install\python3.8\install\lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
  File "E:\install\python3.8\install\lib\site-packages\urllib3\connectionpool.py", line 845, in urlopen
    retries = retries.increment(
  File "E:\install\python3.8\install\lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='live.douyin.com', port=443): Max retries exceeded with url: /webcast/ranklist/audience/?aid=6383&app_name=douyin_web&live_id=1&device_platform=web&language=zh-CN&enter_from=web_search&cookie_enabled=true&screen_width=1313&screen_height=821&browser_language=zh-CN&browser_platform=Win32&browser_name=Chrome&browser_version=117.0.0.0&webcast_sdk_version=2450&room_id=7291676063342643994&anchor_id=2581300722537251&sec_anchor_id=MS4wLjABAAAAkY_WtKOYqH-5zSWzQSFe9tXTCirrA8sLJBKNrspoORbkdrMBXqAnyjV2f75mX4lk&rank_type=30&msToken=MeH2AD_j6RbHpbqPWUyyoVMzuw63sALL5xx4Y13yp6nnq8D6sknHWmfpY9j_YujnW01p_EmbfcfuNIZw-Py6bwV8Oz1j3LXX1dn8WOsZ1EEwaQVPFop9rYhPenbTrOCy&X-Bogus=DFSzswVENasANeogtYIDvKXAIQ-X&_signature=_02B4Z6wo000012ZO15QAAIDCBUQX.a0biutmTtMAALyult0a6-ftHDP.4JgbIybvVd-fj4v.dHUHYgVoxsNh8DR7dFJG7wgVBodOHrVp-kXOMztMjLuxQ1QdaFLM5hvBFWOWIIff9gOunxqM66 (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)'))))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\install\python3.8\install\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\adapters.py", line 513, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='live.douyin.com', port=443): Max retries exceeded with url: /webcast/ranklist/audience/?aid=6383&app_name=douyin_web&live_id=1&device_platform=web&language=zh-CN&enter_from=web_search&cookie_enabled=true&screen_width=1313&screen_height=821&browser_language=zh-CN&browser_platform=Win32&browser_name=Chrome&browser_version=117.0.0.0&webcast_sdk_version=2450&room_id=7291676063342643994&anchor_id=2581300722537251&sec_anchor_id=MS4wLjABAAAAkY_WtKOYqH-5zSWzQSFe9tXTCirrA8sLJBKNrspoORbkdrMBXqAnyjV2f75mX4lk&rank_type=30&msToken=MeH2AD_j6RbHpbqPWUyyoVMzuw63sALL5xx4Y13yp6nnq8D6sknHWmfpY9j_YujnW01p_EmbfcfuNIZw-Py6bwV8Oz1j3LXX1dn8WOsZ1EEwaQVPFop9rYhPenbTrOCy&X-Bogus=DFSzswVENasANeogtYIDvKXAIQ-X&_signature=_02B4Z6wo000012ZO15QAAIDCBUQX.a0biutmTtMAALyult0a6-ftHDP.4JgbIybvVd-fj4v.dHUHYgVoxsNh8DR7dFJG7wgVBodOHrVp-kXOMztMjLuxQ1QdaFLM5hvBFWOWIIff9gOunxqM66 (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)'))))

Process finished with exit code 1
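
The loop in the code requests one homepage per user with no pause in between. One way to lower the risk of an IP ban is to retry failed requests with growing delays instead of hammering the site. The sketch below is only an illustration, not something that was tried here: `fetch_with_backoff` is a hypothetical helper, and the fetch function is simulated so the example runs without network access.

```python
import time


def fetch_with_backoff(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); after a failure wait base_delay * 2**attempt, then retry."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # with requests, catch requests.exceptions.RequestException
            last_error = exc
            if attempt < retries:
                sleep(base_delay * (2 ** attempt))
    raise last_error


# Simulated fetch: fails twice, then succeeds.
calls = {'n': 0}


def flaky_fetch(url):
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('simulated failure')
    return 'ok:' + url


delays = []  # record the waits instead of actually sleeping
result = fetch_with_backoff(flaky_fetch, 'https://example.invalid/page', sleep=delays.append)
print(result, delays)  # ok:https://example.invalid/page [1.0, 2.0]
```

In real use, `fetch` would wrap `requests.get(url, headers=headers)` and `sleep` would stay at its default.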