03. Scraping the data (the attempt failed; kept only as a record)

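1. Find the URL

Enter the live room. The users in it are configured so that they cannot be viewed publicly.

As shown in the screenshot, find the URL of the request that returns the user list.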
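
Once the URL is found, it helps to see what it is made of. The sketch below splits a shortened copy of the request URL that appears in the traceback later in this post. Only four of its query parameters are kept here, and the signature-style ones (msToken, X-Bogus, _signature) are left out for brevity.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the request URL from the traceback in this post.
# The full URL carries many more parameters, including signature-style ones.
url = (
    "https://live.douyin.com/webcast/ranklist/audience/"
    "?aid=6383&app_name=douyin_web&room_id=7291676063342643994&rank_type=30"
)

parts = urlsplit(url)
# parse_qs returns a list per key; keep the first value of each.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.netloc)       # live.douyin.com
print(parts.path)         # /webcast/ranklist/audience/
print(params["room_id"])  # 7291676063342643994
```

Seeing the parameters separately makes it easier to tell which ones identify the live room (such as room_id) and which ones look generated per request.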

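2. Disguise the request

user-agent: the user agent, i.e. information about the browser

cookie: information kept after the user logs in

Login information: look for the cookie

Browser information: look for the user-agent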

Code:

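# Import the regex module and the HTTP request module
import re
import requests
# URL of the user list (left blank here; use the one found in step 1)
url = ''
# Disguise the script, like showing a fake ID when visiting:
# browser information goes in user-agent, login information in cookie
headers = {'user-agent': '', 'cookie': ''}
# Request the site and get the response
res = requests.get(url, headers=headers)
# Parse the response body as JSON
js = res.json()
print(js)
# The user list is under js['data']['ranks']
userList = js['data']['ranks']
print(userList)
# Loop over the users
for user in userList:
    # The level is under user['user']['pay_grade']['level']
    userPay = user['user']['pay_grade']['level']
    # Home page URL: a prefix (left blank here) plus the user's sec_uid
    userHomePage = '' + user['user']['sec_uid']
    res = requests.get(userHomePage, headers=headers)
    # Page source as text
    text = res.text
    nickName = re.findall('(.*?)的主页', text)
    # A lazy group at the end of a pattern always matches the empty string,
    # so capture a run of ID characters instead (this character set is an assumption)
    douyinNum = re.findall(r'抖音号是\s*([A-Za-z0-9_.]+)', text)
    print(nickName, douyinNum, userPay, userHomePage)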
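
One detail of the regular expressions above is easy to miss: a lazy group such as `(.*?)` with nothing after it is satisfied by zero characters, so it captures an empty string. The sketch below shows this on a made-up page fragment. The fragment and the ID character set are assumptions, because the real page source was not saved in this post.

```python
import re

# Hypothetical page fragment; the real page layout was not captured.
text = "小明的主页 抖音号是abc_123 其他内容"

# Lazy group at the very end of the pattern: matches zero characters.
lazy = re.findall(r'抖音号是(.*?)', text)

# Explicit character class: captures the run of ID characters.
bounded = re.findall(r'抖音号是\s*([A-Za-z0-9_.]+)', text)

# Lazy group followed by literal text works, because the literal bounds it.
nick = re.findall(r'(.*?)的主页', text)

print(lazy)     # ['']
print(bounded)  # ['abc_123']
print(nick)     # ['小明']
```

The nickname pattern works because `的主页` follows the group and bounds it. The ID pattern needs something similar, either a character class or the text that follows the ID on the real page.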

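Cause of the error:

It looked like I had made too many requests. Carrying on like that is an easy way to get the IP banned, so I stopped trying. Note, though, that the traceback recorded here ends in a ProxyError ("Unable to connect to proxy", caused by an SSL EOF error), so the request failed while connecting through a proxy and may never have reached live.douyin.com at all.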
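
If the problem really is request frequency, one common mitigation is to wait longer after each failure instead of retrying immediately. The sketch below is a generic helper, not something from the original attempt: `fetch_with_backoff`, its parameters, and the delay values are all hypothetical, and it does not guarantee that an IP will not be banned.

```python
import random
import time


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    fetch is any zero-argument callable. sleep is injectable so the helper
    can be exercised without actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            # In real use, narrow this to the errors worth retrying,
            # e.g. requests.exceptions.RequestException.
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Exponential backoff with a little jitter: about 1s, 2s, 4s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)
```

With the script above it could be called as `fetch_with_backoff(lambda: requests.get(url, headers=headers, timeout=10))`. A fixed pause between the per-user requests inside the loop would serve the same goal of keeping the request rate low.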

urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:1131)

The above exception was the direct cause of the following exception:

urllib3.exceptions.ProxyError: ('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\install\python3.8\install\lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
  File "E:\install\python3.8\install\lib\site-packages\urllib3\connectionpool.py", line 845, in urlopen
    retries = retries.increment(
  File "E:\install\python3.8\install\lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='live.douyin.com', port=443): Max retries exceeded with url: /webcast/ranklist/audience/?aid=6383&app_name=douyin_web&live_id=1&device_platform=web&language=zh-CN&enter_from=web_search&cookie_enabled=true&screen_width=1313&screen_height=821&browser_language=zh-CN&browser_platform=Win32&browser_name=Chrome&browser_version=117.0.0.0&webcast_sdk_version=2450&room_id=7291676063342643994&anchor_id=2581300722537251&sec_anchor_id=MS4wLjABAAAAkY_WtKOYqH-5zSWzQSFe9tXTCirrA8sLJBKNrspoORbkdrMBXqAnyjV2f75mX4lk&rank_type=30&msToken=MeH2AD_j6RbHpbqPWUyyoVMzuw63sALL5xx4Y13yp6nnq8D6sknHWmfpY9j_YujnW01p_EmbfcfuNIZw-Py6bwV8Oz1j3LXX1dn8WOsZ1EEwaQVPFop9rYhPenbTrOCy&X-Bogus=DFSzswVENasANeogtYIDvKXAIQ-X&_signature=_02B4Z6wo000012ZO15QAAIDCBUQX.a0biutmTtMAALyult0a6-ftHDP.4JgbIybvVd-fj4v.dHUHYgVoxsNh8DR7dFJG7wgVBodOHrVp-kXOMztMjLuxQ1QdaFLM5hvBFWOWIIff9gOunxqM66 (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)'))))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\install\python3.8\install\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\adapters.py", line 513, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='live.douyin.com', port=443): Max retries exceeded with url: /webcast/ranklist/audience/?aid=6383&app_name=douyin_web&live_id=1&device_platform=web&language=zh-CN&enter_from=web_search&cookie_enabled=true&screen_width=1313&screen_height=821&browser_language=zh-CN&browser_platform=Win32&browser_name=Chrome&browser_version=117.0.0.0&webcast_sdk_version=2450&room_id=7291676063342643994&anchor_id=2581300722537251&sec_anchor_id=MS4wLjABAAAAkY_WtKOYqH-5zSWzQSFe9tXTCirrA8sLJBKNrspoORbkdrMBXqAnyjV2f75mX4lk&rank_type=30&msToken=MeH2AD_j6RbHpbqPWUyyoVMzuw63sALL5xx4Y13yp6nnq8D6sknHWmfpY9j_YujnW01p_EmbfcfuNIZw-Py6bwV8Oz1j3LXX1dn8WOsZ1EEwaQVPFop9rYhPenbTrOCy&X-Bogus=DFSzswVENasANeogtYIDvKXAIQ-X&_signature=_02B4Z6wo000012ZO15QAAIDCBUQX.a0biutmTtMAALyult0a6-ftHDP.4JgbIybvVd-fj4v.dHUHYgVoxsNh8DR7dFJG7wgVBodOHrVp-kXOMztMjLuxQ1QdaFLM5hvBFWOWIIff9gOunxqM66 (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)'))))

Process finished with exit code 1
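
Since the traceback ends in a ProxyError, it is worth checking whether Python is picking up a proxy from the environment or the system settings (for example, one left behind by a VPN or a capture tool). This is only one plausible reading of the error, not a confirmed diagnosis. The sketch below uses the standard library's `urllib.request.getproxies()`. The `describe_proxies` wrapper is a hypothetical helper.

```python
import urllib.request


def describe_proxies():
    """Return the proxy settings Python detects, as {scheme: proxy_url}."""
    return dict(urllib.request.getproxies())


if __name__ == "__main__":
    found = describe_proxies()
    if found:
        for scheme, address in found.items():
            print(f"{scheme} traffic would go through {address}")
    else:
        print("No proxy settings detected")
```

If a proxy shows up here and is not actually running, requests made through it will fail before they reach the site. With requests, creating a `requests.Session()` and setting its `trust_env` attribute to `False` makes that session ignore environment proxy settings.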