03. Scraping data: failed, kept only as a record

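1. Find the URL

Enter the live room. The users in it have been set so that they cannot be viewed from outside.

As shown in the figure, find the URL (the request that returns the user list).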
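To see what the request found in this step actually carries, a captured URL can be split into its query parameters. This is a minimal standard-library sketch. The URL is an abbreviated copy of the one in the error log, with the signed parameters (msToken, X-Bogus, _signature) left out. It only shows what was sent and says nothing about which parameters the server requires.

```python
from urllib.parse import urlsplit, parse_qs

# Abbreviated from the request URL in the error log; signed parameters omitted.
captured = (
    "https://live.douyin.com/webcast/ranklist/audience/"
    "?aid=6383&app_name=douyin_web&room_id=7291676063342643994"
    "&anchor_id=2581300722537251&rank_type=30"
)

def query_params(url: str) -> dict:
    """Return the query string of url as {name: first value}."""
    parts = urlsplit(url)
    return {name: values[0] for name, values in parse_qs(parts.query).items()}

params = query_params(captured)
print(urlsplit(captured).path)              # /webcast/ranklist/audience/
print(params["room_id"], params["rank_type"])  # 7291676063342643994 30
```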

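2. Disguise the request

user-agent: the user agent, i.e. information about the browser

cookie: the information kept after the user logs in

Login information: find the cookie

Browser information: find the user-agent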
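The two headers from this step can be attached to a request and checked before anything is sent. This is a minimal sketch that uses the standard library's urllib.request instead of requests only so it runs without extra packages. build_request is a hypothetical helper name, and the user-agent and cookie values are placeholders, not real ones.

```python
import urllib.request

def build_request(url: str, user_agent: str, cookie: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the disguise headers."""
    # Fail early instead of silently sending empty headers, as '' placeholders would.
    if not user_agent or not cookie:
        raise ValueError("copy user_agent and cookie from the browser first")
    return urllib.request.Request(url, headers={"User-Agent": user_agent, "Cookie": cookie})

req = build_request(
    "https://live.douyin.com/",
    user_agent="Mozilla/5.0 (placeholder)",
    cookie="name=value",
)
# urllib stores header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))  # Mozilla/5.0 (placeholder)
print(req.get_header("Cookie"))      # name=value
```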

Code:

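```python
# Import the modules: re for regular expressions, requests for HTTP
import re
import requests

# Link to the user list (left blank here)
url = ''
# Disguise the program, like a fake ID card used for access:
# browser information goes in user-agent, login information in cookie
headers = {'user-agent': '', 'cookie': ''}
# Send the request and get the response
res = requests.get(url, headers=headers, timeout=10)
# Stop with an HTTP error here rather than a confusing JSON error later
res.raise_for_status()
# Parse the response body as JSON
js = res.json()
print(js)
# The users are in js['data']['ranks']
userList = js['data']['ranks']
print(userList)
# Loop over the users
for user in userList:
    # Pay level: user -> pay_grade -> level
    userPay = user['user']['pay_grade']['level']
    # Home page URL: prefix left blank here, followed by the user's sec_uid
    userHomePage = '' + user['user']['sec_uid']
    res = requests.get(userHomePage, headers=headers, timeout=10)
    # Page HTML as text
    text = res.text
    # Nickname: the text right before '的主页' ("'s home page")
    nickName = re.findall('(.*?)的主页', text)
    # The original pattern '抖音号是(.*?)' always captured an empty string, because
    # a lazy group with nothing after it matches zero characters.
    # Assumption: the Douyin ID consists of ASCII letters, digits, '_' and '.'.
    douyinNum = re.findall(r'抖音号是([A-Za-z0-9_.]+)', text)
    print(nickName, douyinNum, userPay, userHomePage)
```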
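The parsing half of the loop can be tested offline, without sending any requests. This is a minimal sketch: extract_users and extract_douyin_id are hypothetical helper names, the sample dictionary is made-up data shaped like the fields the code above reads, and the ID pattern rests on the same assumption about which characters a Douyin ID contains. It also demonstrates why the original pattern returned nothing useful.

```python
import re

def extract_users(js: dict) -> list:
    """Return (pay level, sec_uid) pairs from a rank-list response like the one above."""
    users = []
    for item in (js.get("data") or {}).get("ranks") or []:
        user = item.get("user") or {}
        level = (user.get("pay_grade") or {}).get("level")
        users.append((level, user.get("sec_uid")))
    return users

# Assumption: a Douyin ID is made of ASCII letters, digits, '_' and '.'.
DOUYIN_ID = re.compile(r"抖音号是([A-Za-z0-9_.]+)")

def extract_douyin_id(text: str):
    """Return the first ID after '抖音号是', or None if there is none."""
    m = DOUYIN_ID.search(text)
    return m.group(1) if m else None

# Hypothetical sample data, not a real API response.
sample = {"data": {"ranks": [{"user": {"sec_uid": "MS4w_example", "pay_grade": {"level": 12}}}]}}
print(extract_users(sample))                        # [(12, 'MS4w_example')]
print(re.findall("抖音号是(.*?)", "抖音号是abc_123"))  # [''] (the original bug)
print(extract_douyin_id("抖音号是abc_123。"))        # abc_123
```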

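Cause of the error:

The output said there had been too many attempts ("Max retries exceeded"). Continuing like that could easily get my IP banned, so I did not try further. Strictly speaking, though, "Max retries exceeded" in urllib3 means that the connection attempts failed, not that the server rejected too many requests: the underlying cause in the log is ProxyError('Unable to connect to proxy', ...), i.e. requests could not connect to the proxy configured on this machine, so the request did not get through to live.douyin.com.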
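If request volume is the concern, one simple mitigation is to space requests out instead of firing one per user back to back, as the loop in the code does. This is a minimal sketch of a fixed-interval throttle. The Throttle class and the 0.2 s interval are illustrative only; no particular interval is known to be acceptable to the site.

```python
import time

class Throttle:
    """Make successive wait() calls return at least `interval` seconds apart."""

    def __init__(self, interval: float):
        self.interval = interval
        self._last = None  # monotonic time at which the previous wait() returned

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

throttle = Throttle(interval=0.2)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # in the scraper, call this right before each requests.get(...)
elapsed = time.monotonic() - start
# The first call returns immediately; the next two each wait about 0.2 s.
print(f"elapsed: {elapsed:.2f}s")
```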

Error output:

```text
urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:1131)

The above exception was the direct cause of the following exception:

urllib3.exceptions.ProxyError: ('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\install\python3.8\install\lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
  File "E:\install\python3.8\install\lib\site-packages\urllib3\connectionpool.py", line 845, in urlopen
    retries = retries.increment(
  File "E:\install\python3.8\install\lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='live.douyin.com', port=443): Max retries exceeded with url: /webcast/ranklist/audience/?aid=6383&app_name=douyin_web&live_id=1&device_platform=web&language=zh-CN&enter_from=web_search&cookie_enabled=true&screen_width=1313&screen_height=821&browser_language=zh-CN&browser_platform=Win32&browser_name=Chrome&browser_version=117.0.0.0&webcast_sdk_version=2450&room_id=7291676063342643994&anchor_id=2581300722537251&sec_anchor_id=MS4wLjABAAAAkY_WtKOYqH-5zSWzQSFe9tXTCirrA8sLJBKNrspoORbkdrMBXqAnyjV2f75mX4lk&rank_type=30&msToken=MeH2AD_j6RbHpbqPWUyyoVMzuw63sALL5xx4Y13yp6nnq8D6sknHWmfpY9j_YujnW01p_EmbfcfuNIZw-Py6bwV8Oz1j3LXX1dn8WOsZ1EEwaQVPFop9rYhPenbTrOCy&X-Bogus=DFSzswVENasANeogtYIDvKXAIQ-X&_signature=_02B4Z6wo000012ZO15QAAIDCBUQX.a0biutmTtMAALyult0a6-ftHDP.4JgbIybvVd-fj4v.dHUHYgVoxsNh8DR7dFJG7wgVBodOHrVp-kXOMztMjLuxQ1QdaFLM5hvBFWOWIIff9gOunxqM66 (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)'))))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\install\python3.8\install\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\adapters.py", line 513, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='live.douyin.com', port=443): Max retries exceeded with url: /webcast/ranklist/audience/?aid=6383&app_name=douyin_web&live_id=1&device_platform=web&language=zh-CN&enter_from=web_search&cookie_enabled=true&screen_width=1313&screen_height=821&browser_language=zh-CN&browser_platform=Win32&browser_name=Chrome&browser_version=117.0.0.0&webcast_sdk_version=2450&room_id=7291676063342643994&anchor_id=2581300722537251&sec_anchor_id=MS4wLjABAAAAkY_WtKOYqH-5zSWzQSFe9tXTCirrA8sLJBKNrspoORbkdrMBXqAnyjV2f75mX4lk&rank_type=30&msToken=MeH2AD_j6RbHpbqPWUyyoVMzuw63sALL5xx4Y13yp6nnq8D6sknHWmfpY9j_YujnW01p_EmbfcfuNIZw-Py6bwV8Oz1j3LXX1dn8WOsZ1EEwaQVPFop9rYhPenbTrOCy&X-Bogus=DFSzswVENasANeogtYIDvKXAIQ-X&_signature=_02B4Z6wo000012ZO15QAAIDCBUQX.a0biutmTtMAALyult0a6-ftHDP.4JgbIybvVd-fj4v.dHUHYgVoxsNh8DR7dFJG7wgVBodOHrVp-kXOMztMjLuxQ1QdaFLM5hvBFWOWIIff9gOunxqM66 (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)'))))

Process finished with exit code 1
```
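Because the log reports "Unable to connect to proxy", a natural first check is which proxy Python picks up at all. requests reads these settings through the standard library's urllib.request.getproxies() (environment variables such as HTTPS_PROXY, and on Windows also the system proxy settings) unless a session's trust_env is set to False. This is a minimal stdlib-only diagnostic sketch. What to do about a proxy it finds, whether fixing it or bypassing it, depends on the local setup.

```python
import urllib.request

# The proxy mapping requests would also see by default, e.g. {'https': 'http://127.0.0.1:7890'}.
proxies = urllib.request.getproxies()

if proxies:
    for scheme, address in sorted(proxies.items()):
        print(f"{scheme}: {address}")
else:
    print("No proxy configured for this process.")
```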