03. Scraping the data (failed; kept only as a record)

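1. Find the URL

Open the live room. Its user (audience) list is set not to be publicly viewable.

As shown in the figure, find the URL of the request that returns the user list.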
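The URL copied from the browser is long, so it helps to split it into its path and query parameters to see which ones identify the live room. This is a minimal standard-library sketch. The example URL is an abbreviated version of the request URL that appears in the traceback further down, with only a few of its parameters kept. Which parameters the server actually requires is not known here, and signature-like values such as msToken, X-Bogus and _signature are simply copied from the browser; this sketch does not generate them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Abbreviated from the request URL shown in the traceback (most parameters omitted)
copied = ('https://live.douyin.com/webcast/ranklist/audience/'
          '?aid=6383&app_name=douyin_web&live_id=1'
          '&room_id=7291676063342643994&anchor_id=2581300722537251&rank_type=30')


def split_url(u):
    """Return (base URL without the query string, dict of query parameters)."""
    parts = urlsplit(u)
    # parse_qs returns a list of values per key; each parameter here occurs once
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))
    return base, params


base, params = split_url(copied)
print(base)
for key, value in params.items():
    print(f'{key} = {value}')

# Rebuilding the URL from the pieces gives back the copied URL
rebuilt = base + '?' + urlencode(params)
print(rebuilt == copied)  # prints True
```

With requests, the same pieces can be passed as `requests.get(base, params=params, headers=headers)`, which makes it easy to change one parameter (for example `room_id`) without editing the long string by hand.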

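2. Disguise the request

user-agent: the user agent, i.e. information about the browser

cookie: the information saved after the user logs in

Login information: find the cookie

Browser information: find the user-agent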
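Both values are copied from the browser after logging in. A minimal sketch of keeping them out of the script itself: it reads them from two environment variables whose names, DOUYIN_UA and DOUYIN_COOKIE, are made up for this example. It also splits the cookie string (of the form `name1=value1; name2=value2`) into a dict, which is handy for checking which cookies were actually copied.

```python
import os


def cookie_to_dict(cookie_header):
    """Split a 'name1=value1; name2=value2' cookie string into a dict."""
    cookies = {}
    for part in cookie_header.split(';'):
        part = part.strip()
        if not part or '=' not in part:
            continue  # skip empty fragments and pieces without a value
        name, value = part.split('=', 1)  # a value may itself contain '='
        cookies[name.strip()] = value.strip()
    return cookies


def build_headers():
    """Build the disguise headers from hypothetical environment variables."""
    return {
        'user-agent': os.environ.get('DOUYIN_UA', ''),  # browser information
        'cookie': os.environ.get('DOUYIN_COOKIE', ''),  # login information
    }


if __name__ == '__main__':
    headers = build_headers()
    # Print only the cookie names, not their (sensitive) values
    print(sorted(cookie_to_dict(headers['cookie'])))
```

Keeping the cookie in the environment also means the script can be shared without leaking the login session.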

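Code:

```python
# Import the modules: re for regular expressions, requests for sending HTTP requests
import re

import requests

# Link to the user list (left blank here; fill in the URL found in step 1)
url = ''
# Disguise the program, like showing a fake ID card to get in.
# Browser information: user-agent   Login information: cookie
headers = {'user-agent': '', 'cookie': ''}
# Request the site and get the response
res = requests.get(url, headers=headers, timeout=10)
# Fail with a clear HTTP error instead of a confusing JSON decode error
res.raise_for_status()
# Parse the response body as JSON
js = res.json()
print(js)
# The user list is in js -> data -> ranks
userList = js['data']['ranks']
print(userList)
# Loop over every user
for user in userList:
    # Pay level: user -> pay_grade -> level
    userPay = user['user']['pay_grade']['level']
    # Homepage URL = homepage prefix (left blank here) + sec_uid
    userHomePage = '' + user['user']['sec_uid']
    page = requests.get(userHomePage, headers=headers, timeout=10)
    # Text (HTML) of the homepage
    text = page.text
    nickName = re.findall('(.*?)的主页', text)
    # '抖音号是(.*?)' always captured an empty string, because a lazy group
    # with nothing after it matches zero characters. This version assumes the
    # ID is a run of letters, digits, underscores or dots (not verified on the real page).
    douyinNum = re.findall(r'抖音号是([\w.]+)', text)
    print(nickName, douyinNum, userPay, userHomePage)
```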
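Since every run hits the server, the parsing steps can be tested offline first. A sketch under two assumptions: the JSON shape data → ranks → user → pay_grade → level / sec_uid is taken from the key accesses in the code above, and the sample dict and page text below are made-up placeholders, not real responses. It also shows why the original pattern `'抖音号是(.*?)'` finds nothing useful.

```python
import re


def parse_ranks(js):
    """Return (sec_uid, pay level) pairs; missing or null keys give None instead of KeyError."""
    result = []
    for item in (js.get('data') or {}).get('ranks') or []:
        user = item.get('user') or {}
        level = (user.get('pay_grade') or {}).get('level')
        result.append((user.get('sec_uid'), level))
    return result


# Made-up sample with the nesting the real response is assumed to have
sample = {'data': {'ranks': [
    {'user': {'sec_uid': 'UID_A', 'pay_grade': {'level': 12}}},
    {'user': {'sec_uid': 'UID_B'}},  # no pay_grade, so level becomes None
]}}
print(parse_ranks(sample))  # [('UID_A', 12), ('UID_B', None)]

# Placeholder page text, not real Douyin HTML
text = '某某的主页 抖音号是abc_123,欢迎关注'
print(re.findall('(.*?)的主页', text))        # ['某某']
print(re.findall('抖音号是(.*?)', text))      # [''] : a lazy group at the end matches nothing
print(re.findall(r'抖音号是([\w.]+)', text))  # ['abc_123'] : stops at the full-width comma
```

Using `.get` with empty-dict fallbacks means one user without a `pay_grade` no longer aborts the whole loop.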

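Cause of the error:

The output said "Max retries exceeded", which I took to mean I had sent too many requests. Since that can easily get the IP banned, I did not try again. Reading the traceback more carefully, though, "Max retries exceeded" is urllib3's own message for giving up after its connection retries, and the underlying error is a ProxyError ('Unable to connect to proxy', caused by an SSL EOF error): the connection failed while going through a proxy configured on the machine, so the request most likely never got a reply from live.douyin.com.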

```text
urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:1131)

The above exception was the direct cause of the following exception:

urllib3.exceptions.ProxyError: ('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\install\python3.8\install\lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
  File "E:\install\python3.8\install\lib\site-packages\urllib3\connectionpool.py", line 845, in urlopen
    retries = retries.increment(
  File "E:\install\python3.8\install\lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='live.douyin.com', port=443): Max retries exceeded with url: /webcast/ranklist/audience/?aid=6383&app_name=douyin_web&live_id=1&device_platform=web&language=zh-CN&enter_from=web_search&cookie_enabled=true&screen_width=1313&screen_height=821&browser_language=zh-CN&browser_platform=Win32&browser_name=Chrome&browser_version=117.0.0.0&webcast_sdk_version=2450&room_id=7291676063342643994&anchor_id=2581300722537251&sec_anchor_id=MS4wLjABAAAAkY_WtKOYqH-5zSWzQSFe9tXTCirrA8sLJBKNrspoORbkdrMBXqAnyjV2f75mX4lk&rank_type=30&msToken=MeH2AD_j6RbHpbqPWUyyoVMzuw63sALL5xx4Y13yp6nnq8D6sknHWmfpY9j_YujnW01p_EmbfcfuNIZw-Py6bwV8Oz1j3LXX1dn8WOsZ1EEwaQVPFop9rYhPenbTrOCy&X-Bogus=DFSzswVENasANeogtYIDvKXAIQ-X&_signature=_02B4Z6wo000012ZO15QAAIDCBUQX.a0biutmTtMAALyult0a6-ftHDP.4JgbIybvVd-fj4v.dHUHYgVoxsNh8DR7dFJG7wgVBodOHrVp-kXOMztMjLuxQ1QdaFLM5hvBFWOWIIff9gOunxqM66 (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)'))))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\install\python3.8\install\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "E:\install\python3.8\install\lib\site-packages\requests\adapters.py", line 513, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='live.douyin.com', port=443): Max retries exceeded with url: /webcast/ranklist/audience/?aid=6383&app_name=douyin_web&live_id=1&device_platform=web&language=zh-CN&enter_from=web_search&cookie_enabled=true&screen_width=1313&screen_height=821&browser_language=zh-CN&browser_platform=Win32&browser_name=Chrome&browser_version=117.0.0.0&webcast_sdk_version=2450&room_id=7291676063342643994&anchor_id=2581300722537251&sec_anchor_id=MS4wLjABAAAAkY_WtKOYqH-5zSWzQSFe9tXTCirrA8sLJBKNrspoORbkdrMBXqAnyjV2f75mX4lk&rank_type=30&msToken=MeH2AD_j6RbHpbqPWUyyoVMzuw63sALL5xx4Y13yp6nnq8D6sknHWmfpY9j_YujnW01p_EmbfcfuNIZw-Py6bwV8Oz1j3LXX1dn8WOsZ1EEwaQVPFop9rYhPenbTrOCy&X-Bogus=DFSzswVENasANeogtYIDvKXAIQ-X&_signature=_02B4Z6wo000012ZO15QAAIDCBUQX.a0biutmTtMAALyult0a6-ftHDP.4JgbIybvVd-fj4v.dHUHYgVoxsNh8DR7dFJG7wgVBodOHrVp-kXOMztMjLuxQ1QdaFLM5hvBFWOWIIff9gOunxqM66 (Caused by ProxyError('Unable to connect to proxy', SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)'))))

Process finished with exit code 1
```
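The root of the traceback is "Unable to connect to proxy" plus an SSL EOF error, so a reasonable next step is to check which proxy Python is using. A minimal sketch, assuming the proxy is not actually needed to reach this site: `urllib.request.getproxies()` lists the proxies Python detects (environment variables, falling back to the system settings on Windows and macOS), and requests reads the same settings. Setting `trust_env = False` on a `requests.Session` makes it ignore them; note that this also makes requests ignore `.netrc` credentials and CA-bundle environment variables. The helper names and the one-second delay are illustrative choices; the delay only spaces out the per-user homepage requests and is no guarantee against being blocked.

```python
import time
import urllib.request

import requests


def show_proxies():
    """Print and return the proxies Python detects from the environment/system settings."""
    proxies = urllib.request.getproxies()
    for scheme, address in proxies.items():
        print(f'{scheme}: {address}')
    return proxies


def make_session(headers, use_system_proxy=False):
    """Create a session that, by default, ignores environment/system proxy settings."""
    session = requests.Session()
    session.trust_env = use_system_proxy
    session.headers.update(headers)  # user-agent and cookie sent with every request
    return session


def polite_get(session, url, delay=1.0, timeout=10):
    """Wait `delay` seconds before each request so requests are spaced out."""
    time.sleep(delay)
    return session.get(url, timeout=timeout)


if __name__ == '__main__':
    show_proxies()
    session = make_session({'user-agent': '', 'cookie': ''})
    # Then, with the URL from step 1: res = polite_get(session, url)
```

If `show_proxies()` prints a local proxy address and the site is reachable directly, the session from `make_session` should no longer go through it; if the proxy is required, the fix lies in the proxy configuration instead.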