Learning Python from Scratch: Project 1, Day 01, Web Scraper

1. Importing the code

In InsCode, create a new Python project, open a terminal, paste the following command, and press Enter to clone the repository.

bash
git clone https://gitee.com/52itstyle/Python.git

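This is a Python project found on Gitee; the clone URL above is its source repository.

2. Core libraries used

requests is a third-party library. Its API is more concise than Python's built-in urllib, and it handles a wide range of HTTP requests.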

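# Import the requests library
import requests

# Import the file-operations library
import os

# bs4 (BeautifulSoup) is one of the libraries most commonly used for Python scrapers; it is mainly used to parse HTML tags.
import bs4
from bs4 import BeautifulSoup

# Basic library
import sys

# Kept from the original script, where it was meant to fix Chinese encoding issues.
# It is a leftover of a Python 2 idiom; in Python 3, str is already Unicode and this reload does not change any encoding.
import importlib
importlib.reload(sys)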
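
In Python 3 the usual way to avoid Chinese encoding problems is to state the encoding explicitly wherever text is written or read. A minimal sketch, assuming a throwaway file named `demo.txt` (hypothetical, not part of the project):

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

text = "中文文件名和内容"  # sample Chinese text, already a Unicode str in Python 3

# State the encoding explicitly instead of relying on the platform default.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, "r", encoding="utf-8") as f:
    restored = f.read()

print(restored == text)
```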

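3. Functional test

3.1 Initial code

The initial code is in Python/Day01/脚本. Open a terminal and run:

bash
# Change directory
cd Python/Day01/脚本
# Output (current directory)
/root/Python_02/Python/Day01/脚本
# Run the script
python3 mzitu_linux.py
# Error output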
File "/root/Python_02/Python/Day01/脚本/mzitu_linux.py", line 21
    save_path = ​'/mnt/data/mzitu'
                ^
SyntaxError: invalid non-printable character U+200B
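python
# Open mzitu_linux.py, go to line 21 of the original code, and change save_path
save_path = './picture'
# Uncomment lines 56, 68, and 72
bash
# Run again
python3 mzitu_linux.py
# Very slow; pasting the site URL into a browser shows that access is refused outright
# Press Ctrl+C to stop the script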
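
The `U+200B` error above is caused by an invisible zero-width space, which typically gets into source code when it is copied from a web page. Retyping the line fixes it, but such characters can also be located and stripped programmatically. A minimal sketch; the helper names `find_invisible` and `strip_invisible` and the list of characters are assumptions for illustration, not part of the project:

```python
# Invisible characters that commonly sneak in when code is copied from web pages.
INVISIBLE = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE (BOM)",
}


def find_invisible(source):
    """Return (line_number, column, name) for each invisible character found."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in INVISIBLE:
                hits.append((line_no, col, INVISIBLE[ch]))
    return hits


def strip_invisible(source):
    """Remove every character listed in INVISIBLE from the source text."""
    return source.translate({ord(ch): None for ch in INVISIBLE})


# Reproduce the broken assignment from mzitu_linux.py as a string.
broken = "save_path = \u200b'/mnt/data/mzitu'\n"

hits = find_invisible(broken)
fixed = strip_invisible(broken)

print(hits)
print(repr(fixed))

# The cleaned line now compiles without a SyntaxError.
compile(fixed, "<fixed>", "exec")
```

To clean a real file, read it with `encoding="utf-8"`, pass the text through `strip_invisible`, and write it back.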

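3.2 Creating a new file

In the same directory as 脚本, create a learn folder, create spider.py inside it, and copy the contents of mzitu_linux.py into it.

3.3 Debugging the code

python
# Problem 1: the site is unreachable. Fix: change the URL to scrape
# Go to line 18
mziTu = 'https://image.baidu.com/'
bash
# Run in the terminal
cd ../
cd learn/
python3 spider.py
# Error output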
Traceback (most recent call last):
  File "/root/Python_02/Python/Day01/learn/spider.py", line 106, in <module>
    main()
  File "/root/Python_02/Python/Day01/learn/spider.py", line 90, in main
    img_max = soup.find('div', class_='nav-links').find_all('a')[3].text
AttributeError: 'NoneType' object has no attribute 'find_all'

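This error is expected: after switching to a different site, the page elements to parse are different. The following steps fix the parsing one piece at a time.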
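
`find()` returns `None` when nothing matches, so chaining `.find_all()` onto it raises the `AttributeError` shown above. A minimal sketch of guarding against that, assuming Beautiful Soup is installed; the helper name `find_links` and the sample HTML are made up for illustration and are not part of the project:

```python
from bs4 import BeautifulSoup

# Stand-in page that, like the new site, has no <div class="nav-links">.
html = "<html><body><div id='other'><a href='/x'>x</a></div></body></html>"
soup = BeautifulSoup(html, "html.parser")


def find_links(soup, container_class):
    """Return the <a> tags inside the container, or [] if the container is missing."""
    container = soup.find("div", class_=container_class)
    if container is None:
        # Nothing matched: report an empty result instead of raising AttributeError.
        return []
    return container.find_all("a")


links = find_links(soup, "nav-links")
print(links)
```

Returning an empty list keeps the script running, but it also hides the fact that the selector no longer matches, so logging the miss is usually worthwhile while debugging.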

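4. Parsing page elements

4.1 The web page

bash
# Open Baidu Images
https://image.baidu.com/

Press F12 to open the developer tools and switch to the Elements tab. Press Ctrl+Shift+C, select an album image, and inspect the element.

Select the '<a>' tag, right-click, and choose Copy > Copy element. Note the target and href attributes.

html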
<a class="bd-home-content-album-item             
" target="_blank" href="https://image.baidu.com/search/albumsdetail?tn=albumsdetail&amp;word=%E5%9F%8E%E5%B8%82%E5%BB%BA%E7%AD%91%E6%91%84%E5%BD%B1%E4%B8%93%E9%A2%98&amp;fr=searchindex_album%20&amp;album_tab=%E5%BB%BA%E7%AD%91&amp;album_id=7&amp;rn=30" data-type="0"> 
	<div class="bd-home-content-album-item-pic" style="background-image: url(https://t7.baidu.com/it/u=1595072465,3644073269&amp;fm=193&amp;f=GIF); background-color: #EACFC5"> 
	</div> 
	<div class="bd-home-content-album-item-inner-border"></div> 
	<div class="bd-home-content-album-item-title"> 城市建筑摄影专题  </div> 
</a>

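Select the '<a>' tag, right-click, and choose Copy > Copy selector to copy its selector:

css
#bd-home-content-album > a:nth-child(1)

From this we can infer that the unique id 'bd-home-content-album' identifies the '<div>' that holds all the '<a>' tags, and that the copied '<a>' is the first child of that parent element.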
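
The copied selector can also be used almost directly through Beautiful Soup's `select()`. A minimal sketch, assuming Beautiful Soup is installed; the HTML is a trimmed-down stand-in for the markup shown above, and the shortened URLs and the second album entry are made up for illustration:

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the home page markup; URLs are shortened on purpose.
html = """
<div id="bd-home-content-album">
  <a class="bd-home-content-album-item" target="_blank"
     href="https://image.baidu.com/search/albumsdetail?album_id=7">
    <div class="bd-home-content-album-item-title"> 城市建筑摄影专题 </div>
  </a>
  <a class="bd-home-content-album-item" target="_blank"
     href="https://image.baidu.com/search/albumsdetail?album_id=409">
    <div class="bd-home-content-album-item-title"> 示例专题 </div>
  </a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Same container id as the copied selector; :nth-child(1) is dropped
# so that every album link matches, not only the first one.
albums = [
    (a.get_text(strip=True), a["href"])
    for a in soup.select("#bd-home-content-album > a[target='_blank']")
]

for title, href in albums:
    print(title, href)
```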

4.2 Modifying the code

python
# Change line 39
# Get the album links on the page
    all_a = soup_sub.find('div',id='bd-home-content-album').find_all('a',target='_blank')
# Change the main function; this page has no pagination
def main():
    res = requests.get(mziTu, headers=headers)
    # Parse with the built-in html.parser
    soup = BeautifulSoup(res.text, 'html.parser')
    # Create the output folder
    createFile(save_path)
    file = save_path
    createFile(file)
    print("开始执行")  # "Starting"
    download(mziTu, file)

Switch to the terminal and run the script:

bash
python3 spider.py 
# Error output
开始执行
内页第几页:2
套图地址:https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E6%B8%90%E5%8F%98%E9%A3%8E%E6%A0%BC%E6%8F%92%E7%94%BB&fr=albumslist&album_tab=%E8%AE%BE%E8%AE%A1%E7%B4%A0%E6%9D%90&album_id=409&rn=30
'NoneType' object has no attribute 'find_all'
内页第几页:4
套图地址:https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E5%AE%A0%E7%89%A9%E5%9B%BE%E7%89%87&fr=albumslist&album_tab=%E5%8A%A8%E7%89%A9&album_id=688&rn=30
'NoneType' object has no attribute 'find_all'
内页第几页:6
套图地址:https://image.baidu.com/search/albumslist?tn=albumslist&word=%E4%BA%BA%E7%89%A9&album_tab=%E4%BA%BA%E7%89%A9&rn=15&fr=searchindex_album
'NoneType' object has no attribute 'find_all'

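The elements parsed on the parent page differ from the original code, and so do those on the child pages, so the changes continue.

4.3 Child pages

Copy one of the printed album URLs (套图地址) to open a child page, then locate an image on it in the same way:

html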
<a class="albumsdetail-item" href="/search/detail?tn=baiduimagedetail&amp;word=%E5%9F%8E%E5%B8%82%E5%BB%BA%E7%AD%91%E6%91%84%E5%BD%B1%E4%B8%93%E9%A2%98&amp;album_tab=%E5%BB%BA%E7%AD%91&amp;album_id=7&amp;ie=utf-8&amp;fr=albumsdetail&amp;cs=1595072465,3644073269&amp;pi=3977&amp;pn=0&amp;ic=0&amp;objurl=https%3A%2F%2Ft7.baidu.com%2Fit%2Fu%3D1595072465%2C3644073269%26fm%3D193%26f%3DGIF" target="_blank" data-index="0" width="310.4" style="width: 310.4px; height: 310px;">
	<img class="albumsdetail-item-img" src="https://t7.baidu.com/it/u=1595072465,3644073269&amp;fm=193&amp;f=GIF" style="width: 310.4px; height: 310px; background-color: rgb(234, 207, 197);">
	<div class="albumsdetail-item-inner-border"></div>
</a>

Element selector:

css
#imgList > div:nth-child(1) > a:nth-child(1)

Selector for the image-count element:

css
#bd-albumsdetail-content > div.albumsdetail-cover.clearfix > div.albumsdetail-info > div.albumsdetail-info-text > p.albumsdetail-info-num > span

4.4 Modifying the code

python
# Change line 53. The value could also be read from the element, but that is not the focus here, so it is hardcoded
# Maximum number of images in the album
                pic_max = "791"
# Change line 62
                    img = soup_sub_2.find('div',id='imgList').find('img')
bash
# Switch to the terminal and run the code
python3 spider.py 
# Error output
开始执行
内页第几页:2
套图地址:https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E6%B8%90%E5%8F%98%E9%A3%8E%E6%A0%BC%E6%8F%92%E7%94%BB&fr=albumslist&album_tab=%E8%AE%BE%E8%AE%A1%E7%B4%A0%E6%9D%90&album_id=409&rn=30
套图数量:791
子内页第几页:1
https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E6%B8%90%E5%8F%98%E9%A3%8E%E6%A0%BC%E6%8F%92%E7%94%BB&fr=albumslist&album_tab=%E8%AE%BE%E8%AE%A1%E7%B4%A0%E6%9D%90&album_id=409&rn=30/1
'NoneType' object has no attribute 'find'
内页第几页:4
套图地址:https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E5%AE%A0%E7%89%A9%E5%9B%BE%E7%89%87&fr=albumslist&album_tab=%E5%8A%A8%E7%89%A9&album_id=688&rn=30
套图数量:791
子内页第几页:1
https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E5%AE%A0%E7%89%A9%E5%9B%BE%E7%89%87&fr=albumslist&album_tab=%E5%8A%A8%E7%89%A9&album_id=688&rn=30/1
'NoneType' object has no attribute 'find'
内页第几页:6
套图地址:https://image.baidu.com/search/albumslist?tn=albumslist&word=%E4%BA%BA%E7%89%A9&album_tab=%E4%BA%BA%E7%89%A9&rn=15&fr=searchindex_album
套图数量:791
子内页第几页:1
https://image.baidu.com/search/albumslist?tn=albumslist&word=%E4%BA%BA%E7%89%A9&album_tab=%E4%BA%BA%E7%89%A9&rn=15&fr=searchindex_album/1
'NoneType' object has no attribute 'find'

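The lookup follows the element selector, so why is the element not found? Print the parent element to check:

python
# Insert at line 63: print the parent element
                    print(soup_sub_2.find('div',id='bd-albumsdetail-content'))
bash
# Run in the terminal
python3 spider.py 
# Output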
<div id="bd-albumsdetail-content">
</div>

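That is the problem. The root cause is that the elements inside this div are rendered and loaded dynamically at run time, so they are visible in a browser but missing from the HTML the scraper receives. A different approach is needed.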
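
A quick way to confirm this from the scraper's side is to search the raw response text for a marker that the browser shows only after rendering, such as the `albumsdetail-item` class from section 4.3. A minimal sketch; the helper name `is_in_raw_html` is hypothetical, and `raw_html` here is the parent element printed above rather than a live response:

```python
def is_in_raw_html(raw_html, marker):
    """Return True if the marker appears in the HTML exactly as the server sent it."""
    return marker in raw_html


# What the scraper received for the album container (taken from the output above).
raw_html = '<div id="bd-albumsdetail-content">\n</div>'

# The container itself is present in the raw HTML...
container_present = is_in_raw_html(raw_html, "bd-albumsdetail-content")
# ...but the image links seen in the browser's Elements tab are not.
items_present = is_in_raw_html(raw_html, "albumsdetail-item")

print(container_present, items_present)
```

In the real script, `raw_html` would be the `.text` of the response that spider.py already fetches with requests.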

Dynamic rendering could have been spotted earlier:

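Open the developer tools and switch to the Network tab. Many requests are being sent, and the URLs of these requests actually come from the response of the first request.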

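Look at the response of the first request. Pick any image request and copy one of its parameters, then press Ctrl+F in the Response tab to open the search box and locate where that value appears in the response.

The detailed data is shown below, with the formatting adjusted slightly:

text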
linkData: '[{\x22pid\x22:3977,\x22width\x22:1100,\x22height\x22:1100,\x22oriwidth\x22:1200,\x22oriheight\x22:1200,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1595072465,3644073269&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811557570\x22,\x22contSign\x22:\x221595072465,3644073269\x22},
{\x22pid\x22:3978,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=4198287529,2774471735&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.veer.com\\\/photo\\\/147317368?utm_source=baidu&utm_medium=imagesearch&chid=902\x22,\x22contSign\x22:\x224198287529,2774471735\x22},
{\x22pid\x22:3979,\x22width\x22:1200,\x22height\x22:813,\x22oriwidth\x22:1200,\x22oriheight\x22:813,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1956604245,3662848045&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809773493\x22,\x22contSign\x22:\x221956604245,3662848045\x22},
{\x22pid\x22:3980,\x22width\x22:1200,\x22height\x22:760,\x22oriwidth\x22:1200,\x22oriheight\x22:760,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2529476510,3041785782&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/805192561\x22,\x22contSign\x22:\x222529476510,3041785782\x22},
{\x22pid\x22:3981,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=727460147,2222092211&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811065917\x22,\x22contSign\x22:\x22727460147,2222092211\x22},
{\x22pid\x22:3982,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2511982910,2454873241&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810968731\x22,\x22contSign\x22:\x222511982910,2454873241\x22},
{\x22pid\x22:3983,\x22width\x22:1200,\x22height\x22:686,\x22oriwidth\x22:1200,\x22oriheight\x22:686,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=825057118,3516313570&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810073156\x22,\x22contSign\x22:\x22825057118,3516313570\x22},
{\x22pid\x22:3984,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3435942975,1552946865&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811932564\x22,\x22contSign\x22:\x223435942975,1552946865\x22},
{\x22pid\x22:3985,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3569419905,626536365&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809770618\x22,\x22contSign\x22:\x223569419905,626536365\x22},
{\x22pid\x22:3986,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3779234486,1094031034&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810970358\x22,\x22contSign\x22:\x223779234486,1094031034\x22},
{\x22pid\x22:3987,\x22width\x22:1200,\x22height\x22:482,\x22oriwidth\x22:1200,\x22oriheight\x22:482,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2397542458,3133539061&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811063723\x22,\x22contSign\x22:\x222397542458,3133539061\x22},
{\x22pid\x22:3988,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2763645735,2016465681&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809771013\x22,\x22contSign\x22:\x222763645735,2016465681\x22},
{\x22pid\x22:3989,\x22width\x22:1149,\x22height\x22:1100,\x22oriwidth\x22:1200,\x22oriheight\x22:1149,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3911840071,2534614245&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810877786\x22,\x22contSign\x22:\x223911840071,2534614245\x22},
{\x22pid\x22:3990,\x22width\x22:1200,\x22height\x22:687,\x22oriwidth\x22:1200,\x22oriheight\x22:687,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3908717,2002330211&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810968672\x22,\x22contSign\x22:\x223908717,2002330211\x22},
{\x22pid\x22:3991,\x22width\x22:1200,\x22height\x22:799,\x22oriwidth\x22:1200,\x22oriheight\x22:799,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=318887420,2894941323&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810056726\x22,\x22contSign\x22:\x22318887420,2894941323\x22},
{\x22pid\x22:3992,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1063451194,1129125124&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.veer.com\\\/photo\\\/146287060?utm_source=baidu&utm_medium=imagesearch&chid=902\x22,\x22contSign\x22:\x221063451194,1129125124\x22},
{\x22pid\x22:3993,\x22width\x22:800,\x22height\x22:1200,\x22oriwidth\x22:800,\x22oriheight\x22:1200,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3785402047,1898752523&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810970018\x22,\x22contSign\x22:\x223785402047,1898752523\x22},
{\x22pid\x22:3994,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3691080281,11347921&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809782140\x22,\x22contSign\x22:\x223691080281,11347921\x22},
{\x22pid\x22:3995,\x22width\x22:1200,\x22height\x22:799,\x22oriwidth\x22:1200,\x22oriheight\x22:799,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2374506090,1216769752&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.veer.com\\\/photo\\\/146290795?utm_source=baidu&utm_medium=imagesearch&chid=902\x22,\x22contSign\x22:\x222374506090,1216769752\x22},
{\x22pid\x22:3996,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1285847167,3193778276&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809771315\x22,\x22contSign\x22:\x221285847167,3193778276\x22},
{\x22pid\x22:3997,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3251197759,2520670799&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/814059806\x22,\x22contSign\x22:\x223251197759,2520670799\x22},
{\x22pid\x22:3998,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=602106375,407124525&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/813923414\x22,\x22contSign\x22:\x22602106375,407124525\x22},
{\x22pid\x22:3999,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2906406936,2666005453&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811706433\x22,\x22contSign\x22:\x222906406936,2666005453\x22},
{\x22pid\x22:4000,\x22width\x22:1200,\x22height\x22:798,\x22oriwidth\x22:1200,\x22oriheight\x22:798,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3124693600,356058981&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/805197127\x22,\x22contSign\x22:\x223124693600,356058981\x22},
{\x22pid\x22:4001,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3646282624,1156077026&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810999167\x22,\x22contSign\x22:\x223646282624,1156077026\x22},
{\x22pid\x22:4002,\x22width\x22:1200,\x22height\x22:797,\x22oriwidth\x22:1200,\x22oriheight\x22:797,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=4158958181,280757487&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810880655\x22,\x22contSign\x22:\x224158958181,280757487\x22},
{\x22pid\x22:4003,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2371362259,3988640650&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809782065\x22,\x22contSign\x22:\x222371362259,3988640650\x22},
{\x22pid\x22:4004,\x22width\x22:800,\x22height\x22:1200,\x22oriwidth\x22:800,\x22oriheight\x22:1200,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=355704943,1318565630&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810998065\x22,\x22contSign\x22:\x22355704943,1318565630\x22},
{\x22pid\x22:4005,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=655876807,3707807800&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809770741\x22,\x22contSign\x22:\x22655876807,3707807800\x22},
{\x22pid\x22:4006,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1423490396,3473826719&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811796379\x22,\x22contSign\x22:\x221423490396,3473826719\x22}]',
               

Take a single record:

text
{\x22pid\x22:4006,
\x22width\x22:1200,
\x22height\x22:800,
\x22oriwidth\x22:1200,
\x22oriheight\x22:800,
\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1423490396,3473826719&fm=193&f=GIF\x22,
\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811796379\x22,
\x22contSign\x22:\x221423490396,3473826719\x22}
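
The record looks like JSON stored inside a JavaScript string literal: `\x22` stands for a double quote, and `\\\/` collapses to the JSON escape `\/`. Under that assumption, it can be turned into a Python dict in two steps. A minimal sketch; the helper name `js_unescape` is hypothetical and only handles the escapes that appear in this data:

```python
import json
import re


def js_unescape(literal):
    """Undo the escapes seen in linkData: hex escapes and backslash + character.

    Simplified on purpose: newline, tab, and unicode escapes are not handled.
    """
    def replace(match):
        escape = match.group(1)
        if len(escape) == 3:                 # "x" followed by two hex digits
            return chr(int(escape[1:], 16))  # e.g. x22 becomes a double quote
        return escape                        # escaped backslash or slash: keep the character

    return re.sub(r"\\(x[0-9a-fA-F]{2}|.)", replace, literal)


# The single record shown above, wrapped in [] as in the original linkData value.
raw = (
    r'[{\x22pid\x22:4006,\x22width\x22:1200,\x22height\x22:800,'
    r'\x22oriwidth\x22:1200,\x22oriheight\x22:800,'
    r'\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1423490396,3473826719&fm=193&f=GIF\x22,'
    r'\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811796379\x22,'
    r'\x22contSign\x22:\x221423490396,3473826719\x22}]'
)

# Step 1: undo the JavaScript string escaping. Step 2: parse the result as JSON.
records = json.loads(js_unescape(raw))
thumbnail = records[0]["thumbnailUrl"]

print(records[0]["pid"], thumbnail)
```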

To be continued in the next post.
