Learning Python from Zero: Project 1, Day 01, Web Scraper

1. Importing the code

Create a new Python project in InsCode, open a terminal, paste the command below, and press Enter to clone the project.

bash
git clone https://gitee.com/52itstyle/Python.git

This is a Python project found on Gitee; the clone URL above is the project's source address.

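2. Core libraries used

requests is a third-party library. Its API is more concise than that of Python's built-in urllib, and it handles all the common HTTP request types.

python
# Import the requests library
import requests
# Import the library for file and directory operations
import os
# bs4 (BeautifulSoup) is one of the most common libraries for Python scrapers; it parses HTML tags
import bs4
from bs4 import BeautifulSoup
# Basic system library
import sys
# Kept from the original script as a fix for Chinese encoding problems.
# Python 3 strings are already Unicode, so this reload is not required.
import importlib
importlib.reload(sys)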

3. Functional test

3.1 Initial code

The initial code is in Python/Day01/脚本. Open a terminal and run:

bash
# change to the script directory
cd Python/Day01/脚本
# resulting working directory
/root/Python_02/Python/Day01/脚本
# run the script
python3 mzitu_linux.py
# error output
File "/root/Python_02/Python/Day01/脚本/mzitu_linux.py", line 21
    save_path = ​'/mnt/data/mzitu'
                ^
SyntaxError: invalid non-printable character U+200B
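python
# Open mzitu_linux.py and go to line 21. The string is preceded by an invisible
# zero-width space (U+200B); retype the assignment with a new save_path
save_path = './picture'
# Uncomment lines 56, 68, and 72
bash
# run again
python3 mzitu_linux.py
# Very slow; pasting the site URL into a browser shows that access is refused
# Press Ctrl+C to stop the script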
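
The SyntaxError above comes from an invisible zero-width space (U+200B) in front of the string literal, which typically slips in when code is copied from a web page. As a minimal sketch, the standard library's unicodedata module can locate and remove such format characters. The helper names find_invisible and strip_invisible are made up for illustration and are not part of the project.

```python
import unicodedata

def find_invisible(source):
    """Return (line, column, code point) for every Unicode format character."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Category "Cf" covers zero-width spaces, BOMs and similar characters.
            if unicodedata.category(ch) == "Cf":
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits

def strip_invisible(source):
    """Return the source text with all format characters removed."""
    return "".join(ch for ch in source if unicodedata.category(ch) != "Cf")

# The offending line from the traceback, with the zero-width space written as an escape.
sample = "save_path = \u200b'/mnt/data/mzitu'\n"

hits = find_invisible(sample)
cleaned = strip_invisible(sample)

# The cleaned text compiles, which confirms the invisible character was the only problem.
compile(cleaned, "<sample>", "exec")
print(hits)
print(repr(cleaned))
```

Category Cf also includes characters such as the zero-width joiner, so stripping them blindly could change legitimate text inside string literals. Reviewing the reported positions first is the safer route.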

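3.2 Create a new file

In the same directory as the 脚本 folder, create a learn folder, create spider.py in it, and copy the contents of mzitu_linux.py into it.

3.3 Debugging the code

python
# Problem 1: the site is unreachable. Fix: change the URL to scrape
# Go to line 18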
mziTu = 'https://image.baidu.com/'
bash
# run in the terminal
cd ../
cd learn/
python3 spider.py
# error output
Traceback (most recent call last):
  File "/root/Python_02/Python/Day01/learn/spider.py", line 106, in <module>
    main()
  File "/root/Python_02/Python/Day01/learn/spider.py", line 90, in main
    img_max = soup.find('div', class_='nav-links').find_all('a')[3].text
AttributeError: 'NoneType' object has no attribute 'find_all'

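This error is expected. After switching to a different site, the page structure is different, so soup.find('div', class_='nav-links') matches nothing and returns None, and calling find_all on None fails. The next steps fix the parsing one piece at a time.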
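
The failure mode is easy to reproduce offline. The sketch below is an illustration, not the project's code: it uses the standard library's xml.etree.ElementTree as a stand-in for BeautifulSoup (an assumption made so the example runs without third-party packages), because its find() likewise returns None when nothing matches. The sample markup and the helper name links_in are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; real HTML usually needs an HTML parser.
PAGE = (
    "<html><body>"
    "<div id='bd-home-content-album'><a href='/album/1'>Album 1</a></div>"
    "</body></html>"
)
root = ET.fromstring(PAGE)

# Chaining a call onto a missing element reproduces the kind of error in the traceback.
try:
    root.find(".//div[@class='nav-links']").findall("a")
except AttributeError as exc:
    error_message = str(exc)

def links_in(tree, xpath):
    """Return the href of every <a> under the first element matching xpath."""
    container = tree.find(xpath)
    if container is None:  # the selector does not match this page
        return []
    return [a.get("href") for a in container.findall(".//a")]

old_links = links_in(root, ".//div[@class='nav-links']")
new_links = links_in(root, ".//div[@id='bd-home-content-album']")
print(error_message)
print(old_links, new_links)
```

Checking for None before chaining turns a crash into an empty result, which makes it easier to see which selector stopped matching.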

4. Parsing page elements

4.1 The web page

bash
# Open Baidu Images
https://image.baidu.com/

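Press F12 to open the developer tools and switch to the Elements tab. Press Ctrl+Shift+C and click an album image to inspect its element.

Select the '<a>' tag, right-click, choose Copy > Copy element, and note the target and href attributes:

html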
<a class="bd-home-content-album-item             
" target="_blank" href="https://image.baidu.com/search/albumsdetail?tn=albumsdetail&amp;word=%E5%9F%8E%E5%B8%82%E5%BB%BA%E7%AD%91%E6%91%84%E5%BD%B1%E4%B8%93%E9%A2%98&amp;fr=searchindex_album%20&amp;album_tab=%E5%BB%BA%E7%AD%91&amp;album_id=7&amp;rn=30" data-type="0"> 
	<div class="bd-home-content-album-item-pic" style="background-image: url(https://t7.baidu.com/it/u=1595072465,3644073269&amp;fm=193&amp;f=GIF); background-color: #EACFC5"> 
	</div> 
	<div class="bd-home-content-album-item-inner-border"></div> 
	<div class="bd-home-content-album-item-title"> 城市建筑摄影专题  </div> 
</a>

Select the '<a>' tag again, right-click, and choose Copy > Copy selector:

css
#bd-home-content-album > a:nth-child(1)

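From this we can infer that the '<div>' with the unique id 'bd-home-content-album' contains all the album '<a>' tags, so they can be found through it. The '<a>' tag copied above is the first child element of that '<div>'.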
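
The inference can be checked on a small sample before touching the scraper. The sketch below uses the standard library's html.parser rather than BeautifulSoup, purely so that it runs without extra packages. The class name AlbumLinkParser, the shortened href and the extra outside link are assumptions made for illustration.

```python
from html.parser import HTMLParser

class AlbumLinkParser(HTMLParser):
    """Collect href values of <a target="_blank"> tags inside one container <div>."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0   # <div> nesting depth inside the container, 0 means outside
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif attrs.get("id") == self.container_id:
                self.depth = 1
        elif tag == "a" and self.depth > 0 and attrs.get("target") == "_blank":
            if attrs.get("href"):
                self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

# Shortened sample modelled on the copied element, plus one link outside the container.
SAMPLE = """
<div id="bd-home-content-album">
  <a class="bd-home-content-album-item" target="_blank"
     href="https://image.baidu.com/search/albumsdetail?tn=albumsdetail&amp;album_id=7&amp;rn=30">
    <div class="bd-home-content-album-item-title">title</div>
  </a>
</div>
<a target="_blank" href="https://example.com/outside">outside</a>
"""

parser = AlbumLinkParser("bd-home-content-album")
parser.feed(SAMPLE)
album_links = parser.links
print(album_links)
```

The parser unescapes attribute values, so the &amp; seen in the copied element comes back as a plain & in the collected link.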

4.2 Modifying the code

python
# Change line 39
# Get the album links on the page
    all_a = soup_sub.find('div',id='bd-home-content-album').find_all('a',target='_blank')
# Change the main function; this page has no pagination
def main():
    res = requests.get(mziTu, headers=headers)
    # Parse with the built-in html.parser
    soup = BeautifulSoup(res.text, 'html.parser')
    # Create the output folder
    createFile(save_path)
    file = save_path
    createFile(file)
    print("开始执行")  # prints "starting"
    download(mziTu, file)
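
The body of createFile is not shown in this post. A minimal sketch of what such a helper could look like, assuming it only needs to make sure the target directory exists (the implementation below is a guess, not the project's actual code):

```python
import os
import tempfile

def createFile(path):
    """Create the directory if it is missing; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path

# Try it in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "picture")
    createFile(target)
    createFile(target)  # a second call on the same path is harmless
    created = os.path.isdir(target)

print(created)
```

With exist_ok=True, calling the helper twice on the same path, as main() does, raises no error.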

Switch to the terminal and run the script:

bash
python3 spider.py
# error output (开始执行 = starting, 内页第几页 = album index, 套图地址 = album URL)
开始执行
内页第几页:2
套图地址:https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E6%B8%90%E5%8F%98%E9%A3%8E%E6%A0%BC%E6%8F%92%E7%94%BB&fr=albumslist&album_tab=%E8%AE%BE%E8%AE%A1%E7%B4%A0%E6%9D%90&album_id=409&rn=30
'NoneType' object has no attribute 'find_all'
内页第几页:4
套图地址:https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E5%AE%A0%E7%89%A9%E5%9B%BE%E7%89%87&fr=albumslist&album_tab=%E5%8A%A8%E7%89%A9&album_id=688&rn=30
'NoneType' object has no attribute 'find_all'
内页第几页:6
套图地址:https://image.baidu.com/search/albumslist?tn=albumslist&word=%E4%BA%BA%E7%89%A9&album_tab=%E4%BA%BA%E7%89%A9&rn=15&fr=searchindex_album
'NoneType' object has no attribute 'find_all'

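The album links on the parent page are now found, but the child (album) pages also differ from what the original code expects, so that parsing needs to change as well.

4.3 The child page

Open one of the printed album URLs and, in the same way as before, locate an image on the child page:

html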
<a class="albumsdetail-item" href="/search/detail?tn=baiduimagedetail&amp;word=%E5%9F%8E%E5%B8%82%E5%BB%BA%E7%AD%91%E6%91%84%E5%BD%B1%E4%B8%93%E9%A2%98&amp;album_tab=%E5%BB%BA%E7%AD%91&amp;album_id=7&amp;ie=utf-8&amp;fr=albumsdetail&amp;cs=1595072465,3644073269&amp;pi=3977&amp;pn=0&amp;ic=0&amp;objurl=https%3A%2F%2Ft7.baidu.com%2Fit%2Fu%3D1595072465%2C3644073269%26fm%3D193%26f%3DGIF" target="_blank" data-index="0" width="310.4" style="width: 310.4px; height: 310px;">
	<img class="albumsdetail-item-img" src="https://t7.baidu.com/it/u=1595072465,3644073269&amp;fm=193&amp;f=GIF" style="width: 310.4px; height: 310px; background-color: rgb(234, 207, 197);">
	<div class="albumsdetail-item-inner-border"></div>
</a>

Selector of the image element:

css
#imgList > div:nth-child(1) > a:nth-child(1)

Selector of the element that shows the image count:

css
#bd-albumsdetail-content > div.albumsdetail-cover.clearfix > div.albumsdetail-info > div.albumsdetail-info-text > p.albumsdetail-info-num > span
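
Two details of the copied '<a>' element are worth a closer look: its href is relative, and its objurl parameter carries the percent-encoded address of the image itself. The sketch below, which uses only the standard library, resolves the href against the album page and decodes objurl. PAGE_URL and RAW_HREF are shortened versions of the values shown above, trimmed to a few parameters for readability.

```python
from html import unescape
from urllib.parse import parse_qs, urljoin, urlsplit

# Album page address, shortened to a few parameters.
PAGE_URL = "https://image.baidu.com/search/albumsdetail?tn=albumsdetail&album_id=7&rn=30"

# href as it appears in the copied markup (note the &amp; entities), shortened.
RAW_HREF = (
    "/search/detail?tn=baiduimagedetail&amp;album_id=7&amp;pn=0"
    "&amp;objurl=https%3A%2F%2Ft7.baidu.com%2Fit%2Fu%3D1595072465%2C3644073269%26fm%3D193%26f%3DGIF"
)

# 1. Undo the HTML escaping (an HTML parser normally does this for you).
href = unescape(RAW_HREF)
# 2. Resolve the relative path against the page it was found on.
detail_url = urljoin(PAGE_URL, href)
# 3. Split the query string and percent-decode the objurl value.
params = parse_qs(urlsplit(detail_url).query)
image_url = params["objurl"][0]

print(detail_url)
print(image_url)
```

The decoded objurl matches the src attribute of the '<img>' tag in the same element.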

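4.4 Modifying the code

python
# Change line 53. The value could be read from the count element, but that is not the focus here, so it is hard-coded
# Maximum number of images in the album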
                pic_max = "791"
# Change line 62
                    img = soup_sub_2.find('div',id='imgList').find('img')
bash
# run in the terminal
python3 spider.py
# error output (套图数量 = image count, 子内页第几页 = image page index)
开始执行
内页第几页:2
套图地址:https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E6%B8%90%E5%8F%98%E9%A3%8E%E6%A0%BC%E6%8F%92%E7%94%BB&fr=albumslist&album_tab=%E8%AE%BE%E8%AE%A1%E7%B4%A0%E6%9D%90&album_id=409&rn=30
套图数量:791
子内页第几页:1
https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E6%B8%90%E5%8F%98%E9%A3%8E%E6%A0%BC%E6%8F%92%E7%94%BB&fr=albumslist&album_tab=%E8%AE%BE%E8%AE%A1%E7%B4%A0%E6%9D%90&album_id=409&rn=30/1
'NoneType' object has no attribute 'find'
内页第几页:4
套图地址:https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E5%AE%A0%E7%89%A9%E5%9B%BE%E7%89%87&fr=albumslist&album_tab=%E5%8A%A8%E7%89%A9&album_id=688&rn=30
套图数量:791
子内页第几页:1
https://image.baidu.com/search/albumsdetail?tn=albumsdetail&word=%E5%AE%A0%E7%89%A9%E5%9B%BE%E7%89%87&fr=albumslist&album_tab=%E5%8A%A8%E7%89%A9&album_id=688&rn=30/1
'NoneType' object has no attribute 'find'
内页第几页:6
套图地址:https://image.baidu.com/search/albumslist?tn=albumslist&word=%E4%BA%BA%E7%89%A9&album_tab=%E4%BA%BA%E7%89%A9&rn=15&fr=searchindex_album
套图数量:791
子内页第几页:1
https://image.baidu.com/search/albumslist?tn=albumslist&word=%E4%BA%BA%E7%89%A9&album_tab=%E4%BA%BA%E7%89%A9&rn=15&fr=searchindex_album/1
'NoneType' object has no attribute 'find'

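The lookup follows the copied selector, so why is the element not found? Print the parent element to check:

python
# Insert at line 63: print the parent element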
                    print(soup_sub_2.find('div',id='bd-albumsdetail-content'))
bash
# run in the terminal
python3 spider.py
# output
<div id="bd-albumsdetail-content">
</div>

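That is the problem. The contents of this div are rendered and loaded dynamically at runtime, so the browser shows them, but the HTML returned to the scraper contains only the empty container. A different approach is needed.

Whether a page is rendered dynamically can be spotted earlier: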
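
A quick way to confirm this kind of problem is to look for markup seen in the browser inside the raw HTML the scraper receives. The sketch below is a simplified illustration. SERVER_HTML mirrors the printed output above, while BROWSER_DOM is a hypothetical, trimmed version of what the Elements tab shows. The helper name needs_javascript is made up.

```python
# Markup that the browser's Elements tab shows for each image in the album.
ITEM_MARKER = 'class="albumsdetail-item"'

# What the scraper received: the container is present but empty.
SERVER_HTML = '<div id="bd-albumsdetail-content">\n</div>'

# Simplified stand-in for the DOM after the page's scripts have run.
BROWSER_DOM = (
    '<div id="bd-albumsdetail-content"><div id="imgList"><div>'
    '<a class="albumsdetail-item" href="/search/detail?pn=0"></a>'
    '</div></div></div>'
)

def needs_javascript(raw_html, marker=ITEM_MARKER):
    """Return True if markup seen in the browser is missing from the raw HTML."""
    return marker not in raw_html

# In the scraper, raw_html would be the text of the HTTP response.
server_check = needs_javascript(SERVER_HTML)
browser_check = needs_javascript(BROWSER_DOM)
print(server_check, browser_check)
```

If the marker is missing from the raw response, no change of selector will help, and the data has to be found elsewhere in the response or in a separate request.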

Open the developer tools and switch to the Network tab. The page sends many requests, and the URLs of these requests actually come from ...

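Look at the response of the first request: pick any image request, copy one of its parameters, press Ctrl+F in the Response tab, and search for it to find where it appears in the response.

The data is shown below, with the formatting adjusted slightly:

javascript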
linkData: '[{\x22pid\x22:3977,\x22width\x22:1100,\x22height\x22:1100,\x22oriwidth\x22:1200,\x22oriheight\x22:1200,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1595072465,3644073269&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811557570\x22,\x22contSign\x22:\x221595072465,3644073269\x22},
{\x22pid\x22:3978,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=4198287529,2774471735&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.veer.com\\\/photo\\\/147317368?utm_source=baidu&utm_medium=imagesearch&chid=902\x22,\x22contSign\x22:\x224198287529,2774471735\x22},
{\x22pid\x22:3979,\x22width\x22:1200,\x22height\x22:813,\x22oriwidth\x22:1200,\x22oriheight\x22:813,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1956604245,3662848045&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809773493\x22,\x22contSign\x22:\x221956604245,3662848045\x22},
{\x22pid\x22:3980,\x22width\x22:1200,\x22height\x22:760,\x22oriwidth\x22:1200,\x22oriheight\x22:760,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2529476510,3041785782&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/805192561\x22,\x22contSign\x22:\x222529476510,3041785782\x22},
{\x22pid\x22:3981,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=727460147,2222092211&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811065917\x22,\x22contSign\x22:\x22727460147,2222092211\x22},
{\x22pid\x22:3982,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2511982910,2454873241&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810968731\x22,\x22contSign\x22:\x222511982910,2454873241\x22},
{\x22pid\x22:3983,\x22width\x22:1200,\x22height\x22:686,\x22oriwidth\x22:1200,\x22oriheight\x22:686,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=825057118,3516313570&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810073156\x22,\x22contSign\x22:\x22825057118,3516313570\x22},
{\x22pid\x22:3984,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3435942975,1552946865&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811932564\x22,\x22contSign\x22:\x223435942975,1552946865\x22},
{\x22pid\x22:3985,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3569419905,626536365&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809770618\x22,\x22contSign\x22:\x223569419905,626536365\x22},
{\x22pid\x22:3986,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3779234486,1094031034&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810970358\x22,\x22contSign\x22:\x223779234486,1094031034\x22},
{\x22pid\x22:3987,\x22width\x22:1200,\x22height\x22:482,\x22oriwidth\x22:1200,\x22oriheight\x22:482,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2397542458,3133539061&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811063723\x22,\x22contSign\x22:\x222397542458,3133539061\x22},
{\x22pid\x22:3988,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2763645735,2016465681&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809771013\x22,\x22contSign\x22:\x222763645735,2016465681\x22},
{\x22pid\x22:3989,\x22width\x22:1149,\x22height\x22:1100,\x22oriwidth\x22:1200,\x22oriheight\x22:1149,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3911840071,2534614245&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810877786\x22,\x22contSign\x22:\x223911840071,2534614245\x22},
{\x22pid\x22:3990,\x22width\x22:1200,\x22height\x22:687,\x22oriwidth\x22:1200,\x22oriheight\x22:687,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3908717,2002330211&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810968672\x22,\x22contSign\x22:\x223908717,2002330211\x22},
{\x22pid\x22:3991,\x22width\x22:1200,\x22height\x22:799,\x22oriwidth\x22:1200,\x22oriheight\x22:799,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=318887420,2894941323&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810056726\x22,\x22contSign\x22:\x22318887420,2894941323\x22},
{\x22pid\x22:3992,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1063451194,1129125124&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.veer.com\\\/photo\\\/146287060?utm_source=baidu&utm_medium=imagesearch&chid=902\x22,\x22contSign\x22:\x221063451194,1129125124\x22},
{\x22pid\x22:3993,\x22width\x22:800,\x22height\x22:1200,\x22oriwidth\x22:800,\x22oriheight\x22:1200,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3785402047,1898752523&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810970018\x22,\x22contSign\x22:\x223785402047,1898752523\x22},
{\x22pid\x22:3994,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3691080281,11347921&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809782140\x22,\x22contSign\x22:\x223691080281,11347921\x22},
{\x22pid\x22:3995,\x22width\x22:1200,\x22height\x22:799,\x22oriwidth\x22:1200,\x22oriheight\x22:799,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2374506090,1216769752&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.veer.com\\\/photo\\\/146290795?utm_source=baidu&utm_medium=imagesearch&chid=902\x22,\x22contSign\x22:\x222374506090,1216769752\x22},
{\x22pid\x22:3996,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1285847167,3193778276&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809771315\x22,\x22contSign\x22:\x221285847167,3193778276\x22},
{\x22pid\x22:3997,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3251197759,2520670799&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/814059806\x22,\x22contSign\x22:\x223251197759,2520670799\x22},
{\x22pid\x22:3998,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=602106375,407124525&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/813923414\x22,\x22contSign\x22:\x22602106375,407124525\x22},
{\x22pid\x22:3999,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2906406936,2666005453&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811706433\x22,\x22contSign\x22:\x222906406936,2666005453\x22},
{\x22pid\x22:4000,\x22width\x22:1200,\x22height\x22:798,\x22oriwidth\x22:1200,\x22oriheight\x22:798,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3124693600,356058981&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/805197127\x22,\x22contSign\x22:\x223124693600,356058981\x22},
{\x22pid\x22:4001,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=3646282624,1156077026&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810999167\x22,\x22contSign\x22:\x223646282624,1156077026\x22},
{\x22pid\x22:4002,\x22width\x22:1200,\x22height\x22:797,\x22oriwidth\x22:1200,\x22oriheight\x22:797,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=4158958181,280757487&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810880655\x22,\x22contSign\x22:\x224158958181,280757487\x22},
{\x22pid\x22:4003,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=2371362259,3988640650&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809782065\x22,\x22contSign\x22:\x222371362259,3988640650\x22},
{\x22pid\x22:4004,\x22width\x22:800,\x22height\x22:1200,\x22oriwidth\x22:800,\x22oriheight\x22:1200,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=355704943,1318565630&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/810998065\x22,\x22contSign\x22:\x22355704943,1318565630\x22},
{\x22pid\x22:4005,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=655876807,3707807800&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/809770741\x22,\x22contSign\x22:\x22655876807,3707807800\x22},
{\x22pid\x22:4006,\x22width\x22:1200,\x22height\x22:800,\x22oriwidth\x22:1200,\x22oriheight\x22:800,\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1423490396,3473826719&fm=193&f=GIF\x22,\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811796379\x22,\x22contSign\x22:\x221423490396,3473826719\x22}]',
               

Take a single record:

javascript
{\x22pid\x22:4006,
\x22width\x22:1200,
\x22height\x22:800,
\x22oriwidth\x22:1200,
\x22oriheight\x22:800,
\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/u=1423490396,3473826719&fm=193&f=GIF\x22,
\x22fromUrl\x22:\x22https:\\\/\\\/www.vcg.com\\\/creative\\\/811796379\x22,
\x22contSign\x22:\x221423490396,3473826719\x22}
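
The record is JSON wrapped in a JavaScript string literal: \x22 stands for a double quote, and each slash is escaped twice. As a rough sketch of how it could be decoded with the standard library (the helper name decode_js_string is made up, and it only handles the \xNN and backslash-plus-character escapes that appear in this sample):

```python
import json
import re

# One record as it appears in the page source, shortened to a few fields.
RAW = (
    r"[{\x22pid\x22:4006,\x22width\x22:1200,\x22height\x22:800,"
    r"\x22thumbnailUrl\x22:\x22https:\\\/\\\/t7.baidu.com\\\/it\\\/"
    r"u=1423490396,3473826719&fm=193&f=GIF\x22}]"
)

def decode_js_string(raw):
    """Undo the JavaScript string escaping used in the sample.

    Assumption: only \\xNN escapes and backslash-escaped single characters
    occur; escapes such as \\n or \\uNNNN are not handled.
    """
    def replace(match):
        token = match.group(1)
        if len(token) == 3:           # "x" followed by two hex digits
            return chr(int(token[1:], 16))
        return token                  # e.g. an escaped backslash or slash
    return re.sub(r"\\(x[0-9a-fA-F]{2}|.)", replace, raw)

# After this step the text is ordinary JSON in which slashes are still escaped as \/.
json_text = decode_js_string(RAW)
records = json.loads(json_text)
thumbnail = records[0]["thumbnailUrl"]
print(thumbnail)
```

Decoding happens in two layers: the regular expression removes the JavaScript escaping, and json.loads then resolves the remaining JSON escapes in the URLs.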

To be continued in the next post.
