1. Goal 1: Fetch the data from a given JSON endpoint
Fetch the contents of the data field.
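Core code:
dirt1 = json.loads(res.text)
print(dirt1['data'])
(1) json.loads() parses a valid JSON string and converts it into the matching Python object; a JSON object, as returned here, becomes a dictionary.
(2) dirt1['data'] selects the data part of the parsed JSON, and print() displays it.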
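To see what json.loads() does without touching the network, here is a minimal sketch. The sample string is made up for illustration and only imitates the shape described in this post; it is not the real API response.

```python
import json

# Hypothetical sample body; the real API response is not reproduced here.
sample_text = (
    '{"state": "ok", "data": {"detailList": '
    '[{"resourceKey": "k1", "resourceName": "n1"}]}}'
)

# A JSON object at the top level becomes a Python dict.
parsed = json.loads(sample_text)

print(type(parsed).__name__)
print(parsed['data'])
```

After parsing, ordinary dictionary indexing such as parsed['data'] works exactly as it does on dirt1 in the code above.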
Result:
The fetch succeeded.
Code:
import json

import requests
from fake_useragent import UserAgent


def get_json():
    try:
        url = 'https://napi-huawei.tianyancha.com/next/web/home/vajialist?_=1688703382196'
        ua = UserAgent()
        headers = {
            'User-Agent': ua.chrome,  # random Chrome User-Agent string
        }
        res = requests.get(url, headers=headers, timeout=10)
        # Parse the JSON response body into a Python dict
        dirt1 = json.loads(res.text)
        print(dirt1['data'])
    except Exception as exc:
        # Report the failure instead of silently swallowing it
        print(f'Request failed: {exc}')
        return ""


if __name__ == '__main__':
    get_json()
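Parsing can fail in two different ways: the body is not valid JSON at all (for example, an HTML block page), or it is valid JSON but has no data key. The sketch below separates those two cases. parse_data is a hypothetical helper name, not part of the original script, and the sample bodies are invented.

```python
import json


def parse_data(text):
    """Return the 'data' part of a JSON body, or None if it cannot be read."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        # The body was not JSON (e.g. an HTML error page).
        print(f'Body is not valid JSON: {exc}')
        return None
    if not isinstance(payload, dict) or 'data' not in payload:
        # Valid JSON, but not the structure we expected.
        print("Parsed JSON has no 'data' key")
        return None
    return payload['data']


# Hypothetical bodies covering the success case and both failure cases.
print(parse_data('{"data": {"detailList": []}}'))
print(parse_data('<html>blocked</html>'))
print(parse_data('{"message": "denied"}'))
```

In get_json(), the same helper could be called as parse_data(res.text), which keeps the network call and the parsing logic apart.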
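2. Goal 2: Loop over the data in the JSON
In this response the list sits two levels deep: first data, then detailList.
So to reach the list, the expression becomes:
dirt1['data']['detailList']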
Loop:
A single for loop visits the list elements one at a time, so each item is printed separately:
for item in dirt1['data']['detailList']:
    print(item)
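If either level is missing, the direct lookup raises a KeyError. As a rough sketch, the nested lookup can be made tolerant with .get(). get_detail_list is a hypothetical helper, and the payload below is invented to match the shape described here.

```python
def get_detail_list(payload):
    """Return payload['data']['detailList'], or [] if any level is absent or null."""
    data = payload.get('data') or {}
    return data.get('detailList') or []


# Hypothetical payload shaped like the structure described above.
payload = {
    'data': {
        'detailList': [
            {'resourceKey': 'k1', 'resourceName': 'first'},
            {'resourceKey': 'k2', 'resourceName': 'second'},
        ]
    }
}

# enumerate() also gives the position of each item in the list.
for index, item in enumerate(get_detail_list(payload)):
    print(index, item)

# A payload without 'data' yields an empty list, so the loop body never runs.
for item in get_detail_list({}):
    print(item)
```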
Code:
import json

import requests
from fake_useragent import UserAgent


def get_json():
    try:
        url = 'https://napi-huawei.tianyancha.com/next/web/home/vajialist?_=1688703382196'
        ua = UserAgent()
        headers = {
            'User-Agent': ua.chrome,  # random Chrome User-Agent string
        }
        res = requests.get(url, headers=headers, timeout=10)
        dirt1 = json.loads(res.text)
        # print(dirt1['data'])
        # Print each element of the nested list on its own line
        for item in dirt1['data']['detailList']:
            print(item)
    except Exception as exc:
        # Report the failure instead of silently swallowing it
        print(f'Request failed: {exc}')
        return ""


if __name__ == '__main__':
    get_json()
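3. Goal 3: Extract specific fields from each item
Goal:
Extract two specific fields from each item: resourceKey and resourceName.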
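Explanation:
At this point, item is one element of dirt1['data']['detailList'].
So, for the element at index i,
item['resourceKey'] == dirt1['data']['detailList'][i]['resourceKey']
This expression drills down to a single field, resourceKey, within each element.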
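The relationship between item and the list can be checked directly. This is a small sketch on invented sample data; it also shows that the list itself cannot be indexed with a field name, only its elements can.

```python
# Hypothetical stand-in for dirt1['data']['detailList'].
detail_list = [
    {'resourceKey': 'k1', 'resourceName': 'first'},
    {'resourceKey': 'k2', 'resourceName': 'second'},
]

for i, item in enumerate(detail_list):
    # item and detail_list[i] are the same object, so both lookups agree.
    assert item is detail_list[i]
    assert item['resourceKey'] == detail_list[i]['resourceKey']

# Indexing the list with a string key fails: a list takes integer indices.
raised_type_error = False
try:
    detail_list['resourceKey']
except TypeError as exc:
    raised_type_error = True
    print(f'TypeError: {exc}')
```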
Result:
The resourceKey and resourceName of each item are printed, one value per line.
Full code:
import json

import requests
from fake_useragent import UserAgent


def get_json():
    try:
        url = 'https://napi-huawei.tianyancha.com/next/web/home/vajialist?_=1688703382196'
        ua = UserAgent()
        headers = {
            'User-Agent': ua.chrome,  # random Chrome User-Agent string
        }
        res = requests.get(url, headers=headers, timeout=10)
        dirt1 = json.loads(res.text)
        # print(dirt1['data'])
        for item in dirt1['data']['detailList']:
            # print(item)
            # Print only the two fields of interest from each element
            print(item['resourceKey'])
            print(item['resourceName'])
    except Exception as exc:
        # Report the failure instead of silently swallowing it
        print(f'Request failed: {exc}')
        return ""


if __name__ == '__main__':
    get_json()
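Printing is fine for a quick look, but it is often handier to collect the two fields so they can be reused later. A minimal sketch, assuming each item is a dictionary: extract_fields is a hypothetical helper, and the sample list is invented (its second item deliberately lacks resourceName).

```python
def extract_fields(detail_list):
    """Collect (resourceKey, resourceName) pairs; a missing field becomes None."""
    pairs = []
    for item in detail_list:
        # .get() returns None instead of raising KeyError when a field is absent.
        pairs.append((item.get('resourceKey'), item.get('resourceName')))
    return pairs


# Hypothetical stand-in for dirt1['data']['detailList'].
sample = [
    {'resourceKey': 'k1', 'resourceName': 'first'},
    {'resourceKey': 'k2'},
]

for key, name in extract_fields(sample):
    print(f'{key}\t{name}')
```

Using .get() here is a design choice: one incomplete item no longer aborts the whole loop, at the cost of having to handle None values afterwards.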
4. A small cybersecurity community
GitHub - BLACKxZONE/Treasure_knowledge: https://github.com/BLACKxZONE/Treasure_knowledge