This small example is simple. Once you have read the previous article, you can start on it right away:
python
# Approach: page loading, element locating, and action chains
from DrissionPage import ChromiumPage
from DrissionPage.common import By
import time

input_data = input('Enter the course to search for: ')
webdriver = ChromiumPage()
for page in range(1, 10):
    # Reuse a single browser window for every page
    webdriver.get(f'https://search.bilibili.com/video?keyword={input_data}&from_source=webtop_search&spm_id_from=333.1007&search_source=6&page={page}&o={(page-1) * 30}')
    time.sleep(0.8)
    # Parse the elements: one child div per video card
    infos = webdriver.eles((By.XPATH, '//div[@class="video-list row"]/div'))
    lst = []
    for info in infos:
        url_ = info.ele((By.XPATH, './/div[@class="bili-video-card"]/div[@class="bili-video-card__wrap"]/a'))
        url = url_.attr('href')
        # print(url)
        title = info.ele((By.XPATH, './/div[@class="bili-video-card"]/div[@class="bili-video-card__wrap"]/div/div/a/h3')).attr('title')
        dic = {
            'title': title,
            'url': url,
        }
        lst.append(dic)
    print('-' * 160)
    print(f'Page {page}: {lst}')
# Close the browser once all pages are done
webdriver.quit()
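The search URL in the loop above follows a regular pattern, so it can be built by a small helper instead of an inline f-string. The following is a minimal sketch, not part of the original script: the name build_search_url and the PAGE_SIZE constant are hypothetical, and the value 30 is simply read off the o offset used above. It also URL-encodes the keyword, which the f-string leaves to the browser.

```python
from urllib.parse import urlencode

# Assumption: 30 results per page, inferred from the `o` offset in the URL above.
PAGE_SIZE = 30


def build_search_url(keyword: str, page: int) -> str:
    """Return the search URL for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError("page must be >= 1")
    params = {
        "keyword": keyword,
        "from_source": "webtop_search",
        "spm_id_from": "333.1007",
        "search_source": 6,
        "page": page,
        # The offset grows by PAGE_SIZE for every page after the first.
        "o": (page - 1) * PAGE_SIZE,
    }
    return "https://search.bilibili.com/video?" + urlencode(params)


if __name__ == "__main__":
    for page in range(1, 4):
        print(build_search_url("python", page))
```

With a helper like this, the loop body could call webdriver.get(build_search_url(input_data, page)) and keep the rest unchanged.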
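Once you have learned databases, you can try deduplicating with Redis and then storing the results in MySQL or MongoDB. One thing to note here: paginate by finding the pattern in the URL (the page parameter and the o offset, which grows by 30 per page). Starting on the first page and clicking "next page" with an action is troublesome, because that kind of pagination refreshes the page, so only the first two pages end up being scraped.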
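The deduplication step mentioned above could look roughly like the sketch below. It is an illustration under stated assumptions, not the author's implementation: the names InMemorySet, url_fingerprint, filter_new and the key bilibili:seen_urls are all hypothetical. The in-memory class only mimics the sadd contract of a Redis set (it returns how many members were newly added) so that the example runs without a Redis server.

```python
import hashlib


class InMemorySet:
    """Hypothetical stand-in that mimics the sadd() contract of a Redis client."""

    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        # Like Redis SADD: return the number of members that were newly added.
        bucket = self._sets.setdefault(name, set())
        added = 0
        for value in values:
            if value not in bucket:
                bucket.add(value)
                added += 1
        return added


def url_fingerprint(url):
    """Hash the URL so that the set stores short, fixed-length members."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def filter_new(records, client, key="bilibili:seen_urls"):
    """Keep only the records whose URL has not been seen before."""
    fresh = []
    for record in records:
        if client.sadd(key, url_fingerprint(record["url"])) == 1:
            fresh.append(record)
    return fresh


if __name__ == "__main__":
    client = InMemorySet()
    rows = [
        {"title": "a", "url": "https://example.com/1"},
        {"title": "b", "url": "https://example.com/2"},
        {"title": "a again", "url": "https://example.com/1"},
    ]
    print(filter_new(rows, client))
```

Assuming the redis package is installed and a server is running, passing a redis.Redis() client instead of InMemorySet should work the same way, since its sadd also returns the count of newly added members. The records returned by filter_new are the ones worth inserting into MySQL or MongoDB.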
Summary
This article is simple. You can also try the API-based approach, which is faster. Keep it up!