Software versions:
python --version
Python 3.8.0
pip show selenium
Version: 4.20.0
chromedriver.exe -version
109.0.5414.74
Topic: scrape 10 entries from a dynamically rendered web page (movie box office)
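1. Locate the page nodes by XPath (Ctrl+F in the DevTools Elements panel accepts XPath expressions)
2. Evaluate the XPath in the Console and inspect the returned nodes to confirm they hold the content you need
F12 -> $x('//title')
3. Fetch the same nodes from Python and fine-tune the XPath as needed: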
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
# Path to the ChromeDriver binary (make sure it points to your chromedriver.exe)
chrome_driver_path = 'C:/Users/Administrator/Downloads/Compressed/chromedriver_win32/chromedriver.exe'
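# Old-style initialization: raises an error with Selenium 4.20.0 (executable_path is no longer accepted)
# driver = webdriver.Chrome(executable_path=chrome_driver_path)
# Current style: pass the driver path through a Service object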
service = Service(chrome_driver_path)
driver = webdriver.Chrome(service=service)
# Navigate to the target page
driver.get('https://piaofang.maoyan.com/dashboard/movie')
# Wait for the page to finish rendering (adjust the delay as needed)
time.sleep(5)
tbody = driver.find_element(By.XPATH, '//*[@id="app"]/div/div/div[2]/div[1]/div[2]/div/table/tbody')
rows = tbody.find_elements(By.TAG_NAME, 'tr')
# Print only the first 10 rows, numbered from 1
for i, row in enumerate(rows[:10], start=1):
    title = row.find_element(By.XPATH, './td[1]/div/div[@class="moviename-desc"]/p[@class="moviename-name"]').text
    days = row.find_element(By.XPATH, './td[1]/div/div[@class="moviename-desc"]/p[@class="moviename-info"]/span[1]').text
    money = row.find_element(By.XPATH, './td[1]/div/div[@class="moviename-desc"]/p[@class="moviename-info"]/span[2]').text
    print(str(i) + '.' + title + ' [' + days + '] [票房' + money + ']')
    # print()  # blank line to mark the end of a row
# Close the browser
driver.quit()
Running the step 3 script prints:
1.xxx [点映] [票房5474.1万]
2.xxx [上映32天] [票房9.09亿]
3.xxx [上映27天] [票房7.71亿]
4.xxx [上映31天] [票房2.73亿]
5.xxx [上映26天] [票房5560.5万]
6.xxx [上映27天] [票房1.20亿]
7.xxx [上映11天] [票房1356.9万]
8.xxx [上映39天] [票房3.49亿]
9.xxx [上映27天] [票房1.00亿]
10.xxx [] [票房127.2万]
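Here xxx stands for the movie title returned by the live page. In the output, 票房 means box office, 上映N天 means N days since release, 点映 means a limited preview screening, and 万 and 亿 denote ten thousand and one hundred million.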
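The box-office values arrive as text with a 万 or 亿 suffix, so they cannot be sorted or summed directly. A minimal sketch of converting them to plain integers follows. The helper name parse_box_office is hypothetical, and it assumes every value has the shape shown in the output above (a decimal number followed by an optional 万 or 亿). Decimal is used instead of float to avoid rounding surprises such as 9.09 * 1e8.

```python
from decimal import Decimal

# Multipliers for the magnitude suffixes seen in the sample output.
UNIT_MULTIPLIERS = {"万": Decimal(10_000), "亿": Decimal(100_000_000)}


def parse_box_office(text):
    """Convert a string such as '9.09亿' or '5474.1万' to an integer amount.

    Assumes a decimal number with an optional 万/亿 suffix; any other
    input makes Decimal raise an exception.
    """
    text = text.strip()
    if text and text[-1] in UNIT_MULTIPLIERS:
        return int(Decimal(text[:-1]) * UNIT_MULTIPLIERS[text[-1]])
    return int(Decimal(text))


# Sort a few values from the sample output, largest first.
samples = ["5474.1万", "9.09亿", "127.2万"]
for amount in sorted(samples, key=parse_box_office, reverse=True):
    print(amount, parse_box_office(amount))
```

Inside the scraping loop, parse_box_office(money) would give a number that can be compared across rows.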