python爬虫题目

网站

https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/

第一道题爬取api并且保存

python 复制代码
import requests,re
import json
url = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/tasks/api/"
headers= {

'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}

res = requests.get(url,headers=headers).json()
with open('1.json','w') as f:
    f.write(json.dumps(res,ensure_ascii=False))

第二道爬取所有图片

python 复制代码
from urllib.parse import urljoin
import requests,re
from urllib.parse import urlparse
import json
url = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/tasks/api/"
headers= {

'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}

res = requests.get(url,headers=headers).json()
list1 = res['articles']
list2=[]
for i in list1:
    list2.append(i['image'])
base_url ="https://"+urlparse(url).netloc

for image in list2:
    image_url = urljoin(base_url,image)
    img = requests.get(image_url).content
    img_name = image.split("/")[-1]
    with open(img_name,'wb') as f:
        f.write(img)

第三道 爬取题目和摘要

python 复制代码
import requests,csv
from lxml import etree
with open("data.csv","w",newline='',encoding='gbk') as f:
    writer = csv.writer(f)
    writer.writerow(["题目","再要"])
url = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/tasks/article/list/"
headers= {

'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}

res = requests.get(url,headers=headers)
html = etree.HTML(res.text)
wen_zhang = html.xpath('//div[@class="lab-block"]//a//@href')
with open("data.csv","w",newline='',encoding='gbk') as f:
    writer = csv.writer(f)
    writer.writerow(["ti","zai"])



for i in wen_zhang:
    url_l = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/"+i
    result = requests.get(url_l,headers=headers)
    select = etree.HTML(result.text)
    timu = select.xpath('//h2/text()')[0]
    zaiyao = select.xpath('//p//text()')
    result = "".join(zaiyao)
    with open("data.csv", "a", newline='',encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow([timu, result])
相关推荐
FreakStudio3 小时前
W55MH32L-EVB 上手测评:硬件 TCP/IP 加持的以太网单片机,MicroPython 零门槛开发
python·单片机·嵌入式·大学生·面向对象·并行计算·电子diy·电子计算机
用户0332126663674 小时前
使用 Python 从零创建 Word 文档
python
Csvn9 小时前
Python 两大经典坑点 —— 可变默认参数 & 闭包延迟绑定
后端·python
曲幽10 小时前
别再用网页翻译看源码了!你的私人翻译神器LibreTranslate,部署避坑指南来了
python·docker·web·pot·translate·libretranslate·arogstranslate
用户5569188175312 小时前
#从脚本到独立程序:Python + Playwright 批量抓取的完整踩坑记录
python·自动化运维
兵慌码乱1 天前
基于 MediaPipe 与 PySide2 的手势交互音乐控制系统实现:轻量化视觉交互全流程解析
python·opencv·计算机视觉·人机交互·手势识别·mediapipe·pyside2
luckdewei1 天前
FastAPI 资产管理系统实战:复杂 ORM 关联、Alembic 迁移与 N+1 查询优化
python
aqi001 天前
15天学会AI应用开发(八)使用向量数据库实现RAG功能
人工智能·python·大模型·ai编程·ai应用
Csvn1 天前
`functools.lru_cache` —— 一行代码搞定缓存加速
后端·python