Python Web Scraping Exercises

Website

https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/

Exercise 1: Fetch the API response and save it

python
import json
import requests

url = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/tasks/api/"
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}

# Request the API endpoint and decode the JSON body
res = requests.get(url, headers=headers).json()

# Write as UTF-8 so the non-ASCII text kept by ensure_ascii=False is saved correctly
with open('1.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(res, ensure_ascii=False))
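
The script above writes the whole response on one line. As a minimal sketch, assuming the response is ordinary JSON data, the save step could be wrapped in a small helper that indents the output and can read it back for checking. The names `save_json` and `load_json` are hypothetical and are not part of the original exercise.

```python
import json


def save_json(data, path):
    """Write data as indented UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_json(path):
    """Read a UTF-8 JSON file back into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With these helpers, `save_json(res, '1.json')` would replace the `with open(...)` block, and `load_json('1.json')` gives a quick way to confirm the file round-trips.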

Exercise 2: Download all images

python
import requests
from urllib.parse import urljoin, urlparse

url = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/tasks/api/"
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}

# Collect the 'image' path of every entry in the 'articles' list
res = requests.get(url, headers=headers).json()
image_paths = [article['image'] for article in res['articles']]

# Image paths are resolved against the site root (scheme + host)
base_url = "https://" + urlparse(url).netloc

for image in image_paths:
    image_url = urljoin(base_url, image)
    # Send the same headers as the API request when downloading each image
    img = requests.get(image_url, headers=headers).content
    # Use the last path segment as the local file name
    img_name = image.split("/")[-1]
    with open(img_name, 'wb') as f:
        f.write(img)
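
Taking the file name with `image.split("/")[-1]` works for plain paths but would keep a query string or percent-escapes if the path had them. Here is a hedged sketch of a more defensive version. The helper names `resolve_image_url` and `filename_from_url` and the `example.com` URLs are illustrative assumptions, not taken from the exercise site.

```python
import posixpath
from urllib.parse import unquote, urljoin, urlparse


def resolve_image_url(base_url, image_path):
    """Resolve an absolute or relative image path against a base URL."""
    return urljoin(base_url, image_path)


def filename_from_url(url, default="image"):
    """Return the last path segment of a URL, ignoring query and fragment.

    Percent-escapes are decoded; if the path has no final segment,
    the given default name is returned instead.
    """
    path = unquote(urlparse(url).path)
    name = posixpath.basename(path)
    return name or default
```

For example, `filename_from_url("https://example.com/media/a%20b.png?x=1")` yields `a b.png`, whereas splitting on `/` would yield `a%20b.png?x=1`.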

Exercise 3: Scrape titles and abstracts

python
import csv
import requests
from lxml import etree
from urllib.parse import urljoin

site = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/"
url = "https://project-iprj6705f17ebcfad66461658c5c-8000.preview.node01.inscode.run/tasks/article/list/"
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
}

# Collect the article links from the list page
res = requests.get(url, headers=headers)
html = etree.HTML(res.text)
wen_zhang = html.xpath('//div[@class="lab-block"]//a/@href')

# Open the CSV once with a single encoding instead of mixing gbk and utf-8;
# utf-8-sig adds a BOM so spreadsheet tools can detect UTF-8
with open("data.csv", "w", newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(["title", "abstract"])
    for i in wen_zhang:
        # Resolve each link against the site root (avoids a doubled slash)
        url_l = urljoin(site, i)
        result = requests.get(url_l, headers=headers)
        select = etree.HTML(result.text)
        timu = select.xpath('//h2/text()')[0]
        # Join every text node found under <p> elements into one abstract string
        zaiyao = "".join(select.xpath('//p//text()'))
        writer.writerow([timu, zaiyao])
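
The CSV output can be checked without touching the network. As a rough sketch, the write step could be separated into a helper that takes already-scraped rows, with a matching reader to verify the result. `write_rows` and `read_rows` are hypothetical names introduced here for illustration.

```python
import csv


def write_rows(path, header, rows):
    """Write a header and data rows to a CSV file in one pass.

    utf-8-sig prepends a BOM so spreadsheet tools can detect the encoding.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    """Read every row of a CSV file written by write_rows."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.reader(f))
```

Because the `csv` module quotes fields as needed, abstracts that contain commas or line breaks survive the round trip as long as the file is opened with `newline=""`.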