Web Scraping Lesson 4: GET Requests

程序员贵哥 2024-03-26 14:24

Note the following about the code below:

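This script prints the page's raw HTML rather than parsed content. To extract specific information from the page, you would need a parsing library such as BeautifulSoup.
This script does no error handling, for example for network errors or request timeouts. In real use, you would add appropriate error handling.
Generating a random User-Agent with fake_useragent helps get past some sites' anti-scraping checks, but it is no guarantee of success. Some sites use more sophisticated strategies to detect and block crawlers.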
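
To illustrate the first note, here is a minimal sketch of extracting specific information instead of printing raw HTML. It uses the standard library's html.parser.HTMLParser as a dependency-free stand-in for BeautifulSoup, which handles the same task with less code. The LinkCollector class and the sample HTML are hypothetical and not taken from the target site; in practice the input would be the decoded response body.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the page title and a (href, text) pair for each <a> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []         # list of (href, link text) tuples
        self._in_title = False
        self._href = None       # href of the <a> currently open, if any
        self._text = []         # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Made-up HTML standing in for the decoded response body
sample_html = """
<html><head><title>Search results</title></head>
<body>
  <a href="/archives/1/">First post</a>
  <a href="/archives/2/">Second post</a>
</body></html>
"""

parser = LinkCollector()
parser.feed(sample_html)
print(parser.title)
for href, text in parser.links:
    print(href, text)
```

Anchors without an href attribute are skipped, since there is nothing to follow.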

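# coding: utf-8
# Date: 2024/3/16 16:16
# Pythonit教程网 (blog.pythonit.cn)
# Python full-stack video courseware: www.dqu.cc
# Accelerated high-defense CDN: woaiyundun.cn
from urllib.request import urlopen, Request
from urllib.parse import quote

from fake_useragent import UserAgent

# Percent-encode the search term so it is safe to embed in the URL path
search = input("Enter a search term: ")
url = f"https://blog.pythonit.cn/index.php/search/{quote(search)}"

# Send a Chrome User-Agent string instead of urllib's default one
ua = UserAgent()
headers = {
    'User-Agent': ua.chrome
}

# Build the GET request with the custom headers and send it
request = Request(url, headers=headers)
response = urlopen(request)

# Print the raw HTML, decoded as UTF-8 (the default for bytes.decode())
print(response.read().decode())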
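
The second note, on error handling, can be addressed along these lines. This is a sketch rather than the only way to do it: the fetch helper, its 10-second default timeout, and the printed messages are assumptions made for illustration.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch(url, user_agent, timeout=10):
    """Return the decoded body of `url`, or None if the request fails."""
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"HTTP error {err.code}: {err.reason}")
    except URLError as err:
        # DNS failure, refused connection, timeout while connecting, etc.
        print(f"Could not reach the server: {err.reason}")
    except (socket.timeout, TimeoutError):
        # Timeout raised while reading the response body
        print("The request timed out")
    return None
```

Called as fetch(url, ua.chrome) with the url and ua from the script above, it returns the page text on success and None on failure, so the caller can decide whether to retry or skip the page.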