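Contents
- [1. Installing httpx](#1. Installing httpx)
- [2. Synchronous requests](#2. Synchronous requests)
- [3. Asynchronous requests](#3. Asynchronous requests)
- [4. Advanced features](#4. Advanced features)
- [5. Error handling](#5. Error handling)
- [6. Configuring the client](#6. Configuring the client)
- [7. Using httpx with Beautiful Soup](#7. Using httpx with Beautiful Soup)
- [8. Example: fetching and parsing a web page](#8. Example: fetching and parsing a web page)
- [9. Notes](#9. Notes)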
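httpx is a modern HTTP client library for Python that supports both synchronous and asynchronous requests. Its API closely follows that of requests, and it adds native async support and optional HTTP/2. The sections below walk through its main features.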
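1. Installing httpx
Install httpx with pip: `pip install httpx`
HTTP/2 support needs an extra dependency: `pip install 'httpx[http2]'` (the quotes keep shells such as zsh from interpreting the square brackets).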
2. Synchronous requests
Sending a GET request
```python
import httpx

# Send a GET request
response = httpx.get('https://httpbin.org/get')
print(response.status_code)  # status code
print(response.text)  # response body as text
```
Sending a POST request
```python
import httpx

# Send a POST request with a JSON body
data = {'key': 'value'}
response = httpx.post('https://httpbin.org/post', json=data)
print(response.json())  # parse the JSON response
```
Setting request headers
```python
import httpx

headers = {'User-Agent': 'my-app/1.0.0'}
response = httpx.get('https://httpbin.org/headers', headers=headers)
print(response.json())
```
Setting query parameters
```python
import httpx

params = {'key1': 'value1', 'key2': 'value2'}
response = httpx.get('https://httpbin.org/get', params=params)
print(response.json())
```
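Handling timeouts
```python
import httpx

try:
    # The endpoint waits 5 seconds before answering, so the 2-second timeout fires
    response = httpx.get('https://httpbin.org/delay/5', timeout=2.0)
except httpx.TimeoutException:
    print("Request timed out")
```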
3. Asynchronous requests
httpx also offers an async API, which suits programs that issue many requests concurrently.
Sending an async GET request
```python
import asyncio
import httpx

async def fetch(url):
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        print(response.text)

asyncio.run(fetch('https://httpbin.org/get'))
```
Sending an async POST request
```python
import asyncio
import httpx

async def post_data(url, data):
    async with httpx.AsyncClient() as client:
        response = await client.post(url, json=data)
        print(response.json())

asyncio.run(post_data('https://httpbin.org/post', {'key': 'value'}))
```
Concurrent requests
```python
import asyncio
import httpx

async def fetch_multiple(urls):
    async with httpx.AsyncClient() as client:
        # Start all requests at once and wait for every response
        tasks = [client.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)
        for response in responses:
            print(response.text)

urls = ['https://httpbin.org/get', 'https://httpbin.org/ip']
asyncio.run(fetch_multiple(urls))
```
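4. Advanced features
Using HTTP/2
```python
import httpx

# Enable HTTP/2 (requires the httpx[http2] extra)
with httpx.Client(http2=True) as client:
    response = client.get('https://httpbin.org/get')
    print(response.http_version)  # protocol version that was negotiated
```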
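Uploading a file
```python
import httpx

# Open the file in a with block so it is closed after the upload
with open('example.txt', 'rb') as f:
    files = {'file': f}
    response = httpx.post('https://httpbin.org/post', files=files)
print(response.json())
```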
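Streaming a request body
```python
import httpx

# Streaming upload: the body is produced chunk by chunk
def generate_data():
    yield b"part1"
    yield b"part2"

# Raw byte iterators go in content=; data= is meant for form fields
response = httpx.post('https://httpbin.org/post', content=generate_data())
print(response.json())
```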
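Streaming a response
```python
import httpx

# Streaming download: read the body in chunks instead of loading it all at once
with httpx.stream('GET', 'https://httpbin.org/stream/10') as response:
    for chunk in response.iter_bytes():
        print(chunk)
```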
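5. Error handling
httpx provides a hierarchy of exception classes for the different ways a request can fail.
Handling network errors
```python
import httpx

try:
    response = httpx.get('https://nonexistent-domain.com')
except httpx.NetworkError:
    # Connection failures such as httpx.ConnectError are subclasses of NetworkError
    print("Network error")
```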
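Handling HTTP error status codes
```python
import httpx

# A 4xx or 5xx response does not raise by itself; check the status code
response = httpx.get('https://httpbin.org/status/404')
if response.status_code == 404:
    print("Page not found")
```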
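6. Configuring the client
Settings passed to httpx.Client or httpx.AsyncClient apply to every request made through that client.
Setting a timeout
```python
import httpx

with httpx.Client(timeout=10.0) as client:
    response = client.get('https://httpbin.org/get')
    print(response.text)
```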
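Setting a proxy
```python
import httpx

# Route all traffic through one proxy. The proxy= argument needs httpx 0.26 or later;
# earlier releases took a proxies= mapping, which has since been removed.
with httpx.Client(proxy="http://proxy.example.com:8080") as client:
    response = client.get('https://httpbin.org/get')
    print(response.text)
```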
Setting a base URL
```python
import httpx

with httpx.Client(base_url='https://httpbin.org') as client:
    # Relative paths are resolved against base_url
    response = client.get('/get')
    print(response.text)
```
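7. Using httpx with Beautiful Soup
httpx can be combined with Beautiful Soup to fetch a web page and parse it.
```python
import httpx
from bs4 import BeautifulSoup

# Fetch the page
response = httpx.get('https://example.com')
html = response.text
# Parse the page (the 'lxml' parser requires the lxml package to be installed)
soup = BeautifulSoup(html, 'lxml')
title = soup.find('title').text
print("Page title:", title)
```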
8. Example: fetching and parsing a web page
The following complete example fetches a page with httpx, then extracts its title and links:
```python
import httpx
from bs4 import BeautifulSoup

# Fetch the page
url = 'https://example.com'
response = httpx.get(url)
html = response.text
# Parse the page
soup = BeautifulSoup(html, 'lxml')
# Extract the title
title = soup.find('title').text
print("Page title:", title)
# Extract every link that has an href attribute
links = soup.find_all('a', href=True)
for link in links:
    href = link['href']
    text = link.text
    print(f"Link text: {text}, link URL: {href}")
```
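The `href` values extracted this way are often relative to the page they came from. A small sketch of turning them into absolute URLs with the standard library's `urljoin` follows. The `page_url` and `hrefs` values are made-up samples, not output from a real page.

```python
from urllib.parse import urljoin

# Hypothetical page address and href values, for illustration only
page_url = 'https://example.com/docs/index.html'
hrefs = ['/about', 'guide.html', 'https://other.example/x', '#top']

# Resolve each href against the URL of the page it was found on
absolute = [urljoin(page_url, href) for href in hrefs]
for link in absolute:
    print(link)
```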
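9. Notes
Performance: the async mode suits workloads that issue many requests concurrently.
Compatibility: the httpx API is broadly compatible with requests, so migration cost is low, although some defaults differ (for example, httpx does not follow redirects unless asked to).
HTTP/2: to use HTTP/2, make sure httpx[http2] is installed.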
With the techniques above, you can send HTTP requests efficiently with httpx and combine it with other tools, such as Beautiful Soup, to fetch and parse data.