Downloading Files with wget - 技术栈

A Complete Guide to the wget Command

1. Basic command format

wget [options] [URL]

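📋 Common options in detail

Basic download options

Option	Description
`-O <file>`	Save the download under the given file name
`-P <dir>`	Save downloads into the given directory
`-c`	Resume a partially completed download
`-b`	Download in the background
`-q`	Quiet mode (no output)
`-v`	Verbose mode (the default; prints detailed progress)

Connection and retry options

Option	Description
`--tries=<n>`	Number of retries (default 20; 0 means unlimited)
`--timeout=<seconds>`	Network timeout (applies to DNS lookup, connect, and read)
`--wait=<seconds>`	Pause between successive retrievals
`--limit-rate=<rate>`	Cap the download speed (e.g. `500k`, `2m`)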
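The connection options above are usually combined rather than used one at a time. A minimal sketch, assuming a hypothetical wrapper named robust_fetch and illustrative values (5 tries, 30-second timeout, 2-second wait, 500 KB/s cap) that are not recommendations from the original text:

```shell
#!/usr/bin/env bash
# robust_fetch is a hypothetical helper; the numeric values are illustrative only.
# It resumes partial files (-c), retries up to 5 times, times out after 30 s,
# waits 2 s between retrievals, and caps the rate at 500 KB/s.
robust_fetch() {
    wget -c --tries=5 --timeout=30 --wait=2 --limit-rate=500k "$@"
}

# Example: robust_fetch https://example.com/largefile.iso
```

Any extra arguments (more URLs, `-P <dir>`, and so on) are passed straight through to wget.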

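Advanced options

Option	Description
`-r`	Recursive download (follows links up to 5 levels deep by default)
`-np`	Never ascend to the parent directory
`-A <patterns>`	Comma-separated file name suffixes or wildcard patterns to accept
`-R <patterns>`	Comma-separated file name suffixes or wildcard patterns to reject
`--user=<username>`	User name for FTP/HTTP
`--password=<password>`	Password for FTP/HTTP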

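📁 Practical usage scenarios

Scenario 1: Basic file download

# Download a single file into the current directory
wget https://example.com/file.zip

# Download and choose the saved file name
wget -O myfile.zip https://example.com/file.zip

# Download into a specific directory
wget -P /home/user/downloads/ https://example.com/file.zip

# Download into a specific directory under a new name
wget -O /home/user/downloads/myfile.zip https://example.com/file.zip

Scenario 2: Resuming and background downloads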

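# Resume an interrupted download
wget -c https://example.com/largefile.iso

# Download in the background (useful for large files)
wget -b https://example.com/largefile.iso
# Follow the progress of the background download
tail -f wget-log

# Limit the download speed (suffix k = KB/s, m = MB/s)
wget --limit-rate=500k https://example.com/largefile.iso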
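When a download runs in the background, a script may want a single progress figure instead of a live `tail -f`. A rough sketch, assuming the log contains dot-style progress lines with a percentage (for example ` 42% 1.2M 5s`); last_progress is a hypothetical helper name, and it prints nothing when the server does not report a file size:

```shell
#!/usr/bin/env bash
# Print the most recent percentage found in a wget log file.
# Assumption: progress lines in the log contain a token such as "42%".
last_progress() {
    local log_file="${1:-wget-log}"
    grep -o '[0-9]\{1,3\}%' "$log_file" | tail -n 1
}

# Example: last_progress wget-log
```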

Scenario 3: Recursive download (site mirroring)

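# Recursive download starting from the site root (default depth: 5 levels)
wget -r https://example.com/

# Recursive download without ascending to the parent directory
wget -r -np https://example.com/path/

# Download only specific file types
wget -r -A "*.pdf,*.doc" https://example.com/documents/

# Exclude specific file types
wget -r -R "*.jpg,*.png" https://example.com/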
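The recursive flags above compose naturally. A small sketch, assuming a hypothetical helper fetch_pdfs, a depth limit of 2 (`-l 2`), a one-second wait, and `-nd` (`--no-directories`) to flatten everything into one folder; none of these specific values come from the original text:

```shell
#!/usr/bin/env bash
# Recursively fetch PDFs below a starting URL into a single flat directory.
# fetch_pdfs is a hypothetical helper; depth and wait values are illustrative.
fetch_pdfs() {
    local start_url="$1"
    local dest_dir="${2:-./pdfs}"
    wget -r -l 2 -np -nd -A "*.pdf" -w 1 -P "$dest_dir" "$start_url"
}

# Example: fetch_pdfs https://example.com/documents/ ./pdfs
```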

Scenario 4: Authenticated downloads

# HTTP basic authentication (the password is visible in the process list and shell history)
wget --user=username --password=password https://example.com/protected/file.zip

# FTP download
wget --ftp-user=username --ftp-password=password ftp://example.com/file.zip

# Prompt for the password instead (safer)
wget --user=username --ask-password https://example.com/protected/file.zip
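For unattended jobs where prompting is not possible, one option is a netrc file, which wget consults by default when no credentials are given on the command line. A minimal sketch, assuming a hypothetical helper add_netrc_entry and credentials without spaces; for real use the file would be `$HOME/.netrc`:

```shell
#!/usr/bin/env bash
# Append a credentials entry to a netrc-style file and restrict its permissions.
# add_netrc_entry is a hypothetical helper; values containing spaces are not handled.
add_netrc_entry() {
    local netrc_file="$1" host="$2" user="$3" pass="$4"
    touch "$netrc_file"
    chmod 600 "$netrc_file"   # readable and writable by the owner only
    printf 'machine %s login %s password %s\n' "$host" "$user" "$pass" >> "$netrc_file"
}

# Example: add_netrc_entry "$HOME/.netrc" example.com username password
```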

Scenario 5: Batch downloads

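# Read a list of URLs from a file and download them all
wget -i url_list.txt

# Example contents of the URL list file:
# https://example.com/file1.zip
# https://example.com/file2.zip
# https://example.com/file3.zip

# Batch download with Bash brace expansion (the shell, not wget, expands {1..10})
wget https://example.com/files/file{1..10}.zip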
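Brace expansion only works in shells that support it, so a URL list file is the more portable route. A sketch that generates such a list, assuming the `fileN.zip` naming from the example above; make_url_list and its argument order are hypothetical:

```shell
#!/usr/bin/env bash
# Write one URL per line for files numbered first..last under a base URL.
# make_url_list is a hypothetical helper; the fileN.zip pattern is assumed.
make_url_list() {
    local base_url="$1" first="$2" last="$3" out_file="$4"
    local i="$first"
    : > "$out_file"                       # start with an empty list
    while [ "$i" -le "$last" ]; do
        printf '%sfile%d.zip\n' "$base_url" "$i" >> "$out_file"
        i=$((i + 1))
    done
}

# Example: make_url_list https://example.com/files/ 1 10 url_list.txt && wget -i url_list.txt
```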

🔧 Practical tips and advanced usage

1. Downloading files behind redirects

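# wget follows redirects by default; --trust-server-names names the file after the final redirected URL
# (quote URLs containing ? or & so the shell does not interpret them)
wget --trust-server-names "https://example.com/download?file=123"

# Or set the output file name yourself
wget -O myfile.zip "https://example.com/download?file=123"

2. Handling SSL/TLS certificate problems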

# Skip SSL certificate verification (insecure; for testing only)
wget --no-check-certificate https://example.com/file.zip

# Use a custom CA certificate bundle
wget --ca-certificate=/path/to/cacert.pem https://example.com/file.zip

3. Spoofing a browser User-Agent

# Some sites check the User-Agent header
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" https://example.com/file.zip

4. Limiting recursion depth

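# Recurse at most 2 levels deep
wget -r -l 2 https://example.com/

# Mirror the whole site (-m), convert links for local viewing (-k), and wait 10 seconds between requests (-w 10)
wget -mk -w 10 https://example.com/

5. Delayed and scheduled downloads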

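# -w (--wait) pauses between successive retrievals; it does not delay the first download
wget -w 10 -i url_list.txt

# To start a download after a 10-second delay, sleep first
sleep 10 && wget https://example.com/file.zip

# Download at a specific time (use together with cron)
# crontab entry: 0 2 * * * wget https://example.com/backup.zip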
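Under cron, a bare wget call overwrites nothing and logs nowhere useful, so a small wrapper can help. A sketch under stated assumptions: nightly_fetch, the `backup_` file prefix, and the `wget-cron.log` name are all hypothetical; `-nv` keeps the log short and `-a` appends to it:

```shell
#!/usr/bin/env bash
# Download a URL to a date-stamped file and append a brief log entry.
# nightly_fetch, the backup_ prefix, and the log file name are hypothetical.
nightly_fetch() {
    local url="$1" dest_dir="$2"
    local stamp
    stamp=$(date +%Y%m%d)
    mkdir -p "$dest_dir" || return 1
    wget -nv -a "$dest_dir/wget-cron.log" -O "$dest_dir/backup_$stamp.zip" "$url"
}

# Example: nightly_fetch https://example.com/backup.zip /var/backups
```

A script that defines and calls this function can then be referenced from the crontab entry in place of the raw wget command.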

⚡ Alternatives to wget

1. curl (a more feature-rich transfer tool)

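# Download a file, keeping the remote name
curl -O https://example.com/file.zip

# Download and rename
curl -o myfile.zip https://example.com/file.zip

# curl supports more protocols and offers finer error-handling controls (e.g. --fail, --retry); add -L to follow redirects

2. aria2 (multi-connection downloads for higher speed)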

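# Multi-connection download: allow up to 16 connections per server (-x) and split the file into 16 pieces (-s)
aria2c -x 16 -s 16 https://example.com/largefile.iso

# Segmented download of a large file: -s sets the number of pieces; -x must allow as many connections per server
aria2c -x 10 -s 10 https://example.com/largefile.iso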
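Scripts that must run on machines with different tools installed can pick a downloader at run time. A minimal sketch, assuming the hypothetical helper names pick_downloader and fetch_any and an arbitrary preference order of wget, curl, aria2c:

```shell
#!/usr/bin/env bash
# Print the name of the first available downloader (preference order is arbitrary).
pick_downloader() {
    local tool
    for tool in wget curl aria2c; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Download a URL into the current directory with whichever tool is installed.
fetch_any() {
    local url="$1"
    case "$(pick_downloader)" in
        wget)   wget "$url" ;;
        curl)   curl -L -O "$url" ;;          # -L follows redirects, -O keeps the remote name
        aria2c) aria2c -x 16 -s 16 "$url" ;;
        *)      echo "no downloader found" >&2; return 1 ;;
    esac
}

# Example: fetch_any https://example.com/file.zip
```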

🛠️ Troubleshooting and common problems

Problem 1: Certificate verification fails

# Fix 1: update the CA certificates
sudo apt update && sudo apt install ca-certificates  # Debian/Ubuntu

# Fix 2: skip verification temporarily (test environments only)
wget --no-check-certificate https://example.com/file.zip

Problem 2: Permission denied

# Check the permissions of the target directory
ls -ld /target/directory

# Use sudo, or pick a directory the current user can write to
wget -P ~/downloads/ https://example.com/file.zip

Problem 3: Connection timeout

# Increase the timeout and the number of retries
wget --timeout=60 --tries=10 https://example.com/file.zip

# Check network connectivity (send 4 packets, then stop)
ping -c 4 example.com

Problem 4: 403 Forbidden error

# Try spoofing the User-Agent
wget --user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36" https://example.com/file.zip
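The four problems above can often be told apart from wget's exit status alone: the wget manual documents codes 0 through 8, where a certificate failure yields 5, a local write error 3, a network failure 4, and an HTTP error response such as 403 yields 8. A sketch that turns the code into a message; explain_wget_status is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Map a wget exit status to a short description (codes as listed in the wget manual).
explain_wget_status() {
    case "$1" in
        0) echo "success" ;;
        1) echo "generic error" ;;
        2) echo "command-line or config parse error" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server returned an error response" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Example: wget https://example.com/file.zip; explain_wget_status $?
```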

📋 Practical script examples

Batch download script

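#!/bin/bash
# batch_download.sh

BASE_URL="https://example.com/files/"
FILE_LIST=("file1.zip" "file2.zip" "file3.zip" "file4.zip")

DOWNLOAD_DIR="./downloads"
mkdir -p "$DOWNLOAD_DIR"

failed=0
for file in "${FILE_LIST[@]}"; do
    echo "Downloading: $file"
    # Count failures so the final message does not claim success unconditionally
    wget -P "$DOWNLOAD_DIR" "$BASE_URL$file" || failed=$((failed + 1))

    # Pause between requests to avoid putting pressure on the server
    sleep 2
done

echo "Finished; failed downloads: $failed"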

Site mirroring script

#!/bin/bash
# mirror_site.sh

SITE_URL="$1"
MIRROR_DIR="./mirror_$(date +%Y%m%d)"

if [ -z "$SITE_URL" ]; then
    echo "Usage: $0 <site URL>"
    exit 1
fi

echo "Mirroring site: $SITE_URL"
# --adjust-extension replaces the deprecated --html-extension, so only one is needed
wget --mirror \
     --page-requisites \
     --convert-links \
     --adjust-extension \
     --restrict-file-names=windows \
     --no-parent \
     --directory-prefix="$MIRROR_DIR" \
     "$SITE_URL"

echo "Mirror finished: $MIRROR_DIR"

Usage:

chmod +x mirror_site.sh
./mirror_site.sh https://example.com

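💡 Best practices

Large files: use -c so interrupted downloads can be resumed
Batch downloads: use -i to read the URL list from a file
Site mirroring: use -mk or --mirror
Production: avoid --no-check-certificate
Polite downloading: use --wait and --limit-rate to avoid putting pressure on the server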
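These practices can also be set once as defaults in a wgetrc file instead of being repeated on every command line. A sketch under stated assumptions: write_polite_wgetrc is a hypothetical helper, the values are illustrative, and the file is selected through the WGETRC environment variable rather than replacing `~/.wgetrc`:

```shell
#!/usr/bin/env bash
# Write a wgetrc file with conservative defaults (values are illustrative only).
# write_polite_wgetrc is a hypothetical helper name.
write_polite_wgetrc() {
    local rc_file="$1"
    cat > "$rc_file" <<'EOF'
# Resume partial files, retry a few times, and go easy on the server
continue = on
tries = 5
timeout = 30
wait = 1
limit_rate = 500k
EOF
}

# Example: write_polite_wgetrc ./polite.wgetrc && WGETRC=./polite.wgetrc wget https://example.com/file.zip
```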

Remember: wget is well suited to automation scripts and file downloads in server environments.