Downloading an entire FTP directory

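I recently needed to download close to 10 TB of data from an FTP server. After evaluating several tools, I found that lftp's mirror command was the least hassle.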

mirror options

Mirror specified source directory to local target directory. If target directory ends with a slash, the source base name is appended to target directory name. Source and/or target can be URLs pointing to directories.

            -c,    --continue                continue a mirror job if possible
            -e,    --delete                  delete files not present at remote site

                   --delete-first            delete old files before transferring new ones
                   --depth-first             descend into subdirectories before transferring files
            -s,    --allow-suid              set suid/sgid bits according to remote site
                   --allow-chown             try to set owner and group on files
                   --ascii                   use ascii mode transfers (implies --ignore-size)
                   --ignore-time             ignore time when deciding whether to download
                   --ignore-size             ignore size when deciding whether to download
                   --only-missing            download only missing files
                   --only-existing           download only files already existing at target
            -n,    --only-newer              download only newer files (-c won't work)
                   --no-empty-dirs           don't create empty directories (implies --depth-first)
            -r,    --no-recursion            don't go to subdirectories
                   --no-symlinks             don't create symbolic links
            -p,    --no-perms                don't set file permissions
                   --no-umask                don't apply umask to file modes
            -R,    --reverse                 reverse mirror (put files)
            -L,    --dereference             download symbolic links as files
            -N,    --newer-than=SPEC         download only files newer than specified time
                   --on-change=CMD           execute the command if anything has been changed
                   --older-than=SPEC         download only files older than specified time
                   --size-range=RANGE        download only files with size in specified range
            -P,    --parallel[=N]            download N files in parallel
                   --use-pget[-n=N]          use pget to transfer every single file
                   --loop                    loop until no changes found
            -i RX, --include RX              include matching files
            -x RX, --exclude RX              exclude matching files
            -I GP, --include-glob GP         include matching files
            -X GP, --exclude-glob GP         exclude matching files
            -v,    --verbose[=level]         verbose operation
                   --log=FILE                write lftp commands being executed to FILE
                   --script=FILE             write lftp commands to FILE, but don't execute them
                   --just-print, --dry-run   same as --script=-
                   --use-cache               use cached directory listings
                   --Remove-source-files     remove files after transfer (use with caution)
            -a                               same as --allow-chown --allow-suid --no-umask
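
Before starting a multi-terabyte transfer, it can help to preview what mirror would do. The following is a minimal sketch, not part of the original setup: the function name mirror_preview_cmd and the host ftp.example.com are hypothetical placeholders, and it assumes anonymous login (add -u user,pass after open otherwise). It only builds the command string, using the --dry-run and --only-missing options from the list above.

```shell
#!/bin/bash
# Build (but do not run) an lftp command string that previews a mirror job.
# host, remote_dir and local_dir are placeholders supplied by the caller.
mirror_preview_cmd() {
    local host=$1 remote_dir=$2 local_dir=$3
    # --dry-run is the same as --script=-: the commands mirror would execute
    # are written out instead of being run.
    printf 'open %s; mirror --dry-run --only-missing %s %s\n' \
        "$host" "$remote_dir" "$local_dir"
}

# Print the command string; pass it to `lftp -c "..."` to run the preview.
mirror_preview_cmd "ftp.example.com" "/fumulu/zimulu1" "/data/0/bendi/fumulu/"
```

Since nothing is transferred in this mode, the preview is a cheap way to check that the include/exclude choices and the target path are what you expect.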

Issues encountered

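1. Although mirror supports parallel transfers, and we were downloading three large directories (each with many subdirectories), building the file listing took a long time throughout the process. I recommend running mirror directly on the subdirectories, which keeps more transfers running in parallel.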

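2. Be sure to use the --only-missing option. With other options such as --only-newer, local files were deleted first and then downloaded again; I am not sure why.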
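
The two points above can be sketched as a small helper that emits one mirror command per subdirectory rather than one for the parent. This is only an illustration: the function name per_subdir_mirror_cmds and the value --parallel=4 are assumptions, and the subdirectory names are passed in by hand rather than discovered from the server.

```shell
#!/bin/bash
# Emit one mirror command per subdirectory instead of a single command
# for the parent directory. All names here are placeholders.
per_subdir_mirror_cmds() {
    local parent_dir=$1 local_base=$2
    shift 2
    local sub
    for sub in "$@"; do
        # The trailing slash on the target makes mirror append the
        # source base name (e.g. zimulu1) to the target directory.
        printf 'mirror --only-missing --parallel=4 %s/%s %s/\n' \
            "$parent_dir" "$sub" "$local_base"
    done
}

per_subdir_mirror_cmds "/fumulu" "/data/0/bendi/fumulu" zimulu1 zimulu2 zimulu3
```

Each printed line can then be run in its own lftp session, so listing and transfer for one subdirectory do not wait on the others.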

#!/bin/bash

# FTP server details
FTP_HOST="xxxxx"
FTP_USER="xxxx"
FTP_PASS="xxxxxxx"

# Remote-to-local directory pairs to sync.
# Each local target ends with a slash, so mirror appends the remote base name to it.

declare -A DIR_MAP=(
["/fumulu/zimulu1"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu2"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu3"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu4"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu5"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu6"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu7"]="/data/0/bendi/fumulu/"
)
# Create the log directory
LOG_DIR="sync_logs"
mkdir -p "$LOG_DIR"

sync_directory() {
    local remote_dir=$1
    local local_dir=$2

    # Build the log file name (replace directory separators with underscores)
    local log_name
    log_name=$(echo "${remote_dir}" | tr '/' '_')
    local log_file="$LOG_DIR/${log_name}sync.log"

    # Make sure the local directory exists
    mkdir -p "$local_dir"

    echo "Starting sync of $remote_dir to $local_dir..." | tee -a "$log_file"
    echo "Sync start time: $(date)" >> "$log_file"

    # Sync with lftp; --only-missing downloads only files that are missing locally
    local temp_log
    temp_log=$(mktemp)
    lftp -c "open -u $FTP_USER,$FTP_PASS $FTP_HOST; \
             mirror --parallel=1000 --verbose --only-missing  $remote_dir $local_dir" 2>&1 | tee -a "$temp_log" "$log_file"

    # Check whether any file failed to download
    if grep -qi "File not available" "$temp_log"; then
        echo "Some files failed to download; recording them in shibai.txt..."
        # Extract and record the failed files
        grep -i "File not available" "$temp_log" | while read -r line; do
            # Extract the full path and file name
            full_path=$(echo "$line" | grep -o "@.*" | cut -d' ' -f1)
            echo "$full_path" >> shibai.txt
        done
    fi

    echo "Sync end time: $(date)" >> "$log_file"
    echo "----------------------------------------" >> "$log_file"

    # Remove the temporary log file
    rm -f "$temp_log"
}

# Start all sync jobs at the same time
for remote_dir in "${!DIR_MAP[@]}"; do
    local_dir=${DIR_MAP[$remote_dir]}
    sync_directory "$remote_dir" "$local_dir" &
done

# Wait for all background jobs to finish
wait

echo "All sync jobs have finished."
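
One thing the script above does not capture is the exit status of lftp itself, because the status of a pipeline is that of its last command (tee). A possible sketch of recovering it with bash's PIPESTATUS array follows. The helper name run_and_log is hypothetical, and whether lftp's exit status reliably reflects mirror failures is an assumption worth checking on your version.

```shell
#!/bin/bash
# Run a command, append its combined output to a log file, and return the
# command's own exit status rather than tee's.
run_and_log() {
    local log_file=$1
    shift
    "$@" 2>&1 | tee -a "$log_file"
    # PIPESTATUS[0] holds the exit status of the first command in the
    # pipeline that just finished.
    return "${PIPESTATUS[0]}"
}

# Hypothetical usage inside sync_directory:
#   if ! run_and_log "$log_file" lftp -c "open ...; mirror ..."; then
#       echo "$remote_dir" >> shibai.txt
#   fi
```

This complements the grep on "File not available" rather than replacing it, since the two checks catch different kinds of failure.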