Downloading an Entire FTP Directory

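I recently needed to download close to 10 TB of data from an FTP server. After evaluating several tools, I found lftp's mirror command to be the least hassle.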

mirror options


Mirror specified source directory to local target directory. If target directory ends with a slash, the source base name is appended to target directory name. Source and/or target can be URLs pointing to directories.

            -c,    --continue                continue a mirror job if possible
            -e,    --delete                  delete files not present at remote site

                   --delete-first            delete old files before transferring new ones
                   --depth-first             descend into subdirectories before transferring files
            -s,    --allow-suid              set suid/sgid bits according to remote site
                   --allow-chown             try to set owner and group on files
                   --ascii                   use ascii mode transfers (implies --ignore-size)
                   --ignore-time             ignore time when deciding whether to download
                   --ignore-size             ignore size when deciding whether to download
                   --only-missing            download only missing files
                   --only-existing           download only files already existing at target
            -n,    --only-newer              download only newer files (-c won't work)
                   --no-empty-dirs           don't create empty directories (implies --depth-first)
            -r,    --no-recursion            don't go to subdirectories
                   --no-symlinks             don't create symbolic links
            -p,    --no-perms                don't set file permissions
                   --no-umask                don't apply umask to file modes
            -R,    --reverse                 reverse mirror (put files)
            -L,    --dereference             download symbolic links as files
            -N,    --newer-than=SPEC         download only files newer than specified time
                   --on-change=CMD           execute the command if anything has been changed
                   --older-than=SPEC         download only files older than specified time
                   --size-range=RANGE        download only files with size in specified range
            -P,    --parallel[=N]            download N files in parallel
                   --use-pget[-n=N]          use pget to transfer every single file
                   --loop                    loop until no changes found
            -i RX, --include RX              include matching files
            -x RX, --exclude RX              exclude matching files
            -I GP, --include-glob GP         include matching files
            -X GP, --exclude-glob GP         exclude matching files
            -v,    --verbose[=level]         verbose operation
                   --log=FILE                write lftp commands being executed to FILE
                   --script=FILE             write lftp commands to FILE, but don't execute them
                   --just-print, --dry-run   same as --script=-
                   --use-cache               use cached directory listings
                   --Remove-source-files     remove files after transfer (use with caution)
            -a                               same as --allow-chown --allow-suid --no-umask
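
The trailing-slash rule in the description above is easy to get wrong, so here is a small sketch of it in plain shell. The helper name `effective_target` is hypothetical and not part of lftp; it only mimics the documented rule for illustration and assumes the source path is given without a trailing slash.

```bash
# Hypothetical helper (not part of lftp): mimics the documented rule that
# a target ending in "/" gets the source base name appended.
# Assumes the source path has no trailing slash.
effective_target() {
    local source_dir=$1
    local target_dir=$2

    case "$target_dir" in
        */) printf '%s%s\n' "$target_dir" "${source_dir##*/}" ;;
        *)  printf '%s\n' "$target_dir" ;;
    esac
}

effective_target "/fumulu/zimulu1" "/data/0/bendi/fumulu/"   # /data/0/bendi/fumulu/zimulu1
effective_target "/fumulu/zimulu1" "/data/0/bendi/fumulu"    # /data/0/bendi/fumulu
```

With the trailing slash, several remote subdirectories can share one local parent without overwriting each other, which is what the script further down relies on.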

Issues encountered

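1. Although mirror supports parallel transfers, and we were downloading three large directories (each containing many subdirectories), directory listing took up a lot of time throughout the run. I recommend mirroring the subdirectories directly, so that more transfers run at the same time.

2. Be sure to use the --only-missing option. With other options such as --only-newer, for reasons I have not worked out, the local copy was deleted first and then downloaded again.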
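
The two notes above can be combined: one mirror command per subdirectory, each with --only-missing. The sketch below only prints the mirror commands so they can be inspected before anything is run. The function name `print_mirror_jobs` is hypothetical, the subdirectory names are supplied by hand rather than discovered from the server, and `--parallel=4` is an arbitrary illustrative value.

```bash
# Hypothetical helper: print one lftp mirror command per subdirectory.
#   $1 = remote parent directory
#   $2 = local parent directory (trailing slash, so the subdirectory name is appended)
#   $3 = extra mirror options, e.g. "--dry-run" to preview, or "" for a real run
#   remaining arguments = subdirectory names (supplied by hand in this sketch)
print_mirror_jobs() {
    local remote_parent=$1
    local local_parent=$2
    local extra_opts=$3
    shift 3

    local sub
    for sub in "$@"; do
        # --parallel=4 is an arbitrary illustrative value, not a recommendation.
        printf 'mirror --parallel=4 --only-missing %s%s/%s %s\n' \
            "${extra_opts:+$extra_opts }" "$remote_parent" "$sub" "$local_parent"
    done
}

# Preview commands for two subdirectories.
print_mirror_jobs /fumulu /data/0/bendi/fumulu/ --dry-run zimulu1 zimulu2
```

Each printed line can then be placed after the `open` command in an `lftp -c "..."` call. With `--dry-run`, mirror writes the commands it would execute instead of transferring anything, which is a cheap way to check that --only-missing selects the files you expect.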


#!/bin/bash

# FTP server credentials
FTP_HOST="xxxxx"
FTP_USER="xxxx"
FTP_PASS="xxxxxxx"

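# Map each remote directory to its local target directory.
# The targets end with a slash, so mirror appends the remote base name.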

declare -A DIR_MAP=(
["/fumulu/zimulu1"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu2"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu3"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu4"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu5"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu6"]="/data/0/bendi/fumulu/"
["/fumulu/zimulu7"]="/data/0/bendi/fumulu/"
)
# Create the log directory
LOG_DIR="sync_logs"
mkdir -p "$LOG_DIR"

sync_directory() {
    local remote_dir=$1
    local local_dir=$2
    
    # Build the log file name (replace directory separators with underscores)
    local log_name=$(echo "${remote_dir}" | tr '/' '_')
    local log_file="$LOG_DIR/${log_name}sync.log"
    
    # Make sure the local directory exists
    mkdir -p "$local_dir"
    
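    echo "Starting sync of $remote_dir to $local_dir..." | tee -a "$log_file"
    echo "Sync start time: $(date)" >> "$log_file"

    # Run the sync with lftp; --only-missing downloads only files that are missing locally.
    # --parallel=1000 is the value used here; lower it if the server refuses connections.
    local temp_log
    temp_log=$(mktemp)
    lftp -c "open -u $FTP_USER,$FTP_PASS $FTP_HOST; \
             mirror --parallel=1000 --verbose --only-missing \"$remote_dir\" \"$local_dir\"" 2>&1 | tee -a "$temp_log" "$log_file"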
    
    # Check whether any file failed to download
    if grep -qi "File not available" "$temp_log"; then
        echo "Some files failed to download; recording them in shibai.txt..."
        # Extract and record the failed files
        grep -i "File not available" "$temp_log" | while read -r line; do
            # Extract the full path and file name
            full_path=$(echo "$line" | grep -o "@.*" | cut -d' ' -f1)
            echo "$full_path" >> shibai.txt
        done
    fi
    
    echo "Sync end time: $(date)" >> "$log_file"
    echo "----------------------------------------" >> "$log_file"
    
    # Remove the temporary log file
    rm -f "$temp_log"
}

# Start all sync jobs at once, in the background
for remote_dir in "${!DIR_MAP[@]}"; do
    local_dir=${DIR_MAP[$remote_dir]}
    sync_directory "$remote_dir" "$local_dir" &
done

# Wait for all background jobs to finish
wait

echo "All sync jobs have finished."
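
The plain `wait` in the script above returns once every background job has finished, but it discards their individual exit statuses. A small sketch of waiting on each PID separately follows. `run_job` is a hypothetical stand-in for `sync_directory` that simply succeeds or fails on request, so the pattern can be tried without an FTP server.

```bash
# Stand-in for sync_directory: finishes with the status given as $1.
run_job() {
    return "$1"
}

# Start each job in the background, remember its PID, then wait on each
# PID individually so a failure in one job is not lost.
wait_for_jobs() {
    local pids="" failed=0 status pid

    for status in "$@"; do
        run_job "$status" &
        pids="$pids $!"
    done

    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done

    # Print the number of jobs that finished with a non-zero status.
    echo "$failed"
}

wait_for_jobs 0 1 0   # prints 1
```

For this to be useful with the script above, `sync_directory` would have to return lftp's exit status; as written, its last command is `rm -f`, so it reports success regardless of how the transfer went.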