The copy_file_range System Call, with Examples

This article introduces copy_file_range, an important function in Linux systems programming.

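1. Introduction

copy_file_range is a relatively new Linux system call (kernel 4.5 and later) designed for copying data efficiently between two file descriptors.
You can think of it as an optimized "file clipboard":

1. You do not have to read the data from the source file into a user-space buffer first.
2. You do not have to write it from that buffer to the destination file afterwards.
3. Instead you tell the kernel directly: "Hey kernel, copy the data from here in file A to there in file B."

The kernel tries to do the work entirely in kernel space, using whatever optimizations are available (copy-on-write (COW), reflink, direct buffer-to-buffer transfer, and so on). This avoids copying the data back and forth between user space and kernel space and reduces the number of system calls, and therefore user/kernel mode switches, so file copies can become much faster.
This is especially useful for copying large files, for backups, and for moving or copying data within one file system (when the file system supports it).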

2. Function Prototype

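#define _GNU_SOURCE // must be defined to expose copy_file_range
#include <unistd.h> // declares copy_file_range and ssize_t
#include <fcntl.h>  // not needed by copy_file_range itself; provides open() and the O_* flags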

ssize_t copy_file_range(int fd_in, off_t *off_in,
                        int fd_out, off_t *off_out,
                        size_t len, unsigned int flags);

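3. Functionality

Efficient copying: copies data from file descriptor fd_in to file descriptor fd_out inside the kernel.
Explicit ranges: you can specify the starting offset in the source (off_in), the starting offset in the destination (off_out), and the number of bytes to copy (len).
Flexible offsets: the off_in and off_out pointers select between the file's current offset and an explicit absolute offset.
Potential optimizations: the kernel may use file system features such as reflink to share data blocks copy-on-write instead of physically copying them, which makes the copy extremely fast.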

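4. Parameters

int fd_in: the file descriptor of the source file. It must be open for reading.
off_t *off_in: pointer to an off_t variable holding the offset in the source file at which copying starts.
- If off_in is NULL: copying starts at the source file's current offset (the value lseek(fd_in, 0, SEEK_CUR) would return). The call advances the source file's current offset by the number of bytes copied.
- If off_in is not NULL: copying starts at the absolute offset *off_in. The call does not change the source file's current offset, but it does advance *off_in by the number of bytes copied.
int fd_out: the file descriptor of the destination file. It must be open for writing.
off_t *off_out: pointer to an off_t variable holding the offset in the destination file at which writing starts.
- If off_out is NULL: data is written at the destination file's current offset, and the call advances that offset.
- If off_out is not NULL: data is written at the absolute offset *off_out. The call does not change the destination file's current offset, but it does advance *off_out.
size_t len: the maximum number of bytes to copy.
unsigned int flags: flags controlling the copy. On Linux this argument must currently be 0; it is reserved for future extensions.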

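5. Return Value

On success: returns the number of bytes actually copied (a non-negative value). This can be less than the requested len, for example when the end of the source file is reached. A return value of 0 means the source offset is at or past the end of the file.
On failure: returns -1 and sets errno to indicate the cause. Common values are EBADF (a file descriptor is invalid, fd_in is not open for reading, or fd_out is not open for writing or was opened with O_APPEND), EINVAL (an invalid argument, for example flags is not 0, or the source and destination ranges overlap within the same file), EXDEV (fd_in and fd_out are on different file systems and the kernel cannot copy between them; before Linux 5.3 this applied to every cross-file-system copy), and ENOMEM (out of memory).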

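6. Similar and Related Functions

sendfile: transfers data efficiently between file descriptors (typically from a file to a socket) and is one of the predecessors of, and inspirations for, copy_file_range. Before Linux 2.6.33 the output descriptor had to be a socket, so on those kernels sendfile could not copy between two regular files.
splice: moves data between two file descriptors, at least one of which must be a pipe. It is also a zero-copy technique.
Traditional read/write loop: the most basic way to copy a file. It is less efficient because every chunk costs two system calls and two copies of the data between kernel and user space.
mmap + memcpy: another approach that avoids explicit read/write buffers, but it is more complex to use and not necessarily faster than copy_file_range.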

7. Example Code

Example 1: Basic file copy with `copy_file_range`

This example shows how to use copy_file_range to copy the contents of one file into another.

// copy_file_range_basic.c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <source_file> <destination_file>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    const char *src_filename = argv[1];
    const char *dst_filename = argv[2];
    int src_fd, dst_fd;
    struct stat src_stat;
    off_t offset_in, offset_out;
    ssize_t bytes_copied, total_bytes_copied = 0;
    size_t remaining;

    // 1. Open the source file (read-only)
    src_fd = open(src_filename, O_RDONLY);
    if (src_fd == -1) {
        perror("Error opening source file");
        exit(EXIT_FAILURE);
    }

    // 2. Get the size of the source file
    if (fstat(src_fd, &src_stat) == -1) {
        perror("Error getting source file stats");
        close(src_fd);
        exit(EXIT_FAILURE);
    }

    // 3. Create/open the destination file (write, create, truncate)
    dst_fd = open(dst_filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst_fd == -1) {
        perror("Error opening/creating destination file");
        close(src_fd);
        exit(EXIT_FAILURE);
    }

    printf("Copying '%s' to '%s' using copy_file_range()...\n", src_filename, dst_filename);
    printf("Source file size: %ld bytes\n", (long)src_stat.st_size);

    // 4. Copy the data with copy_file_range
    // Start both offsets at 0
    offset_in = 0;
    offset_out = 0;
    remaining = src_stat.st_size;

    while (remaining > 0) {
        // Try to copy everything that remains, or one large chunk.
        // copy_file_range may copy fewer bytes than requested in a single call.
        size_t to_copy = (remaining > 0x7ffff000) ? 0x7ffff000 : remaining; // cap the size of a single call

        bytes_copied = copy_file_range(src_fd, &offset_in, dst_fd, &offset_out, to_copy, 0);

        if (bytes_copied == -1) {
            perror("Error in copy_file_range");
            // Clean up before exiting
            close(src_fd);
            close(dst_fd);
            exit(EXIT_FAILURE);
        }

        if (bytes_copied == 0) {
            // Probably reached the end of the source file
            fprintf(stderr, "Warning: copy_file_range returned 0 before copying all data.\n");
            break;
        }

        total_bytes_copied += bytes_copied;
        remaining -= bytes_copied;
        printf("  Copied %zd bytes (total: %zd)\n", bytes_copied, total_bytes_copied);
    }

    printf("Copy completed. Total bytes copied: %zd\n", total_bytes_copied);

    // 5. Close the file descriptors
    if (close(src_fd) == -1) {
        perror("Error closing source file");
    }
    if (close(dst_fd) == -1) {
        perror("Error closing destination file");
    }

    return 0;
}

How to test:

# Create a larger test file
dd if=/dev/urandom of=large_source_file.txt bs=1M count=10 # 10 MB of random data
# Or something simpler
echo "This is the content of the source file." > small_source_file.txt

# Compile and run
gcc -o copy_file_range_basic copy_file_range_basic.c
./copy_file_range_basic small_source_file.txt copied_file.txt

# Check the result
cat copied_file.txt
ls -l small_source_file.txt copied_file.txt

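Code walkthrough:

Check the command-line arguments.
Open the source file read-only as src_fd.
Use fstat to get the size of the source file, src_stat.st_size.
Open (or create) the destination file dst_fd for writing, truncating any existing content.
Key step: copy the data in a while loop.
- offset_in and offset_out are initialized to 0 before the loop.
- remaining tracks how many bytes are still to be copied.
- Each iteration calls copy_file_range(src_fd, &offset_in, dst_fd, &offset_out, to_copy, 0).
  - src_fd, dst_fd: the source and destination file descriptors.
  - &offset_in, &offset_out: pointers to the offsets. Because they are passed by pointer, copy_file_range advances both variables after each call so that they point at the start of the next chunk.
  - to_copy: the number of bytes to attempt in this call (capped).
  - 0: the flags argument, which must be 0.
- Check the return value bytes_copied.
- On success (> 0), update total_bytes_copied and remaining.
- A return value of 0 probably means the end of the source file was reached.
- A return value of -1 is handled as an error.
The loop runs until remaining reaches 0 or an error occurs.
Print the total number of bytes copied.
Close the file descriptors.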

Example 2: Comparing `copy_file_range` with a traditional `read`/`write` loop

This example copies the same large file with copy_file_range and with a traditional read/write loop to compare their performance.

// copy_file_range_vs_read_write.c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

#define BUFFER_SIZE (1024 * 1024) // 1MB buffer

// Copy a file with a read/write loop
ssize_t copy_with_read_write(int src_fd, int dst_fd) {
    char *buffer = malloc(BUFFER_SIZE);
    if (!buffer) {
        perror("malloc buffer");
        return -1;
    }

    ssize_t total = 0;
    ssize_t nread, nwritten;

    while ((nread = read(src_fd, buffer, BUFFER_SIZE)) > 0) {
        char *buf_ptr = buffer;
        ssize_t nleft = nread;

        while (nleft > 0) {
            nwritten = write(dst_fd, buf_ptr, nleft);
            if (nwritten <= 0) {
                if (nwritten == -1 && errno == EINTR) {
                    continue; // Interrupted, retry
                }
                perror("write");
                free(buffer);
                return -1;
            }
            nleft -= nwritten;
            buf_ptr += nwritten;
        }
        total += nread;
    }

    if (nread == -1) {
        perror("read");
        free(buffer);
        return -1;
    }

    free(buffer);
    return total;
}

// Copy a file with copy_file_range
ssize_t copy_with_copy_file_range(int src_fd, int dst_fd, size_t file_size) {
    off_t offset_in = 0;
    off_t offset_out = 0;
    size_t remaining = file_size;
    ssize_t bytes_copied, total = 0;

    while (remaining > 0) {
        size_t to_copy = (remaining > 0x7ffff000) ? 0x7ffff000 : remaining;
        bytes_copied = copy_file_range(src_fd, &offset_in, dst_fd, &offset_out, to_copy, 0);
        if (bytes_copied == -1) {
            perror("copy_file_range");
            return -1;
        }
        if (bytes_copied == 0) {
            break;
        }
        total += bytes_copied;
        remaining -= bytes_copied;
    }
    return total;
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <source_file>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    const char *src_filename = argv[1];
    int src_fd;
    struct stat src_stat;
    clock_t start, end;
    double cpu_time_used;

    // Open the source file
    src_fd = open(src_filename, O_RDONLY);
    if (src_fd == -1) {
        perror("open source file");
        exit(EXIT_FAILURE);
    }

    if (fstat(src_fd, &src_stat) == -1) {
        perror("fstat source file");
        close(src_fd);
        exit(EXIT_FAILURE);
    }

    printf("Source file: %s\n", src_filename);
    printf("File size: %ld bytes (%.2f MB)\n", (long)src_stat.st_size, (double)src_stat.st_size / (1024*1024));


    // --- Test 1: copy_file_range ---
    printf("\n--- Testing copy_file_range ---\n");
    char dst_filename1[] = "copy_file_range_dst.tmp";
    int dst_fd1 = open(dst_filename1, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst_fd1 == -1) {
        perror("open destination file 1");
        close(src_fd);
        exit(EXIT_FAILURE);
    }

    // Reset the source file offset
    if (lseek(src_fd, 0, SEEK_SET) == -1) {
        perror("lseek src_fd");
        close(src_fd);
        close(dst_fd1);
        exit(EXIT_FAILURE);
    }

    start = clock();
    ssize_t copied1 = copy_with_copy_file_range(src_fd, dst_fd1, src_stat.st_size);
    end = clock();

    if (copied1 == -1) {
        fprintf(stderr, "copy_file_range failed.\n");
    } else {
        cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
        printf("  Bytes copied: %zd\n", copied1);
        printf("  Time taken: %f seconds\n", cpu_time_used);
    }

    close(dst_fd1);


    // --- Test 2: read/write loop ---
    printf("\n--- Testing read/write loop ---\n");
    char dst_filename2[] = "read_write_dst.tmp";
    int dst_fd2 = open(dst_filename2, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst_fd2 == -1) {
        perror("open destination file 2");
        close(src_fd);
        // Cleanup
        unlink(dst_filename1);
        exit(EXIT_FAILURE);
    }

    // Reset the source file offset
    if (lseek(src_fd, 0, SEEK_SET) == -1) {
        perror("lseek src_fd");
        close(src_fd);
        close(dst_fd2);
        unlink(dst_filename1);
        exit(EXIT_FAILURE);
    }

    start = clock();
    ssize_t copied2 = copy_with_read_write(src_fd, dst_fd2);
    end = clock();

    if (copied2 == -1) {
        fprintf(stderr, "read/write loop failed.\n");
    } else {
        cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
        printf("  Bytes copied: %zd\n", copied2);
        printf("  Time taken: %f seconds\n", cpu_time_used);
    }

    close(dst_fd2);
    close(src_fd);

    // --- Cleanup ---
    unlink(dst_filename1);
    unlink(dst_filename2);

    printf("\nPerformance comparison completed.\n");
    if (copied1 != -1 && copied2 != -1) {
        printf("copy_file_range is expected to be faster, especially for large files on same filesystem.\n");
    }
    return 0;
}

How to test:

# Create a fairly large test file
dd if=/dev/zero of=test_large_file.txt bs=1M count=100 # 100 MB file

# Compile and run
gcc -o copy_file_range_vs_read_write copy_file_range_vs_read_write.c
./copy_file_range_vs_read_write test_large_file.txt

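Code walkthrough:

Two functions are defined, copy_with_read_write and copy_with_copy_file_range, one for each copy method.
copy_with_read_write:
- Allocates a 1 MB buffer.
- Reads data into the buffer in a while loop.
- Uses an inner while loop to make sure write puts the whole buffer into the destination file (write may write only part of it).
- Accumulates the total number of bytes copied.
copy_with_copy_file_range:
- Uses the off_t variables offset_in and offset_out to track the source and destination offsets.
- Calls copy_file_range in a while loop until all data has been copied.
main:
- Gets the size of the source file.
- Runs the two methods one after the other.
- Measures time with clock(). Note that clock() reports CPU time, not wall-clock time, so time spent blocked waiting for the disk is not counted. The second test also reads a source file that the first test has already pulled into the page cache. Treat the numbers as a rough relative comparison only.
- Prints the results and removes the temporary files.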

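Important notes and caveats:

1. Kernel version: requires Linux kernel 4.5 or later.
2. glibc version: glibc 2.27 or later is needed to call the copy_file_range function directly. With older versions you may have to go through syscall.
3. Performance: the main advantage of copy_file_range is the potential for optimization inside the kernel. If the source and destination are on the same file system and it supports reflink (for example Btrfs, XFS, or OCFS2), copy_file_range may create a copy-on-write (COW) copy almost instantly. Even without reflink it is usually more efficient than a read/write loop, because it avoids copying the data between kernel and user space.
4. The flags argument: must currently be 0. New flags may be added in the future.
5. Across file systems: copy_file_range may not support copying between files on different mounted file systems, in which case it fails with EXDEV. Before Linux 5.3 every cross-file-system copy failed this way; on newer kernels some combinations are supported.
6. Offset pointers: it is important to understand how off_in and off_out behave (NULL versus non-NULL). Passing pointers lets you copy without touching the files' own offsets, which suits multithreaded code and situations that need precise control over offsets.
7. Return value: like many I/O functions, copy_file_range may not copy all requested bytes in one call, so call it in a loop.
8. Error handling: always check the return value and errno. EBADF, EINVAL, EXDEV, and ENOMEM are errors you may encounter.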

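Summary:

copy_file_range is a powerful and efficient system call for copying data between file descriptors. It avoids user-space overhead by handing the whole copy to the kernel, and it can take advantage of advanced file system features such as reflink for the best possible performance. For applications that need high-performance file copying on Linux, copy_file_range is worth considering first.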
