TLPI Chapter 4 Exercises: File I/O: The Universal I/O Model

For the index of all notes and exercise posts, see: 开始读TLPI (Start Reading TLPI).

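Exercise 4-1

The tee command reads its standard input until end-of-file, writing a copy of the input to standard output and to the file named in its command-line argument. (We show an example of the use of this command when we discuss FIFOs in Section 44.7.)

Implement tee using I/O system calls. By default, tee overwrites any existing file with the given name. Implement the -a command-line option (tee -a file), which causes tee to append text to the end of a file if it already exists. (Refer to Appendix B for a description of the getopt() function, which can be used to parse command-line options.)

The code is as follows: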

// ex4-1.c
#include "tlpi_hdr.h"
#include <stdbool.h>
#include <fcntl.h>

#ifndef BUF_SIZE        /* Allow "cc -D" to override definition */
#define BUF_SIZE 1024
#endif

static bool teeAppend = false;

int
main(int argc, char *argv[])
{
    int opt, openFlags, numFiles;
    int *fds = NULL;
    mode_t filePerms;
    ssize_t numRead;
    char buf[BUF_SIZE];

    while ((opt = getopt(argc, argv, "a")) != -1) {
        switch (opt) {
        case 'a':
            teeAppend = true;
            break;
        default:
            usageErr("%s [-a] [filename...]\n", argv[0]);
        }
    }

    /* After option parsing, the file operands are argv[optind..argc-1];
       with no operands, tee just copies stdin to stdout */
    numFiles = argc - optind;

    /* Diagnostics go to stderr so they do not mix with the copied data */
    fprintf(stderr, "tee to %d file(s), append is %s\n", numFiles,
            teeAppend ? "true" : "false");

    if (teeAppend)
        openFlags = O_CREAT | O_WRONLY | O_APPEND;
    else
        openFlags = O_CREAT | O_WRONLY | O_TRUNC;

    filePerms = 0644;

    if (numFiles > 0) {
        fds = malloc(numFiles * sizeof(int));
        if (fds == NULL)
            errExit("malloc");
    }

    for (int i = 0; i < numFiles; i++) {
        fds[i] = open(argv[optind + i], openFlags, filePerms);
        if (fds[i] == -1)
            errExit("opening file %s", argv[optind + i]);
    }

    while ((numRead = read(STDIN_FILENO, buf, BUF_SIZE)) > 0) {
        if (write(STDOUT_FILENO, buf, numRead) != numRead)
            fatal("write() to stdout returned error or partial write occurred");

        for (int i = 0; i < numFiles; i++)
            if (write(fds[i], buf, numRead) != numRead)
                fatal("write() to file returned error or partial write occurred");
    }

    if (numRead == -1)
        errExit("read");

    for (int i = 0; i < numFiles; i++)
        if (close(fds[i]) == -1)
            errExit("close file");

    free(fds);
    exit(EXIT_SUCCESS);
}

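Here are some takeaways from this exercise.

Handling command-line arguments

tee accepts multiple files. They are not introduced by an option such as -f; they are plain non-option arguments and may appear before or after the options. For example: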

tee file1 -a file2
tee -a file1 file2

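getopt handles this case well: by default, glibc's getopt(3) permutes argv as it scans, so that all non-option arguments end up at the end. The getopt(1) command documents the same ordering: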

Normally, no non-option parameters output is generated until all options and their arguments have been generated. Then '--' is generated as a single parameter, and after it the non-option parameters in the order they were found, each as a separate parameter.

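The bool data type

The bool type was introduced in C99. Using it requires the header <stdbool.h>, but that file is not in /usr/include.

With GCC, stdbool.h is not part of glibc (the C runtime library); it is a header shipped with the compiler itself. It is therefore installed in GCC's own private directory rather than in /usr/include.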

$ echo '#include <stdbool.h>' | gcc -E -H - -o /dev/null 2>&1 | head -1
. /usr/lib/gcc/x86_64-redhat-linux/11/include/stdbool.h

It defines false as 0 and true as 1:

#define bool    _Bool
#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
#define true    ((_Bool)+1u)
#define false   ((_Bool)+0u)
#else
#define true    1
#define false   0
#endif

This way we can still conveniently write:

bool teeAppend;

if (teeAppend)
	...

rather than:

if (teeAppend == true)

Canonical mode of the terminal

Terminals have a canonical mode and a noncanonical mode; canonical mode is the default.

$ stty -a|grep icanon
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt

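According to termios(3), in canonical mode input is made available line by line. An input line becomes available when one of the line delimiters is typed (NL, EOL, EOL2; or EOF at the start of a line). Except in the case of EOF, the line delimiter is included in the buffer returned by read(2).

NL and EOF are easy to understand. By default, EOL and EOL2 are not assigned any character, so you cannot type them directly.

Also, Ctrl-D behaves differently at the start of a line than elsewhere in the line: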

VEOF (004, EOT, Ctrl-D) End-of-file character (EOF). More precisely: this character causes the pending tty buffer to be sent to the waiting user program without waiting for end-of-line. If it is the first character of the line, the read(2) in the user program returns 0, which signifies end-of-file. Recognized when ICANON is set, and then not passed as input.

Flags for open()

If -a is specified, output is appended to the file; otherwise the file is overwritten.

    if (teeAppend)
        openFlags = O_CREAT | O_WRONLY | O_APPEND;
    else
        openFlags = O_CREAT | O_WRONLY | O_TRUNC;

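In append mode, O_TRUNC must not be added: O_CREAT | O_WRONLY | O_APPEND | O_TRUNC would empty the file when it is opened, which is not what -a is supposed to do.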

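Exercise 4-2

Write a program like cp that, when used to copy a regular file that contains holes (sequences of null bytes), also creates corresponding holes in the target file.

lseek(2) describes holes as follows: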

a hole is a sequence of zeros that (normally) has not been allocated in the underlying file storage.

lseek() allows the file offset to be set beyond the end of the file (but this does not change the size of the file). If data is later written at this point, subsequent reads of the data in the gap (a "hole") return null bytes ('\0') until data is actually written into the gap.

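The man page does not say how long the run of zeros has to be, only that the underlying storage is (normally) not allocated.

Conversely, a run of zeros read from a file is not necessarily a hole, because null bytes ('\0') can also be written to a file explicitly.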

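Here we take a shortcut and use lseek's SEEK_DATA and SEEK_HOLE options directly. My filesystem is ext4, which happens to support them. (The filefrag output further down reports filesystem type 58465342, which is the XFS magic number; XFS supports both options as well.)

These two options also require the _GNU_SOURCE feature test macro to be defined.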

First, create a sparse file:

$ echo 1234567890 > sparse_file
$ truncate -s 10M sparse_file
$ ls -lh sparse_file
-rw-r--r--. 1 vagrant vagrant 10M Apr  2 02:22 sparse_file
$ du -sh sparse_file
4.0K    sparse_file
$ echo 'abcdefgh' >> sparse_file
$ ls -lh sparse_file
-rw-r--r--. 1 vagrant vagrant 11M Apr  2 02:23 sparse_file
$ du -sh sparse_file
68K     sparse_file
$ truncate -s 15M sparse_file
$ du -sh sparse_file
68K     sparse_file
$ ls -lh sparse_file
-rw-r--r--. 1 vagrant vagrant 15M Apr  2 02:25 sparse_file
$ echo 'ABCDEFG' >> sparse_file
$ ls -lh sparse_file
-rw-r--r--. 1 vagrant vagrant 16M Apr  2 02:25 sparse_file
$ du -sh sparse_file
132K    sparse_file

Inspect how the file is allocated:

$ filefrag -e sparse_file
Filesystem type is: 58465342
File size of sparse_file is 15728648 (3841 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:      99720..     99720:      1:
   1:     2560..    2560:     107201..    107201:      1:     102280:
   2:     2561..    2575:     107202..    107216:     15:             unwritten
   3:     3840..    3840:     111431..    111431:      1:     108481: last,eof
sparse_file: 3 extents found
$ tail sparse_file
1234567890
abcdefgh
ABCDEFG

The data and holes in this file are laid out as follows:

Start offset (bytes)       Content      Description
─────────────────────────────────────────────────────────────
0                          ███          data
4096                       ───          hole (Hole 1)
10485760                   ███          data
10551296                   ───          hole (Hole 2)
15728640                   ██           data
15728648                   ───          hole (past end of file)

Next, create a file that begins with a hole:

truncate -s 1M sparse_file_1
echo 'abcdefg'>>sparse_file_1
truncate -s 2M sparse_file_1
echo '1234567'>>sparse_file_1
truncate -s 4M sparse_file_1

The data and holes in this file are laid out as follows:

Start offset (bytes)       Content      Description
─────────────────────────────────────────────────────────────
0                          ───          hole (Hole 1)
1048576                    ███          data
1052672                    ───          hole (Hole 2)
2097152                    ███          data
2101248                    ───          hole (Hole 3, up to end of file)

The program's code is as follows:

// ex4-2.c
#define _GNU_SOURCE
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include "tlpi_hdr.h"

#ifndef BUF_SIZE        /* Allow "cc -D" to override definition */
#define BUF_SIZE 1024
#endif

int
main(int argc, char *argv[])
{
    int fd_in, fd_out, openFlags;
    mode_t filePerms;
    off_t hole_start, data_start, file_end, remaining;
    ssize_t n;

    if (argc != 3 || strcmp(argv[1], "--help") == 0)
        usageErr("%s old-file new-file\n", argv[0]);

    /* Open input and output files */

    fd_in = open(argv[1], O_RDONLY);
    if (fd_in == -1)
        errExit("opening file %s", argv[1]);

    openFlags = O_CREAT | O_WRONLY | O_TRUNC;
    filePerms = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP |
                S_IROTH | S_IWOTH;      /* rw-rw-rw- */
    fd_out = open(argv[2], openFlags, filePerms);
    if (fd_out == -1)
        errExit("opening file %s", argv[2]);

    /* Copy only the data segments; seeking past the gaps in the output
       leaves holes there */

    hole_start = 0;
    file_end = lseek(fd_in, 0, SEEK_END);
    if (file_end == -1)
        errExit("lseek SEEK_END");

    while (hole_start < file_end) {
        data_start = lseek(fd_in, hole_start, SEEK_DATA);
        if (data_start == -1) {
            if (errno == ENXIO)         /* No data left, only a trailing hole */
                break;
            errExit("lseek SEEK_DATA");
        }

        hole_start = lseek(fd_in, data_start, SEEK_HOLE);
        if (hole_start == -1)
            errExit("lseek SEEK_HOLE");

        printf("Copy data from %lld to %lld\n",
               (long long) data_start, (long long) hole_start - 1);

        if (lseek(fd_in, data_start, SEEK_SET) == -1 ||
            lseek(fd_out, data_start, SEEK_SET) == -1)
            errExit("lseek SEEK_SET");

        remaining = hole_start - data_start;    /* Bytes left in this segment */
        while (remaining > 0) {
            n = copy_file_range(fd_in, NULL, fd_out, NULL, remaining, 0);
            if (n == -1)
                errExit("copy_file_range");
            if (n == 0)                 /* Unexpected end of input */
                break;
            remaining -= n;
        }
    }

    /* Set the output to the input's size so that a trailing hole is kept */
    if (ftruncate(fd_out, file_end) == -1)
        errExit("ftruncate");


    if (close(fd_in) == -1)
        errExit("close input");
    if (close(fd_out) == -1)
        errExit("close output");

    exit(EXIT_SUCCESS);
}
