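[Linux] Basic I/O

1. What a File Is and How I/O Works

1.1 Rethinking "Files"

Core definitions

Narrow sense: a file is an entity on persistent storage such as a disk. Because a disk is a peripheral device, every file operation is ultimately input/output (I/O) against that device.
Broad sense: in Linux, "everything is a file". Keyboards, displays, network cards, and even process information are abstracted as files and operated on through one uniform interface.
Composition: file = attributes (metadata such as permissions, size, and timestamps) + content (the data actually stored).

Key questions for beginners

Why does an empty 0 KB file still take up disk space?

An empty file has no content, but its metadata still has to be stored. Permissions, owner, timestamps, and similar attributes live in an inode, and the file name lives in a directory entry that maps the name to the inode number. Both consume space on disk, so an empty file is not free.

How does a process find the file it operates on?

Every process has a current working directory (visible through the /proc/[PID]/cwd symbolic link). If a file is named without a leading directory path, the system looks for it in that directory.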

Example:

```bash
# Show the current working directory of a process
ls -l /proc/[PID]/cwd
# Show the path of the executable the process is running
ls -l /proc/[PID]/exe
```

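1.2 Two Kinds of File Operations

Every file operation falls into one of two categories:

Content operations: read or write the data stored in the file (for example read/write).
Attribute operations: change the file's metadata (for example chmod for permissions, chown for ownership).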

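1.3 The System's View: What Happens Underneath an I/O Operation

Files are managed by the operating system, not by applications or library functions.
Every I/O operation an application performs (such as fwrite from the C library) is eventually carried out through a system call provided by the operating system (such as write). Library functions are only a wrapper layer that makes the system calls more convenient to use.

2. Reviewing the C Standard Library I/O Interface

2.1 Core Functions in Practice and Common Pitfalls

C provides a standard I/O library (stdio.h) whose core functions include fopen, fread, fwrite, and fclose. They are a good starting point for beginners, but a few details are easy to get wrong.

2.1.1 Opening Files and the Path Question

```c
#include <stdio.h>

int main() {
    // Open for writing; a relative name is created in the process's current working directory
    FILE *fp = fopen("myfile", "w");
    if (!fp) {
        // perror reports why the open failed
        perror("fopen");
        return 1;
    }
    fclose(fp);
    return 0;
}
```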

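Beginner question: how do I know where the file was created?

A process's current working directory is determined by where it was launched from (it is inherited from the parent process, usually the shell), not by where the executable lives.
You can obtain it with the getcwd function, or inspect /proc/[PID]/cwd.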

2.1.2 Reading and Writing Files

Writing a file

```c
#include <stdio.h>
#include <string.h>

int main() {
    FILE *fp = fopen("myfile", "w");
    if (!fp) {
        perror("fopen");
        return 1;
    }
    const char *msg = "hello bit!\n";
    int count = 5;
    // Write the message five times
    while (count--) {
        // Arguments: data address, size of one item, number of items, FILE pointer
        fwrite(msg, strlen(msg), 1, fp);
    }
    fclose(fp);
    return 0;
}
```

Reading a file (a minimal `cat`)

```c
#include <stdio.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        printf("Usage: %s <file>\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        perror("fopen");
        return 1;
    }
    char buf[1024];
    while (1) {
        // Read at most sizeof(buf) - 1 bytes so there is room for the terminator;
        // fread returns the number of bytes actually read
        size_t s = fread(buf, 1, sizeof(buf) - 1, fp);
        if (s > 0) {
            buf[s] = '\0';
            printf("%s", buf);
        }
        // Stop at end of file (feof) or on a read error (ferror);
        // checking only feof would loop forever after an error
        if (feof(fp) || ferror(fp)) {
            break;
        }
    }
    fclose(fp);
    return 0;
}
```

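Key pitfall: using `feof` correctly

A return value of 0 from fread does not by itself mean end of file, because fread also returns 0 when the read fails (for example, on an I/O error from the underlying device).
Call fread first, then use feof to confirm a normal end of file and ferror to detect a failure. feof only becomes true after a read has actually hit the end, so testing it before reading gives the wrong answer.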

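2.2 The Standard Streams: stdin/stdout/stderr

A C program starts with three standard streams already open, all of type FILE*:

stdin: standard input, normally the keyboard (file descriptor 0).
stdout: standard output, normally the display (file descriptor 1).
stderr: standard error, normally the display (file descriptor 2).

Several ways to write to the display

```c
#include <stdio.h>
#include <string.h>

int main() {
    const char *msg1 = "hello printf\n";
    const char *msg2 = "hello fwrite\n";
    const char *msg3 = "hello fprintf\n";

    printf("%s", msg1);                    // formatted output to stdout
    fwrite(msg2, strlen(msg2), 1, stdout); // unformatted (binary) write
    fprintf(stdout, "%s", msg3);           // formatted output to an explicitly named stream
    return 0;
}
```

Beginner question: how do the three streams differ?

When attached to a terminal, stdout and stdin are line buffered, while stderr is unbuffered so that error messages appear immediately.
Output written to stdout may sit in a buffer for a while; output written to stderr is passed on right away, which makes stderr the right place for urgent error messages.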

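3. System Call I/O: What Really Happens Underneath

The C library I/O functions wrap system calls. To understand I/O in depth, you need to know the low-level interface the operating system provides.

3.1 Core System Calls in Practice

The system call I/O interface includes open, read, write, close, and lseek. They require headers such as <fcntl.h> and <unistd.h>.

3.1.1 Opening a File: open

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main() {
    // Clear the file mode creation mask so the mode passed to open is applied unchanged
    umask(0);
    // Open write-only, create the file if it does not exist, permissions 0644
    int fd = open("myfile", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        // perror prints the reason the system call failed
        perror("open");
        return 1;
    }
    close(fd);
    return 0;
}
```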

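Key parameters

pathname: the file path (absolute or relative).
flags: how to open the file. Exactly one access mode is required (O_RDONLY read-only, O_WRONLY write-only, O_RDWR read-write); optional flags such as O_CREAT (create), O_APPEND (append), and O_TRUNC (truncate) are combined with it using bitwise OR.
mode: the permission bits, used only when O_CREAT is given (for example, 0644 means read-write for the owner and read-only for group and others).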

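Beginner question: why call `umask`?

umask is the process's file mode creation mask; a common default is 0022 (octal). When a file is created, its actual permissions are mode & ~umask.
With the default mask, 0644 & ~0022 is still 0644, because 0022 only clears the group and other write bits, and 0644 does not set them. Note that the mask is applied with a bitwise AND-NOT, not a subtraction. It matters for modes that do set those bits: 0666 & ~0022 = 0644, not 0666. Calling umask(0) guarantees that whatever mode you pass to open is exactly the mode the file gets.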

3.1.2 Reading and Writing Files: read/write

Writing a file

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main() {
    umask(0);
    int fd = open("myfile", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    const char *msg = "hello bit!\n";
    int len = strlen(msg);
    int count = 5;
    // Write the message five times
    while (count--) {
        // Arguments: file descriptor, data address, length;
        // the return value is the number of bytes actually written
        write(fd, msg, len);
    }
    close(fd);
    return 0;
}
```

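Reading a file

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main() {
    // Open the file read-only
    int fd = open("myfile", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    char buf[1024];
    while (1) {
        // read returns the number of bytes actually read;
        // leave one byte free for the terminator
        ssize_t s = read(fd, buf, sizeof(buf) - 1);
        if (s > 0) {
            // read does not null-terminate, so do it before printing as a string
            buf[s] = '\0';
            printf("%s", buf);
        } else {
            // 0 means end of file, -1 means error; stop in either case
            break;
        }
    }
    close(fd);
    return 0;
}
```

3.2 How System Calls and Library Functions Relate

Library functions (such as fopen and fwrite) wrap system calls (such as open and write) to make development easier, for example by adding buffering and formatted I/O.
System calls are the low-level interface the operating system exposes and the place where I/O is finally carried out. I/O in every language ultimately depends on them.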

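4. File Descriptors (fd)

4.1 What Is a File Descriptor?

The return value of open is a file descriptor: a small non-negative integer that serves as the index linking a process to one of its open files.

4.1.1 The Descriptors Open by Default

A Linux process normally starts with three file descriptors open:

0: standard input (stdin), normally the keyboard.
1: standard output (stdout), normally the display.
2: standard error (stderr), normally the display.

Verification example

```c
#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main() {
    char buf[1024];
    // Read from standard input (fd 0), leaving room for the terminator
    ssize_t s = read(0, buf, sizeof(buf) - 1);
    if (s > 0) {
        buf[s] = 0;
        // Write to standard output (fd 1) and standard error (fd 2)
        write(1, buf, strlen(buf));
        write(2, buf, strlen(buf));
    }
    return 0;
}
```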

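4.2 How File Descriptors Are Allocated

Core rule: lowest unused number

The system picks the smallest index in the process's file descriptor array (fd_array) that is not currently in use and returns it as the new file descriptor.

Verification in practice

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    // Close the default fd 0 (standard input)
    close(0);
    // Open a file; the new fd is 0, the lowest unused number
    int fd = open("myfile", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    printf("new file descriptor: %d\n", fd); // prints 0
    close(fd);
    return 0;
}
```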

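Beginner question: how are file descriptors stored underneath?

Each process's task_struct (its PCB) holds a pointer to a files_struct, which contains the process's file descriptor table.
The heart of that table is the fd_array array: the array index is the file descriptor, and each element is a pointer to a kernel struct file (which records the open file's state and points to its metadata and operation methods).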

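4.3 File Descriptors and the FILE Structure

The C library's FILE structure wraps a file descriptor. Internally it contains:

_fileno: the underlying file descriptor (the key member; this is the field name used by glibc).
A buffer: the user-level buffer that improves I/O efficiency.
Control information such as the flush mode and the current position.

Conclusion: a `FILE*` is essentially an `fd` plus a user-level buffer.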

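5. Redirection

5.1 What Redirection Really Is

Redirection changes which struct file a file descriptor points to, and with it the device or file that I/O on that descriptor reaches. For example, standard output (fd 1) can be redirected from the display to a file.

Basic redirection example (by closing fd 1)

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    // Close standard output (fd 1)
    close(1);
    // Open a file; it receives fd 1, the lowest unused number
    int fd = open("myfile", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    // printf writes to stdout (fd 1), which now refers to the file
    printf("fd: %d\n", fd); // goes to myfile, not to the display
    fflush(stdout); // force the buffer out
    close(fd);
    return 0;
}
```

Beginner question: why is `fflush` needed?

Once stdout refers to a file, its buffering mode changes from line buffered to fully buffered, so data is written out only when the buffer fills up.
fflush(stdout) forces the buffer out, so the data reaches the file before the descriptor is closed.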

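5.2 Redirecting with the dup2 System Call

dup2 duplicates a file descriptor directly. It performs the redirection in one step, with no need to close the default descriptor by hand, and is both simpler and more reliable than the close-then-open trick.

Function prototype

```c
#include <unistd.h>
int dup2(int oldfd, int newfd);
```

What it does: makes newfd a copy of oldfd. If newfd is already open, it is closed first. Afterwards oldfd and newfd both refer to the same open file.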

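Practical example (redirecting standard output to a file)

```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>

int main() {
    // Open the log file (create if missing, read-write)
    int fd = open("./log.txt", O_CREAT | O_RDWR, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    // Copy fd onto 1 (standard output); this is the redirection
    dup2(fd, 1);
    // From here on, everything printf writes ends up in log.txt
    while (1) {
        char buf[1024] = {0};
        ssize_t s = read(0, buf, sizeof(buf) - 1);
        if (s < 0) {
            perror("read");
            break;
        }
        if (s == 0) {
            // End of input (for example Ctrl-D): stop instead of looping forever
            break;
        }
        printf("%s", buf);
        fflush(stdout);
    }
    close(fd);
    return 0;
}
```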

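5.3 An Enhanced Mini Shell: Adding Redirection

Building on the mini shell implemented earlier, this version adds > (output redirection), >> (append redirection), and < (input redirection). The core steps are:

Parse the redirection operator (>, >>, <) and the target file name from the command line.
In the child process, perform the redirection with dup2.
Execute the command (replacing the program image with exec does not undo a redirection that is already in place).

Core implementation (key parts)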

```cpp
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <ctype.h>
using namespace std;

// Redirection types
#define NONE_REDIR 0
#define INPUT_REDIR 1   // <  input redirection
#define OUTPUT_REDIR 2  // >  output redirection
#define APPEND_REDIR 3  // >> append redirection

int g_redir = NONE_REDIR;   // current redirection type
char *g_filename = nullptr; // redirection target file name
char *g_argv[64];           // command argument vector
int g_argc = 0;             // argument count
// g_env, g_last_code, and the helpers called from main (InitEnv, PrintPrompt, ...)
// come from the earlier mini shell and are not repeated in this excerpt.

// Skip leading whitespace
#define TRIM_SPACE(pos) do { \
    while (isspace((unsigned char)*pos)) pos++; \
} while (0)

// Parse the redirection operator, scanning from the end of the command line
void ParseRedir(char *command_buf, int len) {
    int end = len - 1;
    while (end >= 0) {
        if (command_buf[end] == '<') {
            // Input redirection
            g_redir = INPUT_REDIR;
            command_buf[end] = '\0'; // cut the command part off here
            g_filename = &command_buf[end + 1];
            TRIM_SPACE(g_filename);
            break;
        } else if (command_buf[end] == '>') {
            // Check end > 0 before looking at the previous character
            if (end > 0 && command_buf[end - 1] == '>') {
                // Append redirection
                g_redir = APPEND_REDIR;
                command_buf[end] = '\0';
                command_buf[end - 1] = '\0';
                g_filename = &command_buf[end + 1];
            } else {
                // Output redirection
                g_redir = OUTPUT_REDIR;
                command_buf[end] = '\0';
                g_filename = &command_buf[end + 1];
            }
            TRIM_SPACE(g_filename);
            break;
        }
        end--;
    }
}

// Perform the redirection (called in the child process)
void DoRedir() {
    int fd = -1;
    switch (g_redir) {
        case INPUT_REDIR:
            // Open the input file (read-only)
            fd = open(g_filename, O_RDONLY);
            if (fd < 0) exit(2);
            dup2(fd, 0); // redirect standard input (fd 0)
            break;
        case OUTPUT_REDIR:
            // Open the output file (create + write-only + truncate)
            fd = open(g_filename, O_CREAT | O_WRONLY | O_TRUNC, 0666);
            if (fd < 0) exit(4);
            dup2(fd, 1); // redirect standard output (fd 1)
            break;
        case APPEND_REDIR:
            // Open the output file (create + write-only + append)
            fd = open(g_filename, O_CREAT | O_WRONLY | O_APPEND, 0666);
            if (fd < 0) exit(6);
            dup2(fd, 1); // redirect standard output (fd 1)
            break;
        default:
            return; // no redirection
    }
    close(fd); // the duplicate is in place, so the original fd is no longer needed
}

// Execute the command (with redirection)
bool ExecuteCommand() {
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        DoRedir(); // redirect in the child so the shell itself is unaffected
        // Replace the program image (run the command)
        execvpe(g_argv[0], g_argv, g_env);
        perror("exec failed");
        exit(7);
    } else {
        int status = 0;
        waitpid(pid, &status, 0); // the parent waits for the child
        // Record the exit code
        g_last_code = WIFEXITED(status) ? WEXITSTATUS(status) : 100;
    }
    return true;
}

// Shell main loop (command reading, parsing, and other repeated code omitted)
int main() {
    InitEnv(); // initialize environment variables
    char command_buf[1024];
    while (true) {
        PrintPrompt(); // print the prompt
        if (!GetCommandLine(command_buf, sizeof(command_buf))) continue;
        ResetCommand(); // reset the arguments and the redirection state
        ParseRedir(command_buf, strlen(command_buf)); // parse redirection
        ParseCommand(command_buf); // parse the command arguments
        if (CheckAndExecBuiltCommand()) continue; // run built-in commands
        ExecuteCommand(); // run external commands (with redirection)
    }
    return 0;
}
```

Test examples

```bash
# Output redirection: write the result of ls -l to file.txt
[root@localhost myshell]# ls -l > file.txt

# Append redirection: append "hello" to file.txt
[root@localhost myshell]# echo "hello" >> file.txt

# Input redirection: cat reads the contents of file.txt
[root@localhost myshell]# cat < file.txt
```

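6. Buffers

6.1 What a Buffer Is and Why It Exists

Core definition

A buffer is a region of memory set aside to hold input or output data temporarily, so that fewer system calls and fewer device accesses are needed.

Why are buffers needed?

A system call (such as read/write) makes the CPU switch from user mode to kernel mode and back, and that transition is expensive compared with an ordinary function call.
Peripherals (such as disks and displays) are far slower than the CPU and memory; buffering reduces the number of device accesses and improves overall efficiency.

6.2 The Three Buffering Modes

The standard I/O library (the C library) provides three buffering modes:

Fully buffered: the system call is made only when the buffer fills up; typical for disk files (the default buffer size is usually 4 KB or 8 KB).
Line buffered: the system call is made when a newline \n is written or the buffer fills up; typical for terminals (such as stdout).
Unbuffered: no buffer is used and every write goes straight to a system call; typical for standard error (stderr), so error messages appear immediately.

Verifying the buffering modes

```c
#include <stdio.h>
#include <unistd.h>

int main() {
    // stdout: line buffered on a terminal, so nothing appears without a newline
    printf("hello stdout");
    // stderr: unbuffered, appears immediately
    fprintf(stderr, "hello stderr\n");
    sleep(3); // watch the output during the sleep
    return 0;
}
```

Result when run in a terminal: hello stderr appears first, and hello stdout appears three seconds later (the buffer is flushed when the process exits).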

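6.3 When Buffers Are Flushed

Besides the mode-specific triggers above, a buffer is flushed in these situations:

When the buffer fills up.
When fflush is called explicitly (for example fflush(stdout)).
When the process exits normally (exit, or return from main).
When the file is closed (fclose flushes automatically).

Classic pitfall: buffering after redirection

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    close(1);
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    printf("hello world"); // fully buffered after redirection; not flushed until the buffer fills
    close(fd); // the buffer was never flushed, so the data is lost
    return 0;
}
```

Fix: call fflush(stdout) before close. Otherwise the flush at process exit targets fd 1, which has already been closed, and the write fails.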

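6.4 Who Owns the Buffer: User Level vs Kernel Level

User-level buffer: provided by the C standard library (for example, the buffer inside the FILE structure) to reduce the number of system calls.
Kernel-level buffer: provided by the operating system to reduce the number of device accesses; user code cannot manipulate it directly.

Verification: how library functions and system calls differ in buffering

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main() {
    const char *msg1 = "hello printf\n";
    const char *msg2 = "hello fwrite\n";
    const char *msg3 = "hello write\n";

    // Library functions: go through the user-level buffer
    printf("%s", msg1);
    fwrite(msg2, strlen(msg2), 1, stdout);
    // System call: no user-level buffer
    write(1, msg3, strlen(msg3));

    fork(); // any data still in the user-level buffer now exists in both processes
    return 0;
}
```

Result when stdout is redirected to a file:

The printf and fwrite lines appear twice. stdout is fully buffered, so at the time of fork their data is still in the user-level buffer; parent and child each end up with a copy (through copy-on-write) and each flushes it on exit.
The write line appears once. There is no user-level buffer involved, so the data had already been handed to the kernel before fork.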

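7. A Deeper Look at "Everything Is a File"

7.1 Core Principle: One File Abstraction

Linux abstracts devices and resources as files and makes them uniformly operable through two structures:

Kernel struct file: every open file has a struct file that records its inode (f_inode), its operation methods (f_op), its current position (f_pos), and so on.
struct file_operations: a table of function pointers for file operations (such as read, write, open). Different devices (disk, keyboard, network card) supply different implementations behind the same interface.

7.2 The Essence: Polymorphism Through Function Pointers

The kernel follows file->f_op to the operation table of the device behind the file, so a read call ends up in the read function of that device's driver.
Developers do not need to care about device differences. They call the same read/write interface everywhere: write the code once and it works across devices.

7.3 Practical Significance

For example, reading keyboard input and reading a disk file both go through read:

Keyboard: read(0, buf, sizeof(buf)).
Disk file: read(fd, buf, sizeof(buf)).
The kernel dispatches each call through file_operations to the matching driver function.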

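8. Summary and Next Steps

Starting from what a file is, this article worked down through the core mechanisms of basic I/O on Linux: C library I/O, system call I/O, file descriptors, redirection, and buffering, and then put them into practice in the enhanced mini shell.

Where to go next

Advanced file operations: lseek (repositioning the file offset), mmap (memory-mapped I/O), select/poll/epoll (I/O multiplexing).
File system internals: inodes, directory entries, superblocks, symbolic links and hard links.
Device driver development: implementing a simple character device driver on top of file_operations.
Network I/O: socket programming (in essence an extension of file operations), TCP/UDP in practice.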
