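Basic File I/O

Understanding "Files"

1-1 The Narrow Sense

Files reside on disk.
A disk is a persistent storage medium, so files stored on it are persistent.
A disk is a peripheral device (both an output device and an input device).
Every operation on a file on disk is therefore input to or output from a peripheral, or I/O for short.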

1-2 The Broad Sense

On Linux, everything is a file (keyboard, display, network card, disk, and so on; each device is abstracted as a file).
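To make "everything is a file" concrete, here is a minimal sketch in which a device is handled with the same calls as a regular file. The helper names `is_char_device` and `write_to_path` are made up for this illustration, and it assumes `/dev/null` exists, as it does on a typical Linux system.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if path is a character device, 0 if not, -1 on error. */
int is_char_device(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return S_ISCHR(st.st_mode) ? 1 : 0;
}

/* Writes len bytes of msg to path with the same open/write/close calls
 * used for regular files. Returns bytes written, or -1 on error. */
ssize_t write_to_path(const char *path, const char *msg, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, msg, len);
    close(fd);
    return n;
}
```

Calling `write_to_path("/dev/null", "hi", 2)` goes through exactly the same interface as writing to a file on disk; only the object behind the path differs.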

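1-3 Categorizing File Operations

An empty 0 KB file still occupies disk space, because its timestamps, attributes, permissions, and so on must be stored somewhere.
A file is the combination of its attributes (metadata) and its content (file = attributes (metadata) + content).
Every file operation is therefore either an operation on file content or an operation on file attributes.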
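The split between attributes and content can be observed with `stat`. The sketch below is only an illustration: the function name `empty_file_size` is hypothetical, and it creates a file with no content and then prints the metadata that exists anyway.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates an empty file at path and prints some of its metadata.
 * Returns the file size in bytes, or -1 on error. */
long empty_file_size(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    struct stat st;
    if (stat(path, &st) < 0)
        return -1;

    /* No content, but the inode, permissions and timestamps are there. */
    printf("size=%lld inode=%llu mode=%o mtime=%lld\n",
           (long long)st.st_size,
           (unsigned long long)st.st_ino,
           (unsigned)(st.st_mode & 0777),
           (long long)st.st_mtime);
    return (long)st.st_size;
}
```

The size is 0, yet the inode number, permission bits, and modification time are all populated, which is the metadata half of "file = attributes + content".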

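1-4 The System Perspective

A file must be opened before it can be accessed, and it is a process that opens it. Operating on a file therefore means a process operating on that file.

The disk is managed by the operating system.
Reading and writing files is not fundamentally done by C or C++ library functions (these exist for the user's convenience); it is done through the file-related system call interfaces.

Reviewing the C File Interface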

#include <stdio.h>
#include <string.h>

int main()
{
    /* "w": create log.txt if it is missing, truncate it if it exists */
    FILE *fp = fopen("log.txt", "w");
    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }

    const char *msg = "hello bit";
    int cnt = 1;
    while (cnt <= 10)
    {
        char buffer[1024];
        /* format "hello bit<N>\n" and write it as a single item */
        snprintf(buffer, sizeof(buffer), "%s%d\n", msg, cnt++);
        fwrite(buffer, strlen(buffer), 1, fp);
    }

    fclose(fp);
    return 0;
}

With a small change, the same interface implements a simple cat command:

#include <stdio.h>

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("argv error!\n");
        return 1;
    }
    FILE *fp = fopen(argv[1], "r");
    if (!fp)
    {
        printf("fopen error!\n");
        return 2;
    }
    char buf[1024];
    while (1)
    {
        /* leave one byte free for the terminating '\0' */
        size_t s = fread(buf, 1, sizeof(buf) - 1, fp);
        if (s > 0)
        {
            buf[s] = 0;
            printf("%s", buf);
        }
        /* stop at end of file or on a read error */
        if (feof(fp) || ferror(fp))
        {
            break;
        }
    }
    fclose(fp);
    return 0;
}

Three Ways to Print to the Display

C++ also has cout. All of these ultimately wrap the underlying write system call.

#include <stdio.h>
#include <string.h>

int main()
{
    const char *msg = "hello fwrite\n";
    fwrite(msg, strlen(msg), 1, stdout);
    printf("hello printf\n");
    fprintf(stdout, "hello fprintf\n");
    return 0;
}

2-5 stdin & stdout & stderr

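C opens three input/output streams by default: stdin, stdout, and stderr.
Look closely and all three have type FILE*, the file pointer type that fopen returns.
stdin: standard input, the keyboard file
stdout: standard output, the display file
stderr: standard error, also the display file

#include <stdio.h>
extern FILE *stdin;
extern FILE *stdout;
extern FILE *stderr;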

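2-6 File Open Modes

r: opens a text file read-only; the stream is positioned at the beginning of the file.
r+: opens the file for reading and writing; the stream is positioned at the beginning of the file.
w: if the file exists, its content is cleared (truncated to zero length); if it does not exist, a new text file is created for writing. The stream is positioned at the beginning of the file.
w+: opens the file for reading and writing. The file is created if it does not exist and cleared if it does; the stream is positioned at the beginning of the file.
a: opens the file for appending and creates it if it does not exist. The stream is positioned at the end of the file, so written data follows the existing content.
a+: opens the file for reading and appending and creates it if it does not exist. Reads start at the beginning of the file, while writes always go to the end of the file.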

The fopen man page describes the same modes:

r
    Open text file for reading.
    The stream is positioned at the beginning of the file.
r+
    Open for reading and writing.
    The stream is positioned at the beginning of the file.
w
    Truncate file to zero length or create text file for writing.
    The stream is positioned at the beginning of the file.
w+
    Open for reading and writing.
    The file is created if it does not exist, otherwise it is truncated.
    The stream is positioned at the beginning of the file.
a
    Open for appending (writing at end of file).
    The file is created if it does not exist.
    The stream is positioned at the end of the file.
a+
    Open for reading and appending (writing at end of file).
    The file is created if it does not exist. The initial file position
    for reading is at the beginning of the file,
    but output is always appended to the end of the file.
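The practical difference between "w" and "a" is easiest to see by writing to the same file several times. The following is a small sketch; the helper names `write_with_mode`, `file_size`, and `mode_demo` are invented for this example.

```c
#include <stdio.h>

/* Opens path with the given mode, writes text, and closes the file.
 * Returns 0 on success, -1 on failure. */
static int write_with_mode(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    int ok = fputs(text, fp) >= 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Returns the size of path in bytes, or -1 on failure. */
static long file_size(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    fclose(fp);
    return size;
}

/* "w" truncates, "a" appends. Returns the final size of the file. */
long mode_demo(const char *path)
{
    write_with_mode(path, "w", "first\n");  /* file holds "first\n"           */
    write_with_mode(path, "w", "second\n"); /* truncated, holds "second\n"    */
    write_with_mode(path, "a", "third\n");  /* appended, "second\nthird\n"    */
    return file_size(path);
}
```

After `mode_demo` the file holds `second` and `third` on separate lines (13 bytes): the second "w" discarded `first`, while "a" kept what was already there.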

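System File I/O

fopen, ifstream, and other stream-style interfaces are language-level ways to open a file; the operating system's own interface is the lowest-level one. Before studying system file I/O, it helps to understand how to pass flags to a function, because the system file I/O interfaces rely on this technique:

Passing Flags as a Bitmap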

#include <stdio.h>

#define ONE_FLAG   (1 << 0) // 0000 0000 0000...0000 0001
#define TWO_FLAG   (1 << 1) // 0000 0000 0000...0000 0010
#define THREE_FLAG (1 << 2) // 0000 0000 0000...0000 0100
#define FOUR_FLAG  (1 << 3) // 0000 0000 0000...0000 1000

// Each flag owns one bit, so a single int can carry any combination.
void Print(int flags)
{
    if (flags & ONE_FLAG)
    {
        printf("One!\n");
    }
    if (flags & TWO_FLAG)
    {
        printf("Two!\n");
    }
    if (flags & THREE_FLAG)
    {
        printf("Three!\n");
    }
    if (flags & FOUR_FLAG)
    {
        printf("Four!\n");
    }
}

int main()
{
    Print(ONE_FLAG);
    printf("\n");
    Print(ONE_FLAG | TWO_FLAG);
    printf("\n");
    Print(ONE_FLAG | TWO_FLAG | THREE_FLAG);
    return 0;
}

hello.c: Writing a File

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main()
{
    umask(0);
    int fd = open("myfile", O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    int count = 5;
    const char *msg = "hello bit!\n";
    int len = strlen(msg);
    while (count--)
    {
        // fd: explained later; msg: start address of the buffer;
        // len: number of bytes this call is asked to write.
        // Return value: number of bytes actually written.
        write(fd, msg, len);
    }
    close(fd);
    return 0;
}

hello.c: Reading a File

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main()
{
    int fd = open("myfile", O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    const char *msg = "hello bit!\n";
    char buf[1024];
    while (1)
    {
        ssize_t s = read(fd, buf, strlen(msg)); // analogous to write
        if (s > 0)
        {
            buf[s] = 0; // read does not add a terminating '\0'
            printf("%s", buf);
        }
        else
        {
            break;
        }
    }
    close(fd);
    return 0;
}

open

Opening an existing file: int open(const char *pathname, int flags);
Opening a file that may need to be created: int open(const char *pathname, int flags, mode_t mode);

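pathname: the target file to open or create
flags: options for opening the file; combine one or more of the constants below with bitwise OR to form flags.
Flag values:
    O_RDONLY: open read-only
    O_WRONLY: open write-only
    O_RDWR:   open for reading and writing
    Exactly one of these three constants must be specified.
    O_CREAT:  create the file if it does not exist; the mode argument then gives the new file's access permissions
    O_APPEND: append on write
Return value:
    Success: the newly opened file descriptor
    Failure: -1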

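Which form of open to use depends on the situation. If the target file may not exist and open has to create it, pass the third argument, which gives the permissions requested for the new file; otherwise use the two-argument form.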
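The fopen modes reviewed earlier correspond to combinations of these flags. The sketch below follows the correspondence given in the Linux fopen(3) man page; note that it uses `O_TRUNC` (truncate an existing file), a real flag that is not in the list above. The function names `flags_for_mode` and `open_like_fopen` are hypothetical, and 0666 is simply the permission requested when a file has to be created.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns the open(2) flags corresponding to an fopen mode string,
 * or -1 for a mode this sketch does not handle. */
int flags_for_mode(const char *mode)
{
    if (strcmp(mode, "r") == 0)  return O_RDONLY;
    if (strcmp(mode, "r+") == 0) return O_RDWR;
    if (strcmp(mode, "w") == 0)  return O_WRONLY | O_CREAT | O_TRUNC;
    if (strcmp(mode, "w+") == 0) return O_RDWR | O_CREAT | O_TRUNC;
    if (strcmp(mode, "a") == 0)  return O_WRONLY | O_CREAT | O_APPEND;
    if (strcmp(mode, "a+") == 0) return O_RDWR | O_CREAT | O_APPEND;
    return -1;
}

/* Opens path the way fopen(path, mode) would, but returns a raw
 * file descriptor instead of a FILE*. Returns -1 on error. */
int open_like_fopen(const char *path, const char *mode)
{
    int flags = flags_for_mode(mode);
    if (flags < 0)
        return -1;
    /* The mode argument is only consulted when O_CREAT creates the file. */
    return open(path, flags, 0666);
}
```

This also explains a detail of the earlier write example: without `O_TRUNC`, `O_WRONLY|O_CREAT` overwrites an existing file from the start but does not clear what lies beyond the newly written bytes.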

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    // request rw-rw-rw-; the process umask may remove some of these bits
    int fd = open("log.txt", O_CREAT | O_WRONLY, 0666);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    close(fd);
    return 0;
}

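The mode requested is 0666, yet the file is created with 0664. The difference comes from the process's umask: open is a system call, and the kernel clears the permission bits that are set in the mask (for example, a umask of 0002 clears write permission for others). Calling umask(0) before open sets the file creation mask to zero, so the file gets exactly the permissions the caller asked for. write, read, close, and lseek are analogous to the corresponding C file interfaces.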
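The effect of the mask can be checked directly. The sketch below is an illustration under the usual rule that the resulting permissions are `mode & ~umask`; the function name `created_mode` is made up for this example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates path requesting `mode` under the given umask and returns the
 * permission bits the file actually received, or -1 on error. */
int created_mode(const char *path, mode_t mask, mode_t mode)
{
    unlink(path);              /* make sure open() really creates the file */
    mode_t old = umask(mask);  /* umask() returns the previous mask */
    int fd = open(path, O_WRONLY | O_CREAT, mode);
    umask(old);                /* restore the caller's mask */
    if (fd < 0)
        return -1;

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    return rc < 0 ? -1 : (int)(st.st_mode & 0777);
}
```

With a mask of 0002, a request for 0666 yields 0664; with a mask of 0, the same request yields 0666.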

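fopen, fclose, fread, and fwrite are functions in the C standard library, so we call them library functions (libc). open, close, read, write, and lseek are interfaces provided by the system, so we call them system call interfaces.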

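read

ssize_t read(int fd, void *buf, size_t count);

Reads up to count bytes from the given file descriptor into buf. It returns the number of bytes actually read; a negative value indicates failure, and 0 means the end of the file has been reached.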
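Because read may return fewer bytes than requested, callers often loop until they have what they need. The following is a minimal sketch of such a loop; the name `read_full` is hypothetical, and retrying on `EINTR` is a common convention rather than something the text above prescribes.

```c
#include <errno.h>
#include <unistd.h>

/* Reads up to cap bytes from fd into buf, looping over short reads.
 * Returns the total number of bytes read, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t cap)
{
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, (char *)buf + total, cap - total);
        if (n > 0) {
            total += (size_t)n;   /* got some data, keep going */
        } else if (n == 0) {
            break;                /* end of file */
        } else if (errno != EINTR) {
            return -1;            /* real error */
        }
        /* n < 0 with EINTR: interrupted by a signal, retry */
    }
    return (ssize_t)total;
}
```

The three branches map one to one onto the three kinds of return value described above: positive, zero, and negative.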

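Recall the diagram drawn when we covered operating system concepts:

It makes the relationship between the system call interface and library functions clear at a glance.

So the f* family of functions (fopen, fread, fwrite, and so on) can be regarded as wrappers around system calls that make application development more convenient.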
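One way to see the wrapping is that every FILE* carries a file descriptor, which the POSIX function `fileno` exposes. The sketch below assumes a POSIX system; the function name `std_streams_map_to_012` is made up for this example.

```c
#include <stdio.h>

/* Prints the file descriptor behind each standard stream.
 * Returns 1 if they are 0, 1 and 2, as is conventional on Linux. */
int std_streams_map_to_012(void)
{
    int in = fileno(stdin);
    int out = fileno(stdout);
    int err = fileno(stderr);
    printf("stdin=%d stdout=%d stderr=%d\n", in, out, err);
    return in == 0 && out == 1 && err == 2;
}
```

The library keeps its own bookkeeping in the FILE structure, but the actual transfer is done by handing that descriptor to read or write.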

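File Descriptors (fd)

0 & 1 & 2

By default a Linux process starts with three open file descriptors: standard input (0), standard output (1), and standard error (2).

• The physical devices behind 0, 1, and 2 are usually the keyboard, the display, and the display.

So input and output can also be done as follows: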

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main()
{
    char buf[1024];
    // leave one byte free for the terminating '\0'
    ssize_t s = read(0, buf, sizeof(buf) - 1);
    if (s > 0)
    {
        buf[s] = 0;
        write(1, buf, strlen(buf));
        write(2, buf, strlen(buf));
    }
    return 0;
}

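We now know that a file descriptor is a small integer starting from 0. When a file is opened, the operating system creates a data structure in memory to describe it: the file structure, which represents one opened file object. Because it is a process that executes the open system call, the process and the file must be linked together. Each process holds a pointer, files, to a files_struct table, whose most important member is an array of pointers, each pointing to an opened file. A file descriptor is simply an index into that array, so holding a file descriptor is enough to find the corresponding file. These conclusions can be checked against the kernel source.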

File Descriptor Allocation Rules

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    int fd = open("myfile", O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    close(fd);
    return 0;
}

The output is fd: 3.

Now close 0 or 2 and look again:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    close(0);
    //close(2);
    int fd = open("myfile", O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    close(fd);
    return 0;
}

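The result is fd: 0 or fd: 2. This shows the allocation rule for file descriptors: the lowest index in the files_struct array that is not currently in use becomes the new file descriptor.

A process holds a pointer to its files_struct, through which the file descriptor table can be found. file is a structure that contains the attributes of an open file, and the descriptor table is an array of pointers to file structures, indexed by fd. When a file is opened, the kernel creates a file structure and stores its address in the lowest empty slot of the table. The index of that slot is the fd, and through the stored address the fd leads to the file structure and from there to the file.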
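The relationship described above can be modeled in user space. The following is a toy sketch only: `toy_file`, `toy_files_struct`, `toy_alloc_fd`, and `toy_close` are hypothetical, heavily simplified stand-ins and not the kernel's actual definitions.

```c
#include <stddef.h>

#define TOY_MAX_FD 8

/* Simplified stand-in for the kernel's open-file object. */
struct toy_file {
    const char *name; /* which file this object refers to */
    long pos;         /* current read/write offset */
};

/* Simplified stand-in for the per-process descriptor table. */
struct toy_files_struct {
    struct toy_file *fd_array[TOY_MAX_FD]; /* index == file descriptor */
};

/* Stores f in the lowest free slot and returns its index, or -1 if full. */
int toy_alloc_fd(struct toy_files_struct *files, struct toy_file *f)
{
    for (int fd = 0; fd < TOY_MAX_FD; fd++) {
        if (files->fd_array[fd] == NULL) {
            files->fd_array[fd] = f;
            return fd;
        }
    }
    return -1;
}

/* Clears a slot so that its number can be handed out again. */
void toy_close(struct toy_files_struct *files, int fd)
{
    if (fd >= 0 && fd < TOY_MAX_FD)
        files->fd_array[fd] = NULL;
}
```

If slots 0, 1, and 2 are occupied, `toy_alloc_fd` returns 3; after `toy_close(&files, 1)`, the next allocation returns 1, mirroring the behavior observed with the real open and close.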

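When a process calls an interface such as read, the operating system uses fd as an index into the file descriptor table and finds the address of the file structure stored there. Every open file has its own kernel buffer, which the operating system fills by preloading data from disk. Through the file structure the kernel finds that buffer and copies its contents into the buffer the caller passed to read. Reading and writing are therefore, at bottom, copying.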

Redirection

What if we close 1 instead? Look at the code:

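#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    close(1); // release descriptor 1 (standard output)
    int fd = open("log.txt", O_WRONLY | O_CREAT, 0666);
    if (fd < 0)
    {
        perror("open"); // stderr (fd 2) is still open
        return 1;
    }
    printf("fd: %d\n", fd); // stdout still writes to fd 1, now log.txt
    fflush(stdout);         // push the buffered text into the file
    close(fd);
    return 0;
}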

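Content that should have gone to the display ends up in the file.

Why is nothing displayed? Because standard output was closed.

Why was it written to the file? Because that file was opened and received descriptor 1.

Content that should have been printed to the display was written to the file log.txt, with fd = 1. This is called output redirection. The common redirection operators are >, >>, and <.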

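So what is redirection, at bottom?

The file descriptor table contains fd_array[], whose indexes identify the open files. The system opens standard input (0), standard output (1), and standard error (2) by default. Before opening the new file log.txt, I closed standard output, so index 1 of the table no longer pointed to standard output. I then opened log.txt with open. Under the allocation rule, the lowest unused index was exactly the 1 that had just been released, so slot 1 now holds the address of log.txt's file object, and 1 is returned to the user. printf prints to stdout, which wraps fd = 1. All of these changes happened inside the operating system; the user-level printf only knows file descriptor 1 and writes to whatever file that descriptor leads to.

Changing what a descriptor points to inside the operating system, without the user level being involved, is called redirection: a bait-and-switch carried out in the kernel.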

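Using the dup2 System Call

#include <unistd.h>
int dup2(int oldfd, int newfd);

dup2 makes newfd refer to the same open file as oldfd, closing newfd first if it was already open.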
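Since dup2 overwrites a slot in the descriptor table, a program that wants its original standard output back has to save it first. The sketch below shows one way to do that with `dup`; the function name `write_via_redirected_stdout` is hypothetical, and `O_TRUNC` is used so that each call starts from an empty file.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Temporarily points fd 1 at path, writes msg through fd 1, then restores
 * the original standard output. Returns 0 on success, -1 on error. */
int write_via_redirected_stdout(const char *path, const char *msg)
{
    int saved = dup(1); /* remember where fd 1 pointed */
    if (saved < 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) {
        close(saved);
        return -1;
    }

    int rc = -1;
    if (dup2(fd, 1) >= 0) { /* fd 1 now refers to the file */
        size_t len = strlen(msg);
        if (write(1, msg, len) == (ssize_t)len)
            rc = 0;
        dup2(saved, 1);     /* point fd 1 back at the original target */
    }
    close(fd);
    close(saved);
    return rc;
}
```

The code that writes never changes: it always uses descriptor 1. Only the table entry behind that number is swapped and then swapped back.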

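Output Redirection

Output meant for the display goes to a file instead:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    // no close(1) needed: dup2 closes descriptor 1 itself
    int fd = open("log.txt", O_WRONLY | O_CREAT, 0666);
    if (fd < 0) return 1;
    dup2(fd, 1); // descriptor 1 now refers to log.txt
    close(fd);   // the extra descriptor is no longer needed
    printf("fd: %d\n", fd);
    printf("hello bit\n");
    printf("hello bit\n");
    fprintf(stdout, "hello stdout\n");
    return 0;
}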

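Input Redirection

Input is read from a file instead of the keyboard:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    int fd = open("log.txt", O_RDONLY);
    if (fd < 0) return 1;
    dup2(fd, 0); // descriptor 0 now refers to log.txt
    close(fd);
    while (1)
    {
        char buffer[64];
        // stdin reads from fd 0, so it reads the file
        if (!fgets(buffer, sizeof(buffer), stdin)) break;
        printf("%s", buffer);
    }
    return 0;
}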

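printf is an I/O function in the C library and normally writes to stdout. When stdout accesses the file underneath, it still looks up fd 1, but the slot at index 1 now holds the address of log.txt's file object rather than the display's, so every message is written to the file, which completes output redirection. How are append and input redirection done?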

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

// Input redirection with the file name taken from the command line.
int main(int argc, char *argv[])
{
    if (argc != 2) exit(1);
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) return 1;
    dup2(fd, 0); // descriptor 0 now refers to argv[1]
    close(fd);
    while (1)
    {
        char buffer[64];
        if (!fgets(buffer, sizeof(buffer), stdin)) break;
        printf("%s", buffer);
    }
    return 0;
}
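For the append case, the only change from output redirection is the set of flags passed to open. The sketch below gathers the three shell operators into one hypothetical helper, `redirect`; the flag choices follow the usual shell behavior (`>` truncates, `>>` appends, `<` reads) and are an assumption of this example rather than something taken from a particular shell's source.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Redirects according to a shell-style operator: ">" and ">>" replace
 * fd 1, "<" replaces fd 0. Returns 0 on success, -1 on error. */
int redirect(const char *op, const char *path)
{
    int flags, target;
    if (strcmp(op, ">") == 0) {
        flags = O_WRONLY | O_CREAT | O_TRUNC;  /* start from an empty file */
        target = 1;
    } else if (strcmp(op, ">>") == 0) {
        flags = O_WRONLY | O_CREAT | O_APPEND; /* keep existing content */
        target = 1;
    } else if (strcmp(op, "<") == 0) {
        flags = O_RDONLY;
        target = 0;
    } else {
        return -1;
    }

    int fd = open(path, flags, 0666);
    if (fd < 0)
        return -1;
    int rc = dup2(fd, target); /* target now refers to the opened file */
    if (fd != target)
        close(fd);             /* drop the extra descriptor */
    return rc < 0 ? -1 : 0;
}
```

After `redirect(">>", "log.txt")`, anything written to descriptor 1, including printf output once it is flushed, is added to the end of log.txt instead of replacing its content.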