文件系统(十一)：Linux Squashfs只读文件系统介绍

liwen01 2024.07.21

前言

嵌入式Linux系统中，squashfs文件系统使用非常广泛。它主要的特性是只读，文件压缩比例高。对于flash空间紧张的系统，可以将一些不需要修改的资源打包成压缩的只读文件系统格式，从而达到节省空间的目的。

另外还有个特性就是它可以分块解压缩，使用数据会更加灵活，但同时也会引入读放大的问题。

(一)制作squash文件系统

使用mksquashfs可以将文件及文件夹制作成squash文件系统镜像文件，比如我们要将squashfs-root文件夹打包成squashfs镜像文件，可以使用命令：

复制代码

mksquashfs squashfs-root squashfs-root.sqsh -comp xz

这里是使用xz压缩方式进行文件压缩

(1)压缩比例测试

squashfs是一个只读压缩的文件系统，我们简单测试一下它的压缩功能

使用/dev/zero生成零数据写入到文件夹squashfs_zero对应的文件中

复制代码

dd if=/dev/zero of=file1 bs=256K count=1

制作如下测试文件目录及测试文件:

复制代码

biao@ubuntu:~/test/squashfs/squashfs_zero$ tree
.
├── test1
│   ├── file1
│   ├── file1_1
│   └── file1_2
├── test2
│   ├── file2
│   ├── file2_1
│   └── file2_2
├── test3
│   ├── file3
│   ├── file3_1
│   └── file3_2
└── test4
    ├── file4
    ├── file4_1
    └── file4_2

4 directories, 12 files
biao@ubuntu:~/test/squashfs/squashfs_zero$

文件大小如下：

复制代码

biao@ubuntu:~/test/squashfs/squashfs_zero$ du -h
1.5M    ./test3
2.1M    ./test2
2.1M    ./test1
1.7M    ./test4
7.3M    .
biao@ubuntu:~/test/squashfs/squashfs_zero$

使用xz压缩方式将squashfs_zero制作成镜像文件

复制代码

mksquashfs squashfs_zero squashfs_zero.sqsh -comp xz

文件大小如下：

复制代码

biao@ubuntu:~/test/squashfs$ ll -h squashfs_zero.sqsh 
-rw-r--r-- 1 biao biao 4.0K Jun 26 23:48 squashfs_zero.sqsh
biao@ubuntu:~/test/squashfs$

这里是将7.3M大小squashfs_zero文件夹压缩成了一个4k大小的squashfs_zero.sqsh。当然，这里的测试是非常极端的，因为文件写入的数据都是0，如果写入随机数那压缩比例就会相差非常大了。

(二)squashfs数据分析

(1)数据布局

Squashfs的一个镜像文件它最多包含下面9个部分：Superblock、Compression options、Data blocks fragments、Inode table、Directory table、Fragment table、Export table、 UID/GID lookup table、Xattr table。

最多包含的意思，也就是有些部分不是必须的，比如Compression options 部分。

它们在镜像文件中的数据分布如下图：

(2)制作测试镜像文件

使用/dev/urandom 生成随机数写到文件夹squashfs_urandom对应的文件：

复制代码

dd if=/dev/urandom of=filex bs=10K count=50

制作如下测试文件目录及测试文件：

复制代码

biao@ubuntu:~/test/squashfs/squashfs_urandom$ tree
.
├── test1
│   ├── file1
│   ├── file1_1
│   └── file1_2
├── test2
│   ├── file2
│   ├── file2_1
│   └── file2_2
├── test3
│   ├── file3
│   ├── file3_1
│   └── file3_2
└── test4
    ├── file4
    ├── file4_1
    └── file4_2

4 directories, 12 files
biao@ubuntu:~/test/squashfs/squashfs_urandom$

squashfs 文件系统的组成部分，大部分也都是压缩的，为了我们后面的数据分析，我们设置Data blocks fragments、Inode table、Directory table、Fragment table不进行压缩

制作命令如下：

复制代码

mksquashfs squashfs_urandom squashfs_urandom.sqsh -comp xz  -noF -noX -noI -noD

(3)查看镜像数据信息

如果要查看squashfs的概要信息，可以使用unsquashfs命令进行查看

复制代码

unsquashfs -s squashfs_urandom.sqsh

输出内容信息如下:

复制代码

biao@ubuntu:~/test/squashfs$ unsquashfs -s squashfs_urandom.sqsh 
Found a valid SQUASHFS 4:0 superblock on squashfs_urandom.sqsh.
Creation or last append time Wed Jun 26 23:28:18 2024
Filesystem size 5032.60 Kbytes (4.91 Mbytes)
Compression xz
Block size 131072
Filesystem is exportable via NFS
Inodes are uncompressed
Data is uncompressed
Fragments are uncompressed
Always-use-fragments option is not specified
Xattrs are uncompressed
Duplicates are removed
Number of fragments 2
Number of inodes 37
Number of ids 1
biao@ubuntu:~/test/squashfs$

这里我们可以看到，上面我们设置-no的部分，是没有进行数据压缩的。

(4)Superblock参数分析

Superblock 在镜像文件的最开始位置，大小固定为96个字节，查看数据内容如下：

复制代码

biao@ubuntu:~/test/squashfs$ hexdump  -s 0 -n 96 -C squashfs_urandom.sqsh 
00000000  68 73 71 73 11 00 00 00  ec 5c 7a 66 00 00 02 00  |hsqs.....\zf....|
00000010  02 00 00 00 04 00 11 00  cb 01 01 00 04 00 00 00  |................|
00000020  ac 02 00 00 00 00 00 00  16 9d 4e 00 00 00 00 00  |..........N.....|
00000030  0e 9d 4e 00 00 00 00 00  ff ff ff ff ff ff ff ff  |..N.............|
00000040  60 98 4e 00 00 00 00 00  2e 9b 4e 00 00 00 00 00  |`.N.......N.....|
00000050  6e 9c 4e 00 00 00 00 00  00 9d 4e 00 00 00 00 00  |n.N.......N.....|
00000060
biao@ubuntu:~/test/squashfs$

对Superblock的数据进行解析

这里我们看到几个比较关键的数据

最开始的4个字节为squashfs的magic，值为hsqs
block size 是每个数据块的最大长度，这里是128KB，squashfs支持的块大小范围是：4KB~1MB
compressor 表示压缩类型，这里的4表示xz压缩，其它还支持GZIP、LZMA、LZO、LZ4、ZSTD 数据压缩格式。
frag count 表示有多少段数据是存储在fragments组块中
最后面是各个table组块的开始位置

(5)inode table数据分析

从superblock中我们知道inode table的开始位置是在0x4e9860位置

复制代码

biao@ubuntu:~/test/squashfs$ hexdump  -s 0x4e9860 -n 718 -C squashfs_urandom.sqsh     
004e9860  cc 82 02 00 b4 01 00 00  00 00 9b e9 78 66 02 00  |............xf..|
004e9870  00 00 60 00 00 00 ff ff  ff ff 00 00 00 00 00 20  |..`............ |
004e9880  03 00 00 00 02 01 00 20  01 01 02 00 b4 01 00 00  |....... ........|
004e9890  00 00 c3 e9 78 66 03 00  00 00 60 20 03 00 ff ff  |....xf....` ....|
004e98a0  ff ff 00 00 00 00 00 d0  07 00 00 00 02 01 00 00  |................|
004e98b0  02 01 00 00 02 01 00 d0  01 01 02 00 b4 01 00 00  |................|
004e98c0  00 00 cf e9 78 66 04 00  00 00 60 f0 0a 00 ff ff  |....xf....`.....|
004e98d0  ff ff 00 00 00 00 00 80  0c 00 00 00 02 01 00 00  |................|
004e98e0  02 01 00 00 02 01 00 00  02 01 00 00 02 01 00 00  |................|
004e98f0  02 01 00 80 00 01 01 00  fd 01 00 00 00 00 b1 e9  |................|
004e9900  78 66 01 00 00 00 00 00  00 00 02 00 00 00 3a 00  |xf............:.|
004e9910  00 00 11 00 00 00 02 00  b4 01 00 00 00 00 f9 e9  |................|
004e9920  78 66 06 00 00 00 00 00  00 00 00 00 00 00 00 00  |xf..............|
004e9930  00 00 00 78 00 00 02 00  b4 01 00 00 00 00 01 ea  |...x............|
004e9940  78 66 07 00 00 00 00 00  00 00 00 00 00 00 00 78  |xf.............x|
004e9950  00 00 00 18 01 00 02 00  b4 01 00 00 00 00 08 ea  |................|
004e9960  78 66 08 00 00 00 00 00  00 00 01 00 00 00 00 00  |xf..............|
004e9970  00 00 00 f0 00 00 01 00  fd 01 00 00 00 00 ea e9  |................|
.........
.........
biao@ubuntu:~/test/squashfs$

对数据进行分析

这里有几个参数需要注意：

(a)inode_type

inode_type 是inode的类型，数值2表示普通文件，其它类型定义如下：

(b)block_sizes

这里是描述的块的大小(有可能是压缩的)，这个大小需要解析。

为什么有些inode有多个block_sizes呢？这个是因为superblock中定义了一个block的最大值，如果一个文件的大小大于block最大值，那它就存在多个block_sizes。

实际每一个文件都有一个对应的inode,它都是按序分布在inode table中。

(6)directory table 数据分析

从superblock中我们知道directory table的开始位置是在0x4e9b2e位置:

复制代码

biao@ubuntu:~/test/squashfs$ hexdump  -s 0x4e9b2e -n 320 -C squashfs_urandom.sqsh          
004e9b2e  1c 81 02 00 00 00 00 00  00 00 02 00 00 00 00 00  |................|
004e9b3e  00 00 02 00 04 00 66 69  6c 65 31 28 00 01 00 02  |......file1(....|
004e9b4e  00 06 00 66 69 6c 65 31  5f 31 58 00 02 00 02 00  |...file1_1X.....|
004e9b5e  06 00 66 69 6c 65 31 5f  32 02 00 00 00 00 00 00  |..file1_2.......|
004e9b6e  00 06 00 00 00 b4 00 00  00 02 00 04 00 66 69 6c  |.............fil|
004e9b7e  65 32 d4 00 01 00 02 00  06 00 66 69 6c 65 32 5f  |e2........file2_|
004e9b8e  31 f4 00 02 00 02 00 06  00 66 69 6c 65 32 5f 32  |1........file2_2|
004e9b9e  02 00 00 00 00 00 00 00  0a 00 00 00 34 01 00 00  |............4...|
004e9bae  02 00 04 00 66 69 6c 65  33 68 01 01 00 02 00 06  |....file3h......|
004e9bbe  00 66 69 6c 65 33 5f 31  98 01 02 00 02 00 06 00  |.file3_1........|
004e9bce  66 69 6c 65 33 5f 32 02  00 00 00 00 00 00 00 0e  |file3_2.........|
004e9bde  00 00 00 f8 01 00 00 02  00 04 00 66 69 6c 65 34  |...........file4|
004e9bee  28 02 01 00 02 00 06 00  66 69 6c 65 34 5f 31 60  |(.......file4_1`|
004e9bfe  02 02 00 02 00 06 00 66  69 6c 65 34 5f 32 03 00  |.......file4_2..|
004e9c0e  00 00 00 00 00 00 01 00  00 00 94 00 00 00 01 00  |................|
004e9c1e  04 00 74 65 73 74 31 14  01 04 00 01 00 04 00 74  |..test1........t|
004e9c2e  65 73 74 32 d8 01 08 00  01 00 04 00 74 65 73 74  |est2........test|
004e9c3e  33 8c 02 0c 00 01 00 04  00 74 65 73 74 34 20 80  |3........test4 .|
004e9c4e  60 70 17 00 00 00 00 00  00 90 01 01 00 00 00 00  |`p..............|
004e9c5e  60 a8 4d 00 00 00 00 00  00 f0 00 01 00 00 00 00  |`.M.............|
004e9c6e
biao@ubuntu:~/test/squashfs$

对数据进行解析：

这里最开始是一个directory header结构，它由count、start、inode number组成，它们定义如下：

每个directory header 至少需要携带一个Directory Entry，Directory Entry的定义如下：

这里的inode number 与 inode table 中的inode number是相互对应的

(7)Data blocks fragments 分析

(a)Data blocks

在我们测试的这个镜像文件中，应为使用的是xz压缩方式，属于常规压缩方式，Compression options中不会有描述，也就是说Compression options组成部分是为空。

在Superblock后面紧接着的就是Data blocks数据。

从inode table和dir table我们知道，最开始存储的是inode number为2的file1 文件。

因为我们这里的数据未进行压缩，正常应该是对比镜像文件0x60地址开始的数据与file1文件开始的数据一样的。

(b)fragments

fragments 组块设计的目的是用来存储一些小文件，将它们组合成一个block来存储，还有一种就是前面文件剩余的一小部分数据，也有可能会被存储在fragments组块中。

具体哪些数据存储到了fragments，可以查看fragments table表

(三)squashfs工作原理

(1)挂载文件系统：

squashfs被挂载的时候，系统首先读取superblock块，获取squashfs基本信息和各个表格的位置。

(2)访问文件或目录：

系统从 superblock 获取 inode table 和 directory table 的位置。
如果是访问目录，系统查找 directory table，获取目录中每个文件和子目录的名称及其 inode 编号。
通过 inode 编号，从 inode table 获取文件或目录的 inode，了解文件的元数据和数据块位置。
对于小文件或大文件的片段，通过 inode 中的信息查找 fragment table，获取片段的数据位置。

(四)squashfs优缺点

(1)优点

高压缩率：SquashFS 使用 gzip、lzma、lz4、xz 等压缩算法，能够显著减少文件系统的大小，节省存储空间。

只读特性：适合用于需要保护数据完整性的环境，如嵌入式系统和操作系统的只读镜像。

高效的随机访问：SquashFS 支持高效的随机读取访问，适合读取频繁的场景。

碎片处理： 通过 Fragment Table，SquashFS 能有效处理小文件，减少存储碎片，提高存储效率。

存储和性能优化： 支持文件、目录和 inode 的压缩，减少了存储占用和 I/O 操作，提高了性能。

数据完整性：SquashFS 可以包含校验和，用于确保数据的完整性和防止数据损坏。

(2)缺点

只读特性：SquashFS 是只读的，不能直接修改文件系统中的文件或目录。这意味着需要更新或更改文件系统时，必须重新生成整个文件系统镜像。

压缩开销：虽然读取速度较快，但解压缩过程仍然需要一定的 CPU 资源。在低性能的嵌入式系统中，这可能会对系统性能产生一定影响。

内存消耗：在读取大文件时，解压缩过程可能会消耗大量内存，尤其是在资源受限的嵌入式系统中，这可能会成为一个瓶颈

结尾

上面介绍了squashfs文件系统的数据组成和它们相互工作的原理以及squash文件系统的优缺点。

这里提一个问题：如果根文件系统使用squashfs文件系统，main执行文件也位于根文件系统中，在不考虑双分区备份升级的情况下，要怎么升级根文件系统？

在main程序中直接将新squashfs镜像文件写入到根文件系统所在的mtdblock中是否可以？会不会存在根文件系统更新异常的风险？

-------------------End------------------- 如需获取更多内容请关注liwen01 公众号