Table of Contents
- [1. Preface](#1. Preface)
- [2. The ubi rootfs failure scene](#2. The ubi rootfs failure scene)
- [3. Solution](#3. Solution)
- [4. Analysis](#4. Analysis)
- [5. References](#5. References)
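1. Preface
Given the limits of the author's knowledge, this article may contain errors. The author accepts no responsibility for any loss they may cause the reader.
2. The ubi rootfs failure scene
The kernel log captured during the failure is as follows: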
c
......
[ 0.000000] Linux version 4.19.94-g1194fe2-dirty (bill@bill-virtual-machine) (gcc version 5.3.1 20160113 (Linaro GCC 5.3-2016.02)) #21 PREEMPT Tue Jun 4 10:18:44 CST 2024
......
[ 0.000000] Kernel command line: console=ttyO0,115200n8 root=ubi0:rootfs rw ubi.mtd=NAND.rootfs,2048 rootfstype=ubifs rootwait=1
......
[ 1.700380] omap-gpmc 50000000.gpmc: GPMC revision 6.0
[ 1.705736] gpmc_mem_init: disabling cs 0 mapped at 0x0-0x1000000
[ 1.713618] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[ 1.720006] nand: Micron MT29F2G08AAD
[ 1.723731] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[ 1.731376] nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
[ 1.736826] 11 fixed-partitions partitions found on MTD device omap2-nand.0
[ 1.743835] Creating 11 MTD partitions on "omap2-nand.0":
[ 1.749263] 0x000000000000-0x000000020000 : "NAND.SPL"
[ 1.755554] 0x000000020000-0x000000040000 : "NAND.SPL.backup1"
[ 1.762305] 0x000000040000-0x000000060000 : "NAND.SPL.backup2"
[ 1.769093] 0x000000060000-0x000000080000 : "NAND.SPL.backup3"
[ 1.775856] 0x000000080000-0x0000000c0000 : "NAND.u-boot-spl-os"
[ 1.782897] 0x0000000c0000-0x0000001c0000 : "NAND.u-boot"
[ 1.789984] 0x0000001c0000-0x0000001e0000 : "NAND.u-boot-env"
[ 1.796661] 0x0000001e0000-0x000000200000 : "NAND.u-boot-env.backup1"
[ 1.804042] 0x000000200000-0x000000a00000 : "NAND.kernel"
[ 1.817708] 0x000000a00000-0x00000e000000 : "NAND.rootfs"
[ 2.023685] 0x00000e000000-0x000010000000 : "NAND.userdata"
......
[ 2.162228] ubi0: attaching mtd9
[ 2.817396] ubi0: scanning is finished
[ 2.841103] ubi0: volume 0 ("rootfs") re-sized from 83 to 1668 LEBs
[ 2.848156] ubi0: attached mtd9 (name "NAND.rootfs", size 214 MiB)
[ 2.854435] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[ 2.861339] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 512
[ 2.868081] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[ 2.875082] ubi0: good PEBs: 1711, bad PEBs: 1, corrupted PEBs: 0
[ 2.881199] ubi0: user volume: 1, internal volumes: 1, max. volumes count: 128
[ 2.888463] ubi0: max/mean erase counter: 1/0, WL threshold: 4096, image sequence number: 1890895802
[ 2.897644] ubi0: available PEBs: 0, total reserved PEBs: 1711, PEBs reserved for bad PEB handling: 39
[ 2.907010] ubi0: background thread "ubi_bgt0d" started, PID 65
......
[ 2.972922] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 66
[ 3.083296] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0, name "rootfs"
[ 3.090747] UBIFS (ubi0:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[ 3.122798] UBIFS (ubi0:0): FS size: 210399232 bytes (200 MiB, 1657 LEBs), journal size 9023488 bytes (8 MiB, 72 LEBs)
[ 3.142800] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
[ 3.148663] UBIFS (ubi0:0): media format: w4/r0 (latest is w5/r0), UUID 7A19D54A-3848-4AFB-8DDF-4E4A6B04D4FC, small LPT model
[ 3.184943] VFS: Mounted root (ubifs filesystem) on device 0:14.
[ 3.192113] devtmpfs: mounted
......
[ 3.555581] omap2-nand 8000000.nand: uncorrectable bit-flips found
[ 3.572853] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 61 bytes from PEB 82:6144, read only 61 bytes, retry
[ 3.593521] omap2-nand 8000000.nand: uncorrectable bit-flips found
[ 3.602829] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 61 bytes from PEB 82:6144, read only 61 bytes, retry
[ 3.633378] omap2-nand 8000000.nand: uncorrectable bit-flips found
[ 3.642841] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 61 bytes from PEB 82:6144, read only 61 bytes, retry
[ 3.673533] omap2-nand 8000000.nand: uncorrectable bit-flips found
[ 3.682801] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 61 bytes from PEB 82:6144, read 61 bytes
[ 3.712799] CPU: 0 PID: 1 Comm: init Not tainted 4.19.94-g1194fe2-dirty #21
[ 3.719788] Hardware name: Generic AM33XX (Flattened Device Tree)
[ 3.732788] Backtrace:
[ 3.735259] [<c010bfe4>] (dump_backtrace) from [<c010c2b4>] (show_stack+0x18/0x1c)
[ 3.752799] r7:00001800 r6:0000003d r5:cf04c000 r4:ffffffb6
[ 3.758490] [<c010c29c>] (show_stack) from [<c09531b4>] (dump_stack+0x24/0x28)
[ 3.782799] [<c0953190>] (dump_stack) from [<c064e07c>] (ubi_io_read+0x15c/0x350)
[ 3.790320] [<c064df20>] (ubi_io_read) from [<c064bdd8>] (ubi_eba_read_leb+0xcc/0x41c)
[ 3.812796] r10:00608040 r9:cf7e7400 r8:00000052 r7:cf723400 r6:00000000 r5:cf04c000
[ 3.820659] r4:0000003d
[ 3.832800] [<c064bd0c>] (ubi_eba_read_leb) from [<c064a8ac>] (ubi_leb_read+0x78/0xc8)
[ 3.840754] r10:0000003d r9:cf7e7400 r8:00000000 r7:cf04c000 r6:cf723400 r5:00000800
[ 3.862790] r4:0000003d
[ 3.865344] [<c064a834>] (ubi_leb_read) from [<c038d560>] (ubifs_leb_read+0x34/0x80)
[ 3.882796] r8:00000800 r7:00000050 r6:0000003d r5:cf04d000 r4:cf04d000
[ 3.889531] [<c038d52c>] (ubifs_leb_read) from [<c038edf8>] (ubifs_read_node+0x9c/0x254)
[ 3.912796] r8:00000002 r7:0000003d r6:00000050 r5:00000800 r4:cf04d000
[ 3.919531] [<c038ed5c>] (ubifs_read_node) from [<c038f084>] (ubifs_read_node_wbuf+0xd4/0x2b4)
[ 3.942825] r10:0000003d r9:00000002 r8:cf7e7400 r7:cf04d000 r6:cf76fc80 r5:00000050
[ 3.950686] r4:00000800
[ 3.962803] [<c038efb0>] (ubifs_read_node_wbuf) from [<c03ab4a4>] (ubifs_tnc_read_node+0x50/0xbc)
[ 3.971716] r10:cf7e7400 r9:00000000 r8:cf04d0a8 r7:cf04d000 r6:cf7e7400 r5:00000002
[ 4.002793] r4:ca83c868
[ 4.005345] [<c03ab454>] (ubifs_tnc_read_node) from [<c03909c0>] (tnc_read_hashed_node+0xd8/0x1cc)
[ 4.022794] r7:00000000 r6:cf04d000 r5:cf7e7400 r4:ca83c868
[ 4.028482] [<c03908e8>] (tnc_read_hashed_node) from [<c039221c>] (ubifs_tnc_locate+0x1b4/0x1e8)
[ 4.052795] r7:00000000 r6:cf051d28 r5:c0e03048 r4:cf04d000
[ 4.058485] [<c0392068>] (ubifs_tnc_locate) from [<c03929cc>] (ubifs_tnc_lookup_nm+0x40/0x144)
[ 4.082796] r10:0000007f r9:cf051d28 r8:cf7e7400 r7:cf051d0c r6:cf04d000 r5:c0e03048
[ 4.090658] r4:c0e03048
[ 4.102800] [<c039298c>] (ubifs_tnc_lookup_nm) from [<c0386a2c>] (ubifs_lookup+0x244/0x314)
[ 4.111190] r10:0000007f r9:41cd253f r8:cf04d000 r7:cb155dc0 r6:cb154a18 r5:c0e03048
[ 4.132790] r4:cf7e7400
[ 4.135343] [<c03867e8>] (ubifs_lookup) from [<c02428bc>] (__lookup_slow+0x90/0x194)
[ 4.152798] r10:cf051e68 r9:00000001 r8:cb154aa0 r7:cb155dc0 r6:cf051d74 r5:c0e03048
[ 4.160659] r4:cb154a18
[ 4.172798] [<c024282c>] (__lookup_slow) from [<c02429f8>] (lookup_slow+0x38/0x4c)
[ 4.180403] r10:000b1c79 r9:cf051f5c r8:00000041 r7:00000001 r6:cf051e68 r5:cb154aa0
[ 4.202792] r4:cb155e3c
[ 4.205341] [<c02429c0>] (lookup_slow) from [<c0243220>] (walk_component+0x21c/0x31c)
[ 4.232793] r7:00000000 r6:00000000 r5:c0e03048 r4:cf051e60
[ 4.238483] [<c0243004>] (walk_component) from [<c024507c>] (path_lookupat+0x70/0x208)
[ 4.252796] r10:000b1c79 r9:cf051f5c r8:00000041 r7:cf051f5c r6:c0e03048 r5:00000000
[ 4.260658] r4:cf051e60
[ 4.272800] [<c024500c>] (path_lookupat) from [<c0247680>] (filename_lookup+0xa8/0x118)
[ 4.280840] r8:00000001 r7:cf051e60 r6:cf080000 r5:c0e03048 r4:00000001
[ 4.302802] [<c02475d8>] (filename_lookup) from [<c02477ec>] (user_path_at_empty+0x4c/0x54)
[ 4.311192] r9:cf050000 r8:ffffff9c r7:ffffff9c r6:cf051f5c r5:ffffff9c r4:00000001
[ 4.342826] [<c02477a0>] (user_path_at_empty) from [<c0233af4>] (do_faccessat+0xb8/0x22c)
[ 4.351039] r6:cf286d80 r5:00000001 r4:00000006
[ 4.362795] [<c0233a3c>] (do_faccessat) from [<c0233c98>] (sys_access+0x1c/0x20)
[ 4.370226] r10:00000021 r9:cf050000 r8:c0101204 r7:00000021 r6:00000000 r5:00000008
[ 4.392818] r4:00000044
[ 4.395367] [<c0233c7c>] (sys_access) from [<c0101000>] (ret_fast_syscall+0x0/0x54)
[ 4.412793] Exception stack(0xcf051fa8 to 0xcf051ff0)
[ 4.417869] 1fa0: 00000044 00000008 000b1c79 00000006 00000000 00000000
[ 4.442827] 1fc0: 00000044 00000008 00000000 00000021 00000000 00000044 b6f28000 00000000
[ 4.451041] 1fe0: 000c24a8 bed92b0c 0008e4f8 b6e9c5c6
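In the kernel log above, the lines
bash
[ 3.555581] omap2-nand 8000000.nand: uncorrectable bit-flips found
[ 3.572853] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 61 bytes from PEB 82:6144, read only 61 bytes, retry
show that an ECC error occurred while reading the flash, and that the NAND driver attributes it to uncorrectable bit-flips (in general, ECC errors are not necessarily limited to bit-flips).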
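The "PEB 82:6144" in the warning is a physical eraseblock number and a byte offset inside it. The following is a minimal sketch, using only numbers copied from the log above, that converts this offset into an offset within the logical eraseblock (LEB) and a NAND page index. The variable names are made up for illustration.

```shell
#!/bin/sh
# Values copied from the kernel log above
peb_offset=6144    # from "PEB 82:6144"
data_offset=4096   # from "ubi0: ... data offset: 4096"
page_size=2048     # from "nand: ... page size: 2048"

# UBI stores LEB data after its headers, so subtract the data offset
leb_offset=$((peb_offset - data_offset))
# Index of the NAND page inside the PEB that holds the failing bytes
page_index=$((peb_offset / page_size))

printf 'LEB offset: %d (0x%x), NAND page within PEB: %d\n' \
    "$leb_offset" "$leb_offset" "$page_index"
```

Under these assumptions the failing read starts at LEB offset 0x800, the second data page of that eraseblock. The values 0x800 and 0x3d (61) also show up in the register dumps of the backtrace, which is consistent with this reading.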
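3. Solution
We will not analyze the root cause yet. Instead, we describe the fix first and then work backwards from the fix to the cause. The fix is simple: add the -F option when generating rootfs.ubifs with the mkfs.ubifs tool. The complete mkfs.ubifs command used to build rootfs.ubifs is: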
bash
mkfs.ubifs -d <ROOTFS_TARGET_DIR> -F -e 0x1f000 -c 2048 -m 2048 -o rootfs.ubifs
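The -e value passed to mkfs.ubifs has to match the LEB size that UBI reports for the target partition. As a rough cross-check, the sketch below derives it from the geometry printed in the boot log (PEB size 131072 bytes, data offset 4096). The variable names are illustrative only.

```shell
#!/bin/sh
# Geometry copied from the "ubi0:" lines of the boot log
peb_size=131072    # "PEB size: 131072 bytes (128 KiB)"
data_offset=4096   # "VID header offset: 2048 (aligned 2048), data offset: 4096"

# The usable LEB is what remains of a PEB after the UBI headers
leb_size=$((peb_size - data_offset))

printf '%s 0x%x (%d bytes)\n' '-e' "$leb_size" "$leb_size"
```

The result agrees with the "LEB size: 126976 bytes" line in the log and with the 0x1f000 used in the command above.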
So what exactly is the -F option, and what does it do that makes the problem go away? First, see how the help text of mkfs.ubifs describes -F:
bash
$ mkfs.ubifs --help
Usage: mkfs.ubifs [OPTIONS] target
Make a UBIFS file system image from an existing directory tree
......
Options:
......
-F, --space-fixup file-system free space has to be fixed up on first mount
(requires kernel version 3.0 or greater)
......
That is still not very illuminating, although it does tell us that -F requires kernel version 3.0 or later. Next, see how the UBIFS FAQ and HOWTO document explains the -F option:
bash
What is the the purpose of the -F (--space-fixup) mkfs.ubifs option?
Because of subtle ECC errors that can arise when programming NAND flash (see here), ubiformat is the recommended
way of flashing a UBI image which contains a UBIFS file system. However, this is not always possible - for example,
some embedded devices are manufactured using an industrial NAND flash programmer which has no knowledge of UBI or
UBIFS.
The -F option causes mkfs.ubifs to set a special flag in the superblock, which triggers a "free space fixup"
procedure in the kernel the very first time the filesystem is mounted. This fixup procedure involves finding
all empty pages in the UBIFS file system and re-erasing them. This ensures that NAND pages which contain all
0xFF data get fully erased, which removes any problematic non-0xFF data from their OOB areas.
Of course it is not possible to re-erase individual NAND pages, and entire PEBs are erased. UBIFS performs this
procedure by reading the useful (non 0xFF'ed) contents of LEBs and then invoking the atomic LEB change UBI
operation. Obviously, this means that UBIFS has to read and write a lot of LEBs which takes time. But this happens
only once, and the "free space fixup" procedure then unsets the "fixup" UBIFS superblock flag.
This option is supported if you are running a kernel version 3.0 or higher, or if you have pulled the changes from
a UBIFS back-port tree. Note that ubiformat is still the preferred flashing method if the image is not being flashed
for the first time, since it preserves existing erase counters (while using nandwrite or its equivalent does not).
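A brief paraphrase of the key points:
bash
The -F option of mkfs.ubifs sets a special flag in the superblock. The flag tells the kernel to run a
"free space fixup" the first time the UBIFS file system is mounted: it finds all empty pages and re-erases
them. This ensures that NAND pages holding all-0xFF data are fully erased, which removes any problematic
non-0xFF data from their OOB areas.
Erasing works on whole blocks, not on individual pages. UBIFS therefore reads the useful (non-0xFF)
contents of each affected LEB and writes them back through UBI's atomic LEB change operation. This
takes time, so it is done only once, on the first mount after the rootfs is flashed. The flag is then
cleared, and no further fixup happens until the rootfs is flashed again.
The special flag that the -F option of mkfs.ubifs sets in the superblock is UBIFS_FLG_SPACE_FIXUP.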
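To make the notion of an "empty page" concrete, here is a small sketch that checks whether a chunk of data consists only of 0xFF bytes. It works on ordinary files with made-up sample contents, not on real NAND dumps, and it only looks at data bytes. The OOB/ECC area, where the actual problem described by the FAQ lives, is not visible this way. The function name is_blank is hypothetical.

```shell
#!/bin/sh
# Succeeds if every byte of the file "$1" is 0xFF
is_blank() {
    # Delete all 0xFF bytes and count what is left
    nonff=$(LC_ALL=C tr -d '\377' < "$1" | wc -c)
    [ "$nonff" -eq 0 ]
}

# Hypothetical sample data standing in for dumped page contents
tmpdir=$(mktemp -d)
printf '\377\377\377\377' > "$tmpdir/erased.bin"
printf '\377\377\376\377' > "$tmpdir/flipped.bin"   # one bit cleared in the third byte

for f in erased.bin flipped.bin; do
    if is_blank "$tmpdir/$f"; then
        echo "$f: blank"
    else
        echo "$f: not blank"
    fi
done

rm -rf "$tmpdir"
```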
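4. Analysis
The mkfs.ubifs source code that implements -F can be found in the patch [1/1] mkfs.ubifs: add "-F" option for "free-space fixup". The kernel side of UBIFS_FLG_SPACE_FIXUP handling is walked through in the code below: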
c
/* fs/ubifs/super.c */
ubifs_mount()
    ubifs_fill_super()
        mount_ubifs()

static int mount_ubifs(struct ubifs_info *c)
{
    ...
    err = ubifs_read_superblock(c); /* (1) sets @c->space_fixup */
    ...
    /* Not a read-only mount && the superblock of the rootfs image has UBIFS_FLG_SPACE_FIXUP set */
    if (!c->ro_mount && c->space_fixup) {
        /* (2) On the first boot after flashing the ubi rootfs image, fix up the LEBs */
        err = ubifs_fixup_free_space(c);
    }
    ...
}
c
/* (1) sets @c->space_fixup */
int ubifs_read_superblock(struct ubifs_info *c)
{
    ...
    /* Set @c->space_fixup according to UBIFS_FLG_SPACE_FIXUP */
    c->space_fixup = !!(sup_flags & UBIFS_FLG_SPACE_FIXUP);
    ...
}

/* (2) On the first boot after flashing the ubi rootfs image, fix up the LEBs */
int ubifs_fixup_free_space(struct ubifs_info *c)
{
    int err;
    struct ubifs_sb_node *sup;

    ubifs_assert(c, c->space_fixup);
    ubifs_assert(c, !c->ro_mount);

    ubifs_msg(c, "start fixing up free space");

    err = fixup_free_space(c); /* first boot only: fix up the LEBs (Logical Eraseblocks) */
    if (err)
        return err;

    /* Read the current rootfs superblock */
    sup = ubifs_read_sb_node(c);
    if (IS_ERR(sup))
        return PTR_ERR(sup);

    /* Free-space fixup is no longer required */
    /* Clear the UBIFS_FLG_SPACE_FIXUP flag of the rootfs superblock */
    c->space_fixup = 0;
    sup->flags &= cpu_to_le32(~UBIFS_FLG_SPACE_FIXUP);

    /*
     * Write back the superblock with UBIFS_FLG_SPACE_FIXUP cleared,
     * so that the fixup is not run again on the next boot.
     */
    err = ubifs_write_sb_node(c, sup);
    kfree(sup);
    if (err)
        return err;
    ...
    ubifs_msg(c, "free space fixup complete");
    return err;
}
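The key part above is the call to fixup_free_space(), which performs the actual fixup. Readers interested in the details can read the relevant source code. Note also that the fixup happens only during the first boot after the ubi rootfs is flashed (and only if the mount is not read-only). Once it has run, the UBIFS_FLG_SPACE_FIXUP flag of the ubi rootfs is cleared, so the fixup will not run again until the ubi rootfs is re-flashed.
The kernel log after the fix is shown below (relevant parts only). The bit-flip ECC errors no longer appear: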
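The one-shot behaviour described above can be modelled with a few lines of shell arithmetic. This is only a toy model of the flag logic, not kernel code: mount_once, sup_flags and fixup_runs are hypothetical names, and the flag value 0x04 is the one defined for UBIFS_FLG_SPACE_FIXUP in the kernel's fs/ubifs/ubifs-media.h.

```shell
#!/bin/sh
# Value of UBIFS_FLG_SPACE_FIXUP in fs/ubifs/ubifs-media.h
UBIFS_FLG_SPACE_FIXUP=$((0x04))

# Hypothetical on-flash superblock flags of a freshly flashed image built with -F
sup_flags=$UBIFS_FLG_SPACE_FIXUP
fixup_runs=0

# Toy model of the check in mount_ubifs(); $1 is "rw" or "ro"
mount_once() {
    if [ "$1" = "rw" ] && [ $((sup_flags & UBIFS_FLG_SPACE_FIXUP)) -ne 0 ]; then
        fixup_runs=$((fixup_runs + 1))                     # stands in for fixup_free_space()
        sup_flags=$((sup_flags & ~UBIFS_FLG_SPACE_FIXUP))  # flag cleared and written back
    fi
}

mount_once ro   # read-only mount: fixup is skipped, flag stays set
mount_once rw   # first read-write mount after flashing: fixup runs
mount_once rw   # later mounts: flag already cleared, nothing to do

echo "fixup ran $fixup_runs time(s), flags now $sup_flags"
```

In this model the fixup runs exactly once, on the first read-write mount, which mirrors the `!c->ro_mount && c->space_fixup` condition in the kernel code.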
bash
[ 1.705300] omap-gpmc 50000000.gpmc: GPMC revision 6.0
[ 1.710488] gpmc_mem_init: disabling cs 0 mapped at 0x0-0x1000000
[ 1.718410] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[ 1.724923] nand: Micron MT29F2G08AAD
[ 1.728603] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[ 1.736266] nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
[ 1.741704] 11 fixed-partitions partitions found on MTD device omap2-nand.0
[ 1.748720] Creating 11 MTD partitions on "omap2-nand.0":
[ 1.754160] 0x000000000000-0x000000020000 : "NAND.SPL"
[ 1.760419] 0x000000020000-0x000000040000 : "NAND.SPL.backup1"
[ 1.767263] 0x000000040000-0x000000060000 : "NAND.SPL.backup2"
[ 1.774018] 0x000000060000-0x000000080000 : "NAND.SPL.backup3"
[ 1.780714] 0x000000080000-0x0000000c0000 : "NAND.u-boot-spl-os"
[ 1.787773] 0x0000000c0000-0x0000001c0000 : "NAND.u-boot"
[ 1.794897] 0x0000001c0000-0x0000001e0000 : "NAND.u-boot-env"
[ 1.801508] 0x0000001e0000-0x000000200000 : "NAND.u-boot-env.backup1"
[ 1.808887] 0x000000200000-0x000000a00000 : "NAND.kernel"
[ 1.822567] 0x000000a00000-0x00000e000000 : "NAND.rootfs"
[ 2.028535] 0x00000e000000-0x000010000000 : "NAND.userdata"
......
[ 2.162337] ubi0: attaching mtd9
[ 2.817470] ubi0: scanning is finished
[ 2.841173] ubi0: volume 0 ("rootfs") re-sized from 83 to 1668 LEBs
[ 2.848229] ubi0: attached mtd9 (name "NAND.rootfs", size 214 MiB)
[ 2.854507] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[ 2.861411] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 512
[ 2.868154] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[ 2.875156] ubi0: good PEBs: 1711, bad PEBs: 1, corrupted PEBs: 0
[ 2.881273] ubi0: user volume: 1, internal volumes: 1, max. volumes count: 128
[ 2.888538] ubi0: max/mean erase counter: 1/0, WL threshold: 4096, image sequence number: 1188901366
[ 2.897719] ubi0: available PEBs: 0, total reserved PEBs: 1711, PEBs reserved for bad PEB handling: 39
[ 2.907084] ubi0: background thread "ubi_bgt0d" started, PID 65
......
[ 2.973019] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 66
[ 3.033637] UBIFS (ubi0:0): start fixing up free space
[ 5.726825] UBIFS (ubi0:0): free space fixup complete
[ 5.753860] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0, name "rootfs"
[ 5.761304] UBIFS (ubi0:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[ 5.792971] UBIFS (ubi0:0): FS size: 210399232 bytes (200 MiB, 1657 LEBs), journal size 9023488 bytes (8 MiB, 72 LEBs)
[ 5.812884] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
[ 5.818746] UBIFS (ubi0:0): media format: w4/r0 (latest is w5/r0), UUID 150447CF-F29A-405C-8394-C7D00D7C2315, small LPT model
[ 5.855066] VFS: Mounted root (ubifs filesystem) on device 0:14.
[ 5.862233] devtmpfs: mounted
In this log, the lines
bash
[ 3.033637] UBIFS (ubi0:0): start fixing up free space
[ 5.726825] UBIFS (ubi0:0): free space fixup complete
are the messages printed by the fixup procedure, which took about 2.7 seconds on this board.
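The cost of the one-time fixup can be read off the two timestamps. As a small sketch, the script below extracts them from the two log lines quoted above (embedded here as a string; on a real target one would presumably feed it the output of dmesg instead) and prints the difference.

```shell
#!/bin/sh
# The two fixup messages copied from the log above
log='[ 3.033637] UBIFS (ubi0:0): start fixing up free space
[ 5.726825] UBIFS (ubi0:0): free space fixup complete'

# Strip the brackets so the timestamp becomes the first field, then subtract
fixup_seconds=$(printf '%s\n' "$log" | tr -d '[]' | awk '
    /start fixing up free space/ { t0 = $1 }
    /free space fixup complete/  { t1 = $1 }
    END { printf "%.3f\n", t1 - t0 }')

echo "free space fixup took ${fixup_seconds} s"
```

This extra boot time is paid only on the first read-write mount after flashing.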
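This article follows on from the previous one, Linux: ubi rootfs 故障案例 (1) (ubi rootfs failure case, part 1). Interested readers may want to read it as well.
5. References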
[1] What is the the purpose of the -F (--space-fixup) mkfs.ubifs option?
[2] http://www.linux-mtd.infradead.org/doc/ubifs.html#L_source