[Kylin Linux Advanced Server] Analyzing and Handling I/O Errors on Externally Attached Server Storage

Server Environment and Configuration

| Category | Item | Value |
|------|------|------|
| System environment | Physical machine / VM / cloud / container | Physical machine |
| Network environment | Public network / private network / no network | Private network |
| Hardware | Processor | S2500 |
| Hardware | Memory | 512 GB |
| Hardware | Machine model | 擎天EF860 |
| Hardware | Machine type / architecture | arm64 |
| Hardware | BIOS version | Great Wall BIOS V3.0 |
| Software | Operating system version | Kylin Linux Advanced Server release V10 (Lance) |
| Software | Kernel version | 4.19.90-52.30.v2207.ky10.aarch64 |

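Symptoms

A routine server inspection report showed I/O errors that needed to be analyzed.

Analysis

Checking the disk storage

According to the serial console log, the devices reporting I/O errors are dm-4 and dm-5, which correspond to the two multipath disks /dev/mpathxsky02blk01 and /dev/mpathxsky02blk02.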
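
The dm-N to multipath-name mapping described above can be cross-checked on a live system through sysfs. The following is a minimal sketch, assuming a Linux host where each device-mapper device exposes its name in /sys/block/dm-N/dm/name; the function name dm_names and its sys_block parameter are illustrative and not part of the original analysis.

```python
from pathlib import Path


def dm_names(sys_block: str = "/sys/block") -> dict:
    """Map each dm-N kernel name to its device-mapper name (for example a multipath alias)."""
    mapping = {}
    for dev in sorted(Path(sys_block).glob("dm-*")):
        name_file = dev / "dm" / "name"
        # Skip entries that do not expose a device-mapper name.
        if name_file.is_file():
            mapping[dev.name] = name_file.read_text().strip()
    return mapping


if __name__ == "__main__":
    for kernel_name, dm_name in dm_names().items():
        print(f"{kernel_name} -> {dm_name}")
```

The sys_block parameter exists only so the function can be pointed at a copied sysfs tree (for instance one extracted from a diagnostic archive) instead of the running system.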

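Analyzing the serial console log

The serial console log shows three separate occurrences of I/O error messages. The first led to a hung task; each of the other two was followed by shutdown-related log output.

Analyzing the first I/O error

The log repeatedly shows print_req_error: I/O error, dev sdb and print_req_error: I/O error, dev sdc, which means that I/O requests to sdb and sdc failed. It also contains rejecting I/O to offline device messages (for example, sd 3:0:0:1: rejecting I/O to offline device), which normally mean that the device has gone offline and can no longer service I/O. Possible causes include a disk failure, a connection problem (for example, a faulty cable), or a controller problem.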
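
To get a quick overview of which devices are affected, the two message types named above can be counted per device. This is a rough sketch in Python, assuming the console log is available as plain text; the regular expressions match only the message formats quoted in this article, and the names summarize and SAMPLE are illustrative.

```python
import re
from collections import Counter

REQ_ERROR = re.compile(r"print_req_error: I/O error, dev ([\w-]+), sector (\d+)")
OFFLINE = re.compile(r"sd (\d+:\d+:\d+:\d+): rejecting I/O to offline device")


def summarize(log_text: str):
    """Return (I/O error count per device, set of SCSI addresses reported offline)."""
    errors = Counter()
    offline = set()
    # str.splitlines() also splits on bare carriage returns, which serial captures often contain.
    for line in log_text.splitlines():
        m = REQ_ERROR.search(line)
        if m:
            errors[m.group(1)] += 1
            continue
        m = OFFLINE.search(line)
        if m:
            offline.add(m.group(1))
    return errors, offline


# A few lines taken from the first incident in this article.
SAMPLE = """\
30627.418715 86 sd 3:0:0:1: rejecting I/O to offline device
30627.420021 86 print_req_error: I/O error, dev sdc, sector 209772368
30627.421253 86 print_req_error: I/O error, dev sdc, sector 209772376
30627.438674 86 sd 3:0:0:0: rejecting I/O to offline device
30627.439942 86 print_req_error: I/O error, dev sdb, sector 1745360
"""

if __name__ == "__main__":
    errors, offline = summarize(SAMPLE)
    print(dict(errors), sorted(offline))
```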

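Several tasks, such as xfsaild/dm-4 and containerd, are reported as blocked for more than 1200 seconds. They were stuck waiting for disk I/O that could never complete because the devices were offline or unreachable. These hung tasks eventually caused the kernel to panic (Kernel panic - not syncing: hung_task: blocked tasks).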
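
The hung-task reports can be extracted from the log in the same way. The sketch below is illustrative only: it assumes the message format shown in this article, and the helper names are made up for the example. Note that the kernel panics on a hung task only when the kernel.hung_task_panic sysctl is enabled, which the panic message here suggests was the case on this machine.

```python
import re

HUNG = re.compile(r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds")
PANIC_MARKER = "Kernel panic - not syncing: hung_task: blocked tasks"


def hung_tasks(log_text: str):
    """Return (task name, pid, blocked seconds) for each hung-task report."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in HUNG.finditer(log_text)]


def panicked_on_hung_task(log_text: str) -> bool:
    """True if the log contains the hung-task panic message."""
    return PANIC_MARKER in log_text


# Lines taken from the first incident in this article.
SAMPLE = """\
31871.417820 4 INFO: task xfsaild/dm-4:402347 blocked for more than 1200 seconds.
31871.431339 4 INFO: task containerd:409512 blocked for more than 1200 seconds.
31871.450841 4 Kernel panic - not syncing: hung_task: blocked tasks
"""

if __name__ == "__main__":
    for name, pid, seconds in hung_tasks(SAMPLE):
        print(f"{name} (pid {pid}) blocked for more than {seconds} s")
    print("panic on hung task:", panicked_on_hung_task(SAMPLE))
```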

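Judging by the log, sdb and sdc correspond to sd 3:0:0:0 and sd 3:0:0:1. The lsscsi output in the sosreport collected for this analysis no longer lists any device at 3:0:0:0 or 3:0:0:1, because a long time has passed since the incident; the disks now appear as 5:0:0:0 and 5:0:0:1. The rejecting I/O to offline device message has not appeared again since, so this problem has most likely been fixed.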
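
On a running system, the current relationship between SCSI addresses and sdX names, together with each device's state, can also be read from sysfs. This is a minimal sketch, assuming the usual Linux layout /sys/class/scsi_device/H:C:T:L/device/block/sdX and /sys/class/scsi_device/H:C:T:L/device/state; the function name scsi_disks and its root parameter are hypothetical.

```python
from pathlib import Path


def scsi_disks(root: str = "/sys/class/scsi_device"):
    """Yield (H:C:T:L address, block device name, SCSI device state) tuples."""
    for entry in sorted(Path(root).glob("*:*:*:*")):
        device = entry / "device"
        block_dir = device / "block"
        state_file = device / "state"
        # Non-disk SCSI devices have no block/ directory and are skipped.
        names = sorted(p.name for p in block_dir.iterdir()) if block_dir.is_dir() else []
        state = state_file.read_text().strip() if state_file.is_file() else "unknown"
        for name in names:
            yield entry.name, name, state


if __name__ == "__main__":
    for address, name, state in scsi_disks():
        print(f"{address}  {name}  {state}")
```

A state other than running (for example offline) for a path device would be consistent with the rejecting I/O to offline device messages discussed above.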

    30551.689416 86 print_req_error: I/O error, dev sdb, sector 278120
    30551.690643 86 print_req_error: I/O error, dev sdb, sector 526730880
    30551.691874 86 print_req_error: I/O error, dev sdb, sector 794630784
    30551.693092 86 print_req_error: I/O error, dev sdb, sector 3239552
    30601.823314 24 print_req_error: I/O error, dev sdb, sector 267177832
    30622.174613 23 print_req_error: I/O error, dev sdb, sector 267177832
    30627.418715 86 sd 3:0:0:1: rejecting I/O to offline device
    30627.420021 86 print_req_error: I/O error, dev sdc, sector 209772368
    30627.421253 86 print_req_error: I/O error, dev sdc, sector 209772376
    30627.438674 86 sd 3:0:0:0: rejecting I/O to offline device
    30627.439942 86 print_req_error: I/O error, dev sdb, sector 1745360
    30627.441180 86 print_req_error: I/O error, dev sdb, sector 788207504
    30627.442345 86 print_req_error: I/O error, dev sdb, sector 526731600
    30627.443517 86 print_req_error: I/O error, dev sdb, sector 263186528
    30627.444683 86 print_req_error: I/O error, dev sdb, sector 3239552
    31446.409648 28 print_req_error: I/O error, dev sdb, sector 267177832
    31467.420538 28 print_req_error: I/O error, dev sdb, sector 267177832
    31476.890512 61 print_req_error: I/O error, dev sdc, sector 209772368
    31476.891710 61 print_req_error: I/O error, dev sdc, sector 209772376
    31482.488033 84 sd 3:0:0:1: rejecting I/O to offline device
    31482.489254 84 print_req_error: I/O error, dev sdc, sector 209772392
    31482.490507 84 print_req_error: I/O error, dev sdc, sector 209772384
    31482.491733 84 print_req_error: I/O error, dev sdc, sector 209772400
    31482.492918 84 print_req_error: I/O error, dev sdc, sector 209772408
    31482.494051 84 sd 3:0:0:0: rejecting I/O to offline device
    31482.495054 84 print_req_error: I/O error, dev sdb, sector 270107808
    31482.496109 84 print_req_error: I/O error, dev sdb, sector 270106784
    31482.497193 84 print_req_error: I/O error, dev sdb, sector 788207504
    31871.417820 4 INFO: task xfsaild/dm-4:402347 blocked for more than 1200 seconds.
    31871.419496 4 Tainted: G OE 4.19.90-52.30.v2207.ky10.aarch64 #1
    31871.421055 4 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    31871.422705 4 xfsaild/dm-4 D 0 402347 2 0x00000628
    31871.423726 4 Call trace:
    31871.424436 4 __switch_to+0xe8/0x150
    31871.425293 4 __schedule+0x2b0/0x768
    31871.426152 4 schedule+0x30/0xf0
    31871.426917 4 xfs_log_force+0x170/0x358
    31871.427809 4 xfsaild_push+0x5a8/0x6c0
    31871.428766 4 xfsaild+0x11c/0x238
    31871.429627 4 kthread+0x134/0x138
    31871.430447 4 ret_from_fork+0x10/0x18
    31871.431339 4 INFO: task containerd:409512 blocked for more than 1200 seconds.
    31871.433064 4 Tainted: G OE 4.19.90-52.30.v2207.ky10.aarch64 #1
    31871.434889 4 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    31871.436710 4 containerd D 0 409512 1 0x00000608
    31871.437906 4 Call trace:
    31871.438822 4 __switch_to+0xe8/0x150
    31871.439785 4 __schedule+0x2b0/0x768
    31871.440741 4 schedule+0x30/0xf0
    31871.441658 4 io_schedule+0x20/0x90
    31871.442570 4 wait_on_page_bit+0x134/0x178
    31871.443548 4 __filemap_fdatawait_range+0xd0/0x120
    31871.444618 4 file_write_and_wait_range+0xb0/0xd8
    31871.445655 4 xfs_file_fsync+0x58/0x1d8
    31871.446584 4 vfs_fsync_range+0x4c/0x90
    31871.447743 4 do_fsync+0x48/0x78
    31871.448623 4 sys_fdatasync+0x24/0x38
    31871.449503 4 __sys_trace_return+0x0/0x4
    31871.450841 4 Kernel panic - not syncing: hung_task: blocked tasks
    31871.451999 4 CPU: 4 PID: 748 Comm: khungtaskd Tainted: G OE 4.19.90-52.30.v2207.ky10.aarch64 #1
    31871.453937 4 Source Version: ccdbfc2c55f0fb8dde14fae29155446fc8a7e941
    31871.455059 4 Hardware name: GreatWall 擎天EF860/GW-748F2A-FTG, BIOS Great Wall BIOS V3.0 2022-11-17
    31871.457098 4 Call trace:
    31871.457908 4 dump_backtrace+0x0/0x1b0
    31871.458911 4 show_stack+0x24/0x30
    31871.459845 4 dump_stack+0xb4/0xf0
    31871.460748 4 panic+0x130/0x310
    31871.461654 4 watchdog+0x2b8/0x468
    31871.462536 4 kthread+0x134/0x138
    31871.463337 4 ret_from_fork+0x10/0x18
    31871.464354 4 SMP: stopping secondary CPUs
    31871.584509 4 Kernel Offset: 0x37db061e0000 from 0xffff000048000000
    31871.585577 4 CPU features: 0x10,a0000008
    31871.586428 4 Memory Limit: none
    31871.587456 4 Rebooting in 10 seconds..

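Analyzing the second I/O error

The log first shows several print_req_error: I/O error, dev dm-N, sector xxxx messages for the dm-4, dm-5 and dm-6 devices, which means that I/O failed at the device-mapper layer. These are followed by XFS (dm-4): writeback error on sector xxxx messages: XFS reports both writeback errors and metadata I/O errors, the latter in xlog_iodone, that is, while completing log writes. The errors hit different sectors and affect both data and metadata. The error code is 5, which is EIO (input/output error) and means that the underlying storage device could not complete the request.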
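
The error code and the daddr field in these XFS messages can be decoded with a few lines of Python. This is a small illustrative sketch, assuming a Linux errno table and that XFS disk addresses (daddr) and the len field are counted in 512-byte basic blocks; the helper names are made up for the example.

```python
import errno
import os

BASIC_BLOCK_SIZE = 512  # assumed size of one XFS basic block, in bytes


def describe_errno(code: int) -> str:
    """Return the symbolic name and system description for an errno value."""
    return f"{errno.errorcode.get(code, 'UNKNOWN')}: {os.strerror(code)}"


def daddr_to_bytes(daddr: int) -> int:
    """Convert a disk address in basic blocks to a byte offset on the device."""
    return daddr * BASIC_BLOCK_SIZE


if __name__ == "__main__":
    # Values from: metadata I/O error in "xlog_iodone" at daddr 0xc8086b0 len 64 error 5
    print(describe_errno(5))
    print("byte offset:", daddr_to_bytes(0xC8086B0))
    print("length in bytes:", 64 * BASIC_BLOCK_SIZE)
```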

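Shortly afterwards the log shows shutdown messages and reboot: Restarting system, which suggests that a shutdown had been initiated beforehand. Although the I/O error messages appear in the log before the shutdown messages, a shutdown is not logged the moment it starts. The likely sequence is that the shutdown took the multipath storage devices offline, the filesystems could then no longer write, and the I/O errors followed.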
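
The interval between the first I/O error and the first shutdown message can be measured directly from the timestamps. The sketch below is only an illustration under stated assumptions: each line starts with a timestamp column followed by a whitespace-separated CPU column, as in the second incident's log, and the helper first_timestamp is hypothetical. A short gap is consistent with the explanation above but does not by itself prove which event came first.

```python
import re

# timestamp, CPU number, then the message text
LINE = re.compile(r"^\s*(\d+\.\d+)\s+\d+\s+(.*)$")


def first_timestamp(log_text: str, needle: str):
    """Return the timestamp of the first line whose message contains needle, or None."""
    for line in log_text.splitlines():
        m = LINE.match(line)
        if m and needle in m.group(2):
            return float(m.group(1))
    return None


# Lines taken from the second incident in this article.
SAMPLE = """\
4931293.526231 71 print_req_error: I/O error, dev dm-4, sector 5428128
4931293.539266 66 XFS (dm-4): Log I/O Error Detected. Shutting down filesystem
4931314.007532 42 shutdown: 9 output lines suppressed due to ratelimiting
4931328.960882 0 reboot: Restarting system
"""

if __name__ == "__main__":
    t_error = first_timestamp(SAMPLE, "I/O error")
    t_shutdown = first_timestamp(SAMPLE, "shutdown:")
    t_reboot = first_timestamp(SAMPLE, "reboot: Restarting system")
    print(f"first I/O error -> first shutdown message: {t_shutdown - t_error:.3f} s")
    print(f"first I/O error -> reboot: {t_reboot - t_error:.3f} s")
```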

    4931293.526231 71 print_req_error: I/O error, dev dm-4, sector 5428128
    4931293.526436 70 print_req_error: I/O error, dev dm-5, sector 15354992
    4931293.527010 71 XFS (dm-4): writeback error on sector 5428240
    4931293.527704 70 XFS (dm-5): writeback error on sector 15355120
    4931293.528811 70 print_req_error: I/O error, dev dm-5, sector 16527680
    4931293.529516 70 XFS (dm-5): writeback error on sector 16527808
    4931293.530124 70 print_req_error: I/O error, dev dm-5, sector 24740872
    4931293.530762 70 print_req_error: I/O error, dev dm-5, sector 27572552
    4931293.531550 70 XFS (dm-5): writeback error on sector 272373048
    4931293.532123 70 XFS (dm-5): writeback error on sector 276753512
    4931293.532702 70 XFS (dm-5): writeback error on sector 531646336
    4931293.533384 70 XFS (dm-5): writeback error on sector 24740992
    4931293.533961 70 XFS (dm-5): writeback error on sector 27572656
    4931293.534656 70 XFS (dm-5): writeback error on sector 288670704
    4931293.535240 70 XFS (dm-5): writeback error on sector 288670720
    4931293.538236 66 XFS (dm-4): metadata I/O error in "xlog_iodone" at daddr 0xc8086b0 len 64 error 5
    4931293.539266 66 XFS (dm-4): Log I/O Error Detected. Shutting down filesystem
    4931293.539586 50 XFS (dm-5): metadata I/O error in "xlog_iodone" at daddr 0x1f402070 len 64 error 5
    4931293.540003 50 XFS (dm-4): Please umount the filesystem and rectify the problem(s)
    4931293.541683 50 XFS (dm-5): Log I/O Error Detected. Shutting down filesystem
    4931293.542370 50 XFS (dm-5): Please umount the filesystem and rectify the problem(s)
    4931294.409340 99 XFS (dm-6): metadata I/O error in "xlog_iodone" at daddr 0x1f45ad90 len 64 error 5
    4931314.007532 42 shutdown: 9 output lines suppressed due to ratelimiting
    4931314.118056 47 dracut Warning: Killing all remaining processes
    4931314.508042 47 dracut Warning: Unmounted /oldroot.
    4931315.028993 75 kauditd_printk_skb: 8 callbacks suppressed
    4931328.960882 0 reboot: Restarting system

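Analyzing the third I/O error

This occurrence follows the same pattern as the second: the I/O errors are followed almost immediately by shutdown-related log messages.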

    2001517.911050 102 print_req_error: I/O error, dev dm-6, sector 62984704
    2001517.911712 100 print_req_error: I/O error, dev dm-5, sector 218187936
    2001517.912383 102 XFS (dm-6): writeback error on sector 62984832
    2001517.913028 100 print_req_error: I/O error, dev dm-5, sector 216240008
    2001517.913138 100 print_req_error: I/O error, dev dm-5, sector 6241168
    2001517.913144 100 print_req_error: I/O error, dev dm-5, sector 7116376
    2001517.913166 100 XFS (dm-5): writeback error on sector 6241240
    2001517.913170 100 XFS (dm-5): writeback error on sector 7116472
    2001517.913174 100 XFS (dm-5): writeback error on sector 7116568
    2001517.913229 100 print_req_error: I/O error, dev dm-5, sector 6772344
    2001517.913254 100 XFS (dm-5): writeback error on sector 6772472
    2001517.913258 100 XFS (dm-5): writeback error on sector 6772520
    2001517.913263 100 XFS (dm-5): writeback error on sector 338549088
    2001517.913392 100 XFS (dm-5): writeback error on sector 110130576
    2001517.913398 100 XFS (dm-5): writeback error on sector 110130584
    2001517.923828 101 XFS (dm-5): metadata I/O error in "xlog_iodone" at daddr 0xc82f840 len 64 error 5
    2001517.925072 101 XFS (dm-5): Log I/O Error Detected. Shutting down filesystem
    2001517.925237 103 XFS (dm-6): metadata I/O error in "xlog_iodone" at daddr 0x1f44c278 len 64 error 5
    2001517.925257 101 XFS (dm-6): metadata I/O error in "xlog_iodone" at daddr 0x1f44c2b8 len 64 error 5
    2001517.925419 103 XFS (dm-6): Log I/O Error Detected. Shutting down filesystem
    2001517.925421 103 XFS (dm-6): Please umount the filesystem and rectify the problem(s)
    2001517.925840 103 XFS (dm-5): Please umount the filesystem and rectify the problem(s)
    2001518.084461 59 overlayfs: failed to get xattr trusted.overlay.redirect: err=-5)
    2001518.134555 29 overlayfs: failed to get xattr trusted.overlay.redirect: err=-5)
    2001518.134679 8 overlayfs: failed to get metacopy (-5)
    2001518.135513 29 overlayfs: failed to get xattr trusted.overlay.redirect: err=-5)
    2001518.137716 17 overlayfs: failed to get xattr trusted.overlay.redirect: err=-5)
    2001518.138605 17 overlayfs: failed to get xattr trusted.overlay.redirect: err=-5)
    2001518.141203 71 overlayfs: failed to get metacopy (-5)
    2001518.143645 54 overlayfs: failed to get metacopy (-5)
    2001518.144968 95 overlayfs: failed to get metacopy (-5)
    2001518.145177 46 overlayfs: failed to get xattr trusted.overlay.redirect: err=-5)
    2001518.146489 46 overlayfs: failed to get xattr trusted.overlay.redirect: err=-5)
    2001518.148168 72 overlayfs: failed to get metacopy (-5)
    2001518.148170 34 overlayfs: failed to get metacopy (-5)
    2001518.148259 34 overlayfs: failed to get metacopy (-5)
    2001518.150749 96 overlayfs: failed to get metacopy (-5)
    2001518.152389 116 overlayfs: failed to get metacopy (-5)
    2001528.063127 97 systemd-shutdown1: Waiting for process: local-path-prov
    2001600.084921 123 kauditd_printk_skb: 23 callbacks suppressed
    2001613.949707 99 shutdown: 7 output lines suppressed due to ratelimiting
    2001614.020310 100 dracut Warning: Killing all remaining processes
    2001614.364201 100 dracut Warning: Unmounted /oldroot.
    2001628.676636 0 reboot: Restarting system

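tuned.log records two shutdown operations, which should correspond to these two I/O error occurrences.

Conclusions

For the first occurrence, the log shows rejecting I/O to offline device messages. These normally mean that the device had gone offline and could no longer service I/O, which is what produced the I/O errors. The log entries are old and the message has not recurred, so this problem has most likely already been resolved.

The second and third occurrences are accompanied by shutdown-related log messages. The shutdown and reboot entries indicate that a shutdown was most likely initiated before the I/O errors appeared. A shutdown normally triggers data writeback and filesystem unmounts, but because the multipath storage devices had already gone offline, XFS could not write back its log or persist other data. The XFS writeback error and metadata I/O error messages show exactly this: writeback failed, and the log I/O completed in xlog_iodone returned an error, so metadata could not be written back. In short, XFS was trying to write to physical storage that was offline or unreachable at that point, so the operations could not complete.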
