On Linux a disk cannot be read or written, but the server raises no disk alert: identifying the faulty disk

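Collecting FFDC logs with onecli:

1. Copy the onecli file to a local temporary directory (for example /tmp);

2. Make sure the onecli file is executable: chmod +x lnvgy_utl_lxceb_onecli01n-3.2.0_rhel_x86-64.bin;

3. Run the onecli tool (as the root user): ./lnvgy_utl_lxceb_onecli01n-3.2.0_rhel_x86-64.bin;

4. After a while, onecli saves the collected information under the /var/log/Lenovo_Support directory.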
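The copy, chmod, and run steps can be wrapped in a small helper. This is only a sketch: the function name run_onecli and its second argument (the working directory) are hypothetical, and the warning about root simply reflects the requirement stated in step 3.

```shell
#!/usr/bin/env bash
# Sketch: stage a OneCLI package in a working directory and run it there.
# run_onecli is a hypothetical helper name, not part of OneCLI itself.
run_onecli() {
    local pkg="$1"
    local workdir="${2:-/tmp}"
    local dest

    [ -f "$pkg" ] || { echo "package not found: $pkg" >&2; return 1; }
    dest="$workdir/$(basename "$pkg")"

    # The tool is expected to run as root; only warn so the helper stays testable.
    [ "$(id -u)" -eq 0 ] || echo "warning: not running as root" >&2

    # Step 1: copy the package unless it already sits in the working directory.
    [ "$pkg" -ef "$dest" ] || cp "$pkg" "$dest" || return 1

    # Step 2: make it executable.
    chmod +x "$dest" || return 1

    # Step 3: run it from the working directory.
    ( cd "$workdir" && "./$(basename "$pkg")" )
}
```

For example, run_onecli ./lnvgy_utl_lxceb_onecli01n-3.2.0_rhel_x86-64.bin /tmp stages the package in /tmp and starts it there; the collected data should then appear under /var/log/Lenovo_Support as described in step 4.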

Collecting RAID logs with storcli:

1. Extract the storcli package into the /tmp directory on Linux

2. Install it: rpm -Uvh /tmp/Linux/storcli-*

3. Run the following shell commands

>>>>>>>>>>>>>>>>>>

/opt/MegaRAID/storcli/storcli64 /c0 show all > show_all.txt

/opt/MegaRAID/storcli/storcli64 /c0 show events file=mr_events0.log

(Note: provide the "mr_events0.log" file once this command has finished.)

/opt/MegaRAID/storcli/storcli64 /c0 show termlog > termlog.txt

/opt/MegaRAID/storcli/storcli64 /c0/bbu show all > bbu_show_all.txt

/opt/MegaRAID/storcli/storcli64 /c0/cv show all > cv_show_all.txt

/opt/MegaRAID/storcli/storcli64 /c0/dall show all > dall_show_all.txt

/opt/MegaRAID/storcli/storcli64 /c0/eall show all > eall_show_all.txt

/opt/MegaRAID/storcli/storcli64 /c0/eall/sall show all > sall_show_all.txt

/opt/MegaRAID/storcli/storcli64 /c0/vall show all > vall_show_all.txt
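The commands above can be run in one pass and gathered into a single directory. The following is a minimal sketch under a few assumptions: the function name collect_storcli_logs, the STORCLI override variable, and the default output directory name are all hypothetical, and the controller is assumed to be /c0 as in the commands above.

```shell
#!/usr/bin/env bash
# Path to storcli; the STORCLI variable is a hypothetical override.
# It should be an absolute path because the events step changes directory.
STORCLI="${STORCLI:-/opt/MegaRAID/storcli/storcli64}"

# Sketch: run the "show all" family of commands for one controller and
# store each output under the file name used in the list above.
collect_storcli_logs() {
    local ctrl="${1:-/c0}"
    local outdir="${2:-storcli_logs}"
    local entry obj file
    mkdir -p "$outdir" || return 1

    # "object|output file" pairs, mirroring the manual commands.
    for entry in \
        "$ctrl|show_all.txt" \
        "$ctrl/bbu|bbu_show_all.txt" \
        "$ctrl/cv|cv_show_all.txt" \
        "$ctrl/dall|dall_show_all.txt" \
        "$ctrl/eall|eall_show_all.txt" \
        "$ctrl/eall/sall|sall_show_all.txt" \
        "$ctrl/vall|vall_show_all.txt"
    do
        obj="${entry%%|*}"
        file="${entry#*|}"
        # A command may fail (for example no BBU present); warn and continue.
        "$STORCLI" "$obj" show all > "$outdir/$file" 2>&1 \
            || echo "warning: '$obj show all' failed" >&2
    done

    "$STORCLI" "$ctrl" show termlog > "$outdir/termlog.txt" 2>&1 \
        || echo "warning: termlog collection failed" >&2

    # storcli writes the events file itself, so run it inside the output directory.
    ( cd "$outdir" && "$STORCLI" "$ctrl" show events file=mr_events0.log ) > /dev/null 2>&1 \
        || echo "warning: events export failed" >&2
}
```

Calling collect_storcli_logs /c0 /tmp/raid_logs leaves all files in /tmp/raid_logs, which can then be archived and handed over together with mr_events0.log.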

Determining the faulty disk:

1: Find the problematic mount point:

    [root@bdp-bat-worker01 ~]# ls /data/
    ls: cannot access /data/disk12: Input/output error
    disk1  disk10  disk11  disk12  disk2  disk3  disk4  disk5  disk6  disk7  disk8  disk9

2: Find the faulty disk (the device mounted at /data/disk12 is /dev/sdl):

    [root@bdp-bat-worker01 ~]# df -Th
    Filesystem              Type      Size Used Avail Use% Mounted on
    devtmpfs                devtmpfs  252G    0  252G   0% /dev
    tmpfs                   tmpfs     252G  96K  252G   1% /dev/shm
    tmpfs                   tmpfs     252G 819M  251G   1% /run
    tmpfs                   tmpfs     252G    0  252G   0% /sys/fs/cgroup
    /dev/mapper/centos-root xfs       450G  97G  354G  22% /
    /dev/sdm2               xfs      1014M 146M  869M  15% /boot
    /dev/sde                xfs       7.3T 6.4T  919G  88% /data/disk5
    /dev/sdl                xfs       7.3T 6.5T  840G  89% /data/disk12
    /dev/sdh                xfs       7.3T 6.4T  908G  88% /data/disk8
    /dev/sdk                xfs       7.3T 6.4T  924G  88% /data/disk11
    /dev/sdb                xfs       7.3T 6.4T  919G  88% /data/disk2
    /dev/sdj                xfs       7.3T 6.4T  923G  88% /data/disk10
    /dev/sdi                xfs       7.3T 6.4T  924G  88% /data/disk9
    /dev/sdg                xfs       7.3T 6.4T  924G  88% /data/disk7
    /dev/sda                xfs       7.3T 6.4T  912G  88% /data/disk1
    /dev/sdm1               vfat      200M  12M  189M   6% /boot/efi
    /dev/sdf                xfs       7.3T 6.4T  924G  88% /data/disk6
    /dev/sdd                xfs       7.3T 6.4T  924G  88% /data/disk4
    /dev/sdc                xfs       7.3T 6.4T  922G  88% /data/disk3
    /dev/mapper/centos-home xfs       438G  41M  438G   1% /home
    tmpfs                   tmpfs      51G    0   51G   0% /run/user/0
    cm_processes            tmpfs     252G  75M  252G   1% /run/cloudera-scm-agent/process
    tmpfs                   tmpfs      51G    0   51G   0% /run/user/1001

3: Get the faulty disk's information. Note that smartctl parses -all as -a followed by -l with the argument l and rejects it; use --all instead:

    [root@bdp-bat-worker01 ~]# smartctl -all /dev/sdl
    smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1127.el7.x86_64] (local build)
    Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

    =======> INVALID ARGUMENT TO -l: l tlog,N[,RANGE], nvmelog,N,SIZE <=======

    Use smartctl -h to get a usage summary

    [root@bdp-bat-worker01 ~]# smartctl --all /dev/sdl
    smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1127.el7.x86_64] (local build)
    Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Vendor:               LENOVO
    Product:              ST8000NM001A X
    Revision:             LCBA
    Compliance:           SPC-5
    User Capacity:        8,001,563,222,016 bytes [8.00 TB]
    Logical block size:   512 bytes
    Physical block size:  4096 bytes
    LU is fully provisioned
    Rotation Rate:        7200 rpm
    Form Factor:          3.5 inches
    Logical Unit id:      0x5000c500de2fe497
    Serial number:        WSD2RTGB0000E1430X0R
    Device type:          disk
    Transport protocol:   SAS (SPL-3)
    Local Time is:        Thu Dec 12 14:18:38 2024 CST
    SMART support is:     Available - device has SMART capability.
    SMART support is:     Enabled
    Temperature Warning:  Enabled

    === START OF READ SMART DATA SECTION ===
    SMART Health Status: OK

    Grown defects during certification <not available>
    Total blocks reassigned during format <not available>
    Total new blocks reassigned = 992
    Power on minutes since format <not available>
    Current Drive Temperature: 40 C
    Drive Trip Temperature: 65 C

    Manufactured in week 20 of year 2021
    Specified cycle count over device lifetime: 50000
    Accumulated start-stop cycles: 33
    Specified load-unload count over device lifetime: 600000
    Accumulated load-unload cycles: 44045
    Elements in grown defect list: 1018

    Error counter log:
               Errors Corrected by           Total   Correction     Gigabytes    Total
                   ECC          rereads/    errors   algorithm      processed    uncorrected
               fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
    read:          0       82         0        82         85     192163.006           1
    write:         0        0       100       100        447      89552.595           0
    verify:        0        2         7         9         40        583.376           0

    Non-medium error count: 0

    No Self-tests have been logged
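Steps 1 to 3 can be chained: list the mount points that fail, map each one to its block device, and pull the fields of interest out of the smartctl output. In the output above the health status reads OK even though the drive reports 992 reassigned blocks, 1018 grown defects, and one uncorrected read error, so those counters are worth extracting explicitly. The sketch below assumes a SAS/SCSI drive whose smartctl output uses the field names shown above; the function names and the output keys are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the /data layout are assumptions.

# Print every entry under a base directory that cannot be listed
# (for example because of "Input/output error").
# A hung disk can block ls; consider wrapping it in a timeout in practice.
find_bad_mounts() {
    local base="${1:-/data}" d
    for d in "$base"/*; do
        [ "$d" = "$base/*" ] && continue   # directory is empty, glob did not expand
        ls "$d" > /dev/null 2>&1 || echo "$d"
    done
}

# Print the device mounted at a mount point, read from /proc/mounts
# (or from another file in the same format, given as the second argument).
device_for_mount() {
    local mp="$1" mounts="${2:-/proc/mounts}"
    awk -v mp="$mp" '$2 == mp { print $1 }' "$mounts"
}

# Read "smartctl --all" output on stdin and print selected fields as key=value.
smart_summary() {
    awk '
        /^Serial number:/                 { print "serial=" $NF }
        /^SMART Health Status:/           { sub(/^SMART Health Status:[ \t]*/, ""); print "health=" $0 }
        /^Total new blocks reassigned/    { print "reassigned_blocks=" $NF }
        /^Elements in grown defect list:/ { print "grown_defects=" $NF }
        /^read:/                          { print "read_uncorrected=" $NF }
    '
}

# Example (needs root and smartmontools):
#   for mp in $(find_bad_mounts /data); do
#       smartctl --all "$(device_for_mount "$mp")" | smart_summary
#   done
```

Run against the output shown above, smart_summary would report the serial WSD2RTGB0000E1430X0R together with health=OK, reassigned_blocks=992, grown_defects=1018, and read_uncorrected=1.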

4: Find the disk's number in the FFDC log, for example by searching for the serial number obtained in step 3.
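The same lookup can also be done against the storcli output collected earlier. The sketch below is an assumption-laden illustration rather than the FFDC procedure itself: it assumes that sall_show_all.txt contains per-drive headers of the form "Drive /cX/eY/sZ ..." followed by an "SN = ..." line, as storcli typically prints, and the function name slot_for_serial is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: map a drive serial number to its controller/enclosure/slot path
# by scanning saved "storcli /c0/eall/sall show all" output.
# Assumes "Drive /cX/eY/sZ" header lines and "SN = <serial>" lines.
slot_for_serial() {
    local serial="$1" file="$2"
    awk -v sn="$serial" '
        # Remember the most recent drive header seen.
        /^Drive \/c[0-9]+(\/e[0-9]+)?\/s[0-9]+/ { drive = $2 }
        # On a matching serial, print the drive path and stop.
        /^SN[ \t]*=/ { if ($NF == sn) { print drive; exit } }
    ' "$file"
}
```

For example, slot_for_serial WSD2RTGB0000E1430X0R sall_show_all.txt prints the drive path whose SN line matches, and prints nothing when the serial is not found.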
