Collecting FFDC logs with onecli:
1. Copy the onecli file to a local temporary directory (for example, /tmp).
2. Make sure the onecli file is executable: chmod +x lnvgy_utl_lxceb_onecli01n-3.2.0_rhel_x86-64.bin
3. Run the onecli tool as the root user: ./lnvgy_utl_lxceb_onecli01n-3.2.0_rhel_x86-64.bin
4. Wait a while; onecli saves the collected information under /var/log/Lenovo_Support.
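The four onecli steps can be wrapped in one shell function. This is a minimal sketch, not a vendor-provided script: the function name run_onecli_collect and its two parameters are hypothetical, and the file name is simply the one used in step 2.

```shell
#!/usr/bin/env bash
# File name taken from step 2; adjust it to the version actually downloaded.
ONECLI_BIN="lnvgy_utl_lxceb_onecli01n-3.2.0_rhel_x86-64.bin"

# Hypothetical helper: run_onecli_collect [source-file] [work-dir]
run_onecli_collect() {
    local src="${1:-$ONECLI_BIN}"
    local workdir="${2:-/tmp}"
    local name
    name="$(basename "$src")"

    if [ ! -f "$src" ]; then
        echo "error: $src not found" >&2
        return 1
    fi
    # Step 3 requires the root user.
    if [ "$(id -u)" -ne 0 ]; then
        echo "error: run this as root" >&2
        return 1
    fi

    # Step 1: copy to the work directory unless the file is already there.
    if [ ! "$src" -ef "$workdir/$name" ]; then
        cp "$src" "$workdir/" || return 1
    fi
    # Step 2: make it executable.
    chmod +x "$workdir/$name" || return 1
    # Step 3: run it from the work directory.
    ( cd "$workdir" && "./$name" )
}
```

Example call: run_onecli_collect ./lnvgy_utl_lxceb_onecli01n-3.2.0_rhel_x86-64.bin /tmp, then check /var/log/Lenovo_Support once the tool has finished.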
Collecting RAID logs with storcli:
1. Extract the storcli package into /tmp on the Linux host.
2. Install it: rpm -Uvh /tmp/Linux/storcli-*
3. Run the following shell commands:
/opt/MegaRAID/storcli/storcli64 /c0 show all > show_all.txt
/opt/MegaRAID/storcli/storcli64 /c0 show events file=mr_events0.log
(Note: provide the "mr_events0.log" file once this command has finished.)
/opt/MegaRAID/storcli/storcli64 /c0 show termlog > termlog.txt
/opt/MegaRAID/storcli/storcli64 /c0/bbu show all > bbu_show_all.txt
/opt/MegaRAID/storcli/storcli64 /c0/cv show all > cv_show_all.txt
/opt/MegaRAID/storcli/storcli64 /c0/dall show all > dall_show_all.txt
/opt/MegaRAID/storcli/storcli64 /c0/eall show all > eall_show_all.txt
/opt/MegaRAID/storcli/storcli64 /c0/eall/sall show all > sall_show_all.txt
/opt/MegaRAID/storcli/storcli64 /c0/vall show all > vall_show_all.txt
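The storcli commands differ only in the object path and the output file, so they can be generated from a short list. The sketch below only reproduces the commands already listed; the function names storcli_log_commands and collect_raid_logs, the controller-index parameter, and the output-directory parameter are hypothetical. Using the controller index in the events file name (mr_events0.log for /c0) is an assumption for controllers other than 0.

```shell
#!/usr/bin/env bash
STORCLI="/opt/MegaRAID/storcli/storcli64"

# Print the collection commands for controller index $1 (default 0).
storcli_log_commands() {
    local c="${1:-0}"
    local obj
    echo "$STORCLI /c$c show all > show_all.txt"
    echo "$STORCLI /c$c show events file=mr_events$c.log"
    echo "$STORCLI /c$c show termlog > termlog.txt"
    for obj in bbu cv dall eall eall/sall vall; do
        # eall/sall maps to sall_show_all.txt, matching the file names in the list.
        echo "$STORCLI /c$c/$obj show all > ${obj##*/}_show_all.txt"
    done
}

# Run the generated commands inside an output directory ($2, default ".").
collect_raid_logs() {
    local outdir="${2:-.}"
    mkdir -p "$outdir" || return 1
    ( cd "$outdir" && storcli_log_commands "${1:-0}" | bash )
}
```

Running storcli_log_commands 0 alone prints the commands without executing them, which makes it easy to review them before calling collect_raid_logs 0 /tmp/raidlogs.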
How to identify a failed disk:
1: Find the mount point that has the problem:
[root@bdp-bat-worker01 ~]# ls /data/
ls: cannot access /data/disk12: Input/output error
disk1 disk10 disk11 disk12 disk2 disk3 disk4 disk5 disk6 disk7 disk8 disk9
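When there are many data directories, the same check can be looped. This is a minimal sketch under the assumption that a failed mount shows up as an entry that ls cannot access, as /data/disk12 does here; the function name find_unreadable_mounts is hypothetical.

```shell
#!/usr/bin/env bash
# Print every entry under a parent directory that ls cannot access
# (for example because of an Input/output error).
find_unreadable_mounts() {
    local parent="${1:-/data}"
    local entry
    for entry in "$parent"/*; do
        # An unmatched glob stays literal; skip it when the directory is empty.
        [ "$entry" = "$parent/*" ] && continue
        if ! ls "$entry" > /dev/null 2>&1; then
            echo "$entry"
        fi
    done
}
```

Example call: find_unreadable_mounts /data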
2: Find the device behind the failing mount point (here /data/disk12 is /dev/sdl):
[root@bdp-bat-worker01 ~]# df -Th
Filesystem              Type      Size  Used Avail Use% Mounted on
devtmpfs                devtmpfs  252G     0  252G   0% /dev
tmpfs                   tmpfs     252G   96K  252G   1% /dev/shm
tmpfs                   tmpfs     252G  819M  251G   1% /run
tmpfs                   tmpfs     252G     0  252G   0% /sys/fs/cgroup
/dev/mapper/centos-root xfs       450G   97G  354G  22% /
/dev/sdm2               xfs      1014M  146M  869M  15% /boot
/dev/sde                xfs       7.3T  6.4T  919G  88% /data/disk5
/dev/sdl                xfs       7.3T  6.5T  840G  89% /data/disk12
/dev/sdh                xfs       7.3T  6.4T  908G  88% /data/disk8
/dev/sdk                xfs       7.3T  6.4T  924G  88% /data/disk11
/dev/sdb                xfs       7.3T  6.4T  919G  88% /data/disk2
/dev/sdj                xfs       7.3T  6.4T  923G  88% /data/disk10
/dev/sdi                xfs       7.3T  6.4T  924G  88% /data/disk9
/dev/sdg                xfs       7.3T  6.4T  924G  88% /data/disk7
/dev/sda                xfs       7.3T  6.4T  912G  88% /data/disk1
/dev/sdm1               vfat      200M   12M  189M   6% /boot/efi
/dev/sdf                xfs       7.3T  6.4T  924G  88% /data/disk6
/dev/sdd                xfs       7.3T  6.4T  924G  88% /data/disk4
/dev/sdc                xfs       7.3T  6.4T  922G  88% /data/disk3
/dev/mapper/centos-home xfs       438G   41M  438G   1% /home
tmpfs                   tmpfs      51G     0   51G   0% /run/user/0
cm_processes            tmpfs     252G   75M  252G   1% /run/cloudera-scm-agent/process
tmpfs                   tmpfs      51G     0   51G   0% /run/user/1001
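The device can also be picked out of the df output automatically instead of by eye. This is a minimal sketch: the function name device_for_mount is hypothetical, and it assumes mount points contain no spaces, since it compares the last whitespace-separated field.

```shell
#!/usr/bin/env bash
# Read `df -Th` output on stdin and print the device mounted on $1.
device_for_mount() {
    # NR > 1 skips the header line; $NF is the "Mounted on" column.
    awk -v m="$1" 'NR > 1 && $NF == m { print $1 }'
}
```

Example call: df -Th | device_for_mount /data/disk12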
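3: Get the failed disk's details with smartctl. Use --all (or -a); the -all form in the first attempt below is parsed as -a -l l and is rejected: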
[root@bdp-bat-worker01 ~]# smartctl -all /dev/sdl
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1127.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=======> INVALID ARGUMENT TO -l: l
tlog,N[,RANGE], nvmelog,N,SIZE <=======
Use smartctl -h to get a usage summary
[root@bdp-bat-worker01 ~]# smartctl --all /dev/sdl
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1127.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: LENOVO
Product: ST8000NM001A X
Revision: LCBA
Compliance: SPC-5
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c500de2fe497
Serial number: WSD2RTGB0000E1430X0R
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Thu Dec 12 14:18:38 2024 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned = 992
Power on minutes since format <not available>
Current Drive Temperature: 40 C
Drive Trip Temperature: 65 C
Manufactured in week 20 of year 2021
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 33
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 44045
Elements in grown defect list: 1018
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       82         0        82         85     192163.006           1
write:         0        0       100       100        447      89552.595           0
verify:        0        2         7         9         40        583.376           0
Non-medium error count: 0
No Self-tests have been logged
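The fields most relevant to a failing disk can be pulled out of this report with a small filter. This is a minimal sketch that assumes the SAS/SCSI output layout shown here (ATA disks print a different attribute table); the function name summarize_smart and the names in its output line are hypothetical.

```shell
#!/usr/bin/env bash
# Read `smartctl --all` output (SAS/SCSI layout) on stdin and print a
# one-line summary of the fields that point at a failing disk.
summarize_smart() {
    awk '
        /^[[:space:]]*Serial number:/                 { serial = $NF }
        /^[[:space:]]*SMART Health Status:/           { health = $NF }
        /^[[:space:]]*Total new blocks reassigned/    { reassigned = $NF }
        /^[[:space:]]*Elements in grown defect list:/ { defects = $NF }
        # In the error counter log the last column is the uncorrected error count.
        /^[[:space:]]*(read|write|verify):/           { uncorrected += $NF }
        END {
            printf "serial=%s health=%s grown_defects=%s reassigned=%s uncorrected=%d\n",
                   serial, health, defects, reassigned, uncorrected
        }'
}
```

Example call: smartctl --all /dev/sdl | summarize_smart. For the report shown here, the health status is OK while the grown defect list holds 1018 elements and one read error is uncorrected, so the health status line should not be read on its own.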
4: Find the disk's index number in the FFDC logs.
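One way to locate the disk in the collected data is to search for the serial number reported by smartctl. This is only a sketch under a loud assumption: it treats the FFDC output under /var/log/Lenovo_Support as plain-text files, which the notes above do not confirm; compressed archives would have to be extracted first. The function name find_serial_in_ffdc is hypothetical.

```shell
#!/usr/bin/env bash
# List files under the FFDC directory that mention a disk serial number.
# Assumption: the collected files are plain text and directly searchable.
find_serial_in_ffdc() {
    local serial="$1"
    local dir="${2:-/var/log/Lenovo_Support}"
    # -r: recurse, -l: print file names only, -F: match the serial literally.
    grep -rlF -- "$serial" "$dir"
}
```

Example call: find_serial_in_ffdc WSD2RTGB0000E1430X0R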