InfiniBand diagnostic tool collection
RPM package: infiniband-diags
Commands provided:
/usr/sbin/check_lft_balance.pl
/usr/sbin/dump_fts
/usr/sbin/dump_lfts.sh
/usr/sbin/dump_mfts.sh
/usr/sbin/ibaddr
/usr/sbin/ibcacheedit
/usr/sbin/ibccconfig
/usr/sbin/ibccquery
/usr/sbin/ibfindnodesusing.pl
/usr/sbin/ibhosts
/usr/sbin/ibidsverify.pl
/usr/sbin/iblinkinfo
/usr/sbin/ibnetdiscover
/usr/sbin/ibnodes
/usr/sbin/ibping
/usr/sbin/ibportstate
/usr/sbin/ibqueryerrors
/usr/sbin/ibroute
/usr/sbin/ibrouters
/usr/sbin/ibstat
/usr/sbin/ibstatus
/usr/sbin/ibswitches
/usr/sbin/ibsysstat
/usr/sbin/ibtracert
/usr/sbin/perfquery
/usr/sbin/saquery
/usr/sbin/sminfo
/usr/sbin/smpdump
/usr/sbin/smpquery
/usr/sbin/vendstat
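| Category | Example tools | Main purpose |
|--------|---------------------------|-----------|
| Status query | ibstat, ibstatus, ibnodes | View device/port status |
| Topology discovery | ibnetdiscover, iblinkinfo | Build the network topology |
| Routing diagnostics | ibroute, ibtracert | Path tracing |
| Performance monitoring | perfquery, ibqueryerrors | Error and performance counters |
| Subnet management | saquery, sminfo | Interact with the SM |
| Port control | ibportstate, ibping | Enable/test ports |
| Advanced scripts | check_lft_balance.pl | Automated diagnostics |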
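⚠️ Note: most tools require root privileges or access to the /dev/infiniband/umad* devices, and they require a subnet manager (opensm or a vendor SM) to be running.
Most of these tools were designed for native InfiniBand (IB); in RoCE (RDMA over Converged Ethernet) environments they either cannot be used directly or have limited functionality.
Tools that do not work on RoCE
Architectural differences between InfiniBand and RoCE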
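| Feature | InfiniBand (IB) | RoCE (v1/v2) |
|---------|---------------------------------|---------------------------------|
| Network layer | Dedicated IB network (LID addressing, subnet manager SM) | Ethernet-based (IP/GID addressing, no SM) |
| Routing/topology discovery | Relies on the subnet manager (SM) to maintain the LFT/MFT | Relies on IP routing (e.g. BGP/OSPF) or layer-2 switching |
| Management protocol | Uses SA (Subnet Administration) and SMP | No SA/SMP; uses standard Ethernet management (e.g. LLDP, SNMP) |
| Device identifier | LID (local ID), GUID | GID (global ID derived from MAC/IP) |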
Therefore, tools that depend on the IB subnet manager (SM), LIDs, or the SMP protocol do not work on RoCE.
| Tool | Reason |
|------------------------------------|----------------------------------------|
| ibnetdiscover | Needs to traverse SMP paths; RoCE switches expose no SMP interface |
| iblinkinfo | Relies on IB link-layer discovery; RoCE links are Ethernet |
| ibroute, ibtracert | Based on LID routing; RoCE uses IP/GID |
| smpquery, smpdump | SMP is an IB-only management protocol |
| ibswitches, ibrouters | RoCE switches do not expose IB node types |
| saquery, sminfo | RoCE has no Subnet Administrator |
| ibportstate (some functions) | Remote port state cannot be controlled via SMP |
| dump_lfts.sh, check_lft_balance.pl | The LFT (Linear Forwarding Table) is an IB switch concept |
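The following tools mainly operate on the local HCA (host channel adapter) and depend little on remote IB protocols, so they remain at least partly usable on RoCE hosts:
| Tool | Role on RoCE |
|---------------|----------------------------------------------------------|
| ibstat | Shows the state, ports, GUIDs, and link layer of local RoCE NICs (e.g. mlx5_0) ✅ |
| ibstatus | Shows the local port rate, state, default GID, and link layer ✅ |
| ibsysstat | Client/server tool built on vendor MADs; it needs a peer running ibsysstat -S, so it is not a purely local query and its usefulness on RoCE should be verified ⚠️ |
| perfquery | Queries local port performance counters (driver support required) ✅ (may not show every field) |
| ibqueryerrors | May partially work, but many IB error counters do not exist on RoCE ⚠️ |
| ibping | Needs ibping -S on the peer plus a client. ibping is built on vendor MADs addressed by LID or port GUID, not on RDMA CM, so whether it works over RoCE depends on the setup and should be verified ⚠️ |
| ibaddr | Can resolve the local GID, but cannot query a remote LID (there are no LIDs) ⚠️ |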
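Whether a given port falls in the "IB fabric tools apply" group or the "local tools only" group can be read from sysfs, where each RDMA port exposes a `link_layer` file containing `InfiniBand` or `Ethernet`. The following is a minimal sketch of that check; the function names `port_link_layers` and `fabric_tools_apply` are hypothetical helpers, not part of infiniband-diags.

```python
from pathlib import Path


def port_link_layers(sysfs_root="/sys/class/infiniband"):
    """Return {'<device>/<port>': '<link layer>'} for every RDMA port found.

    Reads /sys/class/infiniband/<device>/ports/<port>/link_layer.
    Returns an empty dict when the directory does not exist.
    """
    layers = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return layers
    for link_file in sorted(root.glob("*/ports/*/link_layer")):
        port = link_file.parent.name                  # e.g. "1"
        device = link_file.parent.parent.parent.name  # e.g. "mlx5_0"
        layers[f"{device}/{port}"] = link_file.read_text().strip()
    return layers


def fabric_tools_apply(link_layer):
    """SM/LID/SMP-based tools only make sense on native InfiniBand ports."""
    return link_layer == "InfiniBand"


if __name__ == "__main__":
    for name, layer in port_link_layers().items():
        note = "fabric tools apply" if fabric_tools_apply(layer) else "local tools only"
        print(f"{name}: {layer} ({note})")
```

Passing a different `sysfs_root` makes the helper easy to exercise against a copied or mocked directory tree.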
Usage
Basic information query tools
ibstat
- Purpose: shows the basic status of local IB devices (no root required)
- Example:
ibstat
CA 'mlx5_0'
CA type: MT4119
Number of ports: 1
Firmware version: 12.28.2006
ibstatus
- Purpose: more detailed port status (no root required)
- Example:
ibstatus mlx5_0
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:b8e9:2403:000a:b6c6
base lid: 0xdc
sm lid: 0x105
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 200 Gb/sec (2X NDR)
link_layer: InfiniBand
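For scripted health checks, the `key: value` layout of the ibstatus output above can be turned into dictionaries. This is a minimal sketch that assumes the output format shown in this document; `parse_ibstatus` and `is_up` are hypothetical helper names, and the sample text is copied from the example above.

```python
import re

SAMPLE = """\
Infiniband device 'mlx5_0' port 1 status:
    default gid: fe80:0000:0000:0000:b8e9:2403:000a:b6c6
    base lid: 0xdc
    sm lid: 0x105
    state: 4: ACTIVE
    phys state: 5: LinkUp
    rate: 200 Gb/sec (2X NDR)
    link_layer: InfiniBand
"""

HEADER_RE = re.compile(r"Infiniband device '([^']+)' port (\d+) status:")


def parse_ibstatus(text):
    """Return one dict per port, keyed by the field names ibstatus prints."""
    ports = []
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line.strip())
        if header:
            current = {"device": header.group(1), "port": int(header.group(2))}
            ports.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so GIDs and "4: ACTIVE" stay intact.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports


def is_up(port):
    """True when the port is logically ACTIVE and physically LinkUp."""
    return (port.get("state", "").endswith("ACTIVE")
            and port.get("phys state", "").endswith("LinkUp"))


if __name__ == "__main__":
    for p in parse_ibstatus(SAMPLE):
        print(p["device"], p["port"], p["rate"], "up" if is_up(p) else "DOWN")
```

In practice the text would come from running `ibstatus` (for example via `subprocess.run`), which is left out here so the sketch runs without IB hardware.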
ibhosts
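- Purpose: lists the IB hosts in the fabric, hosts only (CA devices); root is not required as long as the user can access /dev/infiniband/umad*
- Example: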
ibhosts
Ca : 0xfc6a1c03006788d0 ports 1 "Mellanox Technologies Aggregation Node"
Ca : 0x5c25730300ab1985 ports 1 "node128 mlx5_9"
Ca : 0x5c25730300ab1b75 ports 1 "node127 mlx5_9"
Ca : 0x5c25730300a90221 ports 1 "node125 mlx5_9"
Ca : 0x5c25730300a92631 ports 1 "node123 mlx5_9"
Ca : 0x5c25730300ab1f05 ports 1 "node122 mlx5_9"
Ca : 0x5c25730300a988f5 ports 1 "node121 mlx5_9"
Ca : 0x5c25730300a9a515 ports 1 "node120 mlx5_9"
Ca : 0x5c25730300ab5825 ports 1 "node119 mlx5_9"
Ca : 0x5c25730300ab31f5 ports 1 "node118 mlx5_9"
Ca : 0x5c25730300ab2f45 ports 1 "node117 mlx5_9"
Ca : 0x5c25730300ab4875 ports 1 "node116 mlx5_9"
Ca : 0x5c25730300a97ed5 ports 1 "node115 mlx5_9"
Ca : 0x5c25730300ab2ed5 ports 1 "node114 mlx5_9"
Ca : 0x5c25730300a98ee5 ports 1 "node113 mlx5_9"
Ca : 0x5c25730300a9d345 ports 1 "node112 mlx5_9"
Ca : 0x5c25730300ab3435 ports 1 "node111 mlx5_9"
Ca : 0x5c25730300a97f05 ports 1 "node110 mlx5_9"
Ca : 0x5c25730300a99f25 ports 1 "node109 mlx5_9"
Ca : 0x5c25730300ab27c5 ports 1 "node108 mlx5_9"
Ca : 0x5c25730300ab56d5 ports 1 "node107 mlx5_9"
Ca : 0x5c25730300a90051 ports 1 "node106 mlx5_9"
Ca : 0x5c25730300a98ea5 ports 1 "node105 mlx5_9"
Ca : 0x5c25730300ab50b5 ports 1 "node104 mlx5_9"
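A listing like the one above is easier to compare against an expected inventory once it is parsed. The sketch below assumes the line format shown here (`Ca : <guid> ports <n> "<description>"`) and that host descriptions follow the `<hostname> <hca>` pattern seen in this fabric; `parse_nodes` and `hostnames` are hypothetical helper names.

```python
import re

SAMPLE = """\
Ca : 0xfc6a1c03006788d0 ports 1 "Mellanox Technologies Aggregation Node"
Ca : 0x5c25730300ab1985 ports 1 "node128 mlx5_9"
Ca : 0x5c25730300ab1b75 ports 1 "node127 mlx5_9"
"""

NODE_RE = re.compile(
    r'^(Ca|Switch|Router)\s*:\s*(0x[0-9a-f]+)\s+ports\s+(\d+)\s+"([^"]*)"'
)
HOST_DESC_RE = re.compile(r"^(\S+) (mlx\d+_\d+)$")


def parse_nodes(text):
    """Parse ibhosts/ibnodes style lines into a list of dicts."""
    nodes = []
    for line in text.splitlines():
        m = NODE_RE.match(line.strip())
        if m:
            nodes.append({
                "type": m.group(1),
                "guid": m.group(2),
                "ports": int(m.group(3)),
                "description": m.group(4),
            })
    return nodes


def hostnames(nodes):
    """Sorted hostnames taken from '<hostname> <hca>' descriptions."""
    names = set()
    for node in nodes:
        m = HOST_DESC_RE.match(node["description"])
        if m:
            names.add(m.group(1))
    return sorted(names)


if __name__ == "__main__":
    print(hostnames(parse_nodes(SAMPLE)))
```

Comparing `hostnames(...)` with the list of nodes that should be present is one way to spot hosts that have dropped off the fabric.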
ibnodes
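- Purpose: shows the GUID and port information of all nodes (hosts + switches); root is not required as long as the user can access /dev/infiniband/umad*
- Example: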
root@node015# ibnodes
Ca : 0xfc6a1c03006788d0 ports 1 "Mellanox Technologies Aggregation Node"
Ca : 0x5c25730300ab1985 ports 1 "node128 mlx5_9"
Ca : 0x5c25730300ab1b75 ports 1 "node127 mlx5_9"
Ca : 0x5c25730300a90221 ports 1 "node125 mlx5_9"
Ca : 0x5c25730300a92631 ports 1 "node123 mlx5_9"
Ca : 0x5c25730300ab1f05 ports 1 "node122 mlx5_9"
Ca : 0x5c25730300a988f5 ports 1 "node121 mlx5_9"
Ca : 0x5c25730300a9a515 ports 1 "node120 mlx5_9"
Ca : 0x5c25730300ab5825 ports 1 "node119 mlx5_9"
Ca : 0x5c25730300ab31f5 ports 1 "node118 mlx5_9"
Ca : 0x5c25730300ab2f45 ports 1 "node117 mlx5_9"
Ca : 0x5c25730300ab4875 ports 1 "node116 mlx5_9"
Ca : 0x5c25730300a97ed5 ports 1 "node115 mlx5_9"
Ca : 0x5c25730300a98ee5 ports 1 "node113 mlx5_9"
Ca : 0x5c25730300ab2ed5 ports 1 "node114 mlx5_9"
Ca : 0x5c25730300a9d345 ports 1 "node112 mlx5_9"
Ca : 0x5c25730300ab3435 ports 1 "node111 mlx5_9"
Ca : 0x5c25730300a97f05 ports 1 "node110 mlx5_9"
Ca : 0x5c25730300a99f25 ports 1 "node109 mlx5_9"
Ca : 0x5c25730300ab27c5 ports 1 "node108 mlx5_9"
Ca : 0x5c25730300ab56d5 ports 1 "node107 mlx5_9"
Ca : 0x5c25730300a90051 ports 1 "node106 mlx5_9"
Ca : 0x5c25730300a98ea5 ports 1 "node105 mlx5_9"
Ca : 0x5c25730300ab50b5 ports 1 "node104 mlx5_9"
Ca : 0x5c25730300ab5015 ports 1 "node103 mlx5_9"
Ca : 0x5c25730300ab2675 ports 1 "node102 mlx5_9"
Ca : 0x5c25730300ab39c5 ports 1 "node101 mlx5_9"
Advanced diagnostic tools (root required)
iblinkinfo
- Purpose: shows link rate, link state, and topology
- Example:
iblinkinfo
CA: Mellanox Technologies Aggregation Node:
0xfc6a1c03006788d0 548 1[ ] ==( 4X 106.25 Gbps Active/ LinkUp)==> 433 129[ ] "SW400-TOR-16" ( )
CA: node128 mlx5_9:
0x5c25730300ab1985 891 1[ ] ==( 2X 106.25 Gbps Active/ LinkUp)==> 433 64[ ] "SW400-TOR-16" ( )
CA: node127 mlx5_9:
0x5c25730300ab1b75 890 1[ ] ==( 2X 106.25 Gbps Active/ LinkUp)==> 433 63[ ] "SW400-TOR-16" ( )
CA: node125 mlx5_9:
0x5c25730300a90221 736 1[ ] ==( 2X 106.25 Gbps Active/ LinkUp)==> 433 61[ ] "SW400-TOR-16" ( )
CA: node123 mlx5_9:
0x5c25730300a92631 749 1[ ] ==( 2X 106.25 Gbps Active/ LinkUp)==> 433 59[ ] "SW400-TOR-16" ( )
CA: node122 mlx5_9:
0x5c25730300ab1f05 747 1[ ] ==( 2X 106.25 Gbps Active/ LinkUp)==> 433 58[ ] "SW400-TOR-16" ( )
CA: node121 mlx5_9:
0x5c25730300a988f5 732 1[ ] ==( 2X 106.25 Gbps Active/ LinkUp)==> 433 57[ ] "SW400-TOR-16" ( )
CA: node120 mlx5_9:
0x5c25730300a9a515 757 1[ ] ==( 2X 106.25 Gbps Active/ LinkUp)==> 433 56[ ] "SW400-TOR-16" ( )
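To check many links at once, the connected-link lines above can be parsed for width, per-lane speed, and state. This is a minimal sketch that only matches lines in the exact form shown here (links that are down are printed differently and are not handled); `parse_links` is a hypothetical helper name.

```python
import re

SAMPLE = """\
CA: Mellanox Technologies Aggregation Node:
0xfc6a1c03006788d0 548 1[ ] ==( 4X 106.25 Gbps Active/ LinkUp)==> 433 129[ ] "SW400-TOR-16" ( )
CA: node128 mlx5_9:
0x5c25730300ab1985 891 1[ ] ==( 2X 106.25 Gbps Active/ LinkUp)==> 433 64[ ] "SW400-TOR-16" ( )
"""

LINK_RE = re.compile(
    r'(?P<guid>0x[0-9a-f]+)\s+(?P<lid>\d+)\s+(?P<port>\d+)\[.*?\]\s*'
    r'==\(\s*(?P<width>\d+)X\s+(?P<speed>[\d.]+)\s+Gbps\s+'
    r'(?P<state>\w+)/\s*(?P<phys>\w+)\s*\)==>\s*'
    r'(?P<peer_lid>\d+)\s+(?P<peer_port>\d+)\[.*?\]\s*"(?P<peer>[^"]*)"'
)


def parse_links(text):
    """Return one dict per connected link line found in iblinkinfo output."""
    links = []
    for line in text.splitlines():
        m = LINK_RE.search(line)
        if not m:
            continue
        link = m.groupdict()
        for key in ("lid", "port", "width", "peer_lid", "peer_port"):
            link[key] = int(link[key])
        link["speed"] = float(link["speed"])
        # Aggregate signalling rate: lanes times per-lane rate.
        link["gbps"] = link["width"] * link["speed"]
        links.append(link)
    return links


if __name__ == "__main__":
    for link in parse_links(SAMPLE):
        print(link["guid"], f'{link["width"]}X', link["gbps"], link["state"], "->", link["peer"])
```

Filtering the result for entries whose `state` is not `Active`, or whose `width` is lower than the cabling should deliver, gives a quick list of suspect links.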
ibnetdiscover
- Purpose: discovers and maps the IB fabric topology
- Example:
ibnetdiscover
Topology file: generated on Thu Jan 15 20:03:18 2026
vendid=0x2c9
devid=0xd2f2
sysimgguid=0xfc6a1c03006788c0
switchguid=0xfc6a1c03006788c0(fc6a1c03006788c0)
Switch 129 "S-fc6a1c03006788c0" # "SW400-TOR-16" base port 0 lid 433 lmc 0
[1] "H-5c25730300ab39e5"[1](5c25730300ab39e5) # "node066 mlx5_9" lid 448 2xNDR
[2] "H-5c25730300ab3145"[1](5c25730300ab3145) # "node065 mlx5_9" lid 567 2xNDR
[3] "H-5c25730300ab2f05"[1](5c25730300ab2f05) # "node067 mlx5_9" lid 440 2xNDR
......
[65] "S-fc6a1c030051ddc0"[121] # "SW400-AGG-1" lid 47 4xNDR
[67] "S-fc6a1c030051ddc0"[123] # "SW400-AGG-1" lid 47 4xNDR
[69] "S-fc6a1c030051ddc0"[125] # "SW400-AGG-1" lid 47 4xNDR
[71] "S-fc6a1c030051ddc0"[127] # "SW400-AGG-1" lid 47 4xNDR
......
[123] "S-fc6a1c030051dd00"[123] # "SW400-AGG-8" lid 4 4xNDR
[125] "S-fc6a1c030051dd00"[125] # "SW400-AGG-8" lid 4 4xNDR
[127] "S-fc6a1c030051dd00"[127] # "SW400-AGG-8" lid 4 4xNDR
[129] "H-fc6a1c03006788d0"[1](fc6a1c03006788d0) # "Mellanox Technologies Aggregation Node" lid 548 4xNDR
vendid=0x2c9
devid=0xd2f2
......
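Each port line in an ibnetdiscover switch record names the local port, the peer type (`H` for host, `S` for switch), the peer port, and the link width/speed. The following minimal sketch counts host-facing and switch-facing ports of one switch record; it assumes the line format shown in this document, and `parse_switch_ports` is a hypothetical helper name.

```python
import re
from collections import Counter

SAMPLE = """\
Switch 129 "S-fc6a1c03006788c0" # "SW400-TOR-16" base port 0 lid 433 lmc 0
[1] "H-5c25730300ab39e5"[1](5c25730300ab39e5) # "node066 mlx5_9" lid 448 2xNDR
[2] "H-5c25730300ab3145"[1](5c25730300ab3145) # "node065 mlx5_9" lid 567 2xNDR
[65] "S-fc6a1c030051ddc0"[121] # "SW400-AGG-1" lid 47 4xNDR
[129] "H-fc6a1c03006788d0"[1](fc6a1c03006788d0) # "Mellanox Technologies Aggregation Node" lid 548 4xNDR
"""

PORT_RE = re.compile(
    r'^\[(\d+)\]\s+"([HSR])-([0-9a-f]+)"\[(\d+)\](?:\([0-9a-f]+\))?'
    r'\s+#\s+"([^"]*)"\s+lid\s+(\d+)\s+(\S+)'
)


def parse_switch_ports(text):
    """Parse the '[port] "X-guid"[peer_port] # "name" lid N rate' lines."""
    ports = []
    for line in text.splitlines():
        m = PORT_RE.match(line.strip())
        if m:
            ports.append({
                "port": int(m.group(1)),
                "peer_type": m.group(2),   # H = host (CA), S = switch, R = router
                "peer_guid": m.group(3),
                "peer_port": int(m.group(4)),
                "peer_name": m.group(5),
                "peer_lid": int(m.group(6)),
                "link": m.group(7),        # e.g. "2xNDR"
            })
    return ports


if __name__ == "__main__":
    ports = parse_switch_ports(SAMPLE)
    print(Counter(p["peer_type"] for p in ports))
```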
ibroute
- Purpose: queries a switch's routing (forwarding) tables
- Example:
Examples:
ibroute -- Unicast examples:
ibroute 4 # dump all lids with valid out ports of switch with lid 4
ibroute -a 4 # same, but dump all lids, even with invalid out ports
ibroute -n 4 # simple dump format - no destination resolving
ibroute 4 10 # dump lids starting from 10
ibroute 4 0x10 0x20 # dump lid range
ibroute -G 0x08f1040023 # resolve switch by GUID
ibroute -D 0,1 # resolve switch by direct path
ibroute -- Multicast examples:
ibroute -M 4 # dump all non empty mlids of switch with lid 4
ibroute -M 4 0xc010 0xc020 # same, but with range
ibroute -M -n 4 # simple dump format
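Performance monitoring tools
perfquery
InfiniBand performance counter query tool
Purpose: queries and resets InfiniBand port performance counters to monitor network health and performance.
Privileges: root required
Example (the ibstat output for mlx5_9 comes first; it supplies the port GUID and LID used by the perfquery commands that follow):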
CA 'mlx5_9'
CA type: MT4129
Number of ports: 1
Firmware version: 28.39.3900
Hardware version: 0
Node GUID: 0x5c25730300ab4265
System image GUID: 0x5c25730300ab4264
Port 1:
State: Active
Physical state: LinkUp
Rate: 200
Base lid: 971
LMC: 0
SM lid: 189
Capability mask: 0xa751e848
Port GUID: 0x5c25730300ab4265
Link layer: InfiniBand
root@node013:# perfquery -G 0x5c25730300ab4265 1 # -G can also be given the port GUID of another node
Port counters: Lid 971 port 1 (CapMask: 0x5A00)
PortSelect:......................1
CounterSelect:...................0x0000
SymbolErrorCounter:..............0
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............0
PortRcvErrors:...................0
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
QP1Dropped:......................0
VL15Dropped:.....................0
PortXmitData:....................4294967295
PortRcvData:.....................4294967295
PortXmitPkts:....................3071580951
PortRcvPkts:.....................3086700438
PortXmitWait:....................4294967295
root@node013:# perfquery -C mlx5_9 -P 1
Port counters: Lid 971 port 1 (CapMask: 0x5A00)
PortSelect:......................1
CounterSelect:...................0x0000
SymbolErrorCounter:..............0
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............0
PortRcvErrors:...................0
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
QP1Dropped:......................0
VL15Dropped:.....................0
PortXmitData:....................4294967295
PortRcvData:.....................4294967295
PortXmitPkts:....................3071580952
PortRcvPkts:.....................3086700438
PortXmitWait:....................4294967295
perfquery -C mlx5_2 -x -P 1 # extended counters (detailed traffic statistics)
perfquery -C mlx5_2 -r -P 1 # reset the counters after reading them
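In the sample output above, PortXmitData, PortRcvData, and PortXmitWait all read 4294967295, which is 2^32 - 1: the basic port counters are 32 bits wide and stop at their maximum instead of wrapping, which is why the extended (`-x`) counters are the ones to use for traffic volume. A minimal sketch for spotting pegged counters in captured output follows; `parse_perfquery` and `saturated` are hypothetical helper names, and the sample is copied from this document.

```python
import re

SAMPLE = """\
Port counters: Lid 971 port 1 (CapMask: 0x5A00)
PortSelect:......................1
CounterSelect:...................0x0000
SymbolErrorCounter:..............0
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............0
PortRcvErrors:...................0
PortXmitDiscards:................0
VL15Dropped:.....................0
PortXmitData:....................4294967295
PortRcvData:.....................4294967295
PortXmitPkts:....................3071580951
PortRcvPkts:.....................3086700438
PortXmitWait:....................4294967295
"""

COUNTER_RE = re.compile(r"^(\w+):\.+(\w+)$")
MAX_32BIT = 0xFFFFFFFF  # 4294967295


def parse_perfquery(text):
    """Return {counter_name: int} from 'Name:.....value' lines."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line.strip())
        if m:
            # Base 0 accepts both decimal ("14") and hex ("0x0000") values.
            counters[m.group(1)] = int(m.group(2), 0)
    return counters


def saturated(counters):
    """Names of counters stuck at the 32-bit maximum."""
    return sorted(name for name, value in counters.items() if value == MAX_32BIT)


if __name__ == "__main__":
    print("pegged counters:", saturated(parse_perfquery(SAMPLE)))
```

A pegged counter carries no rate information until it is reset (`-r`) or read through the extended counters.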
ibqueryerrors
- Purpose: aggregates the error counters of all ports
- Example:
ibqueryerrors -C mlx5_2
Errors for 0xfc6a1c0300673ac0 "SW400-TOR-3"
GUID 0xfc6a1c0300673ac0 port ALL: [LinkDownedCounter == 2554 (2.494K)] [PortRcvErrors == 14 (14.000)] [PortRcvSwitchRelayErrors == 386879 (377.812K)] [PortXmitDiscards == 246693 (240.911K)] [VL15Dropped == 2669 (2.606K)] [PortXmitWait == 86089148144227 (78.298T)]
GUID 0xfc6a1c0300673ac0 port 1: [LinkDownedCounter == 20 (20.000)] [PortRcvErrors == 14 (14.000)] [PortRcvSwitchRelayErrors == 8400 (8.203K)] [PortXmitDiscards == 78548 (76.707K)] [VL15Dropped == 2669 (2.606K)] [PortXmitWait == 26025333702 (24.238G)]
GUID 0xfc6a1c0300673ac0 port 2: [LinkDownedCounter == 78 (78.000)] [PortRcvSwitchRelayErrors == 5169 (5.048K)] [PortXmitDiscards == 104 (104.000)] [PortXmitWait == 2832291821012 (2.576T)]
GUID 0xfc6a1c0300673ac0 port 3: [LinkDownedCounter == 51 (51.000)] [PortRcvSwitchRelayErrors == 6 (6.000)] [PortXmitDiscards == 22981 (22.442K)] [PortXmitWait == 2972142183994 (2.703T)]
GUID 0xfc6a1c0300673ac0 port 4: [LinkDownedCounter == 44 (44.000)] [PortRcvSwitchRelayErrors == 3933 (3.841K)] [PortXmitDiscards == 2009 (1.962K)] [PortXmitWait == 2053477604060 (1.868T)]
GUID 0xfc6a1c0300673ac0 port 5: [LinkDownedCounter == 85 (85.000)] [PortXmitDiscards == 9942 (9.709K)] [PortXmitWait == 2717775352326 (2.472T)]
GUID 0xfc6a1c0300673ac0 port 6: [LinkDownedCounter == 42 (42.000)] [PortRcvSwitchRelayErrors == 1344 (1.312K)] [PortXmitDiscards == 897 (897.000)] [PortXmitWait == 255979191164 (238.399G)]
GUID 0xfc6a1c0300673ac0 port 7: [LinkDownedCounter == 48 (48.000)] [PortRcvSwitchRelayErrors == 4038 (3.943K)] [PortXmitDiscards == 272 (272.000)] [PortXmitWait == 300557824911 (279.916G)]
GUID 0xfc6a1c0300673ac0 port 8: [LinkDownedCounter == 40 (40.000)] [PortRcvSwitchRelayErrors == 38 (38.000)] [PortXmitDiscards == 624 (624.000)] [PortXmitWait == 921739570708 (858.437G)]
GUID 0xfc6a1c0300673ac0 port 9: [LinkDownedCounter == 36 (36.000)] [PortRcvSwitchRelayErrors == 6893 (6.731K)] [PortXmitDiscards == 6936 (6.773K)] [PortXmitWait == 2539587458430 (2.310T)]
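Output like the above gets long on large fabrics, so ranking ports by a single counter helps narrow down where to look. The sketch below assumes the `[Name == value (...)]` layout shown here; `parse_ibqueryerrors` and `worst_ports` are hypothetical helper names, and the sample lines are copied from this document.

```python
import re

SAMPLE = """\
Errors for 0xfc6a1c0300673ac0 "SW400-TOR-3"
GUID 0xfc6a1c0300673ac0 port 1: [LinkDownedCounter == 20 (20.000)] [PortRcvErrors == 14 (14.000)] [PortRcvSwitchRelayErrors == 8400 (8.203K)] [PortXmitDiscards == 78548 (76.707K)] [VL15Dropped == 2669 (2.606K)] [PortXmitWait == 26025333702 (24.238G)]
GUID 0xfc6a1c0300673ac0 port 2: [LinkDownedCounter == 78 (78.000)] [PortRcvSwitchRelayErrors == 5169 (5.048K)] [PortXmitDiscards == 104 (104.000)] [PortXmitWait == 2832291821012 (2.576T)]
"""

LINE_RE = re.compile(r"^GUID (0x[0-9a-f]+) port (\w+):")
COUNTER_RE = re.compile(r"\[(\w+) == (\d+)")


def parse_ibqueryerrors(text):
    """Return {(guid, port): {counter: value}} for every 'GUID ... port N:' line."""
    errors = {}
    for line in text.splitlines():
        head = LINE_RE.match(line.strip())
        if head:
            key = (head.group(1), head.group(2))
            errors[key] = {name: int(val) for name, val in COUNTER_RE.findall(line)}
    return errors


def worst_ports(errors, counter, limit=3):
    """Ports with the highest value of `counter`, skipping the 'ALL' summary rows."""
    ranked = [(key, counters[counter]) for key, counters in errors.items()
              if key[1] != "ALL" and counter in counters]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    for (guid, port), value in worst_ports(parse_ibqueryerrors(SAMPLE), "LinkDownedCounter"):
        print(guid, "port", port, value)
```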
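Special-purpose tools
ibping
- Purpose: tests connectivity between IB ports (similar to ICMP ping)
- Example:
OpenSM must be running: the subnet manager assigns LIDs and manages routing.
Start OpenSM on one node (usually one of the servers):
sudo systemctl start opensm
# or
sudo /usr/sbin/opensm -F --report_rate 20
On node015: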
CA 'mlx5_9'
......
Base lid: 948
LMC: 0
SM lid: 189
On node013:
CA 'mlx5_9'
......
Base lid: 1030
LMC: 0
SM lid: 189
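The target node must already be running the ibping server (ibping -S) before a client can reach it.
# On node015, ping node013
ibping -C mlx5_9 -P 1 1030
# On node013, ping node015
ibping -C mlx5_9 -P 1 948
Test local port loopback (the server side must be started first)
# Terminal 1: start the server
ibping -S -C mlx5_9 -P 1
# Terminal 2: test the local port
ibping -c 5 -C mlx5_9 -P 1 0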
ibtracert
- Description
- Function: traces the InfiniBand route between two addresses (LID or GUID).
- Common forms:
  - LID routing: ibtracert <src_lid> <dst_lid>
  - Concise output: ibtracert -n <src_lid> <dst_lid>
  - GUID routing: ibtracert -G <src_guid> <dst_guid>
Basic usage: trace the path between two nodes
From node015 (LID 948) to node013 (LID 971)
ibtracert 948 971
Output:
[1] -> switch port {0xfc6a1c030065d600}[15] lid 3-3 "SW400-TOR-1"
[117] -> switch port {0xfc6a1c030065ea40}[5] lid 19-19 "SW400-AGG-7"
[61] -> switch port {0xfc6a1c030065c1c0}[117] lid 43-43 "SW400-TOR-8"
[15] -> ca port {0x5c25730300a9a645}[1] lid 948-948 "node015 mlx5_9"
From ca {0x5c25730300a9a645} portnum 1 lid 948-948 "node015 mlx5_9"
[1] -> switch port {0xfc6a1c030065c1c0}[15] lid 43-43 "SW400-TOR-8"
[13] -> ca port {0x5c25730300ab4265}[1] lid 971-971 "node013 mlx5_9"
To ca {0x5c25730300ab4265} portnum 1 lid 971-971 "node013 mlx5_9"
Concise output mode
ibtracert -n 948 971
From {0x5c25730300a9a645}[1]
[1] -> {0xfc6a1c030065c1c0}[15]
[13] -> {0x5c25730300ab4265}[1]
To {0x5c25730300ab4265}[1]
Using GUID addresses (instead of LIDs)
ibtracert -G 0x5c25730300a9a645 0x5c25730300ab4265
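The hop lines of the default (non `-n`) ibtracert output read as "leave the previous node through port X, enter the next node on port Y". A minimal sketch that turns a trace into a hop list and counts the switches on the path follows; it assumes the line format shown in this document, `parse_ibtracert` and `switch_hop_count` are hypothetical helper names, and the sample is the node015 to node013 trace from this section.

```python
import re

SAMPLE = """\
From ca {0x5c25730300a9a645} portnum 1 lid 948-948 "node015 mlx5_9"
[1] -> switch port {0xfc6a1c030065c1c0}[15] lid 43-43 "SW400-TOR-8"
[13] -> ca port {0x5c25730300ab4265}[1] lid 971-971 "node013 mlx5_9"
To ca {0x5c25730300ab4265} portnum 1 lid 971-971 "node013 mlx5_9"
"""

HOP_RE = re.compile(
    r'^\[(\d+)\] -> (\w+) port \{(0x[0-9a-f]+)\}\[(\d+)\] lid (\d+)-(\d+) "([^"]*)"'
)


def parse_ibtracert(text):
    """Return the hops of a trace, in order, as a list of dicts."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line.strip())
        if m:
            hops.append({
                "out_port": int(m.group(1)),  # egress port on the previous node
                "kind": m.group(2),           # "switch" or "ca"
                "guid": m.group(3),
                "in_port": int(m.group(4)),   # ingress port on this node
                "lid": int(m.group(5)),
                "name": m.group(7),
            })
    return hops


def switch_hop_count(hops):
    """Number of switches traversed between source and destination."""
    return sum(1 for hop in hops if hop["kind"] == "switch")


if __name__ == "__main__":
    hops = parse_ibtracert(SAMPLE)
    print(" -> ".join(h["name"] for h in hops), f"({switch_hop_count(hops)} switch hop)")
```

Here both nodes hang off the same leaf switch, so the path crosses a single switch; a trace that goes through an aggregation switch would report three.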
Examples:
- From the local CA (LID 928, node013 mlx5_2) to the switch with LID 433:
Command: ibtracert 928 433 | head -n 12
Observed output excerpt:
From ca {0x5c25730300ab3234} portnum 1 lid 928-928 "node013 mlx5_2"
[...] -> switch port {...}[...] lid 433-433 "SW400-TOR-16"
Note: the source LID 928 comes from the "Base lid" field of ibstat; the target 433 is the switch LID (obtainable with dump_fts or ibswitches).
- From the local CA (LID 1032, node015 mlx5_2) to the SM at LID 189:
Command: ibtracert 1032 189 | head -n 12
Observed output excerpt:
From ca {0x5c25730300a90de0} portnum 1 lid 1032-1032 "node015 mlx5_2"
[...] -> ca port {0x5c25730300a96be4}[1] lid 189-189 "node001 mlx5_2"