



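RoCEv2, one of the mainstream RDMA implementations, moves the IP/UDP/IB transport protocol stack out of software and into the NIC. If that IP/UDP/IB transport stack is moved back into software, an ordinary Ethernet NIC can carry RDMA traffic as well. RDMA performance takes a hit, but setting up development and test environments, testing, troubleshooting, and learning all become much easier. Even in production, compared with traditional kernel socket communication, it still brings the performance benefits of techniques such as zero copy and kernel protocol stack bypass. Anyone familiar with high-performance network development knows that zero copy and kernel protocol stack bypass are among the core techniques for optimizing on top of kernel socket communication. In fact, the IBTA RoCE working group, led by IBM and Mellanox, has already implemented a pure-software version of RoCE, usually called SoftRoCE. It is available in the commonly used RoCE development packages.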


1 Install the RDMA development packages


root@ecs01:~# uname -a
Linux ecs01 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
root@ecs01:~# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.1 LTS
Release:	24.04
Codename:	noble


root@ecs01:~# cat /boot/config-$(uname -r) | grep RXE
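
The grep above is there to show whether the kernel was built with Soft-RoCE support; the option to look for is `CONFIG_RDMA_RXE` (`=m` when built as a module, `=y` when built in). Below is a minimal sketch of the same check wrapped in a function. The name `rxe_config_state` is made up for illustration, and the default path simply mirrors the command above.

```shell
# rxe_config_state (hypothetical helper): print "module", "builtin" or "missing"
# depending on how CONFIG_RDMA_RXE is set in a kernel config file.
# Argument: path to the config file (defaults to the running kernel's config).
rxe_config_state() {
    config_file="${1:-/boot/config-$(uname -r)}"
    if grep -q '^CONFIG_RDMA_RXE=m$' "$config_file" 2>/dev/null; then
        echo "module"
    elif grep -q '^CONFIG_RDMA_RXE=y$' "$config_file" 2>/dev/null; then
        echo "builtin"
    else
        echo "missing"
    fi
}
```

Calling `rxe_config_state` with no argument inspects the running kernel; passing a path lets you check a config file copied from another machine.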


root@ecs01:~# apt-get install -y libibverbs1 ibverbs-utils librdmacm1 libibumad3 ibverbs-providers rdma-core librdmacm-dev


root@ecs01:~# dpkg -L ibverbs-utils


root@ecs01:~# ibv_devices
    device          	   node GUID
    ------          	----------------
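
At this point `ibv_devices` prints only its two header lines, which means no RDMA device exists yet. A small sketch that turns that observation into a number, assuming the two-line header shown above; `count_rdma_devices` is a hypothetical name, and it reads from stdin so it also works on saved output.

```shell
# count_rdma_devices (hypothetical helper): count device rows in `ibv_devices`
# output read from stdin. The first two lines are assumed to be the header.
count_rdma_devices() {
    awk 'NR > 2 && NF >= 2 { n++ } END { print n + 0 }'
}
```

Typical use would be `ibv_devices | count_rdma_devices`, which should print 0 here and 2 once both rxe devices have been added.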


2 Add the RDMA devices


root@ecs01:~# ip link add veth-client type veth peer name veth-server
root@ecs01:~# ip addr add <server-ip>/<prefix-len> dev veth-server
root@ecs01:~# ip addr add <client-ip>/<prefix-len> dev veth-client
root@ecs01:~# ip link set dev veth-server up
root@ecs01:~# ip link set dev veth-client up


root@ecs01:~# rdma link add rxe_server type rxe netdev veth-server
root@ecs01:~# rdma link add rxe_client type rxe netdev veth-client
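
When experimenting, it helps to be able to recreate and remove this setup repeatedly. The sketch below only prints the commands rather than running them, so it is safe to try as a normal user. The function names are hypothetical, the two addresses are arguments you choose yourself, and the teardown relies on `rdma link delete` and on the fact that deleting one end of a veth pair removes both ends.

```shell
# rxe_setup_plan (hypothetical helper): print, without running them, the commands
# that create the veth pair and attach one rxe device to each end.
# Arguments: <server-cidr> <client-cidr> (addresses of your own choosing).
rxe_setup_plan() {
    server_cidr="$1"
    client_cidr="$2"
    cat <<EOF
ip link add veth-client type veth peer name veth-server
ip addr add $server_cidr dev veth-server
ip addr add $client_cidr dev veth-client
ip link set dev veth-server up
ip link set dev veth-client up
rdma link add rxe_server type rxe netdev veth-server
rdma link add rxe_client type rxe netdev veth-client
EOF
}

# rxe_teardown_plan (hypothetical helper): print the commands that undo the setup.
rxe_teardown_plan() {
    cat <<'EOF'
rdma link delete rxe_server
rdma link delete rxe_client
ip link delete veth-client
EOF
}
```

For example, `rxe_setup_plan 192.0.2.1/24 192.0.2.2/24` prints the seven setup commands with documentation-range placeholder addresses; piping the output to `sh` as root would execute them.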


# Links
root@ecs01:~# rdma link
link rxe_server/1 state ACTIVE physical_state LINK_UP netdev veth-server
link rxe_client/1 state ACTIVE physical_state LINK_UP netdev veth-client

# Devices
root@ecs01:~# ibv_devices
    device          	   node GUID
    ------          	----------------
    rxe_server      	9c7ff4fffeaff293
    rxe_client      	fcc7bffffe44ca52

# Device details
root@ecs01:~# ibv_devinfo
hca_id:	rxe_server
	transport:			InfiniBand (0)
	fw_ver:				0.0.0
	node_guid:			9c7f:f4ff:feaf:f293
	sys_image_guid:			9c7f:f4ff:feaf:f293
	vendor_id:			0xffffff
	vendor_part_id:			0
	hw_ver:				0x0
	phys_port_cnt:			1
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		4096 (5)
			active_mtu:		1024 (3)
			sm_lid:			0
			port_lid:		0
			port_lmc:		0x00
			link_layer:		Ethernet

hca_id:	rxe_client
	transport:			InfiniBand (0)
	fw_ver:				0.0.0
	node_guid:			fcc7:bfff:fe44:ca52
	sys_image_guid:			fcc7:bfff:fe44:ca52
	vendor_id:			0xffffff
	vendor_part_id:			0
	hw_ver:				0x0
	phys_port_cnt:			1
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		4096 (5)
			active_mtu:		1024 (3)
			sm_lid:			0
			port_lid:		0
			port_lmc:		0x00
			link_layer:		Ethernet

# RDMA device information and statistics are also exposed through sysfs
root@ecs01:~# ls /sys/class/infiniband
rxe_client  rxe_server
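
Each device directory under `/sys/class/infiniband` contains a `ports/<n>/state` file holding the same port state that `ibv_devinfo` reports. A minimal sketch that walks those files; `rdma_port_states` is a hypothetical name, and the root directory is a parameter so the function can be pointed at a copy of the tree.

```shell
# rdma_port_states (hypothetical helper): print "<device> <port> <state>" for every
# port found under an infiniband sysfs class directory.
# Argument: class directory (defaults to /sys/class/infiniband).
rdma_port_states() {
    root="${1:-/sys/class/infiniband}"
    for state_file in "$root"/*/ports/*/state; do
        [ -r "$state_file" ] || continue   # the glob matched nothing
        port_dir=$(dirname "$state_file")
        dev_dir=$(dirname "$(dirname "$port_dir")")
        printf '%s %s %s\n' "$(basename "$dev_dir")" "$(basename "$port_dir")" "$(cat "$state_file")"
    done
}
```

On the setup above I would expect one line per rxe device, each ending in the active state.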

3 Ping connectivity test


root@ecs01:~# ibv_rc_pingpong -d rxe_server -g 0
  local address:  LID 0x0000, QPN 0x000011, PSN 0xd24ed4, GID fe80::9c7f:f4ff:feaf:f293
  remote address: LID 0x0000, QPN 0x000012, PSN 0xdedad9, GID fe80::9c7f:f4ff:feaf:f293
8192000 bytes in 0.02 seconds = 2947.29 Mbit/sec
1000 iters in 0.02 seconds = 22.24 usec/iter

root@ecs01:~# ibv_rc_pingpong -d rxe_server -g 0 <server-ip>
  local address:  LID 0x0000, QPN 0x000012, PSN 0xdedad9, GID fe80::9c7f:f4ff:feaf:f293
  remote address: LID 0x0000, QPN 0x000011, PSN 0xd24ed4, GID fe80::9c7f:f4ff:feaf:f293
8192000 bytes in 0.02 seconds = 2956.87 Mbit/sec
1000 iters in 0.02 seconds = 22.16 usec/iter
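
The two summary lines are tied together by simple arithmetic: with the default 4096-byte message, 1000 iterations in both directions give 1000 × 4096 × 2 = 8192000 bytes, and the throughput is that byte count divided by the elapsed time. A small sketch of the calculation, with `pingpong_mbit` as a hypothetical helper name:

```shell
# pingpong_mbit (hypothetical helper): throughput in Mbit/s from total bytes,
# iteration count and microseconds per iteration.
# Bits per microsecond is numerically the same as Mbit/s.
pingpong_mbit() {
    awk -v bytes="$1" -v iters="$2" -v usec="$3" \
        'BEGIN { printf "%.2f\n", (bytes * 8) / (iters * usec) }'
}
```

`pingpong_mbit 8192000 1000 22.24` lands within a fraction of a percent of the 2947.29 Mbit/sec reported above; the small gap comes from the usec/iter figure being rounded in the tool's output.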

4 perftest tests


root@ecs01:~# apt-get -y install perftest

# Command-line tools shipped with perftest
root@ecs01:~# dpkg -L perftest

Test the bandwidth of the RDMA SEND operation: run the following two commands in two separate shell windows to see the results.

root@ecs01:~# ib_send_bw -d rxe_server
 WARNING: BW peak won't be measured in this run.

* Waiting for client to connect... *
                    Send BW Test
 Dual-port       : OFF		Device         : rxe_server
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 RX depth        : 512
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
 local address: LID 0000 QPN 0x0013 PSN 0x7d2e4c
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:01
 remote address: LID 0000 QPN 0x0014 PSN 0x3db41c
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:01
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 65536      1000             0.00               361.46 		   0.005783

root@ecs01:~# ib_send_bw -d rxe_server <server-ip>
                    Send BW Test
 Dual-port       : OFF		Device         : rxe_server
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
 local address: LID 0000 QPN 0x0014 PSN 0x3db41c
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:01
 remote address: LID 0000 QPN 0x0013 PSN 0x7d2e4c
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:01
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 65536      1000             397.24             360.91 		   0.005775
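
The last two columns of the result row are consistent with each other: the message rate is the average bandwidth divided by the message size. A sketch of that check, assuming perftest counts 1 MB as 2^20 bytes (which is what makes the figures above agree); `msg_rate_mpps` is a hypothetical name.

```shell
# msg_rate_mpps (hypothetical helper): message rate in millions of messages per
# second, from a bandwidth in MB/sec (1 MB assumed to be 2^20 bytes) and a
# message size in bytes.
msg_rate_mpps() {
    awk -v bw="$1" -v size="$2" \
        'BEGIN { printf "%.6f\n", bw * 1048576 / size / 1000000 }'
}
```

`msg_rate_mpps 361.46 65536` prints 0.005783 and `msg_rate_mpps 360.91 65536` prints 0.005775, matching the MsgRate column on the server and client side respectively.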

Test the latency of the RDMA SEND operation: run the following two commands in two separate shell windows.

root@ecs01:~# ib_send_lat -d rxe_server

* Waiting for client to connect... *
                    Send Latency Test
 Dual-port       : OFF		Device         : rxe_server
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 RX depth        : 512
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
 local address: LID 0000 QPN 0x0015 PSN 0xdb44af
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:01
 remote address: LID 0000 QPN 0x0016 PSN 0x2b4ac1
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:01
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       1000          2.85           75.69        2.95     	       3.19        	1.52   		8.92    		75.69

root@ecs01:~# ib_send_lat -d rxe_server <server-ip>
                    Send Latency Test
 Dual-port       : OFF		Device         : rxe_server
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 TX depth        : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
 local address: LID 0000 QPN 0x0016 PSN 0x2b4ac1
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:01
 remote address: LID 0000 QPN 0x0015 PSN 0xdb44af
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:01
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       1000          2.86           87.64        2.95     	       3.25        	2.72   		8.92    		87.64
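
The latency report is wide, and usually only the average and the 99th percentile matter when comparing runs. A minimal sketch that pulls those two columns out of the output, assuming the nine-column layout shown above; `lat_summary` is a hypothetical name.

```shell
# lat_summary (hypothetical helper): read ib_send_lat output on stdin and print
# "<bytes> <t_avg> <p99>" for each result row. A result row is taken to be a
# line with at least nine fields whose first two fields are integers.
lat_summary() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && NF >= 9 { print $1, $6, $8 }'
}
```

For instance, `ib_send_lat -d rxe_server <server-ip> | lat_summary` would reduce the client-side report above to a single line with the message size, t_avg and the 99% percentile.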

5 Test your own RDMA program


root@ecs01:~# apt install librdmacm-dev
root@ecs01:~# ls /usr/include/rdma/ | grep rdma_cma

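For the RDMA test program, I use an existing example from GitHub here; it can also serve as a reference when developing your own program. First, git clone the example code to the local machine.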

root@ecs01:~# git clone https://github.com/animeshtrivedi/rdma-example.git
root@ecs01:~# cd ./rdma-example


root@ecs01:~/rdma-example# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu

root@ecs01:~/rdma-example# dpkg -L libibverbs1

root@ecs01:~/rdma-example# dpkg -L librdmacm1t64

root@ecs01:~/rdma-example# vim CMakeLists.txt
find_library(IBVERBS_LIBRARY libibverbs.so.1 HINTS /usr/lib/x86_64-linux-gnu)
find_library(RDMACM_LIBRARY librdmacm.so.1 HINTS /usr/lib/x86_64-linux-gnu)
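
Before running cmake, it is worth confirming that the two shared objects named in the `find_library` lines really exist in the hinted directory. A small sketch of that check; `check_rdma_libs` is a hypothetical name, and the default directory is the one used in the hints above.

```shell
# check_rdma_libs (hypothetical helper): report whether libibverbs.so.1 and
# librdmacm.so.1 exist in a library directory. Returns non-zero if either is missing.
# Argument: library directory (defaults to /usr/lib/x86_64-linux-gnu).
check_rdma_libs() {
    libdir="${1:-/usr/lib/x86_64-linux-gnu}"
    status=0
    for lib in libibverbs.so.1 librdmacm.so.1; do
        if [ -e "$libdir/$lib" ]; then
            echo "found   $libdir/$lib"
        else
            echo "missing $libdir/$lib"
            status=1
        fi
    done
    return "$status"
}
```

If the function reports a missing file, the `dpkg -L` listings above show where the packages actually installed their libraries, and the hint path can be adjusted accordingly.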


root@ecs01:~/rdma-example# cmake .
root@ecs01:~/rdma-example# make

# Start the server
root@ecs01:~/rdma-example# ./bin/rdma_server
Server is listening successfully at: , port: 20886
A new connection is accepted from
Client side buffer information is received...
buffer attr, addr: 0x57c3e9a792e0 , len: 10 , stag : 0xdc9
The client has requested buffer length of : 10 bytes
A disconnect event is received from the client...
Server shut-down is complete

# Start the client
root@ecs01:~/rdma-example# ./bin/rdma_client -a <server-ip> -s textstring
Passed string is : textstring , with count 10
Trying to connect to server at : port: 20886
The client is connected successfully
buffer attr, addr: 0x5ccd3a8fe600 , len: 10 , stag : 0xfee
SUCCESS, source and destination buffers match
Client resource clean up is complete