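Introduction
In high-performance computing, data centers, and cloud computing, InfiniBand is widely used for its low latency and high bandwidth. Multicast is an efficient one-to-many transfer mechanism and plays an important role in collective operations such as broadcast. Based on the source code of the Mellanox performance test tool perftest-4.5.0.mlnxlibs, this article examines how InfiniBand multicast group management is implemented.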
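Multicast Group Management Architecture Overview
System Architecture
InfiniBand multicast group management follows a client-server model. The Subnet Manager (SM) is the central control node: it tracks multicast group membership and programs multicast forwarding for the whole subnet. Ports join and leave groups by sending Subnet Administration (SubnAdm) class Management Datagrams (MADs) to the Subnet Administrator (SA), which normally runs together with the SM.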
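Core Data Structure
perftest keeps the state of a multicast session in struct mcast_parameters. The fields below are the ones this article relies on. The actual definition in multicast_resources.h may order them differently and contain additional members.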
```c
struct mcast_parameters {
	union ibv_gid mgid;         /* multicast group GID (MGID) */
	union ibv_gid port_gid;     /* GID of the local port */
	union ibv_gid base_mgid;    /* base MGID */
	uint16_t mlid;              /* multicast LID assigned by the SM */
	uint16_t pkey;              /* partition key */
	uint32_t qp_num;            /* QP number */
	uint32_t mtu;               /* maximum transmission unit */
	const char *ib_devname;     /* IB device name */
	int ib_port;                /* IB port number */
	struct ibv_context *ib_ctx; /* IB device context */
	uint8_t sl;                 /* service level */
	uint16_t sm_lid;            /* LID of the SM */
	uint8_t sm_sl;              /* service level used to reach the SM */
	uint8_t mcast_state;        /* multicast state flags (e.g. MCAST_IS_JOINED) */
	const char *user_mgid;      /* user-supplied MGID string */
	int is_2nd_mgid_used;       /* whether a second MGID is in use */
};
```
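How Joining a Multicast Group Works
1. Communicating with the Subnet Manager
The core of multicast group management is the exchange with the SA. Each request is a Subnet Administration class MAD that carries an MCMemberRecord. The flow is shown below as a condensed excerpt of join_multicast_group(). Error checks for steps 2 to 4 are omitted here; they appear in the full listing later in this article.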
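```c
int join_multicast_group(subn_adm_method method, struct mcast_parameters *params)
{
	int portid = -1, agentid = -1, length = MAD_SIZE, test_result = 1;
	void *umad_buff = NULL, *mad = NULL;

	/* 1. Initialize the UMAD library */
	if (umad_init() < 0) {
		fprintf(stderr, "failed to init the UMAD library\n");
		goto cleanup;
	}
	/* 2. Open the IB port */
	portid = umad_open_port((char *)params->ib_devname, params->ib_port);
	/* 3. Register a MAD agent for the SubnAdm class (class version 2) */
	agentid = umad_register(portid, MANAGMENT_CLASS_SUBN_ADM, 2, 0, 0);
	/* 4. Build the MCMemberRecord request and send it to the SM */
	umad_buff = umad_alloc(1, umad_size() + MAD_SIZE);
	mad = umad_get_mad(umad_buff);
	prepare_mcast_mad(method, params, (struct sa_mad_packet_t *)mad);
	set_pkey(umad_buff, params->ib_ctx, params->ib_port);
	umad_set_addr(umad_buff, params->sm_lid, 1, params->sm_sl, QP1_WELL_KNOWN_Q_KEY);
	umad_send(portid, agentid, umad_buff, MAD_SIZE, 100, 5);
	/* 5. Receive the response (wait at most 5000 ms) and validate it */
	if (umad_recv(portid, umad_buff, &length, 5000) < 0) {
		fprintf(stderr, "failed to receive MAD response\n");
		goto cleanup;
	}
	if (check_mad_status((struct sa_mad_packet_t *)mad))
		goto cleanup;
	/* 6. On a join, extract the MLID assigned by the SM */
	if (method == SUBN_ADM_METHOD_SET)
		get_mlid_from_mad((struct sa_mad_packet_t *)mad, &params->mlid);
	test_result = 0;
cleanup:
	/* free umad_buff, unregister the agent, close the port (see the full listing) */
	return test_result;
}
```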
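2. MAD Packet Format
The MAD header follows Table 145 of InfiniBand specification 1.2.1, as cited in the source. Its key fields and their byte offsets within the header are:
- BaseVersion (byte 0, 0x01): base version of the MAD format
- MgmtClass (byte 1, SUBN_ADM = 0x03): management class, identifying Subnet Administration
- ClassVersion (byte 2, 0x02): class version
- Method (byte 3, bits 0-6; SET/DELETE): the operation. SET joins a group and DELETE leaves it
- Status (bytes 4-5): filled in by the SA in the response. A nonzero value indicates an error
- TransactionID (bytes 8-15): matches a response to its request
- AttributeID (bytes 16-17, MC_MEMBER_RECORD): identifies the payload as an MCMemberRecord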
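3. Multicast GID Generation
Two modes are supported: a user-supplied MGID or the built-in default MGID (MCG_GID). Each field of a user-supplied MGID is converted with strtoll(..., 0), so hexadecimal values must carry a 0x prefix; a bare ff parses as 0. Without a user MGID, the server uses MCG_GID as is and the client increments its last byte: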
```c
void set_multicast_gid(struct mcast_parameters *params, uint32_t qp_num, int is_client)
{
	uint8_t mcg_gid[16] = MCG_GID;

	if (params->user_mgid) {
		/* Parse the user's colon-separated string, one byte per field */
		const char *pstr = params->user_mgid;
		char *term = strpbrk(pstr, ":");
		/* ... parse each byte into mcg_gid[i] ... */
	}
	memcpy(params->mgid.raw, mcg_gid, 16);
	/* Without a user MGID, the client increments the last byte */
	if (is_client && params->user_mgid == NULL)
		params->mgid.raw[15]++;
}
```
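Creating and Managing the Multicast QP
1. What Makes a QP a Multicast QP
In InfiniBand, multicast is only available on Unreliable Datagram (UD) QPs. A QP does not receive multicast traffic because of a creation flag. It receives it once it is attached to the group with ibv_attach_mcast(), using the MGID and the MLID that the SA returned on join. On teardown, the QP is detached with ibv_detach_mcast().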
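```c
qp_init_attr.qp_type = IBV_QPT_UD;        /* multicast requires a UD QP */
qp = ibv_create_qp(pd, &qp_init_attr);
/* after join_multicast_group() has filled in params->mgid and params->mlid */
if (ibv_attach_mcast(qp, &params->mgid, params->mlid))
	fprintf(stderr, "Couldn't attach the QP to the multicast group\n");
```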
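2. QP State Transitions
Before the QP can be used, ibv_modify_qp() must move it through the states RESET, INIT, RTR, and RTS in order. INIT sets the P_Key index, port, and Q_Key. A UD QP has no remote peer to configure, so RTR only changes the state. RTS sets the send PSN: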
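```c
/* UD QP: RESET -> INIT -> RTR -> RTS (DEF_QKEY comes from the perftest headers) */
struct ibv_qp_attr attr;
memset(&attr, 0, sizeof(attr));
attr.qp_state = IBV_QPS_INIT;       /* 1. INIT */
attr.pkey_index = 0;
attr.port_num = params->ib_port;
attr.qkey = DEF_QKEY;
if (ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX | IBV_QP_PORT | IBV_QP_QKEY))
	return 1;
attr.qp_state = IBV_QPS_RTR;        /* 2. RTR: no remote address needed for UD */
if (ibv_modify_qp(qp, &attr, IBV_QP_STATE))
	return 1;
attr.qp_state = IBV_QPS_RTS;        /* 3. RTS */
attr.sq_psn = 0;
if (ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_SQ_PSN))
	return 1;
```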
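3. P_Key Index Selection
Before the MAD is sent, set_pkey() chooses which P_Key table index to send it with. It looks for the default partition key (0xffff, compared on the low 15 bits). It prefers an entry with full membership (bit 15 set) and falls back to a limited-member entry. According to the source comment, this handles setups with non-default P_Key values (the comment mentions Azure):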
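```c
static int set_pkey(void *umad_buff, struct ibv_context *ctx, int port_num)
{
	/* ... query max_pkeys into pkey_tbl, then scan the P_Key table ... */
	for (i = 0; i < pkey_tbl; ++i) {
		/* ... read entry i into tmp_pkey; skip it unless its low 15 bits match ... */
		/* Prefer a P_Key with full membership (bit 15 set) */
		if (tmp_pkey & 0x8000) {
			index = i;
			umad_set_pkey(umad_buff, index);
			return 0;
		}
		/* Remember the first limited-member match */
		if (partial_ix < 0)
			partial_ix = i;
	}
	/* No full-member entry: fall back to the limited-member one */
	if (partial_ix >= 0) {
		index = partial_ix;
		umad_set_pkey(umad_buff, index);
		return 0;
	}
	return 1; /* no matching P_Key */
}
```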
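Error Handling and Resource Management
1. Signal Handling
After a successful join, join_multicast_group() installs a SIGINT handler. Pressing Ctrl+C therefore removes the membership from the SM instead of leaving a stale group behind. The handler only unregisters the group; it does not release QPs or other verbs resources. It also calls fprintf() and libibumad functions that allocate memory, and none of these are async-signal-safe: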
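```c
static void signalCatcher(int sig)
{
	if (sig == SIGINT) {
		/* Leave the multicast group on the SM */
		if (join_multicast_group(SUBN_ADM_METHOD_DELETE, sighandler_params))
			fprintf(stderr, "Couldn't Unregister the Mcast group on the SM\n");
		/* (the full version also leaves the base group when is_2nd_mgid_used is set) */
	}
	exit(1);
}
```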
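2. Resource Lifecycle
multicast_resources.c tracks membership only through the MCAST_IS_JOINED bit in mcast_state. The bit is set after a successful SET and cleared after a DELETE. A cleanup routine can use this flag to decide whether a leave is still needed. The routine below is illustrative and not part of the perftest source: g_mcast_params, its qp member, and leave_multicast_group_external() are hypothetical names.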
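```c
/* Illustrative only: g_mcast_params and leave_multicast_group_external() are hypothetical */
void cleanup_multicast_resources(void)
{
	/* 1. Leave the multicast group if we are still a member */
	if (g_mcast_params.mcast_state & MCAST_IS_JOINED)
		leave_multicast_group_external();
	/* 2. Destroy the QP (detach it with ibv_detach_mcast() first if it was attached) */
	if (g_mcast_params.qp) {
		ibv_destroy_qp(g_mcast_params.qp);
		g_mcast_params.qp = NULL;
	}
	/* 3. Reset the state flags */
	g_mcast_params.mcast_state = 0;
}
```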
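SM Interaction Costs
1. Leaving a Second Group
Requests are not batched: every group costs one request/response round trip with the SA. When a second MGID is in use (is_2nd_mgid_used), the signal handler copies base_mgid into mgid and sends a second DELETE to leave the base group as well: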
```c
if (sighandler_params->is_2nd_mgid_used) {
	memcpy(sighandler_params->mgid.raw, sighandler_params->base_mgid.raw, 16);
	if (join_multicast_group(SUBN_ADM_METHOD_DELETE, sighandler_params))
		fprintf(stderr, "Couldn't Unregister the Base Mcast group on the SM\n");
}
```
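2. Timeouts and Retries
The exchange is synchronous but bounded. umad_send() is called with a 100 ms response timeout and 5 retries, and umad_recv() blocks for at most 5000 ms waiting for the reply. A missing or unreachable SM therefore produces an error instead of a hang: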
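```c
if (umad_send(portid, agentid, umad_buff, MAD_SIZE, 100, 5) < 0) { /* 100 ms timeout, 5 retries */
	fprintf(stderr, "failed to send MAD\n");
	goto cleanup;
}
if (umad_recv(portid, umad_buff, &length, 5000) < 0) {             /* wait up to 5000 ms */
	fprintf(stderr, "failed to receive MAD response\n");
	goto cleanup;
}
```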
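Application Scenarios
1. Collective Communication in HPC
In MPI implementations, multicast can speed up collectives such as broadcast. Each participating node joins the group once. After that, a sender transmits a single packet and the switches replicate it to every member, so the sender does not issue a separate send per receiver. UD multicast delivery is unreliable, so such implementations need their own loss recovery on top.
2. Financial Trading Systems
In low-latency trading systems, multicast is used to fan out market data to many subscribers at once.
3. Cloud Data Centers
In virtualized environments, multicast group management can support operations such as virtual machine migration and storage replication, helping resource utilization and reliability.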
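Challenges and Solutions
1. The SM as a Single Point of Failure
Challenge: A single SM is a single point of failure.
Solution: Run several SM instances. One is elected master and the others stay in standby, ready to take over if the master fails.
2. Managing Many Multicast Groups
Challenge: In large clusters the number of multicast groups grows quickly, while multicast LIDs are limited to the range 0xC000 to 0xFFFE.
Solution: Hierarchical group management and dynamic resource allocation.
3. Security
Challenge: Multicast traffic could be received by unauthorized nodes.
Solution: Use P_Key partitioning to restrict group membership to ports in the same partition. Where confidentiality is required, encrypt data above the transport.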
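Future Trends
1. Intelligent Network Management
As AI advances, multicast group management may adopt machine-learning techniques for dynamic group optimization and predictive resource allocation.
2. Cloud-Native Integration
Integrating multicast group management with container orchestration platforms such as Kubernetes could give cloud-native applications transparent high-performance communication.
3. Quantum Networks
Multicast group management mechanisms might eventually be designed for quantum networks, supporting new communication patterns such as entanglement distribution.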
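Conclusion
InfiniBand multicast group management spans hardware interaction, protocol encoding, resource management, and error handling. Reading the perftest-4.5.0.mlnxlibs source shows how these pieces fit together. A single SubnAdm MAD carrying an MCMemberRecord joins or leaves a group, and the SA answers with the MLID used for the multicast traffic. This knowledge helps both in tuning existing systems and in designing next-generation high-performance networks.
As an efficient data-distribution mechanism, multicast will remain important in HPC, financial technology, and cloud computing. Its group management will keep facing new challenges as networks evolve.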

perftest-4.5.0.mlnxlibs\src\multicast_resources.c
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netdb.h>
#include <time.h>
#include <limits.h>
#include <unistd.h>
#if !defined(__FreeBSD__)
#include <malloc.h>
#endif
#include <getopt.h>
#include <errno.h>
#if defined(__FreeBSD__)
#include <infiniband/byteswap.h>
#else
#include <byteswap.h>
#endif
#include <signal.h>
#include <pthread.h>
#include "multicast_resources.h"
#include "perftest_communication.h"

/* This is when we get sig handler from the user before we remove the join request. */
struct mcast_parameters *sighandler_params;

/******************************************************************************
 * signalCatcher - catch the user signal in order to deregister the mcast group
 ******************************************************************************/
static void signalCatcher (int sig)
{
	if (sig == SIGINT) {
		if (join_multicast_group(SUBN_ADM_METHOD_DELETE,sighandler_params))
			fprintf(stderr,"Couldn't Unregister the Mcast group on the SM\n");

		if (sighandler_params->is_2nd_mgid_used) {
			memcpy(sighandler_params->mgid.raw,sighandler_params->base_mgid.raw,16);
			if (join_multicast_group(SUBN_ADM_METHOD_DELETE,sighandler_params))
				fprintf(stderr,"Couldn't Unregister the Base Mcast group on the SM\n");
		}
	}
	exit(1);
}

/******************************************************************************
 * prepare_mcast_mad
 ******************************************************************************/
static void prepare_mcast_mad(uint8_t method,
		struct mcast_parameters *params,
		struct sa_mad_packet_t *samad_packet)
{
	uint8_t *ptr;
	uint64_t comp_mask;

	memset(samad_packet,0,sizeof(*samad_packet));

	/* prepare the MAD header. according to Table 145 in IB spec 1.2.1 */
	ptr = samad_packet->mad_header_buf;
	ptr[0] = 0x01;                                 /* BaseVersion */
	ptr[1] = MANAGMENT_CLASS_SUBN_ADM;             /* MgmtClass */
	ptr[2] = 0x02;                                 /* ClassVersion */
	ptr[3] = INSERTF(ptr[3], 0, method, 0, 7);     /* Method */
	(*(uint64_t *)(ptr + 8)) = ntoh_64((uint64_t)DEF_TRANS_ID); /* TransactionID */
	(*(uint16_t *)(ptr + 16)) = htons(SUBN_ADM_ATTR_MC_MEMBER_RECORD); /* AttributeID */

	ptr = samad_packet->SubnetAdminData;

	memcpy(&ptr[0],params->mgid.raw, 16);
	memcpy(&ptr[16],params->port_gid.raw, 16);

	(*(uint32_t *)(ptr + 32)) = htonl(DEF_QKEY);
	(*(uint16_t *)(ptr + 40)) = params->pkey;
	ptr[39] = DEF_TCLASS;
	ptr[44] = INSERTF(ptr[44], 4, DEF_SLL, 0, 4);
	ptr[44] = INSERTF(ptr[44], 0, DEF_FLOW_LABLE, 16, 4);
	ptr[45] = INSERTF(ptr[45], 0, DEF_FLOW_LABLE, 8, 8);
	ptr[46] = INSERTF(ptr[46], 0, DEF_FLOW_LABLE, 0, 8);
	ptr[48] = INSERTF(ptr[48], 0, MCMEMBER_JOINSTATE_FULL_MEMBER, 0, 4);

	comp_mask = SUBN_ADM_COMPMASK_MGID | SUBN_ADM_COMPMASK_PORT_GID | SUBN_ADM_COMPMASK_Q_KEY |
		SUBN_ADM_COMPMASK_P_KEY | SUBN_ADM_COMPMASK_TCLASS | SUBN_ADM_COMPMASK_SL |
		SUBN_ADM_COMPMASK_FLOW_LABEL | SUBN_ADM_COMPMASK_JOIN_STATE;

	samad_packet->ComponentMask = ntoh_64(comp_mask);
}

/******************************************************************************
 * check_mad_status
 ******************************************************************************/
static int check_mad_status(struct sa_mad_packet_t *samad_packet)
{
	uint8_t *ptr;
	uint32_t user_trans_id;
	uint16_t mad_header_status;

	ptr = samad_packet->mad_header_buf;

	/* the upper 32 bits of TransactionID were set by the kernel */
	user_trans_id = ntohl(*(uint32_t *)(ptr + 12));

	/* check the TransactionID to make sure this is the response */
	/* for the join/leave multicast group request we posted */
	if (user_trans_id != DEF_TRANS_ID) {
		fprintf(stderr, "received a mad with TransactionID 0x%x, when expecting 0x%x\n",
				(unsigned int)user_trans_id, (unsigned int)DEF_TRANS_ID);
		return 1;
	}

	mad_header_status = 0x0;
	mad_header_status = INSERTF(mad_header_status, 8, ptr[4], 0, 7);
	mad_header_status = INSERTF(mad_header_status, 0, ptr[5], 0, 8);

	if (mad_header_status) {
		fprintf(stderr,"received UMAD with an error: 0x%x\n", mad_header_status);
		return 1;
	}

	return 0;
}

/******************************************************************************
 * get_mlid_from_mad
 ******************************************************************************/
static void get_mlid_from_mad(struct sa_mad_packet_t *samad_packet,uint16_t *mlid)
{
	uint8_t *ptr;

	ptr = samad_packet->SubnetAdminData;
	*mlid = ntohs(*(uint16_t *)(ptr + 36));
}

/******************************************************************************
 * set_multicast_gid
 ******************************************************************************/
void set_multicast_gid(struct mcast_parameters *params,uint32_t qp_num,int is_client)
{
	uint8_t mcg_gid[16] = MCG_GID;
	const char *pstr = params->user_mgid;
	char *term = NULL;
	char tmp[20];
	int i;

	if (params->user_mgid) {
		term = strpbrk(pstr, ":");
		memcpy(tmp, pstr, term - pstr+1);
		tmp[term - pstr] = 0;

		mcg_gid[0] = (unsigned char)strtoll(tmp, NULL, 0);

		for (i = 1; i < 15; ++i) {
			pstr += term - pstr + 1;
			term = strpbrk(pstr, ":");
			memcpy(tmp, pstr, term - pstr+1);
			tmp[term - pstr] = 0;

			mcg_gid[i] = (unsigned char)strtoll(tmp, NULL, 0);
		}
		pstr += term - pstr + 1;

		strcpy(tmp, pstr);
		mcg_gid[15] = (unsigned char)strtoll(tmp, NULL, 0);
	}

	memcpy(params->mgid.raw,mcg_gid,16);
	if (is_client && params->user_mgid==NULL)
		params->mgid.raw[15]++;
}

/******************************************************************************
 * Set pkey correctly for cases where non-default values are used (e.g. Azure setup)
 ******************************************************************************/
static int set_pkey(void *umad_buff, struct ibv_context *ctx, int port_num)
{
	struct ibv_device_attr device_attr;
	int32_t partial_ix = -1;
	uint16_t pkey = 0xffff;
	uint16_t tmp_pkey;
	uint16_t pkey_tbl;
	uint16_t index;
	int ret;
	int i;

	ret = ibv_query_device(ctx, &device_attr);
	if (ret)
		return ret;

	pkey_tbl = device_attr.max_pkeys;
	for (i = 0; i < pkey_tbl; ++i) {
		ret = ibv_query_pkey(ctx, port_num, i, &tmp_pkey);
		if (ret)
			continue;
		tmp_pkey = ntohs(tmp_pkey);
		if ((pkey & 0x7fff) == (tmp_pkey & 0x7fff)) {
			/* if there is full-member pkey take it.*/
			if (tmp_pkey & 0x8000) {
				index = i;
				umad_set_pkey(umad_buff, index);
				return 0;
			}
			if (partial_ix < 0)
				partial_ix = i;
		}
	}

	/*no full-member, if exists take the limited*/
	if (partial_ix >= 0) {
		index = partial_ix;
		umad_set_pkey(umad_buff, index);
		return 0;
	}

	return 1;
}

/******************************************************************************
 * join_multicast_group
 ******************************************************************************/
int join_multicast_group(subn_adm_method method,struct mcast_parameters *params)
{
	int portid = -1;
	int agentid = -1;
	void *umad_buff = NULL;
	void *mad = NULL;
	int length = MAD_SIZE;
	int test_result = 1;

	/* mlid will be assigned to the new LID after the join */
	if (umad_init() < 0) {
		fprintf(stderr, "failed to init the UMAD library\n");
		goto cleanup;
	}
	/* cast to drop the const qualifier of ib_devname */
	portid = umad_open_port((char*)params->ib_devname,params->ib_port);
	if (portid < 0) {
		fprintf(stderr,"failed to open UMAD port %d\n",params->ib_port);
		goto cleanup;
	}

	agentid = umad_register(portid,MANAGMENT_CLASS_SUBN_ADM, 2, 0, 0);
	if (agentid < 0) {
		fprintf(stderr,"failed to register UMAD agent for MADs\n");
		goto cleanup;
	}

	umad_buff = umad_alloc(1, umad_size() + MAD_SIZE);
	if (!umad_buff) {
		fprintf(stderr, "failed to allocate MAD buffer\n");
		goto cleanup;
	}

	mad = umad_get_mad(umad_buff);
	prepare_mcast_mad(method,params,(struct sa_mad_packet_t *)mad);

	if (set_pkey(umad_buff, params->ib_ctx, params->ib_port)) {
		fprintf(stderr, "failed to set pkey index\n");
		goto cleanup;
	}

	if (umad_set_addr(umad_buff,params->sm_lid,1,params->sm_sl,QP1_WELL_KNOWN_Q_KEY) < 0) {
		fprintf(stderr, "failed to set the destination address of the SMP\n");
		goto cleanup;
	}

	if (umad_send(portid,agentid,umad_buff,MAD_SIZE,100,5) < 0) {
		fprintf(stderr, "failed to send MAD\n");
		goto cleanup;
	}

	if (umad_recv(portid,umad_buff,&length,5000) < 0) {
		fprintf(stderr, "failed to receive MAD response\n");
		goto cleanup;
	}

	if (check_mad_status((struct sa_mad_packet_t*)mad)) {
		fprintf(stderr, "failed to get mlid from MAD\n");
		goto cleanup;
	}

	/* "Join multicast group" message was sent */
	if (method == SUBN_ADM_METHOD_SET) {
		get_mlid_from_mad((struct sa_mad_packet_t*)mad,&params->mlid);
		params->mcast_state |= MCAST_IS_JOINED;
		sighandler_params = params;
		signal(SIGINT,signalCatcher);
	} else {
		params->mcast_state &= ~MCAST_IS_JOINED;
	}

	test_result = 0;

cleanup:
	if (umad_buff)
		umad_free(umad_buff);

	if (portid >= 0) {
		if (agentid >= 0) {
			if (umad_unregister(portid, agentid)) {
				fprintf(stderr, "failed to deregister UMAD agent for MADs\n");
				test_result = 1;
			}
		}

		if (umad_close_port(portid)) {
			fprintf(stderr, "failed to close UMAD portid\n");
			test_result = 1;
		}
	}

	return test_result;
}

/******************************************************************************
 * End
 ******************************************************************************/
```
The following walks through the code above section by section. It implements InfiniBand multicast group management:
1. Header includes
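Explanation: Pulls in headers for the standard C library, networking, time handling, memory management, signal handling, and threads. Conditional compilation handles the differences between FreeBSD and other systems: malloc.h is skipped on FreeBSD, and byteswap.h comes from infiniband/byteswap.h there.
2. Global variable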
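Explanation: A global pointer that gives the signal handler access to the multicast parameters. join_multicast_group() sets it on every successful join, so it refers to the most recently joined group.
3. Signal handler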
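Explanation:
- A static handler for SIGINT (Ctrl+C). join_multicast_group() installs it after a successful join
- Calls join_multicast_group() with SUBN_ADM_METHOD_DELETE to leave the multicast group
- If a second MGID (multicast GID) is in use (is_2nd_mgid_used), copies base_mgid into mgid and sends another DELETE to leave the base group as well
- Exits the program with status 1
4. Preparing the MAD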
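Explanation:
- Builds a Subnet Administration MAD
- Fills the MAD header: base version, management class, class version, method, transaction ID, and attribute ID (MCMemberRecord)
- Fills the MCMemberRecord: MGID (bytes 0-15), port GID (16-31), Q_Key (32-35), TClass (39), P_Key (40-41), SL and flow label (44-46), and JoinState = full member (low 4 bits of byte 48)
- Sets ComponentMask to tell the SA which record fields are valid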
5. Checking the MAD status
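Explanation:
- Validates the received MAD response
- Compares the low 32 bits of the TransactionID (bytes 12-15) with DEF_TRANS_ID to make sure the response belongs to the join/leave request we sent. The kernel fills in the upper 32 bits
- Checks the status field of the MAD header (bytes 4-5) and fails on any nonzero value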
6. Extracting the MLID from the MAD
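Explanation: Reads the multicast LID (MLID) assigned by the SA from bytes 36-37 of the MCMemberRecord in the response and converts it from network byte order. Subsequent multicast communication uses this MLID.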
7. Setting the multicast GID
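Explanation:
- Sets the multicast group GID (MGID)
- If the user supplied an MGID, parses it as 16 colon-separated fields. Each field is converted with strtoll(..., 0), so hexadecimal bytes need a 0x prefix. The parser assumes well-formed input: with too few fields strpbrk() returns NULL and the pointer arithmetic that follows is invalid, and a field longer than 19 characters overflows tmp
- If no MGID was supplied and the caller is the client, uses the default MCG_GID with its last byte incremented
- The qp_num argument is not used by this function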
8. Setting the P_Key
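Explanation:
- Selects the P_Key index used to send the MAD. umad_set_pkey() stores a table index, not the key itself
- Queries the device attributes to get the P_Key table size (max_pkeys)
- Looks for entries matching the default partition (low 15 bits equal to 0x7fff)
- Prefers a full-member P_Key (bit 15 set) and otherwise falls back to the first limited-member one. Returns 1 if there is no match
9. Main function: joining a multicast group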
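Explanation:
- The main function. Joins (SUBN_ADM_METHOD_SET) or leaves (SUBN_ADM_METHOD_DELETE) a multicast group
- Initializes the UMAD library, opens the port, and registers an agent for the SubnAdm class
- Builds the MAD, selects the P_Key index, addresses it to the SM (sm_lid, sm_sl, QP1 with the well-known Q_Key), and sends it
- Receives and validates the response
- On a join, extracts the MLID, sets MCAST_IS_JOINED, and installs the SIGINT handler. On a leave, clears MCAST_IS_JOINED
- On every path, frees the buffer, unregisters the agent, and closes the port, then returns 0 on success and 1 on failure
This is a typical InfiniBand multicast group management implementation. It talks to the subnet manager through Subnet Administration MADs to join and leave multicast groups dynamically.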