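InfiniBand Multicast Group Management: An In-Depth Look from Theory to Implementation

Introduction

In high-performance computing, data centers, and cloud computing, InfiniBand is widely used for its low latency and high bandwidth. Multicast, an efficient one-to-many transport mechanism, plays an important role in collective communication operations such as broadcast and reduction. Based on the source code of the Mellanox performance test tool perftest-4.5.0.mlnxlibs, this article examines how InfiniBand multicast group management is implemented.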

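Multicast Group Management: Architecture Overview

System Architecture

InfiniBand multicast group management follows a client-server model. The Subnet Manager (SM), together with its Subnet Administration (SA) service, acts as the central control point that registers multicast groups and sets up multicast routing for the whole subnet. Group members join and leave a group by exchanging Subnet Administration management datagrams (MADs) with the SM.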

Core Data Structures

The mcast_parameters structure holds the information needed for a multicast session:

struct mcast_parameters {
    union ibv_gid mgid;           /* multicast group GID */
    union ibv_gid port_gid;       /* port GID */
    union ibv_gid base_mgid;      /* base MGID */
    uint16_t mlid;                /* multicast LID */
    uint16_t pkey;                /* partition key */
    uint32_t qp_num;              /* queue pair number */
    uint32_t mtu;                 /* maximum transmission unit */
    const char *ib_devname;       /* IB device name */
    int ib_port;                  /* IB port number */
    struct ibv_context *ib_ctx;   /* IB device context */
    uint8_t sl;                   /* service level */
    uint16_t sm_lid;              /* LID of the SM */
    uint8_t sm_sl;                /* service level used to reach the SM */
    uint8_t mcast_state;          /* multicast state flags */
    const char *user_mgid;        /* MGID supplied by the user */
    int is_2nd_mgid_used;         /* whether a second MGID is in use */
};

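Multicast Group Join Mechanism in Depth

1. Communicating with the Subnet Manager

The core of multicast group management is the exchange with the Subnet Manager. The code uses MADs (Management Datagrams) of the Subnet Administration class, and the flow is as follows:

/* Abridged from join_multicast_group(): declarations, most error checks
 * and the cleanup label are omitted. */
int join_multicast_group(subn_adm_method method, struct mcast_parameters *params)
{
    /* 1. Initialize the UMAD library */
    if (umad_init() < 0) {
        fprintf(stderr, "failed to init the UMAD library\n");
        goto cleanup;
    }

    /* 2. Open the IB port */
    portid = umad_open_port((char*)params->ib_devname, params->ib_port);

    /* 3. Register a MAD agent for the Subnet Administration class */
    agentid = umad_register(portid, MANAGMENT_CLASS_SUBN_ADM, 2, 0, 0);

    /* 4. Build the MAD request and send it to the SM */
    prepare_mcast_mad(method, params, (struct sa_mad_packet_t *)mad);
    umad_send(portid, agentid, umad_buff, MAD_SIZE, 100, 5);

    /* 5. Receive the response (5000 ms timeout) */
    if (umad_recv(portid, umad_buff, &length, 5000) < 0) {
        fprintf(stderr, "failed to receive MAD response\n");
        goto cleanup;
    }

    /* 6. On a join, extract the MLID assigned by the SM */
    get_mlid_from_mad((struct sa_mad_packet_t*)mad, &params->mlid);
}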

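2. MAD Packet Format

The MAD header follows Table 145 of the InfiniBand specification, release 1.2.1, and contains these key fields:

  • BaseVersion (0x01): base version number
  • MgmtClass (SUBN_ADM): management class; identifies the Subnet Administration class
  • ClassVersion (0x02): class version
  • Method (SET/DELETE): the operation; SET joins a group, DELETE leaves it
  • AttributeID (MC_MEMBER_RECORD): attribute ID; identifies a multicast member record
  • TransactionID: transaction ID, used to match a response to its request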
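
The header fields listed above can be laid out by hand in a byte buffer. The following is a minimal sketch, not the perftest code: the `EX_`-prefixed constants and the function names are hypothetical, and the numeric values (class 0x03, methods 0x02 and 0x15, attribute 0x0038) are what I understand the IB specification to assign, so treat them as assumptions and prefer the constants from perftest's own headers.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values; the real constants live in perftest's headers. */
#define EX_MGMT_CLASS_SUBN_ADM   0x03    /* Subnet Administration class */
#define EX_METHOD_SET            0x02    /* join */
#define EX_METHOD_DELETE         0x15    /* leave */
#define EX_ATTR_MC_MEMBER_RECORD 0x0038  /* multicast member record */
#define EX_MAD_HEADER_LEN        24

/* Store a 16-bit value in network (big-endian) byte order. */
static void ex_put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Store a 64-bit value in network (big-endian) byte order. */
static void ex_put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Fill the common MAD header for a multicast member record request. */
void example_fill_mad_header(uint8_t hdr[EX_MAD_HEADER_LEN],
                             uint8_t method, uint64_t trans_id)
{
    memset(hdr, 0, EX_MAD_HEADER_LEN);
    hdr[0] = 0x01;                                   /* BaseVersion */
    hdr[1] = EX_MGMT_CLASS_SUBN_ADM;                 /* MgmtClass */
    hdr[2] = 0x02;                                   /* ClassVersion */
    hdr[3] = method & 0x7f;                          /* Method, low 7 bits */
    ex_put_be64(hdr + 8, trans_id);                  /* TransactionID, bytes 8..15 */
    ex_put_be16(hdr + 16, EX_ATTR_MC_MEMBER_RECORD); /* AttributeID, bytes 16..17 */
}
```

Writing the multi-byte fields one byte at a time avoids the unaligned pointer casts and keeps the byte order explicit regardless of the host's endianness.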

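3. Multicast GID Generation

The MGID comes from one of two sources: a value supplied by the user, or the compiled-in default MCG_GID. In the default case the client increments the last byte:

void set_multicast_gid(struct mcast_parameters *params, uint32_t qp_num, int is_client)
{
    uint8_t mcg_gid[16] = MCG_GID;

    if (params->user_mgid) {
        /* Parse the user-supplied, colon-separated list of 16 byte values.
         * The source calls strtoll() with base 0, so hex needs a 0x prefix. */
        const char *pstr = params->user_mgid;
        char *term = strpbrk(pstr, ":");
        /* ... parse each byte into mcg_gid[] ... */
    }

    memcpy(params->mgid.raw, mcg_gid, 16);

    /* With the default MGID, the client increments the last byte */
    if (is_client && params->user_mgid == NULL)
        params->mgid.raw[15]++;
}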
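
The parsing step is easier to reason about with input validation added. The sketch below is a hypothetical alternative, not the perftest implementation: `example_parse_mgid` is an invented name, and it keeps the source's base-0 convention (a `0x` prefix selects hexadecimal) while rejecting strings that do not contain exactly 16 byte values.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse exactly 16 colon-separated byte values into gid[16].
 * Returns 0 on success, -1 on malformed input. Base 0 is used, so
 * "0xff" is hexadecimal while a bare "255" is decimal. */
int example_parse_mgid(const char *str, uint8_t gid[16])
{
    const char *p = str;

    if (p == NULL)
        return -1;

    for (int i = 0; i < 16; i++) {
        char *end;
        unsigned long v;

        errno = 0;
        v = strtoul(p, &end, 0);
        if (end == p || errno != 0 || v > 0xff)
            return -1;              /* no digits, overflow, or not a byte */

        gid[i] = (uint8_t)v;

        if (i < 15) {
            if (*end != ':')
                return -1;          /* expected a separator */
            p = end + 1;
        } else if (*end != '\0') {
            return -1;              /* trailing characters after byte 16 */
        }
    }
    return 0;
}
```

Using the end pointer from `strtoul()` removes the need for a temporary buffer and for `strpbrk()`, and every failure mode returns an error instead of dereferencing a NULL pointer.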

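Creating and Managing the Multicast Queue Pair

1. What Makes a QP a Multicast QP

A multicast QP is an ordinary Unreliable Datagram (UD) QP. libibverbs has no dedicated multicast QP type or creation flag; a QP becomes a multicast receiver when it is attached to the group with ibv_attach_mcast(), using the MGID and the MLID that the SM returned for the join:

/* Sketch: pd, qp_init_attr and qp are assumed to be set up by the caller. */
qp_init_attr.qp_type = IBV_QPT_UD;
qp = ibv_create_qp(pd, &qp_init_attr);

if (ibv_attach_mcast(qp, &params->mgid, params->mlid))
    fprintf(stderr, "Couldn't attach QP to the multicast group\n");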

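2. QP State Transitions

Like any QP, a multicast UD QP must be moved through INIT, RTR and RTS before it can be used:

/* Error checks omitted; qp is the UD QP created by the caller. */

/* 1. RESET -> INIT: a UD QP needs the P_Key index, port and Q_Key */
attr.qp_state   = IBV_QPS_INIT;
attr.pkey_index = 0;
attr.port_num   = params->ib_port;
attr.qkey       = DEF_QKEY;
ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX | IBV_QP_PORT | IBV_QP_QKEY);

/* 2. INIT -> RTR (ready to receive): only the state changes for UD */
attr.qp_state = IBV_QPS_RTR;
ibv_modify_qp(qp, &attr, IBV_QP_STATE);

/* 3. RTR -> RTS (ready to send): set the initial send PSN */
attr.qp_state = IBV_QPS_RTS;
attr.sq_psn   = 0;
ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_SQ_PSN);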

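3. P_Key Handling

set_pkey() chooses the P_Key table index that the MAD is sent with. It scans the port's P_Key table for entries in the default partition (0xffff, compared without the membership bit), takes a full-member entry if one exists, and otherwise falls back to the first limited-member entry:

static int set_pkey(void *umad_buff, struct ibv_context *ctx, int port_num)
{
    /* ... for each index i in the port's P_Key table whose low 15 bits
     * match the default P_Key ... */

    /* Prefer an entry with full membership */
    if (tmp_pkey & 0x8000) {  /* membership bit set: full member */
        index = i;
        umad_set_pkey(umad_buff, index);
        return 0;
    }

    /* ... after the loop ... */

    /* Otherwise fall back to a limited-member entry, if one was found */
    if (partial_ix >= 0) {
        index = partial_ix;
        umad_set_pkey(umad_buff, index);
        return 0;
    }
    return 1;  /* no matching P_Key */
}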
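
The selection rule can be isolated from the verbs calls and exercised on a plain array. This is a minimal sketch under that assumption: `example_select_pkey_index` is a hypothetical helper, and the table is taken to be already converted to host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the table index of the best entry for `wanted` (membership bit
 * ignored): a full-member entry if one exists, otherwise the first
 * limited-member entry, otherwise -1. */
int example_select_pkey_index(const uint16_t *table, size_t n, uint16_t wanted)
{
    int limited_ix = -1;

    for (size_t i = 0; i < n; i++) {
        if ((table[i] & 0x7fff) != (wanted & 0x7fff))
            continue;               /* different partition */
        if (table[i] & 0x8000)
            return (int)i;          /* full member: take it immediately */
        if (limited_ix < 0)
            limited_ix = (int)i;    /* remember the first limited match */
    }
    return limited_ix;
}
```

For a table of {0x0001, 0x7fff, 0xffff} and a wanted key of 0xffff, the function skips the unrelated entry, notes index 1 as a limited-member match, and then returns index 2 because that entry has the full-member bit set.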

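Error Handling and Resource Management

1. Signal Handling

After a successful join, the code installs a SIGINT handler that leaves the group on the SM before the process exits, so that an interrupted test does not leave a stale membership behind. Note that the handler calls fprintf() and the UMAD library, which are not async-signal-safe; this is acceptable for a test tool but is not a pattern to copy into production code:

static void signalCatcher(int sig)
{
    if (sig == SIGINT) {
        /* Unregister the multicast group on the SM */
        if (join_multicast_group(SUBN_ADM_METHOD_DELETE, sighandler_params))
            fprintf(stderr, "Couldn't Unregister the Mcast group on the SM\n");
        /* ... the base group is removed too when is_2nd_mgid_used is set ... */
    }
    exit(1);
}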
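
A more conservative variant of the handler above only records that the signal arrived and leaves the actual group departure to normal program flow. The sketch below illustrates that pattern; it is not what perftest does, and all `example_` names are hypothetical.

```c
#include <signal.h>

/* Set by the handler, polled by the main loop. */
static volatile sig_atomic_t example_stop_requested = 0;

/* Async-signal-safe handler: it only records that SIGINT arrived. */
static void example_on_sigint(int sig)
{
    (void)sig;
    example_stop_requested = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int example_install_sigint_handler(void)
{
    return signal(SIGINT, example_on_sigint) == SIG_ERR ? -1 : 0;
}

/* Returns 1 once SIGINT has been seen, 0 otherwise. The main loop can
 * poll this and then leave the group(s) and exit from ordinary code. */
int example_should_stop(void)
{
    return example_stop_requested != 0;
}
```

A test loop would call `example_should_stop()` between iterations and, once it returns 1, issue the DELETE request for each joined group before returning from `main()`. The trade-off is that the leave is delayed until the next poll.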

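2. Resource Lifecycle

Membership is tracked with a state flag: join_multicast_group() sets MCAST_IS_JOINED in mcast_state after a successful join and clears it after a successful leave. A cleanup routine can test that flag to decide whether a leave is still needed. The routine below is illustrative; g_mcast_params, its qp member and leave_multicast_group_external() do not appear in the multicast_resources.c listing in this article:

void cleanup_multicast_resources(void)
{
    /* 1. Leave the multicast group if we are still a member */
    if (g_mcast_params.mcast_state & MCAST_IS_JOINED) {
        leave_multicast_group_external();
    }

    /* 2. Destroy the QP */
    if (g_mcast_params.qp) {
        ibv_destroy_qp(g_mcast_params.qp);
        g_mcast_params.qp = NULL;
    }

    /* 3. Reset the state flags */
    g_mcast_params.mcast_state = 0;
}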

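Performance Optimization Strategies

1. Handling a Second Multicast Group

When a second MGID is in use, the signal handler also leaves the base group. Each leave is a separate call to join_multicast_group() and therefore a separate MAD exchange with the SM; the requests are issued one after another rather than batched: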

if (sighandler_params->is_2nd_mgid_used) {
    memcpy(sighandler_params->mgid.raw, sighandler_params->base_mgid.raw, 16);
    if (join_multicast_group(SUBN_ADM_METHOD_DELETE, sighandler_params))
        fprintf(stderr, "Couldn't Unregister the Base Mcast group on the SM\n");
}

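2. Bounded Waits on MAD Operations

umad_recv() is called with a 5000 ms timeout, so a lost response cannot block the caller indefinitely. The call is still synchronous: the thread waits until a MAD arrives or the timeout expires: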

if (umad_recv(portid, umad_buff, &length, 5000) < 0) {
    fprintf(stderr, "failed to receive MAD response\n");
    goto cleanup;
}
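
The listing performs a single receive and then checks the TransactionID. One possible refinement is to keep receiving until the matching response arrives, giving up after a timeout or a bounded number of unrelated MADs. The sketch below shows only that control flow: the callback type, `example_wait_for_response` and the scripted test double are all hypothetical, and in real code the callback would wrap umad_recv().

```c
#include <stdint.h>

/* Hypothetical receive callback: returns 0 and stores the TransactionID
 * of the MAD it received, or -1 if its own timeout expired. */
typedef int (*example_recv_fn)(void *ctx, uint32_t *trans_id);

/* Keep receiving until a MAD with the expected TransactionID arrives.
 * Returns 0 on a match, -1 if a receive timed out or max_attempts
 * unrelated MADs were seen. */
int example_wait_for_response(example_recv_fn recv_one, void *ctx,
                              uint32_t expected_tid, int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        uint32_t tid;

        if (recv_one(ctx, &tid) < 0)
            return -1;              /* timed out: give up */
        if (tid == expected_tid)
            return 0;               /* this is our response */
        /* otherwise an unrelated MAD: ignore it and try again */
    }
    return -1;
}

/* Test double: replays a fixed list of TransactionIDs, then times out. */
struct example_script {
    const uint32_t *tids;
    int count;
    int next;
};

int example_scripted_recv(void *ctx, uint32_t *trans_id)
{
    struct example_script *s = ctx;

    if (s->next >= s->count)
        return -1;                  /* script exhausted: simulate a timeout */
    *trans_id = s->tids[s->next++];
    return 0;
}
```

Bounding the number of attempts keeps the worst-case wait finite even when unrelated MADs keep arriving on the same agent.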

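Real-World Use Cases

1. Collective Communication in High-Performance Computing

MPI (Message Passing Interface) implementations can use multicast groups to accelerate collectives such as broadcast. Once the participating nodes have joined the same group, a single send reaches all of them, which reduces the number of point-to-point messages.

2. Financial Trading Systems

Low-latency trading systems use multicast to distribute market data quickly. Efficient group management helps keep delivery timely and consistent across subscribers.

3. Cloud Data Centers

In virtualized environments, multicast group management can support operations such as virtual machine migration and storage replication, improving resource utilization and system reliability.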

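Challenges and Solutions

1. Subnet Manager as a Single Point of Failure

Challenge: A subnet with a single SM has a single point of failure.
Solution: Run multiple SM instances so that a standby SM can take over when the master fails.

2. Managing Multicast Groups at Scale

Challenge: In large clusters the number of multicast groups grows rapidly.
Solution: Use hierarchical multicast group management and dynamic resource allocation.

3. Security Considerations

Challenge: Multicast traffic may be received by unauthorized nodes.
Solution: Combine P_Key partitioning with encrypted transport to protect the traffic.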

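Future Trends

1. Intelligent Network Management

As AI techniques mature, multicast group management may adopt machine learning for dynamic group optimization and predictive resource allocation.

2. Cloud-Native Integration

Integrating multicast group management with container orchestration platforms such as Kubernetes could give cloud-native applications transparent access to high-performance communication.

3. Adapting to Quantum Networks

Looking further ahead, multicast group management mechanisms may need to be designed for quantum computing networks, supporting new communication patterns such as entanglement distribution.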

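Conclusion

InfiniBand multicast group management touches hardware interaction, protocol encoding, resource management, and error handling. The perftest-4.5.0.mlnxlibs source shows how these pieces fit together: building the SA MAD, selecting a P_Key, exchanging the request with the SM, and tracking membership state. This understanding helps when tuning existing systems and serves as a reference when designing future high-performance communication systems.

As an efficient data distribution mechanism, multicast will remain important in high-performance computing, financial technology, and cloud computing. As network technology evolves, multicast group management will face new challenges and opportunities and will need continued refinement.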

perftest-4.5.0.mlnxlibs\src\multicast_resources.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netdb.h>
#include <time.h>
#include <limits.h>
#include <unistd.h>
#if !defined(__FreeBSD__)
#include <malloc.h>
#endif
#include <getopt.h>
#include <errno.h>
#if defined(__FreeBSD__)
#include <infiniband/byteswap.h>
#else
#include <byteswap.h>
#endif
#include <signal.h>
#include <pthread.h>
#include "multicast_resources.h"
#include "perftest_communication.h"

/* This is when we get sig handler from the user before we remove the join request. */
struct mcast_parameters *sighandler_params;

/******************************************************************************
 * signalCatcher - catch the user signal in order to unregister the mcast group
 ******************************************************************************/
static void signalCatcher (int sig)
{
	if (sig == SIGINT) {

		if (join_multicast_group(SUBN_ADM_METHOD_DELETE,sighandler_params))
			fprintf(stderr,"Couldn't Unregister the Mcast group on the SM\n");

		if (sighandler_params->is_2nd_mgid_used) {
			memcpy(sighandler_params->mgid.raw,sighandler_params->base_mgid.raw,16);
			if (join_multicast_group(SUBN_ADM_METHOD_DELETE,sighandler_params))
				fprintf(stderr,"Couldn't Unregister the Base Mcast group on the SM\n");
		}
	}
	exit(1);
}

/******************************************************************************
 * prepare_mcast_mad
 ******************************************************************************/
static void prepare_mcast_mad(uint8_t method,
		struct mcast_parameters *params,
		struct sa_mad_packet_t *samad_packet)
{
	uint8_t *ptr;
	uint64_t comp_mask;

	memset(samad_packet,0,sizeof(*samad_packet));

	/* prepare the MAD header. according to Table 145 in IB spec 1.2.1 */
	ptr = samad_packet->mad_header_buf;
	ptr[0]                     = 0x01;					/* BaseVersion */
	ptr[1]                     = MANAGMENT_CLASS_SUBN_ADM;			/* MgmtClass */
	ptr[2]                     = 0x02; 					/* ClassVersion */
	ptr[3]                     = INSERTF(ptr[3], 0, method, 0, 7); 		/* Method */
	(*(uint64_t *)(ptr + 8))   = ntoh_64((uint64_t)DEF_TRANS_ID);             /* TransactionID */
	(*(uint16_t *)(ptr + 16))  = htons(SUBN_ADM_ATTR_MC_MEMBER_RECORD);      /* AttributeID */

	ptr = samad_packet->SubnetAdminData;

	memcpy(&ptr[0],params->mgid.raw, 16);
	memcpy(&ptr[16],params->port_gid.raw, 16);

	(*(uint32_t *)(ptr + 32)) = htonl(DEF_QKEY);
	(*(uint16_t *)(ptr + 40)) = params->pkey;
	ptr[39]                    = DEF_TCLASS;
	ptr[44]                    = INSERTF(ptr[44], 4, DEF_SLL, 0, 4);
	ptr[44]                    = INSERTF(ptr[44], 0, DEF_FLOW_LABLE, 16, 4);
	ptr[45]                    = INSERTF(ptr[45], 0, DEF_FLOW_LABLE, 8, 8);
	ptr[46]                    = INSERTF(ptr[46], 0, DEF_FLOW_LABLE, 0, 8);
	ptr[48]                    = INSERTF(ptr[48], 0, MCMEMBER_JOINSTATE_FULL_MEMBER, 0, 4);

	comp_mask = SUBN_ADM_COMPMASK_MGID | SUBN_ADM_COMPMASK_PORT_GID | SUBN_ADM_COMPMASK_Q_KEY |
		SUBN_ADM_COMPMASK_P_KEY | SUBN_ADM_COMPMASK_TCLASS | SUBN_ADM_COMPMASK_SL |
		SUBN_ADM_COMPMASK_FLOW_LABEL | SUBN_ADM_COMPMASK_JOIN_STATE;

	samad_packet->ComponentMask = ntoh_64(comp_mask);
}

/******************************************************************************
 * check_mad_status
 ******************************************************************************/
static int check_mad_status(struct sa_mad_packet_t *samad_packet)
{
	uint8_t *ptr;
	uint32_t user_trans_id;
	uint16_t mad_header_status;

	ptr = samad_packet->mad_header_buf;

	/* the upper 32 bits of TransactionID were set by the kernel */
	user_trans_id = ntohl(*(uint32_t *)(ptr + 12));

	/* check the TransactionID to make sure this is the response */
	/* for the join/leave multicast group request we posted */
	if (user_trans_id != DEF_TRANS_ID) {
		fprintf(stderr, "received a mad with TransactionID 0x%x, when expecting 0x%x\n",
				(unsigned int)user_trans_id, (unsigned int)DEF_TRANS_ID);;
		return 1;
	}

	mad_header_status = 0x0;
	mad_header_status = INSERTF(mad_header_status, 8, ptr[4], 0, 7);
	mad_header_status = INSERTF(mad_header_status, 0, ptr[5], 0, 8);

	if (mad_header_status) {
		fprintf(stderr,"received UMAD with an error: 0x%x\n", mad_header_status);
		return 1;
	}

	return 0;
}


/******************************************************************************
 * get_mlid_from_mad
 ******************************************************************************/
static void get_mlid_from_mad(struct sa_mad_packet_t *samad_packet,uint16_t *mlid)
{
	uint8_t *ptr;
	ptr = samad_packet->SubnetAdminData;
	*mlid = ntohs(*(uint16_t *)(ptr + 36));
}

/******************************************************************************
 * set_multicast_gid
 ******************************************************************************/
void set_multicast_gid(struct mcast_parameters *params,uint32_t qp_num,int is_client)
{
	uint8_t mcg_gid[16] = MCG_GID;
	const char *pstr = params->user_mgid;
	char *term = NULL;
	char tmp[20];
	int i;

	if (params->user_mgid) {
		term = strpbrk(pstr, ":");
		memcpy(tmp, pstr, term - pstr+1);
		tmp[term - pstr] = 0;

		mcg_gid[0] = (unsigned char)strtoll(tmp, NULL, 0);

		for (i = 1; i < 15; ++i) {
			pstr += term - pstr + 1;
			term = strpbrk(pstr, ":");
			memcpy(tmp, pstr, term - pstr+1);
			tmp[term - pstr] = 0;

			mcg_gid[i] = (unsigned char)strtoll(tmp, NULL, 0);
		}
		pstr += term - pstr + 1;

		strcpy(tmp, pstr);
		mcg_gid[15] = (unsigned char)strtoll(tmp, NULL, 0);
	}

	memcpy(params->mgid.raw,mcg_gid,16);
	if (is_client && params->user_mgid==NULL)
		params->mgid.raw[15]++;
}

/******************************************************************************
 * Set pkey correctly for cases where non-default values are used (e.g. Azure setup)
 ******************************************************************************/
static int set_pkey(void *umad_buff, struct ibv_context *ctx, int port_num)
{
	struct ibv_device_attr device_attr;
	int32_t partial_ix = -1;
	uint16_t pkey = 0xffff;
	uint16_t tmp_pkey;
	uint16_t pkey_tbl;
	uint16_t index;
	int ret;
	int i;

	ret = ibv_query_device(ctx, &device_attr);
	if (ret)
		return ret;

	pkey_tbl = device_attr.max_pkeys;
	for (i = 0; i < pkey_tbl; ++i) {
		ret = ibv_query_pkey(ctx, port_num, i, &tmp_pkey);
		if (ret)
			continue;

		tmp_pkey = ntohs(tmp_pkey);
		if ((pkey & 0x7fff) == (tmp_pkey & 0x7fff)) {
			/* if there is full-member pkey take it.*/
			if (tmp_pkey & 0x8000) {
				index = i;
				umad_set_pkey(umad_buff, index);
				return 0;
			}
			if (partial_ix < 0)
				partial_ix = i;
		}
	}

	/*no full-member, if exists take the limited*/
	if (partial_ix >= 0) {
		index = partial_ix;
		umad_set_pkey(umad_buff, index);
		return 0;
	}

	return 1;
}

/******************************************************************************
 * join_multicast_group
 ******************************************************************************/
int join_multicast_group(subn_adm_method method,struct mcast_parameters *params)
{
	int portid = -1;
	int agentid = -1;
	void *umad_buff = NULL;
	void *mad = NULL;
	int length = MAD_SIZE;
	int test_result = 1;

	/* mlid will be assigned to the new LID after the join */
	if (umad_init() < 0) {
		fprintf(stderr, "failed to init the UMAD library\n");
		goto cleanup;
	}
	/* use a cast to lose the "const char *" */
	portid = umad_open_port((char*)params->ib_devname,params->ib_port);
	if (portid < 0) {
		fprintf(stderr,"failed to open UMAD port %d\n",params->ib_port);
		goto cleanup;
	}

	agentid = umad_register(portid,MANAGMENT_CLASS_SUBN_ADM, 2, 0, 0);
	if (agentid < 0) {
		fprintf(stderr,"failed to register UMAD agent for MADs\n");
		goto cleanup;
	}

	umad_buff = umad_alloc(1, umad_size() + MAD_SIZE);
	if (!umad_buff) {
		fprintf(stderr, "failed to allocate MAD buffer\n");
		goto cleanup;
	}

	mad = umad_get_mad(umad_buff);
	prepare_mcast_mad(method,params,(struct sa_mad_packet_t *)mad);

	if (set_pkey(umad_buff, params->ib_ctx, params->ib_port)) {
		fprintf(stderr, "failed to set pkey index\n");
		goto cleanup;
	}

	if (umad_set_addr(umad_buff,params->sm_lid,1,params->sm_sl,QP1_WELL_KNOWN_Q_KEY) < 0) {
		fprintf(stderr, "failed to set the destination address of the SMP\n");
		goto cleanup;
	}

	if (umad_send(portid,agentid,umad_buff,MAD_SIZE,100,5) < 0) {
		fprintf(stderr, "failed to send MAD\n");
		goto cleanup;
	}

	if (umad_recv(portid,umad_buff,&length,5000) < 0) {
		fprintf(stderr, "failed to receive MAD response\n");
		goto cleanup;
	}

	if (check_mad_status((struct sa_mad_packet_t*)mad)) {
		fprintf(stderr, "failed to get mlid from MAD\n");
		goto cleanup;
	}

	/*  "Join multicast group" message was sent */
	if (method == SUBN_ADM_METHOD_SET) {
		get_mlid_from_mad((struct sa_mad_packet_t*)mad,&params->mlid);
		params->mcast_state |= MCAST_IS_JOINED;
		sighandler_params = params;
		signal(SIGINT,signalCatcher);
	} else {
		params->mcast_state &= ~MCAST_IS_JOINED;
	}
	test_result = 0;

cleanup:
	if (umad_buff)
		umad_free(umad_buff);

	if (portid >= 0) {
		if (agentid >= 0) {
			if (umad_unregister(portid, agentid)) {
				fprintf(stderr, "failed to deregister UMAD agent for MADs\n");
				test_result = 1;
			}
		}

		if (umad_close_port(portid)) {
			fprintf(stderr, "failed to close UMAD portid\n");
			test_result = 1;
		}
	}

	return test_result;
}

/******************************************************************************
 * End
 ******************************************************************************/

The following walks through this C code section by section. It implements InfiniBand multicast group management:

1. Header Includes

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netdb.h>
#include <time.h>
#include <limits.h>
#include <unistd.h>
#if !defined(__FreeBSD__)
#include <malloc.h>
#endif
#include <getopt.h>
#include <errno.h>
#if defined(__FreeBSD__)
#include <infiniband/byteswap.h>
#else
#include <byteswap.h>
#endif
#include <signal.h>
#include <pthread.h>
#include "multicast_resources.h"
#include "perftest_communication.h"

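Explanation: The file includes headers for the standard C library, network programming, time handling, memory management, signal handling, and thread support. Conditional compilation handles the differences between FreeBSD and other operating systems.

2. Global Variable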

/* This is when we get sig handler from the user before we remove the join request. */
struct mcast_parameters *sighandler_params;

Explanation: Declares a global pointer through which the signal handler accesses the multicast parameters structure.

3. Signal Handler

static void signalCatcher (int sig)
{
    if (sig == SIGINT) {
        if (join_multicast_group(SUBN_ADM_METHOD_DELETE,sighandler_params))
            fprintf(stderr,"Couldn't Unregister the Mcast group on the SM\n");
        
        if (sighandler_params->is_2nd_mgid_used) {
            memcpy(sighandler_params->mgid.raw,sighandler_params->base_mgid.raw,16);
            if (join_multicast_group(SUBN_ADM_METHOD_DELETE,sighandler_params))
                fprintf(stderr,"Couldn't Unregister the Base Mcast group on the SM\n");
        }
    }
    exit(1);
}

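Explanation:

  • A static signal handler that catches SIGINT (Ctrl+C)
  • Calls join_multicast_group() with the DELETE method to leave the multicast group
  • If a second MGID (multicast group ID) is in use, also leaves the base multicast group
  • Exits the program with status 1; exit() is outside the SIGINT check, so it runs for any signal routed to this handler

4. Preparing the MAD Packet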

static void prepare_mcast_mad(uint8_t method,
        struct mcast_parameters *params,
        struct sa_mad_packet_t *samad_packet)
{
    uint8_t *ptr;
    uint64_t comp_mask;
    
    memset(samad_packet,0,sizeof(*samad_packet));
    
    /* prepare the MAD header. according to Table 145 in IB spec 1.2.1 */
    ptr = samad_packet->mad_header_buf;
    ptr[0]                     = 0x01;                    /* BaseVersion */
    ptr[1]                     = MANAGMENT_CLASS_SUBN_ADM;            /* MgmtClass */
    ptr[2]                     = 0x02;                     /* ClassVersion */
    ptr[3]                     = INSERTF(ptr[3], 0, method, 0, 7);         /* Method */
    (*(uint64_t *)(ptr + 8))   = ntoh_64((uint64_t)DEF_TRANS_ID);             /* TransactionID */
    (*(uint16_t *)(ptr + 16))  = htons(SUBN_ADM_ATTR_MC_MEMBER_RECORD);      /* AttributeID */
    
    ptr = samad_packet->SubnetAdminData;
    
    memcpy(&ptr[0],params->mgid.raw, 16);
    memcpy(&ptr[16],params->port_gid.raw, 16);
    
    (*(uint32_t *)(ptr + 32)) = htonl(DEF_QKEY);
    (*(uint16_t *)(ptr + 40)) = params->pkey;
    ptr[39]                    = DEF_TCLASS;
    ptr[44]                    = INSERTF(ptr[44], 4, DEF_SLL, 0, 4);
    ptr[44]                    = INSERTF(ptr[44], 0, DEF_FLOW_LABLE, 16, 4);
    ptr[45]                    = INSERTF(ptr[45], 0, DEF_FLOW_LABLE, 8, 8);
    ptr[46]                    = INSERTF(ptr[46], 0, DEF_FLOW_LABLE, 0, 8);
    ptr[48]                    = INSERTF(ptr[48], 0, MCMEMBER_JOINSTATE_FULL_MEMBER, 0, 4);
    
    comp_mask = SUBN_ADM_COMPMASK_MGID | SUBN_ADM_COMPMASK_PORT_GID | SUBN_ADM_COMPMASK_Q_KEY |
        SUBN_ADM_COMPMASK_P_KEY | SUBN_ADM_COMPMASK_TCLASS | SUBN_ADM_COMPMASK_SL |
        SUBN_ADM_COMPMASK_FLOW_LABEL | SUBN_ADM_COMPMASK_JOIN_STATE;
    
    samad_packet->ComponentMask = ntoh_64(comp_mask);
}

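Explanation:

  • Builds a Subnet Administration MAD (management datagram)
  • Fills the MAD header: base version, management class, class version, method, transaction ID, and attribute ID
  • Fills the SubnetAdminData payload: MGID, port GID, Q_Key, P_Key, traffic class, SL, flow label, and join state
  • Sets the component mask, which tells the SM which record fields in the request are valid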
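
The INSERTF calls in this function pack sub-byte fields into the record. The sketch below reproduces that packing for the SL and FlowLabel bytes with a plain function, assuming the argument order seen in the listing (destination, destination bit, source, source bit, length). `example_insert_bits` and `example_pack_sl_flow` are hypothetical names, not perftest's macro.

```c
#include <stdint.h>

/* Copy `len` bits of `src`, starting at bit `src_pos`, into `dst` at bit
 * `dst_pos`; all other bits of `dst` are preserved. Valid for len < 32. */
uint32_t example_insert_bits(uint32_t dst, unsigned dst_pos,
                             uint32_t src, unsigned src_pos, unsigned len)
{
    uint32_t mask = (1u << len) - 1u;
    uint32_t field = (src >> src_pos) & mask;

    return (dst & ~(mask << dst_pos)) | (field << dst_pos);
}

/* Pack a 4-bit SL and a 20-bit FlowLabel into three bytes, laid out the
 * way the listing fills bytes 44..46 of the record. */
void example_pack_sl_flow(uint8_t out[3], uint8_t sl, uint32_t flow_label)
{
    uint32_t b0 = 0;

    b0 = example_insert_bits(b0, 4, sl, 0, 4);           /* SL in the high nibble */
    b0 = example_insert_bits(b0, 0, flow_label, 16, 4);  /* FlowLabel bits 19..16 */
    out[0] = (uint8_t)b0;
    out[1] = (uint8_t)example_insert_bits(0, 0, flow_label, 8, 8); /* bits 15..8 */
    out[2] = (uint8_t)example_insert_bits(0, 0, flow_label, 0, 8); /* bits 7..0 */
}
```

With an SL of 0x5 and a flow label of 0xABCDE, the three output bytes are 0x5A, 0xBC and 0xDE: the SL and the top nibble of the flow label share the first byte, and the remaining 16 bits follow in big-endian order.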

5. Checking the MAD Status

static int check_mad_status(struct sa_mad_packet_t *samad_packet)
{
    uint8_t *ptr;
    uint32_t user_trans_id;
    uint16_t mad_header_status;
    
    ptr = samad_packet->mad_header_buf;
    
    /* the upper 32 bits of TransactionID were set by the kernel */
    user_trans_id = ntohl(*(uint32_t *)(ptr + 12));
    
    /* check the TransactionID to make sure this is the response */
    /* for the join/leave multicast group request we posted */
    if (user_trans_id != DEF_TRANS_ID) {
        fprintf(stderr, "received a mad with TransactionID 0x%x, when expecting 0x%x\n",
                (unsigned int)user_trans_id, (unsigned int)DEF_TRANS_ID);;
        return 1;
    }
    
    mad_header_status = 0x0;
    mad_header_status = INSERTF(mad_header_status, 8, ptr[4], 0, 7);
    mad_header_status = INSERTF(mad_header_status, 0, ptr[5], 0, 8);
    
    if (mad_header_status) {
        fprintf(stderr,"received UMAD with an error: 0x%x\n", mad_header_status);
        return 1;
    }
    
    return 0;
}

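Explanation:

  • Validates the MAD response that was received
  • Compares the low 32 bits of the TransactionID with DEF_TRANS_ID, so that a response to some other request is not mistaken for this one
  • Checks that the status field in the MAD header is zero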
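
The same two checks can be written without pointer casts by reading the header bytes explicitly. This is a sketch under the byte offsets used in the listing (status in bytes 4..5, low half of the TransactionID in bytes 12..15); the function and helper names are hypothetical, and the distinct return codes are an illustrative choice rather than the source's behaviour.

```c
#include <stdint.h>

/* Read a big-endian 16-bit value from a byte buffer. */
static uint16_t ex_get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t ex_get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 if the header belongs to our request and reports success,
 * 1 if the TransactionID does not match, 2 if the status is non-zero. */
int example_check_mad_header(const uint8_t *hdr, uint32_t expected_tid)
{
    /* Low 32 bits of the 64-bit TransactionID live in bytes 12..15. */
    if (ex_get_be32(hdr + 12) != expected_tid)
        return 1;

    /* Status is the 16-bit field in bytes 4..5; the listing keeps only
     * the low 7 bits of byte 4, so the top bit is masked off here too. */
    if ((ex_get_be16(hdr + 4) & 0x7fff) != 0)
        return 2;

    return 0;
}
```

Returning different codes for a mismatched transaction and a non-zero status lets the caller decide whether to keep waiting for the right response or to report the SM's error.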

6. Extracting the MLID from the MAD

static void get_mlid_from_mad(struct sa_mad_packet_t *samad_packet,uint16_t *mlid)
{
    uint8_t *ptr;
    ptr = samad_packet->SubnetAdminData;
    *mlid = ntohs(*(uint16_t *)(ptr + 36));
}

Explanation: Reads the multicast LID (MLID) from the MAD response for use in subsequent multicast communication.
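
A slightly more defensive version of this step could also confirm that the value read is a multicast LID. The sketch below assumes the record offset used in the listing (byte 36) and the InfiniBand multicast LID range 0xC000 to 0xFFFE; `example_get_mlid` and the `EX_` constants are hypothetical names.

```c
#include <stdint.h>

/* Multicast LIDs occupy 0xC000..0xFFFE in InfiniBand addressing. */
#define EX_MLID_MIN 0xC000u
#define EX_MLID_MAX 0xFFFEu

/* Read the MLID from the record (big-endian, byte offset 36 as in the
 * listing) and check that it lies in the multicast range.
 * Returns 0 and stores the MLID on success, -1 otherwise. */
int example_get_mlid(const uint8_t *record, uint16_t *mlid)
{
    uint16_t v = (uint16_t)((record[36] << 8) | record[37]);

    if (v < EX_MLID_MIN || v > EX_MLID_MAX)
        return -1;
    *mlid = v;
    return 0;
}
```

The range check catches a zeroed or truncated response before the value is handed to the attach step.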

7. Setting the Multicast GID

void set_multicast_gid(struct mcast_parameters *params,uint32_t qp_num,int is_client)
{
    uint8_t mcg_gid[16] = MCG_GID;
    const char *pstr = params->user_mgid;
    char *term = NULL;
    char tmp[20];
    int i;
    
    if (params->user_mgid) {
        term = strpbrk(pstr, ":");
        memcpy(tmp, pstr, term - pstr+1);
        tmp[term - pstr] = 0;
        
        mcg_gid[0] = (unsigned char)strtoll(tmp, NULL, 0);
        
        for (i = 1; i < 15; ++i) {
            pstr += term - pstr + 1;
            term = strpbrk(pstr, ":");
            memcpy(tmp, pstr, term - pstr+1);
            tmp[term - pstr] = 0;
            
            mcg_gid[i] = (unsigned char)strtoll(tmp, NULL, 0);
        }
        pstr += term - pstr + 1;
        
        strcpy(tmp, pstr);
        mcg_gid[15] = (unsigned char)strtoll(tmp, NULL, 0);
    }
    
    memcpy(params->mgid.raw,mcg_gid,16);
    if (is_client && params->user_mgid==NULL)
        params->mgid.raw[15]++;
}

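Explanation:

  • Sets the multicast group GID (global identifier)
  • If the user supplied an MGID, parses it as 16 colon-separated byte values; strtoll() is called with base 0, so hexadecimal values need a 0x prefix
  • If no MGID was supplied, uses the default MCG_GID, and on the client increments its last byte
  • The parser assumes well-formed input: a string with fewer than 15 colons makes strpbrk() return NULL, which the code does not check

8. Setting the P_Key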

static int set_pkey(void *umad_buff, struct ibv_context *ctx, int port_num)
{
    struct ibv_device_attr device_attr;
    int32_t partial_ix = -1;
    uint16_t pkey = 0xffff;
    uint16_t tmp_pkey;
    uint16_t pkey_tbl;
    uint16_t index;
    int ret;
    int i;
    
    ret = ibv_query_device(ctx, &device_attr);
    if (ret)
        return ret;
    
    pkey_tbl = device_attr.max_pkeys;
    for (i = 0; i < pkey_tbl; ++i) {
        ret = ibv_query_pkey(ctx, port_num, i, &tmp_pkey);
        if (ret)
            continue;
        
        tmp_pkey = ntohs(tmp_pkey);
        if ((pkey & 0x7fff) == (tmp_pkey & 0x7fff)) {
            /* if there is full-member pkey take it.*/
            if (tmp_pkey & 0x8000) {
                index = i;
                umad_set_pkey(umad_buff, index);
                return 0;
            }
            if (partial_ix < 0)
                partial_ix = i;
        }
    }
    
    /*no full-member, if exists take the limited*/
    if (partial_ix >= 0) {
        index = partial_ix;
        umad_set_pkey(umad_buff, index);
        return 0;
    }
    
    return 1;
}

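Explanation:

  • Sets the P_Key (partition key) index used for the MAD
  • Queries the device attributes to get the size of the P_Key table
  • Looks for entries that match the default P_Key (0xffff), ignoring the membership bit
  • Prefers a full-member P_Key; if none exists, falls back to a limited-member one

9. Main Function: Joining a Multicast Group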

int join_multicast_group(subn_adm_method method,struct mcast_parameters *params)
{
    int portid = -1;
    int agentid = -1;
    void *umad_buff = NULL;
    void *mad = NULL;
    int length = MAD_SIZE;
    int test_result = 1;
    
    /* mlid will be assigned to the new LID after the join */
    if (umad_init() < 0) {
        fprintf(stderr, "failed to init the UMAD library\n");
        goto cleanup;
    }
    /* use a cast to lose the "const char *" */
    portid = umad_open_port((char*)params->ib_devname,params->ib_port);
    if (portid < 0) {
        fprintf(stderr,"failed to open UMAD port %d\n",params->ib_port);
        goto cleanup;
    }
    
    agentid = umad_register(portid,MANAGMENT_CLASS_SUBN_ADM, 2, 0, 0);
    if (agentid < 0) {
        fprintf(stderr,"failed to register UMAD agent for MADs\n");
        goto cleanup;
    }
    
    umad_buff = umad_alloc(1, umad_size() + MAD_SIZE);
    if (!umad_buff) {
        fprintf(stderr, "failed to allocate MAD buffer\n");
        goto cleanup;
    }
    
    mad = umad_get_mad(umad_buff);
    prepare_mcast_mad(method,params,(struct sa_mad_packet_t *)mad);
    
    if (set_pkey(umad_buff, params->ib_ctx, params->ib_port)) {
        fprintf(stderr, "failed to set pkey index\n");
        goto cleanup;
    }
    
    if (umad_set_addr(umad_buff,params->sm_lid,1,params->sm_sl,QP1_WELL_KNOWN_Q_KEY) < 0) {
        fprintf(stderr, "failed to set the destination address of the SMP\n");
        goto cleanup;
    }
    
    if (umad_send(portid,agentid,umad_buff,MAD_SIZE,100,5) < 0) {
        fprintf(stderr, "failed to send MAD\n");
        goto cleanup;
    }
    
    if (umad_recv(portid,umad_buff,&length,5000) < 0) {
        fprintf(stderr, "failed to receive MAD response\n");
        goto cleanup;
    }
    
    if (check_mad_status((struct sa_mad_packet_t*)mad)) {
        fprintf(stderr, "failed to get mlid from MAD\n");
        goto cleanup;
    }
    
    /*  "Join multicast group" message was sent */
    if (method == SUBN_ADM_METHOD_SET) {
        get_mlid_from_mad((struct sa_mad_packet_t*)mad,&params->mlid);
        params->mcast_state |= MCAST_IS_JOINED;
        sighandler_params = params;
        signal(SIGINT,signalCatcher);
    } else {
        params->mcast_state &= ~MCAST_IS_JOINED;
    }
    test_result = 0;
    
cleanup:
    if (umad_buff)
        umad_free(umad_buff);
    
    if (portid >= 0) {
        if (agentid >= 0) {
            if (umad_unregister(portid, agentid)) {
                fprintf(stderr, "failed to deregister UMAD agent for MADs\n");
                test_result = 1;
            }
        }
        
        if (umad_close_port(portid)) {
            fprintf(stderr, "failed to close UMAD portid\n");
            test_result = 1;
        }
    }
    
    return test_result;
}

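Explanation:

  • The main entry point, used both to join and to leave a multicast group
  • Initializes the UMAD library, opens the port, and registers an agent
  • Builds the MAD request and sends it to the Subnet Manager (SM)
  • Receives the response and validates it
  • On a join, extracts the MLID, sets MCAST_IS_JOINED, and installs the SIGINT handler; on a leave, clears MCAST_IS_JOINED
  • Releases resources and returns the result (0 on success, 1 on failure)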
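
The control flow summarized above (a pessimistic default result, a single cleanup label, and a state flag updated only on success) can be shown in isolation. In the sketch below the SM exchange is replaced by a stub so the pattern runs without InfiniBand hardware; every `example_` name and the `EX_MCAST_IS_JOINED` value are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define EX_MCAST_IS_JOINED 0x1u   /* hypothetical flag value */

/* Stand-in for the SM exchange; `fail` lets a caller simulate an error. */
static int example_exchange(int fail)
{
    return fail ? -1 : 0;
}

/* Join (is_join != 0) or leave the group, updating *state only on success.
 * Every exit path goes through `cleanup`, so the buffer is always freed. */
int example_join_or_leave(int is_join, int simulate_failure, unsigned *state)
{
    int result = 1;                 /* pessimistic default, as in the listing */
    void *buf = malloc(256);

    if (!buf) {
        fprintf(stderr, "failed to allocate buffer\n");
        goto cleanup;
    }

    if (example_exchange(simulate_failure) < 0) {
        fprintf(stderr, "exchange with SM failed\n");
        goto cleanup;
    }

    if (is_join)
        *state |= EX_MCAST_IS_JOINED;
    else
        *state &= ~EX_MCAST_IS_JOINED;
    result = 0;

cleanup:
    free(buf);                      /* free(NULL) is a no-op */
    return result;
}
```

Because the flag changes only after the exchange succeeds, a failed leave keeps the membership bit set, and a later cleanup pass can retry it.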

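This code is a typical implementation of InfiniBand multicast group management: it uses Subnet Administration MADs to talk to the Subnet Manager and supports joining and leaving multicast groups dynamically.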
