DPDK MP (Multi-Process) 通道深度解析

┌─────────────────────────────────────────────────────────────────────────────┐
│                         DPDK 多进程架构                                      │
└─────────────────────────────────────────────────────────────────────────────┘

                    ┌──────────────────────────────────┐
                    │         Primary Process          │
                    │           (主进程)                │
                    │                                  │
                    │  ┌────────────────────────────┐  │
                    │  │   MP Server Socket         │  │
                    │  │   /var/run/.dpdk/mp_socket │  │
                    │  └────────────────────────────┘  │
                    └──────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    │               │               │
                    ▼               ▼               ▼
            ┌───────────┐   ┌───────────┐   ┌───────────┐
            │ Secondary │   │ Secondary │   │ Secondary │
            │ Process 1 │   │ Process 2 │   │ Process N │
            │ (从进程)   │   │ (从进程)   │   │ (从进程)   │
            └───────────┘   └───────────┘   └───────────┘

1.2 核心特性

特性	说明
通信方式	Unix Domain Socket (SOCK_SEQPACKET)
消息格式	JSON 结构化消息
支持场景	配置同步、状态查询、资源管理
可靠性	可靠传输，支持请求 - 响应模式
扩展性	支持自定义消息处理器

2. 设计背景与动机

2.1 为什么需要 MP 通道？

在 DPDK 多进程场景下，主进程和从进程需要协同工作：

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                        需要进程间通信的场景                                   │
└─────────────────────────────────────────────────────────────────────────────┘

场景 1: 内存资源共享
─────────────────────────────────────────────────────────────────────────────

    ┌─────────────────────────────────────────────────────────────────────┐
    │                         大页内存 (Hugepages)                         │
    │                                                                      │
    │   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐             │
    │   │   Page 0     │  │   Page 1     │  │   Page N     │             │
    │   │  (2MB/1GB)   │  │  (2MB/1GB)   │  │  (2MB/1GB)   │             │
    │   └──────────────┘  └──────────────┘  └──────────────┘             │
    │          ▲                 ▲                 ▲                      │
    │          │                 │                 │                      │
    │    ┌─────┴─────┐     ┌─────┴─────┐     ┌─────┴─────┐               │
    │    │  Primary  │     │ Secondary │     │ Secondary │               │
    │    │  Process  │     │  Process 1│     │  Process N│               │
    │    └───────────┘     └───────────┘     └───────────┘               │
    │                                                                      │
    │   问题: 主进程分配的内存，从进程如何知道地址？                          │
    │   解决: MP 通道传递内存映射信息                                       │
    └─────────────────────────────────────────────────────────────────────┘

场景 2: 设备状态同步
─────────────────────────────────────────────────────────────────────────────

    ┌─────────────────────────────────────────────────────────────────────┐
    │                          网络设备 (NIC)                             │
    │                                                                      │
    │   ┌──────────────────────────────────────────────────────────────┐  │
    │   │                    Rx/Tx Queue 状态                           │  │
    │   │                                                                │  │
    │   │   Queue 0: Primary 使用                                       │  │
    │   │   Queue 1: Secondary 1 使用                                   │  │
    │   │   Queue 2: Secondary 2 使用                                   │  │
    │   │   ...                                                         │  │
    │   └──────────────────────────────────────────────────────────────┘  │
    │                                                                      │
    │   问题: 队列如何分配？从进程如何知道哪些队列可用？                      │
    │   解决: MP 通道传递设备配置信息                                       │
    └─────────────────────────────────────────────────────────────────────┘

场景 3: 热插拔事件通知
─────────────────────────────────────────────────────────────────────────────

    ┌─────────────────────────────────────────────────────────────────────┐
    │                                                                      │
    │   [新网卡插入]                                                       │
    │        │                                                             │
    │        ▼                                                             │
    │   Primary Process 检测到热插拔事件                                   │
    │        │                                                             │
    │        │ 通过 MP 通道广播                                            │
    │        ▼                                                             │
    │   ┌──────────────┬──────────────┬──────────────┐                    │
    │   │ Secondary 1  │ Secondary 2  │ Secondary N  │                    │
    │   │ 收到通知      │ 收到通知      │ 收到通知      │                    │
    │   └──────────────┴──────────────┴──────────────┘                    │
    │                                                                      │
    │   问题: 如何通知所有进程设备状态变化？                                 │
    │   解决: MP 通道广播消息                                              │
    └─────────────────────────────────────────────────────────────────────┘

2.2 设计目标

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                          MP 通道设计目标                                     │
└─────────────────────────────────────────────────────────────────────────────┘

1. 通用性 (Generality)
   ────────────────────────────────────────────────────────────────────────
   • 支持任意类型的消息传递
   • 不局限于特定业务场景
   • 用户可注册自定义消息处理器

2. 可靠性 (Reliability)
   ────────────────────────────────────────────────────────────────────────
   • 基于 SOCK_SEQPACKET，保证消息边界和顺序
   • 支持请求-响应模式，确认消息送达
   • 超时机制，防止无限等待

3. 高性能 (Performance)
   ────────────────────────────────────────────────────────────────────────
   • Unix Domain Socket，避免网络协议栈开销
   • 零拷贝设计（尽可能）
   • 异步处理，不阻塞数据面

4. 安全性 (Security)
   ──────────────────────────────────────────────────────────────────────
   • 文件系统权限控制访问
   • 支持 SO_PEERCRED 验证对端身份

3. 核心架构

3.1 整体架构图

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                         MP 通道整体架构                                      │
└─────────────────────────────────────────────────────────────────────────────┘

                                    Primary Process
                                    ┌──────────────────────────────────────┐
                                    │                                      │
    用户层                          │  ┌────────────────────────────────┐  │
    ─────────────────────────────   │  │    rte_mp_action 注册表        │  │
                                    │  │                                │  │
                                    │  │  ┌─────────┐  ┌─────────┐     │  │
                                    │  │  │ action1 │  │ action2 │ ... │  │
                                    │  │  │ handler │  │ handler │     │  │
                                    │  │  └─────────┘  └─────────┘     │  │
                                    │  └────────────────────────────────┘  │
                                    │                 ▲                    │
                                    │                 │ 查找处理函数        │
                                    │                 ▼                    │
                                    │  ┌────────────────────────────────┐  │
                                    │  │       MP Server 线程            │  │
                                    │  │                                │  │
                                    │  │  ┌────────────────────────┐    │  │
                                    │  │  │  epoll_wait()          │    │  │
                                    │  │  │  监听 Unix Socket      │    │  │
                                    │  │  └────────────────────────┘    │  │
                                    │  └────────────────────────────────┘  │
                                    │                 ▲                    │
                                    │                 │                    │
                                    │  ┌────────────────────────────────┐  │
                                    │  │      Unix Domain Socket        │  │
                                    │  │   /var/run/.dpdk/mp_socket     │  │
                                    │  └────────────────────────────────┘  │
                                    └──────────────────────────────────────┘
                                                        ▲
                                                        │
                                    ┌───────────────────┼───────────────────┐
                                    │                   │                   │
                                    ▼                   ▼                   ▼
                            Secondary 1          Secondary 2          Secondary N
                            ┌──────────┐          ┌──────────┐          ┌──────────┐
                            │ MP Client│          │ MP Client│          │ MP Client│
                            │          │          │          │          │          │
                            │ connect()│          │ connect()│          │ connect()│
                            └──────────┘          └──────────┘          └──────────┘

3.2 通信模型

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                         MP 通道通信模型                                      │
└─────────────────────────────────────────────────────────────────────────────┘

模式 1: 请求-响应 (Request-Response)
─────────────────────────────────────────────────────────────────────────────

    Secondary Process              Primary Process
    ┌──────────────────┐          ┌──────────────────┐
    │                  │          │                  │
    │  rte_mp_request()│──────────▶                  │
    │  (发送请求)       │   请求    │   处理请求       │
    │                  │          │   调用 handler   │
    │                  │◀──────────│                  │
    │                  │   响应    │  rte_mp_reply()  │
    │                  │          │                  │
    └──────────────────┘          └──────────────────┘
    
    特点:
    • 同步阻塞（可设置超时）
    • 保证响应匹配请求
    • 适用于配置查询、资源请求


模式 2: 单向消息 (One-way Message)
─────────────────────────────────────────────────────────────────────────────

    Secondary Process              Primary Process
    ┌──────────────────┐          ┌──────────────────┐
    │                  │          │                  │
    │  rte_mp_sendmsg()│──────────▶                  │
    │  (发送消息)       │   消息    │   处理消息       │
    │                  │          │   无需响应       │
    │                  │          │                  │
    │   (继续执行)      │          │                  │
    │                  │          │                  │
    └──────────────────┘          └──────────────────┘
    
    特点:
    • 异步非阻塞
    • 不等待响应
    • 适用于事件通知、日志上报


模式 3: 广播消息 (Broadcast)
─────────────────────────────────────────────────────────────────────────────

    Primary Process
    ┌──────────────────┐
    │                  │
    │ rte_mp_sendmsg() │
    │ (broadcast=true) │
    │                  │
    └──────────────────┘
            │
            │ 广播消息
            ▼
    ┌───────────────────────────────────────────────┐
    │                                               │
    ▼               ▼               ▼               ▼
┌────────┐    ┌────────┐    ┌────────┐    ┌────────┐
│ Sec 1  │    │ Sec 2  │    │ Sec 3  │    │ Sec N  │
│收到消息│    │收到消息│    │收到消息│    │收到消息│
└────────┘    └────────┘    └────────┘    └────────┘

    特点:
    • 一对多通信
    • 主进程可广播给所有从进程
    • 适用于全局事件通知（如热插拔）

3.3 目录结构

复制代码

DPDK 19.08 MP 通道相关源码:
─────────────────────────────────────────────────────────────────────────────

lib/librte_eal/
├── common/
│   ├── eal_common_mp.c              # MP 通道核心实现（平台无关）
│   ├── include/rte_eal.h            # EAL 公共头文件
│   └── include/rte_mp.h             # MP 消息结构定义
│
├── linux/
│   └── eal/
│       ├── eal_mp.c                 # Linux 特定的 MP 实现
│       └── eal.c                    # EAL 初始化（调用 mp_channel_init）
│
└── freebsd/
    └── eal/
        └── eal_mp.c                 # FreeBSD 特定的 MP 实现

4. 源码深度分析

4.1 初始化流程

cpp 复制代码

// 文件: lib/librte_eal/common/eal_common_mp.c

/*
 * MP 通道初始化函数
 * 在 rte_eal_init() 中被调用
 */
int
rte_mp_channel_init(void)
{
    char path[PATH_MAX];
    int fd, ret;
    
    // 1. 创建 socket 目录
    // 路径: /var/run/.dpdk/ (或 /tmp/.dpdk/)
    ret = snprintf(mp_dir_path, sizeof(mp_dir_path), "/var/run/.dpdk/%s",
                   internal_config.hugefile_prefix);
    mkdir(mp_dir_path, 0700);  // 权限: 仅所有者可读写执行
    
    // 2. 创建 Unix Domain Socket
    fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
    if (fd < 0) {
        RTE_LOG(ERR, EAL, "failed to create unix socket\n");
        return -1;
    }
    
    // 3. 绑定 socket 到文件路径
    snprintf(path, sizeof(path), "%s/mp_socket", mp_dir_path);
    
    struct sockaddr_un un;
    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    strncpy(un.sun_path, path, sizeof(un.sun_path) - 1);
    
    ret = bind(fd, (struct sockaddr *)&un, sizeof(un));
    if (ret < 0) {
        RTE_LOG(ERR, EAL, "failed to bind %s: %s\n", path, strerror(errno));
        close(fd);
        return -1;
    }
    
    // 4. 监听连接
    ret = listen(fd, LISTEN_BACKLOG);
    if (ret < 0) {
        RTE_LOG(ERR, EAL, "failed to listen: %s\n", strerror(errno));
        close(fd);
        unlink(path);
        return -1;
    }
    
    // 5. 保存 socket fd
    mp_fd = fd;
    
    // 6. 启动监听线程 (Primary Process)
    if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
        ret = pthread_create(&mp_thread_id, NULL, mp_thread, NULL);
        if (ret < 0) {
            RTE_LOG(ERR, EAL, "failed to create mp thread: %s\n",
                    strerror(errno));
            close(fd);
            unlink(path);
            return -1;
        }
    }
    
    RTE_LOG(INFO, EAL, "MP channel initialized successfully\n");
    return 0;
}

初始化流程图:

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                         MP 通道初始化流程                                    │
└─────────────────────────────────────────────────────────────────────────────┘

    rte_eal_init(argc, argv)
            │
            ▼
    rte_mp_channel_init()
            │
            ├───▶ 创建目录: /var/run/.dpdk/<prefix>/
            │
            ├───▶ 创建 Unix Domain Socket
            │          │
            │          │ socket(AF_UNIX, SOCK_SEQPACKET, 0)
            │          │
            │          ▼
            │          fd = socket()
            │
            ├───▶ 绑定 socket 到文件路径
            │          │
            │          │ bind(fd, "/var/run/.dpdk/<prefix>/mp_socket")
            │          │
            │          ▼
            │          bind()
            │
            ├───▶ 开始监听
            │          │
            │          │ listen(fd, backlog)
            │          │
            │          ▼
            │          listen()
            │
            └───▶ 判断进程类型
                    │
                    ├─── Primary Process
                    │        │
                    │        ▼
                    │    启动 MP 监听线程
                    │        │
                    │        ▼
                    │    pthread_create(mp_thread, ...)
                    │
                    └─── Secondary Process
                             │
                             ▼
                         仅保存 socket 路径
                         等待连接

4.2 主进程监听线程

cpp 复制代码

// 文件: lib/librte_eal/common/eal_common_mp.c

/*
 * MP 监听线程主函数 (Primary Process 专用)
 * 使用 epoll 监听连接请求和消息
 */
static void *
mp_thread(void *arg __rte_unused)
{
    struct epoll_event events[MAX_EVENTS];
    struct epoll_event ev;
    int epoll_fd;
    
    // 1. 创建 epoll 实例
    epoll_fd = epoll_create1(0);  // ◀── 这里使用系统的 epoll！
    if (epoll_fd < 0) {
        RTE_LOG(ERR, EAL, "Failed to create epoll: %s\n", strerror(errno));
        return NULL;
    }
    
    // 2. 将监听 socket 加入 epoll
    ev.events = EPOLLIN;
    ev.data.fd = mp_fd;
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, mp_fd, &ev) < 0) {  // ◀── 系统调用
        RTE_LOG(ERR, EAL, "Failed to add mp_fd to epoll: %s\n", strerror(errno));
        close(epoll_fd);
        return NULL;
    }
    
    // 3. 事件循环
    while (!mp_exit_flag) {
        // 等待事件
        int n = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);  // ◀── 系统调用
        if (n < 0) {
            if (errno == EINTR)
                continue;
            RTE_LOG(ERR, EAL, "epoll_wait failed: %s\n", strerror(errno));
            break;
        }
        
        // 处理所有就绪的事件
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == mp_fd) {
                // 新连接请求
                handle_new_connection(epoll_fd);
            } else {
                // 已有连接的数据
                handle_client_message(events[i].data.fd);
            }
        }
    }
    
    close(epoll_fd);
    return NULL;
}

/*
 * 处理新连接
 */
static void
handle_new_connection(int epoll_fd)
{
    struct sockaddr_un client_addr;
    socklen_t addr_len = sizeof(client_addr);
    struct epoll_event ev;
    
    // 接受连接
    int client_fd = accept(mp_fd, (struct sockaddr *)&client_addr, &addr_len);
    if (client_fd < 0) {
        RTE_LOG(ERR, EAL, "Failed to accept: %s\n", strerror(errno));
        return;
    }
    
    // 将新连接加入 epoll
    ev.events = EPOLLIN;
    ev.data.fd = client_fd;
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client_fd, &ev) < 0) {  // ◀── 系统调用
        RTE_LOG(ERR, EAL, "Failed to add client to epoll: %s\n", strerror(errno));
        close(client_fd);
        return;
    }
    
    RTE_LOG(DEBUG, EAL, "New client connected, fd=%d\n", client_fd);
}

/*
 * 处理客户端消息
 */
static void
handle_client_message(int client_fd)
{
    struct rte_mp_msg msg;
    struct rte_mp_reply reply;
    int ret;
    
    // 接收消息
    ret = recv(client_fd, &msg, sizeof(msg), 0);
    if (ret <= 0) {
        // 连接关闭或错误
        close(client_fd);
        return;
    }
    
    // 查找并调用消息处理函数
    struct action_entry *entry = find_action(msg.name);
    if (entry && entry->action) {
        // 调用注册的处理函数
        ret = entry->action(&msg, &reply);
        
        // 发送响应
        if (ret == 0) {
            send(client_fd, &reply, sizeof(reply), 0);
        }
    } else {
        RTE_LOG(ERR, EAL, "No handler for message: %s\n", msg.name);
    }
}

监听线程流程图:

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                      MP 监听线程工作流程                                     │
└─────────────────────────────────────────────────────────────────────────────┘

    mp_thread()
        │
        ▼
    创建 epoll 实例
    epoll_fd = epoll_create1(0)
        │
        ▼
    添加监听 socket 到 epoll
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, mp_fd, ...)
        │
        ▼
    ┌─────────────────────────────────────┐
    │         事件循环 (Event Loop)        │
    │                                     │
    │     epoll_wait(epoll_fd, events)    │
    │             │                       │
    │             ▼                       │
    │     ┌───────┴───────┐               │
    │     │               │               │
    │     ▼               ▼               │
    │ 新连接事件       数据就绪事件        │
    │ (mp_fd)         (client_fd)         │
    │     │               │               │
    │     ▼               ▼               │
    │ accept()        recv()              │
    │     │               │               │
    │     ▼               ▼               │
    │ 添加到 epoll    查找 handler        │
    │ epoll_ctl()     调用 action()       │
    │                     │               │
    │                     ▼               │
    │                 send() 响应          │
    │                                     │
    └─────────────────────────────────────┘
        │
        ▼
    线程退出

4.3 从进程连接流程

cpp 复制代码

// 文件: lib/librte_eal/common/eal_common_mp.c

/*
 * 从进程连接到主进程
 */
int
rte_mp_channel_connect(void)
{
    struct sockaddr_un un;
    int fd, ret;
    
    // 1. 创建 socket
    fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
    if (fd < 0) {
        RTE_LOG(ERR, EAL, "Failed to create socket: %s\n", strerror(errno));
        return -1;
    }
    
    // 2. 设置服务端地址
    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    snprintf(un.sun_path, sizeof(un.sun_path), "%s/mp_socket", mp_dir_path);
    
    // 3. 连接到主进程
    ret = connect(fd, (struct sockaddr *)&un, sizeof(un));
    if (ret < 0) {
        RTE_LOG(ERR, EAL, "Failed to connect to primary: %s\n", strerror(errno));
        close(fd);
        return -1;
    }
    
    // 4. 保存连接 fd
    mp_fd = fd;
    
    RTE_LOG(INFO, EAL, "Connected to primary process\n");
    return 0;
}

5. 通信协议详解

5.1 消息格式

cpp 复制代码

// 文件: lib/librte_eal/common/include/rte_mp.h

/*
 * MP 消息结构
 */
#define RTE_MP_MAX_NAME_LEN    64    // 消息名称最大长度
#define RTE_MP_MAX_PARAM_LEN   256   // 参数最大长度
#define RTE_MP_MAX_FD_NUM      8     // 最大文件描述符数量

struct rte_mp_msg {
    char name[RTE_MP_MAX_NAME_LEN];     // 消息名称/类型
    int len_param;                       // 参数长度
    uint8_t param[RTE_MP_MAX_PARAM_LEN]; // 参数数据
    int num_fds;                         // 文件描述符数量
    int fds[RTE_MP_MAX_FD_NUM];          // 文件描述符数组
};

/*
 * MP 响应结构
 */
struct rte_mp_reply {
    int num_received;              // 收到的响应数量
    struct rte_mp_msg *msgs;       // 响应消息数组
};

/*
 * 异步请求结构
 */
struct async_request_param {
    rte_mp_async_reply_t clb;      // 异步回调函数
    void *user_data;               // 用户数据
    int n_responses;               // 期望响应数量
    int timeout;                   // 超时时间
};

消息结构图解:

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                         MP 消息结构                                         │
└─────────────────────────────────────────────────────────────────────────────┘

    struct rte_mp_msg (总共约 336 字节)
    ┌─────────────────────────────────────────────────────────────────────┐
    │                                                                     │
    │   name[64]           消息名称/类型标识                              │
    │   ┌─────────────────────────────────────────────────────────────┐  │
    │   │ 'e', 't', 'h', 'd', 'e', 'v', '_', 'l', 'i', 'n', 'k', ... │  │
    │   └─────────────────────────────────────────────────────────────┘  │
    │                                                                     │
    │   len_param          参数长度                                       │
    │   ┌─────────────────────────────────────────────────────────────┐  │
    │   │  32 (bytes)                                                  │  │
    │   └─────────────────────────────────────────────────────────────┘  │
    │                                                                     │
    │   param[256]         参数数据                                       │
    │   ┌─────────────────────────────────────────────────────────────┐  │
    │   │  { "port_id": 0, "speed": 10000, "duplex": "full", ... }    │  │
    │   │  (可以是 JSON 字符串或二进制数据)                              │  │
    │   └─────────────────────────────────────────────────────────────┘  │
    │                                                                     │
    │   num_fds            文件描述符数量                                 │
    │   ┌─────────────────────────────────────────────────────────────┐  │
    │   │  2                                                           │  │
    │   └─────────────────────────────────────────────────────────────┘  │
    │                                                                     │
    │   fds[8]             文件描述符数组                                 │
    │   ┌─────────────────────────────────────────────────────────────┐  │
    │   │  fds[0] = 5  (共享内存 fd)                                   │  │
    │   │  fds[1] = 6  (event fd)                                      │  │
    │   │  fds[2..7] = -1 (未使用)                                     │  │
    │   └─────────────────────────────────────────────────────────────┘  │
    │                                                                     │
    └─────────────────────────────────────────────────────────────────────┘

5.2 消息类型

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                         内置消息类型                                        │
└─────────────────────────────────────────────────────────────────────────────┘

1. ethdev 相关消息
   ────────────────────────────────────────────────────────────────────────
   
   名称                    功能                    参数
   ─────────────────────────────────────────────────────────────────────
   "ethdev_link_update"   链路状态更新            port_id, link_status
   "ethdev_rx_queue"      Rx 队列配置             port_id, queue_id
   "ethdev_tx_queue"      Tx 队列配置             port_id, queue_id


2. 内存相关消息
   ────────────────────────────────────────────────────────────────────────
   
   名称                    功能                    参数
   ─────────────────────────────────────────────────────────────────────
   "memseg_sync"          内存段同步              memseg_info
   "malloc_sync"          堆内存同步              malloc_info


3. 设备热插拔消息
   ────────────────────────────────────────────────────────────────────────
   
   名称                    功能                    参数
   ─────────────────────────────────────────────────────────────────────
   "device_hotplug"       设备热插拔通知          dev_name, action
   "device_remove"        设备移除请求            dev_name


4. 自定义消息
   ────────────────────────────────────────────────────────────────────────
   
   用户可注册自己的消息类型和处理器：
   
   rte_mp_action_register("my_action", my_handler);

6. 完整通信流程

6.1 请求 - 响应完整流程

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                    请求-响应完整流程示例                                     │
└─────────────────────────────────────────────────────────────────────────────┘

    场景: Secondary Process 查询网卡链路状态

    Secondary Process                              Primary Process
    ┌──────────────────────────┐                 ┌──────────────────────────┐
    │                          │                 │                          │
    │  1. 构造请求消息          │                 │                          │
    │  ┌────────────────────┐  │                 │                          │
    │  │ name = "ethdev_    │  │                 │                          │
    │  │        link_get"   │  │                 │                          │
    │  │ param = {          │  │                 │                          │
    │  │   "port_id": 0     │  │                 │                          │
    │  │ }                  │  │                 │                          │
    │  └────────────────────┘  │                 │                          │
    │                          │                 │                          │
    │  2. 发送请求              │                 │                          │
    │  rte_mp_request()        │─────────────────▶                          │
    │                          │                 │                          │
    │                          │                 │  3. epoll_wait() 返回    │
    │                          │                 │                          │
    │                          │                 │  4. recv() 接收消息      │
    │                          │                 │  ┌────────────────────┐  │
    │                          │                 │  │ msg.name = "ethdev_│  │
    │                          │                 │  │ _link_get"         │  │
    │                          │                 │  │ msg.param = {      │  │
    │                          │                 │  │   "port_id": 0     │  │
    │                          │                 │  │ }                  │  │
    │                          │                 │  └────────────────────┘  │
    │                          │                 │                          │
    │                          │                 │  5. 查找处理函数         │
    │                          │                 │  find_action(msg.name)   │
    │                          │                 │         │                │
    │                          │                 │         ▼                │
    │                          │                 │  ┌────────────────────┐  │
    │                          │                 │  │ action_handler()   │  │
    │                          │                 │  │ {                  │  │
    │                          │                 │  │   查询网卡状态      │  │
    │                          │                 │  │   构造响应          │  │
    │                          │                 │  │   return 0;        │  │
    │                          │                 │  │ }                  │  │
    │                          │                 │  └────────────────────┘  │
    │                          │                 │                          │
    │                          │                 │  6. 发送响应             │
    │                          │◀─────────────────│  send(reply)            │
    │                          │                 │                          │
    │  7. 接收响应              │                 │                          │
    │  ┌────────────────────┐  │                 │                          │
    │  │ reply.msgs[0] = {  │  │                 │                          │
    │  │   "status": "up",  │  │                 │                          │
    │  │   "speed": 10000,  │  │                 │                          │
    │  │   "duplex": "full" │  │                 │                          │
    │  │ }                  │  │                 │                          │
    │  └────────────────────┘  │                 │                          │
    │                          │                 │                          │
    │  8. 处理结果              │                 │                          │
    │  printf("Link is up\n"); │                 │                          │
    │                          │                 │                          │
    └──────────────────────────┘                 └──────────────────────────┘

6.2 API 使用示例

cpp 复制代码

/*
 * 示例 1: 同步请求-响应
 */
int query_link_status(uint16_t port_id)
{
    struct rte_mp_msg request;
    struct rte_mp_reply reply;
    int ret;
    
    // 构造请求消息
    memset(&request, 0, sizeof(request));
    strncpy(request.name, "ethdev_link_get", sizeof(request.name) - 1);
    
    // 设置参数 (使用 JSON 格式)
    snprintf((char *)request.param, sizeof(request.param),
             "{\"port_id\":%u}", port_id);
    request.len_param = strlen((char *)request.param) + 1;
    
    // 发送请求并等待响应
    ret = rte_mp_request(&request, &reply, 5000);  // 5秒超时
    if (ret < 0) {
        printf("Failed to send request: %s\n", strerror(errno));
        return -1;
    }
    
    // 解析响应
    if (reply.num_received > 0) {
        printf("Response: %s\n", reply.msgs[0].param);
    }
    
    // 释放资源
    free(reply.msgs);
    return 0;
}

/*
 * 示例 2: 注册自定义消息处理器
 */
static int
my_custom_handler(const struct rte_mp_msg *request,
                  struct rte_mp_reply *reply)
{
    printf("Received request: %s\n", request->name);
    printf("Param: %s\n", request->param);
    
    // 构造响应
    memset(reply, 0, sizeof(*reply));
    strncpy(reply->msgs[0].name, "my_custom_reply", sizeof(reply->msgs[0].name) - 1);
    snprintf((char *)reply->msgs[0].param, sizeof(reply->msgs[0].param),
             "{\"status\":\"ok\",\"code\":0}");
    
    return 0;
}

// 注册处理器
void register_my_handler(void)
{
    rte_mp_action_register("my_custom_action", my_custom_handler);
}

/*
 * 示例 3: 异步请求
 */
static void
async_reply_callback(const struct rte_mp_msg *request,
                     const struct rte_mp_reply *reply,
                     void *user_data)
{
    printf("Async response received!\n");
    printf("User data: %s\n", (char *)user_data);
    
    if (reply->num_received > 0) {
        printf("Response: %s\n", reply->msgs[0].param);
    }
}

int send_async_request(void)
{
    struct rte_mp_msg request;
    char *user_data = "My async request";
    
    memset(&request, 0, sizeof(request));
    strncpy(request.name, "my_custom_action", sizeof(request.name) - 1);
    
    // 发送异步请求
    return rte_mp_request_async(&request, 5000, async_reply_callback, user_data);
}

7. 关键数据结构

7.1 Action 注册表

cpp 复制代码

// 文件: lib/librte_eal/common/eal_common_mp.c

/*
 * 消息处理函数类型
 */
typedef int (*rte_mp_t)(const struct rte_mp_msg *request,
                        struct rte_mp_reply *reply);

/*
 * Action 注册表项
 */
struct action_entry {
    TAILQ_ENTRY(action_entry) next;   // 链表节点
    char name[RTE_MP_MAX_NAME_LEN];    // 消息名称
    rte_mp_t action;                   // 处理函数
};

/*
 * 全局 Action 注册表
 */
static TAILQ_HEAD(action_entry_list, action_entry) action_entry_list =
    TAILQ_HEAD_INITIALIZER(action_entry_list);

static pthread_mutex_t action_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * 注册消息处理函数
 */
int
rte_mp_action_register(const char *name, rte_mp_t action)
{
    struct action_entry *entry;
    
    if (name == NULL || action == NULL) {
        RTE_LOG(ERR, EAL, "Invalid parameters\n");
        return -1;
    }
    
    // 分配新条目
    entry = malloc(sizeof(*entry));
    if (entry == NULL) {
        RTE_LOG(ERR, EAL, "Failed to allocate memory\n");
        return -1;
    }
    
    // 初始化
    strncpy(entry->name, name, sizeof(entry->name) - 1);
    entry->action = action;
    
    // 加锁插入链表
    pthread_mutex_lock(&action_mutex);
    TAILQ_INSERT_TAIL(&action_entry_list, entry, next);
    pthread_mutex_unlock(&action_mutex);
    
    RTE_LOG(DEBUG, EAL, "Registered action: %s\n", name);
    return 0;
}

/*
 * 查找消息处理函数
 */
static struct action_entry *
find_action(const char *name)
{
    struct action_entry *entry;
    
    pthread_mutex_lock(&action_mutex);
    TAILQ_FOREACH(entry, &action_entry_list, next) {
        if (strcmp(entry->name, name) == 0) {
            pthread_mutex_unlock(&action_mutex);
            return entry;
        }
    }
    pthread_mutex_unlock(&action_mutex);
    
    return NULL;
}

注册表结构图:

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                         Action 注册表结构                                    │
└─────────────────────────────────────────────────────────────────────────────┘

    action_entry_list (全局链表)
    │
    ▼
┌────────────────────┐     ┌────────────────────┐     ┌────────────────────┐
│ action_entry       │     │ action_entry       │     │ action_entry       │
│ ┌────────────────┐ │     │ ┌────────────────┐ │     │ ┌────────────────┐ │
│ │ name:          │ │     │ │ name:          │ │     │ │ name:          │ │
│ │ "ethdev_link"  │ │     │ │ "memseg_sync"  │ │     │ │ "my_action"    │ │
│ ├────────────────┤ │     │ ├────────────────┤ │     │ ├────────────────┤ │
│ │ action:        │ │     │ │ action:        │ │     │ │ action:        │ │
│ │ ethdev_handler │ │     │ │ memseg_handler │ │     │ │ my_handler     │ │
│ └────────────────┘ │     │ └────────────────┘ │     │ └────────────────┘ │
│         │          │     │         │          │     │         │          │
└─────────┼──────────┘     └─────────┼──────────┘     └─────────┼──────────┘
          │                          │                          │
          └──────────────────────────┴──────────────────────────┘
                                TAILQ 链表连接

7.2 同步请求状态

cpp 复制代码

// 文件: lib/librte_eal/common/eal_common_mp.c

/*
 * 同步请求的等待状态
 */
struct sync_request {
    TAILQ_ENTRY(sync_request) next;    // 链表节点
    
    pthread_cond_t cond;               // 条件变量
    pthread_mutex_t mutex;             // 互斥锁
    
    int reply_received;                // 响应是否收到
    struct rte_mp_msg reply;           // 响应消息
    
    int timeout;                       // 超时时间
    struct timespec end_time;          // 截止时间
};

/*
 * 全局同步请求列表
 */
static TAILQ_HEAD(sync_request_list, sync_request) sync_requests =
    TAILQ_HEAD_INITIALIZER(sync_requests);

static pthread_mutex_t sync_requests_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * 发送同步请求
 */
int
rte_mp_request(struct rte_mp_msg *request,
               struct rte_mp_reply *reply,
               int timeout)
{
    struct sync_request sr;
    int ret;
    
    // 初始化同步请求
    memset(&sr, 0, sizeof(sr));
    pthread_cond_init(&sr.cond, NULL);
    pthread_mutex_init(&sr.mutex, NULL);
    sr.reply_received = 0;
    sr.timeout = timeout;
    
    // 计算截止时间
    clock_gettime(CLOCK_MONOTONIC, &sr.end_time);
    sr.end_time.tv_sec += timeout / 1000;
    sr.end_time.tv_nsec += (timeout % 1000) * 1000000;
    
    // 加入全局列表
    pthread_mutex_lock(&sync_requests_mutex);
    TAILQ_INSERT_TAIL(&sync_requests, &sr, next);
    pthread_mutex_unlock(&sync_requests_mutex);
    
    // 发送请求
    ret = send(mp_fd, request, sizeof(*request), 0);
    if (ret < 0) {
        // 发送失败，清理并返回
        pthread_mutex_lock(&sync_requests_mutex);
        TAILQ_REMOVE(&sync_requests, &sr, next);
        pthread_mutex_unlock(&sync_requests_mutex);
        return -1;
    }
    
    // 等待响应
    pthread_mutex_lock(&sr.mutex);
    while (!sr.reply_received) {
        ret = pthread_cond_timedwait(&sr.cond, &sr.mutex, &sr.end_time);
        if (ret == ETIMEDOUT) {
            // 超时
            pthread_mutex_unlock(&sr.mutex);
            pthread_mutex_lock(&sync_requests_mutex);
            TAILQ_REMOVE(&sync_requests, &sr, next);
            pthread_mutex_unlock(&sync_requests_mutex);
            return -1;
        }
    }
    pthread_mutex_unlock(&sr.mutex);
    
    // 复制响应
    memcpy(reply->msgs, &sr.reply, sizeof(sr.reply));
    reply->num_received = 1;
    
    // 清理
    pthread_mutex_lock(&sync_requests_mutex);
    TAILQ_REMOVE(&sync_requests, &sr, next);
    pthread_mutex_unlock(&sync_requests_mutex);
    
    pthread_cond_destroy(&sr.cond);
    pthread_mutex_destroy(&sr.mutex);
    
    return 0;
}

9. 最佳实践与注意事项

9.1 使用建议

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                         MP 通道最佳实践                                      │
└─────────────────────────────────────────────────────────────────────────────┘

1. 消息设计原则
   ────────────────────────────────────────────────────────────────────────
   
   • 使用 JSON 格式传递结构化数据 (可读性好，易于扩展)
   • 消息名称要有意义，便于调试
   • 参数长度控制在合理范围内 (< 256 bytes)
   • 文件描述符传递后要及时关闭原 fd


2. 错误处理
   ────────────────────────────────────────────────────────────────────────
   
   • 总是检查 rte_mp_request() 返回值
   • 设置合理的超时时间
   • 处理 EINTR 等中断情况
   • 记录详细日志


3. 性能优化
   ────────────────────────────────────────────────────────────────────────
   
   • 避免频繁的小消息
   • 使用异步请求减少阻塞
   • 批量处理多个请求


4. 调试技巧
   ────────────────────────────────────────────────────────────────────────
   
   • 设置环境变量 RTE_LOG_LEVEL=debug 查看详细日志
   • 使用 strace 跟踪 socket 通信
   • 检查 socket 文件: ls -la /var/run/.dpdk/

9.2 常见问题

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                         常见问题与解决方案                                   │
└─────────────────────────────────────────────────────────────────────────────┘

Q1: MP 通道初始化失败
─────────────────────────────────────────────────────────────────────────────

    错误信息:
    EAL: failed to create unix socket
    EAL: FATAL: failed to init mp channel
    
    可能原因:
    1. 目录权限不足 (/var/run/.dpdk/)
    2. socket 文件已存在 (上次未正确清理)
    3. 符号冲突 (你的 epoll_create 覆盖了系统的)
    
    解决方案:
    $ rm -rf /var/run/.dpdk/
    $ mkdir -p /var/run/.dpdk/
    $ chmod 700 /var/run/.dpdk/


Q2: 请求超时
─────────────────────────────────────────────────────────────────────────────

    错误信息:
    Failed to send request: Timer expired
    
    可能原因:
    1. Primary 进程未运行
    2. Primary 进程繁忙，无法及时响应
    3. 网络延迟过高
    
    解决方案:
    1. 确认 Primary 进程状态
    2. 增加超时时间
    3. 使用异步请求


Q3: 进程间通信失败
─────────────────────────────────────────────────────────────────────────────

    错误信息:
    Failed to connect to primary process
    
    可能原因:
    1. Primary 进程未启动 MP 通道
    2. 使用了不同的 --file-prefix
    3. 权限问题
    
    解决方案:
    1. 确保 Primary 先启动
    2. 检查 --file-prefix 参数是否一致
    3. 检查 socket 文件权限

10. 总结

复制代码

┌─────────────────────────────────────────────────────────────────────────────┐
│                              总结                                            │
└─────────────────────────────────────────────────────────────────────────────┘

1. MP 通道的本质
   ────────────────────────────────────────────────────────────────────────
   DPDK MP 通道是基于 Unix Domain Socket 的进程间通信框架，使用 epoll
   实现事件驱动的消息处理机制。


2. 核心组件
   ────────────────────────────────────────────────────────────────────────
   • Unix Domain Socket: 通信载体
   • epoll: 事件监听机制
   • Action 注册表: 消息分发机制
   • 同步/异步请求: 两种通信模式


3. 与用户态协议栈的关系
   ────────────────────────────────────────────────────────────────────────
   当你在用户态协议栈中实现自定义 epoll 时，必须使用不同的函数名
   (如 nty_epoll_xxx)，否则会覆盖系统调用，导致 DPDK 内部的
   epoll 操作失败。


4. 最佳实践
   ────────────────────────────────────────────────────────────────────────
   • 使用命名前缀避免符号冲突
   • 合理设置超时时间
   • 使用 JSON 格式传递参数
   • 记录详细日志便于调试

附录：关键源码文件路径

复制代码

DPDK 19.08 MP 通道相关源码:
─────────────────────────────────────────────────────────────────────────────

lib/librte_eal/
├── common/
│   ├── eal_common_mp.c              # MP 通道核心实现
│   ├── eal_common_proc.c            # 进程管理
│   ├── include/
│   │   ├── rte_eal.h               # EAL 公共 API
│   │   └── rte_mp.h                # MP 消息结构
│   └── eal_private.h               # EAL 内部头文件
│
├── linux/
│   └── eal/
│       ├── eal.c                   # EAL 初始化
│       ├── eal_interrupts.c        # 中断处理 (使用 epoll)
│       └── eal_mp.c                # Linux 特定 MP 实现
│
└── freebsd/
    └── eal/
        └── eal_mp.c                # FreeBSD 特定 MP 实现

0voice · GitHub

DPDK MP (Multi-Process) 通道深度解析

目录

1. 概述

1.1 什么是 MP 通道？

1.2 核心特性

2. 设计背景与动机

2.1 为什么需要 MP 通道？

2.2 设计目标

3. 核心架构

3.1 整体架构图

3.2 通信模型

3.3 目录结构

4. 源码深度分析

4.1 初始化流程

4.2 主进程监听线程

4.3 从进程连接流程

5. 通信协议详解

5.1 消息格式

5.2 消息类型

6. 完整通信流程

6.1 请求 - 响应完整流程

6.2 API 使用示例

7. 关键数据结构

7.1 Action 注册表

7.2 同步请求状态

9. 最佳实践与注意事项

9.1 使用建议

9.2 常见问题

10. 总结

附录：关键源码文件路径