Writing a GPU Operator with HIP: Vector Addition

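This post uses HIP (Heterogeneous-compute Interface for Portability) to write a GPU operator. HIP is AMD's GPU programming interface. It is similar to CUDA but can target both AMD and NVIDIA GPUs. Below is a complete example that writes a simple operator, vector addition (Vector Add), then compiles and runs it with HIP.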


Practice

1️⃣ Basic HIP operator example: vector addition

```cpp
#include <hip/hip_runtime.h>
#include <iostream>

#define N 1024  // vector length
#define THREADS_PER_BLOCK 256

// HIP kernel: vector addition
__global__ void vector_add(const float* A, const float* B, float* C, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        C[idx] = A[idx] + B[idx];
    }
}

int main() {
    // Host vectors
    float *h_A = new float[N];
    float *h_B = new float[N];
    float *h_C = new float[N];

    // Initialize the vectors
    for (int i = 0; i < N; i++) {
        h_A[i] = i * 1.0f;
        h_B[i] = (N - i) * 1.0f;
    }

    // Device vectors
    float *d_A, *d_B, *d_C;
    hipMalloc(&d_A, N * sizeof(float));
    hipMalloc(&d_B, N * sizeof(float));
    hipMalloc(&d_C, N * sizeof(float));

    // Copy the inputs to the device
    hipMemcpy(d_A, h_A, N * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(d_B, h_B, N * sizeof(float), hipMemcpyHostToDevice);

    // Compute the grid size: enough blocks to cover all N elements
    int blocks = (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;

    // Launch the kernel
    hipLaunchKernelGGL(vector_add, dim3(blocks), dim3(THREADS_PER_BLOCK), 0, 0, d_A, d_B, d_C, N);

    // Copy the result back to the host
    hipMemcpy(h_C, d_C, N * sizeof(float), hipMemcpyDeviceToHost);

    // Check the result
    for (int i = 0; i < 10; i++) {  // print the first 10 elements
        std::cout << h_A[i] << " + " << h_B[i] << " = " << h_C[i] << std::endl;
    }

    // Free memory
    delete[] h_A; delete[] h_B; delete[] h_C;
    hipFree(d_A); hipFree(d_B); hipFree(d_C);

    return 0;
}
```

2️⃣ Compile & run

```bash
# Compile
hipcc vector_add.cpp -o vector_add

# Run
./vector_add
```

The first 10 elements of the output should look like this:

```
0 + 1024 = 1024
1 + 1023 = 1024
2 + 1022 = 1024
...
```

3️⃣ Notes

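  1. The __global__ qualifier marks a function as a GPU kernel. Each thread computes its global index idx = blockIdx.x * blockDim.x + threadIdx.x, and the if (idx < n) check stops threads beyond the end of the array from reading or writing out of bounds.

  2. hipMalloc / hipFree correspond to CUDA's cudaMalloc / cudaFree.

  3. hipMemcpy copies data between host and device.

  4. hipLaunchKernelGGL launches the kernel. In the example, shared_mem_bytes and stream are both 0: no dynamic shared memory, and the default stream.

    ```cpp
    hipLaunchKernelGGL(kernel_name, dim3(blocks), dim3(threads_per_block), shared_mem_bytes, stream, args...)
    ```
  5. HIP operator code is almost identical to CUDA: for simple code like this, renaming the cudaXXX calls to hipXXX (which the hipify tools automate) is usually enough to run it on an AMD GPU.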
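The indexing and grid-size arithmetic used by the kernel can be checked on the CPU, without a GPU. Below is a minimal host-only sketch: `ceil_div` reproduces `(N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK`, and `simulate_launch` walks over every (block, thread) pair, computes the same global index as the kernel and applies the same `idx < n` check. Both function names are hypothetical and illustrative, not part of HIP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division: the smallest number of blocks such that
// blocks * block_size >= n. Same arithmetic as
// (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK in the HIP program.
int ceil_div(int n, int block_size) {
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Hypothetical CPU emulation of one launch of vector_add: every
// (block, thread) pair computes the kernel's global index, and only indices
// below n do any work. Returns how many times each element was written.
std::vector<int> simulate_launch(int n, int block_size) {
    const int blocks = ceil_div(n, block_size);  // also checks n >= 0
    std::vector<int> writes(static_cast<std::size_t>(n), 0);
    for (int block = 0; block < blocks; ++block) {             // blockIdx.x
        for (int thread = 0; thread < block_size; ++thread) {  // threadIdx.x
            int idx = block * block_size + thread;  // blockIdx.x * blockDim.x + threadIdx.x
            if (idx < n) {                          // same bounds check as the kernel
                ++writes[static_cast<std::size_t>(idx)];
            }
        }
    }
    return writes;
}
```

For example, with n = 1000 and 256 threads per block, `ceil_div` gives 4 blocks, i.e. 1024 threads. The last 24 threads fail the `idx < n` check and do nothing, and every element is written exactly once; without that check they would access memory past the end of the arrays.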


Errors when running it

```
/workspace/models/liuysh# hipcc vector_add.cpp -o vector_add
vector_add.cpp:29:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   29 |     hipMalloc(&d_A, N * sizeof(float));
      |     ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
vector_add.cpp:30:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   30 |     hipMalloc(&d_B, N * sizeof(float));
      |     ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
vector_add.cpp:31:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   31 |     hipMalloc(&d_C, N * sizeof(float));
      |     ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
vector_add.cpp:34:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   34 |     hipMemcpy(d_A, h_A, N * sizeof(float), hipMemcpyHostToDevice);
      |     ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vector_add.cpp:35:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   35 |     hipMemcpy(d_B, h_B, N * sizeof(float), hipMemcpyHostToDevice);
      |     ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vector_add.cpp:44:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   44 |     hipMemcpy(h_C, d_C, N * sizeof(float), hipMemcpyDeviceToHost);
      |     ^~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vector_add.cpp:53:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   53 |     hipFree(d_A); hipFree(d_B); hipFree(d_C);
      |     ^~~~~~~ ~~~
vector_add.cpp:53:19: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   53 |     hipFree(d_A); hipFree(d_B); hipFree(d_C);
      |                   ^~~~~~~ ~~~
vector_add.cpp:53:33: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
   53 |     hipFree(d_A); hipFree(d_B); hipFree(d_C);
      |                                 ^~~~~~~ ~~~
9 warnings generated when compiling for gfx906.
[... the same 9 warnings are repeated when compiling for gfx926, gfx928, gfx936, gfx938 and host ...]
root@f10r1n06:/workspace/models/liuysh# ./vector_add
0 + 1024 = 1024
1 + 1023 = 1024
2 + 1022 = 1024
3 + 1021 = 1024
4 + 1020 = 1024
5 + 1019 = 1024
6 + 1018 = 1024
7 + 1017 = 1024
8 + 1016 = 1024
9 + 1015 = 1024
Warning: Resource leak detected by SharedSignalPool, 68 Signals leaked.
```

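Judging from the compile and run logs, the "errors" you hit are really two separate things: compile-time warnings and a runtime resource-leak notice.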


1️⃣ Compile-time warnings

The log repeatedly shows:

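```
warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
```
  • It appears on the hipMalloc, hipMemcpy and hipFree lines.
  • Cause: recent HIP versions (from around HIP 5.x) declare these functions [[nodiscard]], meaning their return value should not be ignored, because it carries the error code (a hipError_t).
  • The code calls these functions without checking their return values, hence the warnings. hipcc compiles the file once per GPU target (gfx906, gfx926, ...) and once for the host, which is why the same 9 warnings repeat so many times.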

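Example of checking the return value:

```cpp
hipError_t err = hipMalloc(&d_A, N * sizeof(float));
if (err != hipSuccess) {
    std::cerr << "hipMalloc failed: " << hipGetErrorString(err) << std::endl;
    return 1;  // stop here rather than use an invalid device pointer
}
```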

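✅ Recommendation: these are only warnings. The build succeeds and the program still runs, but production code should check the return values.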
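One common way to avoid repeating the `if (err != hipSuccess)` pattern at every call site is a checking macro. The sketch below is an assumption, not a HIP API: `report_status` is a hypothetical helper written as a template so that it compiles without HIP headers, and the hypothetical `HIP_CHECK` macro only refers to `hipSuccess` and `hipGetErrorString` where it is expanded, so `<hip/hip_runtime.h>` must be included at the call site.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: returns true if `status` equals `success`; otherwise
// prints "file:line: expr failed: message" to stderr and returns false.
// It is a template so it compiles without HIP headers; with HIP, Status is
// hipError_t, success is hipSuccess and to_string is hipGetErrorString.
template <typename Status, typename ToString>
bool report_status(Status status, Status success, ToString to_string,
                   const char* expr, const char* file, int line) {
    assert(expr != nullptr && file != nullptr);
    if (status == success) {
        return true;
    }
    std::fprintf(stderr, "%s:%d: %s failed: %s\n", file, line, expr,
                 to_string(status));
    return false;
}

// Hypothetical macro: consumes the return value (so no nodiscard warning)
// and exits on failure. hipSuccess and hipGetErrorString are only looked up
// where the macro is used, which requires <hip/hip_runtime.h> there.
#define HIP_CHECK(expr)                                                  \
    do {                                                                 \
        if (!report_status((expr), hipSuccess, hipGetErrorString, #expr, \
                           __FILE__, __LINE__)) {                        \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)
```

In the vector-add program, `HIP_CHECK(hipMalloc(&d_A, N * sizeof(float)));` uses the return value, so the `nodiscard` warning goes away, and stops with the file, line and error string if the call fails. A kernel launch itself returns nothing; a common pattern is `HIP_CHECK(hipGetLastError());` right after the launch.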


2️⃣ Runtime notice

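```
Warning: Resource leak detected by SharedSignalPool, 68 Signals leaked.
```
  • This is the HIP / ROCm runtime reporting, after the kernel has run, that some GPU resources (here, signals) were not fully released.

  • Possible causes:

    1. No synchronization after the kernel launch (hipDeviceSynchronize()).
    2. Some GPU signals or streams leaked; in a small example like this one, it is possibly an issue in the runtime's own internal bookkeeping rather than in your code.
  • Fix:

    • Add a synchronization after the kernel launch so that all GPU work is complete:
```cpp
hipLaunchKernelGGL(vector_add, dim3(blocks), dim3(THREADS_PER_BLOCK), 0, 0, d_A, d_B, d_C, N);
hipError_t sync_err = hipDeviceSynchronize();  // wait for the kernel to finish
if (sync_err != hipSuccess) std::cerr << "kernel failed: " << hipGetErrorString(sync_err) << std::endl;
```
  • For freeing resources, all that is needed is for hipFree to be called, and the code already calls hipFree(d_A), hipFree(d_B) and hipFree(d_C); what remains are signals the runtime has not yet reclaimed internally. Note that hipMemcpy on the default stream already waits for the preceding kernel before copying, so hipDeviceSynchronize() mainly makes the end of GPU work explicit. It is worth trying, but it is not guaranteed to silence this warning.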

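✅ Summary

  1. Compile warnings: these are nodiscard warnings; checking the return values removes them.
  2. Runtime warning: the SharedSignalPool resource-leak notice; adding hipDeviceSynchronize() is worth trying but is not guaranteed to remove it.
  3. Program logic: the vector-addition results are correct (each printed sum is 1024), so the operator logic itself is fine.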
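The program only prints the first 10 sums, which is a spot check. Below is a minimal sketch of checking all N results on the host, assuming it is called after the device-to-host `hipMemcpy`. `count_mismatches` is a hypothetical helper, and its `1e-5f` tolerance is an assumed default.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: compares every GPU result c[i] with the CPU reference
// a[i] + b[i] and returns the number of mismatches (0 means all correct).
// The default tolerance is an assumption; for the integer-valued inputs in
// the example, float addition is exact, so a tolerance of 0 should also pass.
std::size_t count_mismatches(const float* a, const float* b, const float* c,
                             std::size_t n, float tol = 1e-5f) {
    assert(a != nullptr && b != nullptr && c != nullptr);
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const float expected = a[i] + b[i];
        if (std::fabs(c[i] - expected) > tol) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

In `main`, a line such as `std::cout << count_mismatches(h_A, h_B, h_C, N) << " mismatches" << std::endl;` placed before the memory is freed reports 0 when every element is correct.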