Compiling and debugging GPU CUDA programs in VS Code with Nsight

The steps below assume that your environment can already build and run a simple CUDA program.

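1. Install Nsight

Install NVIDIA's Nsight Visual Studio Code Edition extension from the VS Code Extensions view. It provides the cuda-gdb debugger type used in launch.json.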

2. Create .vscode/launch.json

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA C++: Launch",
            "type": "cuda-gdb",
            "request": "launch",
            "program": "${fileDirname}/main",
            "preLaunchTask": "mynvcc",
            "args": ["1024"]  // example: pass the vector size as an argument
        }
    ]
}
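
The "args" entry in this configuration reaches the debugged program as argv[1], which main.cu converts to the vector size. As a minimal sketch of validating that string before using it, the helper below could stand in for a bare atoi call. The name parseSize and the -1 error convention are assumptions made for illustration; they are not part of the original program.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not in the original program): parses a positive vector
// size from a command-line string. Returns -1 if the text is not a valid
// positive integer that fits in an int.
int parseSize(const char* text) {
  if (text == nullptr || *text == '\0') return -1;  // missing or empty argument
  errno = 0;
  char* end = nullptr;
  long value = std::strtol(text, &end, 10);
  if (errno == ERANGE) return -1;                   // does not fit in a long
  if (*end != '\0') return -1;                      // trailing non-numeric characters
  if (value <= 0 || value > INT_MAX) return -1;     // zero, negative, or too large
  return static_cast<int>(value);
}
```

With such a helper, main could print a usage message and exit when parseSize(argv[1]) returns -1, instead of launching a kernel with a meaningless size.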

3. Create .vscode/tasks.json

{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "mynvcc",
            "type": "shell",
            "command": "nvcc",
            "args": [
                "-g",
                "-G",
                "-o",
                "${fileDirname}/main",
                "${file}",
                "-I", "/usr/local/cuda/include",
                "-L", "/usr/local/cuda/lib64",
                "-l", "cudart",
                "-D_MWAITXINTRIN_H_INCLUDED"
            ],
            "group": {
                "kind": "build",
                "isDefault": true
            },
            "problemMatcher": ["$gcc"]
        }
    ]
}

4. Create main.cu

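Note: name the file main.cu so that it matches the main executable referenced in the JSON files above. Strictly speaking, the build task compiles whichever file is active in the editor (${file}) into ${fileDirname}/main, so keep main.cu as the active tab when you press F5.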

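#include <cuda_runtime.h>  // CUDA runtime API: cudaMalloc, cudaMemcpy, cudaFree
#include <cmath>           // fabs, ceil
#include <cstdlib>         // atoi
#include <iostream>
#include <vector>
using namespace std;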

// Add A and B vector on the GPU. Results stored into C
__global__
void addKernel(int n, float* A, float* B, float* C)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if(i < n) C[i] = A[i] + B[i];
}

// Add A and B vector. Results stored into C
int add(int n, float* h_A, float* h_B, float* h_C)
{
  int size = n*sizeof(float);

  // Allocate memory on device and copy data
  float* d_A;
  cudaMalloc((void**)&d_A, size);
  cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);

  float* d_B;
  cudaMalloc((void**)&d_B, size);
  cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

  float* d_C;
  cudaMalloc((void**)&d_C, size);

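  // Launch the kernel with 256 threads per block, rounding the block count up
  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  cout << "Running " << threads << " threads on " << blocks << " blocks -> " << blocks*threads << endl;
  addKernel<<<blocks, threads>>>(n, d_A, d_B, d_C);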

  // Transfer results back to host
  cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

  // Free device memory
  cudaFree(d_A);
  cudaFree(d_B);
  cudaFree(d_C);

  return 0;
}

// C = A + B on a GPU, where A is a vector of 1.0f and B a vector of 2.0f
// The main function takes one argument, the size of the vectors
int main(int argc, char* argv[])
{
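  // Without this check, a missing argument would dereference a null argv[1]
  if (argc < 2) {
    cerr << "Usage: " << argv[0] << " <vector size>" << endl;
    return 1;
  }
  int n = atoi(argv[1]);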

  vector<float> h_A(n, 1.0f);
  vector<float> h_B(n, 2.0f);
  vector<float> h_C(n);

  add(n, h_A.data(), h_B.data(), h_C.data());

  for(auto& c : h_C) {
    if(fabs(c-3.0f) > 0.00001f) {
      cout << "Error!" << endl;
      return 1;
    }
  }

  cout << "The program completed successfully" << endl;

  return 0;
}
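
The launch configuration in add rounds the number of blocks up, so the grid usually contains more threads than there are vector elements, which is why addKernel guards its write with `if(i < n)`. The host-only sketch below mimics that index arithmetic on the CPU to make the effect visible. The names blocksFor, Coverage, and simulateLaunch are hypothetical and introduced only for illustration; nothing here runs on the GPU.

```cpp
// Hypothetical helper (not in the original program): number of blocks needed
// so that blocks * threadsPerBlock >= n, using integer ceiling division.
int blocksFor(int n, int threadsPerBlock) {
  return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Counts of threads that pass or fail the `i < n` guard.
struct Coverage {
  int active;
  int idle;
};

// Host-side stand-in for the kernel launch: walks every (block, thread) pair
// the launch would create and applies the same index formula as addKernel.
Coverage simulateLaunch(int n, int threadsPerBlock) {
  Coverage c{0, 0};
  int blocks = blocksFor(n, threadsPerBlock);
  for (int b = 0; b < blocks; ++b) {
    for (int t = 0; t < threadsPerBlock; ++t) {
      int i = b * threadsPerBlock + t;  // blockIdx.x*blockDim.x + threadIdx.x
      if (i < n) {
        ++c.active;  // this thread would write C[i]
      } else {
        ++c.idle;    // this thread is past the end and must do nothing
      }
    }
  }
  return c;
}
```

For n = 1000 and 256 threads per block, blocksFor returns 4, and simulateLaunch reports 1000 active threads and 24 idle ones. Without the guard, those 24 threads would read and write past the end of the device buffers. For the 1024 passed in launch.json, the grid matches the vector exactly and no thread is idle.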

5. Compile main.cu

nvcc -g -G -o main main.cu

Here -g generates debug information for host code, and -G generates it for device code (and turns off device optimizations), which cuda-gdb needs in order to stop at breakpoints inside kernels.

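6. Start debugging

Open main.cu, set a breakpoint, and press F5 to start debugging.

After you press F5, VS Code may show a warning. Choose the option to continue anyway; as long as the debugger starts, the warning can be ignored for now.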
