Compiling and installing modules under ComfyUI's python_embeded

For some modules in ComfyUI, there is no prebuilt whl file to install directly; pip instead pulls the module's source package, compiles it into a whl, and then installs it. Examples include insightface, flash-attn, dlib, and xformers.

Because I hit a problem where a package I needed had no usable whl file, I made another attempt at building from source, and this time successfully compiled and installed insightface.

(Set up the environment following points 1, 2, and 3 first.)

1. First, download and install Visual Studio Build Tools 2022

I later found that a newer Build Tools version is not always better. After upgrading the Build Tools to 17.12, compiling flash_attn under Python 3.11 failed. The article "CUDA compatibility with Visual Studio 2022 version 17.10" explains that the build tools version must also match the CUDA version; since I am on CUDA 12.4, the safer choice per that article is to keep the Build Tools at 17.10.

So I uninstalled it and installed that specific version using the "Fixed version bootstrappers" download:

Following the article "Windows: how to install only MSVC without Visual Studio" (apart from the setting below, none of the other settings seemed to make any difference either way), set the environment variable (open Settings ---> System ---> About ---> Advanced system settings ---> Environment Variables):

The path below must point at cl.exe, so that the warning "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py:382: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified." goes away:

path=C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\bin\Hostx64\x64;%path%
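To verify that the PATH entry took effect, you can resolve cl the same way torch's cpp_extension ends up doing (a minimal sketch; cl is MSVC's compiler executable):

```python
import shutil

def compiler_on_path(exe: str = "cl"):
    """Return the resolved path of a compiler found on PATH, or None.

    A None result for "cl" is exactly the condition behind torch's
    "Error checking compiler version for cl: [WinError 2]" warning.
    """
    return shutil.which(exe)

# Prints the full path to cl.exe once the PATH entry above is in place.
print(compiler_on_path("cl"))
```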

2. Build error: cannot open include file: "Python.h"

The error message: cannot open include file: "Python.h":

 "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -Iinsightface/thirdparty/face3d/mesh/cython -IH:\V.0.2.7\python_embeded\Lib\site-packages\numpy\core\include -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" /EHsc /Tpinsightface/thirdparty/face3d/mesh/cython/mesh_core_cython.cpp /Fobuild\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython/mesh_core_cython.obj
      mesh_core_cython.cpp
      insightface/thirdparty/face3d/mesh/cython/mesh_core_cython.cpp(36): fatal error C1083: 无法打开包括文件: "Python.h": No such file or directory
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.41.34120\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for insightface
Failed to build insightface
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (insightface)

I searched many articles suggesting things like installing a python-dev package; none of them were reliable (none solved my problem). In the end the direct fix was to download the full Python installer (the same version as ComfyUI's python_embeded; the 3.12 install takes about 345 MB), install it to some path (during setup, do not check the option that sets environment variables, to keep them from interfering with ComfyUI's embedded Python), and copy its include folder to H:\V.0.2.7\python_embeded\include.

That resolves the "cannot open include file: Python.h" error.

3. Cannot open file "python312.lib"

Likewise, copying the libs folder from the full install's path into H:\V.0.2.7\python_embeded\libs resolves this one:

      H:\V.0.2.7\python_embeded\Lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
      "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\bin\HostX86\x64\link.exe" /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:H:\V.0.2.7\python_embeded\libs /LIBPATH:H:\V.0.2.7\python_embeded /LIBPATH:H:\V.0.2.7\python_embeded\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64" /EXPORT:PyInit_mesh_core_cython build\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython/mesh_core.obj build\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython/mesh_core_cython.obj /OUT:build\lib.win-amd64-cpython-312\insightface\thirdparty\face3d\mesh\cython\mesh_core_cython.cp312-win_amd64.pyd /IMPLIB:build\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython\mesh_core_cython.cp312-win_amd64.lib
      LINK : fatal error LNK1104: 无法打开文件"python312.lib"
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.41.34120\\bin\\HostX86\\x64\\link.exe' failed with exit code 1104
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for insightface
Failed to build insightface
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (insightface)

H:\V.0.2.7>
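Points 2 and 3 boil down to copying two folders from a full CPython install of the same version into python_embeded. A minimal sketch of that step (the example paths in the comment are placeholders; substitute your own install locations):

```python
import shutil
from pathlib import Path

def copy_dev_dirs(full_install, embeded, dirs=("include", "libs")):
    """Copy the folders a full CPython install ships with but the
    embeddable build lacks: "include" supplies Python.h, and "libs"
    supplies the import library such as python312.lib."""
    for d in dirs:
        src = Path(full_install) / d
        if not src.is_dir():
            raise FileNotFoundError(src)
        shutil.copytree(src, Path(embeded) / d, dirs_exist_ok=True)

# Example with placeholder paths:
# copy_dev_dirs(r"C:\Python312", r"H:\V.0.2.7\python_embeded")
```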

4. ninja.exe

From the README at https://github.com/ninja-build/ninja, it is clear we only need the built exe:

Download the latest applicable Windows release from https://github.com/ninja-build/ninja/releases, unzip it to get ninja.exe, and copy it into any folder that is already on PATH.

Once that is in place, the warning warnings.warn(msg.format('we could not find ninja.')) disappears.

On my system, ninja.exe is present in both of these paths:

C:\Users\Monday\AppData\Local\Microsoft\WinGet\Packages\Ninja-build.Ninja_Microsoft.Winget.Source_8wekyb3d8bbwe

C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja

The exe in the first path is smaller (just 557 KB) and is the latest version, 1.12.1; the one in the second path is larger (2.42 MB) and is version 1.11.0.
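With two copies of ninja.exe present, it is worth checking which one PATH actually resolves to, and what version it is (a small sketch; the same helper works for cl):

```python
import shutil
import subprocess

def tool_version(exe: str = "ninja"):
    """Return (resolved path, version string) for a build tool on PATH,
    or None when it cannot be found -- the situation behind torch's
    "we could not find ninja." warning."""
    path = shutil.which(exe)
    if path is None:
        return None
    out = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, out.stdout.strip()

# e.g. ('C:\\...\\ninja.exe', '1.12.1') when the WinGet copy wins on PATH
print(tool_version("ninja"))
```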

5. insightface compiled and installed successfully:

It runs without any problems, too.

6. flash-attn compiled and installed successfully

Articles online had warned that compiling on Windows takes a very long time. I started the build, and after a long wait (about 3 hours or so) it failed with the following error:

     tmpxft_000038e4_00000000-7_flash_fwd_hdim64_bf16_sm80.compute_90.cudafe1.cpp
      "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc" -c csrc/flash_attn/src/flash_fwd_hdim64_fp16_causal_sm80.cu -o build\temp.win-amd64-cpython-312\Release\csrc/flash_attn/src/flash_fwd_hdim64_fp16_causal_sm80.obj -IC:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\flash_attn -IC:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\flash_attn\src -IC:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\cutlass\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\torch\csrc\api\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\TH -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 
-Xcompiler /wd4819 -Xcompiler /MD -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17 --use-local-env
      flash_fwd_hdim64_fp16_causal_sm80.cu
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF_OPERATORS__"(用"/U__CUDA_NO_HALF_OPERATORS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF_CONVERSIONS__"(用"/U__CUDA_NO_HALF_CONVERSIONS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF2_OPERATORS__"(用"/U__CUDA_NO_HALF2_OPERATORS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_BFLOAT16_CONVERSIONS__"(用"/U__CUDA_NO_BFLOAT16_CONVERSIONS__")
      flash_fwd_hdim64_fp16_causal_sm80.cu
      c1xx: fatal error C1083: 无法打开源文件: "C:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\flash_attn\src\flash_fwd_hdim64_fp16_causal_sm80.cu": No such file or directory

I then found the article "Compiling and installing flash-attn on Windows 10 with CUDA 12.1 + torch 2.4.1 + VS 2022", which mentions installing the cutlass library, a required dependency for building flash-attn.

So I ran the commands below to install nvidia-cutlass first and then flash_attn:

python_embeded\python.exe -m pip install nvidia-cutlass
python_embeded\python.exe -m pip install flash_attn

After another, even longer wait (4 hours 45 minutes), it finally compiled and installed. I never found a prebuilt whl for flash-attn-2.7.0.post2 online, but this time I built one myself.

While waiting, afraid it might fail again, I kept searching for the cause of the first failure. The article "nvcc fatal : Could not open output file '/tmp/tmpxft_00003d04_00000000'" blames file read/write permissions, but since this build succeeded, it seems my earlier failure really was the missing nvidia-cutlass.

After reinstalling the build environment (Visual Studio Build Tools 2022, CUDA Toolkit), I compiled again under Python 3.11; it took 2 hours. (This time CPU usage sat at 100%; the first time it did not seem that high, though I don't remember exactly.)
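The build time and CPU load are driven by how many parallel compiler jobs ninja spawns. As the build log itself notes, this is "overridable by setting the environment variable MAX_JOBS=N", so on a RAM-limited machine you can cap it before starting (the value 4 here is just an example):

```
set MAX_JOBS=4
python_embeded\python.exe -m pip install flash_attn
```

A lower MAX_JOBS trades a longer build for less memory pressure; a higher one does the opposite.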

7. Compiling the development build xformers-0.0.29.dev940 failed

It was the following runtime error from xformers that pushed me into building modules myself in the first place, hoping xformers-0.0.29.dev940 would fix it.

Loading PuLID-Flux model.
!!! Exception during processing !!! No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 577, 16, 64) (torch.bfloat16)
     key         : shape=(1, 577, 16, 64) (torch.bfloat16)
     value       : shape=(1, 577, 16, 64) (torch.bfloat16)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`fa2F@v2.6.3-24-gbdf733b` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    bf16 is only supported on A100+ GPUs
`cutlassF-pt` is not supported because:
    bf16 is only supported on A100+ GPUs
Traceback (most recent call last):
  File "H:\V.0.2.7\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "H:\V.0.2.7\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\pulidflux.py", line 382, in apply_pulid_flux
    id_cond_vit, id_vit_hidden = eva_clip(face_features_image, return_all_features=False, return_hidden=True, shuffle=False)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", line 544, in forward
    x, hidden_states = self.forward_features(x, return_all_features, return_hidden, shuffle)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", line 531, in forward_features
    x = blk(x, rel_pos_bias=rel_pos_bias)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", line 293, in forward
    x = x + self.drop_path(self.attn(self.norm1(x), rel_pos_bias=rel_pos_bias, attn_mask=attn_mask))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", line 208, in forward
    x = xops.memory_efficient_attention(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\__init__.py", line 306, in memory_efficient_attention
    return _memory_efficient_attention(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\__init__.py", line 467, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\__init__.py", line 486, in _memory_efficient_attention_forward
    op = _dispatch_fw(inp, False)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\dispatch.py", line 135, in _dispatch_fw
    return _run_priority_list(
           ^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\dispatch.py", line 76, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 577, 16, 64) (torch.bfloat16)
     key         : shape=(1, 577, 16, 64) (torch.bfloat16)
     value       : shape=(1, 577, 16, 64) (torch.bfloat16)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`fa2F@v2.6.3-24-gbdf733b` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    bf16 is only supported on A100+ GPUs
`cutlassF-pt` is not supported because:
    bf16 is only supported on A100+ GPUs

Prompt executed in 222.85 seconds
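Before rebuilding, note what the dispatch failure above actually says: every candidate kernel is rejected because bf16 needs CUDA compute capability (8, 0) or higher (Ampere and newer), while this GPU reports (7, 5). That is a hardware limit, so a newer xformers build alone may not make it go away. The rule, reduced to a one-line check:

```python
def supports_bf16(capability):
    """xformers' fa2F / cutlassF bf16 paths need CUDA compute capability
    >= (8, 0), per the "bf16 is only supported on A100+ GPUs" lines above."""
    return tuple(capability) >= (8, 0)

print(supports_bf16((7, 5)))  # the GPU in the traceback -> False
print(supports_bf16((8, 0)))  # A100-class -> True
```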

I went ahead with compiling the development build xformers-0.0.29.dev940 to see whether it would resolve this runtime error.

The build failed; the main error output follows:

  Building wheel for xformers (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [5830 lines of output]
      fatal: not a git repository (or any of the parent directories): .git
      running bdist_wheel

      H:\V.0.2.7\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py:497: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      running build
      ... N lines omitted ...

      H:\V.0.2.7\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py:382: UserWarning: Error checking compiler version for cl: [WinError 2] 系统找不到指定的文件。
        warnings.warn(f'Error checking compiler version for {compiler}: {error}')

  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
      [1/85] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\cutlass\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\torch\csrc\api\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\TH -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" 
"-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -c C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.cu -o C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -Xcompiler /Zc:lambda -Xcompiler /Zc:preprocessor -Xcompiler /Zc:__cplusplus -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_90,code=sm_90 -DFLASHATTENTION_DISABLE_ALIBI --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C_flashattention -D_GLIBCXX_USE_CXX11_ABI=0
      FAILED: C:/Users/Monday/AppData/Local/Temp/pip-install-h5wrwrf7/xformers_385ac3dddc8a4e779d876f9cbb34ec19/build/temp.win-amd64-cpython-312/Release/Users/Monday/AppData/Local/Temp/pip-install-h5wrwrf7/xformers_385ac3dddc8a4e779d876f9cbb34ec19/third_party/flash-attention/csrc/flash_attn/src/flash_bwd_hdim192_bf16_causal_sm80.obj
      C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\cutlass\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\torch\csrc\api\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\TH -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" 
"-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -c C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.cu -o C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -Xcompiler /Zc:lambda -Xcompiler /Zc:preprocessor -Xcompiler /Zc:__cplusplus -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_90,code=sm_90 -DFLASHATTENTION_DISABLE_ALIBI --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C_flashattention -D_GLIBCXX_USE_CXX11_ABI=0
      flash_bwd_hdim192_bf16_causal_sm80.cu
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF_OPERATORS__"(用"/U__CUDA_NO_HALF_OPERATORS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF_CONVERSIONS__"(用"/U__CUDA_NO_HALF_CONVERSIONS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF2_OPERATORS__"(用"/U__CUDA_NO_HALF2_OPERATORS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_BFLOAT16_CONVERSIONS__"(用"/U__CUDA_NO_BFLOAT16_CONVERSIONS__")
      ... many more lines omitted here ...
      flash_bwd_hdim192_bf16_causal_sm80.cu
      fatal   : Could not open output file C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj.d

... many more "fatal   : Could not open output file" errors omitted here ...

I ran the D9025 warning (cl overriding "/D__CUDA_NO_HALF_OPERATORS__" with "/U__CUDA_NO_HALF_OPERATORS__") through Metaso AI search (秘塔AI搜索). Its summary:

  1. Environment variable setup problem

    • On Windows, incorrectly set environment variables can make CUDA builds fail. In particular, the vcvars64.bat script is what sets up Visual Studio's environment variables.
    • Solution: run vcvars64.bat to set the correct environment variables before building. Make sure to run it in the same command prompt, with the correct path:
      Run C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build>vcvars64.bat

After setting the environment variables this way, there were far fewer errors.

But after additionally setting set DISTUTILS_USE_SDK=1 as the error output suggested, there were just as many errors as before.
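For reference, PyTorch's C++ extension documentation describes the usual sequence for building CUDA extensions from a plain command prompt: load the MSVC environment with vcvars64.bat first, then set DISTUTILS_USE_SDK=1, then run the build in that same window (a sketch; the Build Tools path matches the install used in this post, and the final pip command is illustrative):

```
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
set DISTUTILS_USE_SDK=1
python_embeded\python.exe -m pip install <package-to-build>
```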

ERNIE Bot's answer, on the other hand, looked more plausible: the /D and /U flags conflict, so the command produces no result and no output file is generated.

I'm setting this one aside for now.

8. The ModuleNotFoundError: No module named 'distutils.msvccompiler' problem --> APEX compiled and installed successfully

The error message: ModuleNotFoundError: No module named 'distutils.msvccompiler'

H:\V.0.2.7>python_embeded\python.exe -m pip install apex
Collecting apex
  Using cached apex-0.9.10dev.tar.gz (36 kB)
  Preparing metadata (setup.py) ... done
Collecting cryptacular (from apex)
  Using cached cryptacular-1.6.2.tar.gz (75 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 2
  ╰─> [4 lines of output]
      scons: Reading SConscript files ...
      ModuleNotFoundError: No module named 'distutils.msvccompiler':
        File "C:\Users\Monday\AppData\Local\Temp\pip-install-whj_jyaw\cryptacular_4162e6ad50164a3baf1cd0472e6f84c1\SConstruct", line 21:
          import distutils.msvccompiler
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

H:\V.0.2.7>

I went through several articles without solving ModuleNotFoundError: No module named 'distutils.msvccompiler'. The official documentation (https://docs.python.org/3.10/library/distutils.html) notes that distutils is deprecated, with removal planned for Python 3.12.

The answer that actually worked came from GitHub: https://github.com/NVIDIA/apex/issues/1852. It built successfully; it turns out not every module can be installed with a simple `pip install <module name>` command.

Run the following commands to build and install:
H:\V.0.2.7>git clone https://github.com/NVIDIA/apex.git
H:\V.0.2.7>cd apex
H:\V.0.2.7\apex>..\python_embeded\python.exe -m pip install -v --no-cache-dir .

H:\V.0.2.7>git clone https://github.com/NVIDIA/apex.git
Cloning into 'apex'...
remote: Enumerating objects: 11902, done.
remote: Counting objects: 100% (3970/3970), done.
remote: Compressing objects: 100% (759/759), done.
remote: Total 11902 (delta 3492), reused 3413 (delta 3205), pack-reused 7932 (from 1)
Receiving objects: 100% (11902/11902), 15.61 MiB | 4.25 MiB/s, done.
Resolving deltas: 100% (8321/8321), done.
Updating files: 100% (505/505), done.

H:\V.0.2.7>cd apex

H:\V.0.2.7\apex>..\python_embeded\python.exe -m pip install -v --no-cache-dir .
Using pip 24.3.1 from H:\V.0.2.7\python_embeded\Lib\site-packages\pip (python 3.12)
Processing h:\v.0.2.7\apex
  Running command pip subprocess to install build dependencies
  Using pip 24.3.1 from H:\V.0.2.7\python_embeded\Lib\site-packages\pip (python 3.12)
  Collecting setuptools
    Obtaining dependency information for setuptools from https://files.pythonhosted.org/packages/55/21/47d163f615df1d30c094f6c8bbb353619274edccf0327b185cc2493c2c33/setuptools-75.6.0-py3-none-any.whl.metadata
    Using cached setuptools-75.6.0-py3-none-any.whl.metadata (6.7 kB)
  Collecting wheel
    Obtaining dependency information for wheel from https://files.pythonhosted.org/packages/0b/2c/87f3254fd8ffd29e4c02732eee68a83a1d3c346ae39bc6822dcbcb697f2b/wheel-0.45.1-py3-none-any.whl.metadata
    Using cached wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)
  Using cached setuptools-75.6.0-py3-none-any.whl (1.2 MB)
  Using cached wheel-0.45.1-py3-none-any.whl (72 kB)
  Installing collected packages: wheel, setuptools
    Creating C:\Users\Monday\AppData\Local\Temp\pip-build-env-da54lkae\overlay\Scripts
  Successfully installed setuptools-75.6.0 wheel-0.45.1
  Installing build dependencies ... done
  Running command Getting requirements to build wheel


  torch.__version__  = 2.5.1+cu124


  running egg_info
  creating apex.egg-info
  writing apex.egg-info\PKG-INFO

...... N lines omitted ......
  removing build\bdist.win-amd64\wheel
  Building wheel for apex (pyproject.toml) ... done
  Created wheel for apex: filename=apex-0.1-py3-none-any.whl size=406607 sha256=206aca315212aa0a76b14de395b6afe1ecdcd4c5fdd61b57986dabb509e83121
  Stored in directory: C:\Users\Monday\AppData\Local\Temp\pip-ephem-wheel-cache-zwa4z7gq\wheels\65\c7\12\b7e49ba4abd3da74df298dc51ea0f6a086d496566f4310f620
Successfully built apex
Installing collected packages: apex
Successfully installed apex-0.1

H:\V.0.2.7\apex>

To also build apex's optional C++ and CUDA extensions, the repository documents this variant of the install command:

..\python_embeded\python.exe -m pip install -v --disable-pip-version-check --no-cache-dir  --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" .

Next time I hit a problem, reading the official documentation first would save a lot of detours.

9. Do not rename the wheel file produced by the build

For example, the build produces: insightface-0.7.3-cp312-cp312-win_amd64.whl

If you rename the file to: insightface-0.7.3.whl

installation fails with: ERROR: insightface-0.7.3.whl is not a valid wheel filename.
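The reason is that pip parses the filename itself: PEP 427 defines it as {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, and the tags tell pip which interpreter and platform the wheel fits, so renaming the file destroys information pip needs. A rough sketch of the naming rule (a simplified check, not pip's actual parser):

```python
import re

# PEP 427: {distribution}-{version}(-{build tag})?-{python}-{abi}-{platform}.whl
WHEEL_RE = re.compile(
    r"^(?P<dist>[^-]+)-(?P<ver>[^-]+)(-(?P<build>\d[^-]*))?"
    r"-(?P<py>[^-]+)-(?P<abi>[^-]+)-(?P<plat>[^-]+)\.whl$"
)

def is_valid_wheel_name(name: str) -> bool:
    """Return True when the filename has the tag fields pip expects."""
    return WHEEL_RE.match(name) is not None

print(is_valid_wheel_name("insightface-0.7.3-cp312-cp312-win_amd64.whl"))  # True
print(is_valid_wheel_name("insightface-0.7.3.whl"))                        # False
```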
