Compiling and installing modules under ComfyUI's python_embeded

For some modules in ComfyUI, there is no prebuilt whl file to install directly; pip instead pulls the module's source package, compiles it into a whl, and then installs it. Examples include insightface, flash-attn, dlib, and xformers.

Because I hit a problem where a package I needed had no usable whl file, I made another attempt at building from source, and this time successfully compiled and installed insightface.

(Set up the environment following points 1, 2, and 3 first.)

1. First, download and install Visual Studio Build Tools 2022

I later found that a newer Build Tools version is not always better. After upgrading the Build Tools to 17.12, compiling flash_attn under Python 3.11 failed. The article "CUDA compatibility with Visual Studio 2022 version 17.10" explains that the build tools version must also match the CUDA version; since I am on CUDA 12.4, the safer choice per that article is to keep the Build Tools at 17.10.

So I uninstalled it and installed that specific version using the "Fixed version bootstrappers" download:

Following the article "Windows: how to install only MSVC without Visual Studio" (apart from the setting below, none of the other settings seemed to make any difference either way), set the environment variable (open Settings ---> System ---> About ---> Advanced system settings ---> Environment Variables):

The path below must point at cl.exe, so that the warning "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py:382: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified." goes away:

path=C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\bin\Hostx64\x64;%path%
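To verify that the PATH entry took effect, you can resolve cl the same way torch's cpp_extension ends up doing (a minimal sketch; cl is MSVC's compiler executable):

```python
import shutil

def compiler_on_path(exe: str = "cl"):
    """Return the resolved path of a compiler found on PATH, or None.

    A None result for "cl" is exactly the condition behind torch's
    "Error checking compiler version for cl: [WinError 2]" warning.
    """
    return shutil.which(exe)

# Prints the full path to cl.exe once the PATH entry above is in place.
print(compiler_on_path("cl"))
```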

2. Build error: cannot open include file: "Python.h"

The error message: cannot open include file: "Python.h":

 "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -Iinsightface/thirdparty/face3d/mesh/cython -IH:\V.0.2.7\python_embeded\Lib\site-packages\numpy\core\include -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" /EHsc /Tpinsightface/thirdparty/face3d/mesh/cython/mesh_core_cython.cpp /Fobuild\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython/mesh_core_cython.obj
      mesh_core_cython.cpp
      insightface/thirdparty/face3d/mesh/cython/mesh_core_cython.cpp(36): fatal error C1083: 无法打开包括文件: "Python.h": No such file or directory
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.41.34120\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for insightface
Failed to build insightface
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (insightface)

I searched many articles suggesting things like installing a python-dev package; none of them were reliable (none solved my problem). In the end the direct fix was to download the full Python installer (the same version as ComfyUI's python_embeded; the 3.12 install takes about 345 MB), install it to some path (during setup, do not check the option that sets environment variables, to keep them from interfering with ComfyUI's embedded Python), and copy its include folder to H:\V.0.2.7\python_embeded\include.

That resolves the "cannot open include file: Python.h" error.

3. Cannot open file "python312.lib"

Likewise, copying the libs folder from the full install's path into H:\V.0.2.7\python_embeded\libs resolves this one:

      H:\V.0.2.7\python_embeded\Lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
      "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\bin\HostX86\x64\link.exe" /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:H:\V.0.2.7\python_embeded\libs /LIBPATH:H:\V.0.2.7\python_embeded /LIBPATH:H:\V.0.2.7\python_embeded\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64" /EXPORT:PyInit_mesh_core_cython build\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython/mesh_core.obj build\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython/mesh_core_cython.obj /OUT:build\lib.win-amd64-cpython-312\insightface\thirdparty\face3d\mesh\cython\mesh_core_cython.cp312-win_amd64.pyd /IMPLIB:build\temp.win-amd64-cpython-312\Release\insightface/thirdparty/face3d/mesh/cython\mesh_core_cython.cp312-win_amd64.lib
      LINK : fatal error LNK1104: 无法打开文件"python312.lib"
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.41.34120\\bin\\HostX86\\x64\\link.exe' failed with exit code 1104
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for insightface
Failed to build insightface
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (insightface)

H:\V.0.2.7>
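Points 2 and 3 boil down to copying two folders from a full CPython install of the same version into python_embeded. A minimal sketch of that step (the example paths in the comment are placeholders; substitute your own install locations):

```python
import shutil
from pathlib import Path

def copy_dev_dirs(full_install, embeded, dirs=("include", "libs")):
    """Copy the folders a full CPython install ships with but the
    embeddable build lacks: "include" supplies Python.h, and "libs"
    supplies the import library such as python312.lib."""
    for d in dirs:
        src = Path(full_install) / d
        if not src.is_dir():
            raise FileNotFoundError(src)
        shutil.copytree(src, Path(embeded) / d, dirs_exist_ok=True)

# Example with placeholder paths:
# copy_dev_dirs(r"C:\Python312", r"H:\V.0.2.7\python_embeded")
```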

4. ninja.exe

From the README at https://github.com/ninja-build/ninja, it is clear we only need the built exe:

Download the latest applicable Windows release from https://github.com/ninja-build/ninja/releases, unzip it to get ninja.exe, and copy it into any folder that is already on PATH.

Once that is in place, the warning warnings.warn(msg.format('we could not find ninja.')) disappears.

On my system, ninja.exe is present in both of these paths:

C:\Users\Monday\AppData\Local\Microsoft\WinGet\Packages\Ninja-build.Ninja_Microsoft.Winget.Source_8wekyb3d8bbwe

C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja

The exe in the first path is smaller (just 557 KB) and is the latest version, 1.12.1; the one in the second path is larger (2.42 MB) and is version 1.11.0.
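With two copies of ninja.exe present, it is worth checking which one PATH actually resolves to, and what version it is (a small sketch; the same helper works for cl):

```python
import shutil
import subprocess

def tool_version(exe: str = "ninja"):
    """Return (resolved path, version string) for a build tool on PATH,
    or None when it cannot be found -- the situation behind torch's
    "we could not find ninja." warning."""
    path = shutil.which(exe)
    if path is None:
        return None
    out = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, out.stdout.strip()

# e.g. ('C:\\...\\ninja.exe', '1.12.1') when the WinGet copy wins on PATH
print(tool_version("ninja"))
```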

5. insightface compiled and installed successfully:

It runs without any problems, too.

6. flash-attn compiled and installed successfully

Articles online had warned that compiling on Windows takes a very long time. I started the build, and after a long wait (about 3 hours or so) it failed with the following error:

     tmpxft_000038e4_00000000-7_flash_fwd_hdim64_bf16_sm80.compute_90.cudafe1.cpp
      "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc" -c csrc/flash_attn/src/flash_fwd_hdim64_fp16_causal_sm80.cu -o build\temp.win-amd64-cpython-312\Release\csrc/flash_attn/src/flash_fwd_hdim64_fp16_causal_sm80.obj -IC:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\flash_attn -IC:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\flash_attn\src -IC:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\cutlass\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\torch\csrc\api\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\TH -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.41.34120\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 
-Xcompiler /wd4819 -Xcompiler /MD -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17 --use-local-env
      flash_fwd_hdim64_fp16_causal_sm80.cu
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF_OPERATORS__"(用"/U__CUDA_NO_HALF_OPERATORS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF_CONVERSIONS__"(用"/U__CUDA_NO_HALF_CONVERSIONS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF2_OPERATORS__"(用"/U__CUDA_NO_HALF2_OPERATORS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_BFLOAT16_CONVERSIONS__"(用"/U__CUDA_NO_BFLOAT16_CONVERSIONS__")
      flash_fwd_hdim64_fp16_causal_sm80.cu
      c1xx: fatal error C1083: 无法打开源文件: "C:\Users\Monday\AppData\Local\Temp\pip-install-vpjwb98z\flash-attn_71d6532d3ab546e1bb76dd71119a8066\csrc\flash_attn\src\flash_fwd_hdim64_fp16_causal_sm80.cu": No such file or directory

I then found the article "Compiling and installing flash-attn on Windows 10 with CUDA 12.1 + torch 2.4.1 + VS 2022", which mentions installing the cutlass library, a required dependency for building flash-attn.

So I ran the commands below to install nvidia-cutlass first and then flash_attn:

python_embeded\python.exe -m pip install nvidia-cutlass
python_embeded\python.exe -m pip install flash_attn

After another, even longer wait (4 hours 45 minutes), it finally compiled and installed. I never found a prebuilt whl for flash-attn-2.7.0.post2 online, but this time I built one myself.

While waiting, afraid it might fail again, I kept searching for the cause of the first failure. The article "nvcc fatal : Could not open output file '/tmp/tmpxft_00003d04_00000000'" blames file read/write permissions, but since this build succeeded, it seems my earlier failure really was the missing nvidia-cutlass.

After reinstalling the build environment (Visual Studio Build Tools 2022, CUDA Toolkit), I compiled again under Python 3.11; it took 2 hours. (This time CPU usage sat at 100%; the first time it did not seem that high, though I don't remember exactly.)
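The build time and CPU load are driven by how many parallel compiler jobs ninja spawns. As the build log itself notes, this is "overridable by setting the environment variable MAX_JOBS=N", so on a RAM-limited machine you can cap it before starting (the value 4 here is just an example):

```
set MAX_JOBS=4
python_embeded\python.exe -m pip install flash_attn
```

A lower MAX_JOBS trades a longer build for less memory pressure; a higher one does the opposite.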

7. Compiling the development build xformers-0.0.29.dev940 failed

It was the following runtime error from xformers that pushed me into building modules myself in the first place, hoping xformers-0.0.29.dev940 would fix it.

Loading PuLID-Flux model.
!!! Exception during processing !!! No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 577, 16, 64) (torch.bfloat16)
     key         : shape=(1, 577, 16, 64) (torch.bfloat16)
     value       : shape=(1, 577, 16, 64) (torch.bfloat16)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`fa2F@v2.6.3-24-gbdf733b` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    bf16 is only supported on A100+ GPUs
`cutlassF-pt` is not supported because:
    bf16 is only supported on A100+ GPUs
Traceback (most recent call last):
  File "H:\V.0.2.7\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "H:\V.0.2.7\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\pulidflux.py", line 382, in apply_pulid_flux
    id_cond_vit, id_vit_hidden = eva_clip(face_features_image, return_all_features=False, return_hidden=True, shuffle=False)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", line 544, in forward
    x, hidden_states = self.forward_features(x, return_all_features, return_hidden, shuffle)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", line 531, in forward_features
    x = blk(x, rel_pos_bias=rel_pos_bias)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", line 293, in forward
    x = x + self.drop_path(self.attn(self.norm1(x), rel_pos_bias=rel_pos_bias, attn_mask=attn_mask))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\ComfyUI\custom_nodes\ComfyUI-PuLID-Flux-Enhanced\eva_clip\eva_vit_model.py", line 208, in forward
    x = xops.memory_efficient_attention(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\__init__.py", line 306, in memory_efficient_attention
    return _memory_efficient_attention(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\__init__.py", line 467, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\__init__.py", line 486, in _memory_efficient_attention_forward
    op = _dispatch_fw(inp, False)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\dispatch.py", line 135, in _dispatch_fw
    return _run_priority_list(
           ^^^^^^^^^^^^^^^^^^^
  File "H:\V.0.2.7\python_embeded\Lib\site-packages\xformers\ops\fmha\dispatch.py", line 76, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 577, 16, 64) (torch.bfloat16)
     key         : shape=(1, 577, 16, 64) (torch.bfloat16)
     value       : shape=(1, 577, 16, 64) (torch.bfloat16)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`fa2F@v2.6.3-24-gbdf733b` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    bf16 is only supported on A100+ GPUs
`cutlassF-pt` is not supported because:
    bf16 is only supported on A100+ GPUs

Prompt executed in 222.85 seconds
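Before rebuilding, note what the dispatch failure above actually says: every candidate kernel is rejected because bf16 needs CUDA compute capability (8, 0) or higher (Ampere and newer), while this GPU reports (7, 5). That is a hardware limit, so a newer xformers build alone may not make it go away. The rule, reduced to a one-line check:

```python
def supports_bf16(capability):
    """xformers' fa2F / cutlassF bf16 paths need CUDA compute capability
    >= (8, 0), per the "bf16 is only supported on A100+ GPUs" lines above."""
    return tuple(capability) >= (8, 0)

print(supports_bf16((7, 5)))  # the GPU in the traceback -> False
print(supports_bf16((8, 0)))  # A100-class -> True
```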

I went ahead with compiling the development build xformers-0.0.29.dev940 to see whether it would resolve this runtime error.

The build failed; the main error output follows:

  Building wheel for xformers (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [5830 lines of output]
      fatal: not a git repository (or any of the parent directories): .git
      running bdist_wheel

      H:\V.0.2.7\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py:497: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      running build
      ... N lines omitted ...

      H:\V.0.2.7\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py:382: UserWarning: Error checking compiler version for cl: [WinError 2] 系统找不到指定的文件。
        warnings.warn(f'Error checking compiler version for {compiler}: {error}')

  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
      [1/85] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\cutlass\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\torch\csrc\api\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\TH -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" 
"-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -c C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.cu -o C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -Xcompiler /Zc:lambda -Xcompiler /Zc:preprocessor -Xcompiler /Zc:__cplusplus -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_90,code=sm_90 -DFLASHATTENTION_DISABLE_ALIBI --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C_flashattention -D_GLIBCXX_USE_CXX11_ABI=0
      FAILED: C:/Users/Monday/AppData/Local/Temp/pip-install-h5wrwrf7/xformers_385ac3dddc8a4e779d876f9cbb34ec19/build/temp.win-amd64-cpython-312/Release/Users/Monday/AppData/Local/Temp/pip-install-h5wrwrf7/xformers_385ac3dddc8a4e779d876f9cbb34ec19/third_party/flash-attention/csrc/flash_attn/src/flash_bwd_hdim192_bf16_causal_sm80.obj
      C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src -IC:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\cutlass\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\torch\csrc\api\include -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\TH -IH:\V.0.2.7\python_embeded\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include" -IH:\V.0.2.7\python_embeded\include -IH:\V.0.2.7\python_embeded\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" 
"-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -c C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.cu -o C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -Xcompiler /Zc:lambda -Xcompiler /Zc:preprocessor -Xcompiler /Zc:__cplusplus -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_90,code=sm_90 -DFLASHATTENTION_DISABLE_ALIBI --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C_flashattention -D_GLIBCXX_USE_CXX11_ABI=0
      flash_bwd_hdim192_bf16_causal_sm80.cu
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF_OPERATORS__"(用"/U__CUDA_NO_HALF_OPERATORS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF_CONVERSIONS__"(用"/U__CUDA_NO_HALF_CONVERSIONS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_HALF2_OPERATORS__"(用"/U__CUDA_NO_HALF2_OPERATORS__")
      cl: 命令行 warning D9025 :正在重写"/D__CUDA_NO_BFLOAT16_CONVERSIONS__"(用"/U__CUDA_NO_BFLOAT16_CONVERSIONS__")
      ... many more lines omitted here ...
      flash_bwd_hdim192_bf16_causal_sm80.cu
      fatal   : Could not open output file C:\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\build\temp.win-amd64-cpython-312\Release\Users\Monday\AppData\Local\Temp\pip-install-h5wrwrf7\xformers_385ac3dddc8a4e779d876f9cbb34ec19\third_party\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_causal_sm80.obj.d

... many more "fatal   : Could not open output file" errors omitted here ...

I ran the D9025 warning (cl overriding "/D__CUDA_NO_HALF_OPERATORS__" with "/U__CUDA_NO_HALF_OPERATORS__") through Metaso AI search (秘塔AI搜索). Its summary:

  1. Environment variable setup problem

    • On Windows, incorrectly set environment variables can make CUDA builds fail. In particular, the vcvars64.bat script is what sets up Visual Studio's environment variables.
    • Solution: run vcvars64.bat to set the correct environment variables before building. Make sure to run it in the same command prompt, with the correct path:
      Run C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build>vcvars64.bat

After setting the environment variables this way, there were far fewer errors.

But after additionally setting set DISTUTILS_USE_SDK=1 as the error output suggested, there were just as many errors as before.
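For reference, PyTorch's C++ extension documentation describes the usual sequence for building CUDA extensions from a plain command prompt: load the MSVC environment with vcvars64.bat first, then set DISTUTILS_USE_SDK=1, then run the build in that same window (a sketch; the Build Tools path matches the install used in this post, and the final pip command is illustrative):

```
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
set DISTUTILS_USE_SDK=1
python_embeded\python.exe -m pip install <package-to-build>
```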

ERNIE Bot's answer, on the other hand, looked more plausible: the /D and /U flags conflict, so the command produces no result and no output file is generated.

I'm setting this one aside for now.

8. The ModuleNotFoundError: No module named 'distutils.msvccompiler' problem --> APEX compiled and installed successfully

The error message: ModuleNotFoundError: No module named 'distutils.msvccompiler'

H:\V.0.2.7>python_embeded\python.exe -m pip install apex
Collecting apex
  Using cached apex-0.9.10dev.tar.gz (36 kB)
  Preparing metadata (setup.py) ... done
Collecting cryptacular (from apex)
  Using cached cryptacular-1.6.2.tar.gz (75 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 2
  ╰─> [4 lines of output]
      scons: Reading SConscript files ...
      ModuleNotFoundError: No module named 'distutils.msvccompiler':
        File "C:\Users\Monday\AppData\Local\Temp\pip-install-whj_jyaw\cryptacular_4162e6ad50164a3baf1cd0472e6f84c1\SConstruct", line 21:
          import distutils.msvccompiler
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

H:\V.0.2.7>

I went through several articles without solving ModuleNotFoundError: No module named 'distutils.msvccompiler'. The official documentation (https://docs.python.org/3.10/library/distutils.html) notes that distutils is deprecated, with removal planned for Python 3.12.

The answer that actually worked came from GitHub: https://github.com/NVIDIA/apex/issues/1852. It built successfully; it turns out not every module can be installed with a simple `pip install <module name>` command.

Run the following commands to build and install:
H:\V.0.2.7>git clone https://github.com/NVIDIA/apex.git
H:\V.0.2.7>cd apex
H:\V.0.2.7\apex>..\python_embeded\python.exe -m pip install -v --no-cache-dir .

H:\V.0.2.7>git clone https://github.com/NVIDIA/apex.git
Cloning into 'apex'...
remote: Enumerating objects: 11902, done.
remote: Counting objects: 100% (3970/3970), done.
remote: Compressing objects: 100% (759/759), done.
remote: Total 11902 (delta 3492), reused 3413 (delta 3205), pack-reused 7932 (from 1)
Receiving objects: 100% (11902/11902), 15.61 MiB | 4.25 MiB/s, done.
Resolving deltas: 100% (8321/8321), done.
Updating files: 100% (505/505), done.

H:\V.0.2.7>cd apex

H:\V.0.2.7\apex>..\python_embeded\python.exe -m pip install -v --no-cache-dir .
Using pip 24.3.1 from H:\V.0.2.7\python_embeded\Lib\site-packages\pip (python 3.12)
Processing h:\v.0.2.7\apex
  Running command pip subprocess to install build dependencies
  Using pip 24.3.1 from H:\V.0.2.7\python_embeded\Lib\site-packages\pip (python 3.12)
  Collecting setuptools
    Obtaining dependency information for setuptools from https://files.pythonhosted.org/packages/55/21/47d163f615df1d30c094f6c8bbb353619274edccf0327b185cc2493c2c33/setuptools-75.6.0-py3-none-any.whl.metadata
    Using cached setuptools-75.6.0-py3-none-any.whl.metadata (6.7 kB)
  Collecting wheel
    Obtaining dependency information for wheel from https://files.pythonhosted.org/packages/0b/2c/87f3254fd8ffd29e4c02732eee68a83a1d3c346ae39bc6822dcbcb697f2b/wheel-0.45.1-py3-none-any.whl.metadata
    Using cached wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)
  Using cached setuptools-75.6.0-py3-none-any.whl (1.2 MB)
  Using cached wheel-0.45.1-py3-none-any.whl (72 kB)
  Installing collected packages: wheel, setuptools
    Creating C:\Users\Monday\AppData\Local\Temp\pip-build-env-da54lkae\overlay\Scripts
  Successfully installed setuptools-75.6.0 wheel-0.45.1
  Installing build dependencies ... done
  Running command Getting requirements to build wheel


  torch.__version__  = 2.5.1+cu124


  running egg_info
  creating apex.egg-info
  writing apex.egg-info\PKG-INFO

...... N lines omitted ......
  removing build\bdist.win-amd64\wheel
  Building wheel for apex (pyproject.toml) ... done
  Created wheel for apex: filename=apex-0.1-py3-none-any.whl size=406607 sha256=206aca315212aa0a76b14de395b6afe1ecdcd4c5fdd61b57986dabb509e83121
  Stored in directory: C:\Users\Monday\AppData\Local\Temp\pip-ephem-wheel-cache-zwa4z7gq\wheels\65\c7\12\b7e49ba4abd3da74df298dc51ea0f6a086d496566f4310f620
Successfully built apex
Installing collected packages: apex
Successfully installed apex-0.1

H:\V.0.2.7\apex>

To also build apex's optional C++ and CUDA extensions, the repository documents this variant of the install command:

..\python_embeded\python.exe -m pip install -v --disable-pip-version-check --no-cache-dir  --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" .

Next time I hit a problem, reading the official documentation first would save a lot of detours.

9. Do not rename the wheel file produced by the build

For example, the build produces: insightface-0.7.3-cp312-cp312-win_amd64.whl

If you rename the file to: insightface-0.7.3.whl

installation fails with: ERROR: insightface-0.7.3.whl is not a valid wheel filename.
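The reason is that pip parses the filename itself: PEP 427 defines it as {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, and the tags tell pip which interpreter and platform the wheel fits, so renaming the file destroys information pip needs. A rough sketch of the naming rule (a simplified check, not pip's actual parser):

```python
import re

# PEP 427: {distribution}-{version}(-{build tag})?-{python}-{abi}-{platform}.whl
WHEEL_RE = re.compile(
    r"^(?P<dist>[^-]+)-(?P<ver>[^-]+)(-(?P<build>\d[^-]*))?"
    r"-(?P<py>[^-]+)-(?P<abi>[^-]+)-(?P<plat>[^-]+)\.whl$"
)

def is_valid_wheel_name(name: str) -> bool:
    """Return True when the filename has the tag fields pip expects."""
    return WHEEL_RE.match(name) is not None

print(is_valid_wheel_name("insightface-0.7.3-cp312-cp312-win_amd64.whl"))  # True
print(is_valid_wheel_name("insightface-0.7.3.whl"))                        # False
```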
