2G大小的GPU对深度学习的加速效果如何?

训练数据情况

总共42776张224*224*3张图片

Found 42776 files belonging to 9 classes.

Using 12833 files for training.

模型参数情况

Total params: 10,917,385

Trainable params: 10,860,745

Non-trainable params: 56,640

batch-size:12

GPU信息

NVIDIA GeForce GT 730

驱动程序版本: 27.21.14.6133

驱动程序日期: 2021/1/19

DirectX 版本: 12 (FL 11.0)

物理位置: PCI 总线 1、设备 0、功能 0

利用率 11%

专用 GPU 内存 0.3/2.0 GB

共享 GPU 内存 0.0/31.9 GB

GPU 内存 0.4/33.9 GB

训练情况分析

完全使用CPU进行训练的时候,每次训练大约需要2750s。

python 复制代码
Epoch 1/65
2496/2496 [==============================] - 2937s 1s/step - loss: 0.4254 - accuracy: 0.8403 - val_loss: 0.3192 - val_accuracy: 0.8867
Epoch 2/65
2496/2496 [==============================] - 2756s 1s/step - loss: 0.2890 - accuracy: 0.8973 - val_loss: 0.4358 - val_accuracy: 0.8520
Epoch 3/65
2496/2496 [==============================] - 2737s 1s/step - loss: 0.2464 - accuracy: 0.9102 - val_loss: 0.2689 - val_accuracy: 0.9020

使用GPU加速进行训练的时候,每次训练的时间从2750s缩短到2100s左右,每次训练大约节省了650秒,效果也是比较明显的。

python 复制代码
Epoch 1/65
2023-10-04 10:38:26.686146: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2023-10-04 10:38:27.343524: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8101
2023-10-04 10:38:28.439803: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-04 10:38:29.088670: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2023-10-04 10:38:31.502277: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 606.50MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:31.805129: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:32.084683: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 599.75MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:32.129001: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 620.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:32.738828: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 620.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:32.801711: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 592.75MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:33.034554: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 592.19MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:33.056645: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 599.75MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:33.099135: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 10:38:33.124441: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.17GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
1070/1070 [==============================] - 2120s 2s/step - loss: 0.5235 - accuracy: 0.8034 - val_loss: 0.5122 - val_accuracy: 0.8171
Epoch 2/65
1070/1070 [==============================] - 2060s 2s/step - loss: 0.3620 - accuracy: 0.8668 - val_loss: 2.9616 - val_accuracy: 0.4629
Epoch 3/65
 124/1070 [==>...........................] - ETA: 18:02 - loss: 0.3194 - accuracy: 0.8844
Process finished with exit code -1

使用更小的数据集效果分析

数据集

10249

Found 10249 files belonging to 16 classes.

Using 3075 files for training.

参数

Total params: 6,143,760

Trainable params: 6,113,168

Non-trainable params: 30,592

只使用CPU

python 复制代码
Epoch 1/65
684/684 [==============================] - 758s 1s/step - loss: 1.1408 - accuracy: 0.5963 - val_loss: 3.0769 - val_accuracy: 0.2738
Epoch 2/65
684/684 [==============================] - 744s 1s/step - loss: 0.7745 - accuracy: 0.7173 - val_loss: 1.0438 - val_accuracy: 0.6369
Epoch 3/65
684/684 [==============================] - 769s 1s/step - loss: 0.6504 - accuracy: 0.7602 - val_loss: 0.8624 - val_accuracy: 0.6964

使用GPU

小数据集速度节省了接近50%

python 复制代码
Epoch 1/65
2023-10-04 16:58:19.928226: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2023-10-04 16:58:20.236817: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8101
2023-10-04 16:58:20.791072: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-04 16:58:21.096985: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2023-10-04 16:58:23.704576: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 16:58:24.962633: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 16:58:24.987354: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.17GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
256/257 [============================>.] - ETA: 0s - loss: 1.3775 - accuracy: 0.51432023-10-04 17:01:47.489983: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 17:01:48.530265: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 17:01:48.550469: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
257/257 [==============================] - ETA: 0s - loss: 1.3776 - accuracy: 0.51452023-10-04 17:04:21.899587: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 626.56MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 17:04:23.391704: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2023-10-04 17:04:23.801583: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 615.50MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
257/257 [==============================] - 368s 1s/step - loss: 1.3776 - accuracy: 0.5145 - val_loss: 5.9301 - val_accuracy: 0.2432
Epoch 2/65
257/257 [==============================] - 376s 1s/step - loss: 1.0042 - accuracy: 0.6237 - val_loss: 1.0432 - val_accuracy: 0.6183
Epoch 3/65
相关推荐
Wilber的技术分享5 分钟前
【大模型实战笔记 1】Prompt-Tuning方法
人工智能·笔记·机器学习·大模型·llm·prompt
liliangcsdn8 分钟前
结合prompt分析NodeRAG的build过程
java·服务器·人工智能·数据分析·知识图谱
东方不败之鸭梨的测试笔记9 分钟前
LangChain: Models, Prompts 模型和提示词
人工智能·python·langchain
初恋叫萱萱10 分钟前
AI驱动开发实战:基于飞算JavaAI的在线考试系统设计与实现
人工智能
yzx99101316 分钟前
图像去雾:从暗通道先验到可学习融合——一份可跑的 PyTorch 教程
人工智能·pytorch·学习
博大世界26 分钟前
解剖智驾“大脑”:一文读懂自动驾驶系统软件架构
人工智能·机器学习·自动驾驶
大熊猫侯佩31 分钟前
苹果 AI 探秘:代号 “AFM” —— “温柔的反叛者”
人工智能·sft·ai 大模型·apple 本地大模型·foundationmodel·苹果智能·applebot
AI Echoes1 小时前
别再手工缝合API了!开源LLMOps神器LMForge,让你像搭积木一样玩转AI智能体!
人工智能·python·langchain·开源·agent
AI Echoes1 小时前
从零构建企业级LLMOps平台:LMForge——支持多模型、可视化编排、知识库与安全审核的全栈解决方案
人工智能·python·langchain·开源·agent
Coovally AI模型快速验证1 小时前
无人机小目标检测新SOTA:MASF-YOLO重磅开源,多模块协同助力精度飞跃
人工智能·yolo·目标检测·机器学习·计算机视觉·无人机