【AudioClassificationModelZoo-Pytorch】A PyTorch-Based Sound Event Detection and Classification System

Source code: https://github.com/Shybert-AI/AudioClassificationModelZoo-Pytorch







Model benchmark results

| Model architecture | batch_size | FLOPs (G) | Params (M) | Features | Dataset | Classes | Validation set performance |
|---|---|---|---|---|---|---|---|
| EcapaTdnn | 128 | 0.48 | 6.1 | mel | UrbanSound8K | 10 | accuracy=0.974, precision=0.972, recall=0.967, F1-score=0.967 |
| PANNS(CNN6) | 128 | 0.98 | 4.57 | mel | UrbanSound8K | 10 | accuracy=0.971, precision=0.963, recall=0.954, F1-score=0.955 |
| TDNN | 128 | 0.21 | 2.60 | mel | UrbanSound8K | 10 | accuracy=0.968, precision=0.964, recall=0.959, F1-score=0.958 |
| PANNS(CNN14) | 128 | 1.98 | 79.7 | mel | UrbanSound8K | 10 | accuracy=0.966, precision=0.956, recall=0.957, F1-score=0.952 |
| PANNS(CNN10) | 128 | 1.29 | 4.96 | mel | UrbanSound8K | 10 | accuracy=0.964, precision=0.955, recall=0.955, F1-score=0.95 |
| DTFAT(MaxAST) | 16 | 8.32 | 68.32 | mel | UrbanSound8K | 10 | accuracy=0.963, precision=0.939, recall=0.935, F1-score=0.933 |
| EAT-M-Transformer | 128 | 0.16 | 1.59 | mel | UrbanSound8K | 10 | accuracy=0.935, precision=0.905, recall=0.907, F1-score=0.9 |
| AST | 16 | 5.28 | 85.26 | mel | UrbanSound8K | 10 | accuracy=0.932, precision=0.893, recall=0.887, F1-score=0.884 |
| TDNN_GRU_SE | 256 | 0.26 | 3.02 | mel | UrbanSound8K | 10 | accuracy=0.929, precision=0.916, recall=0.907, F1-score=0.904 |
| mn10_as | 128 | 0.03 | 4.21 | mel | UrbanSound8K | 10 | accuracy=0.912, precision=0.88, recall=0.894, F1-score=0.878 |
| dymn10_as | 128 | 0.01 | 4.76 | mel | UrbanSound8K | 10 | accuracy=0.904, precision=0.886, recall=0.883, F1-score=0.872 |
| ERes2NetV2 | 128 | 0.87 | 5.07 | mel | UrbanSound8K | 10 | accuracy=0.874, precision=0.828, recall=0.832, F1-score=0.818 |
| ResNetSE_GRU | 128 | 1.84 | 10.31 | mel | UrbanSound8K | 10 | accuracy=0.865, precision=0.824, recall=0.827, F1-score=0.813 |
| ResNetSE | 128 | 1.51 | 7.15 | mel | UrbanSound8K | 10 | accuracy=0.859, precision=0.82, recall=0.819, F1-score=0.807 |
| CAMPPlus | 128 | 0.47 | 7.30 | mel | UrbanSound8K | 10 | accuracy=0.842, precision=0.793, recall=0.788, F1-score=0.778 |
| HTS-AT | 16 | 5.70 | 27.59 | mel | UrbanSound8K | 10 | accuracy=0.84, precision=0.802, recall=0.796, F1-score=0.795 |
| EffilecentNet_B2 | 128 | -- | 7.73 | mel | UrbanSound8K | 10 | accuracy=0.779, precision=0.718, recall=0.741, F1-score=0.712 |
| ERes2Net | 128 | 1.39 | 6.22 | mel | UrbanSound8K | 10 | accuracy=0.778, precision=0.808, recall=0.787, F1-score=0.779 |
| Res2Net | 128 | 0.04 | 5.09 | mel | UrbanSound8K | 10 | accuracy=0.723, precision=0.669, recall=0.672, F1-score=0.648 |
| MobileNetV4 | 128 | 0.03 | 2.51 | mel | UrbanSound8K | 10 | accuracy=0.608, precision=0.553, recall=0.549, F1-score=0.523 |

Notes:

The test set was built by taking one audio clip out of every ten in the dataset, 874 clips in total.

5. Prepare the data

Generate the dataset lists: label_list.txt, train_list.txt, and test_list.txt.

Run create_data.py to generate the data lists. It provides several ways of generating lists for different datasets; see the code for details.

```shell
python create_data.py
```

A generated list looks like the following: each line contains the audio file path followed by the corresponding label (counted from 0), with the path and label separated by a tab (\t).

```shell
dataset/UrbanSound8K/audio/fold2/104817-4-0-2.wav	4
dataset/UrbanSound8K/audio/fold9/105029-7-2-5.wav	7
dataset/UrbanSound8K/audio/fold3/107228-5-0-0.wav	5
dataset/UrbanSound8K/audio/fold4/109711-3-2-4.wav	3
```
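For orientation, a minimal sketch of what such a list generator can look like is shown below. It assumes the standard UrbanSound8K layout (the class ID is the second dash-separated field of each file name) and holds out every 10th clip for the test list, as described in the notes above; the repository's create_data.py is the authoritative version.

```python
import os

# Minimal sketch of a data-list generator (assumed layout; the real
# create_data.py in the repository may differ).
audio_root = "dataset/UrbanSound8K/audio"

wav_paths = []
for fold in sorted(os.listdir(audio_root)):
    fold_dir = os.path.join(audio_root, fold)
    if not os.path.isdir(fold_dir):
        continue
    for name in sorted(os.listdir(fold_dir)):
        if name.endswith(".wav"):
            wav_paths.append(os.path.join(fold_dir, name))

train_lines, test_lines = [], []
for i, path in enumerate(wav_paths):
    # UrbanSound8K file names follow fsID-classID-occurrenceID-sliceID.wav,
    # so the class label is the second dash-separated field.
    label = os.path.basename(path).split("-")[1]
    line = f"{path}\t{label}\n"
    # Hold out every 10th clip as the test set, matching the note above.
    (test_lines if i % 10 == 0 else train_lines).append(line)

with open("train_list.txt", "w") as f:
    f.writelines(train_lines)
with open("test_list.txt", "w") as f:
    f.writelines(test_lines)
```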

5. Feature extraction (optional; with pre-extracted features, training is roughly 36x faster). Pre-extracted feature files and trained model files can be downloaded from the link below: put the model files in the model directory and the feature files in the features directory.

Link: https://pan.baidu.com/s/15ziJovO3t41Nqgqtmovuew  Extraction code: 8a59

```shell
python extract_feature.py
```
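For reference, offline extraction essentially computes and caches a log-mel spectrogram for each listed clip. The sketch below uses torchaudio with assumed parameters (16 kHz, 64 mel bins, 20 ms hop, chosen to match the [1, 1, 64, 100] input shape mentioned in the testing section); extract_feature.py may use different settings.

```python
import os
import torch
import torchaudio

# Sketch of offline log-mel feature extraction (assumed parameters;
# see extract_feature.py for the settings the repository actually uses).
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=400, hop_length=320, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

os.makedirs("features", exist_ok=True)
with open("train_list.txt") as f:
    for line in f:
        path, label = line.strip().split("\t")
        waveform, sr = torchaudio.load(path)
        if sr != 16000:
            waveform = torchaudio.functional.resample(waveform, sr, 16000)
        feature = to_db(mel(waveform))            # [channels, 64, frames]
        out_path = os.path.join("features", os.path.basename(path) + ".pt")
        torch.save({"feature": feature, "label": int(label)}, out_path)
```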

6. Training: select the model to train via the --model_type argument.

For example: EcapaTdnn, PANNS(CNN6), TDNN, PANNS(CNN14), PANNS(CNN10), DTFAT(MaxAST), EAT-M-Transformer, AST, TDNN_GRU_SE, mn10_as, dymn10_as, ERes2NetV2, ResNetSE_GRU, ResNetSE, CAMPPlus, HTS-AT, EffilecentNet_B2, ERes2Net, Res2Net, MobileNetV4.

```shell
python train.py --model_type EAT-M-Transformer
```
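The logs below report BCELoss together with accuracy, precision, recall, and F1 for each epoch. As a stripped-down illustration only (not the repository's train.py; `model` and `train_loader` are hypothetical placeholders), one epoch of training with BCE over one-hot targets might look like:

```python
import torch
import torch.nn as nn

def train_one_epoch(model, train_loader, optimizer, num_classes=10, device="cuda"):
    # Sketch only: BCE over one-hot targets, matching the BCELoss shown
    # in the logs below; the actual train.py may differ.
    criterion = nn.BCEWithLogitsLoss()
    model.train()
    for features, labels in train_loader:          # features: [B, 1, 64, T]
        features, labels = features.to(device), labels.to(device)
        targets = nn.functional.one_hot(labels, num_classes).float()
        logits = model(features)
        loss = criterion(logits, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```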

Training log with online feature extraction:

```text
Epoch: 10
Train: 100%|██████████| 62/62 [07:28<00:00,  7.23s/it, BCELoss=0.931, accuracy=0.502, precision=0.563, recall=0.508, F1-score=0.505]
Valid: 100%|██████████| 14/14 [00:53<00:00,  3.82s/it, BCELoss=1.19, accuracy=0.425, precision=0.43, recall=0.393, F1-score=0.362]

Epoch: 11
Train: 100%|██████████| 62/62 [07:23<00:00,  7.16s/it, BCELoss=2.17, accuracy=0.377, precision=0.472, recall=0.386, F1-score=0.375]
Valid: 100%|██████████| 14/14 [00:48<00:00,  3.47s/it, BCELoss=2.7, accuracy=0.362, precision=0.341, recall=0.328, F1-score=0.295]

Epoch: 12
Train: 100%|██████████| 62/62 [07:20<00:00,  7.11s/it, BCELoss=1.8, accuracy=0.297, precision=0.375, recall=0.308, F1-score=0.274]
Valid: 100%|██████████| 14/14 [00:48<00:00,  3.47s/it, BCELoss=1.08, accuracy=0.287, precision=0.317, recall=0.285, F1-score=0.234]
```

Training log with offline (pre-extracted) features:

```text
Epoch: 1
Train: 100%|██████████| 62/62 [00:12<00:00,  4.77it/s, BCELoss=8.25, accuracy=0.0935, precision=0.0982, recall=0.0878, F1-score=0.0741]
Valid: 100%|██████████| 14/14 [00:00<00:00, 29.53it/s, BCELoss=5.98, accuracy=0.142, precision=0.108, recall=0.129, F1-score=0.0909]
Model saved in the folder :  model
Model name is :  SAR_Pesudo_ResNetSE_s0_BCELoss

Epoch: 2
Train: 100%|██████████| 62/62 [00:12<00:00,  4.93it/s, BCELoss=7.71, accuracy=0.117, precision=0.144, recall=0.113, F1-score=0.0995]
Valid: 100%|██████████| 14/14 [00:00<00:00, 34.54it/s, BCELoss=8.15, accuracy=0.141, precision=0.0811, recall=0.133, F1-score=0.0785]
```

7. Testing

Testing uses a streaming approach: 2 seconds of audio are fed to the model at a time. Each 2-second segment is converted into a tensor of shape [1, 1, 64, 100] and passed to the model for inference; every step yields a prediction, and a threshold can be applied to decide whether the event has occurred.

```shell
python model_test.py --model_type EAT-M-Transformer
```
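As a rough sketch of the streaming procedure described above (hypothetical helper code with an assumed 16 kHz sample rate and a loaded eval-mode `model`; model_test.py is the authoritative implementation), each 2-second window is converted into a [1, 1, 64, 100] log-mel tensor, scored, and compared with a threshold:

```python
import torch
import torchaudio

def stream_detect(model, wav_path, threshold=0.5, sample_rate=16000):
    # Sketch of streaming inference: 2-second windows -> [1, 1, 64, 100]
    # log-mel tensors -> model -> per-class scores vs. a threshold.
    waveform, sr = torchaudio.load(wav_path)
    if sr != sample_rate:
        waveform = torchaudio.functional.resample(waveform, sr, sample_rate)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_fft=400, hop_length=320, n_mels=64)
    to_db = torchaudio.transforms.AmplitudeToDB()
    chunk = 2 * sample_rate                          # 2 seconds of samples
    for start in range(0, waveform.shape[1] - chunk + 1, chunk):
        segment = waveform[:1, start:start + chunk]  # first channel only
        feature = to_db(mel(segment))[:, :, :100]    # [1, 64, 100]
        feature = feature.unsqueeze(0)               # [1, 1, 64, 100]
        with torch.no_grad():
            scores = torch.sigmoid(model(feature))   # assumes logit outputs
        events = (scores.squeeze() > threshold).nonzero(as_tuple=True)[0]
        print(f"{start / sample_rate:.1f}s: classes {events.tolist()}")
```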