【AudioClassificationModelZoo-Pytorch】A PyTorch-Based Sound Event Detection and Classification System

Source code: https://github.com/Shybert-AI/AudioClassificationModelZoo-Pytorch

Model benchmark results

| Model architecture | batch_size | FLOPs (G) | Params (M) | Features | Dataset | Classes | Validation-set performance |
|---|---|---|---|---|---|---|---|
| EcapaTdnn | 128 | 0.48 | 6.1 | mel | UrbanSound8K | 10 | accuracy=0.974, precision=0.972, recall=0.967, F1-score=0.967 |
| PANNS(CNN6) | 128 | 0.98 | 4.57 | mel | UrbanSound8K | 10 | accuracy=0.971, precision=0.963, recall=0.954, F1-score=0.955 |
| TDNN | 128 | 0.21 | 2.60 | mel | UrbanSound8K | 10 | accuracy=0.968, precision=0.964, recall=0.959, F1-score=0.958 |
| PANNS(CNN14) | 128 | 1.98 | 79.7 | mel | UrbanSound8K | 10 | accuracy=0.966, precision=0.956, recall=0.957, F1-score=0.952 |
| PANNS(CNN10) | 128 | 1.29 | 4.96 | mel | UrbanSound8K | 10 | accuracy=0.964, precision=0.955, recall=0.955, F1-score=0.95 |
| DTFAT(MaxAST) | 16 | 8.32 | 68.32 | mel | UrbanSound8K | 10 | accuracy=0.963, precision=0.939, recall=0.935, F1-score=0.933 |
| EAT-M-Transformer | 128 | 0.16 | 1.59 | mel | UrbanSound8K | 10 | accuracy=0.935, precision=0.905, recall=0.907, F1-score=0.9 |
| AST | 16 | 5.28 | 85.26 | mel | UrbanSound8K | 10 | accuracy=0.932, precision=0.893, recall=0.887, F1-score=0.884 |
| TDNN_GRU_SE | 256 | 0.26 | 3.02 | mel | UrbanSound8K | 10 | accuracy=0.929, precision=0.916, recall=0.907, F1-score=0.904 |
| mn10_as | 128 | 0.03 | 4.21 | mel | UrbanSound8K | 10 | accuracy=0.912, precision=0.88, recall=0.894, F1-score=0.878 |
| dymn10_as | 128 | 0.01 | 4.76 | mel | UrbanSound8K | 10 | accuracy=0.904, precision=0.886, recall=0.883, F1-score=0.872 |
| ERes2NetV2 | 128 | 0.87 | 5.07 | mel | UrbanSound8K | 10 | accuracy=0.874, precision=0.828, recall=0.832, F1-score=0.818 |
| ResNetSE_GRU | 128 | 1.84 | 10.31 | mel | UrbanSound8K | 10 | accuracy=0.865, precision=0.824, recall=0.827, F1-score=0.813 |
| ResNetSE | 128 | 1.51 | 7.15 | mel | UrbanSound8K | 10 | accuracy=0.859, precision=0.82, recall=0.819, F1-score=0.807 |
| CAMPPlus | 128 | 0.47 | 7.30 | mel | UrbanSound8K | 10 | accuracy=0.842, precision=0.793, recall=0.788, F1-score=0.778 |
| HTS-AT | 16 | 5.70 | 27.59 | mel | UrbanSound8K | 10 | accuracy=0.84, precision=0.802, recall=0.796, F1-score=0.795 |
| EffilecentNet_B2 | 128 | -- | 7.73 | mel | UrbanSound8K | 10 | accuracy=0.779, precision=0.718, recall=0.741, F1-score=0.712 |
| ERes2Net | 128 | 1.39 | 6.22 | mel | UrbanSound8K | 10 | accuracy=0.778, precision=0.808, recall=0.787, F1-score=0.779 |
| Res2Net | 128 | 0.04 | 5.09 | mel | UrbanSound8K | 10 | accuracy=0.723, precision=0.669, recall=0.672, F1-score=0.648 |
| MobileNetV4 | 128 | 0.03 | 2.51 | mel | UrbanSound8K | 10 | accuracy=0.608, precision=0.553, recall=0.549, F1-score=0.523 |

Notes:

The test set consists of every 10th audio clip taken from the dataset, 874 clips in total.
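The FLOPs (G) and Params (M) columns in the table can be measured with a model profiler. Below is a minimal sketch using the third-party thop package (an assumption; the repo may use a different tool), with a stand-in model and the [1, 1, 64, 100] mel-spectrogram input shape used throughout this post:

```python
import torch
import torch.nn as nn
from thop import profile  # third-party: pip install thop

# Stand-in model for illustration; in practice pass one of the repo's models.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
dummy = torch.randn(1, 1, 64, 100)  # [batch, channel, mel bins, frames]
flops, params = profile(model, inputs=(dummy,))  # thop reports MACs as "FLOPs"
print(f"FLOPs: {flops / 1e9:.2f} G  Params: {params / 1e6:.2f} M")
```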

5. Prepare the data

Generate the dataset lists: label_list.txt, train_list.txt, and test_list.txt.

Run create_data.py to generate the data lists. The script provides several ways to build lists for different datasets; see the code for details.

```shell
python create_data.py
```

The generated list looks like this: each line gives the audio file path followed by the clip's label (counting from 0), with the path and label separated by a tab (\t).

```
dataset/UrbanSound8K/audio/fold2/104817-4-0-2.wav	4
dataset/UrbanSound8K/audio/fold9/105029-7-2-5.wav	7
dataset/UrbanSound8K/audio/fold3/107228-5-0-0.wav	5
dataset/UrbanSound8K/audio/fold4/109711-3-2-4.wav	3
```
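Loading such a tab-separated list into a PyTorch Dataset is straightforward. A minimal sketch (the class name and details here are illustrative, not the repo's actual code):

```python
import torch
from torch.utils.data import Dataset

class AudioListDataset(Dataset):
    """Reads a list file where each line is '<wav path>\t<label>'."""

    def __init__(self, list_path):
        self.items = []
        with open(list_path, encoding="utf-8") as f:
            for line in f:
                path, label = line.rstrip("\n").split("\t")
                self.items.append((path, int(label)))

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        path, label = self.items[idx]
        # Feature extraction (e.g. a mel spectrogram) would happen here.
        return path, torch.tensor(label)

train_set = AudioListDataset("train_list.txt")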

6. Feature extraction (optional; training on pre-extracted features is roughly 36× faster than extracting them on the fly). Pre-extracted feature files and trained model weights can be downloaded from the link below; put the models in the model directory and the features in the features directory.

Link: https://pan.baidu.com/s/15ziJovO3t41Nqgqtmovuew  Extraction code: 8a59

```shell
python extract_feature.py
```
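For reference, offline mel-feature extraction might look like the sketch below, which saves one .npy file per clip into the features directory. It uses librosa with 64 mel bins to match the model input; the exact parameters used by extract_feature.py are an assumption here:

```python
import os
import librosa
import numpy as np

def extract_mel(list_path, out_dir="features", sr=16000, n_mels=64):
    """Extract log-mel features for every clip in a '<path>\t<label>' list."""
    os.makedirs(out_dir, exist_ok=True)
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            wav_path, _ = line.rstrip("\n").split("\t")
            audio, _ = librosa.load(wav_path, sr=sr)
            mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels)
            log_mel = librosa.power_to_db(mel)  # shape: [n_mels, frames]
            name = os.path.splitext(os.path.basename(wav_path))[0] + ".npy"
            np.save(os.path.join(out_dir, name), log_mel)

extract_mel("train_list.txt")
```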

7. Training. Select the architecture with the --model_type argument and start training.

Supported values: EcapaTdnn, PANNS(CNN6), TDNN, PANNS(CNN14), PANNS(CNN10), DTFAT(MaxAST), EAT-M-Transformer, AST, TDNN_GRU_SE, mn10_as, dymn10_as, ERes2NetV2, ResNetSE_GRU, ResNetSE, CAMPPlus, HTS-AT, EffilecentNet_B2, ERes2Net, Res2Net, MobileNetV4

```shell
python train.py --model_type EAT-M-Transformer
```
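With this many architectures, a common way to wire a --model_type flag is a name-to-constructor registry. A hedged sketch of the idea (the repo's actual dispatch logic may differ; the constructors below are placeholders):

```python
import argparse
import torch.nn as nn

# Illustrative registry; the real repo maps these names to its own classes.
MODEL_REGISTRY = {
    "EcapaTdnn": lambda n: nn.Linear(64, n),          # placeholder constructor
    "EAT-M-Transformer": lambda n: nn.Linear(64, n),  # placeholder constructor
}

parser = argparse.ArgumentParser()
parser.add_argument("--model_type", default="EAT-M-Transformer",
                    choices=sorted(MODEL_REGISTRY))
args = parser.parse_args()
model = MODEL_REGISTRY[args.model_type](10)  # 10 UrbanSound8K classes
```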

Training log with online (on-the-fly) feature extraction:

```
Epoch: 10
Train: 100%|██████████| 62/62 [07:28<00:00,  7.23s/it, BCELoss=0.931, accuracy=0.502, precision=0.563, recall=0.508, F1-score=0.505]
Valid: 100%|██████████| 14/14 [00:53<00:00,  3.82s/it, BCELoss=1.19, accuracy=0.425, precision=0.43, recall=0.393, F1-score=0.362]

Epoch: 11
Train: 100%|██████████| 62/62 [07:23<00:00,  7.16s/it, BCELoss=2.17, accuracy=0.377, precision=0.472, recall=0.386, F1-score=0.375]
Valid: 100%|██████████| 14/14 [00:48<00:00,  3.47s/it, BCELoss=2.7, accuracy=0.362, precision=0.341, recall=0.328, F1-score=0.295]

Epoch: 12
Train: 100%|██████████| 62/62 [07:20<00:00,  7.11s/it, BCELoss=1.8, accuracy=0.297, precision=0.375, recall=0.308, F1-score=0.274]
Valid: 100%|██████████| 14/14 [00:48<00:00,  3.47s/it, BCELoss=1.08, accuracy=0.287, precision=0.317, recall=0.285, F1-score=0.234]
```

Training log with offline (pre-extracted) features:

```
Epoch: 1
Train: 100%|██████████| 62/62 [00:12<00:00,  4.77it/s, BCELoss=8.25, accuracy=0.0935, precision=0.0982, recall=0.0878, F1-score=0.0741]
Valid: 100%|██████████| 14/14 [00:00<00:00, 29.53it/s, BCELoss=5.98, accuracy=0.142, precision=0.108, recall=0.129, F1-score=0.0909]
Model saved in the folder :  model
Model name is :  SAR_Pesudo_ResNetSE_s0_BCELoss

Epoch: 2
Train: 100%|██████████| 62/62 [00:12<00:00,  4.93it/s, BCELoss=7.71, accuracy=0.117, precision=0.144, recall=0.113, F1-score=0.0995]
Valid: 100%|██████████| 14/14 [00:00<00:00, 34.54it/s, BCELoss=8.15, accuracy=0.141, precision=0.0811, recall=0.133, F1-score=0.0785]
```
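The accuracy, precision, recall, and F1-score fields in these logs are standard macro-averaged classification metrics. A minimal sketch of computing them with scikit-learn from the labels and predictions collected over an epoch (whether the repo itself uses scikit-learn is an assumption):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [4, 7, 5, 3, 4]  # ground-truth labels collected over an epoch
y_pred = [4, 7, 5, 5, 4]  # argmax of the model's per-class outputs

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy={acc:.3f}, precision={prec:.3f}, "
      f"recall={rec:.3f}, F1-score={f1:.3f}")
```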

8. Testing

Testing runs in a streaming fashion: every 2 seconds of audio is converted to a tensor of shape [1, 1, 64, 100] and fed to the model for inference. Each chunk yields a prediction, and a threshold on the output decides whether the target event has occurred.

```shell
python model_test.py --model_type EAT-M-Transformer
```
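A minimal sketch of the streaming loop described above: slice the audio into 2-second chunks, turn each chunk into a [1, 1, 64, 100] log-mel tensor, and threshold the model output. The file name, sample rate, and STFT parameters below are illustrative assumptions, chosen so that 2 seconds of audio yields about 100 frames:

```python
import librosa
import torch

SR, CHUNK = 16000, 2 * 16000  # assumed sample rate; 2-second streaming window
THRESHOLD = 0.5

audio, _ = librosa.load("test.wav", sr=SR)  # hypothetical input file
# Stand-in model for illustration; load the trained model from `model` in practice.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 100, 10))
model.eval()

for start in range(0, len(audio) - CHUNK + 1, CHUNK):
    chunk = audio[start:start + CHUNK]
    mel = librosa.feature.melspectrogram(
        y=chunk, sr=SR, n_fft=1024, hop_length=320, n_mels=64)
    log_mel = librosa.power_to_db(mel)[:, :100]  # trim to [64, 100]
    x = torch.tensor(log_mel, dtype=torch.float32)[None, None]  # [1,1,64,100]
    with torch.no_grad():
        probs = torch.sigmoid(model(x))[0]       # per-class scores
    events = (probs > THRESHOLD).nonzero().flatten().tolist()
    print(f"t={start / SR:.1f}s  events above threshold: {events}")
```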