An Audio Denoising Optimization Method in MATLAB: Time-Frequency Regularized Overlapping Group Shrinkage

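Speech enhancement is a major branch of speech signal processing and has been studied extensively by researchers worldwide. After nearly sixty years of development, many effective speech enhancement algorithms have emerged. Depending on whether prior information about the speech and the noise is used during enhancement, these algorithms are generally divided into two classes: those that use no prior information and those that do. Commonly used algorithms in the first class include spectral subtraction, statistical-model-based methods, signal-subspace methods, and Wiener filtering. These algorithms work well for stationary noise, but for non-stationary noise whose characteristics change quickly, their denoising performance often falls short. Algorithms that use prior information compensate for this weakness and can handle non-stationary noise effectively. The main ones are hidden Markov model (HMM) based speech enhancement and codebook-driven speech enhancement. They extract prior information about speech and noise offline and model it with an HMM or a codebook; online, the prior HMM or codebook is used to estimate the speech and noise spectra, from which a Wiener filter is built to enhance the speech component. Because they exploit prior information about both speech and noise, these algorithms can track changes in the online speech and noise characteristics and achieve good denoising for non-stationary noise. Speech enhancement has two main goals: (1) reduce the noise in the noisy speech; (2) preserve, or even improve, the quality of the original speech as far as possible.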
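As a small illustration of the last step mentioned above (building a Wiener filter from estimated speech and noise spectra), the following sketch applies the standard per-bin Wiener gain to a toy spectrum. All variable names and numbers are made up for illustration; how the two spectra are estimated (HMM, codebook, or otherwise) is outside the scope of this sketch.

```matlab
% Toy per-bin power spectra (hypothetical values, for illustration only)
speech_psd = [4.0; 1.0; 0.25; 0.0];   % estimated speech power in each frequency bin
noise_psd  = [1.0; 1.0; 1.0;  1.0];   % estimated noise power in each frequency bin

% Wiener gain: the fraction of each bin's power attributed to speech
wiener_gain = speech_psd ./ (speech_psd + noise_psd);

% Apply the real-valued gain to a (toy) noisy spectrum; the phase is untouched
noisy_spectrum    = [2+1i; 1-1i; 0.5; 0.3i];
enhanced_spectrum = wiener_gain .* noisy_spectrum;

% The gains here are 0.8, 0.5, 0.2 and 0
disp(wiener_gain.');
```

Bins dominated by speech keep most of their energy, while bins dominated by noise are attenuated toward zero.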

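This post presents an audio denoising optimization method based on time-frequency regularized overlapping group shrinkage (OGS). The method denoises the speech signal by exploiting known structure in a decomposable convex optimization problem: it relies on the clustering (grouping) behavior observed in speech spectrograms and uses a mixed-norm penalty to iteratively obtain a sparse, clean speech signal. On top of the overlapping group shrinkage algorithm, time-frequency weights are introduced into the cost function.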
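The denoising routine itself (`tfs`, called in the script below) is not listed in this post. To give a feel for the underlying idea, here is a minimal sketch of plain two-dimensional overlapping group shrinkage without any time-frequency weights, using the commonly published majorization-minimization iteration for the cost 0.5*||y - x||^2 + lambda * (sum of the 2-norms of all overlapping K1-by-K2 groups of x). Every name in it (`ogs_input`, `group_rows`, `ogs_lambda`, and so on) is hypothetical, and it should not be read as the author's implementation.

```matlab
% Toy "spectrogram": a small block of large coefficients buried in weak noise
rng(0);
ogs_input = 0.1 * (randn(16, 32) + 1i * randn(16, 32));
ogs_input(6:8, 10:13) = ogs_input(6:8, 10:13) + 5;

group_rows     = 2;     % group size along rows (the role K1 plays in the script)
group_cols     = 8;     % group size along columns (the role K2 plays in the script)
ogs_lambda     = 0.5;   % regularization strength (assumed value)
ogs_iterations = 25;    % number of iterations (assumed value)

h1 = ones(group_rows, 1);
h2 = ones(group_cols, 1);
ogs_output = ogs_input;
for it = 1:ogs_iterations
    % 2-norm of every group; the full convolution also covers border groups
    group_norms = sqrt(conv2(h1, h2, abs(ogs_output).^2));
    % For each coefficient, sum 1/norm over all groups that contain it
    inv_norm_sum = conv2(h1, h2, 1 ./ group_norms, 'valid');
    % Shrink the observed coefficients; the divisor is never below 1
    ogs_output = ogs_input ./ (1 + ogs_lambda * inv_norm_sum);
end
```

Because the divisor depends on the energy of the surrounding groups, a coefficient in a low-energy neighborhood is attenuated more strongly than one of similar size inside a high-energy cluster, which is the behavior that suits the clustered structure of speech spectrograms.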

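% Reading in our audio files
[clean_signal, clean_speech_rate] = audioread("data/speech_files/sp01.wav");
[noise_signal, noise_signal_rate] = audioread("data/noise_files/keyboard_noise.wav");
% The rest of the script uses a single sampling rate, so both files must agree
assert(clean_speech_rate == noise_signal_rate, "Speech and noise files must share one sampling rate");
% Work with row vectors; keep only the first channel in case a file is stereo
noise_signal = noise_signal(:, 1)'; clean_signal = clean_signal(:, 1)';

% Ensuring noise signal length matches clean signal length through repetition and cropping
noise_signal = noise_signal(mod(0:length(clean_signal)-1, numel(noise_signal)) + 1);
assert(length(clean_signal) == length(noise_signal));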

% Combining to create noisy signal
SNR = -10; % in dB
noisy_signal = clean_signal + (noise_signal / norm(noise_signal) * norm(clean_signal) / 10.0^(0.05*SNR));
% Uncomment the following line to use Gaussian noise instead
%noisy_signal = awgn(clean_signal, SNR, "measured");

% Setting this changes what we take the STFT of
yo = noisy_signal;

% Some parameters for our test
noise_type = "impulsive"; % Noise types are impulsive, clean, stationary, used for weighting
lambda = 30; % Higher values when using both T & F weightings
Nit = 6;
K1 = 2;
K2 = 8;
window = sqrt(hann(256, 'periodic')); 
overlap_length = 128;
fft_length = 512;

%% Preprocess Data
% Take STFT
% Use a COLA-compliant window and overlap length. The frame count
% k = (length(yo) - overlap_length) / (length(window) - overlap_length)
% must be an integer; otherwise we zero-pad the signal so that k becomes an
% integer and remove the padded zeros from the output afterwards.
% The root-Hann window is applied at both analysis and synthesis (istft
% defaults to weighted overlap-add), so COLA is checked with the 'wola' method.
if (~iscola(window, overlap_length, 'wola'))
    error("COLA noncompliant parameters, imperfect reconstruction");
end
hop_length = length(window) - overlap_length;
k = (length(yo) - overlap_length) / hop_length;
if (k ~= floor(k))
    warning("Padding signal to provide sample reconstruction post istft, results may be off for groups that stretch across to these padded zeros");
    % Pad up to the next whole number of hops (the hop, not the overlap, sets the frame spacing)
    padding = ceil(k) * hop_length + overlap_length - length(yo);
    yo = [yo zeros(1, padding)];
else
    padding = 0;
end
tf = stft(yo, noise_signal_rate, 'Window', window, 'OverlapLength', overlap_length, 'FFTLength', fft_length);

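%% Creating our frequency weighting
% For noise with varying energy across the bands
% Note: the 4-6 kHz band edges require a sampling rate above 12 kHz
if noise_type == "clean" || noise_type == "impulsive"
    [~, Fo, Ao, W] = firpmord([4000, 6000]/(noise_signal_rate/2), [1 0.8], [0.01, 0.01]);
    b = firpm(10, Fo, Ao, W); % fixed order 10; the order estimate from firpmord is not used
    % Note: freqz samples 0..fs/2, while stft defaults to a centered two-sided
    % frequency axis; the weights are applied to the rows of tf as they are
    [filter_magnitudes, ~] = freqz(b, 1, size(tf, 1));
    filter_magnitudes = abs(filter_magnitudes);
elseif noise_type == "stationary"
    % Flat weighting: every frequency bin is treated equally
    filter_magnitudes = ones(size(tf, 1), 1);
else
    error("Unknown noise_type: %s", noise_type);
end
frequency_weighting = repmat(filter_magnitudes, [1, size(tf, 2)]);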

%% Denoising Signal
% Running algorithm
[tf_denoised, cost, weights, energy_ratios] = tfs(tf, K1, K2, lambda, Nit, frequency_weighting);
denoised_signal = real(istft(tf_denoised, noise_signal_rate, 'Window', window, 'OverlapLength', overlap_length, 'FFTLength', fft_length)');

% Undoing the padding if any was necessary
if (padding ~= 0)
    yo = yo(1:length(yo)-padding);
    denoised_signal = denoised_signal(1:length(denoised_signal)-padding);
end

%% Plots, SNR Readout & Playing Denoised Signal
time = (1:length(denoised_signal))/noise_signal_rate;
figure(1)
clf;
subplot(3,3,1);
hold on;
plot(time, denoised_signal, 'Color', [1, 0, 0, 0.2]);
plot(time, clean_signal, 'Color', [0, 1, 0, 0.05]);
axis tight;
hold off;
title("Clean vs Denoised Signal");
legend("Denoised Signal", "Clean Signal");

subplot(3,3,2);
plot(time, clean_signal - denoised_signal, 'Color', [1, 0, 0, 1]);
axis tight;
title("Delta of Clean vs Denoised");

subplot(3,3,3);
hold on;
plot(time, denoised_signal, 'Color', [1, 0, 0, 0.2]);
plot(time, noisy_signal, 'Color', [0, 1, 0, 0.05]);
axis tight;
hold off;
title("Noisy vs Denoised Signal");
legend("Denoised Signal", "Noisy Signal");

subplot(3,3,4);
plot(cost)
title("Cost Per Iteration");
legend("Cost");

subplot(3, 3, 5);
mesh(mag2db(weights));
c = colorbar;
c.Label.String = "Weight (dB)";
shading interp;
view(0, 90);
xlim([0 size(weights, 2)])
ylim([0 size(weights, 1)])
title("Time & Frequency Attenuation Weighting");

subplot(3, 3, 6);
spectrogram(clean_signal, window, overlap_length, fft_length, noise_signal_rate, 'yaxis');
title("Spectrogram of Clean Signal");

subplot(3, 3, 7);
spectrogram(noisy_signal, window, overlap_length, fft_length, noise_signal_rate, 'yaxis');
title("Spectrogram of Noisy Signal");

subplot(3, 3, 8);
spectrogram(denoised_signal, window, overlap_length, fft_length, noise_signal_rate, 'yaxis');
title("Spectrogram of Denoised Signal");

subplot(3, 3, 9);
spectrogram(clean_signal - denoised_signal, window, overlap_length, fft_length, noise_signal_rate, 'yaxis');
title("Spectrogram of Delta Between Clean & Denoised Signal");

if noise_type ~= "stationary"
    figure(2)
    freqz(b, 1, size(tf, 1));
end

figure(3)
subplot(2, 1, 1);
plot((1:fft_length).*(noise_signal_rate/2)/fft_length, smooth(filter_magnitudes));
xlabel("Frequency Hz");
ylabel("Weight");
title("Frequency Weights")

subplot(2, 1, 2);
plot(energy_ratios);
axis tight;
xlabel("Time");
ylabel("Weight");
title("Time Weights")

% Get the SNR and play the denoised signal
if (sum(clean_signal(:).^2) == 0) || (sum((clean_signal(:)-yo(:)).^2) == 0)
    preSNR = 0;
else
    preSNR = 10*log10(sum(clean_signal(:).^2) / (sum((clean_signal(:)-yo(:)).^2)));
end

if (sum(clean_signal(:).^2) == 0) || (sum((clean_signal(:)-denoised_signal(:)).^2) == 0)
    postSNR = 0;
else
    postSNR = 10*log10(sum(clean_signal(:).^2) / (sum((clean_signal(:)-denoised_signal(:)).^2)));
end
fprintf("SNR Pre-Denoising: %.2f dB, SNR Post-Denoising: %.2f dB\n", preSNR, postSNR);
sound(denoised_signal, noise_signal_rate);
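The script scales the noise to a target SNR before mixing and later reports SNR as the ratio of clean-signal energy to error energy. The self-contained sketch below checks, on synthetic data, that these two definitions agree. The signals and the `demo_` names are hypothetical and are not part of the original script.

```matlab
% Synthetic stand-ins for the clean speech and the noise recording
rng(1);
demo_clean = sin(2 * pi * (0:999) / 50);
demo_noise = randn(1, 1000);
target_snr_db = -10;

% Same scaling rule as in the script: normalize the noise, then set its norm
% to norm(clean) / 10^(SNR/20)
scaled_noise = demo_noise / norm(demo_noise) * norm(demo_clean) / 10.0^(0.05 * target_snr_db);
demo_noisy = demo_clean + scaled_noise;

% Same readout as in the script: clean energy over error energy, in dB
achieved_snr_db = 10 * log10(sum(demo_clean.^2) / sum((demo_clean - demo_noisy).^2));
fprintf("Target SNR: %.2f dB, achieved SNR: %.2f dB\n", target_snr_db, achieved_snr_db);
```

The two values match up to floating-point rounding, so the pre-denoising SNR printed by the script should equal the chosen `SNR` whenever the noisy signal is built with this scaling rule.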

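About the author: Doctor of Engineering. Reviewer for Mechanical System and Signal Processing; outstanding reviewer for 《中国电机工程学报》; reviewer for EI-indexed journals including 《控制与决策》, 《系统工程与电子技术》, 《电力系统保护与控制》, and 《宇航学报》; reviewer for Chinese core journals including 《计算机科学》, 《电子器件》, 《现代制造过程》, 《电源学报》, 《船舶工程》, 《轴承》, 《工矿自动化》, 《重庆理工大学学报》, 《噪声与振动控制》, 《机械传动》, 《机械强度》, 《机械科学与技术》, 《机床与液压》, 《声学技术》, 《应用声学》, 《石油机械》, and 《西安工业大学学报》.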

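Areas of expertise: modern signal processing, machine learning, deep learning, digital twins, time series analysis, equipment defect detection, equipment anomaly detection, and intelligent fault diagnosis and prognostics and health management (PHM) for equipment.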
