MATLAB环境下一种音频降噪优化方法—基于时频正则化重叠群收缩

语音增强是语音信号处理领域中的一个重大分支，这一分支已经得到国内外学者的广泛研究。当今时代，随着近六十年来的不断发展，己经产生了许多有效的语音增强算法。根据语音增强过程中是否利用语音和噪声的先验信息，语音增强算法一般被归类为两类，一类是无先验信息的语音增强算法，另外一类则是具有先验信息的语音增强算法。在第一类无先验信息语音增强算法中，比较常用的语音增强算法有谱减算法、基于统计模型的算法、基于信号子空间的算法、维纳滤波算法之类。这些算法对于平稳噪声具有很好的增强效果，但是对于特征很快变化的非平稳噪声，降噪性能经常无法满足需求。与上面的无先验信息语音增强算法相比，有先验信息语音增强算法能够弥补上述的缺点，能够有效的处理于非平稳噪声，达到合适的效果。有先验信息语音增强算法主要有隐马尔科夫模型语音增强算法以及码数驱动的语音增强算法。有先验信息语音增强算法能够通过线下提取语音和噪声的先验信息，然后利用HMM或者码数分别对获取的语音和噪声先验信息进行建模，结合线上语音和噪声的先验HMM 或者码数估计的语音谱和噪声谱，并构建维纳滤波器增强声音中的语音部分。由于利用到了语音和噪声的先验信息，这类算法能够很好地追踪线上语音和噪声特征的变化，实现对非平稳噪声很好的降噪效果。语音增强的主要目包含两个方面：1. 降低含噪语音中的噪音；2 尽量保留甚至增强原语音的质量。

提出一种基于时频正则化重叠群收缩的音频降噪优化方法，该方法使用可分解凸优化问题的预定结构知识对语音信号进行降噪，利用语音谱图观察聚类特性，使用混合范数惩罚项迭代地获得稀疏干净的语音信号。并在重叠群收缩算法基础上，在代价函数中引入时频权值。

复制代码

% Reading in our audio files
[clean_signal, clean_speech_rate] = audioread("data/speech_files/sp01.wav");
[noise_signal, noise_signal_rate] = audioread("data/noise_files/keyboard_noise.wav");
noise_signal = noise_signal'; clean_signal = clean_signal';

% Ensuring noise signal length matches clean signl length through reptition and cropping
noise_signal = noise_signal(mod(0:length(clean_signal)-1, numel(noise_signal)) + 1);
assert(length(clean_signal) == length(noise_signal));

% Combining to create noisy signal
SNR = -10; % in dB
noisy_signal = clean_signal + (noise_signal / norm(noise_signal) * norm(clean_signal) / 10.0^(0.05*SNR));
% Uncomment the following line following line for using gaussian noise
%noisy_signal = awgn(clean_signal, SNR, "measured");

% Setting this changes what we take the STFT of
yo = noisy_signal;

% Some parameters for our test
noise_type = "impulsive"; % Noise types are impulsive, clean, stationary, used for weighting
lambda = 30; % Higher values when using both T & F weightings
Nit = 6;
K1 = 2;
K2 = 8;
window = sqrt(hann(256, 'periodic')); 
overlap_length = 128;
fft_length = 512;

%% Preprocess Data
% Take STFT
% ensure to use both a cola compliant window and overlap length and k needs to
% be an int in this eq. k = (length(yo) - overlap_length) / (length(window) - overlap_length)
% otherwise we can pad signal to make k an integer and remove padded 0's
% from our output
if (~iscola(window, overlap_length))
    error("COLA noncompliant parameters, imperfect reconstruction");
end
k = (length(yo) - overlap_length) / (length(window) - overlap_length);
if (k ~= floor(k))
    warning("Padding signal to provide sample reconstruction post istft, results may be off for groups that stretch across to these padded zeros");
    padding = ceil(k) * overlap_length + overlap_length - length(noisy_signal);
    yo = [yo zeros(1, padding)];
else
    padding = 0;
end
tf = stft(yo, noise_signal_rate, 'Window', window, 'OverlapLength', overlap_length, 'FFTLength', fft_length);

%% Creating our frequency weighting 
% For noise with varying energy across the bands
if noise_type == "clean" || noise_type == "impulsive" || noise_type == "stationary"
    [N, Fo, Ao, W] = firpmord([4000, 6000]/(noise_signal_rate/2), [1 0.8], [0.01, 0.01]);
    b = firpm(10, Fo, Ao, W);
    [filter_magnitudes, ~] = freqz(b, 1, size(tf, 1));
    filter_magnitudes = abs(filter_magnitudes);
elseif noise_type == "stationary"
    filter_magnitudes = ones(fft_length, 1);
end
frequency_weighting = repmat(filter_magnitudes, [1, size(tf, 2)]);

%% Denoising Signal
% Running algorithm
[tf_denoised, cost, weights, energy_ratios] = tfs(tf, K1, K2, lambda, Nit, frequency_weighting);
denoised_signal = real(istft(tf_denoised, noise_signal_rate, 'Window', window, 'OverlapLength', overlap_length, 'FFTLength', fft_length)');

% Undoing the padding if any was necessary
if (padding ~= 0)
    yo = yo(1:length(yo)-padding);
    denoised_signal = denoised_signal(1:length(denoised_signal)-padding);
end

%% Plots, SNR Readout & Playing Denoised Signal
time = (1:length(denoised_signal))/noise_signal_rate;
figure(1)
clf;
subplot(3,3,1);
hold on;
plot(time, denoised_signal, 'Color', [1, 0, 0, 0.2]);
plot(time, clean_signal, 'Color', [0, 1, 0, 0.05]);
axis tight;
hold off;
title("Clean vs Denoised Signal");
legend("Denoised Signal", "Clean Signal");

subplot(3,3,2);
plot(time, clean_signal - denoised_signal, 'Color', [1, 0, 0, 1]);
axis tight;
title("Delta of Clean vs Denoised");

subplot(3,3,3);
hold on;
plot(time, denoised_signal, 'Color', [1, 0, 0, 0.2]);
plot(time, noisy_signal, 'Color', [0, 1, 0, 0.05]);
axis tight;
hold off;
title("Noisy vs Denoised Signal");
legend("Denoised Signal", "Noisy Signal");

subplot(3,3,4);
plot(cost)
title("Cost Per Iteration");
legend("Cost");

subplot(3, 3, 5);
mesh(mag2db(weights));
c = colorbar;
c.Label.String = "Power/Frequency db/Hz";
shading interp;
view(0, 90);
xlim([0 size(weights, 2)])
ylim([0 size(weights, 1)])
title ("Time & Frequency Attenutation Weighting");

subplot(3, 3, 6);
spectrogram(clean_signal, window, overlap_length, fft_length, noise_signal_rate, 'yaxis');
title("Spectrogram of Clean Signal");

subplot(3, 3, 7);
spectrogram(noisy_signal, window, overlap_length, fft_length, noise_signal_rate, 'yaxis');
title("Spectrogram of Noisy Signal");

subplot(3, 3, 8);
spectrogram(denoised_signal, window, overlap_length, fft_length, noise_signal_rate, 'yaxis');
title("Spectrogram of Denoised Signal");

subplot(3, 3, 9);
spectrogram(clean_signal - denoised_signal, window, overlap_length, fft_length, noise_signal_rate, 'yaxis');
title("Spectrogram of Delta Between Clean & Denoised Signal");

if noise_type ~= "stationary"
    figure(2)
    freqz(b, 1, size(tf, 1));
end

figure(3)
subplot(2, 1, 1);
plot((1:fft_length).*(noise_signal_rate/2)/fft_length, smooth(filter_magnitudes));
xlabel("Frequency Hz");
ylabel("Weight");
title("Frequency Weights")

subplot(2, 1, 2);
plot(energy_ratios);
axis tight;
xlabel("Time");
ylabel("Weight");
title("Time Weights")

% Get the SNR and play the denoised signal
if (sum(clean_signal(:).^2) == 0) || (sum((clean_signal(:)-yo(:)).^2) == 0)
    preSNR = 0;
else
    preSNR = 10*log10(sum(clean_signal(:).^2) / (sum((clean_signal(:)-yo(:)).^2)));
end

if (sum(clean_signal(:).^2) == 0) || (sum((clean_signal(:)-denoised_signal(:)).^2) == 0)
    postSNR = 0;
else
    postSNR = 10*log10(sum(clean_signal(:).^2) / (sum((clean_signal(:)-denoised_signal(:)).^2)));
end
fprintf("SNR Pre-Denoising: %.2f SNR Post-Denoising: %.2f dB\n", preSNR, postSNR);
sound(denoised_signal, noise_signal_rate);

工学博士，担任《Mechanical System and Signal Processing》审稿专家，担任《中国电机工程学报》优秀审稿专家，《控制与决策》，《系统工程与电子技术》，《电力系统保护与控制》，《宇航学报》等EI期刊审稿专家，担任《计算机科学》，《电子器件》，《现代制造过程》，《电源学报》，《船舶工程》，《轴承》，《工矿自动化》，《重庆理工大学学报》，《噪声与振动控制》，《机械传动》，《机械强度》，《机械科学与技术》，《机床与液压》，《声学技术》，《应用声学》，《石油机械》，《西安工业大学学报》等中文核心审稿专家。

擅长领域：现代信号处理，机器学习，深度学习，数字孪生，时间序列分析，设备缺陷检测、设备异常检测、设备智能故障诊断与健康管理PHM等。