I. System Architecture and Principles
1.1 Overall System Architecture
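Speech input → Preprocessing → Feature extraction → HMM training/recognition → Recognition result
                     ↓                  ↓                        ↓
             Endpoint detection    MFCC features          Viterbi decoding
                                        ↓                        ↓
                                 Feature vectors        Word-level HMM models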
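1.2 How HMMs Are Used in Speech Recognition
HMM model parameters:
- Number of states N: typically 3-5 states per phoneme
- Observation sequence O: the sequence of MFCC feature vectors
- State transition matrix A: $a_{ij} = P(q_{t+1}=j \mid q_t=i)$
- Observation probability B: $b_j(o_t) = P(o_t \mid q_t=j)$ (modeled by one GMM per state in the MATLAB implementation)
- Initial state distribution π: $\pi_i = P(q_1=i)$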
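As a small illustration of these parameters, the sketch below builds a toy three-state left-to-right model and checks the constraints that A and π must satisfy. The numeric values and the variable name `pi0` are assumptions made for the example only.

```matlab
% Toy left-to-right HMM with N = 3 states (values are illustrative only).
N = 3;
A = [0.7 0.3 0.0;
     0.0 0.7 0.3;
     0.0 0.0 1.0];        % a_ij = P(q_{t+1}=j | q_t=i)
pi0 = [1; 0; 0];          % pi_i = P(q_1=i); named pi0 to avoid shadowing the constant pi

% Each row of A and the vector pi0 must be a probability distribution.
row_sums = sum(A, 2);
assert(all(abs(row_sums - 1) < 1e-12));
assert(abs(sum(pi0) - 1) < 1e-12);

% Left-to-right structure: no transitions back to earlier states.
is_left_to_right = isequal(A, triu(A));
fprintf('Left-to-right: %d\n', is_left_to_right);

% Expected number of frames spent in a non-final state i is 1 / (1 - a_ii).
expected_dwell = 1 ./ (1 - diag(A(1:N-1, 1:N-1)));
```

With a 10 ms frame shift, a self-transition probability of 0.7 corresponds to an expected stay of roughly 33 ms per state, which is one way to sanity-check the choice of state count for a word.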
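The three basic problems:
- Evaluation: given a model λ and an observation sequence O, compute $P(O \mid \lambda)$ (forward algorithm)
- Decoding: given λ and O, find the most likely state sequence (Viterbi algorithm)
- Learning: given O, estimate the model parameters λ (Baum-Welch algorithm)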
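To make the evaluation problem concrete, here is a minimal sketch of the scaled forward recursion on a toy discrete-observation HMM, checked against brute-force enumeration of all state paths. The model values are invented for the example; the point is that the log-likelihood is the sum of the logs of the per-frame scaling factors.

```matlab
% Toy discrete-observation HMM (all numbers are illustrative).
A   = [0.7 0.3; 0.0 1.0];      % transition matrix
B   = [0.9 0.1; 0.2 0.8];      % B(j,k) = P(o_t = k | q_t = j)
pi0 = [1; 0];                  % initial distribution
O   = [1 1 2 2];               % observation sequence (symbol indices)
T   = numel(O);
N   = size(A, 1);

% Scaled forward recursion: log P(O|lambda) = sum_t log(c_t).
alpha = pi0 .* B(:, O(1));
log_prob = 0;
for t = 1:T
    if t > 1
        alpha = (A' * alpha) .* B(:, O(t));
    end
    c = sum(alpha);            % scaling factor for frame t
    alpha = alpha / c;         % keep alpha normalized to avoid underflow
    log_prob = log_prob + log(c);
end

% Brute-force check: sum the joint probability over all N^T state paths.
p_brute = 0;
for idx = 0:N^T-1
    q = mod(floor(idx ./ N.^(0:T-1)), N) + 1;   % decode a path from an integer
    p = pi0(q(1)) * B(q(1), O(1));
    for t = 2:T
        p = p * A(q(t-1), q(t)) * B(q(t), O(t));
    end
    p_brute = p_brute + p;
end
fprintf('forward: %.6f, brute force: %.6f\n', exp(log_prob), p_brute);
```

Brute force is only feasible for tiny models, but it is a convenient way to validate a forward implementation before applying it to real feature sequences.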
II. MATLAB Implementation
matlab
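%% HMM-based isolated-word speech recognition system
clear; clc; close all;
%% 1. System parameters
fprintf('========== HMM Isolated-Word Speech Recognition System ==========\n');
% Front-end parameters
fs = 16000; % sampling rate: 16 kHz
frame_length = 0.025; % frame length: 25 ms
frame_shift = 0.01; % frame shift: 10 ms
num_mfcc = 13; % number of MFCC coefficients
num_filters = 26; % number of mel filters
pre_emphasis = 0.97; % pre-emphasis coefficient
% HMM parameters
num_states = 5; % HMM states per word
num_mixtures = 3; % GMM components per state
max_iterations = 20; % maximum number of Baum-Welch iterations
convergence_threshold = 1e-4; % convergence threshold
% Vocabulary (example: digits 0-9)
vocabulary = {'zero', 'one', 'two', 'three', 'four', ...
'five', 'six', 'seven', 'eight', 'nine'};
num_words = length(vocabulary);
%% 2. Data preparation and preprocessing
fprintf('Preparing training and test data...\n');
% Assumed data directory layout: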
% data/
% train/
% zero/
% zero_001.wav
% zero_002.wav
% ...
% one/
% one_001.wav
% ...
% test/
% test_001.wav
% ...
% Create simulated data (read from files in a real application)
train_data = cell(num_words, 1);
test_data = cell(num_words, 1);
% Generate simulated training and test data for each word
for w = 1:num_words
num_train_samples = 10; % 10 training samples per word
num_test_samples = 3; % 3 test samples per word
train_data{w} = cell(num_train_samples, 1);
test_data{w} = cell(num_test_samples, 1);
for s = 1:num_train_samples
% Generate a simulated speech signal (read from a wav file in practice)
duration = 0.5 + rand() * 0.3; % 0.5-0.8 s
t = 0:1/fs:duration;
freq = 100 + w * 50; % each word uses a different fundamental frequency
signal = 0.5 * sin(2*pi*freq*t) + 0.1*randn(size(t));
% Pad with silence on both sides
silence = zeros(1, round(0.1*fs));
signal = [silence, signal, silence];
train_data{w}{s} = signal;
end
for s = 1:num_test_samples
duration = 0.5 + rand() * 0.3;
t = 0:1/fs:duration;
freq = 100 + w * 50;
signal = 0.5 * sin(2*pi*freq*t) + 0.1*randn(size(t));
silence = zeros(1, round(0.1*fs));
signal = [silence, signal, silence];
test_data{w}{s} = signal;
end
end
%% 3. Feature extraction: MFCC
fprintf('Extracting MFCC features...\n');
% Training-set features
train_features = cell(num_words, 1);
for w = 1:num_words
num_samples = length(train_data{w});
train_features{w} = cell(num_samples, 1);
for s = 1:num_samples
signal = train_data{w}{s};
% Endpoint detection (simple energy-based method)
[start_idx, end_idx] = endpoint_detection(signal, fs);
signal = signal(start_idx:end_idx);
% Extract MFCC features
mfcc_features = extract_mfcc(signal, fs, frame_length, ...
frame_shift, num_mfcc, num_filters, ...
pre_emphasis);
train_features{w}{s} = mfcc_features;
end
end
% Test-set features
test_features = cell(num_words, 1);
for w = 1:num_words
num_samples = length(test_data{w});
test_features{w} = cell(num_samples, 1);
for s = 1:num_samples
signal = test_data{w}{s};
[start_idx, end_idx] = endpoint_detection(signal, fs);
signal = signal(start_idx:end_idx);
mfcc_features = extract_mfcc(signal, fs, frame_length, ...
frame_shift, num_mfcc, num_filters, ...
pre_emphasis);
test_features{w}{s} = mfcc_features;
end
end
%% 4. HMM training
fprintf('Training HMM models...\n');
% Initialize the HMM models (one per vocabulary word)
hmm_models = cell(num_words, 1);
for w = 1:num_words
fprintf('Training the HMM for word "%s"...\n', vocabulary{w});
% Collect the features of all training samples for this word
word_features = train_features{w};
num_samples = length(word_features);
% Concatenate the features of all samples to initialize the GMMs
all_features = [];
for s = 1:num_samples
all_features = [all_features, word_features{s}];
end
% Initialize the HMM parameters
[init_A, init_B, init_pi] = initialize_hmm(num_states, num_mixtures, all_features);
% Assemble the initial model that Baum-Welch will refine
hmm_model = struct();
hmm_model.A = init_A;
hmm_model.B = init_B; % GMM parameters
hmm_model.pi = init_pi;
hmm_model.num_states = num_states;
hmm_model.num_mixtures = num_mixtures;
% Baum-Welch training
hmm_model = train_hmm_baum_welch(hmm_model, word_features, ...
max_iterations, convergence_threshold);
hmm_models{w} = hmm_model;
end
%% 5. Recognition test
fprintf('Running the recognition test...\n');
confusion_matrix = zeros(num_words, num_words);
total_tests = 0;
for true_word = 1:num_words
num_test_samples = length(test_features{true_word});
for s = 1:num_test_samples
test_sequence = test_features{true_word}{s};
% Log-likelihood of the sequence under each word's HMM
log_likelihoods = zeros(num_words, 1);
for w = 1:num_words
% Score the observation sequence with the forward algorithm
log_prob = forward_algorithm(test_sequence, hmm_models{w});
log_likelihoods(w) = log_prob;
end
% Pick the word with the highest log-likelihood
[~, recognized_word] = max(log_likelihoods);
% Update the confusion matrix
confusion_matrix(true_word, recognized_word) = ...
confusion_matrix(true_word, recognized_word) + 1;
total_tests = total_tests + 1;
fprintf('Test sample: true word="%s", recognized="%s"\n', ...
vocabulary{true_word}, vocabulary{recognized_word});
end
end
%% 6. Performance evaluation
fprintf('\n========== Recognition Results ==========\n');
% Overall accuracy
accuracy = trace(confusion_matrix) / total_tests * 100;
fprintf('Overall recognition accuracy: %.2f%%\n', accuracy);
% Plot the confusion matrix
figure('Position', [100, 100, 800, 600]);
imagesc(confusion_matrix);
colormap(jet);
colorbar;
xlabel('Recognized word');
ylabel('True label');
title(sprintf('Confusion matrix (accuracy: %.2f%%)', accuracy));
% Axis tick labels
set(gca, 'XTick', 1:num_words, 'XTickLabel', vocabulary);
set(gca, 'YTick', 1:num_words, 'YTickLabel', vocabulary);
xtickangle(gca, 45); % built in since R2016b; rotateXLabels is not a built-in function
% Per-word recognition rate
for w = 1:num_words
word_accuracy = confusion_matrix(w,w) / sum(confusion_matrix(w,:)) * 100;
fprintf('Recognition rate for word "%s": %.2f%%\n', vocabulary{w}, word_accuracy);
end
%% 7. Live recognition demo (optional)
fprintf('\n========== Live Recognition Demo ==========\n');
fprintf('Press any key to start recording, then say one word; it is recognized when the recording ends...\n');
pause;
% Recording parameters
recording_duration = 2; % record for 2 s
% Create the recorder object
recorder = audiorecorder(fs, 16, 1); % 16 kHz, 16-bit, mono
% Start recording
fprintf('Recording...\n');
recordblocking(recorder, recording_duration);
fprintf('Recording finished\n');
% Get the recorded samples
audio_data = getaudiodata(recorder);
% Feature extraction
[start_idx, end_idx] = endpoint_detection(audio_data, fs);
if end_idx > start_idx
audio_data = audio_data(start_idx:end_idx);
end
mfcc_features = extract_mfcc(audio_data, fs, frame_length, frame_shift, ...
num_mfcc, num_filters, pre_emphasis);
% Recognition
log_likelihoods = zeros(num_words, 1);
for w = 1:num_words
log_prob = forward_algorithm(mfcc_features, hmm_models{w});
log_likelihoods(w) = log_prob;
end
% Show the recognition result
[~, recognized_idx] = max(log_likelihoods);
fprintf('Recognized: "%s"\n', vocabulary{recognized_idx});
% Plot the per-word scores
figure('Position', [100, 100, 800, 400]);
bar(log_likelihoods);
xlabel('Word');
ylabel('Log-likelihood');
title('Log-likelihood under each word HMM');
set(gca, 'XTick', 1:num_words, 'XTickLabel', vocabulary);
grid on;
%% Helper functions
% ==================== Endpoint detection ====================
function [start_idx, end_idx] = endpoint_detection(signal, fs)
% Energy-based endpoint detection
% signal: input speech signal
% fs: sampling rate
frame_len = round(0.025 * fs); % 25 ms frame length
frame_shift = round(0.01 * fs); % 10 ms frame shift
% Framing
num_frames = floor((length(signal) - frame_len) / frame_shift) + 1;
frames = zeros(frame_len, num_frames);
for i = 1:num_frames
start_sample = (i-1)*frame_shift + 1;
end_sample = start_sample + frame_len - 1;
if end_sample > length(signal)
frames(:,i) = [signal(start_sample:end); zeros(end_sample-length(signal),1)];
else
frames(:,i) = signal(start_sample:end_sample);
end
end
% Per-frame energy
energy = sum(frames.^2, 1);
% Adaptive threshold: 10% of the peak frame energy
max_energy = max(energy);
threshold = 0.1 * max_energy;
% Frames whose energy exceeds the threshold
above_threshold = energy > threshold;
if any(above_threshold)
start_frame = find(above_threshold, 1, 'first');
end_frame = find(above_threshold, 1, 'last');
% Convert frame indices to sample indices
start_idx = max(1, (start_frame-1)*frame_shift + 1);
end_idx = min(length(signal), (end_frame-1)*frame_shift + frame_len);
else
% No frame exceeds the threshold: keep the whole signal
start_idx = 1;
end_idx = length(signal);
end
end
% ==================== MFCC feature extraction ====================
function mfcc_features = extract_mfcc(signal, fs, frame_length, frame_shift, ...
num_mfcc, num_filters, pre_emphasis)
% Extract MFCC features
% signal: input speech signal
% fs: sampling rate
% frame_length: frame length (s)
% frame_shift: frame shift (s)
% num_mfcc: number of MFCC coefficients
% num_filters: number of mel filters
% pre_emphasis: pre-emphasis coefficient
% 1. Pre-emphasis
signal = filter([1, -pre_emphasis], 1, signal);
% 2. Framing
frame_len_samples = round(frame_length * fs);
frame_shift_samples = round(frame_shift * fs);
num_frames = floor((length(signal) - frame_len_samples) / frame_shift_samples) + 1;
frames = zeros(frame_len_samples, num_frames);
for i = 1:num_frames
start_sample = (i-1)*frame_shift_samples + 1;
end_sample = start_sample + frame_len_samples - 1;
if end_sample > length(signal)
frames(:,i) = [signal(start_sample:end); zeros(end_sample-length(signal),1)];
else
frames(:,i) = signal(start_sample:end_sample);
end
end
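% 3. Windowing (Hamming window)
window = hamming(frame_len_samples);
frames = frames .* window;
% 4. Power spectrum
NFFT = 2^nextpow2(frame_len_samples);
mag_frames = abs(fft(frames, NFFT, 1)).^2;
mag_frames = mag_frames(1:NFFT/2+1, :);
% 5. Mel filter bank
mel_low = 0;
mel_high = 2595 * log10(1 + fs/2 / 700); % Nyquist frequency converted from Hz to mel
mel_points = linspace(mel_low, mel_high, num_filters+2);
hz_points = 700 * (10.^(mel_points/2595) - 1); % mel converted back to Hz
% Zero-based FFT bin of each edge frequency. Nyquist maps to bin NFFT/2,
% so the index k+1 used below never exceeds the NFFT/2+1 columns of filter_bank.
bin_points = floor((NFFT/2) * hz_points / (fs/2));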
filter_bank = zeros(num_filters, NFFT/2+1);
for m = 2:num_filters+1
left = bin_points(m-1);
center = bin_points(m);
right = bin_points(m+1);
for k = left:center
filter_bank(m-1, k+1) = (k - left) / (center - left);
end
for k = center:right
filter_bank(m-1, k+1) = (right - k) / (right - center);
end
end
% 6. Apply the mel filter bank
filter_banks = filter_bank * mag_frames;
filter_banks = max(filter_banks, 1e-10); % avoid log(0)
filter_banks = log10(filter_banks);
% 7. Discrete cosine transform (DCT), applied down each column (one frame per column)
mfcc = dct(filter_banks);
mfcc = mfcc(1:num_mfcc, :);
% 8. Dynamic features (first- and second-order differences; edge frames stay zero)
num_coeff = size(mfcc, 1);
delta = zeros(num_coeff, size(mfcc, 2));
delta_delta = zeros(num_coeff, size(mfcc, 2));
for t = 2:size(mfcc, 2)-1
delta(:, t) = (mfcc(:, t+1) - mfcc(:, t-1)) / 2;
end
for t = 3:size(mfcc, 2)-2
delta_delta(:, t) = (delta(:, t+1) - delta(:, t-1)) / 2;
end
% 9. Stack static, delta, and delta-delta features
mfcc_features = [mfcc; delta; delta_delta];
% 10. Cepstral mean normalization (CMN)
mfcc_features = mfcc_features - mean(mfcc_features, 2);
end
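% ==================== HMM initialization ====================
function [A, B, pi] = initialize_hmm(num_states, num_mixtures, features)
% Initialize the HMM parameters
% num_states: number of states
% num_mixtures: GMM components per state
% features: training features (used to initialize the GMMs)
% 1. Initialize the state transition matrix A (left-to-right HMM)
A = zeros(num_states, num_states);
for i = 1:num_states-1
A(i, i) = 0.7; % stay in the current state
A(i, i+1) = 0.3; % move to the next state
end
A(num_states, num_states) = 1.0; % self-loop on the final state
% 2. Initialize the initial state distribution π
pi = zeros(num_states, 1);
pi(1) = 1.0; % always start in the first state
% 3. Initialize the observation model B (GMM parameters)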
[dim, num_frames] = size(features);
B = struct();
B.num_mixtures = num_mixtures;
B.dim = dim;
% Initialize the GMM parameters with k-means
for s = 1:num_states
% Assign a contiguous slice of the data to each state
start_idx = round((s-1)/num_states * num_frames) + 1;
end_idx = round(s/num_states * num_frames);
state_features = features(:, start_idx:min(end_idx, num_frames));
if size(state_features, 2) < num_mixtures
% Too little data for this state: fall back to all of the data
state_features = features;
end
% K-means clustering
[idx, centers] = kmeans(state_features', num_mixtures, ...
'MaxIter', 100, 'Replicates', 3);
% Initialize the GMM parameters
B.weight{s} = ones(1, num_mixtures) / num_mixtures; % mixture weights
B.mu{s} = centers'; % means
% Initialize the covariance matrices (diagonal)
B.sigma{s} = zeros(dim, dim, num_mixtures);
for m = 1:num_mixtures
cluster_data = state_features(:, idx == m);
if size(cluster_data, 2) > 1
var_vec = var(cluster_data, 0, 2);
else
var_vec = ones(dim, 1);
end
B.sigma{s}(:,:,m) = diag(var_vec + 1e-6); % small floor to prevent singularity
end
end
end
% ==================== GMM probability ====================
function prob = gmm_probability(feature_vector, weight, mu, sigma)
% Likelihood of a feature vector under a GMM
% feature_vector: feature vector (dim×1)
% weight: mixture weights (1×M)
% mu: mean matrix (dim×M)
% sigma: covariance matrices (dim×dim×M)
num_mixtures = length(weight);
dim = size(feature_vector, 1);
prob = 0;
for m = 1:num_mixtures
% Density of a single Gaussian component
diff = feature_vector - mu(:, m);
sigma_m = sigma(:,:,m);
% Evaluate the log-density first, then exponentiate
log_prob = -0.5 * (dim * log(2*pi) + log(det(sigma_m)) + ...
diff' * (sigma_m \ diff));
prob = prob + weight(m) * exp(log_prob);
end
% Floor the result to avoid a zero probability
prob = max(prob, 1e-100);
end
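% ==================== Forward algorithm ====================
function log_prob = forward_algorithm(observations, hmm)
% Forward algorithm: log-likelihood of an observation sequence
% observations: observation sequence (dim×T)
% hmm: HMM model
T = size(observations, 2); % sequence length
N = hmm.num_states; % number of states
% Forward probabilities and per-frame scaling factors
alpha = zeros(N, T);
scale = zeros(1, T);
% Initialize the forward probabilities at t = 1
for i = 1:N
b_i = gmm_probability(observations(:,1), ...
hmm.B.weight{i}, ...
hmm.B.mu{i}, ...
hmm.B.sigma{i});
alpha(i, 1) = hmm.pi(i) * b_i;
end
scale(1) = sum(alpha(:, 1));
alpha(:, 1) = alpha(:, 1) / max(scale(1), realmin);
% Recursion
for t = 1:T-1
for j = 1:N
sum_alpha = 0;
for i = 1:N
sum_alpha = sum_alpha + alpha(i, t) * hmm.A(i, j);
end
b_j = gmm_probability(observations(:,t+1), ...
hmm.B.weight{j}, ...
hmm.B.mu{j}, ...
hmm.B.sigma{j});
alpha(j, t+1) = sum_alpha * b_j;
end
% Scale to prevent underflow and keep the scaling factor
scale(t+1) = sum(alpha(:, t+1));
alpha(:, t+1) = alpha(:, t+1) / max(scale(t+1), realmin);
end
% alpha is normalized at every frame, so sum(alpha(:, T)) is always 1 and
% carries no information; the log-likelihood is the sum of the log scaling factors.
log_prob = sum(log(max(scale, realmin)));
end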
% ==================== Baum-Welch training ====================
function hmm = train_hmm_baum_welch(hmm, training_sequences, max_iter, threshold)
% Train an HMM with the Baum-Welch algorithm
% hmm: initial HMM model
% training_sequences: cell array of training sequences (one dim×T matrix per cell)
% max_iter: maximum number of iterations
% threshold: convergence threshold
num_sequences = length(training_sequences);
N = hmm.num_states;
M = hmm.num_mixtures;
prev_log_likelihood = -inf;
for iter = 1:max_iter
fprintf(' Iteration %d/%d...\n', iter, max_iter);
% Reset the accumulators
A_num = zeros(N, N);
A_den = zeros(N, 1);
% GMM accumulators
weight_num = cell(N, 1);
mu_num = cell(N, 1);
sigma_num = cell(N, 1);
den = zeros(N, 1);
for s = 1:N
weight_num{s} = zeros(1, M);
mu_num{s} = zeros(hmm.B.dim, M);
sigma_num{s} = zeros(hmm.B.dim, hmm.B.dim, M);
end
total_log_likelihood = 0;
% For each training sequence
for seq_idx = 1:num_sequences
observations = training_sequences{seq_idx};
T = size(observations, 2);
% Forward-backward pass
[alpha, beta, scale, log_likelihood] = forward_backward(observations, hmm);
total_log_likelihood = total_log_likelihood + log_likelihood;
% Compute ξ and γ
xi = zeros(N, N, T-1);
gamma = zeros(N, T);
for t = 1:T-1
for i = 1:N
for j = 1:N
xi(i, j, t) = alpha(i, t) * hmm.A(i, j) * ...
gmm_probability(observations(:,t+1), ...
hmm.B.weight{j}, ...
hmm.B.mu{j}, ...
hmm.B.sigma{j}) * ...
beta(j, t+1);
end
end
xi(:,:,t) = xi(:,:,t) / sum(sum(xi(:,:,t)));
end
for t = 1:T
gamma(:, t) = alpha(:, t) .* beta(:, t);
gamma(:, t) = gamma(:, t) / sum(gamma(:, t));
end
% Accumulate the statistics for A
for i = 1:N
for j = 1:N
A_num(i, j) = A_num(i, j) + sum(squeeze(xi(i, j, :)));
end
A_den(i) = A_den(i) + sum(gamma(i, 1:T-1));
end
% Accumulate the statistics for the GMMs
for t = 1:T
obs = observations(:, t);
for i = 1:N
gamma_i_t = gamma(i, t);
% Posterior probability of each mixture component
prob_mix = zeros(1, M);
for m = 1:M
weight_im = hmm.B.weight{i}(m);
mu_im = hmm.B.mu{i}(:, m);
sigma_im = hmm.B.sigma{i}(:,:,m);
diff = obs - mu_im;
log_prob = -0.5 * (hmm.B.dim * log(2*pi) + ...
log(det(sigma_im)) + ...
diff' * (sigma_im \ diff));
prob_mix(m) = weight_im * exp(log_prob);
end
prob_mix = prob_mix / sum(prob_mix);
% Update the accumulators
den(i) = den(i) + gamma_i_t;
for m = 1:M
gamma_im_t = gamma_i_t * prob_mix(m);
weight_num{i}(m) = weight_num{i}(m) + gamma_im_t;
mu_num{i}(:, m) = mu_num{i}(:, m) + gamma_im_t * obs;
sigma_num{i}(:,:,m) = sigma_num{i}(:,:,m) + ...
gamma_im_t * (obs * obs');
end
end
end
end
% Update the model parameters
% Update A
for i = 1:N
if A_den(i) > 0
hmm.A(i, :) = A_num(i, :) / A_den(i);
end
end
% Update the GMM parameters
for i = 1:N
if den(i) > 0
% Update the weights
hmm.B.weight{i} = weight_num{i} / den(i);
% Update the means
for m = 1:M
if weight_num{i}(m) > 0
hmm.B.mu{i}(:, m) = mu_num{i}(:, m) / weight_num{i}(m);
% Update the covariance
sigma_tmp = sigma_num{i}(:,:,m) / weight_num{i}(m) - ...
hmm.B.mu{i}(:, m) * hmm.B.mu{i}(:, m)';
% Keep the covariance matrix positive definite
sigma_tmp = (sigma_tmp + sigma_tmp') / 2; % guard against round-off asymmetry before eig
[V, D] = eig(sigma_tmp);
D = diag(max(diag(D), 1e-6)); % floor the eigenvalues
hmm.B.sigma{i}(:,:,m) = V * D * V';
end
end
end
end
% Check for convergence
if abs(total_log_likelihood - prev_log_likelihood) < threshold
fprintf(' Converged at iteration %d\n', iter);
break;
end
prev_log_likelihood = total_log_likelihood;
end
end
% ==================== Forward-backward algorithm ====================
function [alpha, beta, scale, log_likelihood] = forward_backward(observations, hmm)
% Forward-backward algorithm
% Returns: alpha, beta, scaling factors, log-likelihood
T = size(observations, 2);
N = hmm.num_states;
% Forward pass (with scaling)
alpha = zeros(N, T);
scale = zeros(1, T);
% Initialization
for i = 1:N
b_i = gmm_probability(observations(:,1), ...
hmm.B.weight{i}, ...
hmm.B.mu{i}, ...
hmm.B.sigma{i});
alpha(i, 1) = hmm.pi(i) * b_i;
end
scale(1) = sum(alpha(:, 1));
alpha(:, 1) = alpha(:, 1) / scale(1);
% Recursion
for t = 1:T-1
for j = 1:N
sum_alpha = 0;
for i = 1:N
sum_alpha = sum_alpha + alpha(i, t) * hmm.A(i, j);
end
b_j = gmm_probability(observations(:,t+1), ...
hmm.B.weight{j}, ...
hmm.B.mu{j}, ...
hmm.B.sigma{j});
alpha(j, t+1) = sum_alpha * b_j;
end
scale(t+1) = sum(alpha(:, t+1));
alpha(:, t+1) = alpha(:, t+1) / scale(t+1);
end
% Log-likelihood: sum of the log scaling factors
log_likelihood = sum(log(scale(scale>0)));
% Backward pass (with scaling)
beta = zeros(N, T);
beta(:, T) = 1 / scale(T);
for t = T-1:-1:1
for i = 1:N
beta(i, t) = 0;
for j = 1:N
b_j = gmm_probability(observations(:,t+1), ...
hmm.B.weight{j}, ...
hmm.B.mu{j}, ...
hmm.B.sigma{j});
beta(i, t) = beta(i, t) + hmm.A(i, j) * b_j * beta(j, t+1);
end
beta(i, t) = beta(i, t) / scale(t);
end
end
end
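% ==================== Viterbi decoding ====================
function [best_path, best_prob] = viterbi_decode(observations, hmm)
% Viterbi algorithm: find the most likely state sequence
% observations: observation sequence (dim×T)
% hmm: HMM model
% Returns: the best state sequence and its log-probability
T = size(observations, 2);
N = hmm.num_states;
% Initialization (delta holds log-probabilities)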
delta = zeros(N, T);
psi = zeros(N, T);
for i = 1:N
b_i = gmm_probability(observations(:,1), ...
hmm.B.weight{i}, ...
hmm.B.mu{i}, ...
hmm.B.sigma{i});
delta(i, 1) = log(hmm.pi(i)) + log(b_i);
psi(i, 1) = 0;
end
% Recursion
for t = 2:T
for j = 1:N
max_val = -inf;
max_idx = 1;
for i = 1:N
val = delta(i, t-1) + log(hmm.A(i, j));
if val > max_val
max_val = val;
max_idx = i;
end
end
b_j = gmm_probability(observations(:,t), ...
hmm.B.weight{j}, ...
hmm.B.mu{j}, ...
hmm.B.sigma{j});
delta(j, t) = max_val + log(b_j);
psi(j, t) = max_idx;
end
end
% Termination
[best_prob, best_last_state] = max(delta(:, T));
% Backtracking
best_path = zeros(1, T);
best_path(T) = best_last_state;
for t = T-1:-1:1
best_path(t) = psi(best_path(t+1), t+1);
end
end
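The `gmm_probability` helper in the listing exponentiates each component's log-density before summing, so for high-dimensional features a poorly matching frame can underflow to zero (the function then floors the result at 1e-100). One possible alternative, sketched below under the assumption of diagonal covariances and made-up parameter values, keeps the whole mixture in the log domain with the log-sum-exp trick.

```matlab
% Illustrative diagonal-covariance GMM (all values are made up for the demo).
dim    = 39;
weight = [0.5 0.3 0.2];                               % mixture weights (1×M)
mu     = [zeros(dim,1), ones(dim,1), -ones(dim,1)];   % means (dim×M)
vars   = ones(dim, 3);                                % per-dimension variances (dim×M)
x      = 40 * ones(dim, 1);                           % an observation far from every mean

M = numel(weight);
log_comp = zeros(1, M);
for m = 1:M
    d = x - mu(:, m);
    % log of weight(m) * N(x; mu_m, diag(vars_m))
    log_comp(m) = log(weight(m)) ...
        - 0.5 * (dim * log(2*pi) + sum(log(vars(:, m))) + sum(d.^2 ./ vars(:, m)));
end

% log-sum-exp: factor out the largest term before exponentiating.
mx = max(log_comp);
log_b = mx + log(sum(exp(log_comp - mx)));

% The linear-domain sum underflows here, while log_b stays finite.
naive = sum(exp(log_comp));
fprintf('naive = %g, log-domain = %.2f\n', naive, log_b);
```

Using log-domain emission scores would also require the forward, backward, and Viterbi routines to combine terms with additions and log-sum-exp instead of products and sums; the snippet only covers the emission part.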
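III. System Optimization and Improvements
3.1 Improved MFCC Feature Extraction
matlab
%% Improved MFCC feature extraction (adds energy and difference features)
function features = extract_enhanced_mfcc(signal, fs, params)
% Unpack the parameters
frame_length = params.frame_length; % s
frame_shift = params.frame_shift; % s
num_mfcc = params.num_mfcc; % number of MFCC coefficients
num_filters = params.num_filters; % number of mel filters
pre_emphasis = params.pre_emphasis; % pre-emphasis coefficient
use_delta = params.use_delta; % whether to append delta features
use_delta_delta = params.use_delta_delta; % whether to append delta-delta features
% Basic MFCC extraction
mfcc = extract_mfcc(signal, fs, frame_length, frame_shift, ...
num_mfcc, num_filters, pre_emphasis);
% extract_mfcc already stacks static, delta, and delta-delta rows;
% keep only the static coefficients so the differences are not added twice
mfcc = mfcc(1:num_mfcc, :);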
% Log energy of each frame
frame_len_samples = round(frame_length * fs);
frame_shift_samples = round(frame_shift * fs);
num_frames = floor((length(signal) - frame_len_samples) / frame_shift_samples) + 1;
log_energy = zeros(1, num_frames);
for i = 1:num_frames
start_sample = (i-1)*frame_shift_samples + 1;
end_sample = min(start_sample + frame_len_samples - 1, length(signal));
frame = signal(start_sample:end_sample);
log_energy(i) = log(sum(frame.^2) + eps);
end
% Combine the features
features = [mfcc; log_energy];
% Append delta features
if use_delta
delta_features = zeros(size(features));
for t = 2:size(features, 2)-1
delta_features(:, t) = (features(:, t+1) - features(:, t-1)) / 2;
end
features = [features; delta_features];
end
% Append delta-delta features
if use_delta_delta && use_delta
delta_delta_features = zeros(size(features, 1)/2, size(features, 2));
for t = 3:size(features, 2)-2
delta_delta_features(:, t) = (delta_features(:, t+1) - delta_features(:, t-1)) / 2;
end
features = [features; delta_delta_features];
end
end
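The difference features above use a two-point central difference and leave the first and last frames at zero. A commonly used alternative is the regression formula $d_t = \sum_{k=1}^{K} k\,(c_{t+k} - c_{t-k}) \,/\, (2\sum_{k=1}^{K} k^2)$ with the edge frames replicated. The sketch below assumes a half-width K = 2 and a toy feature matrix; neither value comes from the original code.

```matlab
% Regression-based delta features with edge padding (sketch).
% c: static features (num_coeff × num_frames); K: half-width of the window.
c = [1 2 4 7 11 16;      % toy feature trajectories
     0 1 0 1 0 1];
K = 2;

T = size(c, 2);
% Replicate the first and last frames K times so every frame has neighbors.
padded = [repmat(c(:, 1), 1, K), c, repmat(c(:, end), 1, K)];

delta = zeros(size(c));
denom = 2 * sum((1:K).^2);
for k = 1:K
    % Frame t of c sits at column t+K of padded.
    delta = delta + k * (padded(:, (1:T) + K + k) - padded(:, (1:T) + K - k));
end
delta = delta / denom;
disp(delta);
```

Applying the same computation to `delta` yields the delta-delta features, and every frame (including the edges) receives a nonzero estimate where the trajectory is changing.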
3.2 HTK-Style HMM Training
matlab
%% HTK-style HMM training (a more stable implementation)
function hmm = train_hmm_htk_style(training_features, num_states, num_mixtures)
% HTK-style HMM training
% training_features: cell array of training feature matrices
% num_states: number of states
% num_mixtures: number of mixture components
% 1. Initialization
[dim, ~] = size(training_features{1});
% Uniform segmentation
num_seqs = length(training_features);
total_frames = 0;
for s = 1:num_seqs
total_frames = total_frames + size(training_features{s}, 2);
end
% Give each state roughly the same number of frames
frames_per_state = ceil(total_frames / num_states);
% 2. Segmental k-means initialization
state_features = cell(num_states, 1);
current_state = 1;
current_count = 0;
for s = 1:num_seqs
seq = training_features{s};
seq_len = size(seq, 2);
for f = 1:seq_len
if current_count >= frames_per_state && current_state < num_states
current_state = current_state + 1;
current_count = 0;
end
if isempty(state_features{current_state})
state_features{current_state} = seq(:, f);
else
state_features{current_state} = [state_features{current_state}, seq(:, f)];
end
current_count = current_count + 1;
end
end
% 3. Initialize the model parameters
hmm = struct();
hmm.num_states = num_states;
hmm.num_mixtures = num_mixtures;
% Transition matrix (left-to-right, no skips)
hmm.A = zeros(num_states, num_states);
for i = 1:num_states-1
hmm.A(i, i) = 0.7;
hmm.A(i, i+1) = 0.3;
end
hmm.A(num_states, num_states) = 1.0;
% Initial state distribution
hmm.pi = zeros(num_states, 1);
hmm.pi(1) = 1.0;
% Initialize the GMMs
hmm.B = struct();
hmm.B.dim = dim;
for s = 1:num_states
if size(state_features{s}, 2) >= num_mixtures
% Initialize with k-means
[idx, centers] = kmeans(state_features{s}', num_mixtures, ...
'MaxIter', 100, 'Replicates', 3);
hmm.B.weight{s} = ones(1, num_mixtures) / num_mixtures;
hmm.B.mu{s} = centers';
hmm.B.sigma{s} = zeros(dim, dim, num_mixtures);
for m = 1:num_mixtures
cluster_data = state_features{s}(:, idx == m);
if size(cluster_data, 2) > 1
var_vec = var(cluster_data, 0, 2);
else
var_vec = ones(dim, 1);
end
hmm.B.sigma{s}(:,:,m) = diag(var_vec + 1e-6);
end
else
% Not enough data for this state: fall back to statistics over all frames
all_features = [];
for ss = 1:num_states
all_features = [all_features, state_features{ss}];
end
[idx, centers] = kmeans(all_features', num_mixtures, ...
'MaxIter', 100, 'Replicates', 3);
hmm.B.weight{s} = ones(1, num_mixtures) / num_mixtures;
hmm.B.mu{s} = centers';
for m = 1:num_mixtures
hmm.B.sigma{s}(:,:,m) = eye(dim) * 0.1;
end
end
end
% 4. Baum-Welch re-estimation (multiple passes)
max_iterations = 10;
for iter = 1:max_iterations
fprintf('HTK-style training - iteration %d/%d\n', iter, max_iterations);
hmm = baum_welch_reestimation(hmm, training_features);
end
end
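%% Baum-Welch re-estimation
function new_hmm = baum_welch_reestimation(hmm, training_features)
% Baum-Welch re-estimation (thin wrapper)
% The sufficient-statistics computation is not repeated here. Instead, this
% wrapper runs one iteration of the full train_hmm_baum_welch implementation
% given earlier; a threshold of 0 disables its early-stopping test.
new_hmm = train_hmm_baum_welch(hmm, training_features, 1, 0);
end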
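One point worth noting about the initialization above: it walks through the training frames of all utterances in sequence, so with several utterances the first state mostly receives whole early utterances rather than the beginning of each one. For a left-to-right word model, a common flat-start alternative is to split every utterance into `num_states` equal parts. The sketch below illustrates that idea with random matrices standing in for MFCC features; the data and sizes are assumptions for the demo.

```matlab
% Per-utterance uniform segmentation for flat-start initialization (sketch).
num_states = 3;
rng(0);                                   % reproducible toy data
training_features = {randn(4, 10), randn(4, 7), randn(4, 13)};  % each is dim × T_s

dim = size(training_features{1}, 1);
state_features = repmat({zeros(dim, 0)}, num_states, 1);
for s = 1:numel(training_features)
    seq = training_features{s};
    T = size(seq, 2);
    % Frame t is assigned to state ceil(t * num_states / T), which lies in 1..num_states.
    state_of_frame = ceil((1:T) * num_states / T);
    for j = 1:num_states
        state_features{j} = [state_features{j}, seq(:, state_of_frame == j)];
    end
end

counts = cellfun(@(x) size(x, 2), state_features);
fprintf('frames per state: %s\n', mat2str(counts'));
```

Each `state_features{j}` can then be clustered with k-means exactly as in the listing, so that state j is seeded from the j-th portion of every utterance.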
3.3 Performance Evaluation and Visualization
matlab
%% Performance evaluation
function [metrics, confusion_matrix] = evaluate_system(hmm_models, test_features, vocabulary)
% Evaluate system performance
% hmm_models: trained HMM models
% test_features: test features
% vocabulary: vocabulary
num_words = length(vocabulary);
confusion_matrix = zeros(num_words, num_words);
for true_word = 1:num_words
num_samples = length(test_features{true_word});
for s = 1:num_samples
test_seq = test_features{true_word}{s};
% Log-likelihood under each model
log_likelihoods = zeros(num_words, 1);
for w = 1:num_words
log_likelihoods(w) = forward_algorithm(test_seq, hmm_models{w});
end
% Pick the model with the highest log-likelihood
[~, predicted_word] = max(log_likelihoods);
confusion_matrix(true_word, predicted_word) = ...
confusion_matrix(true_word, predicted_word) + 1;
end
end
% Compute the metrics
accuracy = trace(confusion_matrix) / sum(confusion_matrix(:));
precision = zeros(num_words, 1);
recall = zeros(num_words, 1);
f1_score = zeros(num_words, 1);
for w = 1:num_words
TP = confusion_matrix(w, w);
FP = sum(confusion_matrix(:, w)) - TP;
FN = sum(confusion_matrix(w, :)) - TP;
precision(w) = TP / (TP + FP + eps);
recall(w) = TP / (TP + FN + eps);
f1_score(w) = 2 * precision(w) * recall(w) / (precision(w) + recall(w) + eps);
end
metrics = struct(...
'accuracy', accuracy, ...
'precision', precision, ...
'recall', recall, ...
'f1_score', f1_score, ...
'confusion_matrix', confusion_matrix);
end
%% Visualization
function visualize_results(metrics, vocabulary)
% Visualize the evaluation results
% Confusion matrix
figure('Position', [100, 100, 800, 600]);
imagesc(metrics.confusion_matrix);
colormap(jet);
colorbar;
xlabel('Predicted Label');
ylabel('True Label');
title(sprintf('Confusion Matrix (Accuracy: %.2f%%)', metrics.accuracy*100));
% Tick labels
set(gca, 'XTick', 1:length(vocabulary), 'XTickLabel', vocabulary);
set(gca, 'YTick', 1:length(vocabulary), 'YTickLabel', vocabulary);
xtickangle(gca, 45); % built in since R2016b; rotateXLabels is not a built-in function
% Bar charts of the per-word metrics
figure('Position', [100, 100, 1200, 400]);
subplot(1,3,1);
bar(metrics.precision);
set(gca, 'XTick', 1:length(vocabulary), 'XTickLabel', vocabulary);
ylabel('Precision');
title('Precision per Word');
xtickangle(gca, 45);
ylim([0, 1]);
subplot(1,3,2);
bar(metrics.recall);
set(gca, 'XTick', 1:length(vocabulary), 'XTickLabel', vocabulary);
ylabel('Recall');
title('Recall per Word');
xtickangle(gca, 45);
ylim([0, 1]);
subplot(1,3,3);
bar(metrics.f1_score);
set(gca, 'XTick', 1:length(vocabulary), 'XTickLabel', vocabulary);
ylabel('F1-Score');
title('F1-Score per Word');
xtickangle(gca, 45);
ylim([0, 1]);
end
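To see how the per-word metrics follow from a confusion matrix, the short sketch below works through a toy three-class case. The matrix is invented for illustration; rows are true labels and columns are predictions, matching the convention used in `evaluate_system`.

```matlab
% Toy 3-class confusion matrix: rows are true labels, columns are predictions.
C = [8 1 1;
     2 7 1;
     0 2 8];

TP = diag(C);
FP = sum(C, 1)' - TP;      % predicted as class w but actually another class
FN = sum(C, 2)  - TP;      % class w samples predicted as something else

precision = TP ./ (TP + FP);
recall    = TP ./ (TP + FN);
f1        = 2 * precision .* recall ./ (precision + recall);
accuracy  = sum(TP) / sum(C(:));
fprintf('accuracy = %.4f\n', accuracy);
```

In this toy matrix every row and column sums to 10, so precision and recall happen to coincide per class; with unbalanced predictions they generally differ.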
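Reference code: HMM-based isolated-word speech recognition system, www.youwenfan.com/contentcsv/79135.html
IV. Practical Recommendations
4.1 Data Preparation
- Data collection: record at least 20-50 samples per word, covering different speakers and environments
- Data augmentation: add noise and vary the speaking rate and pitch to make the models more robust
- Labeling: verify that every sample is labeled correctly to avoid label errors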
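As one example of the augmentation mentioned above, the sketch below adds white Gaussian noise to a signal at a chosen signal-to-noise ratio. The sine wave standing in for speech and the 10 dB target are assumptions for the demo; real recordings and a range of SNRs would be used in practice.

```matlab
% Additive white-noise augmentation at a target SNR (sketch).
fs = 16000;
t = (0:fs/2-1) / fs;                        % 0.5 s of toy "speech"
clean = 0.5 * sin(2*pi*200*t);
target_snr_db = 10;

rng(1);
noise = randn(size(clean));
% Scale the noise so that 10*log10(P_signal / P_noise) equals the target.
p_signal = mean(clean.^2);
p_noise  = mean(noise.^2);
noise = noise * sqrt(p_signal / (p_noise * 10^(target_snr_db/10)));
noisy = clean + noise;

achieved_snr_db = 10 * log10(p_signal / mean(noise.^2));
fprintf('achieved SNR: %.2f dB\n', achieved_snr_db);
```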
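4.2 Parameter Tuning
| Parameter | Recommended value | Tuning advice |
|---|---|---|
| Number of states | 3-5 | Longer words can use more states |
| Mixture components | 3-8 | More training samples allow more components |
| MFCC coefficients | 12-13 | Commonly 12 coefficients plus 1 energy term |
| Frame length | 20-25 ms | Balances time resolution against spectral resolution |
| Frame shift | 10 ms | Keeps enough overlap between adjacent frames |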
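The arithmetic behind these settings can be checked quickly. The sketch below uses the values from the main script (16 kHz, 25 ms frames, 10 ms shift, 13 coefficients) and an assumed 0.6 s utterance to derive the frame size, frame count, overlap, and feature dimension.

```matlab
fs = 16000; frame_length = 0.025; frame_shift = 0.010; num_mfcc = 13;
signal_len = round(0.6 * fs);                      % an assumed 0.6 s utterance

frame_len_samples   = round(frame_length * fs);    % samples per frame
frame_shift_samples = round(frame_shift * fs);     % samples between frame starts
num_frames = floor((signal_len - frame_len_samples) / frame_shift_samples) + 1;
overlap = 1 - frame_shift / frame_length;          % fraction shared by adjacent frames
feature_dim = 3 * num_mfcc;                        % static + delta + delta-delta

fprintf('%d samples/frame, %d frames, %.0f%% overlap, %d-dim features\n', ...
    frame_len_samples, num_frames, overlap * 100, feature_dim);
```

With five states per word, an utterance of this length leaves on the order of ten frames per state, which is one reason a small number of mixture components is advisable when training data is limited.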
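4.3 Troubleshooting Common Problems
- Overfitting: add training data, reduce the number of mixture components, or apply regularization
- Underfitting: increase the number of mixture components, run more training iterations, or improve feature extraction
- Low recognition rate: check endpoint detection, tune the MFCC parameters, and add training samples
- High computational cost: use diagonal covariance matrices, reduce the number of states, or apply Viterbi pruning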
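The diagonal-covariance suggestion can be illustrated directly: when the covariance is diagonal, the Gaussian log-density needs neither `det` nor a linear solve. The sketch below compares that shortcut with the full-covariance expression used in the listing, on random toy values chosen only for the demo.

```matlab
% Gaussian log-density with a diagonal covariance, without det() or a linear solve.
dim = 5;
rng(2);
x = randn(dim, 1);
mu = randn(dim, 1);
var_vec = 0.5 + rand(dim, 1);               % per-dimension variances

d = x - mu;
log_p_diag = -0.5 * (dim * log(2*pi) + sum(log(var_vec)) + sum(d.^2 ./ var_vec));

% Reference: the full-covariance formula applied to the same diagonal matrix.
Sigma = diag(var_vec);
log_p_full = -0.5 * (dim * log(2*pi) + log(det(Sigma)) + d' * (Sigma \ d));

fprintf('diag: %.6f, full: %.6f\n', log_p_diag, log_p_full);
```

The diagonal form costs O(dim) per component instead of O(dim^3), and it also needs far fewer parameters, which helps with the overfitting issue listed above.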
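4.4 Extending the System
- Continuous speech recognition: concatenate the HMMs of multiple words
- Speaker adaptation: adapt the models with MAP or MLLR
- Keyword spotting: add garbage models to detect non-keyword speech
- Real-time operation: use sliding windows and caching to improve real-time performance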
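For the keyword-spotting extension, one simple decision rule is to compare the best word score with the score of a garbage model and accept only if the per-frame log-likelihood ratio exceeds a threshold. The sketch below is a hypothetical illustration: the scores are made-up numbers standing in for forward-algorithm outputs, and the threshold would have to be tuned on held-out data.

```matlab
% Keyword acceptance by log-likelihood ratio against a garbage model (sketch).
vocabulary = {'zero', 'one', 'two'};
word_loglik    = [-4210.5, -3987.2, -4105.9];  % log P(O | word model), made up
garbage_loglik = -4001.0;                      % log P(O | garbage model), made up
num_frames = 80;                               % frames in the utterance
threshold  = 0.1;                              % per-frame margin, to be tuned

[best_loglik, best_idx] = max(word_loglik);
llr_per_frame = (best_loglik - garbage_loglik) / num_frames;

if llr_per_frame > threshold
    decision = vocabulary{best_idx};
else
    decision = '<rejected>';
end
fprintf('decision: %s (LLR per frame = %.4f)\n', decision, llr_per_frame);
```

Normalizing by the number of frames keeps the threshold comparable across utterances of different lengths.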
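V. Summary
This system implements a complete HMM-based isolated-word speech recognizer, comprising:
- Preprocessing module: endpoint detection, pre-emphasis, framing, and windowing
- Feature extraction module: MFCC extraction (including dynamic features)
- HMM training module: GMM-HMM training with the Baum-Welch algorithm
- Recognition module: likelihood computation with the forward algorithm, plus Viterbi decoding
- Evaluation module: confusion matrix, accuracy, recall, and related metrics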