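Isolated-Word Speech Recognition System Based on Hidden Markov Models (HMM)

1. System Architecture and Principles

1.1 Overall System Architecture

Speech input → Preprocessing → Feature extraction → HMM training/recognition → Recognition result
                    ↓                  ↓                       ↓
            Endpoint detection    MFCC features         Viterbi decoding
                                       ↓                       ↓
                                Feature vectors         Word-level HMMs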

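1.2 How HMMs Are Used in Speech Recognition

HMM model parameters

  • Number of states N: typically 3-5 states per phoneme (the code in Section 2 uses 5 states per word model)
  • Observation sequence O: the sequence of MFCC feature vectors
  • State transition matrix A: $a_{ij} = P(q_{t+1}=j \mid q_t=i)$
  • Observation probabilities B: $b_j(o_t) = P(o_t \mid q_t=j)$
  • Initial state distribution π: $\pi_i = P(q_1=i)$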
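
To make these parameters concrete, the following minimal sketch builds the transition matrix and initial distribution of a left-to-right HMM and checks the stochastic constraints. The 0.7/0.3 split mirrors the initial values used in the Section 2 code; the variable names (`self_loop`, `pi0`) are illustrative assumptions, and `pi0` is used instead of `pi` only to avoid shadowing MATLAB's built-in constant.

```matlab
% Minimal sketch: parameters of a 5-state left-to-right HMM
% (values are illustrative assumptions, not trained parameters)
N = 5;                          % number of states
self_loop = 0.7;                % assumed probability of staying in a state

A = zeros(N, N);
for i = 1:N-1
    A(i, i)   = self_loop;      % stay in state i
    A(i, i+1) = 1 - self_loop;  % advance to state i+1
end
A(N, N) = 1;                    % the final state can only loop on itself

pi0 = [1; zeros(N-1, 1)];       % every utterance starts in state 1

row_sums = sum(A, 2);                    % each row of A must sum to 1
expected_frames = 1 / (1 - self_loop);   % mean dwell time of a geometric duration

fprintf('Row sums of A: %s\n', mat2str(row_sums', 4));
fprintf('Expected frames spent in a non-final state: %.2f\n', expected_frames);
```

With a 10 ms frame shift, a self-loop probability of 0.7 corresponds to roughly 33 ms per state on average before training adjusts it.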

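The three basic problems

  1. Evaluation: given a model λ and an observation sequence O, compute $P(O \mid \lambda)$ (forward algorithm)
  2. Decoding: given a model λ and an observation sequence O, find the most likely state sequence (Viterbi algorithm)
  3. Learning: given an observation sequence O, estimate the model parameters λ (Baum-Welch algorithm)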
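
The first two problems can be illustrated on a toy model. The sketch below assumes a 2-state discrete HMM with made-up probabilities (not the GMM-HMM used later): it computes log P(O|λ) with a scaled forward pass and the best state sequence with a log-domain Viterbi pass.

```matlab
% Toy discrete HMM; all numbers are illustrative assumptions
A   = [0.7 0.3; 0.0 1.0];      % 2-state left-to-right transition matrix
Bd  = [0.9 0.1; 0.2 0.8];      % Bd(j,k) = P(symbol k | state j)
pi0 = [1; 0];                  % start in state 1 (pi0 avoids shadowing pi)
obs = [1 1 2 2];               % observed symbol indices
T = numel(obs);
N = size(A, 1);

% Problem 1 (evaluation): scaled forward pass
alpha = pi0 .* Bd(:, obs(1));
log_lik = 0;
for t = 1:T
    if t > 1
        alpha = (A' * alpha) .* Bd(:, obs(t));
    end
    c = sum(alpha);               % scale factor of frame t
    alpha = alpha / c;
    log_lik = log_lik + log(c);   % log P(O|lambda) = sum of log scale factors
end

% Problem 2 (decoding): Viterbi in the log domain
delta = log(pi0) + log(Bd(:, obs(1)));
psi = zeros(N, T);
for t = 2:T
    scores = repmat(delta, 1, N) + log(A);   % scores(i,j): best path ending in i, then i -> j
    [best, idx] = max(scores, [], 1);
    psi(:, t) = idx(:);
    delta = best(:) + log(Bd(:, obs(t)));
end
[best_logp, last_state] = max(delta);
best_path = zeros(1, T);
best_path(T) = last_state;
for t = T-1:-1:1
    best_path(t) = psi(best_path(t+1), t+1);   % backtrack
end

fprintf('log P(O|lambda) = %.4f\n', log_lik);
fprintf('best path = %s, log prob = %.4f\n', mat2str(best_path), best_logp);
```

The Viterbi score can never exceed the forward score, because the forward pass sums over all state sequences while Viterbi keeps only the single best one.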

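2. MATLAB Implementation

matlab
%% Isolated-word speech recognition system based on HMMs
clear; clc; close all;

%% 1. System parameters
fprintf('========== HMM isolated-word speech recognition system ==========\n');

% Front-end parameters
fs = 16000;                 % sampling rate: 16 kHz
frame_length = 0.025;       % frame length: 25 ms
frame_shift = 0.01;         % frame shift: 10 ms
num_mfcc = 13;              % number of MFCC coefficients
num_filters = 26;           % number of mel filters
pre_emphasis = 0.97;        % pre-emphasis coefficient

% HMM parameters
num_states = 5;             % HMM states per word
num_mixtures = 3;           % GMM components per state
max_iterations = 20;        % maximum number of Baum-Welch iterations
convergence_threshold = 1e-4; % convergence threshold

% Vocabulary (example: digits 0-9)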
vocabulary = {'zero', 'one', 'two', 'three', 'four', ...
              'five', 'six', 'seven', 'eight', 'nine'};
num_words = length(vocabulary);

%% 2. Data preparation and preprocessing
fprintf('Preparing training and test data...\n');

% Assumed data directory layout:
% data/
%   train/
%     zero/
%       zero_001.wav
%       zero_002.wav
%       ...
%     one/
%       one_001.wav
%       ...
%   test/
%     test_001.wav
%     ...

% Create synthetic data (a real application would read wav files instead)
train_data = cell(num_words, 1);
test_data = cell(num_words, 1);

% Generate synthetic training and test data for each word
for w = 1:num_words
    num_train_samples = 10;  % 10 training samples per word
    num_test_samples = 3;    % 3 test samples per word
    
    train_data{w} = cell(num_train_samples, 1);
    test_data{w} = cell(num_test_samples, 1);
    
    for s = 1:num_train_samples
        % Generate a synthetic "speech" signal (a noisy sine tone)
        duration = 0.5 + rand() * 0.3;  % 0.5-0.8 s
        t = 0:1/fs:duration;
        freq = 100 + w * 50;  % each word gets a different fundamental frequency
        signal = 0.5 * sin(2*pi*freq*t) + 0.1*randn(size(t));
        
        % Add leading and trailing silence
        silence = zeros(1, round(0.1*fs));
        signal = [silence, signal, silence];
        
        train_data{w}{s} = signal;
    end
    
    for s = 1:num_test_samples
        duration = 0.5 + rand() * 0.3;
        t = 0:1/fs:duration;
        freq = 100 + w * 50;
        signal = 0.5 * sin(2*pi*freq*t) + 0.1*randn(size(t));
        silence = zeros(1, round(0.1*fs));
        signal = [silence, signal, silence];
        test_data{w}{s} = signal;
    end
end

%% 3. Feature extraction: MFCC
fprintf('Extracting MFCC features...\n');

% Training-data features
train_features = cell(num_words, 1);
for w = 1:num_words
    num_samples = length(train_data{w});
    train_features{w} = cell(num_samples, 1);
    
    for s = 1:num_samples
        signal = train_data{w}{s};
        
        % Endpoint detection (simple energy-based method)
        [start_idx, end_idx] = endpoint_detection(signal, fs);
        signal = signal(start_idx:end_idx);
        
        % Extract MFCC features
        mfcc_features = extract_mfcc(signal, fs, frame_length, ...
                                     frame_shift, num_mfcc, num_filters, ...
                                     pre_emphasis);
        
        train_features{w}{s} = mfcc_features;
    end
end

% Test-data features
test_features = cell(num_words, 1);
for w = 1:num_words
    num_samples = length(test_data{w});
    test_features{w} = cell(num_samples, 1);
    
    for s = 1:num_samples
        signal = test_data{w}{s};
        [start_idx, end_idx] = endpoint_detection(signal, fs);
        signal = signal(start_idx:end_idx);
        
        mfcc_features = extract_mfcc(signal, fs, frame_length, ...
                                     frame_shift, num_mfcc, num_filters, ...
                                     pre_emphasis);
        
        test_features{w}{s} = mfcc_features;
    end
end

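%% 4. HMM training
fprintf('Training HMM models...\n');

% One HMM per word
hmm_models = cell(num_words, 1);

for w = 1:num_words
    fprintf('Training the HMM for word "%s"...\n', vocabulary{w});
    
    % All training-sample features for this word
    word_features = train_features{w};
    num_samples = length(word_features);
    
    % Pool the frames for GMM initialization, segment by segment (the k-th
    % part of every utterance together), so that the equal split inside
    % initialize_hmm roughly follows the left-to-right state order
    all_features = [];
    for k = 1:num_states
        for s = 1:num_samples
            T_s = size(word_features{s}, 2);
            seg = floor((k-1)*T_s/num_states)+1 : floor(k*T_s/num_states);
            all_features = [all_features, word_features{s}(:, seg)];
        end
    end
    
    % Initialize the HMM parameters
    [init_A, init_B, init_pi] = initialize_hmm(num_states, num_mixtures, all_features);
    
    % Assemble the initial model
    hmm_model = struct();
    hmm_model.A = init_A;
    hmm_model.B = init_B;  % GMM parameters
    hmm_model.pi = init_pi;
    hmm_model.num_states = num_states;
    hmm_model.num_mixtures = num_mixtures;
    
    % Baum-Welch training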
    hmm_model = train_hmm_baum_welch(hmm_model, word_features, ...
                                     max_iterations, convergence_threshold);
    
    hmm_models{w} = hmm_model;
end

%% 5. Recognition test
fprintf('Running the recognition test...\n');

confusion_matrix = zeros(num_words, num_words);
total_tests = 0;

for true_word = 1:num_words
    num_test_samples = length(test_features{true_word});
    
    for s = 1:num_test_samples
        test_sequence = test_features{true_word}{s};
        
        % Log-likelihood of the sequence under each word HMM
        log_likelihoods = zeros(num_words, 1);
        
        for w = 1:num_words
            % Forward algorithm gives log P(O|λ) for this model
            log_prob = forward_algorithm(test_sequence, hmm_models{w});
            log_likelihoods(w) = log_prob;
        end
        
        % Pick the word whose model gives the highest log-likelihood
        [~, recognized_word] = max(log_likelihoods);
        
        % Update the confusion matrix
        confusion_matrix(true_word, recognized_word) = ...
            confusion_matrix(true_word, recognized_word) + 1;
        total_tests = total_tests + 1;
        
        fprintf('Test sample: true word="%s", recognized="%s"\n', ...
                vocabulary{true_word}, vocabulary{recognized_word});
    end
end

%% 6. Performance evaluation
fprintf('\n========== Recognition results ==========\n');

% Overall accuracy
accuracy = trace(confusion_matrix) / total_tests * 100;
fprintf('Overall recognition accuracy: %.2f%%\n', accuracy);

% Plot the confusion matrix
figure('Position', [100, 100, 800, 600]);
imagesc(confusion_matrix);
colormap(jet);
colorbar;
xlabel('Recognized word');
ylabel('True word');
title(sprintf('Confusion matrix (accuracy: %.2f%%)', accuracy));

% Axis tick labels
set(gca, 'XTick', 1:num_words, 'XTickLabel', vocabulary);
set(gca, 'YTick', 1:num_words, 'YTickLabel', vocabulary);
xtickangle(45);  % built-in since R2016b; rotateXLabels is not a built-in function

% Per-word recognition rate
for w = 1:num_words
    word_accuracy = confusion_matrix(w,w) / sum(confusion_matrix(w,:)) * 100;
    fprintf('Recognition rate for word "%s": %.2f%%\n', vocabulary{w}, word_accuracy);
end

%% 7. Live recognition demo (optional)
fprintf('\n========== Live recognition demo ==========\n');
fprintf('Press any key to start recording; the word is recognized once recording ends...\n');
pause;

% Recording parameters
recording_duration = 2;  % record for 2 seconds

% Create the recorder object
recorder = audiorecorder(fs, 16, 1);  % 16 kHz, 16-bit, mono

% Record
fprintf('Recording...\n');
recordblocking(recorder, recording_duration);
fprintf('Recording finished\n');

% Fetch the recorded samples
audio_data = getaudiodata(recorder);

% Feature extraction
[start_idx, end_idx] = endpoint_detection(audio_data, fs);
if end_idx > start_idx
    audio_data = audio_data(start_idx:end_idx);
end

mfcc_features = extract_mfcc(audio_data, fs, frame_length, frame_shift, ...
                             num_mfcc, num_filters, pre_emphasis);

% Recognition
log_likelihoods = zeros(num_words, 1);
for w = 1:num_words
    log_prob = forward_algorithm(mfcc_features, hmm_models{w});
    log_likelihoods(w) = log_prob;
end

% Show the recognition result
[~, recognized_idx] = max(log_likelihoods);
fprintf('Recognized word: "%s"\n', vocabulary{recognized_idx});

% Plot the per-model scores
figure('Position', [100, 100, 800, 400]);
bar(log_likelihoods);
xlabel('Word');
ylabel('Log-likelihood');
title('Log-likelihood under each word HMM');
set(gca, 'XTick', 1:num_words, 'XTickLabel', vocabulary);
grid on;

%% Helper functions
% (local functions at the end of a script file require MATLAB R2016b or later)

% ==================== Endpoint detection ====================
function [start_idx, end_idx] = endpoint_detection(signal, fs)
    % Energy-based endpoint detection
    % signal: input speech signal
    % fs: sampling rate (Hz)
    
    signal = signal(:);             % force a column vector
    frame_len = round(0.025 * fs);  % 25 ms frame length
    frame_shift = round(0.01 * fs); % 10 ms frame shift
    
    % Framing
    num_frames = floor((length(signal) - frame_len) / frame_shift) + 1;
    frames = zeros(frame_len, num_frames);
    
    for i = 1:num_frames
        start_sample = (i-1)*frame_shift + 1;
        end_sample = start_sample + frame_len - 1;
        if end_sample > length(signal)
            frames(:,i) = [signal(start_sample:end); zeros(end_sample-length(signal),1)];
        else
            frames(:,i) = signal(start_sample:end_sample);
        end
    end
    
    % Energy of each frame
    energy = sum(frames.^2, 1);
    
    % Adaptive threshold: 10% of the maximum frame energy
    max_energy = max(energy);
    threshold = 0.1 * max_energy;
    
    % Frames whose energy exceeds the threshold
    above_threshold = energy > threshold;
    
    if any(above_threshold)
        start_frame = find(above_threshold, 1, 'first');
        end_frame = find(above_threshold, 1, 'last');
        
        % Convert frame indices to sample indices
        start_idx = max(1, (start_frame-1)*frame_shift + 1);
        end_idx = min(length(signal), (end_frame-1)*frame_shift + frame_len);
    else
        % No frame exceeds the threshold: use the whole signal
        start_idx = 1;
        end_idx = length(signal);
    end
end

% ==================== MFCC feature extraction ====================
function mfcc_features = extract_mfcc(signal, fs, frame_length, frame_shift, ...
                                      num_mfcc, num_filters, pre_emphasis)
    % Extract MFCC features (static + delta + delta-delta, with CMN)
    % signal: input speech signal
    % fs: sampling rate (Hz)
    % frame_length: frame length (seconds)
    % frame_shift: frame shift (seconds)
    % num_mfcc: number of MFCC coefficients
    % num_filters: number of mel filters
    % pre_emphasis: pre-emphasis coefficient
    % Note: hamming and dct require the Signal Processing Toolbox
    
    signal = signal(:);  % force a column vector
    
    % 1. Pre-emphasis
    signal = filter([1, -pre_emphasis], 1, signal);
    
    % 2. Framing
    frame_len_samples = round(frame_length * fs);
    frame_shift_samples = round(frame_shift * fs);
    
    num_frames = floor((length(signal) - frame_len_samples) / frame_shift_samples) + 1;
    frames = zeros(frame_len_samples, num_frames);
    
    for i = 1:num_frames
        start_sample = (i-1)*frame_shift_samples + 1;
        end_sample = start_sample + frame_len_samples - 1;
        if end_sample > length(signal)
            frames(:,i) = [signal(start_sample:end); zeros(end_sample-length(signal),1)];
        else
            frames(:,i) = signal(start_sample:end_sample);
        end
    end
    
    % 3. Windowing (Hamming window)
    window = hamming(frame_len_samples);
    frames = frames .* window;
    
    % 4. Power spectrum
    NFFT = 2^nextpow2(frame_len_samples);
    mag_frames = abs(fft(frames, NFFT, 1)).^2;
    mag_frames = mag_frames(1:NFFT/2+1, :);
    
    % 5. Mel filter bank
    mel_low = 0;
    mel_high = 2595 * log10(1 + fs/2 / 700);  % Hz -> mel
    mel_points = linspace(mel_low, mel_high, num_filters+2);
    hz_points = 700 * (10.^(mel_points/2595) - 1);  % mel -> Hz
    % 0-based FFT bin of each point, clamped so that k+1 never exceeds
    % NFFT/2+1 (the original scaling could index one column too far)
    bin_points = min(floor(NFFT * hz_points / fs), NFFT/2);
    
    filter_bank = zeros(num_filters, NFFT/2+1);
    for m = 2:num_filters+1
        left = bin_points(m-1);
        center = bin_points(m);
        right = bin_points(m+1);
        
        for k = left:center-1    % rising slope (empty when center == left)
            filter_bank(m-1, k+1) = (k - left) / (center - left);
        end
        filter_bank(m-1, center+1) = 1;  % peak of the triangle
        for k = center+1:right   % falling slope (empty when right == center)
            filter_bank(m-1, k+1) = (right - k) / (right - center);
        end
    end
    
    % 6. Apply the mel filter bank
    filter_banks = filter_bank * mag_frames;
    filter_banks = max(filter_banks, 1e-10);  % avoid log(0)
    filter_banks = log10(filter_banks);
    
    % 7. Discrete cosine transform (DCT), applied along the filter axis
    mfcc = dct(filter_banks);
    mfcc = mfcc(1:num_mfcc, :);
    
    % 8. Dynamic features (first- and second-order differences);
    %    edge frames are left at zero
    num_coeff = size(mfcc, 1);
    delta = zeros(num_coeff, size(mfcc, 2));
    delta_delta = zeros(num_coeff, size(mfcc, 2));
    
    for t = 2:size(mfcc, 2)-1
        delta(:, t) = (mfcc(:, t+1) - mfcc(:, t-1)) / 2;
    end
    
    for t = 3:size(mfcc, 2)-2
        delta_delta(:, t) = (delta(:, t+1) - delta(:, t-1)) / 2;
    end
    
    % 9. Stack static, delta and delta-delta features
    mfcc_features = [mfcc; delta; delta_delta];
    
    % 10. Cepstral mean normalization (CMN)
    mfcc_features = mfcc_features - mean(mfcc_features, 2);
end

% ==================== HMM initialization ====================
function [A, B, pi] = initialize_hmm(num_states, num_mixtures, features)
    % Initialize the HMM parameters
    % num_states: number of states
    % num_mixtures: number of GMM components per state
    % features: training features (dim×num_frames), used to initialize the GMMs
    % Note: kmeans requires the Statistics and Machine Learning Toolbox
    
    % 1. State transition matrix A (left-to-right HMM)
    A = zeros(num_states, num_states);
    for i = 1:num_states-1
        A(i, i) = 0.7;      % stay in the current state
        A(i, i+1) = 0.3;    % move to the next state
    end
    A(num_states, num_states) = 1.0;  % the last state only loops on itself
    
    % 2. Initial state distribution π
    pi = zeros(num_states, 1);
    pi(1) = 1.0;  % always start in the first state
    
    % 3. Observation model B (GMM parameters)
    [dim, num_frames] = size(features);
    
    B = struct();
    B.num_mixtures = num_mixtures;
    B.dim = dim;
    
    % Initialize each state's GMM with K-means
    for s = 1:num_states
        % Give each state an equal share of the frames
        start_idx = round((s-1)/num_states * num_frames) + 1;
        end_idx = round(s/num_states * num_frames);
        state_features = features(:, start_idx:min(end_idx, num_frames));
        
        if size(state_features, 2) < num_mixtures
            % Too few frames for this state: fall back to all frames
            state_features = features;
        end
        
        % K-means clustering
        [idx, centers] = kmeans(state_features', num_mixtures, ...
                                'MaxIter', 100, 'Replicates', 3);
        
        % GMM weights and means
        B.weight{s} = ones(1, num_mixtures) / num_mixtures;  % mixture weights
        B.mu{s} = centers';  % means
        
        % Covariance matrices (diagonal at initialization)
        B.sigma{s} = zeros(dim, dim, num_mixtures);
        for m = 1:num_mixtures
            cluster_data = state_features(:, idx == m);
            if size(cluster_data, 2) > 1
                var_vec = var(cluster_data, 0, 2);
            else
                var_vec = ones(dim, 1);
            end
            B.sigma{s}(:,:,m) = diag(var_vec + 1e-6);  % small offset to avoid singularity
        end
    end
end

% ==================== GMM likelihood ====================
function prob = gmm_probability(feature_vector, weight, mu, sigma)
    % Likelihood of a feature vector under a GMM
    % feature_vector: feature vector (dim×1)
    % weight: mixture weights (1×M)
    % mu: mean matrix (dim×M)
    % sigma: covariance matrices (dim×dim×M)
    
    num_mixtures = length(weight);
    dim = size(feature_vector, 1);
    
    prob = 0;
    for m = 1:num_mixtures
        % Density of a single Gaussian component
        diff = feature_vector - mu(:, m);
        sigma_m = sigma(:,:,m);
        
        % Evaluate the log-density first, then exponentiate
        log_prob = -0.5 * (dim * log(2*pi) + log(det(sigma_m)) + ...
                           diff' * (sigma_m \ diff));
        
        prob = prob + weight(m) * exp(log_prob);
    end
    
    % Floor the result so that callers never see an exact zero
    prob = max(prob, 1e-100);
end

% ==================== Forward algorithm ====================
function log_prob = forward_algorithm(observations, hmm)
    % Forward algorithm: log-likelihood of an observation sequence
    % observations: observation sequence (dim×T)
    % hmm: HMM model
    
    T = size(observations, 2);  % sequence length
    N = hmm.num_states;         % number of states
    
    alpha = zeros(N, T);        % scaled forward probabilities
    
    % Initialization at t = 1
    for i = 1:N
        b_i = gmm_probability(observations(:,1), ...
                              hmm.B.weight{i}, ...
                              hmm.B.mu{i}, ...
                              hmm.B.sigma{i});
        alpha(i, 1) = hmm.pi(i) * b_i;
    end
    scale = max(sum(alpha(:, 1)), realmin);
    alpha(:, 1) = alpha(:, 1) / scale;
    log_prob = log(scale);
    
    % Recursion
    for t = 1:T-1
        for j = 1:N
            sum_alpha = 0;
            for i = 1:N
                sum_alpha = sum_alpha + alpha(i, t) * hmm.A(i, j);
            end
            
            b_j = gmm_probability(observations(:,t+1), ...
                                  hmm.B.weight{j}, ...
                                  hmm.B.mu{j}, ...
                                  hmm.B.sigma{j});
            
            alpha(j, t+1) = sum_alpha * b_j;
        end
        
        % Scale to prevent underflow and accumulate the log scale factor.
        % After scaling, sum(alpha(:, T)) is always 1, so log P(O|λ) must be
        % taken as the sum of the log scale factors, not log(sum(alpha(:, T)))
        scale = max(sum(alpha(:, t+1)), realmin);
        alpha(:, t+1) = alpha(:, t+1) / scale;
        log_prob = log_prob + log(scale);
    end
end

% ==================== Baum-Welch training ====================
function hmm = train_hmm_baum_welch(hmm, training_sequences, max_iter, threshold)
    % Train an HMM with the Baum-Welch algorithm
    % hmm: initial HMM model
    % training_sequences: cell array of training sequences (each dim×T)
    % max_iter: maximum number of iterations
    % threshold: convergence threshold on the total log-likelihood
    
    num_sequences = length(training_sequences);
    N = hmm.num_states;
    M = hmm.num_mixtures;
    
    prev_log_likelihood = -inf;
    
    for iter = 1:max_iter
        fprintf('  Iteration %d/%d...\n', iter, max_iter);
        
        % Reset the accumulators
        A_num = zeros(N, N);
        A_den = zeros(N, 1);
        
        % GMM accumulators
        weight_num = cell(N, 1);
        mu_num = cell(N, 1);
        sigma_num = cell(N, 1);
        den = zeros(N, 1);
        
        for s = 1:N
            weight_num{s} = zeros(1, M);
            mu_num{s} = zeros(hmm.B.dim, M);
            sigma_num{s} = zeros(hmm.B.dim, hmm.B.dim, M);
        end
        
        total_log_likelihood = 0;
        
        % Loop over the training sequences
        for seq_idx = 1:num_sequences
            observations = training_sequences{seq_idx};
            T = size(observations, 2);
            
            % Forward-backward pass
            [alpha, beta, scale, log_likelihood] = forward_backward(observations, hmm);
            total_log_likelihood = total_log_likelihood + log_likelihood;
            
            % Compute ξ (transition posteriors) and γ (state posteriors)
            xi = zeros(N, N, T-1);
            gamma = zeros(N, T);
            
            for t = 1:T-1
                for i = 1:N
                    for j = 1:N
                        xi(i, j, t) = alpha(i, t) * hmm.A(i, j) * ...
                                      gmm_probability(observations(:,t+1), ...
                                                      hmm.B.weight{j}, ...
                                                      hmm.B.mu{j}, ...
                                                      hmm.B.sigma{j}) * ...
                                      beta(j, t+1);
                    end
                end
                xi(:,:,t) = xi(:,:,t) / sum(sum(xi(:,:,t)));
            end
            
            for t = 1:T
                gamma(:, t) = alpha(:, t) .* beta(:, t);
                gamma(:, t) = gamma(:, t) / sum(gamma(:, t));
            end
            
            % Accumulate statistics for A
            for i = 1:N
                for j = 1:N
                    A_num(i, j) = A_num(i, j) + sum(squeeze(xi(i, j, :)));
                end
                A_den(i) = A_den(i) + sum(gamma(i, 1:T-1));
            end
            
            % Accumulate statistics for the GMMs
            for t = 1:T
                obs = observations(:, t);
                for i = 1:N
                    gamma_i_t = gamma(i, t);
                    
                    % Posterior of each mixture component given state i
                    prob_mix = zeros(1, M);
                    for m = 1:M
                        weight_im = hmm.B.weight{i}(m);
                        mu_im = hmm.B.mu{i}(:, m);
                        sigma_im = hmm.B.sigma{i}(:,:,m);
                        
                        diff = obs - mu_im;
                        log_prob = -0.5 * (hmm.B.dim * log(2*pi) + ...
                                           log(det(sigma_im)) + ...
                                           diff' * (sigma_im \ diff));
                        prob_mix(m) = weight_im * exp(log_prob);
                    end
                    if sum(prob_mix) > 0
                        prob_mix = prob_mix / sum(prob_mix);
                    else
                        % All components underflowed: fall back to the prior weights
                        prob_mix = hmm.B.weight{i};
                    end
                    
                    % Update the accumulators
                    den(i) = den(i) + gamma_i_t;
                    
                    for m = 1:M
                        gamma_im_t = gamma_i_t * prob_mix(m);
                        
                        weight_num{i}(m) = weight_num{i}(m) + gamma_im_t;
                        mu_num{i}(:, m) = mu_num{i}(:, m) + gamma_im_t * obs;
                        sigma_num{i}(:,:,m) = sigma_num{i}(:,:,m) + ...
                                              gamma_im_t * (obs * obs');
                    end
                end
            end
        end
        
        % Re-estimate the model parameters
        % Update A
        for i = 1:N
            if A_den(i) > 0
                hmm.A(i, :) = A_num(i, :) / A_den(i);
            end
        end
        
        % Update the GMM parameters
        for i = 1:N
            if den(i) > 0
                % Update the weights
                hmm.B.weight{i} = weight_num{i} / den(i);
                
                % Update the means
                for m = 1:M
                    if weight_num{i}(m) > 0
                        hmm.B.mu{i}(:, m) = mu_num{i}(:, m) / weight_num{i}(m);
                        
                        % Update the covariance: E[x x'] - mu mu'
                        sigma_tmp = sigma_num{i}(:,:,m) / weight_num{i}(m) - ...
                                    hmm.B.mu{i}(:, m) * hmm.B.mu{i}(:, m)';
                        
                        % Keep the covariance matrix positive definite
                        [V, D] = eig(sigma_tmp);
                        D = diag(max(diag(D), 1e-6));  % floor the eigenvalues
                        hmm.B.sigma{i}(:,:,m) = V * D * V';
                    end
                end
            end
        end
        
        % Convergence check on the total log-likelihood
        if abs(total_log_likelihood - prev_log_likelihood) < threshold
            fprintf('  Converged at iteration %d\n', iter);
            break;
        end
        
        prev_log_likelihood = total_log_likelihood;
    end
end

% ==================== Forward-backward algorithm ====================
function [alpha, beta, scale, log_likelihood] = forward_backward(observations, hmm)
    % Forward-backward algorithm
    % Returns: scaled alpha, scaled beta, scale factors, log-likelihood
    
    T = size(observations, 2);
    N = hmm.num_states;
    
    % Forward pass (with scaling)
    alpha = zeros(N, T);
    scale = zeros(1, T);
    
    % Initialization
    for i = 1:N
        b_i = gmm_probability(observations(:,1), ...
                              hmm.B.weight{i}, ...
                              hmm.B.mu{i}, ...
                              hmm.B.sigma{i});
        alpha(i, 1) = hmm.pi(i) * b_i;
    end
    
    scale(1) = sum(alpha(:, 1));
    alpha(:, 1) = alpha(:, 1) / scale(1);
    
    % Recursion
    for t = 1:T-1
        for j = 1:N
            sum_alpha = 0;
            for i = 1:N
                sum_alpha = sum_alpha + alpha(i, t) * hmm.A(i, j);
            end
            
            b_j = gmm_probability(observations(:,t+1), ...
                                  hmm.B.weight{j}, ...
                                  hmm.B.mu{j}, ...
                                  hmm.B.sigma{j});
            
            alpha(j, t+1) = sum_alpha * b_j;
        end
        
        scale(t+1) = sum(alpha(:, t+1));
        alpha(:, t+1) = alpha(:, t+1) / scale(t+1);
    end
    
    % Log-likelihood: sum of the log scale factors
    log_likelihood = sum(log(scale(scale>0)));
    
    % Backward pass (scaled with the same factors)
    beta = zeros(N, T);
    beta(:, T) = 1 / scale(T);
    
    for t = T-1:-1:1
        for i = 1:N
            beta(i, t) = 0;
            for j = 1:N
                b_j = gmm_probability(observations(:,t+1), ...
                                      hmm.B.weight{j}, ...
                                      hmm.B.mu{j}, ...
                                      hmm.B.sigma{j});
                beta(i, t) = beta(i, t) + hmm.A(i, j) * b_j * beta(j, t+1);
            end
            beta(i, t) = beta(i, t) / scale(t);
        end
    end
end

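% ==================== Viterbi decoding ====================
function [best_path, best_prob] = viterbi_decode(observations, hmm)
    % Viterbi algorithm: find the most likely state sequence
    % observations: observation sequence (dim×T)
    % hmm: HMM model
    % Returns: the best state sequence and its log-probability
    
    T = size(observations, 2);
    N = hmm.num_states;
    
    % Initialization (delta holds log-probabilities)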
    delta = zeros(N, T);
    psi = zeros(N, T);
    
    for i = 1:N
        b_i = gmm_probability(observations(:,1), ...
                              hmm.B.weight{i}, ...
                              hmm.B.mu{i}, ...
                              hmm.B.sigma{i});
        delta(i, 1) = log(hmm.pi(i)) + log(b_i);
        psi(i, 1) = 0;
    end
    
    % Recursion
    for t = 2:T
        for j = 1:N
            max_val = -inf;
            max_idx = 1;
            
            for i = 1:N
                val = delta(i, t-1) + log(hmm.A(i, j));
                if val > max_val
                    max_val = val;
                    max_idx = i;
                end
            end
            
            b_j = gmm_probability(observations(:,t), ...
                                  hmm.B.weight{j}, ...
                                  hmm.B.mu{j}, ...
                                  hmm.B.sigma{j});
            
            delta(j, t) = max_val + log(b_j);
            psi(j, t) = max_idx;
        end
    end
    
    % Termination
    [best_prob, best_last_state] = max(delta(:, T));
    
    % Backtracking
    best_path = zeros(1, T);
    best_path(T) = best_last_state;
    
    for t = T-1:-1:1
        best_path(t) = psi(best_path(t+1), t+1);
    end
end
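
The mel-scale mapping used inside extract_mfcc can be checked on its own. The sketch below is a standalone illustration under stated assumptions: the helper names `hz2mel` and `mel2hz` are hypothetical, and the bin formula `floor(NFFT * f / fs)` is one common convention rather than the only possible choice. It verifies that the two conversions invert each other and that the filter edges map to increasing FFT bins that stay inside the one-sided spectrum.

```matlab
% Sketch: Hz <-> mel conversion and filter-edge placement
% (hz2mel and mel2hz are hypothetical helper names)
hz2mel = @(hz)  2595 * log10(1 + hz / 700);
mel2hz = @(mel) 700 * (10.^(mel / 2595) - 1);

fs = 16000;          % sampling rate, as in the main script
num_filters = 26;    % number of mel filters
NFFT = 512;          % FFT size for 25 ms frames at 16 kHz

% num_filters triangular filters need num_filters + 2 edge points
mel_points = linspace(hz2mel(0), hz2mel(fs/2), num_filters + 2);
hz_points  = mel2hz(mel_points);

% 0-based FFT bin index of every edge point
bin_points = floor(NFFT * hz_points / fs);

round_trip_error = abs(mel2hz(hz2mel(1000)) - 1000);
fprintf('1000 Hz = %.1f mel, round-trip error = %.2e Hz\n', ...
        hz2mel(1000), round_trip_error);
fprintf('First edge bins: %s\n', mat2str(bin_points(1:6)));
fprintf('Last edge bin: %d (at most NFFT/2 = %d)\n', bin_points(end), NFFT/2);
```

Because the edges are equally spaced in mel, they crowd together at low frequencies. With too many filters or too small an FFT, neighbouring edges can fall into the same bin, which is why a filter-bank loop should guard against zero-width slopes.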

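3. System Optimization and Improvements

3.1 Enhanced MFCC Feature Extraction

matlab
%% Enhanced MFCC feature extraction (adds log energy and delta features)
function features = extract_enhanced_mfcc(signal, fs, params)
    % Unpack the parameters
    frame_length = params.frame_length;  % seconds
    frame_shift = params.frame_shift;    % seconds
    num_mfcc = params.num_mfcc;          % number of MFCC coefficients
    num_filters = params.num_filters;    % number of mel filters
    pre_emphasis = params.pre_emphasis;  % pre-emphasis coefficient
    use_delta = params.use_delta;        % whether to append delta features
    use_delta_delta = params.use_delta_delta; % whether to append delta-delta features
    
    % Base MFCCs. extract_mfcc already appends deltas, so keep only the
    % static coefficients here to avoid duplicating the dynamic features
    base_features = extract_mfcc(signal, fs, frame_length, frame_shift, ...
                                 num_mfcc, num_filters, pre_emphasis);
    mfcc = base_features(1:num_mfcc, :);
    
    % Log energy of each frame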
    frame_len_samples = round(frame_length * fs);
    frame_shift_samples = round(frame_shift * fs);
    
    num_frames = floor((length(signal) - frame_len_samples) / frame_shift_samples) + 1;
    log_energy = zeros(1, num_frames);
    
    for i = 1:num_frames
        start_sample = (i-1)*frame_shift_samples + 1;
        end_sample = min(start_sample + frame_len_samples - 1, length(signal));
        frame = signal(start_sample:end_sample);
        log_energy(i) = log(sum(frame.^2) + eps);
    end
    
    % Stack MFCCs and log energy
    features = [mfcc; log_energy];
    
    % Append delta features
    if use_delta
        delta_features = zeros(size(features));
        for t = 2:size(features, 2)-1
            delta_features(:, t) = (features(:, t+1) - features(:, t-1)) / 2;
        end
        features = [features; delta_features];
    end
    
    % Append delta-delta features
    if use_delta_delta && use_delta
        delta_delta_features = zeros(size(features, 1)/2, size(features, 2));
        for t = 3:size(features, 2)-2
            delta_delta_features(:, t) = (delta_features(:, t+1) - delta_features(:, t-1)) / 2;
        end
        features = [features; delta_delta_features];
    end
end
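
The two-point difference used above is the simplest delta estimate. A smoother alternative, familiar from HTK-style front ends, is the regression formula $d_t = \sum_{n=1}^{N_w} n\,(c_{t+n} - c_{t-n}) \,/\, (2\sum_{n=1}^{N_w} n^2)$. The sketch below is an illustration only: the window size `Nw = 2`, the edge handling (repeating the first and last frame), and the test signal are assumptions, not part of the original system.

```matlab
% Sketch: regression-based delta features
% (Nw and the edge replication are illustrative assumptions)
c  = [2*(1:10); (1:10).^2];    % 2 coefficients × 10 frames: a ramp and a parabola
Nw = 2;                        % regression window half-width
T  = size(c, 2);

% Repeat the first and last frame so that every t has Nw neighbours on each side
padded = [repmat(c(:, 1), 1, Nw), c, repmat(c(:, end), 1, Nw)];

delta = zeros(size(c));
for n = 1:Nw
    ahead  = padded(:, (1:T) + Nw + n);   % c(t+n)
    behind = padded(:, (1:T) + Nw - n);   % c(t-n)
    delta  = delta + n * (ahead - behind);
end
delta = delta / (2 * sum((1:Nw).^2));

fprintf('delta of the ramp at frames 3-8:     %s\n', mat2str(delta(1, 3:8)));
fprintf('delta of the parabola at frames 3-8: %s\n', mat2str(delta(2, 3:8)));
```

Away from the edges the formula recovers the exact slope of a linear or quadratic trajectory (2 for the ramp, 2t for the parabola), while averaging over a wider window than the two-point difference.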

3.2 HTK-Style HMM Training

matlab
%% HTK-style HMM training (uniform-segmentation initialization)
function hmm = train_hmm_htk_style(training_features, num_states, num_mixtures)
    % HTK-style HMM training
    % training_features: cell array of training features (each dim×T)
    % num_states: number of states
    % num_mixtures: number of mixture components per state
    
    % 1. Initialization
    [dim, ~] = size(training_features{1});
    num_seqs = length(training_features);
    
    % 2. Uniform segmentation: split every utterance into num_states equal
    %    parts, so that state k collects the k-th part of each utterance
    %    (assigning frames sequentially across utterances would give whole
    %    utterances to single states)
    state_features = cell(num_states, 1);
    
    for s = 1:num_seqs
        seq = training_features{s};
        seq_len = size(seq, 2);
        
        % State index (1..num_states) of every frame in this utterance
        state_of_frame = min(num_states, ceil((1:seq_len) * num_states / seq_len));
        
        for k = 1:num_states
            state_features{k} = [state_features{k}, seq(:, state_of_frame == k)];
        end
    end
    
    % 3. Initialize the model parameters
    hmm = struct();
    hmm.num_states = num_states;
    hmm.num_mixtures = num_mixtures;
    
    % Transition matrix (left-to-right, no skips)
    hmm.A = zeros(num_states, num_states);
    for i = 1:num_states-1
        hmm.A(i, i) = 0.7;
        hmm.A(i, i+1) = 0.3;
    end
    hmm.A(num_states, num_states) = 1.0;
    
    % Initial state distribution
    hmm.pi = zeros(num_states, 1);
    hmm.pi(1) = 1.0;
    
    % GMMs
    hmm.B = struct();
    hmm.B.dim = dim;
    
    for s = 1:num_states
        if size(state_features{s}, 2) >= num_mixtures
            % Initialize with K-means
            [idx, centers] = kmeans(state_features{s}', num_mixtures, ...
                                    'MaxIter', 100, 'Replicates', 3);
            
            hmm.B.weight{s} = ones(1, num_mixtures) / num_mixtures;
            hmm.B.mu{s} = centers';
            
            hmm.B.sigma{s} = zeros(dim, dim, num_mixtures);
            for m = 1:num_mixtures
                cluster_data = state_features{s}(:, idx == m);
                if size(cluster_data, 2) > 1
                    var_vec = var(cluster_data, 0, 2);
                else
                    var_vec = ones(dim, 1);
                end
                hmm.B.sigma{s}(:,:,m) = diag(var_vec + 1e-6);
            end
        else
            % Not enough frames for this state: fall back to global statistics
            all_features = [];
            for ss = 1:num_states
                all_features = [all_features, state_features{ss}];
            end
            
            [idx, centers] = kmeans(all_features', num_mixtures, ...
                                    'MaxIter', 100, 'Replicates', 3);
            
            hmm.B.weight{s} = ones(1, num_mixtures) / num_mixtures;
            hmm.B.mu{s} = centers';
            
            for m = 1:num_mixtures
                hmm.B.sigma{s}(:,:,m) = eye(dim) * 0.1;
            end
        end
    end
    
    % 4. Baum-Welch re-estimation, reusing train_hmm_baum_welch from Section 2
    %    (that function must be in the same file or on the MATLAB path)
    max_iterations = 10;
    convergence_threshold = 1e-4;
    hmm = train_hmm_baum_welch(hmm, training_features, ...
                               max_iterations, convergence_threshold);
end

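3.3 Performance Evaluation and Visualization

matlab
%% Performance evaluation
function [metrics, confusion_matrix] = evaluate_system(hmm_models, test_features, vocabulary)
    % Evaluate the recognizer
    % hmm_models: trained HMM models (one per word)
    % test_features: test features (cell array indexed by true word)
    % vocabulary: list of words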
    
    num_words = length(vocabulary);
    confusion_matrix = zeros(num_words, num_words);
    
    for true_word = 1:num_words
        num_samples = length(test_features{true_word});
        
        for s = 1:num_samples
            test_seq = test_features{true_word}{s};
            
            % Log-likelihood under each word model
            log_likelihoods = zeros(num_words, 1);
            for w = 1:num_words
                log_likelihoods(w) = forward_algorithm(test_seq, hmm_models{w});
            end
            
            % Pick the model with the highest log-likelihood
            [~, predicted_word] = max(log_likelihoods);
            confusion_matrix(true_word, predicted_word) = ...
                confusion_matrix(true_word, predicted_word) + 1;
        end
    end
    
    % Compute the metrics
    accuracy = trace(confusion_matrix) / sum(confusion_matrix(:));
    
    precision = zeros(num_words, 1);
    recall = zeros(num_words, 1);
    f1_score = zeros(num_words, 1);
    
    for w = 1:num_words
        TP = confusion_matrix(w, w);
        FP = sum(confusion_matrix(:, w)) - TP;
        FN = sum(confusion_matrix(w, :)) - TP;
        
        precision(w) = TP / (TP + FP + eps);
        recall(w) = TP / (TP + FN + eps);
        f1_score(w) = 2 * precision(w) * recall(w) / (precision(w) + recall(w) + eps);
    end
    
    metrics = struct(...
        'accuracy', accuracy, ...
        'precision', precision, ...
        'recall', recall, ...
        'f1_score', f1_score, ...
        'confusion_matrix', confusion_matrix);
end

%% Visualization
function visualize_results(metrics, vocabulary)
    % Plot the evaluation results
    
    % Confusion matrix
    figure('Position', [100, 100, 800, 600]);
    imagesc(metrics.confusion_matrix);
    colormap(jet);
    colorbar;
    xlabel('Predicted Label');
    ylabel('True Label');
    title(sprintf('Confusion Matrix (Accuracy: %.2f%%)', metrics.accuracy*100));
    
    % Tick labels
    set(gca, 'XTick', 1:length(vocabulary), 'XTickLabel', vocabulary);
    set(gca, 'YTick', 1:length(vocabulary), 'YTickLabel', vocabulary);
    xtickangle(45);  % built-in since R2016b; rotateXLabels is not a built-in function
    
    % Bar charts of the per-word metrics
    figure('Position', [100, 100, 1200, 400]);
    
    subplot(1,3,1);
    bar(metrics.precision);
    set(gca, 'XTick', 1:length(vocabulary), 'XTickLabel', vocabulary);
    ylabel('Precision');
    title('Precision per Word');
    xtickangle(45);
    ylim([0, 1]);
    
    subplot(1,3,2);
    bar(metrics.recall);
    set(gca, 'XTick', 1:length(vocabulary), 'XTickLabel', vocabulary);
    ylabel('Recall');
    title('Recall per Word');
    xtickangle(45);
    ylim([0, 1]);
    
    subplot(1,3,3);
    bar(metrics.f1_score);
    set(gca, 'XTick', 1:length(vocabulary), 'XTickLabel', vocabulary);
    ylabel('F1-Score');
    title('F1-Score per Word');
    xtickangle(45);
    ylim([0, 1]);
end
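
As a worked example of the metrics computed in evaluate_system, the sketch below applies the same definitions to a small, made-up 3-word confusion matrix (rows are true words, columns are predictions). The numbers are illustrative assumptions, not results from this system.

```matlab
% Sketch: precision, recall and F1 from a made-up confusion matrix
C = [8 2 0;      % true word 1: 8 correct, 2 mistaken for word 2
     1 9 0;      % true word 2
     0 3 7];     % true word 3

TP = diag(C);                      % correct decisions per word
precision = TP ./ sum(C, 1)';      % TP / everything predicted as that word
recall    = TP ./ sum(C, 2);       % TP / everything that truly is that word
f1        = 2 * precision .* recall ./ (precision + recall);
accuracy  = trace(C) / sum(C(:));

fprintf('accuracy  = %.3f\n', accuracy);
fprintf('precision = %s\n', mat2str(precision', 3));
fprintf('recall    = %s\n', mat2str(recall', 3));
fprintf('F1        = %s\n', mat2str(f1', 3));
```

Word 2 shows why both metrics matter: its recall is high (0.9), but because words 1 and 3 are often mistaken for it, its precision drops to 9/14.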

Reference code: HMM-based isolated-word speech recognition system, www.youwenfan.com/contentcsv/79135.html

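4. Practical Recommendations

4.1 Data Preparation

  1. Data collection: record at least 20-50 samples per word, covering different speakers and environments
  2. Data augmentation: add noise and vary speaking rate and pitch to make the models more robust
  3. Labelling: make sure every sample is labelled correctly to avoid label errors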
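
As one example of the augmentation step, the sketch below adds white Gaussian noise to a signal at a chosen signal-to-noise ratio. The sine tone stands in for a recorded word, and the 10 dB target is an arbitrary assumption; a real setup would more likely mix in recorded background noise at several SNRs.

```matlab
% Sketch: additive-noise augmentation at a target SNR
% (the test tone and the 10 dB target are illustrative assumptions)
fs = 16000;
t = (0:fs-1) / fs;                       % 1 second of samples
clean = 0.5 * sin(2*pi*220*t);           % stand-in for a recorded word
target_snr_db = 10;

signal_power = mean(clean.^2);
noise = randn(size(clean));
% Rescale the noise so that signal_power / noise_power hits the target exactly
noise_power_wanted = signal_power / 10^(target_snr_db / 10);
noise = noise * sqrt(noise_power_wanted / mean(noise.^2));

noisy = clean + noise;
achieved_snr_db = 10 * log10(signal_power / mean(noise.^2));
fprintf('Target SNR: %.1f dB, achieved SNR: %.1f dB\n', ...
        target_snr_db, achieved_snr_db);
```

Each clean recording can be reused at several SNRs, which multiplies the amount of training data without new recordings.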

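4.2 Parameter Tuning

Parameter            Recommended value   Tuning advice
Number of states     3-5                 Longer words can use more states
Number of mixtures   3-8                 More training samples allow more mixtures
MFCC coefficients    12-13               Typically 12 coefficients + 1 energy term
Frame length         20-25 ms            Balances time resolution and spectral resolution
Frame shift          10 ms               Keeps enough overlap between frames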
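
To see what these settings mean in samples and frames, the sketch below works through the arithmetic for the defaults used in Section 2 (16 kHz, 25 ms frames, 10 ms shift, 13 MFCCs with delta and delta-delta). The 0.6 s utterance length is an assumed example value.

```matlab
% Sketch: frame arithmetic for the default front-end settings
fs = 16000;             % sampling rate (Hz)
frame_length = 0.025;   % seconds
frame_shift = 0.010;    % seconds
num_mfcc = 13;
duration = 0.6;         % assumed utterance length after endpoint detection (s)

frame_len_samples   = round(frame_length * fs);     % samples per frame
frame_shift_samples = round(frame_shift * fs);      % samples between frame starts
overlap_ratio = 1 - frame_shift / frame_length;     % fraction shared by adjacent frames
NFFT = 2^nextpow2(frame_len_samples);               % FFT size after zero-padding

num_samples = round(duration * fs);
num_frames  = floor((num_samples - frame_len_samples) / frame_shift_samples) + 1;
feature_dim = 3 * num_mfcc;                         % static + delta + delta-delta

fprintf('%d samples per frame, %d per shift, %.0f%% overlap, NFFT = %d\n', ...
        frame_len_samples, frame_shift_samples, 100*overlap_ratio, NFFT);
fprintf('%.1f s of speech -> %d frames of dimension %d\n', ...
        duration, num_frames, feature_dim);
```

With 5 states per word, a 0.6 s utterance gives roughly 11 to 12 frames per state, which is a useful sanity check when choosing the number of states and mixtures.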

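4.3 Troubleshooting

  1. Overfitting: add training data, reduce the number of mixtures, use regularization
  2. Underfitting: increase the number of mixtures, run more training iterations, improve the features
  3. Low recognition rate: check endpoint detection, tune the MFCC parameters, add training samples
  4. High computational cost: use diagonal covariance matrices, reduce the number of states, use Viterbi pruning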
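
The diagonal-covariance suggestion can be sketched as follows. With diagonal covariances the Gaussian log-density needs no determinant or matrix solve, and combining the components with log-sum-exp keeps the result finite even for frames far from every mean. All numbers, the variance floor, and the helper names (`log_comp`, `logsumexp`, `gmm_logpdf`) are illustrative assumptions.

```matlab
% Sketch: log-density of a diagonal-covariance GMM with a variance floor
% (all values and helper names are illustrative assumptions)
w  = [0.6 0.4];                 % mixture weights (1×M)
mu = [0 1; -1 0; 2 1];          % means (dim×M)
v  = [1 0.5; 2 1; 0.5 1];       % per-dimension variances (dim×M)
var_floor = 1e-3;               % simple regularization against collapsing variances
v = max(v, var_floor);

dim = size(mu, 1);
M = numel(w);

% Per-component log(w_m * N(x; mu_m, diag(v_m))), computed without det or backslash
log_comp = @(x) log(w) - 0.5 * (dim*log(2*pi) + sum(log(v), 1) ...
                 + sum((repmat(x, 1, M) - mu).^2 ./ v, 1));
% Numerically stable log(sum(exp(a)))
logsumexp = @(a) max(a) + log(sum(exp(a - max(a))));
gmm_logpdf = @(x) logsumexp(log_comp(x));

x_near = [0.5; -1.0; 2.0];      % close to the first component
x_far  = [100; 100; 100];       % far from both components

stable_far = gmm_logpdf(x_far);
naive_far  = log(sum(exp(log_comp(x_far))));   % underflows to log(0)

fprintf('log p(x_near) = %.4f\n', gmm_logpdf(x_near));
fprintf('log p(x_far): stable = %.1f, naive = %.1f\n', stable_far, naive_far);
```

Working in the log domain this way avoids the fixed probability floor that a linear-domain GMM evaluation needs, at the cost of passing log-densities through the forward and Viterbi recursions.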

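4.4 Extending the System

  1. Continuous speech recognition: concatenate the HMMs of multiple words
  2. Speaker adaptation: adapt the models with MAP or MLLR
  3. Keyword spotting: add a garbage (filler) model to detect non-keywords
  4. Real-time operation: use sliding windows and caching to improve real-time performance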

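5. Summary

This article implements a complete HMM-based isolated-word speech recognition system, consisting of:

  1. Preprocessing module: endpoint detection, pre-emphasis, framing and windowing
  2. Feature extraction module: MFCC features (including dynamic features)
  3. HMM training module: GMM-HMM training with the Baum-Welch algorithm
  4. Recognition module: likelihood scoring with the forward algorithm, Viterbi decoding
  5. Evaluation module: confusion matrix, accuracy, recall and related metrics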