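Python + LLM Classroom Teaching Quality Analysis | Weighted Scoring + Automated Diagnosis + Improvement Suggestions

This article combines Python data analysis, a weighted Euclidean distance algorithm, and a large language model (LLM) to build a system that scores classroom teaching quality automatically, diagnoses problems, and generates improvement suggestions. It analyzes three dimensions (teacher-student behavior, teaching discourse, and teacher-student emotion) and produces a classroom evaluation report. Typical uses include smart education, classroom diagnosis, and teaching research.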

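1. Project Background

As smart education develops, classroom evaluation relies more and more on data and automation. Traditional lesson observation depends on the observer's experience: it is slow, subjective, and hard to scale.

To address this, I built an automated classroom analysis system that:

Reads classroom analysis data from CSV files
Scores indicators across several dimensions (teacher-student behavior, discourse, emotion)
Computes the scores from a weighted Euclidean distance
Uses an LLM to generate an analysis of classroom problems
Uses an LLM to produce actionable improvement strategies

The final output is a complete classroom evaluation report that can be used directly in teaching research.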

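2. System Architecture

The system has two core modules:

Module 1: Classroom scoring (CalculateScore)

Reads four kinds of CSV analysis results

Loads the indicator weights

Computes the degree of difference with a weighted Euclidean distance

Computes the three dimension scores and the overall score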
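
As a rough illustration of the scoring step in Module 1, the sketch below computes a weighted Euclidean distance between a lesson's indicator vector and a benchmark vector, then maps it to a 0-100 score. The function name `weighted_score` and the sample numbers are hypothetical; the all-zero worst case used for normalization follows the assumption made in the full code later in this article.

```python
import math


def weighted_score(test, benchmark, weights):
    """Score = 100 * (1 - Dw / Dmax), clamped to the range [0, 100]."""
    if not (len(test) == len(benchmark) == len(weights)):
        raise ValueError("all three vectors must have the same length")
    # Weighted Euclidean distance between the lesson and the benchmark
    dw = math.sqrt(sum(w * (b - t) ** 2 for t, b, w in zip(test, benchmark, weights)))
    # Reference worst case: a lesson whose indicators are all zero
    dmax = math.sqrt(sum(w * b ** 2 for b, w in zip(benchmark, weights)))
    if dmax == 0:
        return 100.0
    return max(0.0, min(100.0, 100 * (1 - dw / dmax)))


# Made-up percentages for three indicators
score = weighted_score([30.0, 50.0, 20.0], [40.0, 40.0, 20.0], [0.5, 0.3, 0.2])
print(round(score, 2))
```

A lesson identical to the benchmark scores 100, and a lesson whose indicators are all zero scores 0. Because the all-zero case is only a reference point, a lesson that overshoots the benchmark by a wide margin can exceed it, which is why the result is clamped.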

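Module 2: AI classroom diagnosis (ProblemImprovement)

Reads the indicator data

Calls an LLM to generate the problem analysis

Has the LLM generate structured improvement suggestions

Outputs the full report text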
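
To make the diagnosis flow more concrete, here is a minimal sketch of how the lesson and benchmark indicator dicts could be turned into chat messages for an OpenAI-compatible endpoint. The helper name `build_messages`, the English prompt wording, and the sample values are assumptions for illustration; the actual prompt used in this project is the Chinese one in the full code.

```python
def build_messages(test_dict, quality_dict):
    """Build chat messages that compare a lesson with the benchmark."""
    lines = []
    for name, value in test_dict.items():
        ref = quality_dict.get(name)
        if ref is None:
            continue  # no benchmark value for this indicator
        lines.append(
            f"{name}: lesson {value:.1%}, benchmark {ref:.1%}, gap {value - ref:+.1%}"
        )
    user_content = (
        "Compare the lesson with the benchmark and describe the main problems.\n"
        + "\n".join(lines)
    )
    return [
        {"role": "system", "content": "You are a classroom teaching evaluation expert."},
        {"role": "user", "content": user_content},
    ]


messages = build_messages({"提问": 0.30}, {"提问": 0.40})
print(messages[1]["content"])
# With the openai>=1.0 client, a list like this is passed as
# client.chat.completions.create(model=..., messages=messages);
# the model name depends on the provider and is left out here.
```

Keeping the prompt construction separate from the network call makes it possible to inspect or unit-test the prompt without an API key.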

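3. Core Techniques

CSV parsing in Python

Scoring with a weighted Euclidean distance

Weight configuration in JSON

A context manager that reads files and in-memory streams uniformly

LLM calls through an OpenAI-compatible API

Prompt engineering for high-quality output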
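
The context-manager technique in the list above can be sketched on its own. The snippet below yields a `csv.DictReader` for either an `io.StringIO` stream or a file path, so calling code does not need to know which one it received. The function name `open_csv_rows` and the sample row are made up for illustration; the column names match the ones used in the full code.

```python
import csv
import io
from contextlib import contextmanager


@contextmanager
def open_csv_rows(source):
    """Yield a csv.DictReader for an in-memory stream or a file path."""
    if isinstance(source, io.StringIO):
        source.seek(0)  # rewind in case the stream was already read
        yield csv.DictReader(source)
    elif isinstance(source, str):
        # utf-8-sig tolerates a BOM; newline='' is what the csv module expects
        with open(source, "r", encoding="utf-8-sig", newline="") as f:
            yield csv.DictReader(f)
    else:
        raise TypeError(f"unsupported CSV source type: {type(source)}")


sample = io.StringIO("二级指标,占比,优质课\n全班讲解,35.5%,30.0%\n")
with open_csv_rows(sample) as reader:
    rows = list(reader)
print(rows[0]["占比"])
```

The file branch keeps the file open only while the `with` block runs, so rows should be consumed inside the block.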

4. Full Code Implementation

Core scoring code

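from contextlib import contextmanager


# Multi-dimensional classroom scorer (outline only; the full code follows)
class ClassroomScorer:
    def __init__(self, weight_data=None, csv_files=None):
        # Set up the weights, CSV sources, and caches
        pass

    # Read a CSV source (file path or in-memory stream)
    @contextmanager
    def _read_csv_source(self, source):
        yield None  # placeholder; the full version yields a csv.DictReader

    # Load the classroom data
    def _load_classroom_data(self):
        pass

    # Load the weights
    def _load_weight_data(self):
        pass

    # Weighted degree of difference
    def _calculate_weighted_difference(self, test_vector, quality_vector, weights):
        pass

    # Final score
    def _calculate_final_score(self, dw, dmax):
        pass

    # Build the detailed report
    def get_detailed_report(self):
        pass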
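
The weights come from a JSON configuration. As a minimal sketch of the expected shape, the snippet below parses a config and checks that the three scoring sections are present. The helper name `parse_weight_config` and the weight values are invented for illustration; the `local_weight_vectors` key and the section names are the ones the full code looks for.

```python
import json

REQUIRED_SECTIONS = ("师生行为", "教学话语", "师生情绪")


def parse_weight_config(text):
    """Parse a JSON weight config and return the per-section weight vectors."""
    data = json.loads(text)
    # Accept either the full structure or the bare {section: weights} mapping
    weights = data.get("local_weight_vectors", data)
    missing = [s for s in REQUIRED_SECTIONS if s not in weights]
    if missing:
        raise KeyError(f"missing weight sections: {missing}")
    return weights


sample = json.dumps(
    {"local_weight_vectors": {"师生行为": [0.6, 0.4], "教学话语": [0.5, 0.5], "师生情绪": [1.0]}},
    ensure_ascii=False,
)
weights = parse_weight_config(sample)
print(weights["师生行为"])
```

Each weight vector has to be as long as the indicator vector of its section, otherwise the distance calculation raises a length-mismatch error.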

Full code

import csv
import json
import math
import os
import io
from contextlib import contextmanager

class ClassroomScorer:
    """Classroom scorer that computes teaching quality evaluation scores."""

    def __init__(self, weight_data=None, csv_files=None):
        """Initialize the classroom scorer.

        Args:
            weight_data: Weight data, one of:
                        1. a dict of weights (used directly)
                        2. a file path string (read from that file)
                        3. None (use the default file path)
            csv_files: Dict of CSV sources; each value is either
                      1. an io.StringIO object (checked first), or
                      2. a file path string
                      Format: {
                          '师生行为': io.StringIO or 'path/to/师生行为.csv',
                          '话语形式': io.StringIO or 'path/to/话语形式.csv',
                          '话语功能': io.StringIO or 'path/to/话语功能.csv',
                          '师生情绪': io.StringIO or 'path/to/师生情绪.csv'
                      }
        """
        # Resolve the weight source
        if isinstance(weight_data, dict):
            # Use the weight dict as given
            self.weight_data = weight_data
            self.weight_file_path = None
        elif isinstance(weight_data, str):
            # Treat the string as a path to the weight file
            self.weight_data = None
            self.weight_file_path = weight_data
        elif weight_data is None:
            # Fall back to the default weight file
            self.weight_data = None
            self.weight_file_path = r"D:\software\Pycharm\语音转文字\话语分析\local_weight_vectors_20250801_115218.json"
        else:
            raise TypeError("weight_data must be a dict, a path string, or None")

        # Resolve the CSV sources
        if csv_files is None:
            # Default paths
            self.csv_files = {
                '师生行为': r".\doc\师生行为.csv",
                '话语形式': r".\doc\话语形式.csv",
                '话语功能': r".\doc\话语功能.csv",
                '师生情绪': r".\doc\师生情绪.csv"
            }
        elif isinstance(csv_files, dict):
            self.csv_files = csv_files
        else:
            raise TypeError("csv_files must be a dict")

        # Caches, so the sources are read only once
        self._classroom_data = None
        self._local_weights = None
    
    def _validate_csv_files(self):
        """Check that CSV sources given as file paths exist (streams are skipped)."""
        missing_files = []
        for name, file_source in self.csv_files.items():
            # Only string paths can be checked on disk
            if isinstance(file_source, str) and not os.path.exists(file_source):
                missing_files.append(f"{name}: {file_source}")

        if missing_files:
            raise FileNotFoundError(f"The following CSV files do not exist: {missing_files}")


    @contextmanager  # turns the generator into a context manager usable with `with`
    def _read_csv_source(self, source):
        """
        Context manager that yields a csv.DictReader for either kind of source:
        1. an in-memory io.StringIO object
        2. a local file path string
        """
        if isinstance(source, io.StringIO):
            # Rewind in case the stream has already been read
            source.seek(0)
            yield csv.DictReader(source)

        elif isinstance(source, str):
            # utf-8-sig also accepts UTF-8 files that start with a BOM;
            # newline='' lets the csv module handle line endings itself
            with open(source, 'r', encoding='utf-8-sig', newline='') as file_obj:
                yield csv.DictReader(file_obj)

        else:
            raise TypeError(f"Unsupported CSV source type: {type(source)}")
    
    def _load_classroom_data(self):
        """Read the four CSV sources and return the data grouped into three sections."""
        if self._classroom_data is not None:
            return self._classroom_data

        # Check that file paths exist (streams are skipped)
        self._validate_csv_files()

        # One pair of vectors per scoring section
        classroom_data = {
            '师生行为': {'test_vector': [], 'quality_vector': []},
            '教学话语': {'test_vector': [], 'quality_vector': []},
            '师生情绪': {'test_vector': [], 'quality_vector': []}
        }

        # (CSV key, indicator column, lesson value column, target section).
        # Both discourse files feed the same '教学话语' section, form first.
        sources = [
            ('师生行为', '二级指标', '占比', '师生行为'),
            ('话语形式', '一级指标', '汇总', '教学话语'),
            ('话语功能', '一级指标', '汇总', '教学话语'),
            ('师生情绪', '二级指标', '占比', '师生情绪'),
        ]

        try:
            for csv_key, label_col, value_col, section in sources:
                with self._read_csv_source(self.csv_files[csv_key]) as reader:
                    for row in reader:
                        if row.get('维度') == '维度':  # skip a repeated header row
                            continue
                        # `or ''` guards against None for cells missing from short rows
                        if not (row.get(label_col) or '').strip():
                            continue
                        ratio_str = (row.get(value_col) or '').strip()
                        quality_str = (row.get('优质课') or '').strip()
                        # Only rows where both cells are percentages are used
                        if not (ratio_str.endswith('%') and quality_str.endswith('%')):
                            continue
                        try:
                            # Convert both values before appending so the two
                            # vectors always stay the same length
                            ratio_value = float(ratio_str[:-1])
                            quality_value = float(quality_str[:-1])
                        except ValueError as e:
                            print(f"Warning: malformed {csv_key} data - {ratio_str}, {quality_str}: {e}")
                            continue
                        classroom_data[section]['test_vector'].append(ratio_value)
                        classroom_data[section]['quality_vector'].append(quality_value)
        except Exception as e:
            raise RuntimeError(f"Error while reading classroom data: {e}") from e

        # Cache the parsed data
        self._classroom_data = classroom_data
        return classroom_data
    
    def _validate_classroom_data(self, classroom_data):
        """Check that every section has data and that its two vectors match in length."""
        for section_name, section_data in classroom_data.items():
            test_len = len(section_data['test_vector'])
            quality_len = len(section_data['quality_vector'])

            if test_len == 0 or quality_len == 0:
                raise ValueError(f"Section {section_name} has no valid data")

            if test_len != quality_len:
                raise ValueError(f"Section {section_name}: lesson and benchmark vectors differ in length: {test_len} vs {quality_len}")
    
    def _load_weight_data(self):
        """Load the local (per-section) weight vectors."""
        if self._local_weights is not None:
            return self._local_weights

        if self.weight_data is not None:
            # A dict was passed in: accept either the full structure or the
            # bare {section: weights} mapping
            local_weights = self.weight_data.get('local_weight_vectors', self.weight_data)
        else:
            # Otherwise read the weights from the JSON file
            if not os.path.exists(self.weight_file_path):
                raise FileNotFoundError(f"Weight file not found: {self.weight_file_path}")

            try:
                with open(self.weight_file_path, 'r', encoding='utf-8') as file:
                    data = json.load(file)
            except json.JSONDecodeError as e:
                raise ValueError(f"Malformed weight file: {e}") from e
            except OSError as e:
                raise RuntimeError(f"Error while reading the weight file: {e}") from e

            if 'local_weight_vectors' not in data:
                raise KeyError("The weight file has no 'local_weight_vectors' field")
            local_weights = data['local_weight_vectors']

        # Every scoring section needs a weight vector
        required_sections = ['师生行为', '教学话语', '师生情绪']
        missing_sections = [section for section in required_sections if section not in local_weights]

        if missing_sections:
            raise KeyError(f"Weight data is missing these sections: {missing_sections}")

        self._local_weights = local_weights
        return local_weights
    
    def _calculate_weighted_difference(self, test_vector, quality_vector, weights):
        """Weighted degree of difference, computed as a weighted Euclidean distance:
        Dw = sqrt(sum_i Wi * (ei - ti)^2)
        where ei is the benchmark (high-quality lesson) value of indicator i,
        ti is the evaluated lesson's value, and Wi is the preset indicator weight.
        """
        if len(test_vector) != len(quality_vector) or len(test_vector) != len(weights):
            raise ValueError(f"Vector length mismatch: test={len(test_vector)}, quality={len(quality_vector)}, weights={len(weights)}")

        weighted_sum = 0
        for i in range(len(test_vector)):
            # Squared difference, scaled by the indicator weight
            diff_squared = (quality_vector[i] - test_vector[i]) ** 2
            weighted_sum += weights[i] * diff_squared

        return math.sqrt(weighted_sum)

    def _calculate_final_score(self, dw, dmax):
        """Final score: S = 100 * (1 - Dw / dmax)
        where Dw is the weighted difference and dmax is the reference maximum difference.
        """
        if dmax == 0:
            return 100  # no reference difference to compare against: full marks

        score = 100 * (1 - dw / dmax)
        # Clamp to the 0-100 range
        score = max(0, min(100, score))

        return score

    def _calculate_theoretical_max_difference(self, quality_vector, weights):
        """Reference maximum difference dmax.
        Assumes the worst case is a lesson vector of all zeros, so the difference
        for each indicator is the benchmark value itself. A lesson that overshoots
        the benchmark can exceed this value; its score is then clamped to 0.
        """
        weighted_sum = 0
        for i in range(len(quality_vector)):
            diff_squared = quality_vector[i] ** 2
            weighted_sum += weights[i] * diff_squared

        dmax = math.sqrt(weighted_sum)
        return dmax
    
    def get_scores(self):
        """Compute the scores and return them as [师生行为, 教学话语, 师生情绪]."""
        classroom_data = self._load_classroom_data()
        local_weights = self._load_weight_data()

        scores = []
        # The same three steps apply to each section: weighted difference,
        # reference maximum, then the 0-100 score
        for section in ('师生行为', '教学话语', '师生情绪'):
            test_vector = classroom_data[section]['test_vector']
            quality_vector = classroom_data[section]['quality_vector']
            weights = local_weights[section]

            dw = self._calculate_weighted_difference(test_vector, quality_vector, weights)
            dmax = self._calculate_theoretical_max_difference(quality_vector, weights)
            scores.append(self._calculate_final_score(dw, dmax))

        return scores
    
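    def get_detailed_report(self):
        """Build the detailed scoring report."""
        # Load the classroom data (dict format) and the weights
        classroom_data = self._load_classroom_data()
        local_weights = self._load_weight_data()

        # Scores for the three sections
        scores = self.get_scores()
        score_behavior, score_discourse, score_emotion = scores

        # Concatenated vectors, kept for backward compatibility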
        all_test_vector = (classroom_data['师生行为']['test_vector'] + 
                          classroom_data['教学话语']['test_vector'] + 
                          classroom_data['师生情绪']['test_vector'])
        all_quality_vector = (classroom_data['师生行为']['quality_vector'] + 
                             classroom_data['教学话语']['quality_vector'] + 
                             classroom_data['师生情绪']['quality_vector'])
        
        report = {
            '总体信息': {
                '待测课向量': all_test_vector,
                '优质课向量': all_quality_vector
            },
            '师生行为': {
                '待测课向量': classroom_data['师生行为']['test_vector'],
                '优质课向量': classroom_data['师生行为']['quality_vector'],
                '权重向量': local_weights['师生行为'],
                '得分': round(score_behavior, 2)
            },
            '教学话语': {
                '待测课向量': classroom_data['教学话语']['test_vector'],
                '优质课向量': classroom_data['教学话语']['quality_vector'],
                '权重向量': local_weights['教学话语'],
                '得分': round(score_discourse, 2)
            },
            '师生情绪': {
                '待测课向量': classroom_data['师生情绪']['test_vector'],
                '优质课向量': classroom_data['师生情绪']['quality_vector'],
                '权重向量': local_weights['师生情绪'],
                '得分': round(score_emotion, 2)
            },
            '总结': {
                '师生行为得分': round(score_behavior, 2),
                '教学话语得分': round(score_discourse, 2),
                '师生情绪得分': round(score_emotion, 2)
            }
        }
        
        return report
    
    def print_detailed_report(self):
        """Print the detailed report and return a summary dict with the overall score."""
        try:
            report = self.get_detailed_report()

            print(f"Lesson vector T = {report['总体信息']['待测课向量']}")
            print(f"Benchmark vector Q = {report['总体信息']['优质课向量']}")

            for section in ('师生行为', '教学话语', '师生情绪'):
                print(f"\n=== {section} ===")
                print(f"Lesson vector: {report[section]['待测课向量']}")
                print(f"Benchmark vector: {report[section]['优质课向量']}")
                print(f"Weights: {report[section]['权重向量']}")
                print(f"Score: {report[section]['得分']}")

            print("\n=== Summary ===")
            totals = report['总结']
            summary = {
                '师生行为得分': totals['师生行为得分'],
                '教学话语得分': totals['教学话语得分'],
                '师生情绪得分': totals['师生情绪得分'],
                # Overall score: unweighted mean of the three section scores
                '综合课堂得分': (totals['师生行为得分'] + totals['教学话语得分'] + totals['师生情绪得分']) / 3
            }
            return summary

        except Exception as e:
            print(f"Error while generating the report: {e}")
            return None





if __name__ == "__main__":

    csv_files = {
        '师生行为': r".\doc\师生行为.csv",
        '话语形式': r".\doc\话语形式.csv",
        '话语功能': r".\doc\话语功能.csv",
        '师生情绪': r".\doc\师生情绪.csv"
    }
    weight_data = r"D:\software\Pycharm\语音转文字\话语分析\local_weight_vectors_20250801_115218.json"
    try:
        # Create the scorer
        scorer = ClassroomScorer(csv_files=csv_files, weight_data=weight_data)

        # Print the detailed report, then show the summary dict
        summary = scorer.print_detailed_report()
        print(summary)

    except Exception as e:
        print(f"Program error: {e}")
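
The loading code depends on cells such as `35.5%` being converted to floats and on anything else being skipped. A small helper for that conversion might look like the sketch below; the name `parse_percent` is hypothetical and not part of the class above.

```python
def parse_percent(cell):
    """Return the numeric part of a cell like '35.5%', or None if it is not a percentage."""
    if cell is None:
        return None
    text = cell.strip()
    if not text.endswith("%"):
        return None
    try:
        return float(text[:-1])
    except ValueError:
        return None


for cell in (" 35.5% ", "12", "abc%", None):
    print(repr(cell), "->", parse_percent(cell))
```

Returning `None` instead of raising lets the caller skip a row only when both the lesson value and the benchmark value parsed, which keeps the two vectors aligned.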

AI classroom diagnosis code

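# LLM-based problem analysis and improvement suggestions (outline only)
class ProblemImprovement:
    def __init__(self, api_key=None, csv_files=None, weight_file_path=None):
        # Set up the LLM client and the indicator system
        pass

    # Read the CSV data (the full code has one extractor per file type)
    def extract_teacher_student_data(self, file_path):
        pass

    # Build the lesson / benchmark indicator dicts
    def get_dict(self):
        pass

    # LLM-generated problem analysis
    def get_problem_analysis(self):
        pass

    # LLM-generated improvement suggestions
    def get_improvement_suggestions(self):
        pass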
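
The `get_dict` step pairs indicator names with the extracted ratios. The sketch below shows that pairing in isolation and rejects repeated names, since a dict keeps only the last value written for a key. The function name `pair_indicators` and the sample ratios are assumptions for illustration.

```python
def pair_indicators(names, values):
    """Map indicator names to values, stopping at the shorter sequence."""
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        # A repeated name would silently overwrite an earlier value
        raise ValueError(f"duplicate indicator names: {duplicates}")
    return dict(zip(names, values))


paired = pair_indicators(["陈述", "提问", "回答"], [0.5, 0.3])
print(paired)
```

Names without a matching value are dropped, which mirrors the length check in the full code.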

Full code

# -*- coding: utf-8 -*-
import openai
import json
import csv
import os
from typing import Dict, List, Tuple
import re


class ProblemImprovement:
    def __init__(self, api_key=None, csv_files=None, weight_file_path=None):
        self.api = api_key
        self.client = openai.OpenAI(
            api_key=self.api,
            base_url="https://dashscope.aliyuncs.com/compatible-mode/v1")
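        self.indicators = {
            '师生行为': ['全班讲解', '个别指导', '教师巡视', '学生听讲', '学生回答', '学生做题', '小组活动',
                         '汇报展示'],
            '关键能力': ['知识理解', '表达交流', '实践应用', '创造迁移'],
            '关键行为': ['陈述', '提问', '回答', '反馈', '管理'],
            # Names must be unique: get_dict() uses them as dict keys, so repeated
            # names would overwrite each other. The order matches
            # extract_emotion_data(): teacher emotion, student emotion, attention.
            '情感体验': ['教师情绪-积极', '教师情绪-中性', '教师情绪-消极',
                         '学生情绪-积极', '学生情绪-中性', '学生情绪-消极', '学生注意力-积极']
        }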
        self.weight_file_path = weight_file_path
        self.csv_files = csv_files

    def read_csv_data(self, file_path: str, column_name: str) -> List[str]:
        """Read a CSV file and return the values of one column as a list."""
        data = []
        try:
            with open(file_path, 'r', encoding='utf-8') as file:
                # Strip a BOM if present
                content = file.read()
                content = content.lstrip('\ufeff')

                # Parse the CSV text
                reader = csv.reader(content.splitlines())
                headers = next(reader)  # header row

                # Find the index of the target column
                try:
                    column_index = headers.index(column_name)
                except ValueError:
                    # No exact match: fall back to the first header containing the name
                    for i, header in enumerate(headers):
                        if column_name in header:
                            column_index = i
                            break
                    else:
                        raise ValueError(f"Column not found in {file_path}: {column_name}")

                # Read the values
                for row in reader:
                    if len(row) > column_index and row[column_index].strip():
                        # Drop a trailing percent sign
                        value = row[column_index].strip().rstrip('%')
                        try:
                            data.append(float(value))
                        except ValueError:
                            # Not numeric: keep the original text
                            data.append(value)

        except Exception as e:
            raise Exception(f"Error reading CSV file {file_path}: {str(e)}") from e
        return data

    def extract_teacher_student_data(self, file_path: str) -> Dict[str, List[float]]:
        """Extract the teacher-student behavior data."""
        result = {'test': [], 'quality': []}
        try:
            with open(file_path, 'r', encoding='utf-8') as file:
                content = file.read()
                content = content.lstrip('\ufeff')  # strip a BOM if present
                reader = csv.reader(content.splitlines())
                headers = next(reader)
                # Locate the ratio and benchmark columns
                ratio_index = -1
                quality_index = -1
                for i, header in enumerate(headers):
                    if '占比' in header:
                        ratio_index = i
                    elif '优质课' in header:
                        quality_index = i
                if ratio_index == -1 or quality_index == -1:
                    raise ValueError("Could not find the 占比 or 优质课 column")
                # Read every data row (the header was consumed above)
                for row in reader:
                    # Skip empty rows
                    if not row:
                        continue
                    try:
                        # Parse the percentage values
                        test_value = float(row[ratio_index].strip().rstrip('%'))
                        quality_value = float(row[quality_index].strip().rstrip('%'))

                        # Store as fractions rather than percentages
                        result['test'].append(test_value / 100)
                        result['quality'].append(quality_value / 100)
                    except (ValueError, IndexError):
                        continue

        except Exception as e:
            raise Exception(f"Error while extracting behavior data: {str(e)}") from e

        return result

    def extract_discourse_data(self, file_path: str, is_form: bool = False) -> Dict[str, List[float]]:
        """Extract discourse-form or discourse-function data.

        Both file types share the same layout, so `is_form` is currently unused.
        """
        result = {'test': [], 'quality': []}
        try:
            with open(file_path, 'r', encoding='utf-8') as file:
                content = file.read()
                content = content.lstrip('\ufeff')  # strip a BOM if present
                reader = csv.reader(content.splitlines())
                headers = next(reader)

                # The 汇总 column holds the lesson's ratio per first-level
                # indicator; the 优质课 column holds the benchmark
                ratio_index = -1
                quality_index = -1

                for i, header in enumerate(headers):
                    if '汇总' in header:
                        ratio_index = i
                    elif '优质课' in header:
                        quality_index = i

                if ratio_index == -1 or quality_index == -1:
                    raise ValueError("Could not find the 汇总 or 优质课 column")

                for row in reader:
                    if not row:
                        continue

                    try:
                        # Only summary rows have a value in the 汇总 column
                        ratio_val = row[ratio_index].strip()
                        quality_val = row[quality_index].strip()

                        if ratio_val and quality_val:
                            test_value = float(ratio_val.rstrip('%'))
                            quality_value = float(quality_val.rstrip('%'))

                            result['test'].append(test_value / 100)
                            result['quality'].append(quality_value / 100)
                    except (ValueError, IndexError):
                        continue

        except Exception as e:
            raise Exception(f"Error while extracting discourse data: {str(e)}") from e
        return result


    def extract_emotion_data(self, file_path: str) -> Dict[str, List[float]]:
        """Extract the emotion data (teacher emotion, student emotion, student attention)."""
        result = {'test': [], 'quality': []}
        try:
            with open(file_path, 'r', encoding='utf-8') as file:
                content = file.read()
                content = content.lstrip('\ufeff')  # strip a BOM if present
                reader = csv.reader(content.splitlines())
                headers = next(reader)

                # Locate the ratio and benchmark columns
                ratio_index = -1
                quality_index = -1
                for i, header in enumerate(headers):
                    if '占比' in header:
                        ratio_index = i
                    elif '优质课' in header:
                        quality_index = i

                if ratio_index == -1 or quality_index == -1:
                    raise ValueError("Could not find the 占比 or 优质课 column")

                # Category (column 1) -> labels (column 2) to keep. Dict order is
                # the output order: teacher emotion, student emotion, attention.
                wanted = {
                    '教师情绪': ('积极', '中性', '消极'),
                    '学生情绪': ('积极', '中性', '消极'),
                    '学生注意力': ('积极',),
                }
                collected = {category: [] for category in wanted}

                for row in reader:
                    # The row must reach the category, label, and both value columns
                    if len(row) <= max(ratio_index, quality_index, 2):
                        continue
                    category, label = row[1], row[2]
                    if label not in wanted.get(category, ()):
                        continue
                    try:
                        test_value = float(row[ratio_index].strip().rstrip('%'))
                        quality_value = float(row[quality_index].strip().rstrip('%'))
                    except ValueError:
                        continue
                    # Store as fractions rather than percentages
                    collected[category].append((test_value / 100, quality_value / 100))

                # Flatten the groups in the fixed category order
                for category in wanted:
                    for test, quality in collected[category]:
                        result['test'].append(test)
                        result['quality'].append(quality)

        except Exception as e:
            raise Exception(f"Error while extracting emotion data: {str(e)}") from e
        return result

    def get_dict(self) -> tuple:
        """Read the classroom evaluation data from the four CSV files.

        Returns a tuple (test_dict, quality_dict) mapping each indicator to its share
        for the lesson under evaluation and for the benchmark (high-quality) lesson.
        A CSV that is not configured or does not exist on disk is skipped silently.
        """
        try:
            # Per-dimension vectors for the evaluated lesson and the benchmark lesson
            test_vectors = {}
            quality_vectors = {}

            # Teacher-student behavior data
            if '师生行为' in self.csv_files and os.path.exists(self.csv_files['师生行为']):
                behavior_data = self.extract_teacher_student_data(self.csv_files['师生行为'])
                test_vectors['师生行为'] = behavior_data['test']
                quality_vectors['师生行为'] = behavior_data['quality']

            # Discourse-function data (maps to the "关键能力" / key competencies dimension)
            if '话语功能' in self.csv_files and os.path.exists(self.csv_files['话语功能']):
                discourse_function_data = self.extract_discourse_data(self.csv_files['话语功能'])
                test_vectors['关键能力'] = discourse_function_data['test']
                quality_vectors['关键能力'] = discourse_function_data['quality']

            # Discourse-form data (maps to the "关键行为" / key behaviors dimension)
            if '话语形式' in self.csv_files and os.path.exists(self.csv_files['话语形式']):
                discourse_form_data = self.extract_discourse_data(self.csv_files['话语形式'], is_form=True)
                test_vectors['关键行为'] = discourse_form_data['test']
                quality_vectors['关键行为'] = discourse_form_data['quality']

            # Teacher-student emotion data (maps to the "情感体验" / emotional experience dimension)
            if '师生情绪' in self.csv_files and os.path.exists(self.csv_files['师生情绪']):
                emotion_data = self.extract_emotion_data(self.csv_files['师生情绪'])
                test_vectors['情感体验'] = emotion_data['test']
                quality_vectors['情感体验'] = emotion_data['quality']

            # Build the {indicator: share} dicts for the evaluated lesson and the benchmark lesson
            test_dict = {}
            quality_dict = {}

            # Walk the dimensions in the order defined by self.indicators.
            # zip() stops at the shorter sequence, so surplus indicators or values are ignored.
            for dimension, indicators_list in self.indicators.items():
                if dimension in test_vectors:
                    test_dict.update(zip(indicators_list, test_vectors[dimension]))

                if dimension in quality_vectors:
                    quality_dict.update(zip(indicators_list, quality_vectors[dimension]))

            # Return the tuple (evaluated-lesson dict, benchmark-lesson dict)
            return test_dict, quality_dict

        except FileNotFoundError as e:
            raise FileNotFoundError(f"找不到必要的数据文件: {str(e)}") from e
        except Exception as e:
            raise Exception(f"读取数据时出错: {str(e)}") from e

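    def get_problem_analysis(self) -> dict:
        """Generate the problem analysis; returns a dict with a "status" key plus "problem_analysis" or "error_message"."""
        test_dict, quality_dict = self.get_dict()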

        prompt = f"""
# 任务
你的核心任务是通过对比待测课和优质课的指标占比，对课堂教学中存在的主要问题和不足进行分析。

# 输出要求
通过对比待测课和优质课的指标占比，分析课堂教学中存在的主要问题和不足。具体为："数据表明，本节课各项指标表现...，但在...仍有一定的改进空间。例如，...环节虽...，但...，该环节...占比低于优质课，...。在...层面，虽然...，但...，...，特别是...有待加强。...环节主要是...，...类话语较少，表现为...，缺乏...。此外，..."

# 关键要求
- 背景要求: 必须先对比待测课和优质课的指标占比，了解该课堂中老师采用的教学方式，以及学生的学习情况。知晓待测课在不同方面与优质课的差距。
- 结构要求: 必须以制表符(\t)或4个空格开头实现首行缩进。内容的格式遵从每个要素中"具体为："后面的内容。
- 内容要求: 必须生成一个内容完整连贯的段落，不准分段。
- 格式要求: 严禁使用任何形式的列表（如 1, 2, 3）、分点符号（•）或JSON格式。 输出内容除了这段文字外，不应包含任何标题、解释、前言或任何其他额外标记。
- 段落要求： 最多只允许输出内容只有一个换行符，不允许有两个及以上。

#输出示例：
\t数据表明,本节课各项指标表现已贴近优质课的水平,但在关键教学环节仍有一定的改进空间。例如,课堂导入环节教师虽能有效创设情境,但学生思维启动过程稍显局促,该环节个人表达类的话语占比低于优质课,更多依赖教师单向引导,未能充分展现学生从生活经验到数学概念的自主建构过程。在活动实施层面,虽设计多元探究任务,但部分活动未能有效激发全体学生的认知投入,特别是基础薄弱学生的思维卷入度有待加强。课堂总结环节主要是对知识点的归纳总结,比较归纳、迁移创新、情感态度类话语较少,表现为对知识体系的整合力度不足,缺乏对概率概念与生活决策、统计方法之间的结构化关联阐释。此外,信息技术工具的应用尚停留在基础演示功能,未能充分发挥数字技术对认知难点突破和思维可视化的支持作用。

# 下面是你需要对比的待测课和优质课指标占比：
待测课指标占比: {test_dict}
优质课指标占比: {quality_dict}
"""

        try:
            response = self.client.chat.completions.create(
                model="deepseek-v3",
                messages=[
                    {
                        "role": "system",
                        "content": "你是一个擅长对课堂进行问题分析的分析助手。"
                    },
                    {
                        "role": "user",
                        "content": f"{prompt}"
                    }
                ],
                max_tokens=1500,
                temperature=0.5
            )

            response = response.choices[0].message.content.strip()
            result = {
                "status": "success",
                "problem_analysis": response,
            }

            return result

        except Exception as e:
            # Return the error details as a dict (same shape as the success result)
            error_result = {
                "status": "error",
                "error_message": str(e),
            }
            return error_result

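    def get_improvement_suggestions(self) -> dict:
        """Generate improvement suggestions; returns a dict with a "status" key plus "improvement_suggestions" or "error_message"."""
        test_dict, quality_dict = self.get_dict()
        problem_analysis_result = self.get_problem_analysis()
        # get_problem_analysis() returns a dict; fall back to "" if it reported an error
        if isinstance(problem_analysis_result, dict):
            problem_analysis = problem_analysis_result.get("problem_analysis", "")
        else:
            problem_analysis = problem_analysis_result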

        prompt = f"""
# 任务
你的核心任务是通过对比待测课和优质课的指标占比，针对课堂教学的问题分析结果提出具体的改进策略和实施方案。

# 输出要求
针对发现的问题提出具体的改进策略和实施方案。

# 关键要求
- 背景要求: 必须先对比待测课和优质课的指标占比，读取问题分析来了解该课堂中存在的问题。
- 内容要求: 都是小标题+具体改进建议的形式。严禁使用除了（一）（二）等中文序号外的其他列表形式、分点符号或JSON格式。
- 输出内容除了改进建议外，不应包含任何额外的标题、解释、前言或其他标记。
- 段落要求： 最多只允许输出内容出现一个换行，不允许出现两个及以上的换行符，一定要注意。

#输出示例：
(一)情境导入环节的思维激活优化：
    教师在足球点球情境的引入上已成功激发学生兴趣,可进一步激活学生的思维水平,触及概率比较的本质。例如构建三级问题链深化思考:第一级抛出对比情境"球员A过去10次进7球,球员B过去5次进3球,优先选派谁?",利用磁贴投票暴露学生"仅比较绝对值"的思维惯性;第二级制造认知冲突"若A参赛100次进70球,B参赛50次进30球,谁更稳定?",引导学生发现比率比较的价值;第三级抽象转化"如何用数学工具量化可能性?",借助动态决策天平教具,左侧放置历史进球数卡片,右侧生成概率数值,实现生活问题向数学概念的思维过渡。
(二)核心概念建构的认知冲突设计：
    公式生成环节教师引导学生发现概率计算方法,建议进一步加强对"等可能性"前提的认知建构。例如增设非常规骰子对比实验:提供3D打印的重心偏移骰子,让学生分组收集50次投掷数据,与标准骰子实验对比。当学生发现偏移骰子出现6点的频率高达28%时(理论值16.7%),引导其自主总结"等可能性"对古典概型的重要性。利用双屏互动装置,左屏呈现理论计算过程,右屏动态绘制实验频率折线图,用红色标注偏差超过10%的数据点,强化视觉认知冲击。
(三)迁移应用的真实情境重构：
    概率计算练习已完成基础目标,建议增加真实决策场景的迁移训练。例如将情境改造为校园歌手大赛晋级方案设计给定甲班28人、乙班24人、丙班20人的报名数据,要求设计"公平且利于本校选手"的抽签规则。通过角色扮演(组委会代表、参赛学生、家长监督员)展开三方辩论,在"概率计算(如甲班晋级概率=28/72)""实际公平性""操作可行性"的多维博弈中,理解概率的工具性与局限性。利用AR沙盘实时生成不同方案的概率分布云图,直观呈现决策影响。
(四)技术工具的认知深化应用：
    现有技术使用多停留在数据展示层面,建议发挥深层认知建构作用。例如开发智能概率模拟器,设置两大核心功能:一是"实验参数调节器",允许学生滑动调整实验次数(10-10000次)、骰子偏心度等变量,同步生成概率分布玫瑰图;二是"认知误区捕捉器",当出现"10次实验得3次6点就断言概率30%"等错误时,自动触发对比模块：左侧回放该组实验片段,右侧叠加大数据模拟的100万次实验频率收敛动画,通过视觉反差揭示"小样本误差"本质。

# 下面是你需要对比的待测课和优质课指标占比以及问题分析结果：
待测课指标占比: {test_dict}
优质课指标占比: {quality_dict}
问题分析结果: {problem_analysis}
"""

        try:
            response = self.client.chat.completions.create(
                model="deepseek-v3",
                messages=[
                    {
                        "role": "system",
                        "content": "你是一个擅长提供课堂改进建议的分析助手。"
                    },
                    {
                        "role": "user",
                        "content": f"{prompt}"
                    }
                ],
                max_tokens=1500,
                temperature=0.5
            )

            response = response.choices[0].message.content.strip()
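            response = re.sub(r'\n\s*\n', '\n', response)  # collapse blank lines into single newlines
            # Strip leading tabs/spaces on every line; in a raw string the class is [\t ], not [\\t ]
            response = re.sub(r'^[\t ]+', '', response, flags=re.MULTILINE)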

            result = {
                "status": "success",
                "improvement_suggestions": response,
            }

            return result

        except Exception as e:
            # Return the error details as a dict (same shape as the success result)
            error_result = {
                "status": "error",
                "error_message": str(e),
            }
            return error_result


if __name__ == "__main__":
    # Set the API key for the OpenAI-compatible endpoint
    API_KEY = "YOUR_API_KEY"  # placeholder: replace with your actual API key and never commit a real key
    csv_files = {
        '师生行为': r".\doc\师生行为.csv",
        '话语形式': r".\doc\话语形式.csv",
        '话语功能': r".\doc\话语功能.csv",
        '师生情绪': r".\doc\师生情绪.csv"
    }

    weight_file_path = r'.\result\weight\local_weight_vectors_20250801_115218.json'

    # Create the ProblemImprovement instance
    problem_improvement = ProblemImprovement(api_key=API_KEY, csv_files=csv_files, weight_file_path=weight_file_path)

    try:
        # Generate the problem analysis, then the improvement suggestions
        problem_analysis = problem_improvement.get_problem_analysis()["problem_analysis"]
        improvement_suggestions = problem_improvement.get_improvement_suggestions()["improvement_suggestions"]
        print("问题分析：", problem_analysis)
        print("改进建议：", improvement_suggestions)
    except Exception as e:
        print(f"处理失败: {str(e)}")
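
Both `get_problem_analysis()` and `get_improvement_suggestions()` return a dict whose `"status"` is either `"success"` or `"error"`, so indexing the result key directly raises a `KeyError` when the model call fails. The following is a minimal sketch of a guard for that case; `unwrap_result` is a hypothetical helper that is not part of the original code, and the two sample dicts only mimic the shapes returned above.

```python
def unwrap_result(result, key):
    """Return result[key] on success, otherwise raise with the reported error message."""
    if not isinstance(result, dict):
        raise RuntimeError(f"unexpected result type: {type(result).__name__}")
    if result.get("status") != "success":
        # The error branch of the methods above stores the reason under "error_message"
        raise RuntimeError(f"model call failed: {result.get('error_message', 'unknown error')}")
    return result[key]


# Sample dicts shaped like the return values of the two methods (contents are made up)
ok = {"status": "success", "problem_analysis": "sample text"}
failed = {"status": "error", "error_message": "timeout"}

print(unwrap_result(ok, "problem_analysis"))
try:
    unwrap_result(failed, "problem_analysis")
except RuntimeError as exc:
    print(exc)
```

With a guard like this, the `__main__` block would report the actual API error instead of the name of a missing dict key.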

5. Sample Run Output

1. Automatic scoring output


C:\Users\Dell\AppData\Local\Programs\Python\Python39\python.exe D:\software\Pycharm\语音转文字\话语分析\calculate_score.py 
待测课向量 T = [53.0, 24.0, 32.0, 34.0, 33.0, 18.0, 29.0, 11.0, 3.3, 19.1, 35.2, 33.8, 8.6, 78.3, 2.2, 19.6, 0.0, 70.0, 20.0, 10.0, 65.0, 25.0, 10.0]
优质课向量 Q = [49.0, 28.0, 30.0, 30.0, 36.0, 15.0, 33.0, 14.0, 12.0, 14.0, 34.0, 33.0, 7.0, 51.0, 20.0, 22.0, 7.0, 85.0, 10.0, 5.0, 85.0, 15.0, 5.0]

=== 师生行为部分 ===
待测课向量: [53.0, 24.0, 32.0, 34.0, 33.0, 18.0, 29.0, 11.0]
优质课向量: [49.0, 28.0, 30.0, 30.0, 36.0, 15.0, 33.0, 14.0]
权重向量: [0.1861550061664629, 0.14593097982317685, 0.16791401401036032, 0.11385959210352276, 0.1167235300069847, 0.0726401383722469, 0.1167235300069847, 0.08005320951026097]
师生行为得分: 89.63

=== 教学话语部分 ===
待测课向量: [3.3, 19.1, 35.2, 33.8, 8.6, 78.3, 2.2, 19.6, 0.0]
优质课向量: [12.0, 14.0, 34.0, 33.0, 7.0, 51.0, 20.0, 22.0, 7.0]
权重向量: [0.09795474399771706, 0.09795474399771704, 0.06499306480911246, 0.09701990597177323, 0.14207754122368021, 0.1288106084655074, 0.15210192308588644, 0.14216286325684574, 0.07692460519176048]
教学话语得分: 51.94

=== 师生情绪部分 ===
待测课向量: [70.0, 20.0, 10.0, 65.0, 25.0, 10.0]
优质课向量: [85.0, 10.0, 5.0, 85.0, 15.0, 5.0]
权重向量: [0.19915215095913286, 0.13123933861512038, 0.16960851042574673, 0.15011599018428204, 0.14633878740167885, 0.2035452224140391]
师生情绪得分: 76.54

=== 总结 ===
{'师生行为得分': 89.63, '教学话语得分': 51.94, '师生情绪得分': 76.54, '综合课堂得分': 72.70333333333333}
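
The numbers above can be partly re-derived by hand. The sketch below computes the weighted Euclidean distance d = sqrt(Σ w_i (t_i − q_i)²) for the teacher-student behavior vectors, and checks that the overall score is the plain average of the three dimension scores. The function name `weighted_euclidean_distance` is an assumption for illustration, and the step that converts a distance into a 0-100 score is not shown in this section, so the sketch deliberately stops at the distance.

```python
import math

# Values copied from the "师生行为" (teacher-student behavior) block of the output above
test_vec = [53.0, 24.0, 32.0, 34.0, 33.0, 18.0, 29.0, 11.0]
quality_vec = [49.0, 28.0, 30.0, 30.0, 36.0, 15.0, 33.0, 14.0]
weights = [0.1861550061664629, 0.14593097982317685, 0.16791401401036032,
           0.11385959210352276, 0.1167235300069847, 0.0726401383722469,
           0.1167235300069847, 0.08005320951026097]


def weighted_euclidean_distance(t, q, w):
    """sqrt(sum_i w_i * (t_i - q_i)^2): larger values mean a bigger gap to the benchmark."""
    return math.sqrt(sum(wi * (ti - qi) ** 2 for ti, qi, wi in zip(t, q, w)))


distance = weighted_euclidean_distance(test_vec, quality_vec, weights)

# The overall score in the summary dict equals the mean of the three dimension scores
dimension_scores = {"师生行为得分": 89.63, "教学话语得分": 51.94, "师生情绪得分": 76.54}
overall = sum(dimension_scores.values()) / len(dimension_scores)

print(f"weighted distance: {distance:.4f}")
print(f"overall score: {overall:.2f}")
```

The weights sum to 1, the distance comes out at roughly 3.48 percentage points, and the average reproduces the reported overall score of about 72.70.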

2. AI-generated problem analysis and improvement suggestions


C:\Users\Dell\AppData\Local\Programs\Python\Python39\python.exe D:\software\Pycharm\语音转文字\话语分析\problem_improvement.py 
问题分析： 数据表明，本节课各项指标表现与优质课存在一定差距，但在教学环节设计和学生参与度上仍有一定的改进空间。例如，全班讲解环节虽占比高于优质课，但学生回答和小组活动占比低于优质课，该环节学生互动性不足，更多依赖教师单向输出，未能充分激发学生的主动思考和合作探究。在认知层次层面，虽然知识理解占比显著高于优质课，但表达交流、实践应用和创造迁移占比明显不足，特别是创造迁移类活动完全缺失，高阶思维培养有待加强。师生互动环节主要是以回答和反馈为主，陈述和提问类话语较少，表现为教师引导性提问不足，缺乏对学生深度思考的激发。此外，学生情感状态中积极和中性占比均低于优质课，课堂氛围的活跃度和学生投入度有待提升。
改进建议： (一)优化全班讲解环节的师生互动：将教师单向讲解时间压缩至45%以下，每8分钟插入"思考-配对-分享"环节。例如讲解三角函数定义时，先让学生独立绘制单位圆示意图，再与邻座交换批注，最后随机抽选3组用实物投影仪展示并互评，教师仅针对共性错误进行精讲。
(二)提升小组活动的认知参与度：重组4人异质小组时指定角色分工（记录员、发言员、材料员、计时员），每组配备可视化思维工具包。如概率单元可发放双色计数筹码和统计转盘，要求用筹码构建二项分布模型后，用转盘验证理论概率，最终形成图文报告张贴于教室"数学发现墙"。
(三)强化汇报展示的思维外显：将展示环节占比提升至13%，采用"3×3演讲法"，每组3分钟展示需包含1个核心观点、1个验证过程、1个现实应用。例如统计案例展示时，要求学生用平板电脑实时生成数据散点图，同步解说数据采集方法，并演示如何用该模型预测校园食堂就餐人数。
(四)深化课堂互动的话语质量：设计"问题三阶跳"模板，基础问题（如公式套用）由学生互问互答，教师专注提出能引发认知冲突的对比性问题（如"古典概型与几何概型的适用边界"），并采用"追问链"技术，对关键回答进行3层递进追问，引导学生暴露思维过程。
(五)营造积极课堂氛围的激励机制：引入"数学银行"积分系统，将课堂参与行为货币化。如提出创新解法可获"思维币"，帮助同学解决难题可获"互助券"，积分可用于兑换自主作业减免或实验器材优先使用权，每周公布"财富榜"并举行小型拍卖会。
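
The flat layout of the suggestions above, with no blank lines and no indentation, is produced by the regex post-processing in `get_improvement_suggestions()`. A minimal standalone sketch of that cleanup follows; `normalize_model_text` is a hypothetical name and the sample string is made up. Inside a raw string the character class has to be written `[\t ]`; writing `[\\t ]` would match a literal backslash and the letter `t` rather than a tab.

```python
import re


def normalize_model_text(text):
    """Collapse blank lines and strip leading tabs/spaces from every line."""
    text = text.strip()
    text = re.sub(r'\n\s*\n', '\n', text)                    # blank lines -> single newline
    text = re.sub(r'^[\t ]+', '', text, flags=re.MULTILINE)  # per-line leading whitespace
    return text


# Made-up model output with a blank line, a space indent, and a tab indent
raw = "(1) Title:\n\n    body one\n\t(2) Title:\n    body two\n"
cleaned = normalize_model_text(raw)
print(cleaned)
```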

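6. Project Highlights

Fully automated: a single run produces the complete lesson evaluation report, with no manual intervention

Multi-dimensional: covers teacher-student behavior, teaching discourse, and teacher-student emotion

Interpretable: each score is derived from a weighted Euclidean distance to a benchmark lesson, so a low score can be traced back to the indicators with the largest gaps

Professional: the AI output follows the conventions of teaching-research reports and is intended to be usable as written

Extensible: accepts file paths or in-memory streams, so it can be plugged into a web system or an API service

Actionable: the improvement suggestions are specific and practical enough to apply directly in the classroom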
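
The file-or-memory-stream support mentioned in the list can be sketched with a small context manager. This is an illustration under stated assumptions, not the author's `_read_csv_source`: the names `open_csv_source` and `read_rows`, the UTF-8 encoding, and the sample CSV columns are all hypothetical.

```python
import csv
import io
from contextlib import contextmanager


@contextmanager
def open_csv_source(source):
    """Yield a text stream for either a file path or an in-memory io.StringIO."""
    if isinstance(source, io.StringIO):
        source.seek(0)  # rewind so repeated reads start from the top
        yield source    # the caller owns the buffer, so it is not closed here
    else:
        # Assumed encoding; newline="" is what the csv module recommends for files
        with open(source, newline="", encoding="utf-8") as f:
            yield f


def read_rows(source):
    """Read all CSV rows from a path or a StringIO through the same code path."""
    with open_csv_source(source) as stream:
        return list(csv.reader(stream))


# An uploaded file in a web service would typically arrive as an in-memory stream
memory_csv = io.StringIO("indicator,test,quality\nlecture,53,49\n")
rows = read_rows(memory_csv)
print(rows)
```

Because the stream is rewound on entry and left open on exit, the same `StringIO` object can be passed to several readers in turn.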

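7. Use Cases

Smart-classroom analytics platforms

Teachers' self-diagnosis of their own lessons

School teaching-research and lesson-evaluation systems

Regional education quality monitoring

Teaching-skills training for student teachers

Automated evaluation of open lessons and showcase lessons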

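8. Summary

This article implemented an AI-driven, fully automated system for analyzing classroom teaching quality. By combining multi-dimensional data scoring with LLM-based diagnosis, it moves lesson evaluation away from subjective human judgment and toward data-based assessment.

The system produces lesson scores, a problem analysis, and improvement suggestions in a single run, which can raise teaching-research efficiency and lower the cost of lesson evaluation, making it a practical tool for smart-education work.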