Generating Git Commit Messages Automatically with AI: git AI Commit

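In modern software development, clear and consistent Git commit messages are essential for maintaining project history and supporting team collaboration. Writing a descriptive message by hand for every change takes time, though, especially with complex diffs or large projects. AI Commit is a Python script that uses an LLM to analyze the Git diff and generate commit messages that follow the Conventional Commits specification. This article covers how it works, its core features, and how to fit it into your development workflow.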

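What is AI Commit?

AI Commit is a Python tool that analyzes staged Git changes (git diff --cached) and uses a large language model (LLM) to generate concise, context-aware commit messages. It calls an external API (such as Alibaba Cloud's Qwen models), tries to identify the type of change (new feature, bug fix, refactoring, and so on), and produces a message in the conventional format so that the project history stays readable.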

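Core Features

1. Smart diff analysis

The script parses the Git diff and extracts key information:

File changes: identifies added, modified, and deleted files.
Change type: detects refactoring, new features, bug fixes, documentation updates, or configuration changes, using file extensions and keyword heuristics.
Priority file analysis: focuses on important file types (such as .py, .js, .java).
Change scope: determines whether the change affects a single file, one module, or several parts of the project.

For large diffs or changes that span many files, AI Commit generates a concise summary that keeps the key details.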
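
As a complement to parsing the full patch text, Git can also report per-file status directly through git diff --cached --name-status. The sketch below parses that output into added, modified, deleted, and renamed entries. The function name parse_name_status and the hard-coded sample text are illustrative assumptions and are not part of the AI Commit script.

```python
def parse_name_status(output):
    """Group paths from `git diff --cached --name-status` output by change kind."""
    changes = {"added": [], "modified": [], "deleted": [], "renamed": []}
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        status = fields[0]
        if status.startswith("R") and len(fields) == 3:
            # Renames are reported as R<score>, old path, new path.
            changes["renamed"].append((fields[1], fields[2]))
        elif status == "A":
            changes["added"].append(fields[1])
        elif status == "M":
            changes["modified"].append(fields[1])
        elif status == "D":
            changes["deleted"].append(fields[1])
        # Other status letters (copies, type changes, ...) are ignored in this sketch.
    return changes


# Sample text standing in for real command output.
sample = "A\tsrc/api.py\nM\tREADME.md\nD\told/util.py\nR100\ta.py\tb.py\n"
result = parse_name_status(sample)
print(result["added"])    # ['src/api.py']
print(result["renamed"])  # [('a.py', 'b.py')]
```

In a real run the input would come from subprocess.run(['git', 'diff', '--cached', '--name-status'], ...) rather than a literal string.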

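2. Context-aware commit messages

AI Commit generates commit messages in the Conventional Commits format according to the change type, for example:

Documentation updates: docs: prefix (e.g. docs: update installation instructions in README).
Test changes: test: prefix (e.g. test: add unit tests for user authentication).
Bug fixes: fix: prefix (e.g. fix: handle null pointer exception in data parsing).
New features: feat: prefix (e.g. feat: implement user profile API endpoint).
Refactoring: refactor: prefix (e.g. refactor: extract utility functions into a separate module).
Configuration changes: chore: or config: prefix (e.g. chore: update dependencies in package.json).

This keeps commit messages in a form that tools such as semantic-release can parse, which makes changelog generation easier.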
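
To make the format concrete, here is a minimal sketch that checks whether a message's first line has the Conventional Commits header shape, type, optional (scope), optional !, then a colon, a space, and a description. The names HEADER_RE and parse_header are hypothetical, and the pattern accepts lowercase types only, which is a simplification made for this example.

```python
import re

# <type>[optional scope][!]: <description>
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_header(message):
    """Return the header parts as a dict, or None if the first line does not match."""
    lines = message.strip().splitlines()
    first_line = lines[0] if lines else ""
    match = HEADER_RE.match(first_line)
    return match.groupdict() if match else None


print(parse_header("feat(api): add user profile endpoint"))
print(parse_header("updated some stuff"))  # None
```

A check like this could run on the model's reply before committing, so that a free-form answer is rejected instead of ending up in the history.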

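3. New project initialization

The script detects the initial commit of a new project (at least five new files, no modified or deleted files, and a typical project file such as README.md, package.json, or .gitignore) and asks for a message starting with feat: or init:, such as init: initialize project structure.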

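4. Handling large diffs

For diffs longer than 8000 characters or touching more than 10 files, AI Commit switches to a summary mode:

Groups files by type (e.g. 5 .py files, 2 .js files).
Extracts key changes such as function or class definitions.
Builds a layered summary with the change type, main language, and scope.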
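
The grouping step and the size thresholds described above can be sketched in a few lines. This is an illustration under assumptions, not the script's own code: summarize_by_extension and needs_summary are hypothetical helper names, and the default thresholds simply repeat the 8000-character and 10-file limits from the text.

```python
import os
from collections import Counter


def summarize_by_extension(paths):
    """Return a short summary such as '2 .py files, 1 .js file'."""
    counts = Counter(os.path.splitext(p)[1] or "(no extension)" for p in paths)
    parts = []
    for ext, count in counts.most_common():
        noun = "file" if count == 1 else "files"
        parts.append(f"{count} {ext} {noun}")
    return ", ".join(parts)


def needs_summary(diff_text, paths, max_chars=8000, max_files=10):
    """True when the diff exceeds either threshold and should be summarized."""
    return len(diff_text) > max_chars or len(paths) > max_files


paths = ["a.py", "b.py", "c.js", "Makefile"]
print(summarize_by_extension(paths))  # 2 .py files, 1 .js file, 1 (no extension) file
```

Sending counts per file type instead of the raw patch keeps the request small, at the cost of hiding what actually changed inside each file.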

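5. Robust error handling

The script handles several error cases, including:

Git command missing or failing.
API request failures (such as timeouts or HTTP errors).
Missing environment variable (such as QWEN_API).
Empty or invalid diff.

It prints clear error messages to help developers troubleshoot.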

Workflow

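The script's workflow, originally shown as a Mermaid flowchart:
[Flowchart: get staged Git diff (git diff --cached) → is the diff valid? → no: print the no-staged-changes notice; yes: analyze the diff, extract file changes and patterns → large diff: generate summary; small diff: extract key changes → generate context-aware prompt → call LLM API → failure: print error message; success: extract commit message → run git commit → commit succeeded]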

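Get the diff: run git diff --cached to obtain the staged changes.
Analyze the diff: parse file changes, added/deleted line counts, and patterns.
Summarize: for large diffs, generate a concise summary or extract key changes.
Build the prompt: create a context-aware prompt based on the change type.
Call the API: send the diff or summary to the LLM API to generate the commit message.
Commit: run git commit -m <message>.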

Example code

The core snippets below show how the diff is analyzed and how the commit message is requested:


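import re
import requests

# API_KEY, API_URL, and REQUEST_TIMEOUT are module-level constants; see the full listing.

def analyze_diff(diff):
    """Analyze the diff text and return a summary of file changes."""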
    lines = diff.split('\n')
    files_info = {
        'new_files': [],
        'modified_files': [],
        'deleted_files': [],
        'total_additions': 0,
        'total_deletions': 0
    }

    current_file = None
    for line in lines:
        if line.startswith('diff --git'):
            match = re.search(r'a/(.*?) b/(.*?)$', line)
            if match:
                current_file = match.group(2)
        elif line.startswith('new file mode'):
            if current_file:
                files_info['new_files'].append(current_file)
        elif line.startswith('deleted file mode'):
            if current_file:
                files_info['deleted_files'].append(current_file)
        elif line.startswith('index') and current_file:
            if current_file not in files_info['new_files'] and current_file not in files_info['deleted_files']:
                files_info['modified_files'].append(current_file)
        elif line.startswith('+') and not line.startswith('+++'):
            files_info['total_additions'] += 1
        elif line.startswith('-') and not line.startswith('---'):
            files_info['total_deletions'] += 1

    return files_info

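def get_commit_message(request_body):
    """Call the API and return the parsed JSON response, or None on failure."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    try:
        response = requests.post(API_URL, headers=headers, json=request_body, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.Timeout:
        print(f"[ERROR] API request timed out ({REQUEST_TIMEOUT} seconds).")
        return None
    except requests.exceptions.RequestException as e:
        # Covers HTTP errors raised by raise_for_status() and connection failures.
        print(f"[ERROR] API request failed: {e}")
        return None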
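
One detail of the line counting in analyze_diff is worth illustrating. It skips every line that starts with +++ or --- in order to exclude the file headers, which also skips a changed line whose own content begins with ++ or --. A possible alternative, sketched here as an assumption rather than the script's behavior, is to count only lines that appear after a hunk header (@@). The function name count_changed_lines and the sample diff are made up for this example.

```python
def count_changed_lines(diff_text):
    """Count added and deleted lines, looking only inside hunks."""
    additions = deletions = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file section begins with header lines
        elif line.startswith("@@"):
            in_hunk = True   # hunk header: content lines follow
        elif in_hunk:
            if line.startswith("+"):
                additions += 1
            elif line.startswith("-"):
                deletions += 1
    return additions, deletions


sample_diff = "\n".join([
    "diff --git a/counter.c b/counter.c",
    "index 1111111..2222222 100644",
    "--- a/counter.c",
    "+++ b/counter.c",
    "@@ -1,3 +1,3 @@",
    " int i = 0;",
    "---i;",   # deleted line whose content is "--i;"
    "+++i;",   # added line whose content is "++i;"
    " return i;",
])
print(count_changed_lines(sample_diff))  # (1, 1)
```

Because the file headers always come before the first hunk of each file, tracking the hunk state removes the need for the +++ and --- prefix checks.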

Example usage

Suppose you have staged changes in a Python project and run the script. The output might look like this:


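[INFO] Request body:
{
  "model": "qwen-plus-latest",
  "temperature": 0.3,
  "messages": [
    {
      "role": "system",
      "content": "You are a professional programmer. This change is related to a new feature. Generate a commit message that follows the conventional commits specification and uses the 'feat:' prefix. Return a single line with no explanation."
    },
    {
      "role": "user",
      "content": "New files: src/api.py, tests/test_api.py; +50 -0 lines"
    }
  ]
}

[INFO] Generated commit message:
  feat: add user authentication API endpoint
[OK] Committed.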

The generated message is clear, follows the convention, and reflects the content of the change.
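
Even with a prompt that asks for a single line, a model may wrap its answer in quotes or a code fence, or append an explanation. The sketch below shows one way to reduce a reply to its first meaningful line. The helper name first_line_of_reply is hypothetical, and this cleanup step is an assumption added for illustration, not something the script is described as doing.

```python
def first_line_of_reply(reply):
    """Return the first non-empty line of a model reply, unwrapped from quotes or backticks."""
    fence = "`" * 3  # marker that opens or closes a fenced code block
    for raw_line in reply.splitlines():
        line = raw_line.strip()
        if line.startswith(fence):
            continue  # skip code-fence marker lines
        # Unwrap only when the same quote character encloses the whole line.
        if len(line) >= 2 and line[0] == line[-1] and line[0] in "`\"'":
            line = line[1:-1].strip()
        if line:
            return line
    return None


print(first_line_of_reply('"fix: handle empty diff"\n\nThis commit fixes...'))
# fix: handle empty diff
```

Returning None for an empty reply lets the caller stop before running git commit with a blank message.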

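Getting started

Dependencies

Python 3.7+ (the script passes text=True to subprocess.run, which requires 3.7).
The requests library (pip install requests).
Git installed and available on PATH.
An API key for a compatible LLM service (such as Alibaba Cloud Qwen).
The QWEN_API environment variable set to that key.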

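Installation

Save the script as ai_commit.py.
Set the API key: export QWEN_API=your_api_key
Stage your changes: git add . (the dot stages everything under the current directory)
Run the script: python ai_commit.py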

Configuration

API_URL: defaults to https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions.
MODEL_NAME: defaults to qwen-plus-latest.
MAX_DIFF_SIZE: diff size threshold, 8000 characters by default.
PRIORITY_FILE_EXTENSIONS: file types analyzed first (such as .py, .js).
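
In the script these settings are constants that you edit in the source file. If you would rather change them per machine without touching the file, one option is to read overrides from the environment. The sketch below is an assumption for illustration only: the AI_COMMIT_ prefix, the load_settings function, and the DEFAULTS dictionary are hypothetical and do not exist in the script.

```python
import os

DEFAULTS = {
    "API_URL": "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions",
    "MODEL_NAME": "qwen-plus-latest",
    "MAX_DIFF_SIZE": 8000,
}


def load_settings(env=None):
    """Return the defaults, overridden by AI_COMMIT_<NAME> variables (hypothetical names)."""
    env = os.environ if env is None else env
    settings = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        raw = env.get(f"AI_COMMIT_{key}")
        if not raw:
            continue  # variable unset or empty: keep the default
        # Keep the type of the default: integers are parsed, strings are used as-is.
        settings[key] = int(raw) if isinstance(default, int) else raw
    return settings


print(load_settings({"AI_COMMIT_MAX_DIFF_SIZE": "4000"})["MAX_DIFF_SIZE"])  # 4000
```

Passing the environment as a parameter keeps the function easy to test; a non-numeric value for an integer setting raises ValueError in this sketch.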

Full code


# coding: utf-8
import os
import subprocess
import requests
import json
import re
from collections import defaultdict

# ==================== Configuration constants ====================
API_KEY = os.environ.get("QWEN_API")
if not API_KEY:
    print("[ERROR] QWEN_API environment variable not set. Please set it before running the script.")
    raise SystemExit(1)

API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
MODEL_NAME = "qwen-plus-latest"
REQUEST_TIMEOUT = 15
MAX_DIFF_SIZE = 8000
MAX_FILES_FOR_DETAIL = 10

# File types analyzed first
PRIORITY_FILE_EXTENSIONS = [
    '.py', '.js', '.ts', '.jsx', '.tsx', '.vue', '.dart',
    '.java', '.cpp', '.c', '.go', '.rs', '.php', '.rb'
]

# Extension-to-language mapping
LANGUAGE_MAP = {
    '.py': 'Python', '.js': 'JavaScript', '.ts': 'TypeScript',
    '.jsx': 'React', '.tsx': 'TypeScript React', '.vue': 'Vue',
    '.java': 'Java', '.cpp': 'C++', '.c': 'C', '.go': 'Go',
    '.rs': 'Rust', '.php': 'PHP', '.rb': 'Ruby', '.dart': 'Dart'
}

# ==================== Git operations ====================
def get_git_diff():
    """Return the staged Git diff, or None if the git command fails."""
    try:
        result = subprocess.run(
            ['git', 'diff', '--cached'],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            encoding='utf-8',
            check=True
        )
        return result.stdout
    except subprocess.CalledProcessError as e:
        print(f"[ERROR] Failed to get Git diff: {e}")
        if e.stderr:
            print(f"[ERROR] Stderr: {e.stderr.strip()}")
        return None
    except FileNotFoundError:
        print("[ERROR] Git command not found. Make sure Git is installed and on your PATH.")
        return None


def git_commit(message):
    """Run git commit with the given message."""
    try:
        subprocess.run(['git', 'commit', '-m', message], encoding='utf-8', check=True)
        print("[OK] Committed.")
    except subprocess.CalledProcessError as e:
        # Output is not captured here, so Git's own error text has already gone to the terminal.
        print(f"[ERROR] Git commit failed: {e}")
    except FileNotFoundError:
        print("[ERROR] Git command not found. Make sure Git is installed and on your PATH.")


# ==================== Diff analysis ====================
def analyze_diff(diff):
    """Analyze the diff text and return a summary of file changes."""
    lines = diff.split('\n')
    files_info = {
        'new_files': [],
        'modified_files': [],
        'deleted_files': [],
        'total_additions': 0,
        'total_deletions': 0
    }

    current_file = None
    for line in lines:
        if line.startswith('diff --git'):
            match = re.search(r'a/(.*?) b/(.*?)$', line)
            if match:
                current_file = match.group(2)
        elif line.startswith('new file mode'):
            if current_file:
                files_info['new_files'].append(current_file)
        elif line.startswith('deleted file mode'):
            if current_file:
                files_info['deleted_files'].append(current_file)
        elif line.startswith('index') and current_file:
            if current_file not in files_info['new_files'] and current_file not in files_info['deleted_files']:
                files_info['modified_files'].append(current_file)
        elif line.startswith('+') and not line.startswith('+++'):
            files_info['total_additions'] += 1
        elif line.startswith('-') and not line.startswith('---'):
            files_info['total_deletions'] += 1

    return files_info


def extract_key_changes(diff):
    """Extract key changes from the diff, keeping priority files and function signatures."""
    lines = diff.split('\n')
    key_sections = []
    current_section = []
    current_file = None
    is_priority_file = False
    function_changes = []

    for line in lines:
        if line.startswith('diff --git'):
            if current_section and is_priority_file:
                key_sections.extend(current_section[:50])
            
            current_section = [line]
            match = re.search(r'b/(.*?)$', line)
            if match:
                current_file = match.group(1)
                file_ext = os.path.splitext(current_file)[1]
                is_priority_file = file_ext in PRIORITY_FILE_EXTENSIONS
        
        elif line.startswith('@@'):
            current_section.append(line)
        
        elif is_priority_file:
            current_section.append(line)
            
            if line.startswith(('+', '-')):
                if re.search(r'(def |function |class |interface |struct |enum )', line):
                    function_changes.append(f"{current_file}: {line.strip()}")

    if current_section and is_priority_file:
        key_sections.extend(current_section[:50])

    key_diff = '\n'.join(key_sections)
    if len(key_diff) > MAX_DIFF_SIZE:
        return '\n'.join(key_sections[:100] + function_changes)
    
    return key_diff


def analyze_change_patterns(files_info, diff):
    """Detect change patterns such as refactoring, new features, and bug fixes."""
    patterns = {
        'is_refactoring': False,
        'is_feature': False,
        'is_bugfix': False,
        'is_docs': False,
        'is_config': False,
        'is_test': False,
        'main_language': None,
        'change_scope': 'multiple'
    }

    all_files = files_info['new_files'] + files_info['modified_files'] + files_info['deleted_files']
    
    # Count files per extension
    file_types = defaultdict(int)
    for file in all_files:
        ext = os.path.splitext(file)[1].lower()
        file_types[ext] += 1

    # Determine the main programming language
    if file_types:
        main_ext = max(file_types.items(), key=lambda x: x[1])[0]
        patterns['main_language'] = LANGUAGE_MAP.get(main_ext, main_ext)

    # Classify the change by file names
    doc_extensions = ['.md', '.txt', '.rst']
    config_extensions = ['.json', '.yaml', '.yml', '.toml', '.ini', '.conf']
    
    doc_files = [f for f in all_files if any(f.lower().endswith(ext) for ext in doc_extensions) or 'readme' in f.lower()]
    test_files = [f for f in all_files if 'test' in f.lower() or f.endswith(('_test.py', '.test.js', '.spec.js'))]
    config_files = [f for f in all_files if any(f.endswith(ext) for ext in config_extensions)]

    total_files = len(all_files)
    if total_files > 0:
        if len(doc_files) / total_files > 0.5:
            patterns['is_docs'] = True
        if len(test_files) > 0:
            patterns['is_test'] = True
        if len(config_files) / total_files > 0.3:
            patterns['is_config'] = True

    # Classify the change by keywords in the diff text
    if diff:
        diff_lower = diff.lower()
        
        refactor_keywords = ['rename', 'move', 'extract', 'refactor', 'reorganize']
        feature_keywords = ['add', 'new', 'implement', 'feature', 'support']
        bugfix_keywords = ['fix', 'bug', 'error', 'issue', 'problem', 'correct']
        
        if any(keyword in diff_lower for keyword in refactor_keywords):
            patterns['is_refactoring'] = True
        
        if any(keyword in diff_lower for keyword in feature_keywords) and files_info['new_files']:
            patterns['is_feature'] = True
        
        if any(keyword in diff_lower for keyword in bugfix_keywords):
            patterns['is_bugfix'] = True

    # Determine the change scope
    if len(all_files) == 1:
        patterns['change_scope'] = 'single'
    elif len(set(os.path.dirname(f) for f in all_files)) == 1:
        patterns['change_scope'] = 'module'
    else:
        patterns['change_scope'] = 'multiple'

    return patterns


# ==================== Project detection and summaries ====================
def is_new_project_init(files_info):
    """Check whether this looks like the initial commit of a new project."""
    total_files = len(files_info['new_files'])
    
    if (total_files >= 5 and 
        len(files_info['modified_files']) == 0 and 
        len(files_info['deleted_files']) == 0):
        
        new_files_str = ' '.join(files_info['new_files']).lower()
        project_indicators = [
            'readme', 'package.json', 'requirements.txt', 'cargo.toml',
            'pom.xml', 'build.gradle', '.gitignore', 'main.', 'index.'
        ]
        
        return any(indicator in new_files_str for indicator in project_indicators)
    
    return False


def create_diff_summary(files_info):
    """Create a summary for a large diff."""
    summary_parts = []

    if files_info['new_files']:
        if len(files_info['new_files']) > 5:
            file_types = {}
            for file in files_info['new_files']:
                ext = os.path.splitext(file)[1] or 'no_ext'
                file_types[ext] = file_types.get(ext, 0) + 1

            type_summary = ', '.join([f"{count} {ext} files" for ext, count in file_types.items()])
            summary_parts.append(f"New files: {type_summary} ({len(files_info['new_files'])} files in total)")
        else:
            summary_parts.append(f"New files: {', '.join(files_info['new_files'])}")

    if files_info['modified_files']:
        if len(files_info['modified_files']) > 5:
            summary_parts.append(f"Modified files: {len(files_info['modified_files'])} files")
        else:
            summary_parts.append(f"Modified files: {', '.join(files_info['modified_files'])}")

    if files_info['deleted_files']:
        summary_parts.append(f"Deleted files: {', '.join(files_info['deleted_files'])}")

    summary_parts.append(f"+{files_info['total_additions']} -{files_info['total_deletions']} lines")

    return '; '.join(summary_parts)


def create_layered_summary(files_info, patterns):
    """Create a layered summary of the changes."""
    summary_parts = []

    # Layer 1: change types
    change_types = []
    type_mapping = {
        'is_feature': "new feature",
        'is_bugfix': "bug fix",
        'is_refactoring': "refactoring",
        'is_docs': "documentation update",
        'is_config': "configuration change",
        'is_test': "tests"
    }

    for key, label in type_mapping.items():
        if patterns[key]:
            change_types.append(label)

    if change_types:
        summary_parts.append(f"Change types: {', '.join(change_types)}")

    # Layer 2: main language and scope
    if patterns['main_language']:
        summary_parts.append(f"Main language: {patterns['main_language']}")

    summary_parts.append(f"Scope: {patterns['change_scope']}")

    # Layer 3: per-file changes
    file_summary = create_diff_summary(files_info)
    summary_parts.append(file_summary)

    return '\n'.join(summary_parts)


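# ==================== Prompt generation ====================
def get_context_aware_prompt(patterns, files_info):
    """Build a context-aware system prompt from the detected change patterns."""
    base_prompt = "You are a professional programmer. "

    # The first matching pattern wins, so the order of this mapping matters.
    prompt_mapping = {
        'is_docs': f"{base_prompt}This change is related to documentation. Generate a commit message that follows the conventional commits specification and uses the 'docs:' prefix. Return a single line with no explanation.",
        'is_test': f"{base_prompt}This change is related to tests. Generate a commit message that follows the conventional commits specification and uses the 'test:' prefix. Return a single line with no explanation.",
        'is_config': f"{base_prompt}This change is related to configuration files. Generate a commit message that follows the conventional commits specification and uses the 'chore:' or 'config:' prefix. Return a single line with no explanation.",
        'is_refactoring': f"{base_prompt}This change is related to refactoring. Generate a commit message that follows the conventional commits specification and uses the 'refactor:' prefix. Return a single line with no explanation.",
        'is_bugfix': f"{base_prompt}This change is related to a bug fix. Generate a commit message that follows the conventional commits specification and uses the 'fix:' prefix. Return a single line with no explanation.",
    }

    for pattern_type, prompt in prompt_mapping.items():
        if patterns[pattern_type]:
            return prompt

    # Feature detection
    if patterns['is_feature'] or len(files_info['new_files']) > len(files_info['modified_files']):
        return f"{base_prompt}This change is related to a new feature. Generate a commit message that follows the conventional commits specification and uses the 'feat:' prefix. Return a single line with no explanation."

    return f"{base_prompt}Generate one concise commit message for the following code change that follows the conventional commits specification. Return a single line with no explanation."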


# ==================== API calls ====================
def build_request_body(diff):
    """Build the API request body."""
    files_info = analyze_diff(diff)
    total_files = len(files_info['new_files']) + len(files_info['modified_files']) + len(files_info['deleted_files'])
    patterns = analyze_change_patterns(files_info, diff)
    
    use_summary = len(diff) > MAX_DIFF_SIZE or total_files > MAX_FILES_FOR_DETAIL
    
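    if use_summary:
        print(f"[INFO] Diff too large ({len(diff)} characters) or too many files ({total_files}); using summary mode")

        if is_new_project_init(files_info):
            content = f"Initial commit of a new project with the following changes:\n{create_diff_summary(files_info)}"
            system_prompt = "You are a professional programmer. This is the initial commit of a new project. Generate a commit message that follows the conventional commits specification, typically starting with 'feat: ' or 'init: '. Return a single line with no explanation."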
        else:
            content = create_layered_summary(files_info, patterns)
            system_prompt = get_context_aware_prompt(patterns, files_info)
    else:
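        if len(diff) > MAX_DIFF_SIZE // 2:
            print(f"[INFO] Extracting key changes; original size: {len(diff)} characters")
            # Fall back to the full diff when no priority file yields any content.
            content = extract_key_changes(diff) or diff
            print(f"[INFO] Size after extraction: {len(content)} characters")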
        else:
            content = diff
        
        system_prompt = get_context_aware_prompt(patterns, files_info)

    return {
        "model": MODEL_NAME,
        "temperature": 0.3,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": content}
        ]
    }


def get_commit_message(request_body):
    """Call the API and return the parsed JSON response, or None on failure."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    
    try:
        response = requests.post(API_URL, headers=headers, json=request_body, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.Timeout:
        print(f"[ERROR] API request timed out ({REQUEST_TIMEOUT} seconds).")
        return None
    except requests.exceptions.HTTPError as e:
        print(f"[ERROR] API request failed, HTTP status code: {e.response.status_code}")
        try:
            print(f"[ERROR] API response body: {e.response.text}")
        except Exception:
            pass
        return None
    except requests.exceptions.RequestException as e:
        print(f"[ERROR] API request error: {e}")
        return None
    except json.JSONDecodeError:
        print(f"[ERROR] Failed to decode API response. Status code: {response.status_code if 'response' in locals() else 'N/A'}")
        return None


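def extract_commit_message(response_data):
    """Extract the commit message text from the API response."""
    if not response_data:
        return None

    try:
        # Strip leading and trailing whitespace from the model's reply.
        return response_data['choices'][0]['message']['content'].strip()
    except (KeyError, IndexError, TypeError, AttributeError) as e:
        print(f"[ERROR] Could not extract the commit message. Error: {e}. Response:")
        print(json.dumps(response_data, indent=2, ensure_ascii=False))
        return None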


# ==================== Main program ====================
def main():
    """Program entry point."""
    diff = get_git_diff()
    if not diff:
        if diff is not None:
            print("[INFO] No staged changes. Run git add first.")
        return

    request_body = build_request_body(diff)

    print("[INFO] Request body:")
    print(json.dumps(request_body, indent=2, ensure_ascii=False))

    api_response = get_commit_message(request_body)
    if not api_response:
        return

    message = extract_commit_message(api_response)
    if not message:
        return

    print("\n[INFO] Generated commit message:")
    print(f"  {message}")
    
    git_commit(message)


if __name__ == "__main__":
    main()
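
A note on the keyword checks in analyze_change_patterns: they test for substrings in the whole lowercased diff, so 'add' also matches "address" and 'new' matches Git's own "new file mode" header line. One possible refinement, sketched below as an assumption rather than part of the script, is to match whole words only. The helper name contains_keyword is hypothetical.

```python
import re


def contains_keyword(text, keywords):
    """True if any keyword occurs in text as a whole word (case-insensitive)."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


feature_keywords = ['add', 'new', 'implement', 'feature', 'support']

print('add' in "update address padding")                            # True
print(contains_keyword("update address padding", feature_keywords)) # False
print(contains_keyword("Add retry support", feature_keywords))      # True
```

The trade-off is that whole-word matching no longer catches inflected forms such as "added", or identifiers such as add_user, because the underscore counts as a word character. Whether that is preferable depends on how noisy the substring matches are in your repository.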