1. Background and goal:
- I wanted to find out when a GitHub project was started.
2. Results first
See the sample output after the code.
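3. Process and features
- Parse the repository URL: extract the owner and project name, which are used to build the local path and the output file name.
- Clone the repository (only if it is not already present locally): avoids re-cloning, saving time and disk space.
- Fetch the commit history: extract the key fields of every commit with `git log`.
- Save to CSV: makes later analysis in Excel or Python easy.
- Print a short commit summary: a quick look at the earliest and latest commits.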
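The URL-parsing step assumes an HTTPS URL like the one used in this post. If you also want to accept SSH-style remotes such as `git@github.com:owner/repo.git`, a minimal sketch could look like this. It is my own assumption of how to extend it, not part of the original script: `SCP_LIKE` is a hypothetical helper name, the function keeps the original "last two path segments" rule, and it drops the original's print messages.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern for scp-like SSH remotes, e.g. "git@github.com:owner/repo.git"
SCP_LIKE = re.compile(r"^[\w.-]+@[\w.-]+:(?P<path>.+)$")


def parse_repo_url(repo_url):
    """Return (owner, repo) for HTTPS, ssh:// or scp-like URLs, else (None, None)."""
    match = SCP_LIKE.match(repo_url)
    # scp-like form has no scheme, so take the part after ":" directly;
    # everything else (https://, ssh://) goes through urlparse as before
    path = match.group("path") if match else urlparse(repo_url).path
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[:-4]
    parts = path.split("/")
    # Require two non-empty trailing segments: owner and repository name
    if len(parts) >= 2 and all(parts[-2:]):
        return parts[-2], parts[-1]
    return None, None
```

For example, `parse_repo_url("git@github.com:sktime/sktime.git")` and `parse_repo_url("https://github.com/sktime/sktime.git")` both give `("sktime", "sktime")`.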
Code 1
```python
import os
import subprocess
from urllib.parse import urlparse

import git  # pip install gitpython pandas
import pandas as pd


def clone_repository(repo_url, local_path):
    """Clone the repository unless local_path already exists."""
    try:
        if os.path.exists(local_path):
            # An existing clone is reused as-is (no pull), so it may be out of date
            print(f"Path {local_path} already exists, skipping clone.")
            return True
        git.Repo.clone_from(repo_url, local_path)
        print(f"Cloned repository to {local_path}")
        return True
    except Exception as e:
        print(f"Failed to clone repository: {e}")
        return False


def get_git_commits(local_path):
    """Return one dict (sha, author, date, message) per commit, or None on error."""
    # An argument list with `git -C` replaces the Windows-only `cd /d ... &`
    # shell string, so the same code runs on Windows, macOS and Linux.
    cmd = [
        "git", "-C", local_path, "log",
        "--pretty=format:%H||%an||%ad||%s",  # hash || author || author date || subject
        "--date=iso",                        # e.g. 2018-11-18 12:30:13 +0000
    ]
    try:
        result = subprocess.run(
            cmd, capture_output=True, check=True,
            encoding="utf-8", errors="replace",  # undecodable bytes become U+FFFD
        )
    except (subprocess.CalledProcessError, FileNotFoundError) as e:
        print(f"Failed to read local commit history: {e}")
        return None

    commits = []
    for line in result.stdout.splitlines():
        # maxsplit=3 keeps any "||" inside the commit message intact
        parts = line.split("||", 3)
        if len(parts) == 4:
            sha, author, date, message = parts
            commits.append({"sha": sha, "author": author, "date": date, "message": message})
        else:
            print(f"Skipping malformed commit record: {line[:50]}...")
    print(f"Read {len(commits)} commits from the local repository")
    return commits


def save_commits_to_csv(commits, output_file):
    if not commits:
        print("No commits to save")
        return
    df = pd.DataFrame(commits)
    # utf-8-sig writes a BOM so Excel displays non-ASCII names (e.g. "Kübler") correctly
    df.to_csv(output_file, index=False, encoding="utf-8-sig")
    print(f"Commits saved to {output_file}")


def print_commit_info(commits, num=5):
    if not commits:
        print("No commits to print")
        return

    def fmt(c):
        return (f"SHA: {c['sha'][:7]} | Author: {c['author']} | Date: {c['date']} "
                f"| Message: {c['message'][:100]}...")

    # Sorts by the raw date string, i.e. by each author's local time;
    # with mixed UTC offsets this is only approximately chronological.
    sorted_commits = sorted(commits, key=lambda x: x["date"])
    print(f"\nEarliest {num} commits:")
    for commit in sorted_commits[:num]:
        print(fmt(commit))
    print(f"\nLatest {num} commits:")
    for commit in sorted_commits[-num:]:
        print(fmt(commit))


def parse_repo_url(repo_url):
    """Extract (owner, repo) from a URL such as https://github.com/owner/repo.git."""
    try:
        path = urlparse(repo_url).path.strip("/")
        if path.endswith(".git"):
            path = path[:-4]
        parts = path.split("/")
        if len(parts) >= 2:
            return parts[-2], parts[-1]
        print("Invalid GitHub URL format")
        return None, None
    except Exception as e:
        print(f"Failed to parse URL: {e}")
        return None, None


def main(repo_url):
    repo_owner, repo_name = parse_repo_url(repo_url)
    if not repo_owner or not repo_name:
        print("Cannot continue: invalid repository URL")
        return
    local_path = f"./{repo_name}_repo"
    output_csv = f"{repo_name}_commits.csv"
    if not clone_repository(repo_url, local_path):
        return
    commits = get_git_commits(local_path)
    if commits:
        save_commits_to_csv(commits, output_csv)
        print_commit_info(commits, num=5)


if __name__ == "__main__":
    main("https://github.com/sktime/sktime.git")
```
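Since the original goal was the project's start date, it can also be read straight from the saved CSV. Below is a minimal stdlib-only sketch; `commit_time_span` and `GIT_ISO_FORMAT` are hypothetical names, not part of the script above. It assumes the `date` column uses the `git log --date=iso` format, converts every date to UTC, and returns the earliest date, the latest date, and the number of whole days between them.

```python
from datetime import datetime, timezone

# Format produced by `git log --date=iso`, e.g. "2018-11-18 12:30:13 +0000"
GIT_ISO_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def commit_time_span(rows):
    """Return (first_utc, last_utc, whole_days_between) for rows with a "date" key.

    Returns None when there are no rows.
    """
    dates = [
        # %z accepts offsets like +0530, so every value becomes an aware datetime
        datetime.strptime(row["date"], GIT_ISO_FORMAT).astimezone(timezone.utc)
        for row in rows
    ]
    if not dates:
        return None
    first, last = min(dates), max(dates)
    return first, last, (last - first).days
```

A `csv.DictReader` over `sktime_commits.csv` can be passed directly as `rows`. Opening the file with `encoding="utf-8-sig"` works whether or not it was written with a BOM.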
Output:
```txt
Path ./sktime_repo already exists, skipping clone.
Read 5395 commits from the local repository
Commits saved to sktime_commits.csv

Earliest 5 commits:
SHA: 7294faa | Author: mloning | Date: 2018-11-18 12:30:13 +0000 | Message: setting up project repo...
SHA: 3bcd4ff | Author: mloning | Date: 2018-11-18 12:36:30 +0000 | Message: updated contributing guidelines...
SHA: 1539eb3 | Author: Sajaysurya Ganesh | Date: 2019-01-08 20:22:43 +0000 | Message: Initial commit with sklearn suggested template...
SHA: f598253 | Author: Sajaysurya Ganesh | Date: 2019-01-09 11:07:06 +0000 | Message: Removed currently unnecessary files to have just the basics...
SHA: da12582 | Author: Sajaysurya Ganesh | Date: 2019-01-09 15:25:16 +0000 | Message: Added a dummy classifier, with appropriate tests...

Latest 5 commits:
SHA: 4c6f4ed | Author: Martin Tveten | Date: 2025-05-19 22:55:57 +0200 | Message: [ENH ] skchange homogenization: variable identification + minor fixes (#7509)...
SHA: b9fc4fb | Author: Robert Kübler | Date: 2025-05-20 23:51:59 +0200 | Message: [ENH] Simplify HampelFilter code and add a unit test (#8249)...
SHA: 9c57a24 | Author: Yury Fedotov | Date: 2025-05-22 16:21:58 -0500 | Message: [DOC] Add missing space to `Pipeline` constructor warning (#8275)...
SHA: 4169644 | Author: Om Biradar | Date: 2025-05-23 00:41:58 +0530 | Message: [ENH] Add `relative_to` parameter to percentage error metrics to enable relative-by-prediction metri...
SHA: e38641b | Author: Harshvir Sandhu | Date: 2025-05-23 15:46:52 +0530 | Message: [ENH] Add regularization hyperparameter in ReconcilerForecaster (#7660)...
```
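One caveat: `print_commit_info` sorts the raw date strings, which orders commits by each author's local clock time rather than by true time. The output above shows the effect. Yury Fedotov's commit at 2025-05-22 16:21:58 -0500 is 21:21:58 UTC, while Om Biradar's at 2025-05-23 00:41:58 +0530 is 19:11:58 UTC on May 22, so Om's commit actually came first. A minimal sketch that sorts by parsed UTC timestamps instead; `commit_timestamp` and `sort_commits_chronologically` are hypothetical helpers, not part of the original script. With this order, the "latest" list may differ from the output above.

```python
from datetime import datetime, timezone

# Format produced by `git log --date=iso`, e.g. "2025-05-22 16:21:58 -0500"
GIT_ISO_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def commit_timestamp(commit):
    """Convert a commit's `--date=iso` string to an aware UTC datetime."""
    return datetime.strptime(commit["date"], GIT_ISO_FORMAT).astimezone(timezone.utc)


def sort_commits_chronologically(commits):
    """Sort commit dicts by the actual moment they were authored."""
    return sorted(commits, key=commit_timestamp)
```

Using `key=commit_timestamp` in place of `key=lambda x: x['date']` inside `print_commit_info` would be enough to apply it.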
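4. Conclusion + todo
- Just a small tool.
- AI can write this kind of script, but going back and forth on revisions still took a lot of time; it took quite a few rounds of questions.
I hope this helps.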