ODS是openoffice的电子表格格式,rusty_sheet插件也能读取,需要造一些数据,想知道有哪些程序能高效生成,在pypi上搜索发现,如下三种:
- odswriter
- pyexcel-ods3
- stream-write-ods
前两种是内存一次写入的,后一种是流式的。
以下是整合xlsx、xlsb、ODS三种格式的写入程序
python
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import random
import time
import os
from decimal import Decimal
from collections import OrderedDict
# 尝试导入各种格式的库
try:
from pyxlsbwriter import XlsbWriter
XLSB_AVAILABLE = True
except ImportError:
XLSB_AVAILABLE = False
print("警告: pyxlsbwriter 未安装,无法生成xlsb格式文件")
try:
import odswriter as ods
ODSWRITER_AVAILABLE = True
except ImportError:
ODSWRITER_AVAILABLE = False
print("警告: odswriter 未安装,无法使用odswriter生成ods格式文件")
try:
from pyexcel_ods3 import save_data
PYEXCEL_ODS_AVAILABLE = True
except ImportError:
PYEXCEL_ODS_AVAILABLE = False
print("警告: pyexcel-ods3 未安装,无法使用pyexcel_ods3生成ods格式文件")
try:
from stream_write_ods import stream_write_ods
STREAM_ODS_AVAILABLE = True
except ImportError:
STREAM_ODS_AVAILABLE = False
print("警告: stream-write-ods 未安装,无法使用流式ODS写入")
def generate_stock_data(date, num_rows=10000):
"""
生成指定日期的模拟股市行情数据
Args:
date: 日期 (datetime对象)
num_rows: 数据行数
Returns:
tuple: (data_list, data_df) - 列表格式和DataFrame格式的数据
"""
# 生成股票代码列表
stock_codes = [f"STK{str(i).zfill(5)}" for i in range(1, 1001)]
# 表头
headers = ["股票代码", "时间", "开盘价", "最高价", "最低价", "收盘价",
"前收盘价", "涨跌额", "涨跌幅(%)", "成交量(股)", "成交额(元)", "换手率(%)"]
data_list = [headers]
data_dict = {header: [] for header in headers}
for i in range(num_rows):
stock_code = random.choice(stock_codes)
# 生成时间 (从9:30到15:00)
hour = random.randint(9, 14)
minute = random.randint(0, 59)
if hour == 9:
minute = random.randint(30, 59)
elif hour == 14:
minute = random.randint(0, 57)
timestamp = datetime(date.year, date.month, date.day, hour, minute,
random.randint(0, 59))
# 生成价格数据
base_price = random.uniform(5, 500)
open_price = round(base_price, 2)
high_price = round(open_price * random.uniform(1.0, 1.1), 2)
low_price = round(open_price * random.uniform(0.9, 1.0), 2)
close_price = round(random.uniform(low_price, high_price), 2)
# 计算涨跌幅
prev_close = round(open_price * random.uniform(0.95, 1.05), 2)
change = round(close_price - prev_close, 2)
change_percent = round((change / prev_close) * 100, 2)
# 生成成交量数据
volume = random.randint(10000, 10000000)
amount = round(volume * close_price, 2)
# 换手率
turnover_rate = round(random.uniform(0.1, 15.0), 2)
# 列表格式数据
row_list = [
stock_code,
timestamp,
open_price,
high_price,
low_price,
close_price,
prev_close,
change,
change_percent,
volume,
amount,
turnover_rate
]
data_list.append(row_list)
# DataFrame格式数据
for header, value in zip(headers, row_list):
data_dict[header].append(value)
# 按时间排序 (跳过表头)
data_list[1:] = sorted(data_list[1:], key=lambda x: x[1])
# 创建DataFrame
df = pd.DataFrame(data_dict)
df = df.sort_values('时间').reset_index(drop=True)
return data_list, df
def get_sheets_generator(data_dict):
"""
生成流式ODS所需的sheets生成器
Args:
data_dict: 字典,key为sheet名,value为数据列表
Yields:
tuple: (sheet_name, headers, rows_generator)
"""
for sheet_name, data in data_dict.items():
headers = data[0] # 第一行是表头
rows = data[1:] # 剩余行是数据
def rows_generator():
for row in rows:
# 处理数据类型以确保兼容性
processed_row = []
for cell in row:
if isinstance(cell, (int, float)):
processed_row.append(cell)
elif isinstance(cell, datetime):
processed_row.append(cell)
elif isinstance(cell, Decimal):
processed_row.append(float(cell))
else:
processed_row.append(str(cell) if cell is not None else "")
yield processed_row
yield sheet_name, headers, rows_generator()
def save_as_ods_stream(filename, data_dict):
"""使用stream_write_ods保存为ODS格式"""
if not STREAM_ODS_AVAILABLE:
raise ImportError("stream-write-ods 未安装")
sheets_generator = get_sheets_generator(data_dict)
ods_chunks = stream_write_ods(sheets_generator)
with open(filename, 'wb') as f:
for chunk in ods_chunks:
f.write(chunk)
def save_as_xlsb(filename, data_dict, compression_level=6):
"""使用pyxlsbwriter保存为xlsb格式"""
if not XLSB_AVAILABLE:
raise ImportError("pyxlsbwriter 未安装")
with XlsbWriter(filename, compressionLevel=compression_level) as writer:
for sheet_name, data in data_dict.items():
writer.add_sheet(sheet_name)
writer.write_sheet(data)
def save_as_ods_odswriter(filename, data_dict):
"""使用odswriter保存为ods格式"""
if not ODSWRITER_AVAILABLE:
raise ImportError("odswriter 未安装")
with ods.writer(open(filename, "wb")) as odsfile:
for sheet_name, data in data_dict.items():
sheet = odsfile.new_sheet(sheet_name)
for row in data:
# 处理数据类型以确保兼容性
processed_row = []
for cell in row:
if isinstance(cell, (int, float)):
processed_row.append(cell)
elif isinstance(cell, datetime):
processed_row.append(cell)
elif isinstance(cell, Decimal):
processed_row.append(float(cell))
else:
processed_row.append(str(cell) if cell is not None else "")
sheet.writerow(processed_row)
def save_as_ods_pyexcel(filename, data_dict):
"""使用pyexcel_ods3保存为ods格式"""
if not PYEXCEL_ODS_AVAILABLE:
raise ImportError("pyexcel_ods3 未安装")
# 转换为OrderedDict格式
ods_data = OrderedDict()
for sheet_name, data in data_dict.items():
ods_data[sheet_name] = data
save_data(filename, ods_data)
def save_as_xlsx(filename, data_dict):
"""使用pandas保存为xlsx格式"""
with pd.ExcelWriter(filename, engine='openpyxl') as writer:
for sheet_name, data in data_dict.items():
# 如果是列表格式,转换为DataFrame
if isinstance(data, list):
headers = data[0]
rows = data[1:]
df = pd.DataFrame(rows, columns=headers)
else:
df = data
df.to_excel(writer, sheet_name=sheet_name, index=False)
# 调整列宽
worksheet = writer.sheets[sheet_name]
for idx, col in enumerate(df.columns):
column_width = max(df[col].astype(str).map(len).max(), len(col)) + 2
worksheet.column_dimensions[chr(65 + idx)].width = min(column_width, 20)
def compare_all_ods_libraries(filename, num_sheets=3, rows_per_sheet=100):
"""比较所有可用的ODS库的性能"""
available_libs = []
if STREAM_ODS_AVAILABLE:
available_libs.append(('stream_ods', save_as_ods_stream))
if ODSWRITER_AVAILABLE:
available_libs.append(('odswriter', save_as_ods_odswriter))
if PYEXCEL_ODS_AVAILABLE:
available_libs.append(('pyexcel_ods3', save_as_ods_pyexcel))
if not available_libs:
print("没有可用的ODS库")
return
# 生成测试数据
test_data = {}
for i in range(num_sheets):
sheet_name = f"Sheet{i+1}"
data_list, _ = generate_stock_data(datetime.now(), rows_per_sheet)
test_data[sheet_name] = data_list
print(f"ODS库性能比较: {num_sheets}个sheet, 每个{rows_per_sheet}行")
print("-" * 50)
results = []
for lib_name, save_func in available_libs:
test_file = f"test_{lib_name}.ods"
try:
start_time = time.time()
save_func(test_file, test_data)
save_time = time.time() - start_time
file_size = os.path.getsize(test_file)
results.append((lib_name, save_time, file_size))
print(f"{lib_name:15} - 耗时: {save_time:6.2f}秒, 文件大小: {file_size:8}字节")
# 清理测试文件
try:
#os.remove(test_file)
pass
except:
pass
except Exception as e:
print(f"{lib_name:15} - 失败: {e}")
if results:
# 找出最快的库
fastest = min(results, key=lambda x: x[1])
print(f"\n最快的库: {fastest[0]} ({fastest[1]:.2f}秒)")
if __name__ == "__main__":
#main()
compare_all_ods_libraries("performance_test.ods", num_sheets=20, rows_per_sheet=1000)
测试结果
bash
python odss.py
ODS库性能比较: 20个sheet, 每个1000行
--------------------------------------------------
stream_ods - 耗时: 1.42秒, 文件大小: 1744087字节
odswriter - 耗时: 4.56秒, 文件大小: 27465194字节
pyexcel_ods3 - 耗时: 3.02秒, 文件大小: 1091255字节
最快的库: stream_ods (1.42秒)
odswriter库基本上没有压缩,所以文件较大,stream_ods和pyexcel_ods3压缩率高的慢一些。