1. Definition of cumulative frequency
For the definition of cumulative frequency, see:
https://statorials.org/cn/%E7%B4%AF%E7%A7%AF%E9%A2%91%E7%8E%87/
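To make the definition concrete, here is a minimal sketch on a tiny made-up sample. The function name `cumulative_frequencies` and the sample values are assumptions for illustration only. The cumulative frequency of a value is the number of observations less than or equal to it; the reverse cumulative frequency is the number of observations greater than or equal to it.

```python
from collections import Counter
from itertools import accumulate


def cumulative_frequencies(values):
    """Return (value, count, cumulative, reverse_cumulative) rows sorted by value."""
    counts = sorted(Counter(values).items())
    freqs = [count for _, count in counts]
    # Running total from the smallest value upward
    cum = list(accumulate(freqs))
    # Running total from the largest value downward, flipped back to ascending order
    rev = list(accumulate(reversed(freqs)))[::-1]
    return [(v, c, cu, r) for (v, c), cu, r in zip(counts, cum, rev)]


sample = [2, 3, 3, 5, 5, 5]  # hypothetical data
for row in cumulative_frequencies(sample):
    print(row)
# (2, 1, 1, 6)
# (3, 2, 3, 5)
# (5, 3, 6, 3)
```

Dividing each count by the sample size (6 here) gives the corresponding relative frequencies.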
2. Computing cumulative frequency: Python code
import collections


def calculate_frequencies_with_reverse_cumulative():
    # Input file name
    input_file = "2020-Perimeter.txt"

    # Build the output file name: <original name>-Reverse-Frequency.txt
    if input_file.endswith('.txt'):
        # Strip only the trailing extension (str.replace would hit every '.txt')
        output_file = input_file[:-len('.txt')] + '-Reverse-Frequency.txt'
    else:
        output_file = input_file + '-Reverse-Frequency.txt'

    # Read the file
    with open(input_file, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    # The first line is the header; every non-blank line after it is a data value
    header = lines[0].strip() if lines else ''
    data = [line.strip() for line in lines[1:] if line.strip()]
    if not data:
        raise ValueError(f"No data rows found in {input_file}")

    # Count how often each value occurs
    freq_counter = collections.Counter(data)
    total_count = len(data)

    # Sort by value: numerically if every value is a number, otherwise alphabetically
    try:
        sorted_items = sorted(freq_counter.items(), key=lambda x: float(x[0]))
    except ValueError:
        sorted_items = sorted(freq_counter.items())

    # Forward cumulative frequency (from the smallest value to the largest)
    results = []
    cum_abs = 0
    for value, count in sorted_items:
        cum_abs += count
        results.append({
            'value': value,
            'abs_freq': count,
            'cum_abs_freq': cum_abs,
            'rel_freq': count / total_count,
            # Derived from the integer count, so rounding errors do not accumulate
            'cum_rel_freq': cum_abs / total_count
        })

    # Reverse cumulative frequency (from the largest value to the smallest)
    rev_cum_abs = 0
    for result in reversed(results):
        rev_cum_abs += result['abs_freq']
        result['rev_cum_abs_freq'] = rev_cum_abs
        result['rev_cum_rel_freq'] = rev_cum_abs / total_count

    # Write the result file
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write("value,abs_freq,cum_abs_freq,rev_cum_abs_freq,rel_freq,cum_rel_freq,rev_cum_rel_freq\n")
        for result in results:
            f.write(f"{result['value']},{result['abs_freq']},{result['cum_abs_freq']},{result['rev_cum_abs_freq']},"
                    f"{result['rel_freq']:.4f},{result['cum_rel_freq']:.4f},{result['rev_cum_rel_freq']:.4f}\n")

    # Print a summary
    print("=" * 80)
    print("Frequency summary (including reverse cumulative frequency)")
    print("=" * 80)
    print(f"Input file: {input_file}")
    print(f"Output file: {output_file}")
    print(f"Header name: {header}")
    print(f"Total data rows: {total_count}")
    print(f"Distinct values: {len(sorted_items)}")
    print("-" * 80)

    # Show the first few result rows
    print("\nFirst 5 rows:")
    print("-" * 80)
    print(f"{'value':<13} {'abs_freq':<9} {'cum_abs':<9} {'rev_cum_abs':<12} "
          f"{'rel_freq':<9} {'cum_rel':<9} {'rev_cum_rel':<12}")
    print("-" * 80)
    for result in results[:5]:
        print(f"{result['value']:<13} {result['abs_freq']:<9} {result['cum_abs_freq']:<9} "
              f"{result['rev_cum_abs_freq']:<12} {result['rel_freq']:<9.4f} {result['cum_rel_freq']:<9.4f} "
              f"{result['rev_cum_rel_freq']:<12.4f}")
    if len(results) > 5:
        print(f"... {len(results) - 5} more rows")
    print("\n" + "=" * 80)
    print("All frequency statistics have been saved to:", output_file)
    print("=" * 80)


# Run the function
if __name__ == "__main__":
    calculate_frequencies_with_reverse_cumulative()
3. Output format
================================================================================
Frequency summary (including reverse cumulative frequency)
================================================================================
Input file: 2020-Perimeter.txt
Output file: 2020-Perimeter-Reverse-Frequency.txt
Header name: perimeter
Total data rows: 6634
Distinct values: 6634
--------------------------------------------------------------------------------

First 5 rows:
--------------------------------------------------------------------------------
value         abs_freq  cum_abs   rev_cum_abs  rel_freq  cum_rel   rev_cum_rel
--------------------------------------------------------------------------------
10.73133702   1         1         6634         0.0002    0.0002    1.0000
30.65792608   1         2         6633         0.0002    0.0003    0.9998
31.21799034   1         3         6632         0.0002    0.0005    0.9997
31.75747989   1         4         6631         0.0002    0.0006    0.9995
31.82466565   1         5         6630         0.0002    0.0008    0.9994
... 6629 more rows

================================================================================
All frequency statistics have been saved to: 2020-Perimeter-Reverse-Frequency.txt
================================================================================
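A quick way to sanity-check the result file: in every row, the cumulative and reverse cumulative absolute frequencies both include that row's own count, so their sum minus the row's count must equal the total number of data rows (for the first row above: 1 + 6634 - 1 = 6634). The sketch below checks this. The function name `check_frequency_table` is hypothetical, and it assumes the column order written by the script (value, absolute, cumulative absolute, reverse cumulative absolute, ...).

```python
import csv


def check_frequency_table(lines):
    """Check cum + rev_cum - count == N for every row.

    `lines` is any iterable of CSV text lines with the header first,
    for example an open file object or a list of strings.
    """
    rows = list(csv.reader(lines))[1:]  # skip the header row
    # Columns 1-3: absolute, cumulative absolute, reverse cumulative absolute
    table = [(int(r[1]), int(r[2]), int(r[3])) for r in rows if r]
    if not table:
        return True
    total = table[-1][1]  # the last cumulative count is the sample size N
    first_rev_ok = table[0][2] == total  # the first reverse cumulative count is also N
    return first_rev_ok and all(cum + rev - freq == total for freq, cum, rev in table)


# Hypothetical three-row table, only to show the call
sample = [
    "value,abs_freq,cum_abs_freq,rev_cum_abs_freq,rel_freq,cum_rel_freq,rev_cum_rel_freq",
    "1.0,1,1,4,0.2500,0.2500,1.0000",
    "2.0,2,3,3,0.5000,0.7500,0.7500",
    "3.0,1,4,1,0.2500,1.0000,0.2500",
]
print(check_frequency_table(sample))  # True
```

To run it on the real output, pass the file object opened with `open("2020-Perimeter-Reverse-Frequency.txt", encoding="utf-8", newline="")`.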
4. Plotting the cumulative frequency curve in Origin





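At this point the cumulative frequency curve is essentially done.
After that, it only needs a few small adjustments.

And that's it.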
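Before or after plotting, it can help to look at the exact points the curve should pass through. The sketch below pulls (value, cumulative relative frequency) pairs out of the result file. It is only an illustration: the function name `curve_points` is hypothetical, it assumes the column order written by the script above, and it assumes the values are numeric.

```python
import csv


def curve_points(lines, y_column=5):
    """Return (x, y) pairs for the curve from the result file.

    `lines` is any iterable of CSV text lines with the header first.
    Column 5 (0-based) is the cumulative relative frequency;
    pass y_column=6 for the reverse cumulative relative frequency.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return [(float(row[0]), float(row[y_column])) for row in reader if row]


# Hypothetical three-row table, only to show the call
sample = [
    "value,abs_freq,cum_abs_freq,rev_cum_abs_freq,rel_freq,cum_rel_freq,rev_cum_rel_freq",
    "1.0,1,1,4,0.2500,0.2500,1.0000",
    "2.0,2,3,3,0.5000,0.7500,0.7500",
    "3.0,1,4,1,0.2500,1.0000,0.2500",
]
print(curve_points(sample))
# [(1.0, 0.25), (2.0, 0.75), (3.0, 1.0)]
```

The forward curve should rise monotonically to 1, and the reverse curve should fall monotonically from 1, which gives a simple visual check on the Origin plot.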