Python实验4 列表与字典应用

目的：熟练操作组合数据类型。

试验任务：

基础：生日悖论分析。如果一个房间有23 人或以上，那么至少有两
个人的生日相同的概率大于50%。编写程序，输出在不同随机样本数
量下，23 个人中至少两个人生日相同的概率。
源码：

点击查看代码

复制代码

import random
def has_duplicate_birthdays():
    birthdays = [random.randint(1, 365) for _ in range(23)]
    return len(set(birthdays)) != len(birthdays）

def calculate_probability(sample_size):
    count = 0
    for _ in range(sample_size):
        if has_duplicate_birthdays():
            count = count + 1
    return count / sample_size


#不同的随机样本数量
sample_sizes = [100, 1000, 10000, 100000]

for size in sample_sizes:
    probability = calculate_probability(size)
    print(f"当样本数量为 {size} 时，23 个人中至少两个人生日相同的概率为: {probability:.4f}")

运行截图：

进阶：统计《一句顶一万句》文本中前10 高频词，生成词云。
源码：

点击查看代码

复制代码

import jieba
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt


def generate_wordcloud(file_path):
    try:
        # 读取文本文件
        with open(file_path, 'r', encoding='utf-8') as file:
            text = file.read()

        # 分词
        words = jieba.lcut(text)

        # 过滤停用词，这里简单过滤单字和一些常见无意义符号
        stopwords = set(['，', '。', '、', '的', '是', '在', '和', '也', '有', '不'])
        filtered_words = [word for word in words if word not in stopwords and len(word) > 1]

        # 统计词频
        word_counts = Counter(filtered_words)

        # 获取前 10 高频词
        top_10_words = word_counts.most_common(10)
        print("前 10 高频词：")
        for word, count in top_10_words:
            print(f"{word}: {count}")

        # 生成词云
        wordcloud = WordCloud(font_path='simhei.ttf',
                              background_color='white',
                              width=800,
                              height=600).generate_from_frequencies(word_counts)

        # 显示词云
        plt.figure(figsize=(8, 6))
        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis('off')
        plt.show()

    except FileNotFoundError:
        print(f"错误：未找到文件 {file_path}，请检查文件路径是否正确。")
    except Exception as e:
        print(f"发生未知错误：{e}")


file_path = 'yi.txt'
generate_wordcloud(file_path)

运行截图： ![](https://img2024.cnblogs.com/blog/3450701/202504/3450701-20250425101027390-850398732.png)

拓展：金庸、古龙等武侠小说写作风格分析。输出不少于3 个金庸（古
龙）作品的最常用10 个词语，找到其中的相关性，总结其风格。
源码：

点击查看代码

复制代码

import jieba
from collections import Counter
import re  # 导入正则表达式模块

def count_word_frequencies(file_path, top_n=10):
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()

    # 使用 jieba 进行分词
    words = jieba.lcut(text)

    # 排除一些常见的无意义词汇（停用词）
    stopwords = set('的 了 是 在 和 有 个 与 不 一 我 他 以 也 子 之 幺'.split())
    filtered_words = [word for word in words if word not in stopwords and len(word.strip()) > 1]

    # 增加对标点符号的过滤
    filtered_words = [word for word in filtered_words if not re.match(r'^[\W_]+$', word)]

    # 统计词频
    word_counts = Counter(filtered_words)

    # 获取出现频率最高的前 top_n 个词
    most_common_words = word_counts.most_common(top_n)
    
    return most_common_words

# 调用函数并打印结果
file_paths = ['天龙八部.txt', '神雕侠侣.txt']
for file_path in file_paths:
    most_common_words = count_word_frequencies(file_path, top_n=10)
    print(f"小说《{file_path}》中前十高频词汇为：")
    for word, count in most_common_words:
        print(f"{word}")
    print("\n" + "-"*50 + "\n")

运行截图：

金庸作品风格为：

金庸的作品充满了中国传统文化元素，包括历史、哲学、文学、武术等。他巧妙地将这些元素融入到故事情节中，使得作品不仅是一部武侠小说，更是一部展示中华文化的窗口。金庸的作品强调善恶有报、正义终将战胜邪恶的理念。爱情是金庸小说不可或缺的一部分，他擅长描写不同类型的爱情关系------从青梅竹马到生死相依，从一见钟情到日久生情，每一种都写得动人心弦。