Youth Programming and Mathematics 02-016 Python Data Structures and Algorithms, Topic 30: Data Compression Algorithms

  • I. Lossless Compression Algorithms
    • [1. Huffman Coding](#1. Huffman Coding)
    • [2. Lempel-Ziv-Welch (LZW) Coding](#2. Lempel-Ziv-Welch (LZW) Coding)
    • [3. Run-Length Encoding (RLE)](#3. Run-Length Encoding (RLE))
  • II. Lossy Compression and Library-Based Compression
    • [1. JPEG Compression (Lossy)](#1. JPEG Compression (Lossy))
    • [2. DEFLATE (ZIP Compression)](#2. DEFLATE (ZIP Compression))
    • [3. Brotli](#3. Brotli)
    • [4. LZMA](#4. LZMA)
    • [5. Zstandard (Zstd)](#5. Zstandard (Zstd))
  • Summary

Topic summary:

This lesson introduces several common data compression algorithms and gives a detailed Python code example for each.


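I. Lossless Compression Algorithms

1. Huffman Coding

Huffman coding is a frequency-based encoding method. It builds a Huffman tree from the character frequencies and assigns each character a unique prefix-free code, so that more frequent characters receive shorter codes.

Detailed code example (Python)
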
import heapq
from collections import Counter

class Node:
    """A Huffman tree node. Leaves hold a character; internal nodes hold None."""
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        # heapq compares nodes by frequency
        return self.freq < other.freq

def build_huffman_tree(frequency):
    """Repeatedly merge the two least frequent nodes until one root remains."""
    if not frequency:
        return None  # empty input: there is nothing to encode
    heap = [Node(char, freq) for char, freq in frequency.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = Node(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        heapq.heappush(heap, merged)

    return heap[0]

def generate_codes(node, prefix="", code_dict=None):
    """Walk the tree: a left edge appends "0", a right edge appends "1"."""
    if code_dict is None:
        code_dict = {}
    if node is not None:
        if node.char is not None:
            # A tree with a single leaf would yield an empty code, so use "0"
            code_dict[node.char] = prefix or "0"
        generate_codes(node.left, prefix + "0", code_dict)
        generate_codes(node.right, prefix + "1", code_dict)
    return code_dict

def huffman_encode(s):
    frequency = Counter(s)
    huffman_tree = build_huffman_tree(frequency)
    huffman_codes = generate_codes(huffman_tree)
    encoded_string = ''.join(huffman_codes[char] for char in s)
    return encoded_string, huffman_codes

def huffman_decode(encoded_string, huffman_codes):
    # Codes are prefix-free, so the first match while reading bits is the right one
    reverse_dict = {code: char for char, code in huffman_codes.items()}
    current_code = ""
    decoded_chars = []
    for bit in encoded_string:
        current_code += bit
        if current_code in reverse_dict:
            decoded_chars.append(reverse_dict[current_code])
            current_code = ""
    return ''.join(decoded_chars)

# Example
s = "this is an example for huffman encoding"
encoded_string, huffman_codes = huffman_encode(s)
print("Encoded string:", encoded_string)
print("Huffman dictionary:", huffman_codes)
decoded_string = huffman_decode(encoded_string, huffman_codes)
print("Decoded string:", decoded_string)
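
As a rough check on how much the Huffman codes above can save, the sketch below compares the average Huffman code length with the Shannon entropy of the text and with a fixed-length code. It is self-contained and only computes code lengths; the helper names `code_lengths` and `summarize` are made up for this illustration, and the text is assumed to be non-empty.

```python
import heapq
import math
from collections import Counter

def code_lengths(text):
    """Return {char: Huffman code length} for a non-empty text."""
    freq = Counter(text)
    if len(freq) == 1:
        return {ch: 1 for ch in freq}  # a lone symbol still needs one bit
    # Heap items: (weight, tie_breaker, {char: depth so far}).
    # The unique tie_breaker keeps heapq from ever comparing the dicts.
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def summarize(text):
    """Return (average Huffman bits, entropy bits, fixed-length bits) per character."""
    freq = Counter(text)
    n = len(text)
    lengths = code_lengths(text)
    average = sum(freq[ch] * lengths[ch] for ch in freq) / n
    entropy = -sum((c / n) * math.log2(c / n) for c in freq.values())
    fixed = max(1, math.ceil(math.log2(len(freq))))
    return average, entropy, fixed

average, entropy, fixed = summarize("this is an example for huffman encoding")
print(f"Huffman: {average:.3f} bits/char, entropy: {entropy:.3f}, fixed-length: {fixed}")
```

The average Huffman length can never be lower than the entropy, and it stays within one bit of it, which is why the gain is largest when the character frequencies are very uneven.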

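2. Lempel-Ziv-Welch (LZW) Coding

LZW is a dictionary-based compression algorithm. It builds the dictionary dynamically while reading the input and replaces repeated strings with their dictionary codes.

Detailed code example (Python)
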
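def lzw_encode(s):
    # The dictionary starts with the 256 single characters chr(0)..chr(255)
    dictionary = {chr(i): i for i in range(256)}
    w = ""
    result = []
    for c in s:
        if c not in dictionary:
            raise ValueError(f"Character {c!r} is outside the initial dictionary")
        wc = w + c
        if wc in dictionary:
            w = wc  # keep extending the current match
        else:
            result.append(dictionary[w])
            dictionary[wc] = len(dictionary)  # register the new string
            w = c
    if w:
        result.append(dictionary[w])
    return result

def lzw_decode(encoded):
    if not encoded:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    # Read the first code by index so the caller's list is not modified
    w = dictionary[encoded[0]]
    result = [w]
    for k in encoded[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == len(dictionary):
            # The code refers to the entry that is about to be created
            entry = w + w[0]
        else:
            raise ValueError(f"Invalid LZW code: {k}")
        result.append(entry)
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return ''.join(result)

# Example
s = "TOBEORNOTTOBEORTOBEORNOT"
encoded = lzw_encode(s)
print("Encoded:", encoded)
decoded = lzw_decode(encoded)
print("Decoded:", decoded)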
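
The encoder above returns a list of integers, which is not yet smaller than the input when stored as Python objects. The sketch below shows one simple way to turn such codes into bytes, by writing every code with the same number of bits. The function names `pack_codes` and `unpack_codes` are hypothetical, and the fixed width is a simplification: practical LZW formats usually grow the code width as the dictionary fills up.

```python
def pack_codes(codes, width):
    """Pack integer codes into bytes, `width` bits per code, most significant bit first."""
    bits = 0
    for code in codes:
        if code >= 1 << width:
            raise ValueError(f"Code {code} does not fit in {width} bits")
        bits = (bits << width) | code
    total_bits = width * len(codes)
    padding = (-total_bits) % 8  # zero bits added to reach a whole byte
    bits <<= padding
    return bits.to_bytes((total_bits + padding) // 8, "big")

def unpack_codes(data, width, count):
    """Inverse of pack_codes; `count` is needed because of the padding bits."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    mask = (1 << width) - 1
    return [(bits >> (total - width * (i + 1))) & mask for i in range(count)]

# Codes produced by LZW for the 24-character string "TOBEORNOTTOBEORTOBEORNOT"
codes = [84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]
width = max(codes).bit_length()
packed = pack_codes(codes, width)
print("Bits per code:", width, "| packed size in bytes:", len(packed))
```

For this input the largest code needs 9 bits, so 16 codes occupy 144 bits, or 18 bytes, compared with 24 bytes for the original text at one byte per character.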

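3. Run-Length Encoding (RLE)

RLE is a simple lossless compression algorithm that replaces each run of consecutive repeated characters with the character followed by its repeat count.

Detailed code example (Python)
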
def rle_encode(s):
    if not s:
        return ""
    # The "character + count" text format is ambiguous if the input contains digits
    if any(ch.isdigit() for ch in s):
        raise ValueError("Input must not contain digit characters")

    result = []
    prev_char = s[0]
    count = 1

    for char in s[1:]:
        if char == prev_char:
            count += 1
        else:
            result.append((prev_char, count))
            prev_char = char
            count = 1
    result.append((prev_char, count))

    return ''.join([f"{char}{count}" for char, count in result])

def rle_decode(encoded):
    result = []
    i = 0
    while i < len(encoded):
        char = encoded[i]
        i += 1
        # The count may span several digits, for example "A12"
        start = i
        while i < len(encoded) and encoded[i].isdigit():
            i += 1
        result.append(char * int(encoded[start:i]))
    return ''.join(result)

# Example
s = "AAAABBBCCDAA"
encoded = rle_encode(s)
print("Encoded:", encoded)
decoded = rle_decode(encoded)
print("Decoded:", decoded)
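
A text format of the form "character + count" cannot represent input that itself contains digits without ambiguity. One way around this, sketched below, is to keep the runs as (character, count) pairs instead of a string. The names `rle_pairs` and `rle_unpairs` are made up for this illustration, and it relies on `itertools.groupby` from the standard library.

```python
from itertools import groupby

def rle_pairs(s):
    """Return a list of (character, run length) pairs, one per run."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_unpairs(pairs):
    """Rebuild the original string from (character, run length) pairs."""
    return "".join(ch * count for ch, count in pairs)

pairs = rle_pairs("AAAABBBCCDAA")
print(pairs)
print(rle_unpairs(pairs))

# Input with digits is handled without ambiguity
print(rle_pairs("a1112"))
```

RLE only pays off when runs are long. For text with few repeats, such as "ABCD", every character becomes its own pair and the output is larger than the input.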

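II. Lossy Compression and Library-Based Compression

1. JPEG Compression (Lossy)

JPEG is a widely used image compression standard that is normally used in its lossy form. Implementing JPEG from scratch is fairly involved, so the example uses the third-party Pillow library to write JPEG images.

Detailed code example (Python)
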
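from PIL import Image

# Compress an image by saving it as JPEG at the given quality
def compress_image(input_path, output_path, quality=85):
    with Image.open(input_path) as image:
        # JPEG has no alpha channel or palette mode, so convert such images to RGB
        if image.mode not in ("RGB", "L", "CMYK"):
            image = image.convert("RGB")
        image.save(output_path, "JPEG", quality=quality)

# Example (assumes a file named input.jpg exists in the working directory)
compress_image("input.jpg", "output.jpg", quality=50)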
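
To see where the loss in JPEG comes from, the following sketch applies a discrete cosine transform (DCT) to one row of 8 pixel values, rounds the coefficients to multiples of a step size, and transforms back. This is a strong simplification and not the JPEG algorithm itself: real JPEG works on 8x8 blocks with a two-dimensional DCT, uses a separate quantization step per frequency, and then entropy-codes the result. The function names and the single `step` parameter are assumptions made for this illustration.

```python
import math

def dct(samples):
    """Orthonormal one-dimensional DCT-II."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        out.append(scale * total)
    return out

def idct(coeffs):
    """Inverse of dct (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = sum(math.sqrt((1 if k == 0 else 2) / n) * c
                    * math.cos(math.pi * (i + 0.5) * k / n)
                    for k, c in enumerate(coeffs))
        out.append(total)
    return out

def quantize(coeffs, step):
    """Round each coefficient to the nearest multiple of step (this is the lossy part)."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    return [q * step for q in levels]

row = [52, 55, 61, 66, 70, 61, 64, 73]  # sample grayscale values
levels = quantize(dct(row), step=10)
restored = idct(dequantize(levels, step=10))
print("Quantized coefficients:", levels)
print("Restored pixels:", [round(v) for v in restored])
```

A larger step makes more coefficients round to zero, which compresses better but moves the restored values further from the originals. The `quality` parameter of a JPEG encoder plays a comparable role by scaling the quantization steps.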

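2. DEFLATE (ZIP Compression)

DEFLATE is a lossless compression algorithm that combines LZ77 with Huffman coding. It is used in the ZIP file format as well as in gzip and PNG. Python's standard zlib module provides it; zlib.compress wraps the DEFLATE data in a small zlib header and checksum.

Detailed code example (Python)
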
import zlib

def deflate_compress(data):
    compressed_data = zlib.compress(data.encode())
    return compressed_data

def deflate_decompress(compressed_data):
    decompressed_data = zlib.decompress(compressed_data)
    return decompressed_data.decode()

# Example
data = "this is an example for deflate compression"
compressed_data = deflate_compress(data)
print("Compressed data:", compressed_data)
decompressed_data = deflate_decompress(compressed_data)
print("Decompressed data:", decompressed_data)
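
The output of `zlib.compress` is a DEFLATE stream inside a zlib wrapper (a 2-byte header and a 4-byte checksum). ZIP archives store raw DEFLATE data without that wrapper. The sketch below shows how raw DEFLATE can be produced and read with the standard zlib module by passing a negative `wbits` value; the helper names `raw_deflate` and `raw_inflate` are made up for this illustration.

```python
import zlib

def raw_deflate(data, level=9):
    """Compress bytes to a raw DEFLATE stream (no zlib header or checksum)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()

def raw_inflate(blob):
    """Decompress a raw DEFLATE stream."""
    return zlib.decompress(blob, -15)

data = b"this is an example for deflate compression " * 20
raw = raw_deflate(data)
wrapped = zlib.compress(data, 9)
print("Original:", len(data), "| raw DEFLATE:", len(raw), "| zlib-wrapped:", len(wrapped))
assert raw_inflate(raw) == data
```

The raw form saves a few bytes but has no integrity check, so container formats that use it normally add their own checksum.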

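3. Brotli

Brotli is a modern lossless compression algorithm that combines LZ77-style matching, Huffman coding, and context modeling, and it usually achieves a better compression ratio than DEFLATE. The example uses the third-party brotli package.

Detailed code example (Python)
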
import brotli

def brotli_compress(data):
    compressed_data = brotli.compress(data.encode())
    return compressed_data

def brotli_decompress(compressed_data):
    decompressed_data = brotli.decompress(compressed_data)
    return decompressed_data.decode()

# Example
data = "this is an example for brotli compression"
compressed_data = brotli_compress(data)
print("Compressed data:", compressed_data)
decompressed_data = brotli_decompress(compressed_data)
print("Decompressed data:", decompressed_data)

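4. LZMA

LZMA is a lossless compression algorithm with a high compression ratio, widely used in the 7z file format. Python's standard lzma module provides it.

Detailed code example (Python)
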
import lzma

def lzma_compress(data):
    compressed_data = lzma.compress(data.encode())
    return compressed_data

def lzma_decompress(compressed_data):
    decompressed_data = lzma.decompress(compressed_data)
    return decompressed_data.decode()

# Example
data = "this is an example for lzma compression"
compressed_data = lzma_compress(data)
print("Compressed data:", compressed_data)
decompressed_data = lzma_decompress(compressed_data)
print("Decompressed data:", decompressed_data)
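
For data that arrives in pieces, for example when reading a large file block by block, the standard lzma module also offers an incremental interface. The following is a minimal sketch using `lzma.LZMACompressor`; the function name `compress_chunks` and the chunk contents are invented for the example.

```python
import lzma

def compress_chunks(chunks, preset=6):
    """Compress an iterable of bytes objects into a single .xz stream."""
    compressor = lzma.LZMACompressor(preset=preset)
    parts = [compressor.compress(chunk) for chunk in chunks]
    parts.append(compressor.flush())  # flush() emits whatever is still buffered
    return b"".join(parts)

chunks = [b"this is an example " * 50, b"for lzma compression " * 50]
compressed = compress_chunks(chunks)
print("Input bytes:", sum(len(c) for c in chunks), "| compressed bytes:", len(compressed))
assert lzma.decompress(compressed) == b"".join(chunks)
```

The `preset` value ranges from 0 to 9. Higher presets generally compress better at the cost of more time and memory.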

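5. Zstandard (Zstd)

Zstd is a modern lossless compression algorithm that combines a high compression ratio with fast decompression. The example uses the third-party zstandard package.

Detailed code example (Python)
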
import zstandard

def zstd_compress(data):
    compressed_data = zstandard.compress(data.encode())
    return compressed_data

def zstd_decompress(compressed_data):
    decompressed_data = zstandard.decompress(compressed_data)
    return decompressed_data.decode()

# Example
data = "this is an example for zstd compression"
compressed_data = zstd_compress(data)
print("Compressed data:", compressed_data)
decompressed_data = zstd_decompress(compressed_data)
print("Decompressed data:", decompressed_data)
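
To compare the library-based compressors from this section on the same input, a small helper along the following lines can be used. It is only a sketch: `compressed_sizes` is a made-up name, zlib and lzma come from the standard library, and brotli and zstandard are included only if those third-party packages happen to be installed.

```python
import lzma
import zlib

def compressed_sizes(data):
    """Return {compressor name: compressed size in bytes} for the given bytes."""
    sizes = {
        "zlib (DEFLATE)": len(zlib.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }
    try:
        import brotli  # third-party, optional
        sizes["brotli"] = len(brotli.compress(data))
    except ImportError:
        pass
    try:
        import zstandard  # third-party, optional
        sizes["zstd"] = len(zstandard.ZstdCompressor().compress(data))
    except ImportError:
        pass
    return sizes

data = b"this is an example for comparing compressors " * 500
print("Original size:", len(data))
for name, size in compressed_sizes(data).items():
    print(f"{name}: {size} bytes")
```

The ranking depends heavily on the input. On very short inputs the container overhead of each format can even make the output larger than the original, so measurements should be taken on data that resembles the real workload.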

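Summary

Each of these compression algorithms has its own strengths and suits different scenarios. Lossless algorithms such as Huffman coding, LZW, and RLE, as well as DEFLATE, Brotli, LZMA, and Zstd, are appropriate when the original data must be recovered exactly. Lossy algorithms such as JPEG are appropriate when some loss of fidelity is acceptable in exchange for smaller output, as with photographs. Choosing an algorithm that matches the requirements can save considerable storage space and transmission bandwidth.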
