Programming and Mathematics for Teens 02-018: C++ Data Structures and Algorithms, Topic 26: Data Compression Algorithms


  • I. Lossless Compression Algorithms
    • 1. Huffman Coding
    • 2. Lempel-Ziv-Welch (LZW) Coding
    • 3. Run-Length Encoding (RLE)
  • II. General-Purpose Lossless Compression Libraries
    • 1. DEFLATE (ZIP compression)
    • 2. Brotli
    • 3. LZMA
    • 4. Zstandard (Zstd)
  • Summary

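Topic summary:

This topic introduces several common data compression algorithms and provides detailed C++ code: hand-written implementations of Huffman coding, LZW and RLE, plus examples that call the zlib, Brotli, liblzma and Zstd libraries.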


I. Lossless Compression Algorithms

1. Huffman Coding

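Huffman coding assigns codes according to character frequency. It builds a Huffman tree by repeatedly merging the two least frequent nodes, so frequent characters receive short codes and rare characters long ones. Because every character sits at a leaf, no code is a prefix of another, which makes the resulting bit stream uniquely decodable.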

Detailed code example (C++; the blocks below form one program when placed in order)

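  1. Define the node structure:

    #include <iostream>
    #include <queue>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Huffman tree node: leaves hold a character, internal nodes use '\0'
    struct Node {
        char ch;
        int freq;
        Node* left;
        Node* right;
        Node(char ch, int freq) : ch(ch), freq(freq), left(nullptr), right(nullptr) {}
    };
  2. Define the comparator:

    // Turns std::priority_queue into a min-heap: lower frequency = higher priority
    struct Compare {
        bool operator()(const Node* left, const Node* right) const {
            return left->freq > right->freq;
        }
    };
  3. Generate the codes:

    // Walk the tree: a left edge adds '0', a right edge adds '1'; a leaf's path is its code
    void generateCodes(Node* root, const std::string& str, std::unordered_map<char, std::string>& huffmanCode) {
        if (!root)
            return;
        if (!root->left && !root->right) {
            // With only one distinct character the root is a leaf; give it the code "0"
            huffmanCode[root->ch] = str.empty() ? "0" : str;
            return;
        }
        generateCodes(root->left, str + "0", huffmanCode);
        generateCodes(root->right, str + "1", huffmanCode);
    }
  4. Encoding function (it also hands back the tree root, which the decoder needs):

    // Builds the Huffman tree for `text`, stores its root in `root`
    // and returns the code as a string of '0'/'1' characters.
    std::string huffmanEncode(const std::string& text, Node*& root) {
        root = nullptr;
        if (text.empty())
            return "";

        std::unordered_map<char, int> freq;
        for (char ch : text) {
            freq[ch]++;
        }

        std::priority_queue<Node*, std::vector<Node*>, Compare> heap;
        for (const auto& pair : freq) {
            heap.push(new Node(pair.first, pair.second));
        }

        // Repeatedly merge the two least frequent subtrees
        while (heap.size() > 1) {
            Node* left = heap.top();
            heap.pop();
            Node* right = heap.top();
            heap.pop();
            Node* combined = new Node('\0', left->freq + right->freq);
            combined->left = left;
            combined->right = right;
            heap.push(combined);
        }
        root = heap.top();

        std::unordered_map<char, std::string> huffmanCode;
        generateCodes(root, "", huffmanCode);

        std::string encodedText;
        for (char ch : text) {
            encodedText += huffmanCode[ch];
        }
        return encodedText;
    }
  5. Decoding function and tree cleanup:

    // Follow the bits from the root; emit the character at each leaf, then restart at the root
    std::string huffmanDecode(const std::string& encodedText, Node* root) {
        if (!root)
            return "";
        // Single-leaf tree: every bit stands for the same character
        if (!root->left && !root->right)
            return std::string(encodedText.size(), root->ch);

        std::string decodedText;
        Node* current = root;
        for (char bit : encodedText) {
            current = (bit == '0') ? current->left : current->right;
            if (!current->left && !current->right) {
                decodedText += current->ch;
                current = root;
            }
        }
        return decodedText;
    }

    // Free every node allocated with new (post-order traversal)
    void freeTree(Node* root) {
        if (!root)
            return;
        freeTree(root->left);
        freeTree(root->right);
        delete root;
    }
  6. Main function:

    int main() {
        std::string text = "this is an example for huffman encoding";
        Node* root = nullptr;
        std::string encodedText = huffmanEncode(text, root);
        std::cout << "Encoded string: " << encodedText << std::endl;

        // Decode with the same tree that was built during encoding
        std::string decodedText = huffmanDecode(encodedText, root);
        std::cout << "Decoded string: " << decodedText << std::endl;

        freeTree(root);
        return 0;
    }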

2. Lempel-Ziv-Welch (LZW) Coding

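LZW is a dictionary-based compression algorithm. The dictionary starts with all 256 single-byte strings; the encoder repeatedly outputs the code of the longest string that is already in the dictionary and adds that string plus the next character as a new entry. The decoder rebuilds the same dictionary from the codes it reads, so the dictionary itself never has to be transmitted.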

Detailed code example (C++; the blocks below form one program when placed in order)

  1. Encoding function:

    #include <iostream>
    #include <stdexcept>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Greedy LZW: output the code of the longest known prefix w, then add w + next char
    std::vector<int> lzwEncode(const std::string& text) {
        // Codes 0-255 are the single-byte strings
        std::unordered_map<std::string, int> dictionary;
        for (int i = 0; i < 256; ++i) {
            dictionary[std::string(1, static_cast<char>(i))] = i;
        }
        int nextCode = 256;

        std::string w;
        std::vector<int> result;
        for (char c : text) {
            std::string wc = w + c;
            if (dictionary.count(wc)) {
                w = wc;
            } else {
                result.push_back(dictionary[w]);
                dictionary[wc] = nextCode++;
                w = std::string(1, c);
            }
        }
        if (!w.empty()) {
            result.push_back(dictionary[w]);
        }
        return result;
    }
  2. Decoding function:

    // Rebuilds the encoder's dictionary on the fly, one entry behind the encoder
    std::string lzwDecode(const std::vector<int>& encoded) {
        if (encoded.empty())
            return "";

        std::unordered_map<int, std::string> dictionary;
        for (int i = 0; i < 256; ++i) {
            dictionary[i] = std::string(1, static_cast<char>(i));
        }
        int nextCode = 256;

        std::string w = dictionary.at(encoded[0]);
        std::string result = w;

        for (size_t k = 1; k < encoded.size(); ++k) {
            int code = encoded[k];
            std::string entry;
            if (dictionary.count(code)) {
                entry = dictionary[code];
            } else if (code == nextCode) {
                // Code not defined yet: it must be w + first char of w
                entry = w + w[0];
            } else {
                throw std::runtime_error("invalid LZW code");
            }
            result += entry;

            dictionary[nextCode++] = w + entry[0];
            w = entry;
        }
        return result;
    }
  3. Main function:

    int main() {
        std::string text = "TOBEORNOTTOBEORTOBEORNOT";
        std::vector<int> encoded = lzwEncode(text);
        std::cout << "Encoded: ";
        for (int code : encoded) {
            std::cout << code << " ";
        }
        std::cout << std::endl;
    
        std::string decoded = lzwDecode(encoded);
        std::cout << "Decoded: " << decoded << std::endl;
        return 0;
    }

3. Run-Length Encoding (RLE)

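RLE is a simple lossless compression algorithm that replaces each run of identical consecutive characters with the character followed by the run length; for example, `AAAABBB` becomes `A4B3`. It only pays off when the data contains long runs: on data without runs the output can be twice as long as the input (`ABC` becomes `A1B1C1`).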

Detailed code example (C++; the blocks below form one program when placed in order)

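  1. Encoding function:

    #include <cctype>
    #include <iostream>
    #include <string>

    // Replace each run with the character followed by its length in decimal, e.g. "AAAB" -> "A3B1".
    // The format is ambiguous if the input itself contains digits.
    std::string rleEncode(const std::string& text) {
        if (text.empty()) {
            return "";
        }
    
        std::string result;
        char prevChar = text[0];
        int count = 1;
    
        for (size_t i = 1; i < text.size(); ++i) {
            if (text[i] == prevChar) {
                ++count;
            } else {
                result += prevChar + std::to_string(count);
                prevChar = text[i];
                count = 1;
            }
        }
        result += prevChar + std::to_string(count);
        return result;
    }
  2. Decoding function:

    // Read one character, then every digit after it as the (possibly multi-digit) run length
    std::string rleDecode(const std::string& encoded) {
        std::string result;
        size_t i = 0;
        while (i < encoded.size()) {
            char ch = encoded[i++];
            size_t start = i;
            while (i < encoded.size() && std::isdigit(static_cast<unsigned char>(encoded[i]))) {
                ++i;
            }
            int count = std::stoi(encoded.substr(start, i - start));
            result.append(count, ch);
        }
        return result;
    }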
  3. Main function:

    int main() {
        std::string text = "AAAABBBCCDAA";
        std::string encoded = rleEncode(text);
        std::cout << "Encoded: " << encoded << std::endl;
    
        std::string decoded = rleDecode(encoded);
        std::cout << "Decoded: " << decoded << std::endl;
        return 0;
    }

II. General-Purpose Lossless Compression Libraries

1. DEFLATE (ZIP compression)

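DEFLATE combines the LZ77 algorithm, which replaces repeated substrings with back-references into a sliding window of previously seen data, with Huffman coding of the resulting symbols. It is used in the ZIP file format as well as in gzip and PNG.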

Detailed code example (C++, using zlib; link with -lz)

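  1. Compression function:

    #include <zlib.h>
    #include <iostream>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Note: deflateInit() emits zlib-wrapped DEFLATE data (RFC 1950), not a .zip archive
    std::vector<unsigned char> deflateCompress(const std::string& data) {
        z_stream strm{};  // zero-initialised: zalloc, zfree and opaque are Z_NULL
        strm.avail_in = static_cast<uInt>(data.size());
        strm.next_in = reinterpret_cast<Bytef*>(const_cast<char*>(data.data()));
        if (deflateInit(&strm, Z_DEFAULT_COMPRESSION) != Z_OK) {
            throw std::runtime_error("deflateInit failed");
        }

        // deflateBound() is an upper bound on the output, so one Z_FINISH call suffices
        std::vector<unsigned char> compressedData(deflateBound(&strm, static_cast<uLong>(data.size())));
        strm.avail_out = static_cast<uInt>(compressedData.size());
        strm.next_out = compressedData.data();
        int ret = deflate(&strm, Z_FINISH);
        uLong produced = strm.total_out;
        deflateEnd(&strm);
        if (ret != Z_STREAM_END) {
            throw std::runtime_error("deflate failed");
        }
        compressedData.resize(produced);
        return compressedData;
    }
  2. Decompression function:

    std::string deflateDecompress(const std::vector<unsigned char>& compressedData) {
        z_stream strm{};
        strm.avail_in = static_cast<uInt>(compressedData.size());
        // next_in is non-const unless ZLIB_CONST is defined; inflate() does not modify the input
        strm.next_in = const_cast<Bytef*>(compressedData.data());
        if (inflateInit(&strm) != Z_OK) {
            throw std::runtime_error("inflateInit failed");
        }

        // The decompressed size is not known in advance, so inflate chunk by chunk
        std::string result;
        unsigned char buffer[4096];
        int ret;
        do {
            strm.avail_out = sizeof(buffer);
            strm.next_out = buffer;
            ret = inflate(&strm, Z_NO_FLUSH);
            if (ret != Z_OK && ret != Z_STREAM_END) {
                inflateEnd(&strm);
                throw std::runtime_error("inflate failed");
            }
            result.append(reinterpret_cast<char*>(buffer), sizeof(buffer) - strm.avail_out);
        } while (ret != Z_STREAM_END);
        inflateEnd(&strm);
        return result;
    }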
  3. Main function:

    int main() {
        std::string data = "this is an example for deflate compression";
        std::vector<unsigned char> compressedData = deflateCompress(data);
        std::cout << "Compressed data size: " << compressedData.size() << std::endl;
    
        std::string decompressedData = deflateDecompress(compressedData);
        std::cout << "Decompressed data: " << decompressedData << std::endl;
        return 0;
    }

2. Brotli

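Brotli is a modern compression format developed by Google. It combines LZ77-style matching, Huffman coding and context modelling with a built-in static dictionary, and typically achieves a better compression ratio than DEFLATE, especially on text.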

Detailed code example (C++, using the Brotli C library; link with -lbrotlienc -lbrotlidec)

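  1. Compression function:

    #include <brotli/decode.h>
    #include <brotli/encode.h>
    #include <cstdint>
    #include <iostream>
    #include <stdexcept>
    #include <string>
    #include <vector>

    std::vector<uint8_t> brotliCompress(const std::string& data) {
        // encodedSize is in/out: buffer capacity on input, bytes written on output
        size_t encodedSize = BrotliEncoderMaxCompressedSize(data.size());
        std::vector<uint8_t> compressedData(encodedSize);
        // One-shot API: quality 11 (maximum), window 2^22 bytes, text mode
        BROTLI_BOOL ok = BrotliEncoderCompress(11, 22, BROTLI_MODE_TEXT,
                                               data.size(), reinterpret_cast<const uint8_t*>(data.data()),
                                               &encodedSize, compressedData.data());
        if (!ok) {
            throw std::runtime_error("Brotli compression failed");
        }
        compressedData.resize(encodedSize);
        return compressedData;
    }
  2. Decompression function (a Brotli stream does not record the total decompressed size, so the streaming decoder is used):

    std::string brotliDecompress(const std::vector<uint8_t>& compressedData) {
        BrotliDecoderState* state = BrotliDecoderCreateInstance(nullptr, nullptr, nullptr);
        if (!state) throw std::runtime_error("Brotli decoder allocation failed");
        size_t availableIn = compressedData.size();
        const uint8_t* nextIn = compressedData.data();
        std::string result;
        uint8_t buffer[4096];
        BrotliDecoderResult ret;
        do {  // decode chunk by chunk until the stream is complete
            size_t availableOut = sizeof(buffer);
            uint8_t* nextOut = buffer;
            ret = BrotliDecoderDecompressStream(state, &availableIn, &nextIn, &availableOut, &nextOut, nullptr);
            result.append(reinterpret_cast<char*>(buffer), sizeof(buffer) - availableOut);
        } while (ret == BROTLI_DECODER_RESULT_NEEDS_MORE_OUTPUT);
        BrotliDecoderDestroyInstance(state);
        if (ret != BROTLI_DECODER_RESULT_SUCCESS) throw std::runtime_error("Brotli decompression failed");
        return result;
    }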
  3. Main function:

    int main() {
        std::string data = "this is an example for brotli compression";
        std::vector<uint8_t> compressedData = brotliCompress(data);
        std::cout << "Compressed data size: " << compressedData.size() << std::endl;
    
        std::string decompressedData = brotliDecompress(compressedData);
        std::cout << "Decompressed data: " << decompressedData << std::endl;
        return 0;
    }

3. LZMA

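LZMA (Lempel-Ziv-Markov chain Algorithm) is an efficient compression algorithm that reaches high compression ratios, generally at the cost of slower compression and more memory. It is widely used in the 7z format and, in its LZMA2 variant, in the .xz format.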

Detailed code example (C++, using liblzma from XZ Utils; link with -llzma)

  1. Compression function:

    #include <lzma.h>
    #include <cstdint>
    #include <iostream>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Produces an .xz stream (LZMA2 filter, CRC64 integrity check) with liblzma
    std::vector<uint8_t> lzmaCompress(const std::string& data) {
        lzma_stream strm = LZMA_STREAM_INIT;
        lzma_ret ret = lzma_easy_encoder(&strm, LZMA_PRESET_DEFAULT, LZMA_CHECK_CRC64);
        if (ret != LZMA_OK) {
            throw std::runtime_error("LZMA encoder initialization failed");
        }

        strm.next_in = reinterpret_cast<const uint8_t*>(data.data());
        strm.avail_in = data.size();

        // Compress into fixed-size chunks until the encoder reports the end of the stream
        std::vector<uint8_t> compressedData;
        uint8_t buffer[4096];
        do {
            strm.next_out = buffer;
            strm.avail_out = sizeof(buffer);
            ret = lzma_code(&strm, LZMA_FINISH);
            compressedData.insert(compressedData.end(), buffer, buffer + (sizeof(buffer) - strm.avail_out));
        } while (ret == LZMA_OK);
        lzma_end(&strm);

        if (ret != LZMA_STREAM_END) {
            throw std::runtime_error("LZMA compression failed");
        }
        return compressedData;
    }
  2. Decompression function:

    std::string lzmaDecompress(const std::vector<uint8_t>& compressedData) {
        lzma_stream strm = LZMA_STREAM_INIT;
        // No memory limit; LZMA_CONCATENATED also accepts several concatenated .xz streams
        lzma_ret ret = lzma_stream_decoder(&strm, UINT64_MAX, LZMA_CONCATENATED);
        if (ret != LZMA_OK) {
            throw std::runtime_error("LZMA decoder initialization failed");
        }

        strm.next_in = compressedData.data();
        strm.avail_in = compressedData.size();

        // The decompressed size is not known in advance, so decode chunk by chunk
        std::string result;
        uint8_t buffer[4096];
        do {
            strm.next_out = buffer;
            strm.avail_out = sizeof(buffer);
            ret = lzma_code(&strm, LZMA_FINISH);
            result.append(reinterpret_cast<char*>(buffer), sizeof(buffer) - strm.avail_out);
        } while (ret == LZMA_OK);
        lzma_end(&strm);

        if (ret != LZMA_STREAM_END) {
            throw std::runtime_error("LZMA decompression failed");
        }
        return result;
    }
  3. Main function:

    int main() {
        std::string data = "this is an example for lzma compression";
        std::vector<uint8_t> compressedData = lzmaCompress(data);
        std::cout << "Compressed data size: " << compressedData.size() << std::endl;
    
        std::string decompressedData = lzmaDecompress(compressedData);
        std::cout << "Decompressed data: " << decompressedData << std::endl;
        return 0;
    }

4. Zstandard (Zstd)

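Zstandard (Zstd) is a modern compression algorithm developed at Facebook that combines a high compression ratio with very fast decompression; its compression level can be tuned over a wide range to trade speed against ratio.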

Detailed code example (C++, using libzstd; link with -lzstd)

  1. Compression function:

    #include <zstd.h>
    #include <cstdint>
    #include <iostream>
    #include <stdexcept>
    #include <string>
    #include <vector>

    std::vector<uint8_t> zstdCompress(const std::string& data) {
        size_t compressedSize = ZSTD_compressBound(data.size());  // worst-case output size
        std::vector<uint8_t> compressedData(compressedSize);
        // ZSTD_defaultCLevel() is missing from older zstd releases; there, pass a level such as 3
        size_t result = ZSTD_compress(compressedData.data(), compressedSize, data.data(), data.size(), ZSTD_defaultCLevel());
        if (ZSTD_isError(result)) {
            throw std::runtime_error(std::string("Zstd compression failed: ") + ZSTD_getErrorName(result));
        }
        compressedData.resize(result);
        return compressedData;
    }
  2. Decompression function:

    std::string zstdDecompress(const std::vector<uint8_t>& compressedData) {
        // ZSTD_compress() stores the original size in the frame header. Use unsigned long long
        // so the ZSTD_CONTENTSIZE_* sentinel values are not truncated on 32-bit platforms.
        unsigned long long decompressedSize = ZSTD_getFrameContentSize(compressedData.data(), compressedData.size());
        if (decompressedSize == ZSTD_CONTENTSIZE_UNKNOWN || decompressedSize == ZSTD_CONTENTSIZE_ERROR) {
            throw std::runtime_error("Zstd decompression failed: unknown content size");
        }
    
        std::vector<uint8_t> decompressedData(decompressedSize);
        size_t result = ZSTD_decompress(decompressedData.data(), decompressedData.size(), compressedData.data(), compressedData.size());
        if (ZSTD_isError(result)) {
            throw std::runtime_error(std::string("Zstd decompression failed: ") + ZSTD_getErrorName(result));
        }
        return std::string(decompressedData.begin(), decompressedData.begin() + result);
    }
  3. Main function:

    int main() {
        std::string data = "this is an example for zstd compression";
        std::vector<uint8_t> compressedData = zstdCompress(data);
        std::cout << "Compressed data size: " << compressedData.size() << std::endl;
    
        std::string decompressedData = zstdDecompress(compressedData);
        std::cout << "Decompressed data: " << decompressedData << std::endl;
        return 0;
    }

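Summary

Each of these compression algorithms has its own strengths and suits different scenarios. All of the algorithms covered here are lossless: Huffman coding, LZW and RLE, as well as DEFLATE, Brotli, LZMA and Zstd, reconstruct the original data exactly, so they suit text, program files and any data that must be restored bit for bit. Lossy compression such as JPEG, by contrast, discards some information and suits images, audio and video where a small loss of quality is acceptable. Choosing an algorithm according to the actual requirements (compression ratio, speed, memory use, and whether any loss is tolerable) can effectively save storage space and transmission bandwidth.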
