Redis系列之底层数据结构ZipList

实验环境

Redis 6.0

什么是Ziplist？

Ziplist，压缩列表，这种数据结构会根据存入数据的类型和大小，分配大小不同的空间，所以是为了节省内存而采用的。因为这种数据结构是一种完整连续的数据单元，所以一旦发生数据改变，会产生连锁更新，严重影响访问性能，所以这种数据结构只适应于数据量比较小的情况。

ZipList可以存储字符串或者整数，存储整数时是采用整数的二进制而不是字符串形式存储。

ZipList数据结构

在Redis的源码找到ziplist.c，链接：https://github.com/redis/redis/blob/6.0/src/ziplist.c

java 复制代码

/* The ziplist is a specially encoded dually linked list that is designed
 * to be very memory efficient. It stores both strings and integer values,
 * where integers are encoded as actual integers instead of a series of
 * characters. It allows push and pop operations on either side of the list
 * in O(1) time. However, because every operation requires a reallocation of
 * the memory used by the ziplist, the actual complexity is related to the
 * amount of memory used by the ziplist.
 *
 * ----------------------------------------------------------------------------
 *
 * ZIPLIST OVERALL LAYOUT
 * ======================
 *
 * The general layout of the ziplist is as follows:
 *
 * <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend> // ziplist的整体布局
 *
 * NOTE: all fields are stored in little endian, if not specified otherwise.
 *
 * <uint32_t zlbytes> is an unsigned integer to hold the number of bytes that
 * the ziplist occupies, including the four bytes of the zlbytes field itself.
 * This value needs to be stored to be able to resize the entire structure
 * without the need to traverse it first.
 *
 * <uint32_t zltail> is the offset to the last entry in the list. This allows
 * a pop operation on the far side of the list without the need for full
 * traversal.
 *
 * <uint16_t zllen> is the number of entries. When there are more than
 * 2^16-2 entries, this value is set to 2^16-1 and we need to traverse the
 * entire list to know how many items it holds.
 *
 * <uint8_t zlend> is a special entry representing the end of the ziplist.
 * Is encoded as a single byte equal to 255. No other normal entry starts
 * with a byte set to the value of 255.
 *
 * ZIPLIST ENTRIES
 * ===============
 *
 * Every entry in the ziplist is prefixed by metadata that contains two pieces
 * of information. First, the length of the previous entry is stored to be
 * able to traverse the list from back to front. Second, the entry encoding is
 * provided. It represents the entry type, integer or string, and in the case
 * of strings it also represents the length of the string payload.
 * So a complete entry is stored like this:
 *
 * <prevlen> <encoding> <entry-data> // entry里面的结构
 * ...
 */

根据源码里的说明，画出ziplist的数据结构：

zlbytes：字段的类型是unit32_t，这个字段存储的是整个ziplist所占用的内存字节数
zltail：字段的类型是unit32_t，指的是ziplist中最后一个entry的偏移量，用于快速定义最后一个entry，可以快速完成pop等操作
zllen：字段的类型是unit16_t，指的是整个ziplist中entry的数量
zlend：是一个终止字节，值为全F，即0xff

entry数据结构

从上面的介绍，还知道entry也是很重要的数据结构，看Redis源码里的介绍，entry一般分为如下圈出的两种结构

第一种结构，<prevlen> <encoding> <entry-data>，这种是常规的结构
- prevlen：前一个entry的大小
- encoding：不同的情况encoding的值不同，encoding用于表示当前的entry的类型和长度
- entry-data：用于存储entry的数据
第二种结构，<prevlen><encoding>，当entry存储的是int类型的数据时，encoding和entry-data会合并在encoding中表示，此时数据结构为<prevlen><encoding>

在Redis中，存储数据时会尝试将string类型数据转为int类型数据，以节省内存空间

prevlen编码
当前一个元素长度小于254的时候，prevlen长度为1个字节，值即为前一个entry的长度；如果元素长度大于等于254的时候，prevlen用5个字节表示，第一字节设置为254，后面4个字节存储一个小端的无符号整型，用来表示前一个entry的长度

shell 复制代码

<prevlen from 0 to 253> <encoding> <entry> // 元素长度小于254
0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry> // 元素长度大于254

源码里的原文：

encoding编码

encoding的长度和值根据存储的数据类型和长度而定，前两位表示数据类型，比如int类型，用"11"表示，其它的表示string类型

源码里的注释给出的例子，分别举例表示encoding存储String类型数据和int类型数据的场景

entry-data
entry-data存储entry的数据，存储的数据为字符串类型时才会有，为int类型时，encoding和entry-data会合并

什么情况使用ZipList?

上面说了，ziplist数据是连续的，是一个完整的内存空间，所以如果发生数据变更，会产生连锁更新，影响访问性能，所以只适应于数据量比较小的情况。

在Redis中有两种数据结构，hashes和sorted Sets在某些情况会使用ZipList，hashes只有小数据量的时候才会用到ziplist，当hash对象符合如下两个条件的时候，使用ziplist

hash对象保存的键值对数量小于512个；
所有的键值对的键和值的字符长度都小于64byte

redis.conf配置

shell 复制代码

hash-max-ziplist-value 64  // ziplist中最大能存放的值长度
hash-max-ziplist-entries 512 // ziplist中最多能存放的entry数量

sorted Sets在某些情况使用zipList，如果元素数量小于128个，或者任一个member长度小于64字节，如果两个条件都超过，即大于等于，使用Skiplist+dict存储

redis.conf配置：

shell 复制代码

zset-max-ziplist-entries 128
zset-max-ziplist-value 64

Ziplist为什么省内存？

ZipList在设计的时候就尽量让每个元素按照实际大小存储，所以在entry里增加encoding字段，这个字段表示entry的类型和长度，redis针对不同的encoding来细化存储大小，所以比起普通的List，ziplist会更省内存，普通的list每个元素占用的内存空间大小取决于最大元素所占用的内存空间。

按照这种设计方式，Ziplist无疑会更省内存，但是也有一个问题，就是遍历比起普通的list会更麻烦，所以新增了prelen字段，这个字段记录上一个元素的length