Redis Ziplist - Tech Stack

Preface

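The ziplist is one of the underlying implementations of list keys and hash keys. When a list key contains only a few items, and each item is either a small integer or a short string, Redis uses a ziplist as the underlying implementation of that list key.

Likewise, when a hash key contains only a few key-value pairs, and each key and value is either a small integer or a short string, Redis uses a ziplist as the underlying implementation of that hash key.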

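1. Structure of a Ziplist

Redis developed the ziplist to save memory. It is a sequential data structure made up of a series of specially encoded, contiguous memory blocks. A ziplist can contain any number of nodes (entries), and each entry can hold either a byte array or an integer value.

The figure below shows the components of a ziplist, together with the type, length and purpose of each:

zlbytes (uint32_t, 4 bytes): the total number of bytes the ziplist occupies.
zltail (uint32_t, 4 bytes): the offset of the tail entry from the start of the ziplist, so the tail can be located without a traversal.
zllen (uint16_t, 2 bytes): the number of entries; when it equals UINT16_MAX (65535), the real count can only be obtained by traversing the whole ziplist.
entryX (variable length): the entries themselves.
zlend (uint8_t, 1 byte): the special value 0xFF, which marks the end of the ziplist.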


/* The size of a ziplist header: two 32 bit integers for the total
 * bytes count and last item offset. One 16 bit integer for the number
 * of items field. */
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))

/* Size of the "end of ziplist" entry. Just one byte. */
#define ZIPLIST_END_SIZE        (sizeof(uint8_t))

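Example:

The ziplist's zlbytes value is 0xd2 (decimal 210), meaning the ziplist is 210 bytes long in total.
Its zltail value is 0xb3 (decimal 179), meaning that given a pointer p to the start of the ziplist, adding the offset 179 to p yields the address of the tail entry, entry5.
Its zllen value is 0x5 (decimal 5), meaning the ziplist contains five entries.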
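To make the header fields concrete, here is a minimal sketch that builds an empty ziplist by hand and reads zlbytes, zltail and zllen back. It assumes, as Redis does, that the header fields are stored little-endian and that an empty ziplist's zltail points at zlend. The names new_empty_ziplist, zl_bytes, zl_tail_offset, zl_len and zl_tail_entry are hypothetical helpers for illustration, not Redis functions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Same sizes as ZIPLIST_HEADER_SIZE / ZIPLIST_END_SIZE. */
#define HDR_SIZE (sizeof(uint32_t) * 2 + sizeof(uint16_t))   /* 10 bytes */
#define END_SIZE (sizeof(uint8_t))                           /* 1 byte */
#define ZL_END 0xFF

/* Assumption: header fields are little-endian, as in Redis. */
static void put_u32le(unsigned char *p, uint32_t v) {
    p[0] = v & 0xff; p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff; p[3] = (v >> 24) & 0xff;
}
static uint32_t get_u32le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void put_u16le(unsigned char *p, uint16_t v) {
    p[0] = v & 0xff; p[1] = (v >> 8) & 0xff;
}
static uint16_t get_u16le(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: allocate a ziplist with no entries (header + zlend). */
unsigned char *new_empty_ziplist(void) {
    uint32_t bytes = HDR_SIZE + END_SIZE;
    unsigned char *zl = malloc(bytes);
    if (zl == NULL) return NULL;
    put_u32le(zl, bytes);          /* zlbytes */
    put_u32le(zl + 4, HDR_SIZE);   /* zltail: empty list, tail offset = header size */
    put_u16le(zl + 8, 0);          /* zllen */
    zl[bytes - 1] = ZL_END;        /* zlend */
    return zl;
}

uint32_t zl_bytes(const unsigned char *zl) { return get_u32le(zl); }
uint32_t zl_tail_offset(const unsigned char *zl) { return get_u32le(zl + 4); }
uint16_t zl_len(const unsigned char *zl) { return get_u16le(zl + 8); }

/* p + zltail: the address of the tail entry (or zlend for an empty list). */
unsigned char *zl_tail_entry(unsigned char *zl) { return zl + zl_tail_offset(zl); }
```

For an empty list this gives zlbytes = 11 (the 10-byte header plus zlend), zltail = 10 and zllen = 0. In the example above, the same zl_tail_entry arithmetic is simply p + 179.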

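2. Structure of a Ziplist Entry

Each ziplist entry can hold either a byte array or an integer. A byte array can have one of the following three lengths:

A byte array of at most 63 (2^6-1) bytes
A byte array of at most 16383 (2^14-1) bytes
A byte array of at most 4294967295 (2^32-1) bytes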
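Each of the three length classes corresponds to an encoding field of a different size (1, 2 or 5 bytes, described in 2.1.2). The following sketch writes that field for a given length. write_str_encoding is a hypothetical name, and the sketch assumes, as in the Redis source, that the 14-bit and 32-bit lengths are stored big-endian.

```c
#include <stddef.h>
#include <stdint.h>

#define ZIP_STR_06B (0 << 6)
#define ZIP_STR_14B (1 << 6)
#define ZIP_STR_32B (2 << 6)

/* Hypothetical helper: write the encoding field for a byte array of length
 * len into buf (room for 5 bytes) and return its size: 1, 2 or 5 bytes.
 * Assumption: multi-byte lengths are big-endian, as in Redis. */
size_t write_str_encoding(unsigned char *buf, uint32_t len) {
    if (len <= 0x3f) {                         /* |00pppppp| */
        buf[0] = ZIP_STR_06B | (unsigned char)len;
        return 1;
    } else if (len <= 0x3fff) {                /* |01pppppp|qqqqqqqq| */
        buf[0] = ZIP_STR_14B | (unsigned char)((len >> 8) & 0x3f);
        buf[1] = (unsigned char)(len & 0xff);
        return 2;
    } else {                                   /* |10000000| + 4 length bytes */
        buf[0] = ZIP_STR_32B;
        buf[1] = (unsigned char)((len >> 24) & 0xff);
        buf[2] = (unsigned char)((len >> 16) & 0xff);
        buf[3] = (unsigned char)((len >> 8) & 0xff);
        buf[4] = (unsigned char)(len & 0xff);
        return 5;
    }
}
```

For example, a length of 11 produces the single byte 0x0B, while a length of 64 already needs two bytes, 0x40 0x40.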

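An integer can have one of the following six sizes:

A 4-bit unsigned integer between 0 and 12
A 1-byte signed integer
A 3-byte signed integer
An int16_t integer
An int32_t integer
An int64_t integer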


/* Different encoding/length possibilities */
#define ZIP_STR_MASK 0xc0
#define ZIP_INT_MASK 0x30
#define ZIP_STR_06B (0 << 6)
#define ZIP_STR_14B (1 << 6)
#define ZIP_STR_32B (2 << 6)
#define ZIP_INT_16B (0xc0 | 0<<4)
#define ZIP_INT_32B (0xc0 | 1<<4)
#define ZIP_INT_64B (0xc0 | 2<<4)
#define ZIP_INT_24B (0xc0 | 3<<4)
#define ZIP_INT_8B 0xfe
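To see how a value is mapped to one of the six integer sizes, here is a sketch that picks the smallest encoding for a 64-bit value, checking the ranges from smallest to largest. pick_int_encoding and the ZL_INT24_* names are hypothetical; the order of checks is intended to follow zipTryEncoding in ziplist.c, but treat it as an approximation rather than a copy of that function.

```c
#include <stdint.h>

#define ZIP_INT_16B (0xc0 | 0<<4)
#define ZIP_INT_32B (0xc0 | 1<<4)
#define ZIP_INT_64B (0xc0 | 2<<4)
#define ZIP_INT_24B (0xc0 | 3<<4)
#define ZIP_INT_8B 0xfe
#define ZIP_INT_IMM_MIN 0xf1          /* 11110001 encodes the value 0 */

/* Hypothetical names for the signed 24-bit range. */
#define ZL_INT24_MAX 0x7fffff
#define ZL_INT24_MIN (-ZL_INT24_MAX - 1)

/* Hypothetical helper: smallest integer encoding able to hold v. */
unsigned char pick_int_encoding(int64_t v) {
    if (v >= 0 && v <= 12) return (unsigned char)(ZIP_INT_IMM_MIN + v);
    if (v >= INT8_MIN && v <= INT8_MAX) return ZIP_INT_8B;
    if (v >= INT16_MIN && v <= INT16_MAX) return ZIP_INT_16B;
    if (v >= ZL_INT24_MIN && v <= ZL_INT24_MAX) return ZIP_INT_24B;
    if (v >= INT32_MIN && v <= INT32_MAX) return ZIP_INT_32B;
    return ZIP_INT_64B;
}
```

For example, 7 becomes the immediate 0xF8, both -1 and 100 become ZIP_INT_8B, 10086 becomes ZIP_INT_16B and 100000 becomes ZIP_INT_24B.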

2.1 Entry Overview

In the Redis source, the zlentry struct in ziplist.c describes a ziplist entry:


/* We use this function to receive information about a ziplist entry.
 * Note that this is not how the data is actually encoded, is just what we
 * get filled by a function in order to operate more easily. */
typedef struct zlentry {
    // number of bytes used to encode prevrawlen
    unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
    // length of the previous entry
    unsigned int prevrawlen;     /* Previous entry len. */
    // number of bytes used to encode len
    unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                    For example strings have a 1, 2 or 5 bytes
                                    header. Integers always use a single byte.*/
    // for a string: the string length
    // for an integer: the number of bytes the integer occupies
    unsigned int len;            /* Bytes used to represent the actual entry.
                                    For strings this is just the string length
                                    while for integers it is 1, 2, 3, 4, 8 or
                                    0 (for 4 bit immediate) depending on the
                                    number range. */
    unsigned int headersize;     /* prevrawlensize + lensize. */
    // integer or string encoding
    unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                    the entry encoding. However for 4 bits
                                    immediate integers this can assume a range
                                    of values and must be range-checked. */
    // pointer to the start of the entry
    unsigned char *p;            /* Pointer to the very start of the entry, that
                                    is, this points to prev-entry-len field. */
} zlentry;

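Note: this is not how an entry is actually stored in the ziplist. It is a helper struct: the encoded bytes of an entry are decoded into it so that the code that operates on entries is easier to write.

In memory, an entry is actually laid out as shown in the figure below: previous_entry_length, followed by encoding, followed by content.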

2.1.1 previous_entry_length

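An entry's previous_entry_length field records the length, in bytes, of the previous entry in the ziplist. The field itself is either 1 byte or 5 bytes long.

If the previous entry is shorter than 254 bytes, previous_entry_length is 1 byte long, and the previous entry's length is stored in that single byte.
If the previous entry is 254 bytes or longer, previous_entry_length is 5 bytes long: its first byte is set to 0xFE (decimal 254), and the following four bytes hold the previous entry's length. (0xFF is not used as the marker because it is reserved for zlend.)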


#define ZIP_BIG_PREVLEN 254 /* ZIP_BIG_PREVLEN - 1 is the max number of bytes of
                               the previous entry, for the "prevlen" field prefixing
                               each entry, to be represented with just a single byte.
                               Otherwise it is represented as FE AA BB CC DD, where
                               AA BB CC DD are a 4 bytes unsigned integer
                               representing the previous entry len. */

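/* Forward declaration: defined below, but called by zipStorePrevEntryLength. */
int zipStorePrevEntryLengthLarge(unsigned char *p, unsigned int len);

/* Encode the length of the previous entry and write it to "p". Return the
 * number of bytes needed to encode this length if "p" is NULL. */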
unsigned int zipStorePrevEntryLength(unsigned char *p, unsigned int len) {
    if (p == NULL) {
        return (len < ZIP_BIG_PREVLEN) ? 1 : sizeof(uint32_t) + 1;
    } else {
        if (len < ZIP_BIG_PREVLEN) {
            p[0] = len;
            return 1;
        } else {
            return zipStorePrevEntryLengthLarge(p,len);
        }
    }
}

/* Encode the length of the previous entry and write it to "p". This only
 * uses the larger encoding (required in __ziplistCascadeUpdate). */
int zipStorePrevEntryLengthLarge(unsigned char *p, unsigned int len) {
    uint32_t u32;
    if (p != NULL) {
        p[0] = ZIP_BIG_PREVLEN;
        u32 = len;
        memcpy(p+1,&u32,sizeof(u32));
        memrev32ifbe(p+1);
    }
    return 1 + sizeof(uint32_t);
}

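Example:

The figure below shows an entry with a 1-byte previous_entry_length field whose value is 0x05, meaning the previous entry is 5 bytes long.

The figure below shows an entry with a 5-byte previous_entry_length field whose value is 0xFE00002766. The leading byte 0xFE indicates that previous_entry_length is 5 bytes long, and the remaining four bytes, 0x00002766 (decimal 10086), are the actual length of the previous entry.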
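Both example values can be reproduced with a small, dependency-free variant of zipStorePrevEntryLength. store_prevlen is a hypothetical name; instead of memcpy + memrev32ifbe it writes the four length bytes explicitly in little-endian order, which is the byte order that the ZIP_DECODE_PREVLEN macro below reads back.

```c
#include <stddef.h>
#include <stdint.h>

#define ZIP_BIG_PREVLEN 254

/* Hypothetical variant of zipStorePrevEntryLength: returns the number of
 * bytes needed (1 or 5) and, if p is not NULL, writes the field to p.
 * The 4-byte length is written little-endian (explicit byte order). */
size_t store_prevlen(unsigned char *p, uint32_t len) {
    if (len < ZIP_BIG_PREVLEN) {
        if (p) p[0] = (unsigned char)len;      /* 1-byte form */
        return 1;
    }
    if (p) {
        p[0] = ZIP_BIG_PREVLEN;                /* 0xFE marker */
        p[1] = (unsigned char)(len & 0xff);
        p[2] = (unsigned char)((len >> 8) & 0xff);
        p[3] = (unsigned char)((len >> 16) & 0xff);
        p[4] = (unsigned char)((len >> 24) & 0xff);
    }
    return 5;
}
```

Storing 5 yields the single byte 05, and storing 10086 yields FE 66 27 00 00. The figure's 0xFE00002766 writes the four-byte length as a number (0x00002766); in memory its least significant byte comes first.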

Decoding the length of the previous entry:


/* Return the number of bytes used to encode the length of the previous
 * entry. The length is returned by setting the var 'prevlensize'. */
#define ZIP_DECODE_PREVLENSIZE(ptr, prevlensize) do {                          \
    if ((ptr)[0] < ZIP_BIG_PREVLEN) {                                          \
        (prevlensize) = 1;                                                     \
    } else {                                                                   \
        (prevlensize) = 5;                                                     \
    }                                                                          \
} while(0)

/* Return the length of the previous element, and the number of bytes that
 * are used in order to encode the previous element length.
 * 'ptr' must point to the prevlen prefix of an entry (that encodes the
 * length of the previous entry in order to navigate the elements backward).
 * The length of the previous entry is stored in 'prevlen', the number of
 * bytes needed to encode the previous entry length are stored in
 * 'prevlensize'. */
#define ZIP_DECODE_PREVLEN(ptr, prevlensize, prevlen) do {                     \
    ZIP_DECODE_PREVLENSIZE(ptr, prevlensize);                                  \
    if ((prevlensize) == 1) {                                                  \
        (prevlen) = (ptr)[0];                                                  \
    } else { /* prevlensize == 5 */                                            \
        (prevlen) = ((ptr)[4] << 24) |                                         \
                    ((ptr)[3] << 16) |                                         \
                    ((ptr)[2] <<  8) |                                         \
                    ((ptr)[1]);                                                \
    }                                                                          \
} while(0)

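Because an entry's previous_entry_length field records the length of the previous entry, a program can compute the start address of the previous entry from the start address of the current entry with simple pointer arithmetic (current address minus previous_entry_length). This is what allows a ziplist to be traversed from tail to head.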
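The backward traversal can be sketched as follows. It assumes a well-formed ziplist with little-endian header fields, and it stops once it reaches the offset of the first entry (the 10-byte header size). read_prevlen and walk_backward are hypothetical helpers; read_prevlen applies the same rule as ZIP_DECODE_PREVLEN.

```c
#include <stddef.h>
#include <stdint.h>

#define ZL_HEADER_SIZE 10     /* zlbytes (4) + zltail (4) + zllen (2) */
#define ZL_END 0xFF
#define ZIP_BIG_PREVLEN 254

/* Little-endian 32-bit read (header fields and the 4-byte prevlen). */
static uint32_t read_u32le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Same rule as ZIP_DECODE_PREVLEN: 1 byte if below 254, else 0xFE + 4 bytes. */
static uint32_t read_prevlen(const unsigned char *entry) {
    return entry[0] < ZIP_BIG_PREVLEN ? entry[0] : read_u32le(entry + 1);
}

/* Hypothetical helper: visit entries from the tail back to the head,
 * storing each entry's offset in out[] (at most max of them).
 * Returns the number of entries visited. */
size_t walk_backward(const unsigned char *zl, size_t *out, size_t max) {
    size_t n = 0;
    uint32_t off = read_u32le(zl + 4);       /* zltail */
    if (zl[off] == ZL_END) return 0;         /* empty ziplist */
    for (;;) {
        if (n < max) out[n] = off;
        n++;
        if (off == ZL_HEADER_SIZE) break;    /* reached the head entry */
        off -= read_prevlen(zl + off);       /* start of the previous entry */
    }
    return n;
}
```

For example, a hand-built 17-byte ziplist holding the immediates 1, 2 and 3 (header 11 00 00 00 0E 00 00 00 03 00, entries 00 F2, 02 F3 and 02 F4, then FF) is visited at offsets 14, 12 and 10, that is, from the tail back to the head.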

2.1.2 encoding

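An entry's encoding field records the type and length of the data held in its content field.

Encodings that are one, two or five bytes long and whose two most significant bits are 00, 01 or 10 are byte-array encodings: content holds a byte array whose length is recorded by the bits that remain after the top two.
Encodings that are one byte long and whose two most significant bits are 11 are integer encodings: content holds an integer whose type and length are recorded by the bits that remain after the top two.

Table 1 below lists all available byte-array encodings, and Table 2 lists all available integer encodings. In the tables, "_" denotes an unused bit, while variables such as b and x denote actual binary data.

Table 1 (byte-array encodings):
00bbbbbb: 1 byte, byte array of at most 63 bytes
01bbbbbb xxxxxxxx: 2 bytes, byte array of at most 16383 bytes
10______ aaaaaaaa bbbbbbbb cccccccc dddddddd: 5 bytes, byte array of at most 4294967295 bytes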

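a. ZIP_INT_16B: the first byte is |11000000|, 3 bytes in total; the following 2 bytes hold a 16-bit integer.

b. ZIP_INT_32B: the first byte is |11010000|, 5 bytes in total; the following 4 bytes hold a 32-bit integer.

c. ZIP_INT_64B: the first byte is |11100000|, 9 bytes in total; the following 8 bytes hold a 64-bit integer.

d. ZIP_INT_24B: the first byte is |11110000|, 4 bytes in total; the following 3 bytes hold a 24-bit integer.

e. ZIP_INT_8B: the first byte is |11111110|, 2 bytes in total; the following byte holds an 8-bit integer.

f. |1111xxxx| encodes a 4-bit integer from 0 to 12, with |xxxx| ranging from |0001| to |1101|. The values |0000|, |1110| and |1111| cannot be used because they are already taken (|11110000| is ZIP_INT_24B, |11111110| is ZIP_INT_8B and |11111111| is the zlend marker). For example, |0001| represents 0, |0010| represents 1, and so on: the value is the 4-bit field minus one.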
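Items a to f can be condensed into a lookup from the encoding byte to the number of content bytes that follow it. int_content_size and imm_value are hypothetical names (Redis's ziplist.c has a comparable internal helper, zipIntSize); the sketch also shows how a 4-bit immediate is decoded by subtracting one from its low four bits.

```c
#include <stddef.h>

#define ZIP_INT_16B (0xc0 | 0<<4)   /* 11000000 */
#define ZIP_INT_32B (0xc0 | 1<<4)   /* 11010000 */
#define ZIP_INT_64B (0xc0 | 2<<4)   /* 11100000 */
#define ZIP_INT_24B (0xc0 | 3<<4)   /* 11110000 */
#define ZIP_INT_8B 0xfe             /* 11111110 */
#define ZIP_INT_IMM_MIN 0xf1        /* 11110001 */
#define ZIP_INT_IMM_MAX 0xfd        /* 11111101 */
#define ZIP_INT_IMM_MASK 0x0f

/* Hypothetical helper: content bytes after an integer encoding byte.
 * Returns 0 for 4-bit immediates and (size_t)-1 for non-integer encodings. */
size_t int_content_size(unsigned char enc) {
    switch (enc) {
    case ZIP_INT_8B:  return 1;
    case ZIP_INT_16B: return 2;
    case ZIP_INT_24B: return 3;
    case ZIP_INT_32B: return 4;
    case ZIP_INT_64B: return 8;
    default:
        if (enc >= ZIP_INT_IMM_MIN && enc <= ZIP_INT_IMM_MAX) return 0;
        return (size_t)-1;
    }
}

/* Value of a 4-bit immediate: low nibble minus one (0001 -> 0, 1101 -> 12). */
int imm_value(unsigned char enc) {
    return (enc & ZIP_INT_IMM_MASK) - 1;
}
```

Adding the encoding byte itself gives the totals listed above, for example 1 + 2 = 3 bytes for ZIP_INT_16B.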


/* Macro to determine if the entry is a string. String entries never start
 * with "11" as most significant bits of the first byte. */
#define ZIP_IS_STR(enc) (((enc) & ZIP_STR_MASK) < ZIP_STR_MASK)

/* 4 bit integer immediate encoding |1111xxxx| with xxxx between
 * 0001 and 1101. */
#define ZIP_INT_IMM_MASK 0x0f   /* Mask to extract the 4 bits value. To add
                                   one is needed to reconstruct the value. */
#define ZIP_INT_IMM_MIN 0xf1    /* 11110001 */
#define ZIP_INT_IMM_MAX 0xfd    /* 11111101 */
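Going the other way, ZIP_IS_STR and the ZIP_STR_* constants are enough to read a byte-array length back out of the encoding field. decode_str_len is a hypothetical helper meant to follow the same rules as the string branch of Redis's ZIP_DECODE_LENGTH macro; it assumes, as Redis does, that the 14-bit and 32-bit lengths are big-endian, unlike the little-endian previous_entry_length and integer content.

```c
#include <stddef.h>
#include <stdint.h>

#define ZIP_STR_MASK 0xc0
#define ZIP_STR_06B (0 << 6)
#define ZIP_STR_14B (1 << 6)
#define ZIP_STR_32B (2 << 6)
#define ZIP_IS_STR(enc) (((enc) & ZIP_STR_MASK) < ZIP_STR_MASK)

/* Hypothetical helper: p points at a string entry's encoding field.
 * Stores the byte-array length in *len and returns the field size
 * (1, 2 or 5), or 0 if p does not start a string encoding. */
size_t decode_str_len(const unsigned char *p, uint32_t *len) {
    if (!ZIP_IS_STR(p[0])) return 0;
    switch (p[0] & ZIP_STR_MASK) {
    case ZIP_STR_06B:                                 /* 00pppppp */
        *len = p[0] & 0x3f;
        return 1;
    case ZIP_STR_14B:                                 /* 01pppppp qqqqqqqq */
        *len = ((uint32_t)(p[0] & 0x3f) << 8) | p[1];
        return 2;
    default:                                          /* ZIP_STR_32B */
        *len = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
               ((uint32_t)p[3] << 8) | p[4];
        return 5;
    }
}
```

An integer encoding such as 0xC0 is rejected by ZIP_IS_STR, because its top two bits are 11.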

2.1.3 content

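An entry's content field holds the entry's value, which can be a byte array or an integer; the value's type and length are determined by the entry's encoding field.

The figure below shows an example entry holding a byte array:

The top two bits of the encoding, 00, indicate that the entry holds a byte array.
The remaining six bits, 001011, indicate that the byte array is 11 bytes long.
The content field holds the value "hello world".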
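The "hello world" entry can be assembled byte by byte. The sketch assumes the entry's predecessor is shorter than 254 bytes and that the string is at most 63 bytes, so both header fields fit in one byte each; the figure does not give the previous_entry_length value, so the test uses 0, as for a head entry. build_short_str_entry is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: serialize a byte-array entry of at most 63 bytes
 * whose predecessor is shorter than 254 bytes. Returns the total entry
 * size, or 0 if those assumptions do not hold. */
size_t build_short_str_entry(unsigned char *out, unsigned char prevlen,
                             const char *s) {
    size_t len = strlen(s);
    if (len > 63 || prevlen >= 254) return 0;
    out[0] = prevlen;               /* previous_entry_length (1 byte) */
    out[1] = (unsigned char)len;    /* encoding 00xxxxxx: ZIP_STR_06B | len */
    memcpy(out + 2, s, len);        /* content, stored without a NUL */
    return 2 + len;
}
```

The result is 13 bytes: 00 0B followed by the 11 bytes of "hello world".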

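The figure below shows an example entry holding an integer:

The top two bits of the encoding, 11, indicate that the entry holds an integer.
The remaining six bits, 000000, indicate that the integer is of type int16_t.
The content field holds the value 10086.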
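The integer entry can be built the same way. build_int16_entry and read_int16_entry are hypothetical; the sketch assumes a 1-byte previous_entry_length and writes the value little-endian, which matches how Redis stores integer content (memcpy followed by memrev16ifbe).

```c
#include <stddef.h>
#include <stdint.h>

#define ZIP_INT_16B (0xc0 | 0<<4)   /* 11000000 */

/* Hypothetical helper: serialize an int16_t entry whose predecessor is
 * shorter than 254 bytes. Returns the entry size (always 4). */
size_t build_int16_entry(unsigned char *out, unsigned char prevlen, int16_t v) {
    uint16_t u = (uint16_t)v;
    out[0] = prevlen;                     /* previous_entry_length */
    out[1] = ZIP_INT_16B;                 /* encoding */
    out[2] = (unsigned char)(u & 0xff);   /* content, little-endian */
    out[3] = (unsigned char)(u >> 8);
    return 4;
}

/* Read the int16_t content back from such an entry. */
int16_t read_int16_entry(const unsigned char *entry) {
    return (int16_t)(uint16_t)(entry[2] | (entry[3] << 8));
}
```

For 10086 (0x2766) the entry is 00 C0 66 27.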

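3. Cascading Updates

3.1 Cascading Updates Caused by Adding an Entry

An entry's previous_entry_length field stores the number of bytes occupied by the previous entry.

If the previous entry occupies fewer than 254 bytes, previous_entry_length is 1 byte long.
If the previous entry occupies 254 bytes or more, previous_entry_length is 5 bytes long.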

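Now suppose a ziplist contains several consecutive entries e1 through eN, each between 250 and 253 bytes long, as shown in the figure below:

Because all of these entries are shorter than 254 bytes, each of their previous_entry_length fields needs only 1 byte. In other words, the previous_entry_length fields of e1 through eN are all 1 byte long.

Now suppose we insert a new entry, new, that is 254 bytes or longer as the head of the ziplist, so that new becomes the predecessor of e1.

Because e1's 1-byte previous_entry_length field cannot hold the length of new, the program reallocates the ziplist's memory and expands e1's previous_entry_length field from 1 byte to 5 bytes.

e1 now grows from between 250 and 253 bytes to between 254 and 257 bytes, a length that a 1-byte previous_entry_length field cannot hold.

So, for e2's previous_entry_length to hold e1's length, e2 must also be reallocated, and its previous_entry_length expanded from 1 byte to 5 bytes. This sets off a chain reaction: expanding e2 forces e3 to expand, expanding e3 forces e4 to expand, and so on. To keep every entry's previous_entry_length consistent with the ziplist's rules, the program has to reallocate the ziplist repeatedly, all the way to eN.

Redis calls these repeated expansions, triggered in this special situation, a "cascading update".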
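The chain reaction can be reproduced with a size-only simulation. It models each entry by its total size (including its own previous_entry_length field), treats the 1 to 5 byte growth of that field as the only change, and assumes, as Redis does, that a field that is already 5 bytes is never shrunk back. simulate_insert_at_head is a hypothetical function; no real memory is reallocated.

```c
#include <stddef.h>

/* previous_entry_length needs 1 byte below 254, otherwise 5 bytes. */
static size_t prevlen_field_size(size_t prev_entry_size) {
    return prev_entry_size < 254 ? 1 : 5;
}

/* Hypothetical size-only model: an entry of new_size bytes is inserted in
 * front of e1..eN (sizes[0..n-1], total entry sizes). e1 was the head, so
 * its field starts at 1 byte. Returns how many entries were expanded. */
size_t simulate_insert_at_head(size_t *sizes, size_t n, size_t new_size) {
    size_t expanded = 0;
    size_t old_prev = 0;          /* the old head had no predecessor */
    size_t new_prev = new_size;   /* entry that now precedes sizes[i] */
    for (size_t i = 0; i < n; i++) {
        size_t have = prevlen_field_size(old_prev);  /* current field size */
        size_t need = prevlen_field_size(new_prev);  /* required field size */
        old_prev = sizes[i];      /* pre-update size, seen by entry i+1 */
        if (need <= have) break;  /* field is big enough: cascade stops */
        sizes[i] += need - have;  /* 1 -> 5 bytes: the entry grows by 4 */
        new_prev = sizes[i];
        expanded++;
    }
    return expanded;
}
```

Inserting a 300-byte entry in front of entries of 250, 251, 252 and 253 bytes expands all four (to 254 through 257 bytes), whereas in a run interrupted by a 100-byte entry the cascade stops right after that entry, since its new size of 104 bytes still fits in a 1-byte field.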

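3.2 Cascading Updates Caused by Deleting an Entry

Besides adding entries, deleting an entry can also trigger a cascading update.

Suppose again that there is a run of entries e1 through eN, each between 250 and 253 bytes long. e1's predecessor, small, is shorter than 254 bytes, so e1 needs only a 1-byte previous_entry_length to store small's length. small's own predecessor, big, is 254 bytes or longer, so small needs a 5-byte previous_entry_length to store big's length; small itself, however, is still shorter than 254 bytes.

When small is deleted, big becomes e1's predecessor. Since big is at least 254 bytes long, the program must expand e1's previous_entry_length from 1 byte to 5 bytes, and e1 grows from between 250 and 253 bytes to between 254 and 257 bytes.

Expanding e1 forces e2 to expand, expanding e2 forces e3 to expand, and so on, producing the same kind of cascading update.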
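Deletion can be modelled the same way, because the only thing that changes for e1 is the size of its predecessor (small before, big after). simulate_predecessor_change is a hypothetical size-only model; it assumes the existing fields are consistent with the old predecessor and, as in Redis, that a 5-byte field is never shrunk back to 1 byte.

```c
#include <stddef.h>

/* previous_entry_length needs 1 byte below 254, otherwise 5 bytes. */
static size_t prevlen_field_size(size_t prev_entry_size) {
    return prev_entry_size < 254 ? 1 : 5;
}

/* Hypothetical size-only model: sizes[0..n-1] are e1..eN (total entry
 * sizes). Their predecessor used to be old_prev bytes (small) and, after
 * the deletion, is new_prev bytes (big). Returns how many entries had to
 * grow their previous_entry_length field from 1 to 5 bytes. */
size_t simulate_predecessor_change(size_t *sizes, size_t n,
                                   size_t old_prev, size_t new_prev) {
    size_t expanded = 0;
    for (size_t i = 0; i < n; i++) {
        size_t have = prevlen_field_size(old_prev);  /* current field size */
        size_t need = prevlen_field_size(new_prev);  /* required field size */
        old_prev = sizes[i];      /* pre-update size, seen by entry i+1 */
        if (need <= have) break;  /* fits (never shrink): cascade stops */
        sizes[i] += need - have;  /* field grows 1 -> 5, entry grows by 4 */
        new_prev = sizes[i];
        expanded++;
    }
    return expanded;
}
```

With small = 100 bytes, big = 300 bytes and e1..e4 of 250 to 253 bytes, all four entries expand; if big were only 200 bytes, nothing would change.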

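3.3 Time Complexity of Cascading Updates

In the worst case, a cascading update performs N space reallocations on the ziplist, and each reallocation costs O(N) in the worst case, so the worst-case time complexity of a cascading update is O(N^2).

However, although the worst-case complexity of cascading updates is high, the chance that they actually cause performance problems is low:

First, a cascading update is only triggered when the ziplist contains several consecutive entries whose lengths are all between 250 and 253 bytes, which is uncommon in practice.
Second, even when a cascading update does occur, it will not noticeably affect performance as long as only a few entries need to be updated.

For these reasons, the average time complexity of functions such as ziplistPush is only O(N). In practice these functions can be used safely without worrying that cascading updates will hurt ziplist performance.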

4. Ziplist API

Because ziplistPush, ziplistInsert, ziplistDelete and ziplistDeleteRange can all trigger cascading updates, their worst-case time complexity is O(N^2).