Redis Series: An Introduction to Eviction Policies
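Why does Redis need an eviction policy?
Redis has a finite amount of memory. When memory is nearly full and no keys have expired, it fills up completely, and new data can no longer be written. An eviction policy is what keeps Redis usable in that situation.
Eviction policy categories
Redis provides several eviction policies. The official documentation at
https://redis.io/docs/latest/operate/rs/databases/memory-performance/eviction-policy/ lists the following: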
Eviction Policy | Description | Notes |
---|---|---|
noeviction | New values aren't saved when memory limit is reached. When a database uses replication, this applies to the primary database | Default policy; nothing is evicted, so reads succeed but writes fail |
allkeys-lru | Keeps most recently used keys; removes least recently used (LRU) keys | Approximated LRU over all keys |
allkeys-lfu | Keeps frequently used keys; removes least frequently used (LFU) keys | Approximated LFU over all keys |
allkeys-random | Randomly removes keys | Random choice over all keys |
volatile-lru | Removes least recently used keys with expire field set to true | Approximated LRU over keys that have an expiration set |
volatile-lfu | Removes least frequently used keys with expire field set to true | Approximated LFU over keys that have an expiration set |
volatile-random | Randomly removes keys with expire field set to true | Random choice over keys that have an expiration set |
volatile-ttl | Removes least frequently used keys with expire field set to true and the shortest remaining time-to-live (TTL) value | Evicts the keys closest to expiring |
Redis therefore offers eight different policies. Setting maxmemory-policy in the configuration file selects the one to use.
shell
# Default policy: nothing is evicted; reads succeed, writes fail once memory is full
maxmemory-policy noeviction
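The Redis eviction flow
The flow:
- First, there is an eviction pool with a default size of 16. Entries are evicted from its tail.
- Before each command runs, Redis checks whether the current memory is enough for the command. If it is, the command proceeds.
- If memory is not enough, Redis checks whether the policy is `noeviction`. If it is (the default), an OOM error is returned to the user: reads still work, writes do not. Otherwise Redis takes the best eviction candidate from the tail of the pool:
  - Sample: pick a random sample of keys from Redis instead of reading all the data at once.
  - Within the sample, use the eviction algorithm to find the best candidates.
  - Compare each candidate with the entries already in the pool, and add it only if it is a better candidate than what the pool holds.
  - Keep the pool sorted by suitability, with the best candidate at the tail.
  - Delete the chosen key from Redis and remove it from the pool.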
Verifying the flow in the source
Every command execution goes through the `freeMemoryIfNeeded` function (in evict.c):
cpp
/* This function is periodically called to see if there is memory to free
* according to the current "maxmemory" settings. In case we are over the
* memory limit, the function will try to free some memory to return back
* under the limit.
*
* The function returns C_OK if we are under the memory limit or if we
* were over the limit, but the attempt to free memory was successful.
* Otherwise if we are over the memory limit, but not enough memory
* was freed to return back under the limit, the function returns C_ERR. */
int freeMemoryIfNeeded(void) {
int keys_freed = 0;
/* By default replicas should ignore maxmemory
* and just be masters exact copies. */
/* A replica (server.masterhost is set) can be configured to ignore maxmemory. */
if (server.masterhost && server.repl_slave_ignore_maxmemory) return C_OK;
size_t mem_reported, mem_tofree, mem_freed;
mstime_t latency, eviction_latency, lazyfree_latency;
long long delta;
int slaves = listLength(server.slaves);
int result = C_ERR;
/* When clients are paused the dataset should be static not just from the
* POV of clients not being able to write, but also from the POV of
* expires and evictions of keys not being performed. */
if (clientsArePaused()) return C_OK;
/* Return immediately if memory usage is within the limit. */
if (getMaxmemoryState(&mem_reported,NULL,&mem_tofree,NULL) == C_OK)
return C_OK;
mem_freed = 0;
latencyStartMonitor(latency);
/* With noeviction nothing is evicted; the caller reports an OOM error. */
if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION)
goto cant_free; /* We need to free memory, but policy forbids. */
/* Keep evicting until enough memory has been freed. */
while (mem_freed < mem_tofree) {
int j, k, i;
static unsigned int next_db = 0;
sds bestkey = NULL; // best key to evict
int bestdbid;
redisDb *db;
dict *dict;
dictEntry *de;
if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
{ // the policy is LRU, LFU, or volatile-ttl
struct evictionPoolEntry *pool = EvictionPoolLRU; // eviction pool, 16 entries by default
// loop until a key to evict is found
while(bestkey == NULL) {
unsigned long total_keys = 0, keys;
/* We don't want to make local-db choices when expiring keys,
* so to start populate the eviction pool sampling keys from
* every DB. */
/* Sample from every DB. */
for (i = 0; i < server.dbnum; i++) {
db = server.db+i;
dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
db->dict : db->expires; // candidate set: all keys, or only keys that have an expiration set
if ((keys = dictSize(dict)) != 0) {
evictionPoolPopulate(i, dict, db->dict, pool);// key step: sample the candidate set and put the best candidates into the pool
total_keys += keys;
}
}
if (!total_keys) break; /* No keys to evict. */
/* Go backward from best to worst element to evict. */
for (k = EVPOOL_SIZE-1; k >= 0; k--) {
if (pool[k].key == NULL) continue;
bestdbid = pool[k].dbid;
if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
de = dictFind(server.db[pool[k].dbid].dict,
pool[k].key);
} else {
de = dictFind(server.db[pool[k].dbid].expires,
pool[k].key);
}
/* Remove the entry from the pool. */
if (pool[k].key != pool[k].cached)
sdsfree(pool[k].key);
pool[k].key = NULL;
pool[k].idle = 0;
/* If the key exists, is our pick. Otherwise it is
* a ghost and we need to try the next element. */
if (de) {
bestkey = dictGetKey(de);
break;
} else {
/* Ghost... Iterate again. */
}
}
}
}
/* volatile-random and allkeys-random policy */
else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
{
/* When evicting a random key, we try to evict a key for
* each DB, so we use the static 'next_db' variable to
* incrementally visit all DBs. */
for (i = 0; i < server.dbnum; i++) {
j = (++next_db) % server.dbnum;
db = server.db+j;
dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
db->dict : db->expires;
if (dictSize(dict) != 0) {
de = dictGetRandomKey(dict);
bestkey = dictGetKey(de);
bestdbid = j;
break;
}
}
}
/* Finally remove the selected key. */
if (bestkey) {
db = server.db+bestdbid;
robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
/* We compute the amount of memory freed by db*Delete() alone.
* It is possible that actually the memory needed to propagate
* the DEL in AOF and replication link is greater than the one
* we are freeing removing the key, but we can't account for
* that otherwise we would never exit the loop.
*
* Same for CSC invalidation messages generated by signalModifiedKey.
*
* AOF and Output buffer memory will be freed eventually so
* we only care about memory used by the key space. */
delta = (long long) zmalloc_used_memory();
latencyStartMonitor(eviction_latency);
/* Delete asynchronously when lazy eviction is enabled. */
if (server.lazyfree_lazy_eviction)
dbAsyncDelete(db,keyobj);// asynchronous (lazy) delete
else
dbSyncDelete(db,keyobj); // synchronous delete
latencyEndMonitor(eviction_latency);
latencyAddSampleIfNeeded("eviction-del",eviction_latency);
delta -= (long long) zmalloc_used_memory();
mem_freed += delta;
server.stat_evictedkeys++;
signalModifiedKey(NULL,db,keyobj);
notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
keyobj, db->id);
decrRefCount(keyobj);
keys_freed++;
/* When the memory to free starts to be big enough, we may
* start spending so much time here that is impossible to
* deliver data to the slaves fast enough, so we force the
* transmission here inside the loop. */
if (slaves) flushSlavesOutputBuffers();
/* Normally our stop condition is the ability to release
* a fixed, pre-computed amount of memory. However when we
* are deleting objects in another thread, it's better to
* check, from time to time, if we already reached our target
* memory, since the "mem_freed" amount is computed only
* across the dbAsyncDelete() call, while the thread can
* release the memory all the time. */
if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
/* Let's satisfy our stop condition. */
mem_freed = mem_tofree;
}
}
} else {
goto cant_free; /* nothing to free... */
}
}
result = C_OK;
cant_free:
/* We are here if we are not able to reclaim memory. There is only one
* last thing we can try: check if the lazyfree thread has jobs in queue
* and wait... */
if (result != C_OK) {
latencyStartMonitor(lazyfree_latency);
while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
result = C_OK;
break;
}
usleep(1000);
}
latencyEndMonitor(lazyfree_latency);
latencyAddSampleIfNeeded("eviction-lazyfree",lazyfree_latency);
}
latencyEndMonitor(latency);
latencyAddSampleIfNeeded("eviction-cycle",latency);
return result;
}
The `evictionPoolPopulate` function (in evict.c):
cpp
/* This is an helper function for freeMemoryIfNeeded(), it is used in order
* to populate the evictionPool with a few entries every time we want to
* expire a key. Keys with idle time smaller than one of the current
* keys are added. Keys are always added if there are free entries.
*
* We insert keys on place in ascending order, so keys with the smaller
* idle time are on the left, and keys with the higher idle time on the
* right. */
void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
int j, k, count;
// buffer for the sampled entries
dictEntry *samples[server.maxmemory_samples];
// randomly sample up to maxmemory_samples entries from the candidate dict
count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
// score each sampled entry
for (j = 0; j < count; j++) {
unsigned long long idle;
sds key;
robj *o;
dictEntry *de;
de = samples[j];
key = dictGetKey(de);
/* If the dictionary we are sampling from is not the main
* dictionary (but the expires one) we need to lookup the key
* again in the key dictionary to obtain the value object. */
if (server.maxmemory_policy != MAXMEMORY_VOLATILE_TTL) { // volatile-ttl scores by expire time alone, so it does not need the value object; every other policy looks it up in the key dict
if (sampledict != keydict) de = dictFind(keydict, key);
o = dictGetVal(de);
}
/* Calculate the idle time according to the policy. This is called
* idle just because the code initially handled LRU, but is in fact
* just a score where an higher score means better candidate. */
if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) { // LRU: the score is the estimated idle time, so the longest-idle key scores highest
idle = estimateObjectIdleTime(o);
} else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) { // LFU: the score is 255 - LFUDecrAndReturn(o), so rarely used keys get a larger idle and are evicted sooner
/* When we use an LRU policy, we sort the keys by idle time
* so that we expire keys starting from greater idle time.
* However when the policy is an LFU one, we have a frequency
* estimation, and we want to evict keys with lower frequency
* first. So inside the pool we put objects using the inverted
* frequency subtracting the actual frequency to the maximum
* frequency of 255. */
idle = 255-LFUDecrAndReturn(o);
} else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) { // volatile-ttl: score directly by expire time
/* In this case the sooner the expire the better. */
idle = ULLONG_MAX - (long)dictGetVal(de);
} else {
serverPanic("Unknown eviction policy in evictionPoolPopulate()");
}
/* Insert the element inside the pool.
* First, find the first empty bucket or the first populated
* bucket that has an idle time smaller than our idle time. */
/* Once the sample's idle score is computed, place it in the eviction pool. */
k = 0;
while (k < EVPOOL_SIZE &&
pool[k].key &&
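pool[k].idle < idle) k++; // advance to the first empty slot or the first entry whose idle is not smaller than ours
// k == 0 means no pool entry has a smaller idle than this key; if the last slot is also occupied, the pool is full, so this key is skipped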
if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
/* Can't insert if the element is < the worst element we have
* and there are no empty buckets. */
continue;
} else if (k < EVPOOL_SIZE && pool[k].key == NULL) { // slot k is empty: insert there
/* Inserting into empty position. No setup needed before insert. */
} else { // insert in the middle, which shifts entries in the pool
/* Inserting in the middle. Now k points to the first element
* greater than the element to insert. */
if (pool[EVPOOL_SIZE-1].key == NULL) {
/* Free space on the right? Insert at k shifting
* all the elements from k to end to the right. */
/* Save SDS before overwriting. */
sds cached = pool[EVPOOL_SIZE-1].cached;
memmove(pool+k+1,pool+k,
sizeof(pool[0])*(EVPOOL_SIZE-k-1));
// there was a free slot on the right: entries from k onward moved right, nothing is discarded
pool[k].cached = cached;
} else {
/* No free space on right? Insert at k-1 */
k--;
/* Shift all elements on the left of k (included) to the
* left, so we discard the element with smaller idle time. */
sds cached = pool[0].cached; /* Save SDS before overwriting. */
if (pool[0].key != pool[0].cached) sdsfree(pool[0].key);
memmove(pool,pool+1,sizeof(pool[0])*k);
pool[k].cached = cached;
}
}
/* Try to reuse the cached SDS string allocated in the pool entry,
* because allocating and deallocating this object is costly
* (according to the profiler, not my fantasy. Remember:
* premature optimization bla bla bla. */
/* Store the current key in the eviction pool. */
int klen = sdslen(key);
if (klen > EVPOOL_CACHED_SDS_SIZE) {
pool[k].key = sdsdup(key);
} else {
memcpy(pool[k].cached,key,klen+1);
sdssetlen(pool[k].cached,klen);
pool[k].key = pool[k].cached;
}
pool[k].idle = idle;
pool[k].dbid = dbid;
}
}
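The insertion rule in evictionPoolPopulate is easier to follow with the SDS caching stripped away. The following is a minimal sketch, not the Redis code: `pool_entry`, `pool_insert`, and `POOL_SIZE` are hypothetical names, entries hold only a score, and the pool has 4 slots instead of 16. It keeps the same cases: skip the candidate, use an empty slot, or shift entries to make room.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 4 /* hypothetical; Redis uses EVPOOL_SIZE = 16 */

/* Hypothetical, simplified pool entry: a score plus an "in use" flag. */
struct pool_entry {
    unsigned long long idle; /* higher score = better eviction candidate */
    int used;
};

/* Insert `idle` while keeping the pool sorted in ascending order.
 * Returns the slot used, or -1 when the pool is full and the candidate
 * is not better than any entry already in it. */
int pool_insert(struct pool_entry *pool, unsigned long long idle) {
    assert(pool != NULL);
    int k = 0;
    while (k < POOL_SIZE && pool[k].used && pool[k].idle < idle) k++;

    if (k == 0 && pool[POOL_SIZE - 1].used) {
        return -1; /* full, and the candidate is the worst: skip it */
    } else if (k < POOL_SIZE && !pool[k].used) {
        /* empty slot: nothing to shift */
    } else if (!pool[POOL_SIZE - 1].used) {
        /* free slot on the right: shift entries from k onward right by one */
        memmove(pool + k + 1, pool + k, sizeof(pool[0]) * (POOL_SIZE - k - 1));
    } else {
        /* pool full: drop the smallest entry (slot 0) and insert at k-1 */
        k--;
        memmove(pool, pool + 1, sizeof(pool[0]) * k);
    }
    pool[k].idle = idle;
    pool[k].used = 1;
    return k;
}
```

Inserting 10, 30, 20, 40 into an empty pool leaves it sorted as 10, 20, 30, 40. A later candidate of 5 is rejected, while 25 pushes out 10 and the pool becomes 20, 25, 30, 40.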
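After this quick pass through the source, we have a working understanding of how Redis evicts data, and we know there are eight eviction policies. The rest of this article covers the two most important algorithms in Redis: LRU and LFU.
The LRU algorithm in Redis
LRU stands for Least Recently Used. It evicts data by access time: the longer a key has gone unused, the more likely it is to be evicted.
- How it works
  - LRU evicts objects by their access time, so we first need to know when each object was last accessed.
  - With the last access time known, we compare it with the current system time to compute how long the object has been idle.
- Source verification
In the Redis source there is a redisObject struct, the outward-facing wrapper for every Redis data structure. It has a field named lru.
The redisObject struct (server.h):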
cpp
typedef struct redisObject {
unsigned type:4;
unsigned encoding:4;
unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
* LFU data (least significant 8 bits frequency
* and most significant 16 bits access time). */
int refcount;
void *ptr;
} robj;
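From the comment we can guess that the LRU eviction algorithm relies on this lru field. The field is 24 bits wide and stores the low 24 bits of the access time, measured in seconds. How do we get the low 24 bits of a timestamp in seconds? Here is an example: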
java
long currentTimeMillis = System.currentTimeMillis();
System.out.println(currentTimeMillis/1000); // current time in seconds
System.out.println(currentTimeMillis/1000 & ((1<<24)-1));// low 24 bits of the seconds value
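Running this prints two decimal numbers. Converting them with a binary converter gives:
- 1715915460 is 1100110010001101100101011000100 in binary
- 4639428 is 10001101100101011000100 in binary
Comparing the two confirms that the second value is the low 24 bits of the first (its leading zero is not printed, so only 23 digits appear).
Why does `currentTimeMillis/1000 & ((1<<24)-1)` yield the low 24 bits of the current time? A bitwise AND between a number and 24 one-bits keeps exactly the low 24 bits of that number, as the figure illustrates.
And how do we get 24 one-bits? Attentive readers may already know: `(1<<24)-1`, that is, shift 1 left by 24 bits and subtract 1, as the figure illustrates.
If binary arithmetic is unfamiliar, an introduction to binary operations is a useful reference.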
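The same masking can be written in C, the language of the Redis source. This is a small sketch with a hypothetical helper, `low24_of_seconds`; the `LRU_BITS` and `LRU_CLOCK_MAX` definitions are assumed to match the ones in server.h.

```c
#include <assert.h>
#include <stdint.h>

#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1u << LRU_BITS) - 1) /* 24 one-bits */

/* Compile-time check that the mask really is 24 one-bits (0xFFFFFF). */
static_assert(LRU_CLOCK_MAX == 0xFFFFFFu, "LRU_CLOCK_MAX must be 24 one-bits");

/* Hypothetical helper: keep only the low 24 bits of a timestamp in seconds. */
uint32_t low24_of_seconds(uint64_t unix_seconds) {
    return (uint32_t)(unix_seconds & LRU_CLOCK_MAX);
}
```

With the timestamp from the example above, `low24_of_seconds(1715915460)` yields 4639428.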
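An everyday analogy:
Scenario 1: the data was accessed in May and it is now August. 8 - 5 = 3, so the object has not been accessed for 3 months.
Scenario 2: the data was accessed in May and it is now March of the following year. 3 + 12 - 5 = 10, so the object has not been accessed for 10 months.
By the same logic:
If redisObject.lru <= lruclock, the idle time is simply lruclock - redisObject.lru.
If redisObject.lru > lruclock, the clock has wrapped around, so the idle time is lruclock + (24-bit maximum - redisObject.lru).
The Redis source follows much the same idea. See the estimateObjectIdleTime function (evict.c):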
cpp
/* Given an object returns the min number of milliseconds the object was never
* requested, using an approximated LRU algorithm. */
unsigned long long estimateObjectIdleTime(robj *o) {
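// low 24 bits of the current time in seconds
unsigned long long lruclock = LRU_CLOCK();
// the clock has only 24 bits, so its maximum value is 2^24 - 1
// past the maximum it restarts from 0, so lruclock (the current time) must be compared with the object's lru field
if (lruclock >= o->lru) {
// lruclock >= o->lru: the idle time is lruclock - o->lru, converted to milliseconds
return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION;
} else {
// otherwise the clock wrapped: lruclock + (LRU_CLOCK_MAX - o->lru); the smaller o->lru is, the larger the result, and larger results are evicted sooner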
return (lruclock + (LRU_CLOCK_MAX - o->lru)) *
LRU_CLOCK_RESOLUTION;
}
}
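To try the wraparound rule with concrete numbers, here is a standalone sketch that mirrors the function above. `idle_ms` is a hypothetical name, and both clock values are passed in as plain integers instead of being read from the server; the three constants are assumed to match server.h.

```c
#include <assert.h>
#include <stdint.h>

#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1u << LRU_BITS) - 1)
#define LRU_CLOCK_RESOLUTION 1000 /* milliseconds per clock tick */

/* Hypothetical helper: idle time in milliseconds, given the current 24-bit
 * clock value and the 24-bit stamp stored in the object. */
uint64_t idle_ms(uint32_t now, uint32_t stamp) {
    assert(now <= LRU_CLOCK_MAX && stamp <= LRU_CLOCK_MAX);
    if (now >= stamp) {
        /* no wraparound: plain difference */
        return (uint64_t)(now - stamp) * LRU_CLOCK_RESOLUTION;
    }
    /* the clock wrapped after the object was stamped */
    return ((uint64_t)now + (LRU_CLOCK_MAX - stamp)) * LRU_CLOCK_RESOLUTION;
}
```

`idle_ms(100, 40)` gives 60000 ms. With a wrapped clock, `idle_ms(5, LRU_CLOCK_MAX - 10)` gives 15000 ms, the same shape as the "March minus May" scenario.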
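The LFU algorithm in Redis
LFU stands for Least Frequently Used: the least frequently used keys are evicted first. "Frequently" is measured by access count, so the fewer accesses a key has, the more likely it is to be evicted.
- The recency problem in LFU
LFU has one problem to consider: recency. When counting accesses, the count alone is not enough; the time of those accesses matters too.
For example, suppose a news story went viral last year and collected 30 million clicks, and a story published this year already has 10 million. We would want to show this year's story. Last year's story was popular, but it is old, and a recommendation system should not keep it trending. The system therefore has to weigh time along with the count.
If LFU evicted purely by access count, this year's story would easily be evicted: new data cannot get in and old data never leaves. Redis's LFU takes this into account. How is it implemented?
- Source analysis
Look again at the redisObject struct in server.h. Its comment says that under LFU the high 16 bits of the lru field hold a time and the low 8 bits hold a frequency value, which presumably reflects how often the object is accessed. We will call it the counter.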
cpp
typedef struct redisObject {
unsigned type:4;
unsigned encoding:4;
unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
* LFU data (least significant 8 bits frequency
* and most significant 16 bits access time). */
int refcount;
void *ptr;
} robj;
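What are the high 16 bits of time for? Being time-related, they are presumably about recency. A reasonable guess: the time records how long the object has gone without access, and the counter is reduced accordingly.
The `LFUDecrAndReturn` function in evict.c confirms this: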
cpp
/* If the object decrement time is reached decrement the LFU counter but
* do not update LFU fields of the object, we update the access time
* and counter in an explicit way when the object is really accessed.
* And we will times halve the counter according to the times of
* elapsed time than server.lfu_decay_time.
* Return the object frequency counter.
*
* This function is used in order to scan the dataset for the best object
* to fit: as we check for the candidate, we incrementally decrement the
* counter of the scanned objects if needed. */
unsigned long LFUDecrAndReturn(robj *o) {
// shift the lru field right by 8 bits to get the high 16 bits (the time)
unsigned long ldt = o->lru >> 8;
// AND the lru field with 255 (eight one-bits, the 8-bit maximum) to get the 8-bit counter
unsigned long counter = o->lru & 255;
// if lfu_decay_time is set, divide the idle minutes LFUTimeElapsed(ldt) by it to get the number of decay periods; the counter loses 1 per period
unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
if (num_periods)
// never go below zero
counter = (num_periods > counter) ? 0 : counter - num_periods;
return counter;
}
Redis configuration:
# decrement the counter by 1 for every this many minutes without access
lfu-decay-time 1
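The decay rule can be exercised in isolation. The sketch below is a simplified rewrite, not the Redis function: `lfu_decayed_counter` is a hypothetical name, the current time in minutes and the decay setting are passed as parameters, and the elapsed-time calculation assumes the 16-bit minute clock wraps at 65535.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the decayed counter for a packed 24-bit LFU
 * field (high 16 bits = last decrement time in minutes, low 8 bits = counter). */
uint8_t lfu_decayed_counter(uint32_t lru_field, uint16_t now_minutes,
                            unsigned decay_time) {
    assert(lru_field <= 0xFFFFFFu);
    uint16_t ldt = (uint16_t)(lru_field >> 8);    /* stored time */
    uint8_t counter = (uint8_t)(lru_field & 255); /* stored counter */

    /* Minutes since the stored time, allowing for 16-bit wraparound. */
    uint32_t elapsed = (now_minutes >= ldt)
                           ? (uint32_t)(now_minutes - ldt)
                           : 65535u - ldt + now_minutes;

    /* One decrement per elapsed decay period; 0 disables decay. */
    uint32_t periods = decay_time ? elapsed / decay_time : 0;
    return (periods > counter) ? 0 : (uint8_t)(counter - periods);
}
```

With a counter of 10 stamped at minute 100, reading it at minute 103 gives 7 when the decay time is 1, 9 when it is 2, and 10 when decay is disabled with 0.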
The counter has 8 bits, so its maximum is 255. See the `LFULogIncr` function in evict.c:
cpp
/* Logarithmically increment a counter. The greater is the current counter value
* the less likely is that it gets really implemented. Saturate it at 255. */
uint8_t LFULogIncr(uint8_t counter) {
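// already at 255, the 8-bit maximum: stay there
if (counter == 255) return 255;
// random number between 0 and 1
double r = (double)rand()/RAND_MAX;
// LFU_INIT_VAL is the base value, 5 by default, defined in server.h
double baseval = counter - LFU_INIT_VAL;
// if the counter is not above the base value, baseval is 0 and p = 1, so the increment is all but certain
if (baseval < 0) baseval = 0;
// otherwise the increment happens with probability p; baseval and lfu_log_factor are both in the denominator, so the larger either is, the smaller p and the less likely counter++
double p = 1.0/(baseval*server.lfu_log_factor+1);
if (r < p) counter++;// the smaller p is, the less likely the increment, and vice versa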
return counter;
}
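The LFU implementation logic can therefore be summarized as follows:
- Once `counter` reaches the maximum of 255 it is no longer incremented. Because reaching 255 is unlikely, the 8-bit counter can represent a very large number of accesses.
- `counter` is incremented at random. The probability depends on the current `counter` value and the configured `lfu-log-factor`: the larger the current `counter`, the lower the probability, and the larger `lfu-log-factor`, the lower the probability as well.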
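The benchmark data on the Redis website (shown in the figure) makes this concrete. The `factor` column there is the configured lfu_log_factor: the larger the value, the more hits are needed to reach the maximum of 255.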
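The effect of the factor can be read straight off the probability formula in LFULogIncr. The following is a minimal sketch with a hypothetical helper, `lfu_incr_probability`, which takes the factor as a parameter instead of reading server.lfu_log_factor and returns the chance that a single access increments the counter.

```c
#include <assert.h>

#define LFU_INIT_VAL 5 /* assumed to match the default in server.h */

/* Hypothetical helper: probability that one access increments a counter
 * that is still below the 255 ceiling. */
double lfu_incr_probability(unsigned counter, int log_factor) {
    assert(counter < 255); /* at 255 the counter is saturated */
    double baseval = (double)counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0; /* at or below the base value: p = 1 */
    return 1.0 / (baseval * log_factor + 1);
}
```

With a factor of 10, a counter of 6 is incremented with probability 1/11, and a counter of 105 with probability 1/1001, so on average roughly a thousand accesses are needed for that single step. Raising the factor to 100 makes each step rarer still.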