The Life and Death of a Redis Key
When a key is written to Redis with an expiration time, it should be deleted once that time arrives. Redis has two ways of deleting it:
- Active deletion
- Passive deletion
In theory a per-key timer could be registered to fire exactly at expiration, but the overhead of doing that would be far too high.
1. Active deletion
One way is for the client to delete keys explicitly with the DEL command.
The other is for the redis server to run a periodic task that slowly scans for expired keys and deletes them, also known as slow periodic reclamation.
Expired keys are cleaned up by the periodic task serverCron, which runs every 1000/hz ms (100 ms with the default hz = 10, configurable) and calls databasesCron to detect and delete expired keys:
void databasesCron(void) {
    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled && server.masterhost == NULL)
        activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
    ...
}
activeExpireCycle() implements the following logic (a standalone sketch of the loop follows the list):
1. Iterate over at most 16 DBs per call (defined by the macro CRON_DBS_PER_CALL, default 16).
2. In the current DB, randomly sample 20 keys that have an expire set (defined by the macro ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP, default 20).
3. For every sampled key that has expired, free its memory or hand it over to the lazy-free queue.
4. If the elapsed time exceeds the allowed limit, at most 25 ms (timelimit = 1000000*ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC/server.hz/100, where ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC = 25 and server.hz defaults to 10), this expire cycle returns immediately; otherwise go to step 5.
5. If more than 5 keys (ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP/4 = 5) in this DB actually turned out to be expired, go back to step 2; otherwise move on to the next DB and go back to step 2.
6. Once all DBs have been visited, the cycle ends.
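To make the loop above concrete, here is a minimal standalone simulation of the sampling idea (this is not Redis source; toy_key_t, SAMPLES_PER_LOOP and the toy timings are invented for illustration): keep sampling random keys that carry a TTL, delete the expired ones, and stop once a sample looks mostly fresh or the time budget is used up.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SAMPLES_PER_LOOP 20   /* plays the role of ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP */
#define TIME_BUDGET_MS   25   /* plays the role of the ~25 ms limit of the slow cycle */
#define NUM_KEYS 1000

typedef struct { long long expire_at_ms; int alive; } toy_key_t;

int main(void) {
    srand(42);
    toy_key_t keys[NUM_KEYS];
    long long now = 1000000;                 /* fake "current time" in ms */
    for (int i = 0; i < NUM_KEYS; i++) {
        keys[i].alive = 1;
        /* roughly half already expired, half expiring far in the future */
        keys[i].expire_at_ms = now + ((rand() % 2) ? -1000 : 60000);
    }

    clock_t start = clock();
    int expired;
    do {
        expired = 0;
        for (int j = 0; j < SAMPLES_PER_LOOP; j++) {
            toy_key_t *k = &keys[rand() % NUM_KEYS];
            if (k->alive && k->expire_at_ms <= now) { k->alive = 0; expired++; }
        }
        /* keep going while the sample still looks "dirty" (more than 25% expired)
         * and the time budget for this cycle is not used up yet */
    } while (expired > SAMPLES_PER_LOOP / 4 &&
             (clock() - start) * 1000 / CLOCKS_PER_SEC < TIME_BUDGET_MS);

    int remaining = 0;
    for (int i = 0; i < NUM_KEYS; i++) remaining += keys[i].alive;
    printf("keys still alive after one cycle: %d\n", remaining);
    return 0;
}

The real function additionally walks up to CRON_DBS_PER_CALL databases per invocation, as described in the list above, but the two stop conditions are the same.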
2. Passive deletion
The implementation is simple: when a key is accessed, check whether it has expired. If it has, free it and return null; otherwise return its value. This also keeps keys cleaned up in a timely manner; the only keys it misses are those that are never accessed again, and those are still covered by the active deletion scan above.
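Below is a minimal standalone sketch of this check (not Redis source; in Redis the equivalent logic lives in expireIfNeeded(), which the lookup paths call before returning a value). The toy_db_t / toy_lookup names are invented:

#include <stdio.h>
#include <string.h>
#include <time.h>

typedef struct {
    const char *key;
    const char *value;
    long long expire_at_ms;   /* 0 means "no expire set" */
} toy_entry_t;

typedef struct { toy_entry_t entries[16]; int used; } toy_db_t;

static long long now_ms(void) { return (long long)time(NULL) * 1000; }

/* On every read, first check whether the key has outlived its TTL.
 * If it has, delete it on the spot and behave as if it never existed. */
static const char *toy_lookup(toy_db_t *db, const char *key) {
    for (int i = 0; i < db->used; i++) {
        toy_entry_t *e = &db->entries[i];
        if (strcmp(e->key, key) != 0) continue;
        if (e->expire_at_ms != 0 && e->expire_at_ms <= now_ms()) {
            /* passive deletion: remove the entry, then report a miss */
            db->entries[i] = db->entries[--db->used];
            return NULL;
        }
        return e->value;
    }
    return NULL;
}

int main(void) {
    toy_db_t db = { .used = 2, .entries = {
        { "fresh", "still here", now_ms() + 60000 },
        { "stale", "gone soon",  now_ms() - 1     },
    }};
    printf("fresh -> %s\n", toy_lookup(&db, "fresh"));
    printf("stale -> %s\n", toy_lookup(&db, "stale") ? "hit" : "(nil)");
    return 0;
}

Because every access path runs this check first, a client can never observe an expired key, even if the active cycle has not reached it yet.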
lazy free
Redis 4 can delete keys in an asynchronous thread without blocking the main thread; the feature is called lazy free. The new UNLINK command is the same as DEL but works in a non-blocking way.
Similarly, an ASYNC option was added to FLUSHALL and FLUSHDB so that an entire dataset or a single database can be released asynchronously. (When manually removing a big key, use UNLINK so the server is not blocked.)
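As a rough standalone sketch of the idea (not Redis source: in Redis the decision is made inside dbAsyncDelete(), which only hands values whose estimated free effort exceeds a threshold to a background bio thread; the names and numbers below are illustrative):

#include <pthread.h>
#include <stdlib.h>

#define FREE_EFFORT_THRESHOLD 64   /* Redis' lazyfree uses a similar cutoff */

typedef struct { size_t num_elements; int *elements; } toy_value_t;

static void *background_free(void *arg) {
    toy_value_t *v = arg;
    free(v->elements);             /* the expensive part runs off the main thread */
    free(v);
    return NULL;
}

/* UNLINK-style delete: the key is removed from the keyspace immediately,
 * then we decide where the actual memory release happens. Returns 1 when
 * a background job was started. */
static int toy_unlink(toy_value_t *v, pthread_t *bg) {
    if (v->num_elements > FREE_EFFORT_THRESHOLD) {
        pthread_create(bg, NULL, background_free, v);
        return 1;
    }
    free(v->elements);             /* cheap enough: free inline, like DEL */
    free(v);
    return 0;
}

int main(void) {
    toy_value_t *big = malloc(sizeof(*big));
    big->num_elements = 1000000;
    big->elements = calloc(big->num_elements, sizeof(int));

    pthread_t bg;
    if (toy_unlink(big, &bg)) pthread_join(bg, NULL);
    return 0;
}

From the client's point of view this is simply UNLINK key instead of DEL key, or FLUSHDB ASYNC instead of FLUSHDB.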
What happens when memory is full?
Many people assume that once Redis reaches its maximum memory it will delete some keys to guarantee that new keys can be written. That is only partly correct: whether new keys can still be written depends on the currently configured eviction policy.
Eviction Policies
These policies only take effect when memory is full; below the limit they do nothing.
- noeviction: no eviction at all; when a write arrives but there is not enough memory, an error is returned (this is the default setting).
Regarding naming: policies prefixed with volatile operate only on keys that have an expire set, while allkeys policies operate on every key.
Do keys need to have reached their expiration time?
No. None of these policies care whether a key has actually reached its expiration time; the prefixes only determine which keys are candidates.
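For reference, the memory ceiling and the policy are both set in redis.conf (or changed at runtime via CONFIG SET); the values below are only an example:

maxmemory 1gb
maxmemory-policy allkeys-lru
maxmemory-samples 5

maxmemory-samples controls how many keys the approximated LRU/LFU policies described below sample per eviction attempt; 5 is the default, and larger values trade CPU for precision.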
volatile-lru
evict keys by trying to remove the less recently used (LRU) keys first, but only among keys that have an expire set, in order to make space for the new data added.
Try to remove the least recently used keys first.
Scope: restricted to keys that have an expire set (volatile). If your keys carry no TTL, this policy has nothing to evict; the cache gradually fills up and nothing can be removed.
Use case: suitable when most keys have an expiration time and evicting the data that has not been accessed recently is acceptable.
allkeys-lru
Pick the least recently used keys from the whole dataset (server.db[i].dict) and evict them.
Scope: all keys (whereas volatile-lru only considers keys with an expire set).
evict keys by trying to remove the less recently used (LRU) keys first, in order to make space for the new data added.
Use the allkeys-lru policy when you expect a power-law distribution in the popularity of your requests, that is, you expect that a subset of elements will be accessed far more often than the rest. This is a good pick if you are unsure.
This is the 80/20 rule: 20% of the keys serve 80% of the requests. If your access pattern follows it, this policy is a good fit.
When an LRU policy is used, Redis walks the databases starting from DB 0 (16 by default) and, depending on the policy, samples either the redisDb's dict (all keys) or its expires dict (keys with an expire set) to refresh the candidate pool. The pool is updated by evictionPoolPopulate:
void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
    int j, k, count;
    dictEntry *samples[server.maxmemory_samples];

    count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
    for (j = 0; j < count; j++) {
        unsigned long long idle;
        sds key;
        robj *o;
        dictEntry *de;

        de = samples[j];
        key = dictGetKey(de);

        /* If the dictionary we are sampling from is not the main
         * dictionary (but the expires one) we need to lookup the key
         * again in the key dictionary to obtain the value object. */
        if (server.maxmemory_policy != MAXMEMORY_VOLATILE_TTL) {
            if (sampledict != keydict) de = dictFind(keydict, key);
            o = dictGetVal(de);
        }

        /* Calculate the idle time according to the policy. This is called
         * idle just because the code initially handled LRU, but is in fact
         * just a score where an higher score means better candidate. */
        if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
            idle = estimateObjectIdleTime(o);
        } else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
            /* When we use an LRU policy, we sort the keys by idle time
             * so that we expire keys starting from greater idle time.
             * However when the policy is an LFU one, we have a frequency
             * estimation, and we want to evict keys with lower frequency
             * first. So inside the pool we put objects using the inverted
             * frequency subtracting the actual frequency to the maximum
             * frequency of 255. */
            idle = 255-LFUDecrAndReturn(o);
        } else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
            /* In this case the sooner the expire the better. */
            idle = ULLONG_MAX - (long)dictGetVal(de);
        } else {
            serverPanic("Unknown eviction policy in evictionPoolPopulate()");
        }

        /* Insert the element inside the pool.
         * First, find the first empty bucket or the first populated
         * bucket that has an idle time smaller than our idle time. */
        k = 0;
        while (k < EVPOOL_SIZE &&
               pool[k].key &&
               pool[k].idle < idle) k++;
        if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
            /* Can't insert if the element is < the worst element we have
             * and there are no empty buckets. */
            continue;
        } else if (k < EVPOOL_SIZE && pool[k].key == NULL) {
            /* Inserting into empty position. No setup needed before insert. */
        } else {
            /* Inserting in the middle. Now k points to the first element
             * greater than the element to insert. */
            if (pool[EVPOOL_SIZE-1].key == NULL) {
                /* Free space on the right? Insert at k shifting
                 * all the elements from k to end to the right. */

                /* Save SDS before overwriting. */
                sds cached = pool[EVPOOL_SIZE-1].cached;
                memmove(pool+k+1,pool+k,
                    sizeof(pool[0])*(EVPOOL_SIZE-k-1));
                pool[k].cached = cached;
            } else {
                /* No free space on right? Insert at k-1 */
                k--;
                /* Shift all elements on the left of k (included) to the
                 * left, so we discard the element with smaller idle time. */
                sds cached = pool[0].cached; /* Save SDS before overwriting. */
                if (pool[0].key != pool[0].cached) sdsfree(pool[0].key);
                memmove(pool,pool+1,sizeof(pool[0])*k);
                pool[k].cached = cached;
            }
        }

        /* Try to reuse the cached SDS string allocated in the pool entry,
         * because allocating and deallocating this object is costly
         * (according to the profiler, not my fantasy. Remember:
         * premature optimizbla bla bla bla. */
        int klen = sdslen(key);
        if (klen > EVPOOL_CACHED_SDS_SIZE) {
            pool[k].key = sdsdup(key);
        } else {
            memcpy(pool[k].cached,key,klen+1);
            sdssetlen(pool[k].cached,klen);
            pool[k].key = pool[k].cached;
        }
        pool[k].idle = idle;
        pool[k].dbid = dbid;
    }
}
The function randomly samples maxmemory_samples keys, computes an idle time for each of them, and inserts a key into the pool when it qualifies (its idle time is larger than that of some key already in the pool, or the pool still has free slots). Once the pool has been refreshed, the key in the pool with the largest idle time is the one that gets evicted.
volatile-random
Remove a random key among the ones with an expire set.
Among keys with an expire set, evict a key chosen at random.
allkeys-random
Pick a random key from all keys and free it.
volatile-lfu
Evict using approximated LFU among the keys with an expire set.
Among keys with an expire set, evict the least frequently used keys (approximated LFU).
allkeys-lfu
Evict any key using approximated LFU.
Evict the least frequently used keys from the whole dataset (server.db[i].dict).
volatile-ttl
Remove the key with the nearest expire time (minor TTL)
Among keys with an expire set, evict the keys with the earliest expiration time first.
Implementation
The eviction policies above are all implemented in the freeMemoryIfNeeded function:
int freeMemoryIfNeeded(void) {
    /* By default replicas should ignore maxmemory
     * and just be masters exact copies. */
    if (server.masterhost && server.repl_slave_ignore_maxmemory) return C_OK;

    size_t mem_reported, mem_tofree, mem_freed;
    mstime_t latency, eviction_latency;
    long long delta;
    int slaves = listLength(server.slaves);

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
    if (clientsArePaused()) return C_OK;
    if (getMaxmemoryState(&mem_reported,NULL,&mem_tofree,NULL) == C_OK)
        return C_OK;

    mem_freed = 0;

    if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION)
        goto cant_free; /* We need to free memory, but policy forbids. */

    latencyStartMonitor(latency);
    while (mem_freed < mem_tofree) {
        int j, k, i, keys_freed = 0;
        static unsigned int next_db = 0;
        sds bestkey = NULL;
        int bestdbid;
        redisDb *db;
        dict *dict;
        dictEntry *de;

        if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
            server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
        {
            struct evictionPoolEntry *pool = EvictionPoolLRU;

            while(bestkey == NULL) {
                unsigned long total_keys = 0, keys;

                /* We don't want to make local-db choices when expiring keys,
                 * so to start populate the eviction pool sampling keys from
                 * every DB. */
                for (i = 0; i < server.dbnum; i++) {
                    db = server.db+i;
                    dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
                            db->dict : db->expires;
                    if ((keys = dictSize(dict)) != 0) {
                        evictionPoolPopulate(i, dict, db->dict, pool);
                        total_keys += keys;
                    }
                }
                if (!total_keys) break; /* No keys to evict. */

                /* Go backward from best to worst element to evict. */
                for (k = EVPOOL_SIZE-1; k >= 0; k--) {
                    if (pool[k].key == NULL) continue;
                    bestdbid = pool[k].dbid;

                    if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
                        de = dictFind(server.db[pool[k].dbid].dict,
                            pool[k].key);
                    } else {
                        de = dictFind(server.db[pool[k].dbid].expires,
                            pool[k].key);
                    }

                    /* Remove the entry from the pool. */
                    if (pool[k].key != pool[k].cached)
                        sdsfree(pool[k].key);
                    pool[k].key = NULL;
                    pool[k].idle = 0;

                    /* If the key exists, is our pick. Otherwise it is
                     * a ghost and we need to try the next element. */
                    if (de) {
                        bestkey = dictGetKey(de);
                        break;
                    } else {
                        /* Ghost... Iterate again. */
                    }
                }
            }
        }

        /* volatile-random and allkeys-random policy */
        else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
                 server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
        {
            /* When evicting a random key, we try to evict a key for
             * each DB, so we use the static 'next_db' variable to
             * incrementally visit all DBs. */
            for (i = 0; i < server.dbnum; i++) {
                j = (++next_db) % server.dbnum;
                db = server.db+j;
                dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
                        db->dict : db->expires;
                if (dictSize(dict) != 0) {
                    de = dictGetRandomKey(dict);
                    bestkey = dictGetKey(de);
                    bestdbid = j;
                    break;
                }
            }
        }

        /* Finally remove the selected key. */
        if (bestkey) {
            db = server.db+bestdbid;
            robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
            propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
            /* We compute the amount of memory freed by db*Delete() alone.
             * It is possible that actually the memory needed to propagate
             * the DEL in AOF and replication link is greater than the one
             * we are freeing removing the key, but we can't account for
             * that otherwise we would never exit the loop.
             *
             * AOF and Output buffer memory will be freed eventually so
             * we only care about memory used by the key space. */
            delta = (long long) zmalloc_used_memory();
            latencyStartMonitor(eviction_latency);
            if (server.lazyfree_lazy_eviction)
                dbAsyncDelete(db,keyobj);
            else
                dbSyncDelete(db,keyobj);
            latencyEndMonitor(eviction_latency);
            latencyAddSampleIfNeeded("eviction-del",eviction_latency);
            latencyRemoveNestedEvent(latency,eviction_latency);
            delta -= (long long) zmalloc_used_memory();
            mem_freed += delta;
            server.stat_evictedkeys++;
            notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
                keyobj, db->id);
            decrRefCount(keyobj);
            keys_freed++;

            /* When the memory to free starts to be big enough, we may
             * start spending so much time here that is impossible to
             * deliver data to the slaves fast enough, so we force the
             * transmission here inside the loop. */
            if (slaves) flushSlavesOutputBuffers();

            /* Normally our stop condition is the ability to release
             * a fixed, pre-computed amount of memory. However when we
             * are deleting objects in another thread, it's better to
             * check, from time to time, if we already reached our target
             * memory, since the "mem_freed" amount is computed only
             * across the dbAsyncDelete() call, while the thread can
             * release the memory all the time. */
            if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
                if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
                    /* Let's satisfy our stop condition. */
                    mem_freed = mem_tofree;
                }
            }
        }

        if (!keys_freed) {
            latencyEndMonitor(latency);
            latencyAddSampleIfNeeded("eviction-cycle",latency);
            goto cant_free; /* nothing to free... */
        }
    }
    latencyEndMonitor(latency);
    latencyAddSampleIfNeeded("eviction-cycle",latency);
    return C_OK;

cant_free:
    /* We are here if we are not able to reclaim memory. There is only one
     * last thing we can try: check if the lazyfree thread has jobs in queue
     * and wait... */
    while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
        if (((mem_reported - zmalloc_used_memory()) + mem_freed) >= mem_tofree)
            break;
        usleep(1000);
    }
    return C_ERR;
}

/* This is a wrapper for freeMemoryIfNeeded() that only really calls the
 * function if right now there are the conditions to do so safely:
 *
 * - There must be no script in timeout condition.
 * - Nor we are loading data right now.
 *
 */
int freeMemoryIfNeededAndSafe(void) {
    if (server.lua_timedout || server.loading) return C_OK;
    return freeMemoryIfNeeded();
}
How the eviction process works
It is important to understand that the eviction process works like this:
- A client runs a new command, resulting in more data added.
- Redis checks the memory usage, and if it is greater than the maxmemory limit, it evicts keys according to the policy.
- A new command is executed, and so forth.
So we continuously cross the boundaries of the memory limit, by going over it, and then by evicting keys to return back under the limits.
If a command results in a lot of memory being used (like a big set intersection stored into a new key) for some time the memory limit can be surpassed by a noticeable amount.
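To tie this back to the code above, here is a standalone sketch of that flow (not Redis source; in the real server this check sits on the command-processing path, which calls freeMemoryIfNeededAndSafe() shown earlier; the toy_* names and numbers are invented):

#include <stdbool.h>
#include <stdio.h>

typedef enum { POLICY_NOEVICTION, POLICY_ALLKEYS_LRU } toy_policy_t;

static size_t used_memory = 120, maxmemory = 100;      /* toy numbers, not bytes */
static toy_policy_t policy = POLICY_NOEVICTION;

/* Stand-in for freeMemoryIfNeeded(): evict until under the limit, or report failure. */
static bool toy_free_memory_if_needed(void) {
    if (used_memory <= maxmemory) return true;
    if (policy == POLICY_NOEVICTION) return false;     /* we may not evict anything */
    while (used_memory > maxmemory) used_memory -= 10; /* pretend to evict keys */
    return true;
}

static void toy_process_write_command(const char *name) {
    if (!toy_free_memory_if_needed()) {
        /* error text modeled on Redis' OOM reply */
        printf("%s -> (error) OOM command not allowed when used memory > 'maxmemory'\n", name);
        return;
    }
    used_memory += 30;                                 /* the write adds data */
    printf("%s -> OK (used=%zu)\n", name, used_memory);
}

int main(void) {
    toy_process_write_command("SET k1 v1");   /* rejected under noeviction */
    policy = POLICY_ALLKEYS_LRU;
    toy_process_write_command("SET k2 v2");   /* evicts first, then succeeds */
    return 0;
}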
Summary
Broadly speaking, a Redis key can die in one of three ways:
- The client deletes it explicitly.
- It has an expiration time:
  - it reaches its expiration time and is deleted once the active scan finds it;
  - it reaches its expiration time and is deleted passively when it is accessed;
  - memory fills up and it is removed by one of the eviction policies (volatile or allkeys).
- It has no expiration time:
  - memory fills up and it is removed by one of the allkeys policies.