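This shows that c->querybuf is grown to at least 1024*32 bytes the first time a connection reads a command. Looking back at the resize cleanup logic, the problem becomes obvious: every query buffer that has been used is at least 1024*32 bytes, yet the cleanup condition only checks whether the size is greater than 1024. In other words, every used connection that has been idle for more than 2 seconds has its buffer shrunk, and the buffer is grown back to 1024*32 bytes when the next request arrives. This is unnecessary: on a frequently accessed cluster, memory is reclaimed and reallocated over and over. We therefore changed the cleanup condition as follows, which avoids most of the unnecessary resize operations: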

    if (((querybuf_size > REDIS_MBULK_BIG_ARG) &&
         (querybuf_size/(c->querybuf_peak+1)) > 2) ||
         (querybuf_size > 1024*32 && idletime > 2))
    {
        /* Only resize the query buffer if it is actually wasting space. */
        if (sdsavail(c->querybuf) > 1024*32) {
            c->querybuf = sdsRemoveFreeSpace(c->querybuf);
        }
    }

The side effect of this change is extra memory overhead. Assuming 5k connections per instance, that is 5000*1024*32 bytes, roughly 160 MB, which is entirely acceptable on a server with more than a hundred GB of memory.
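To make the difference between the two conditions concrete, here is a minimal sketch that isolates only the outer predicate. The function names are hypothetical, the inner sdsavail() check is left out, and BIG_ARG is assumed to be 32 KB, which is the value Redis gives REDIS_MBULK_BIG_ARG as far as we know. It is an illustration of the reasoning above, not the actual Redis code.

```c
#include <stddef.h>
#include <time.h>

/* Assumption: stands in for REDIS_MBULK_BIG_ARG, taken to be 32 KB. */
#define BIG_ARG (1024 * 32)

/* Original predicate: any idle client whose buffer exceeds 1 KB qualifies. */
int should_shrink_original(size_t size, size_t peak, time_t idle) {
    return (size > BIG_ARG && size / (peak + 1) > 2) ||
           (size > 1024 && idle > 2);
}

/* Modified predicate: the idle branch only fires above 32 KB. */
int should_shrink_modified(size_t size, size_t peak, time_t idle) {
    return (size > BIG_ARG && size / (peak + 1) > 2) ||
           (size > 1024 * 32 && idle > 2);
}

/* Memory kept around if every idle client retains one 32 KB buffer. */
size_t retained_bytes(size_t clients) {
    return clients * BIG_ARG;
}
```

With a 32 KB buffer and an idle time of 3 seconds, the original predicate is true and the modified one is false, which is exactly the churn the change is meant to remove. For 5000 clients, retained_bytes gives 163,840,000 bytes, in line with the estimate of roughly 160 MB.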
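【Problem reproduced】After switching to the Redis server built from the modified source, the problem still occurred: clients reported the same type of error, and server memory still fluctuated whenever the errors appeared. The stack traces we captured are shown below: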

    Thu Jun 14 21:56:54 CST 2018
    #3  0x0000003729ee893d in clone () from /lib64/libc.so.6
    Thread 1 (Thread 0x7f2dc108d720 (LWP 27851)):
    #0  0x0000003729ee5400 in madvise () from /lib64/libc.so.6
    #1  0x0000000000493a1e in je_pages_purge ()
    #2  0x000000000048cf40 in arena_purge ()
    #3  0x00000000004a7dad in je_tcache_bin_flush_large ()
    #4  0x00000000004a85e9 in je_tcache_event_hard ()
    #5  0x000000000042c0b5 in decrRefCount ()
    #6  0x000000000042744d in resetClient ()
    #7  0x000000000042963b in processInputBuffer ()
    #8  0x0000000000429762 in readQueryFromClient ()
    #9  0x000000000041847c in aeProcessEvents ()
    #10 0x000000000041873b in aeMain ()
    #11 0x0000000000420fce in main ()

    Thu Jun 14 21:56:54 CST 2018
    Thread 1 (Thread 0x7f2dc108d720 (LWP 27851)):
    #0  0x0000003729ee5400 in madvise () from /lib64/libc.so.6
    #1  0x0000000000493a1e in je_pages_purge ()
    #2  0x000000000048cf40 in arena_purge ()
    #3  0x00000000004a7dad in je_tcache_bin_flush_large ()
    #4  0x00000000004a85e9 in je_tcache_event_hard ()
    #5  0x000000000042c0b5 in decrRefCount ()
    #6  0x000000000042744d in resetClient ()
    #7  0x000000000042963b in processInputBuffer ()
    #8  0x0000000000429762 in readQueryFromClient ()
    #9  0x000000000041847c in aeProcessEvents ()
    #10 0x000000000041873b in aeMain ()
    #11 0x0000000000420fce in main ()

Clearly, the frequent resizing of the query buffer had been mitigated, yet clients were still reporting errors, so we were stuck again. Was some other factor slowing down the query buffer resize? We captured pstack output once more, and this time jemalloc caught our attention. Recall how Redis allocates memory: to avoid the heavy fragmentation that results when libc does not release memory, Redis uses jemalloc as its default memory allocator. The stack traces captured during these errors all end in je_pages_purge(), which means Redis was calling into jemalloc to reclaim dirty pages. Let's look at what jemalloc does here:
arena_purge (arena.c)

    static void
    arena_purge(arena_t *arena, bool all)
    {
        arena_chunk_t *chunk;
        size_t npurgatory;

        if (config_debug) {
            size_t ndirty = 0;

            arena_chunk_dirty_iter(&arena->chunks_dirty, NULL,
                chunks_dirty_iter_cb, (void *)&ndirty);
            assert(ndirty == arena->ndirty);
        }
        assert(arena->ndirty > arena->npurgatory || all);
        assert((arena->nactive >> opt_lg_dirty_mult) < (arena->ndirty -
            arena->npurgatory) || all);

        if (config_stats)
            arena->stats.npurge++;

        npurgatory = arena_compute_npurgatory(arena, all);
        arena->npurgatory += npurgatory;

        while (npurgatory > 0) {
            size_t npurgeable, npurged, nunpurged;

            /* Get next chunk with dirty pages. */
            chunk = arena_chunk_dirty_first(&arena->chunks_dirty);
            if (chunk == NULL) {
                arena->npurgatory -= npurgatory;
                return;
            }
            npurgeable = chunk->ndirty;
            assert(npurgeable != 0);

            if (npurgeable > npurgatory && chunk->nruns_adjac == 0) {
                arena->npurgatory += npurgeable - npurgatory;
                npurgatory = npurgeable;
            }

            arena->npurgatory -= npurgeable;
            npurgatory -= npurgeable;
            npurged = arena_chunk_purge(arena, chunk, all);
            nunpurged = npurgeable - npurged;
            arena->npurgatory += nunpurged;
            npurgatory += nunpurged;
        }
    }
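The two assertions at the top of arena_purge state when a non-forced purge (all == false) is expected to run: there must be dirty pages not already queued for purging, and their count must exceed nactive >> opt_lg_dirty_mult. The sketch below restates that precondition as a standalone function. The function name and the plain integer parameters are hypothetical stand-ins for the arena fields, and the value 3 used in the example afterwards is an assumption about the configured opt_lg_dirty_mult, not something taken from the source above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 when the precondition asserted by arena_purge holds for a
 * non-forced purge: the purgeable dirty pages (ndirty - npurgatory)
 * exceed the threshold nactive >> lg_dirty_mult.
 */
int purge_precondition_holds(size_t nactive, size_t ndirty,
                             size_t npurgatory, int lg_dirty_mult) {
    /* A negative value would mean purging is disabled; not modeled here. */
    assert(lg_dirty_mult >= 0);

    /* All dirty pages are already queued for purging. */
    if (ndirty <= npurgatory)
        return 0;

    return (nactive >> lg_dirty_mult) < (ndirty - npurgatory);
}
```

For example, with 8000 active pages and lg_dirty_mult set to 3, the threshold is 1000 pages: 1001 purgeable dirty pages satisfy the precondition, while exactly 1000 do not.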