XV6 Study Notes (12): Lab lock: Parallelism/locking

This lab optimizes the locks inside XV6 to reduce lock contention and improve system performance.

Memory allocator (moderate)

The first exercise improves XV6's kernel page allocator, using a strategy already discussed in an earlier chapter. Stock XV6 keeps a single free-page list protected by one lock, so kalloc and kfree calls from different CPUs contend for that lock: page allocation is completely serialized, which hurts performance.

The improvement is to give each CPU core its own free list. kalloc and kfree normally operate only on the current core's list, and another core's list is touched only when the current core's list is empty. This way lock contention arises only when some core's list runs dry.

First, define an array of NCPU kmem structs and initialize each lock in kinit:

struct {
  struct spinlock lock;
  struct run *freelist;
  char lock_name[7];
} kmem[NCPU];

void
kinit()
{
  for (int i = 0; i < NCPU; i++) {
    snprintf(kmem[i].lock_name, sizeof(kmem[i].lock_name), "kmem_%d", i);
    initlock(&kmem[i].lock, kmem[i].lock_name);
  }
  freerange(end, (void*)PHYSTOP);
}
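Note that kinit runs once on the boot CPU, and freerange returns pages through kfree, so initially every free page lands on that single core's list; the other cores fill their lists lazily by stealing on their first kalloc.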

kfree only needs to push the freed page onto the current core's list:

void
kfree(void *pa)
{
  ...
  r = (struct run*)pa;

  push_off();   // disable interrupts so the CPU id stays stable
  int id = cpuid();
  acquire(&kmem[id].lock);
  r->next = kmem[id].freelist;
  kmem[id].freelist = r;
  release(&kmem[id].lock);
  pop_off();
}

In kalloc, if allocation from the current core's list fails, we try to steal pages from another core: a fast/slow pointer pair finds the midpoint of the victim's list, and the first half of its pages is moved onto the current core's list.

void *
kalloc(void)
{
  struct run *r;

  push_off();
  int id = cpuid();
  acquire(&kmem[id].lock);
  r = kmem[id].freelist;
  if(r) {
    kmem[id].freelist = r->next;
  } else {
    // alloc failed, try to steal from other cpu
    int success = 0;
    for(int i = 0; i < NCPU; i++) {
      if (i == id)
        continue;
      acquire(&kmem[i].lock);
      struct run *p = kmem[i].freelist;
      if(p) {
        // steal half of memory: fp advances two nodes per step and p
        // one, so when fp reaches the end, p is at the midpoint
        struct run *fp = p;   // faster pointer
        struct run *pre = p;  // trails p by one node
        while (fp && fp->next) {
          fp = fp->next->next;
          pre = p;
          p = p->next;
        }
        kmem[id].freelist = kmem[i].freelist;
        if (p == kmem[i].freelist) {
          // only have one page
          kmem[i].freelist = 0;
        } else {
          // victim keeps the second half, starting at the midpoint
          kmem[i].freelist = p;
          pre->next = 0;
        }
        success = 1;
      }
      release(&kmem[i].lock);
      if (success) {
        r = kmem[id].freelist;
        kmem[id].freelist = r->next;
        break;
      }
    }
  }
  release(&kmem[id].lock);
  pop_off();

  if(r)
    memset((char*)r, 5, PGSIZE); // fill with junk
  return (void*)r;
}
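Two details are worth calling out. Stealing half of the victim's list rather than a single page amortizes the cost of stealing: a core that has just run dry will not have to fight for another core's lock again on its very next kalloc. Also, kalloc still holds the current core's lock while acquiring the victim's; if two cores with empty lists ever steal from each other at the same moment this could in principle deadlock, so a more defensive variant releases the local lock before scanning the other cores.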

The test results are as follows:

$ kalloctest
start test1
test1 results:
--- lock kmem/bcache stats
lock: kmem_0: #fetch-and-add 0 #acquire() 77186
lock: kmem_1: #fetch-and-add 0 #acquire() 182362
lock: kmem_2: #fetch-and-add 0 #acquire() 173534
lock: bcache_bucket: #fetch-and-add 0 #acquire() 128
lock: bcache_bucket: #fetch-and-add 0 #acquire() 138
lock: bcache_bucket: #fetch-and-add 0 #acquire() 142
lock: bcache_bucket: #fetch-and-add 0 #acquire() 148
lock: bcache_bucket: #fetch-and-add 0 #acquire() 132
lock: bcache_bucket: #fetch-and-add 0 #acquire() 6
lock: bcache_bucket: #fetch-and-add 0 #acquire() 42
lock: bcache_bucket: #fetch-and-add 0 #acquire() 34
lock: bcache_bucket: #fetch-and-add 0 #acquire() 5916
lock: bcache_bucket: #fetch-and-add 0 #acquire() 32
lock: bcache_bucket: #fetch-and-add 0 #acquire() 242
lock: bcache_bucket: #fetch-and-add 0 #acquire() 128
lock: bcache_bucket: #fetch-and-add 0 #acquire() 128
--- top 5 contended locks:
lock: proc: #fetch-and-add 31954 #acquire() 206502
lock: proc: #fetch-and-add 24395 #acquire() 206518
lock: proc: #fetch-and-add 9306 #acquire() 206501
lock: proc: #fetch-and-add 7463 #acquire() 206481
lock: proc: #fetch-and-add 5209 #acquire() 206480
tot= 0
test1 OK
start test2
total free number of pages: 32493 (out of 32768)
.....
test2 OK

Buffer cache (hard)

This exercise optimizes XV6's disk buffer cache. Stock XV6 maintains the cache as a single LRU linked list, so every buffer acquisition and release has to lock the entire list; in other words, all buffer cache operations are completely serialized.

To improve parallelism, we can replace the list with a hash table: acquiring or releasing a buffer then only locks one bucket, and operations on different buckets proceed in parallel. Only when a buffer has to be evicted and replaced do we lock the whole table to search for a victim block.
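The snippets below refer to a hashtable array and a hash helper without showing their definitions. Here is a minimal sketch of what they could look like; the bucket count NBUCKET and the exact hash formula are illustrative assumptions, not necessarily the author's choices.

#define NBUCKET 13          // assumed bucket count; a prime spreads blocks evenly

struct bucket {
  struct spinlock lock;     // per-bucket lock replacing the global list lock
  struct buf *head;         // bufs currently hashed to this bucket
};

static struct bucket hashtable[NBUCKET];

// Map a (dev, blockno) pair to a bucket index.
static int
hash(uint dev, uint blockno)
{
  return (dev * 31 + blockno) % NBUCKET;
}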

With a hash table, list order can no longer encode the LRU information, so a timestamp field is added to struct buf to record when the buffer was last released; the prev pointer is no longer needed.

struct buf {
  int valid;          // has data been read from disk?
  int disk;           // does disk "own" buf?
  uint dev;
  uint blockno;
  struct sleeplock lock;
  uint refcnt;
  // struct buf *prev; // LRU cache list
  struct buf *next;
  uchar data[BSIZE];
  uint timestamp;     // ticks value at last release, used for LRU eviction
};

brelse maintains the timestamp and takes the bucket lock in place of the old global list lock:

void
brelse(struct buf *b)
{
  if(!holdingsleep(&b->lock))
    panic("brelse");

  releasesleep(&b->lock);

  int idx = hash(b->dev, b->blockno);
  acquire(&hashtable[idx].lock);
  b->refcnt--;
  if (b->refcnt == 0) {
    // no one is waiting for it; record the release time for LRU.
    // Reading ticks without tickslock is a benign race: an
    // approximate timestamp is good enough for choosing a victim.
    b->timestamp = ticks;
  }
  release(&hashtable[idx].lock);
}
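The section stops before the lookup-and-eviction side, but the design described above implies it: when a block is not cached, lock the whole table and evict the unused buffer with the smallest timestamp. As a rough sketch of that scan, under the same assumptions as the bucket sketch earlier (including a hypothetical caller-held global eviction lock), it might look like this:

// Sketch: find the least-recently-released unused buffer across all
// buckets. The caller is assumed to hold a global eviction lock so
// two CPUs cannot pick the same victim. Each bucket is scanned under
// its own lock; the bucket holding the current best candidate stays
// locked so the candidate cannot be grabbed before it is reused.
static struct buf*
find_lru(int *bucket_idx)
{
  struct buf *lru = 0;
  int lru_idx = -1;

  for(int i = 0; i < NBUCKET; i++){
    acquire(&hashtable[i].lock);
    int found = 0;
    for(struct buf *b = hashtable[i].head; b; b = b->next){
      if(b->refcnt == 0 && (lru == 0 || b->timestamp < lru->timestamp)){
        lru = b;
        found = 1;
      }
    }
    if(found){
      if(lru_idx >= 0)
        release(&hashtable[lru_idx].lock);  // better candidate moved buckets
      lru_idx = i;                          // keep this bucket locked
    } else {
      release(&hashtable[i].lock);
    }
  }

  *bucket_idx = lru_idx;
  return lru;   // 0 if every buffer is currently in use
}

The caller would then remove the victim from its old bucket, reinitialize dev, blockno, valid and refcnt, insert it into the target bucket, and release the remaining locks.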
