Back to Basics: The Linux BFS Scheduler (3)

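Non-O(n) list traversal in the initial version of the BFS scheduler
Over its history, the BFS scheduler also went through a period in which a "small trick" was introduced to improve performance. The trick is so reasonable that every detail deserves a close look, so it is described below.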
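As everyone knows, traversing a linked list costs O(n). That is only the cost of a full traversal, however. In the BFS scheduler the purpose of the traversal is pick-next, and if the list is in some sense presorted, the cost of pick-next can drop to close to O(1). How does BFS manage this? First, look at the concept of the virtual deadline.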
Virtual deadline (VD)
VD = jiffies + (prio_ratio * rr_interval)

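Here prio_ratio reflects the task's priority, and rr_interval acts as a deadline: it bounds how long the task may wait at most before being scheduled. Each entry in the list represents a task and has a VD associated with it. The VD is what lets entries be presorted by their position in the list. "Presorted" here means presorted under the effect of virtual deadline expiry. The difference between BFS and a plain O(n) scan lies in this expiry: because deadlines expire, the traversal will usually run into an expired VD partway through and so does not need to visit all n entries. An O(n) scan based on VD also differs from an O(n) scan based on priority. By the formula above, VD moves monotonically forward (jiffies only increases), whereas priority hardly changes at all. A VD-based O(n) scheduler is therefore in some ways the same as the red-black-tree-based CFS, and VD plays a role similar to the virtual clock in CFS. Only the data structure differs: BFS uses a linked list, CFS uses a red-black tree.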
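The formula above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the kernel's code: the function names virtual_deadline and vd_expired are hypothetical, all quantities are treated as plain jiffies counts, and "expired" is taken to mean that the VD is less than the current jiffies.

```c
#include <stdint.h>

/* VD = jiffies + (prio_ratio * rr_interval)
 * Hypothetical helper: all three inputs are plain integers here. */
static uint64_t virtual_deadline(uint64_t now_jiffies,
                                 uint64_t prio_ratio,
                                 uint64_t rr_interval)
{
    return now_jiffies + prio_ratio * rr_interval;
}

/* Assumption: a deadline counts as expired once it lies strictly
 * before the current jiffies value. */
static int vd_expired(uint64_t vd, uint64_t now_jiffies)
{
    return vd < now_jiffies;
}
```

Because jiffies only grows, a task queued later with the same prio_ratio and rr_interval always gets a later VD than one queued earlier, which is the monotonic behaviour the paragraph above relies on.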
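        In fact O(n) is not that scary, especially on the desktop: how many processes do you really have to schedule? In theory an O(n) scheduler becomes less efficient as the number of processes grows, but a desktop does not actually have many processes that need scheduling, so BFS, which throws away many small tricks, tends to work better there. In theory CFS or the O(1) scheduler can schedule large numbers of processes efficiently on SMP. On a desktop, however, SMP means only 2 to 4 processors and the process count mostly stays below 1000. Having processes bounce back and forth between CPUs is tiring, so why use an ox cleaver to kill a chicken? The bottleneck is not the chicken but the knife.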
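The pick-next algorithm
For SCHED_ISO tasks, the BFS pick-next algorithm follows this rule:
a. Pick in FIFO order; the list is not traversed.
For SCHED_NORMAL or SCHED_IDLEPRIO tasks, the BFS pick-next algorithm follows these rules:
a. Traverse the run list, compare the VD of every entry, find the entry with the smallest VD, remove it from the list, and run it.
b. If an entry is found whose VD is less than the current jiffies, stop traversing, take that entry, and run it. This is the small trick.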
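Rules a and b for SCHED_NORMAL and SCHED_IDLEPRIO tasks can be sketched as a single loop over a doubly linked list. This is a simplified illustration, not the BFS source: struct task, struct runlist, and all function names are hypothetical, CPU affinity is ignored, and locking is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run-list entry: one task with its virtual deadline. */
struct task {
    uint64_t vd;
    struct task *prev, *next;
};

/* Circular doubly linked list with a sentinel head node. */
struct runlist {
    struct task head;
};

static void runlist_init(struct runlist *rl)
{
    rl->head.prev = rl->head.next = &rl->head;
}

/* O(1) insertion at the tail, so older entries stay near the head. */
static void runlist_add_tail(struct runlist *rl, struct task *t)
{
    t->prev = rl->head.prev;
    t->next = &rl->head;
    rl->head.prev->next = t;
    rl->head.prev = t;
}

static void runlist_del(struct task *t)
{
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->prev = t->next = NULL;
}

/* Returns the chosen task (already unlinked), or NULL if the list is empty. */
static struct task *pick_next(struct runlist *rl, uint64_t now_jiffies)
{
    struct task *best = NULL;

    for (struct task *t = rl->head.next; t != &rl->head; t = t->next) {
        if (t->vd < now_jiffies) {      /* rule b: expired, stop here */
            best = t;
            break;
        }
        if (best == NULL || t->vd < best->vd)
            best = t;                   /* rule a: track the smallest VD */
    }
    if (best != NULL)
        runlist_del(best);
    return best;
}
```

Note that rule b returns the first expired entry it meets, not necessarily the entry with the globally smallest VD. That is what makes the scan O(n) only in the worst case.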
The rules above can be summarized as the "smallest, most negative goes first" principle. In the author's own words:
BFS has 103 priority queues. 100 of these are dedicated to the static priority
of realtime tasks, and the remaining 3 are, in order of best to worst priority,
SCHED_ISO (isochronous), SCHED_NORMAL, and SCHED_IDLEPRIO (idle priority
scheduling). When a task of these priorities is queued, a bitmap of running
priorities is set showing which of these priorities has tasks waiting for CPU
time. When a CPU is made to reschedule, the lookup for the next task to get
CPU time is performed in the following way:

First the bitmap is checked to see what static priority tasks are queued. If
any realtime priorities are found, the corresponding queue is checked and the
first task listed there is taken (provided CPU affinity is suitable) and lookup
is complete. If the priority corresponds to a SCHED_ISO task, they are also
taken in FIFO order (as they behave like SCHED_RR). If the priority corresponds
to either SCHED_NORMAL or SCHED_IDLEPRIO, then the lookup becomes O(n). At this
stage, every task in the runlist that corresponds to that priority is checked
to see which has the earliest set deadline, and (provided it has suitable CPU
affinity) it is taken off the runqueue and given the CPU. If a task has an
expired deadline, it is taken and the rest of the lookup aborted (as they are
chosen in FIFO order).

Thus, the lookup is O(n) in the worst case only, where n is as described
earlier, as tasks may be chosen before the whole task list is looked over.
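
The bitmap step described in the quote can be sketched as follows. This is an illustration under assumptions rather than the BFS implementation: the numbering (0 to 99 for realtime, 100 for SCHED_ISO, 101 for SCHED_NORMAL, 102 for SCHED_IDLEPRIO, lower index meaning better priority) and every identifier below are hypothetical, chosen only to match the "best to worst" ordering in the text.

```c
#include <stdint.h>

#define PRIO_LEVELS 103   /* 100 realtime + ISO + NORMAL + IDLEPRIO */
#define ISO_PRIO    100   /* assumed index of the SCHED_ISO queue */
#define NORMAL_PRIO 101   /* assumed index of the SCHED_NORMAL queue */
#define IDLE_PRIO   102   /* assumed index of the SCHED_IDLEPRIO queue */

/* 128 bits are enough to hold 103 priority flags. */
struct prio_bitmap {
    uint64_t w[2];
};

static void bitmap_set(struct prio_bitmap *b, int prio)
{
    b->w[prio / 64] |= (uint64_t)1 << (prio % 64);
}

static void bitmap_clear(struct prio_bitmap *b, int prio)
{
    b->w[prio / 64] &= ~((uint64_t)1 << (prio % 64));
}

/* Best (lowest-numbered) priority with queued tasks, or -1 if none. */
static int first_queued_prio(const struct prio_bitmap *b)
{
    for (int p = 0; p < PRIO_LEVELS; p++)
        if (b->w[p / 64] & ((uint64_t)1 << (p % 64)))
            return p;
    return -1;
}

enum lookup_mode { LOOKUP_NONE, LOOKUP_FIFO, LOOKUP_DEADLINE_SCAN };

/* Realtime and SCHED_ISO queues are served in FIFO order; only
 * SCHED_NORMAL and SCHED_IDLEPRIO need the O(n) deadline scan. */
static enum lookup_mode lookup_mode_for(int prio)
{
    if (prio < 0)
        return LOOKUP_NONE;
    if (prio <= ISO_PRIO)
        return LOOKUP_FIFO;
    return LOOKUP_DEADLINE_SCAN;
}
```

The bitmap check itself costs a fixed amount of work regardless of how many tasks are queued, so the O(n) part is confined to the last two queues.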

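BFS uses the virtual deadline, a concept similar to CFS's virtual runtime, but implements it with a doubly linked list rather than a red-black tree, because inserting into a red-black tree (O(log n)) is less efficient than appending to a list (O(1)). A red-black tree does have the advantage for pick-next, but because of VD expiry, pick-next on the list is O(n) only in the worst case.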
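        The point of the small trick in the initial version of BFS is to ease the fear that the O(n) time complexity of the compare-and-traverse loop inspires.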
