HDFS patch前后Ganglia看到running processes变化的分析

日期：2021-04-02 栏目：程序人生浏览：次

Ganglia running processes是怎么算出来的？ganglia是通过 cat /proc/loadavg获得running processes的。
可得到如下值：0.00 0.28 0.61 1/591 2993。其中，1是running process，591是total process。

为了追踪ganglia图上突然出现的14个running processes，调查了一下，ps Haxh查询出来的total processes和/proc/loadavg的total processes是一致的，状态为R的即是running process。于是又写了个脚本，在启动namenode的期间，每隔0.5秒打印出/proc/loadavg 和 ps Haxh的数据，查看何时出现14个running processes的。

最后发现，不管有没有这个patch，都有可能出现running processes从1上升至6，甚至14的情况。这些processes是namenode的子线程，在某些情况下状态为Rl，R是运行态，l是多线程，在/proc/loadavg中被计算成了running processes。这些子线程运行的时间很短，ganglia是每分钟获得一次数据，很有可能没有采集到。

因此，我之前测试时ganglia图示上的差别，只是巧合，导致我认为加上patch之后有问题。我通过几次实验，看到没有patch时，也会出现ganglia图中running processes上升的情况。

PROCESS STATE CODES
       Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process.
       D    Uninterruptible sleep (usually IO)
       R    Running or runnable (on run queue)
       S    Interruptible sleep (waiting for an event to complete)
       T    Stopped, either by a job control signal or because it is being traced.
       W    paging (not valid since the 2.6.xx kernel)
       X    dead (should never be seen)
       Z    Defunct ("zombie") process, terminated but not reaped by its parent.

For BSD formats and when the stat keyword is used, additional characters may be displayed:
       <    high-priority (not nice to other users)
       N    low-priority (nice to other users)
       L    has pages locked into memory (for real-time and custom IO)
       s    is a session leader
       l    is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
       +    is in the foreground process group

更多Hadoop相关信息见Hadoop 专题页面 ?tid=13

转载注明出处：http://www.heiqu.com/382f05879a99c039df4a887337de4a98.html

HDFS patch前后Ganglia看到running processes变化的分析

相关推荐