Introduction to and Usage of the System-Level Performance Analysis Tool perf [Repost] (8)

To count more events than the default set, list them with the -e option, for example:

perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls  

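The result is shown below; the explicitly requested events now appear in the statistics. Events reported as <not counted> were requested but never ran on a hardware counter. This typically happens when more events are requested than the PMU has counters and the run (about 3 ms here) is too short for multiplexing to reach them.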

al@al-System-Product-Name:~/perf$ sudo perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls

 Performance counter stats for 'ls':

           2.319422      task-clock (msec)         #    0.719 CPUs utilized         
                  0      context-switches          #    0.000 K/sec                 
                  0      cpu-migrations            #    0.000 K/sec                 
                 89      page-faults               #    0.038 M/sec                 
          2,142,386      cycles                    #    0.924 GHz                   
            659,800      stalled-cycles-frontend   #   30.80% frontend cycles idle  
            725,343      stalled-cycles-backend    #   33.86% backend cycles idle   
          1,344,518      instructions              #    0.63  insn per cycle        
                                                   #    0.54  stalled cycles per insn
      <not counted>      branches                                                   
      <not counted>      branch-misses                                              
      <not counted>      L1-dcache-loads                                            
      <not counted>      L1-dcache-load-misses                                      
      <not counted>      LLC-loads                                                  
      <not counted>      LLC-load-misses                                            
      <not counted>      dTLB-loads                                                 
      <not counted>      dTLB-load-misses                                           

0.003227507 seconds time elapsed
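The values after the # sign are derived from the raw counters. The following is a minimal sketch of that arithmetic, using the numbers from the output above. The function and key names are hypothetical, and taking the larger of the two stall counters for "stalled cycles per insn" is an assumption that matches the printed 0.54.

```python
# Raw counter values copied from the perf stat output above.
COUNTERS = {
    "task_clock_msec": 2.319422,
    "page_faults": 89,
    "cycles": 2_142_386,
    "stalled_cycles_frontend": 659_800,
    "stalled_cycles_backend": 725_343,
    "instructions": 1_344_518,
}
ELAPSED_SEC = 0.003227507


def derived_metrics(c, elapsed_sec):
    """Recompute the right-hand column of perf stat from the raw counts."""
    task_clock_sec = c["task_clock_msec"] / 1000.0
    # Assumption: perf reports the larger of the two stall counters per instruction.
    worst_stall = max(c["stalled_cycles_frontend"], c["stalled_cycles_backend"])
    return {
        "cpus_utilized": task_clock_sec / elapsed_sec,
        "page_faults_m_per_sec": c["page_faults"] / task_clock_sec / 1e6,
        "ghz": c["cycles"] / task_clock_sec / 1e9,
        "frontend_idle_pct": 100.0 * c["stalled_cycles_frontend"] / c["cycles"],
        "backend_idle_pct": 100.0 * c["stalled_cycles_backend"] / c["cycles"],
        "insn_per_cycle": c["instructions"] / c["cycles"],
        "stalled_cycles_per_insn": worst_stall / c["instructions"],
    }


METRICS = derived_metrics(COUNTERS, ELAPSED_SEC)
for name, value in METRICS.items():
    print(f"{name:26s} {value:.3f}")
```

Note that the rates are computed against task-clock (CPU time actually used), not against the elapsed wall-clock time; only "CPUs utilized" compares the two.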

3.4 perf bench

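perf bench is a general framework for benchmark tools. It covers subsystems such as sched, mem, numa and futex; all selects every one of them.

perf bench can be used to evaluate specific aspects of system performance, such as scheduling or memory access.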

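perf bench sched: the scheduler and IPC mechanisms. Includes the messaging and pipe benchmarks.

perf bench mem: memory access performance. Includes the memcpy and memset benchmarks.

perf bench numa: scheduling and memory handling on NUMA architectures. Includes the mem benchmark.

perf bench futex: futex stress tests. Includes the hash, wake, wake-parallel, requeue and lock-pi benchmarks.

perf bench all: the collection of all bench tests.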
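The subsystem and benchmark names listed above combine into command lines of the form perf bench <subsystem> <benchmark>. As a small sketch, the table below restates that list and builds the corresponding argument vectors; BENCHMARKS and bench_commands are hypothetical names, and the set of benchmarks available depends on the perf version installed.

```python
# Subsystems and benchmarks exactly as listed in the text above.
BENCHMARKS = {
    "sched": ["messaging", "pipe"],
    "mem": ["memcpy", "memset"],
    "numa": ["mem"],
    "futex": ["hash", "wake", "wake-parallel", "requeue", "lock-pi"],
}


def bench_commands(subsystem=None):
    """Return one argv list per benchmark, optionally limited to one subsystem."""
    if subsystem is None:
        selected = BENCHMARKS
    else:
        selected = {subsystem: BENCHMARKS[subsystem]}
    return [
        ["perf", "bench", sub, name]
        for sub, names in selected.items()
        for name in names
    ]


for argv in bench_commands("sched"):
    print(" ".join(argv))
```

Each list can be passed to subprocess.run() as is; nothing is executed by the sketch itself.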

3.4.1 perf bench sched all

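Runs both the messaging and the pipe benchmark.

3.4.1.1 sched messaging: evaluating process scheduling and inter-core communication

sched messaging is ported from the classic test program hackbench and measures the scheduler's performance, overhead and scalability.

The benchmark starts N reader/sender pairs of processes or threads, which read and write concurrently over IPC (sockets or pipes). N is usually increased step by step to measure how well the scheduler scales.

sched messaging is used in the same way and for the same purposes as hackbench; its parameters can be changed to run tests with different goals: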

-g, --group <n> Specify number of groups

-l, --nr_loops <n> Specify the number of loops to run (default: 100)

-p, --pipe Use pipe() instead of socketpair()

-t, --thread Be multi thread instead of multi process
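To make the structure of the benchmark more concrete, here is a deliberately small sketch of one group. It is not perf's implementation: it always uses threads, runs a single group, and the names run_group, num_fds and MSG_SIZE as well as the 100-byte message size are assumptions made for illustration. It also assumes that every sender writes to every receiver in its group, and that the code runs on Linux.

```python
import os
import socket
import threading
import time

MSG_SIZE = 100  # bytes per message; illustrative value


def _write_all(fd, data):
    """Write the whole buffer, retrying after short writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def _sender(write_fds, loops):
    payload = b"x" * MSG_SIZE
    for _ in range(loops):
        for fd in write_fds:  # one message to every receiver per loop
            _write_all(fd, payload)


def _receiver(fd, expected, counts, index):
    received = 0
    while received < expected:
        chunk = os.read(fd, expected - received)
        if not chunk:
            break
        received += len(chunk)
    counts[index] = received


def run_group(num_fds=4, loops=10, use_pipe=False):
    """Run num_fds senders against num_fds receivers; return (bytes per receiver, seconds)."""
    channels = []
    for _ in range(num_fds):
        if use_pipe:
            read_fd, write_fd = os.pipe()  # corresponds to -p
        else:
            a, b = socket.socketpair()  # default transport
            read_fd, write_fd = a.detach(), b.detach()
        channels.append((read_fd, write_fd))

    expected = num_fds * loops * MSG_SIZE
    counts = [0] * num_fds
    write_fds = [w for _, w in channels]
    threads = [
        threading.Thread(target=_receiver, args=(r, expected, counts, i))
        for i, (r, _) in enumerate(channels)
    ]
    threads += [
        threading.Thread(target=_sender, args=(write_fds, loops))
        for _ in range(num_fds)
    ]

    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    for r, w in channels:
        os.close(r)
        os.close(w)
    return counts, elapsed
```

Calling run_group(use_pipe=True) and run_group(use_pipe=False) exercises the two transports selected by -p. Because Python threads add their own overhead, the timings say little about the kernel scheduler; the sketch only shows the communication pattern.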

Test results:

al@al-System-Product-Name:~/perf$ perf bench sched all
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.173 [sec]

# Running sched/pipe benchmark...
# Executed 1000000 pipe operations between two processes

Total time: 12.233 [sec]

       12.233170 usecs/op
           81744 ops/sec
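The two rate lines of the sched/pipe result follow from the operation count and the elapsed time. A short sketch of the conversion, assuming an elapsed time of 12.233170 s (the printed total is rounded to 12.233) and assuming the ops/sec figure is truncated to an integer, which is consistent with the printed value; pipe_rates is a hypothetical helper name:

```python
def pipe_rates(operations, elapsed_sec):
    """Convert an operation count and elapsed time into the two rates perf prints."""
    usecs_per_op = elapsed_sec * 1_000_000 / operations
    ops_per_sec = int(operations / elapsed_sec)  # assumption: truncated, not rounded
    return usecs_per_op, ops_per_sec


USECS_PER_OP, OPS_PER_SEC = pipe_rates(1_000_000, 12.233170)
print(f"{USECS_PER_OP:.6f} usecs/op")
print(f"{OPS_PER_SEC} ops/sec")
```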

Effect of pipe() versus socketpair() on the test:

1. perf bench sched messaging

# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.176 [sec]


2. perf bench sched messaging -p

# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.093 [sec]

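As the results show, socketpair() is clearly slower than pipe() in this test: 0.176 s versus 0.093 s, roughly 1.9 times as long.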
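A comparison like this can be scripted. The sketch below parses the "Total time" line from the two outputs shown above and computes the ratio. The helper names are hypothetical, and run_messaging assumes that perf is installed and writes its report to standard output; it is defined but not called here.

```python
import re
import subprocess

TOTAL_TIME_RE = re.compile(r"Total time:\s*([0-9.]+)\s*\[sec\]")

# Outputs copied from the two runs above.
SOCKETPAIR_OUTPUT = """# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.176 [sec]
"""
PIPE_OUTPUT = SOCKETPAIR_OUTPUT.replace("0.176", "0.093")


def parse_total_time(output):
    """Extract the total time in seconds from perf bench sched messaging output."""
    match = TOTAL_TIME_RE.search(output)
    if match is None:
        raise ValueError("no 'Total time' line found")
    return float(match.group(1))


def run_messaging(extra_args=()):
    """Run the benchmark and return its total time (requires perf; not called here)."""
    result = subprocess.run(
        ["perf", "bench", "sched", "messaging", *extra_args],
        capture_output=True, text=True, check=True,
    )
    return parse_total_time(result.stdout)


SOCKETPAIR_SEC = parse_total_time(SOCKETPAIR_OUTPUT)
PIPE_SEC = parse_total_time(PIPE_OUTPUT)
RATIO = SOCKETPAIR_SEC / PIPE_SEC
print(f"socketpair run took {RATIO:.2f}x as long as the pipe run")
```

On a machine with perf installed, run_messaging() and run_messaging(["-p"]) would produce the two timings directly. A single run of well under a second is noisy, so repeating each variant several times gives a more reliable ratio.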

3.4.1.2 sched pipe: evaluating pipe performance

sched pipe is ported from Ingo Molnar's pipe-test-1m.c. Ingo's original program was written to test the performance and fairness of different schedulers.
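The sched/pipe output shown earlier reports "pipe operations between two processes". A minimal sketch of such a test, assuming a ping-pong pattern over two pipes in which each round trip forces a switch between the two processes; this is an illustration written for this article, not the code of pipe-test-1m.c or perf, and pipe_ping_pong is a hypothetical name. It requires a system that provides os.fork(), such as Linux.

```python
import os
import time


def pipe_ping_pong(loops=10000):
    """Bounce a token between parent and child; return microseconds per round trip."""
    parent_to_child_r, parent_to_child_w = os.pipe()
    child_to_parent_r, child_to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: wait for each ping, answer with a pong, then exit immediately.
        try:
            for _ in range(loops):
                os.read(parent_to_child_r, 4)
                os.write(child_to_parent_w, b"pong")
        finally:
            os._exit(0)

    # Parent: send a ping and block until the pong arrives.
    start = time.perf_counter()
    for _ in range(loops):
        os.write(parent_to_child_w, b"ping")
        os.read(child_to_parent_r, 4)
    elapsed = time.perf_counter() - start

    os.waitpid(pid, 0)
    for fd in (parent_to_child_r, parent_to_child_w,
               child_to_parent_r, child_to_parent_w):
        os.close(fd)
    return elapsed * 1_000_000 / loops
```

Because each side blocks until the other has written, the measured time is dominated by wake-up and context-switch latency rather than by data throughput, which is why a test of this shape is useful for comparing schedulers.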
