如果需要统计更多的项,需要使用-e,如:
perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls结果如下,关注的特殊项也纳入统计。
al@al-System-Product-Name:~/perf$ sudo perf stat -e
task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses
ls
Performance counter stats for \'ls\':
2.319422 task-clock (msec) # 0.719 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
89 page-faults # 0.038 M/sec
2,142,386 cycles # 0.924 GHz
659,800 stalled-cycles-frontend # 30.80% frontend cycles idle
725,343 stalled-cycles-backend # 33.86% backend cycles idle
1,344,518 instructions # 0.63 insn per cycle
# 0.54 stalled cycles per insn
<not counted> branches
<not counted> branch-misses
<not counted> L1-dcache-loads
<not counted> L1-dcache-load-misses
<not counted> LLC-loads
<not counted> LLC-load-misses
<not counted> dTLB-loads
<not counted> dTLB-load-misses
0.003227507 seconds time elapsed
perf bench作为benchmark工具的通用框架,包含sched/mem/numa/futex等子系统,all可以指定所有。
perf bench可用于评估系统sched/mem等特定性能。
perf bench sched:调度器和IPC机制。包含messaging和pipe两个功能。
perf bench mem:内存存取性能。包含memcpy和memset两个功能。
perf bench numa:NUMA架构的调度和内存处理性能。包含mem功能。
perf bench futex:futex压力测试。包含hash/wake/wake-parallel/requeue/lock-pi功能。
perf bench all:所有bench测试的集合
3.4.1 perf bench sched all测试messaging和pipi两部分性能。
3.4.1.1 sched messaging评估进程调度和核间通信sched message 是从经典的测试程序 hackbench 移植而来,用来衡量调度器的性能,overhead 以及可扩展性。
该 benchmark 启动 N 个 reader/sender 进程或线程对,通过 IPC(socket 或者 pipe) 进行并发的读写。一般人们将 N 不断加大来衡量调度器的可扩展性。
sched message 的用法及用途和 hackbench 一样,可以通过修改参数进行不同目的测试:
-g, --group <n> Specify number of groups
-l, --nr_loops <n> Specify the number of loops to run (default: 100)
-p, --pipe Use pipe() instead of socketpair()
-t, --thread Be multi thread instead of multi process
测试结果:
al@al-System-Product-Name:~/perf$ perf bench sched all
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.173 [sec]
# Running sched/pipe benchmark...
# Executed 1000000 pipe operations between two processes
Total time: 12.233 [sec]
12.233170 usecs/op
81744 ops/sec
使用pipe()和socketpair()对测试影响:
1. perf bench sched messaging
# Running \'sched/messaging\' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.176 [sec]
2. perf bench sched messaging -p
# Running \'sched/messaging\' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.093 [sec]
可见socketpair()性能要明显低于pipe()。
3.4.1.2 sched pipe评估pipe性能sched pipe 从 Ingo Molnar 的 pipe-test-1m.c 移植而来。当初 Ingo 的原始程序是为了测试不同的调度器的性能和公平性的。