Introduction to and Usage of perf, a System-Level Performance Analysis Tool [Repost] (8)

Date: 2022-03-24  Category: Programmer's Life

To count more events, list them with -e, for example:

perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls

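The output is shown below. The extra events requested with -e now appear in the report, although in this short run several of them show <not counted>, which most likely means perf never got to schedule those counters before ls exited.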

al@al-System-Product-Name:~/perf$ sudo perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls

Performance counter stats for 'ls':

           2.319422      task-clock (msec)         #    0.719 CPUs utilized
                  0      context-switches          #    0.000 K/sec
                  0      cpu-migrations            #    0.000 K/sec
                 89      page-faults               #    0.038 M/sec
          2,142,386      cycles                    #    0.924 GHz
            659,800      stalled-cycles-frontend   #   30.80% frontend cycles idle
            725,343      stalled-cycles-backend    #   33.86% backend cycles idle
          1,344,518      instructions              #    0.63 insn per cycle
                                                   #    0.54 stalled cycles per insn
      <not counted>      branches
      <not counted>      branch-misses
      <not counted>      L1-dcache-loads
      <not counted>      L1-dcache-load-misses
      <not counted>      LLC-loads
      <not counted>      LLC-load-misses
      <not counted>      dTLB-loads
      <not counted>      dTLB-load-misses

0.003227507 seconds time elapsed
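
The values in the right-hand column can be reproduced from the raw counts. The following is a minimal sketch, assuming the usual definitions of these ratios; the function name derived_metrics is made up for illustration, and taking the larger of the two stall counts for "stalled cycles per insn" is an assumption that happens to match the figure printed above.

```python
# Raw values copied from the perf stat output above.
task_clock_ms = 2.319422
elapsed_s = 0.003227507
page_faults = 89
cycles = 2_142_386
stalled_frontend = 659_800
stalled_backend = 725_343
instructions = 1_344_518


def derived_metrics():
    """Recompute the '#' column of perf stat from the raw counters."""
    task_clock_s = task_clock_ms / 1000.0
    return {
        # CPU time consumed divided by wall-clock time.
        "cpus_utilized": task_clock_s / elapsed_s,
        # Event rates are expressed per second of task-clock time.
        "page_faults_m_per_sec": page_faults / task_clock_s / 1e6,
        "ghz": cycles / task_clock_s / 1e9,
        "frontend_idle_pct": 100.0 * stalled_frontend / cycles,
        "backend_idle_pct": 100.0 * stalled_backend / cycles,
        "insn_per_cycle": instructions / cycles,
        # Assumption: the larger stall count is used as the numerator.
        "stalled_cycles_per_insn": max(stalled_frontend, stalled_backend) / instructions,
    }


for name, value in derived_metrics().items():
    print(f"{name:>26}: {value:.3f}")
```

An insn per cycle value of 0.63 together with roughly a third of the cycles stalled in each half of the pipeline suggests that ls spends much of its short run waiting rather than retiring instructions.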

3.4 perf bench

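perf bench is a general framework for benchmark suites. It includes subsystems such as sched, mem, numa, and futex; all selects every one of them.

perf bench can be used to evaluate specific aspects of system performance, such as scheduling (sched) and memory (mem).

perf bench sched: the scheduler and IPC mechanisms. Includes the messaging and pipe benchmarks.

perf bench mem: memory access performance. Includes the memcpy and memset benchmarks.

perf bench numa: scheduling and memory handling performance on NUMA architectures. Includes the mem benchmark.

perf bench futex: futex stress tests. Includes the hash, wake, wake-parallel, requeue, and lock-pi benchmarks.

perf bench all: runs the whole collection of benchmarks.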

3.4.1 perf bench sched all

Runs both the messaging and pipe benchmarks.

3.4.1.1 sched messaging: evaluating process scheduling and inter-process communication

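sched messaging is ported from the classic test program hackbench. It measures the scheduler's performance, overhead, and scalability.

The benchmark starts N reader/sender pairs of processes or threads, which read and write concurrently over IPC (sockets or pipes). N is usually increased step by step to see how well the scheduler scales.

sched messaging is used in the same way and for the same purposes as hackbench. Its options can be changed to run tests with different goals: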

-g, --group <n> Specify number of groups

-l, --nr_loops <n> Specify the number of loops to run (default: 100)

-p, --pipe Use pipe() instead of socketpair()

-t, --thread Be multi thread instead of multi process
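
The pattern described above, sender/receiver pairs exchanging messages over either pipe() or socketpair(), can be sketched in Python. This is a simplified illustration and not perf's implementation: the names make_channel and run_pairs, the 100-byte payload, and the use of threads are assumptions of the sketch, and it connects one sender to one receiver instead of reproducing hackbench's group layout. It reads and writes raw file descriptors, so it assumes a Unix-like system.

```python
import os
import socket
import threading
import time

MSG = b"x" * 100  # payload size is an assumption of this sketch


def make_channel(use_pipe):
    """Return (read_fd, write_fd) backed by a pipe or a socketpair."""
    if use_pipe:
        return os.pipe()
    a, b = socket.socketpair()
    # detach() hands over the raw descriptors; we close them ourselves.
    return a.detach(), b.detach()


def sender(write_fd, loops):
    """Write `loops` messages, then close to signal end of stream."""
    for _ in range(loops):
        sent = 0
        while sent < len(MSG):
            sent += os.write(write_fd, MSG[sent:])
    os.close(write_fd)


def receiver(read_fd, counts, idx):
    """Read until end of stream and record the number of bytes seen."""
    total = 0
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        total += len(chunk)
    os.close(read_fd)
    counts[idx] = total


def run_pairs(n_pairs, loops, use_pipe=False):
    """Run n_pairs sender/receiver thread pairs; return (bytes, seconds)."""
    counts = [0] * n_pairs
    threads = []
    for i in range(n_pairs):
        r, w = make_channel(use_pipe)
        threads.append(threading.Thread(target=receiver, args=(r, counts, i)))
        threads.append(threading.Thread(target=sender, args=(w, loops)))
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts), time.perf_counter() - start
```

Calling run_pairs(10, 100, use_pipe=True) and run_pairs(10, 100, use_pipe=False) mirrors the effect of the -p switch, but interpreter overhead dominates here, so the timings are not comparable with those reported by perf bench.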

Test results:

al@al-System-Product-Name:~/perf$ perf bench sched all
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.173 [sec]

# Running sched/pipe benchmark...
# Executed 1000000 pipe operations between two processes

Total time: 12.233 [sec]

12.233170 usecs/op
81744 ops/sec
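
The last two lines of the sched/pipe report follow directly from the total time and the number of operations. A small sketch of that arithmetic, assuming the unrounded total is 12.233170 seconds (as implied by the usecs/op line) and that ops/sec is truncated to an integer; pipe_throughput is a name chosen for illustration:

```python
def pipe_throughput(total_time_s, operations):
    """Return (microseconds per operation, whole operations per second)."""
    usecs_per_op = total_time_s * 1e6 / operations
    ops_per_sec = int(operations / total_time_s)  # truncated, not rounded
    return usecs_per_op, ops_per_sec


usecs, ops = pipe_throughput(12.233170, 1_000_000)
print(f"{usecs:.6f} usecs/op")
print(f"{ops} ops/sec")
```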

Effect of pipe() versus socketpair() on the result:

1. perf bench sched messaging

# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.176 [sec]

2. perf bench sched messaging -p

# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 0.093 [sec]

In this run socketpair() is clearly slower than pipe(): 0.176 s versus 0.093 s, that is, about 1.9 times as long.

3.4.1.2 sched pipe: evaluating pipe performance

sched pipe is ported from Ingo Molnar's pipe-test-1m.c. Ingo originally wrote the program to test the performance and fairness of different schedulers.
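
As a rough illustration of what a pipe test of this kind exercises, the sketch below bounces a one-byte token between a parent and a child process over two pipes. It assumes that the benchmark works as a ping-pong between two processes, which fits the "pipe operations between two processes" line in the output earlier but is not spelled out in the text. The function name pipe_ping_pong is hypothetical, and os.fork() limits the sketch to Unix-like systems.

```python
import os
import time


def pipe_ping_pong(loops):
    """Bounce a token parent -> child -> parent `loops` times.

    Returns (completed round trips, elapsed seconds).
    """
    p2c_r, p2c_w = os.pipe()  # parent writes, child reads
    c2p_r, c2p_w = os.pipe()  # child writes, parent reads
    pid = os.fork()
    if pid == 0:
        # Child: echo every token back, then exit without cleanup handlers.
        os.close(p2c_w)
        os.close(c2p_r)
        try:
            for _ in range(loops):
                token = os.read(p2c_r, 1)
                os.write(c2p_w, token)
        finally:
            os._exit(0)
    # Parent: send a token and block until it comes back. Each round trip
    # forces the scheduler to switch between the two processes.
    os.close(p2c_r)
    os.close(c2p_w)
    completed = 0
    start = time.perf_counter()
    for _ in range(loops):
        os.write(p2c_w, b"x")
        if os.read(c2p_r, 1) == b"x":
            completed += 1
    elapsed = time.perf_counter() - start
    os.close(p2c_w)
    os.close(c2p_r)
    os.waitpid(pid, 0)
    return completed, elapsed
```

Because every read blocks until the other side has written, the elapsed time is dominated by wake-up and context-switch latency rather than by data transfer, which is why a test of this shape says more about the scheduler than about pipe bandwidth.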

When reposting, please cite the source: https://www.heiqu.com/zzzgpg.html
