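A profiling tool similar to gprof, but one that observes a program's execution in finer detail and provides more information. Unlike gprof (which requires building with -pg), it needs no special options when compiling the source code, although adding the debug option (-g) is still recommended. Callgrind collects data while the program runs, builds a function call graph, and can optionally simulate the cache. When the run finishes, it writes the profiling data to a file, and callgrind_annotate can convert the contents of that file into a readable form.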
Test program:
#include <stdio.h>
#include <unistd.h>

void test() {
    sleep(1);
}

void func() {
    for (int i = 0; i < 10; i++) {
        test();
    }
}

int main() {
    func();
    printf("process is over!\n");
    return 0;
}

After compiling, run the program under valgrind:
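$ g++ -g -o test callgrind.cpp
$ valgrind --tool=callgrind ./test
$ ls
callgrind.cpp  callgrind.out.3490  test

callgrind.out.3490 is the file generated by callgrind; the numeric suffix is the process ID of the profiled run. To view it as text, run callgrind_annotate callgrind.out.3490.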
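Here we introduce a graphical profiling tool, Kcachegrind (see the Kcachegrind official website). After downloading and installing it, you can use it to analyze the files that callgrind generates.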
Open callgrind.out.3490 in Kcachegrind, as shown in the figure below:
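The graphical view shows at a glance which parts of the program are the most expensive and how the functions call one another. Note that Callgrind's default cost measure is the number of executed instructions, not wall-clock time.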
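3) Cachegrind
A cache profiler. It simulates the CPU's first-level caches (separate I1 instruction and D1 data caches) and a second-level (last-level) cache, and can pinpoint where cache misses and hits occur in a program. If needed, it can also report the number of cache misses, memory references, and instructions executed for each line of code, each function, each module, and the whole program. This is very helpful when optimizing a program.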
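Its usage is similar: valgrind --tool=cachegrind ./program_name. The results are written to cachegrind.out.<pid>, which cg_annotate turns into a readable report.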
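4) Helgrind
Helgrind is mainly used to detect data races in multithreaded programs. It looks for memory locations that are accessed by more than one thread without proper synchronization (for example, without consistently holding a lock); such locations are often where threads fall out of sync, and they cause bugs that are hard to find. Early versions of Helgrind implemented the Eraser race-detection algorithm, with improvements that reduced the number of false reports, and the tool was then considered experimental; current versions use a happens-before based algorithm instead.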
Test code:
#include <stdio.h>
#include <pthread.h>

#define NUM 10
int counter = 0;

void *threadfunc(void*) {
    for (int i = 0; i < NUM; i++) {
        counter += i;
    }
    return NULL;
}

int main() {
    pthread_t tid1, tid2;
    pthread_create(&tid1, NULL, &threadfunc, NULL);
    pthread_create(&tid2, NULL, &threadfunc, NULL);

    // wait for both threads to terminate
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);

    printf("counter = %d\n", counter);
    return 0;
}

After compiling, run the program under valgrind:
$ g++ -g -o test helgrind.cpp -lpthread
$ valgrind --tool=helgrind ./test

The output:
==27722== Helgrind, a thread error detector
==27722== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==27722== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==27722== Command: ./test
==27722==
==27722== ---Thread-Announcement------------------------------------------
==27722==
==27722== Thread #3 was created
==27722==    at 0x597589E: clone (in /usr/lib64/libc-2.17.so)
==27722==    by 0x4E43059: do_clone.constprop.4 (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x4E44569: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x4C30CFA: pthread_create_WRK (hg_intercepts.c:425)
==27722==    by 0x4C31DD8: pthread_create@* (hg_intercepts.c:458)
==27722==    by 0x400728: main (helgrind.cpp:17)
==27722==
==27722== ---Thread-Announcement------------------------------------------
==27722==
==27722== Thread #2 was created
==27722==    at 0x597589E: clone (in /usr/lib64/libc-2.17.so)
==27722==    by 0x4E43059: do_clone.constprop.4 (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x4E44569: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x4C30CFA: pthread_create_WRK (hg_intercepts.c:425)
==27722==    by 0x4C31DD8: pthread_create@* (hg_intercepts.c:458)
==27722==    by 0x40070D: main (helgrind.cpp:16)
==27722==
==27722== ----------------------------------------------------------------
==27722==
==27722== Possible data race during read of size 4 at 0x601048 by thread #3
==27722== Locks held: none
==27722==    at 0x4006CE: threadfunc(void*) (helgrind.cpp:9)
==27722==    by 0x4C30EEE: mythread_wrapper (hg_intercepts.c:387)
==27722==    by 0x4E43EA4: start_thread (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x59758DC: clone (in /usr/lib64/libc-2.17.so)
==27722==
==27722== This conflicts with a previous write of size 4 by thread #2
==27722== Locks held: none
==27722==    at 0x4006D9: threadfunc(void*) (helgrind.cpp:9)
==27722==    by 0x4C30EEE: mythread_wrapper (hg_intercepts.c:387)
==27722==    by 0x4E43EA4: start_thread (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x59758DC: clone (in /usr/lib64/libc-2.17.so)
==27722==  Address 0x601048 is 0 bytes inside data symbol "counter"
==27722==
==27722== ----------------------------------------------------------------
==27722==
==27722== Possible data race during write of size 4 at 0x601048 by thread #3
==27722== Locks held: none
==27722==    at 0x4006D9: threadfunc(void*) (helgrind.cpp:9)
==27722==    by 0x4C30EEE: mythread_wrapper (hg_intercepts.c:387)
==27722==    by 0x4E43EA4: start_thread (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x59758DC: clone (in /usr/lib64/libc-2.17.so)
==27722==
==27722== This conflicts with a previous write of size 4 by thread #2
==27722== Locks held: none
==27722==    at 0x4006D9: threadfunc(void*) (helgrind.cpp:9)
==27722==    by 0x4C30EEE: mythread_wrapper (hg_intercepts.c:387)
==27722==    by 0x4E43EA4: start_thread (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x59758DC: clone (in /usr/lib64/libc-2.17.so)
==27722==  Address 0x601048 is 0 bytes inside data symbol "counter"
==27722==
counter = 90
==27722==
==27722== Use --history-level=approx or =none to gain increased speed, at
==27722== the cost of reduced accuracy of conflicting-access information
==27722== For lists of detected and suppressed errors, rerun with: -s
==27722== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

As the output shows, valgrind detected the race: threads #2 and #3 both read and write counter at helgrind.cpp:9 (counter += i) without holding any lock. Note that the program still printed the expected counter = 90; a race does not necessarily corrupt the result on every run, which is exactly why such bugs are hard to notice without a tool like Helgrind.
5) Massif