Notes on a Linux kernel crash: kdump, crash, vmcore

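When the Linux kernel crashes, kdump writes a kernel dump file named vmcore. Analyzing the vmcore can reveal why the kernel crashed.
crash is a widely used tool for analyzing kernel crash dump files. To debug a kernel dump with crash, you need to install the crash tool itself and the kernel debugging symbols package, kernel-debuginfo.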
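kdump can only capture a vmcore if memory has been reserved for the capture kernel through the crashkernel= boot parameter. The following is a minimal sketch for checking that parameter; the function name crashkernel_param is made up for illustration, and the optional file argument exists only so the logic can be tried on a sample file.

```shell
# Print the crashkernel= value from a kernel command line, or fail if absent.
# Reads /proc/cmdline by default; a different file can be passed for testing.
# (crashkernel_param is a hypothetical helper name, not part of kexec-tools.)
crashkernel_param() {
    cmdline_file="${1:-/proc/cmdline}"
    # Split the command line into one token per line and keep the crashkernel one.
    value=$(tr ' ' '\n' < "$cmdline_file" | sed -n 's/^crashkernel=//p' | head -n 1)
    [ -n "$value" ] && printf '%s\n' "$value"
}
```

A possible use is `crashkernel_param || echo "no crashkernel= reservation found"`, run before relying on kdump to produce a dump.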

Installing the required software

1. Check the kernel version

[root@qd01-stop-free015 ~]# uname -r
3.10.0-1160.15.2.el7.x86_64

2. Install kdump and crash (kdump is provided by the kexec-tools package)

yum install crash kexec-tools -y

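3. Install kernel-debuginfo
Download the kernel-debuginfo and kernel-debuginfo-common packages whose version matches the running kernel exactly, then install them: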
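The debuginfo packages have to match the kernel release reported by uname -r, otherwise crash cannot resolve symbols for the dump. As a sketch, the two package names can be derived from the release string. This assumes the RHEL/CentOS naming seen in the rpm command in this article and that the architecture is the last dot-separated part of the release; debuginfo_pkgs is a hypothetical helper name.

```shell
# Print the two debuginfo package names that match a kernel release string.
# With no argument, the release of the running kernel (uname -r) is used.
# Assumption: the release ends in ".<arch>", as in 3.10.0-1160.15.2.el7.x86_64.
debuginfo_pkgs() {
    release="${1:-$(uname -r)}"
    arch="${release##*.}"          # text after the last dot, e.g. x86_64
    printf 'kernel-debuginfo-%s\n' "$release"
    printf 'kernel-debuginfo-common-%s-%s\n' "$arch" "$release"
}
```

Appending .rpm to each printed name gives the file names to look for on the download site.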

rpm -ivh kernel-debuginfo-3.10.0-1160.15.2.el7.x86_64.rpm kernel-debuginfo-common-x86_64-3.10.0-1160.15.2.el7.x86_64.rpm

Analyzing the crash report

1. Load the vmcore file with the crash command

[root@qd01-stop-free015 kdump]# crash /usr/lib/debug/lib/modules/3.10.0-1160.15.2.el7.x86_64/vmlinux vmcore

crash 7.2.3-11.el7_9.1
Copyright (C) 2002-2017  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [274MB]: patching 87300 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-1160.15.2.el7.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Thu Mar  4 10:12:38 2021
      UPTIME: 00:05:04
LOAD AVERAGE: 5.28, 3.20, 1.38
       TASKS: 256
    NODENAME: zf-dbslave001
     RELEASE: 3.10.0-1160.15.2.el7.x86_64
     VERSION: #1 SMP Wed Feb 3 15:06:38 UTC 2021
     MACHINE: x86_64  (2500 Mhz)
      MEMORY: 63 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000074"
         PID: 1362
     COMMAND: "AliYunDun"
        TASK: ffff90f972365280  [THREAD_INFO: ffff90f9767a4000]
         CPU: 5
       STATE: TASK_RUNNING (PANIC)

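The fields in the output mean the following:

KERNEL: the kernel file that was running when the system crashed

DUMPFILE: the kernel dump file

CPUS: the number of CPUs on the machine

DATE: the time of the crash

TASKS: the number of tasks in memory when the system crashed

NODENAME: the hostname of the crashed system

RELEASE and VERSION: the kernel release and build version

MACHINE: the CPU architecture

MEMORY: the physical memory of the crashed host

PANIC: the type of crash. Common types include: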

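SysRq (System Request): a crash triggered through the magic SysRq key, normally used for testing. Running echo c > /proc/sysrq-trigger crashes the system on purpose.

oops: roughly the kernel-level equivalent of a segmentation fault. When an application performs an illegal memory access or executes an illegal instruction, it receives a signal such as SIGSEGV; the default behavior is to dump core, although the application can also catch the signal and handle it itself. When the kernel itself makes this kind of mistake, it prints an oops message.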
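For the SysRq test case described above, it helps to confirm that a crash kernel is actually loaded before crashing the machine on purpose; otherwise the system goes down without producing a vmcore. The sketch below reads /sys/kernel/kexec_crash_loaded, which contains 1 when a crash kernel is loaded. The function names and the optional path arguments are hypothetical and are there so the logic can be tried against ordinary files.

```shell
# Return success only if a crash kernel is loaded, i.e. kdump can capture a dump.
# /sys/kernel/kexec_crash_loaded holds 1 when a crash kernel is loaded.
kdump_armed() {
    state_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

# Deliberately crash the machine, but only when kdump is armed.
# Both paths can be overridden so the logic can be exercised safely.
trigger_test_crash() {
    trigger="${1:-/proc/sysrq-trigger}"
    state_file="${2:-/sys/kernel/kexec_crash_loaded}"
    if kdump_armed "$state_file"; then
        echo c > "$trigger"
    else
        echo "kdump is not armed; refusing to trigger a crash" >&2
        return 1
    fi
}
```

Calling trigger_test_crash with no arguments as root on a test machine crashes it immediately, so it should never be run on a production host.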

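From the output above, the cause of this crash is PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000074". The panic occurred in the context of the AliYunDun process (PID 1362), after which the system rebooted.
PS: I can't make sense of Alibaba Cloud's logic here: the server gets compromised, and all it does is keep rebooting the server?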
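When several dumps need to be compared, the header fields can be pulled out of saved crash output with a small filter. This is only a sketch: it assumes the startup banner was saved to a text file (crash-header.txt is a hypothetical name) and that each field sits on its own "FIELD: value" line, as crash prints it. crash_field is a made-up helper name.

```shell
# Print the value of one header field (for example PANIC) from saved crash output.
# Usage: crash_field FIELD [FILE...]   (reads standard input when no file is given)
crash_field() {
    field="$1"
    shift
    # Match "  FIELD: value" with optional leading spaces and print only the value.
    sed -n "s/^[[:space:]]*${field}:[[:space:]]*//p" "$@" | head -n 1
}
```

For example, `crash_field PANIC crash-header.txt` prints the panic message and `crash_field COMMAND crash-header.txt` prints the name of the process that was running.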

2. Use the bt command to view the stack trace from just before the crash.

crash> bt
PID: 1362   TASK: ffff90f972365280  CPU: 5   COMMAND: "AliYunDun"
 #0 [ffff90f9767a77a0] machine_kexec at ffffffff922662c4
 #1 [ffff90f9767a7800] __crash_kexec at ffffffff923227a2
 #2 [ffff90f9767a78d0] crash_kexec at ffffffff92322890
 #3 [ffff90f9767a78e8] oops_end at ffffffff9298c798
 #4 [ffff90f9767a7910] no_context at ffffffff92275d14
 #5 [ffff90f9767a7960] __bad_area_nosemaphore at ffffffff92275fe2
 #6 [ffff90f9767a79b0] bad_area_nosemaphore at ffffffff92276104
 #7 [ffff90f9767a79c0] __do_page_fault at ffffffff9298f750
 #8 [ffff90f9767a7a30] trace_do_page_fault at ffffffff9298fa26
 #9 [ffff90f9767a7a70] do_async_page_fault at ffffffff9298efa2
#10 [ffff90f9767a7a90] async_page_fault at ffffffff9298b7a8
#11 [ffff90f9767a7b98] kmem_cache_alloc_trace at ffffffff92428a0c
#12 [ffff90f9767a7c98] mntput at ffffffff92471d94
#13 [ffff90f9767a7d88] kvm_sched_clock_read at ffffffff9226d3be
#14 [ffff90f9767a7ec8] putname at ffffffff9245fd3d
#15 [ffff90f9767a7f50] system_call_fastpath at ffffffff92994f92
    RIP: 00007f84fd928315  RSP: 00007f84fb011af8  RFLAGS: 00000206
    RAX: 000000000000004e  RBX: 000000000244e010  RCX: ffffffffffffffff
    RDX: 0000000000008000  RSI: 000000000244e010  RDI: 0000000000000012
    RBP: 000000000244e010   R8: 0000000000000020   R9: 0000000000008030
    R10: 0000000000000076  R11: 0000000000000246  R12: ffffffffffffff30
    R13: 0000000000000000  R14: 000000000244dfe0  R15: 000000000000052a
    ORIG_RAX: 000000000000004e  CS: 0033  SS: 002b
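In a trace like this one, the top frames (machine_kexec up to async_page_fault) are the crash and page-fault handling path, and the frame just below the page-fault entry is usually the code that faulted. The sketch below lists the symbols from saved bt output and picks out that frame. It assumes the output was saved to a text file with one frame per line, in the "#N [address] symbol at address" form that crash prints. Both function names are hypothetical, and the "first frame after the page-fault frames" rule is a heuristic, not something crash reports itself.

```shell
# Print the symbol name of every frame in saved `bt` output, top frame first.
# A frame line looks like: "#3 [ffff90f9767a78e8] oops_end at ffffffff9298c798".
bt_symbols() {
    awk '$1 ~ /^#[0-9]+$/ && $4 == "at" { print $3 }' "$@"
}

# Print the first frame below the page-fault handling frames.
# Heuristic: skip every symbol ending in "page_fault", then print the next one.
bt_after_fault() {
    bt_symbols "$@" | awk '/page_fault$/ { seen = 1; next } seen { print; exit }'
}
```

Run against the trace above, bt_after_fault would point at kmem_cache_alloc_trace, which is a starting point for further inspection rather than a diagnosis.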

Copyright notice: unless otherwise noted, all articles on this site are original.

When reposting, please credit the source: https://www.heiqu.com/wpzxwz.html