Linux Memory Error Diagnosis (2)

Memory installation status

Memory Component      Status

Proc 1 DIMM 1A        16384 MB         1333 MHz
Proc 1 DIMM 2I        Not installed    Not installed
Proc 1 DIMM 3E        Not installed    Not installed
Proc 1 DIMM 4C        Not installed    Not installed
Proc 1 DIMM 5K        Not installed    Not installed
Proc 1 DIMM 6G        Not installed    Not installed
Proc 1 DIMM 7B        16384 MB         1333 MHz
Proc 1 DIMM 8J        Not installed    Not installed
Proc 1 DIMM 9F        Not installed    Not installed
Proc 1 DIMM 10D       Not installed    Not installed
Proc 1 DIMM 11L       Not installed    Not installed
Proc 1 DIMM 12H       Not installed    Not installed
Proc 2 DIMM 1A        16384 MB         1333 MHz
Proc 2 DIMM 2I        Not installed    Not installed
Proc 2 DIMM 3E        Not installed    Not installed
Proc 2 DIMM 4C        Not installed    Not installed
Proc 2 DIMM 5K        Not installed    Not installed
Proc 2 DIMM 6G        Not installed    Not installed
Proc 2 DIMM 7B        16384 MB         1333 MHz
Proc 2 DIMM 8J        Not installed    Not installed
Proc 2 DIMM 9F        Not installed    Not installed
Proc 2 DIMM 10D       Not installed    Not installed
Proc 2 DIMM 11L       Not installed    Not installed
Proc 2 DIMM 12H       Not installed    Not installed
Proc 3 DIMM 1A        16384 MB         1333 MHz
Proc 3 DIMM 2I        Not installed    Not installed
Proc 3 DIMM 3E        Not installed    Not installed
Proc 3 DIMM 4C        Not installed    Not installed
Proc 3 DIMM 5K        Not installed    Not installed
Proc 3 DIMM 6G        Not installed    Not installed
Proc 3 DIMM 7B        16384 MB         1333 MHz
Proc 3 DIMM 8J        Not installed    Not installed
Proc 3 DIMM 9F        Not installed    Not installed
Proc 3 DIMM 10D       Not installed    Not installed
Proc 3 DIMM 11L       Not installed    Not installed
Proc 3 DIMM 12H       Not installed    Not installed
Proc 4 DIMM 1A        16384 MB         1333 MHz
Proc 4 DIMM 2I        Not installed    Not installed
Proc 4 DIMM 3E        Not installed    Not installed
Proc 4 DIMM 4C        Not installed    Not installed
Proc 4 DIMM 5K        Not installed    Not installed
Proc 4 DIMM 6G        Not installed    Not installed
Proc 4 DIMM 7B        16384 MB         1333 MHz
Proc 4 DIMM 8J        Not installed    Not installed
Proc 4 DIMM 9F        Not installed    Not installed
Proc 4 DIMM 10D       Not installed    Not installed
Proc 4 DIMM 11L       Not installed    Not installed
Proc 4 DIMM 12H       Not installed    Not installed

Using the EDAC tools to detect server memory faults

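With the spread of virtualization and in-memory data stores such as Redis and BDB, more and more servers are configured with large amounts of memory. The DELL R620, for example, has 24 DIMM slots in a dual-CPU configuration and supports up to 768 GB of memory. Diagnosing faults in error-correcting memory such as ECC and registered (REG) DIMMs is a headache: a module that has started to fail can keep running for months or even years, but with bad luck it can bring the machine down at any moment. Fortunately, Linux provides edac-utils, a diagnostic tool for memory error detection and correction that can be used to check a server's memory for latent faults.
The following uses CentOS as an example to introduce edac-utils.
Before using edac-utils, you need to understand the server's hardware architecture. Take the DELL R620 as an example (other models such as the HP DL360p G8 and IBM x3650 M4 also use E5-2600 series CPUs and the C600 series chipset, so they are largely the same). The relationship between its CPU memory controllers, channels, and DIMM slots is shown below.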

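Processor 0 (corresponds to one memory controller)
Channel 0: DIMM slots A1, A5, and A9
Channel 1: DIMM slots A2, A6, and A10
Channel 2: DIMM slots A3, A7, and A11
Channel 3: DIMM slots A4, A8, and A12

Processor 1 (corresponds to one memory controller)
Channel 0: DIMM slots B1, B5, and B9
Channel 1: DIMM slots B2, B6, and B10
Channel 2: DIMM slots B3, B7, and B11
Channel 3: DIMM slots B4, B8, and B12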
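
The layout above follows a regular pattern: on each processor, the slot number equals channel + 1 + 4 x position, where position (0 to 2) is the slot's place within its channel. The sketch below encodes only that pattern. The function name dimm_slot and its zero-based arguments are illustrative assumptions, not part of edac-utils, and the mapping should be checked against the vendor's documentation for any other model.

```shell
# Hypothetical helper: map (processor, channel, position) to the slot label
# in the R620 layout listed above. All three indices are zero-based.
dimm_slot() {
    proc=$1; chan=$2; pos=$3
    case "$proc" in
        0) letter=A ;;
        1) letter=B ;;
        *) echo "unknown processor: $proc" >&2; return 1 ;;
    esac
    # Four channels per processor, three slots per channel.
    if [ "$chan" -lt 0 ] || [ "$chan" -gt 3 ] || [ "$pos" -lt 0 ] || [ "$pos" -gt 2 ]; then
        echo "channel must be 0-3 and position 0-2" >&2
        return 1
    fi
    echo "${letter}$(( chan + 1 + 4 * pos ))"
}
```

For instance, `dimm_slot 0 1 2` prints A10, the third slot on channel 1 of processor 0.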

1. Install the edac-utils tool

yum install -y libsysfs edac-utils
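
After installation, one quick sanity check is to read the error counters that the kernel EDAC driver exposes through sysfs. The sketch below assumes the EDAC driver for the platform is loaded, so that per-controller directories exist under /sys/devices/system/edac/mc. The function name edac_totals and its optional base-directory argument are illustrative, not part of edac-utils.

```shell
# Sum the corrected (ce_count) and uncorrected (ue_count) error counters
# across all memory controllers found under the EDAC sysfs directory.
# Pass a different base directory as $1 to read from somewhere else.
edac_totals() {
    base=${1:-/sys/devices/system/edac/mc}
    ce=0; ue=0
    for mc in "$base"/mc*; do
        [ -d "$mc" ] || continue   # skip the literal pattern when nothing matches
        if [ -r "$mc/ce_count" ]; then ce=$(( ce + $(cat "$mc/ce_count") )); fi
        if [ -r "$mc/ue_count" ]; then ue=$(( ue + $(cat "$mc/ue_count") )); fi
    done
    echo "corrected=$ce uncorrected=$ue"
}
```

Calling `edac_totals` with no argument reads the live counters. A corrected count that keeps growing is worth investigating even while the server appears healthy.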
